Towards the creation of synthetic Escherichia coli via Tryptophan and Methionine substitutions

submitted by Master of Science (M. Sc.)

Isabella Tolle

to Faculty II – Mathematics and Natural Sciences of the Technische Universität Berlin in fulfilment of the requirements for the academic degree

Doctor of Natural Sciences (Dr. rer. nat.)

approved dissertation

Doctoral committee:

Chair: Prof. Dr. Reinhard Schomäcker
Reviewer: Prof. Dr. Nediljko Budisa
Reviewer: Prof. Dr. Thomas Friedrich
Reviewer: Prof. Dr. Zoya Ignatova

Date of the scientific defense: 4 June 2021

Berlin 2021

Acknowledgements

I owe special thanks to my supervisor, Prof. Dr. Nediljko Budisa, for his support and for entrusting me with this exciting doctoral topic. It fills me with pride that he entrusted me with this research question, and with even greater gratitude that he largely left it to me to handle it in my own independent way. I felt this leap of faith at all times and hope to have done it justice.

I would also like to sincerely thank Prof. Dr. Thomas Friedrich and Prof. Dr. Zoya Ignatova for agreeing to review my dissertation, and Prof. Dr. Reinhard Schomäcker for chairing my scientific defense.

Furthermore, I would like to thank all members of the Biocatalysis group (Arbeitskreis Biokatalyse) for the excellent working atmosphere, which was marked by mutual helpfulness, professional collegiality, and friendly relations. In particular, I would like to thank Dr. Stefan Oehm for the warm welcome into the group, for introducing me to my research topic and, not least, for making infantile humor socially acceptable in office L111. I thank my office companions Dr. Matthias Hauf, Dr. Jessica Nickling, Christin Treiber-Kleinke, and Maxi Marock for the many humorous conversations (and the occasionally peculiar acoustic accompaniment). I also thank Jessi for motivating me to take part in sporting activities, which were an important counterbalance to everyday lab life. My special thanks go to Christin, and far beyond her work on generating the Met-auxotrophic strain: she is a loyal and reliable colleague and proved this countless times through her selfless support, both in daily lab work and beyond.

I warmly thank Dr. Tobias Schneider (aka Tobi II) for the synthesis of trifluoromethionine and for his helpful advice on all things chemical, which he always gave in his positive, humorous manner.

I also thank all members of the “Gourmet Lunch Group”, Dr. Ying Ma, Dr. Federica Agostini, Tuyet Mai Thi To, and Georg Johannes Freiherr von Sass, for the varied, international, and delicious meals, which did full credit to the group's name. Our stimulating and by no means exclusively intellectual conversations over good food made every lunch break a highlight.

Above all, however, I thank Mai, Hannes, Christin, Fede, and Dr. Tobias Baumann for the many productive scientific discussions and for their always valuable and insightful input on my research questions. These people in particular also provided the important emotional support that is at times indispensable in the course of a doctorate.

I would especially like to thank Ying, Maxi, Tobi, Matze, Hannes, Fede, Mai, and Christin for their wonderful friendship. You made the office, and Berlin, even better in every respect!

Furthermore, I would like to thank my best friend, Manni, and the canoeing crew with Max, Tabea, Tobi, and Verena for their long-standing friendship. I also thank my fellow students and friends Lena and Daniela, who, with their support during my bachelor's studies, helped lay the foundation for my professional career and who still accompany me as friends after all these years.

I would also like to give very special thanks to my brother Fabian, whom I have looked up to with admiration all my life and whom I can always trust. Thank you for critically proofreading the drafts of this thesis as well as every other piece of my scientific work!

Last but not least, I would like to thank my loving parents and my equally loving partner Flo for their unconditional and tireless support in every situation in life. You are the reason I was able to lead an almost carefree life and concentrate completely on my doctorate.

Abstract

Billions of years of evolution have produced extant living organisms with a vast biodiversity and the ability to adapt to changing environments. At the foundation of it all lies the central dogma of molecular biology, according to which information flows from the information storage polymer DNA to RNA (“informational polymers”) and is finally translated into proteins (“catalytic polymers”). This fundamental translation process relies on the universal standard genetic code, which assigns nucleobase triplets to the 20 proteinogenic amino acids. However, even after some six decades of research and the formulation of various theories and models, the origin and evolution of the standard genetic code remain an enigma, and a comprehensive and conclusive story has yet to be assembled. The discovery of the 21st and 22nd amino acids, selenocysteine and pyrrolysine, implies a certain flexibility of the genetic code, which is further affirmed by the co-translational incorporation of over 200 noncanonical amino acids (ncAAs) into proteins over the last decades, culminating in the proteome-wide replacement of the latest addition to the genetic code, Trp, with its close structural analog L-β-(thieno[3,2-b]pyrrolyl)alanine ([3,2]Tpa). In this study, an attempt was made to further alienate the resulting strain, designated TUB170, from life as we know it by turning a tolerance towards [3,2]Tpa into an addiction. In TUB170, [3,2]Tpa incorporation relies on the catalytic promiscuity of the endogenous tryptophanyl-tRNA synthetase, which, in addition to charging its cognate tRNA with the canonical amino acid Trp, also charges the analog [3,2]Tpa onto said tRNA. By replacing this enzyme with one capable of discriminating between these two amino acids, a biocontained organism dependent on a synthetic nutrient might be engineered. To this end, multiple enzyme libraries from different organisms were designed and assembled via site-saturation mutagenesis as well as error-prone PCR.
They were screened employing diverse experimental parameters with varying stringencies. Nevertheless, an aminoacyl-tRNA synthetase that exclusively incorporates [3,2]Tpa could not be selected, which might be attributed to the close structural resemblance of the analog to its counterpart Trp, as this is what drove the choice of analog for the adaptation experiment of TUB170. Another approach towards the creation of synthetic life might be through the replacement of further canonical amino acids, thereby advancing our understanding of the genetic code and its flexibility, as well as the interplay of diverse cellular processes. For the adaptation of an E. coli strain towards utilization of Met analogs, a Met-auxotrophy robust under all cultivation conditions was established in the laboratory wildtype strain MG1655. Furthermore, this strain was optimized for ethionine (Eth) turnover to S-adenosyl ethionine to promote transethylation reactions as a substitute for transmethylation, as Met functions as the precursor for the major cellular methyl donor. After 31 passages of continuous cultivation, an increase in general fitness could be observed, as evinced by a more stable number of colony forming units compared to those produced prior to the adaptation. These results suggest that adaptation of a strain tolerant towards the replacement of methionine with its synthetic counterpart ethionine might be feasible.

Summary (Zusammenfassung)

Billions of years of evolution have produced living organisms with an astonishing variety and the ability to adapt to constantly changing environments and living conditions. The foundation of this life lies in the central dogma of molecular biology, according to which information flows from the information-storing polymer DNA via RNA (“informational polymers”) and is ultimately translated into a protein (“catalytic polymer”). This fundamental translation process, protein translation, is based on the universal genetic code, which assigns base triplets to the 20 proteinogenic amino acids. However, despite six decades of research and the formulation of numerous theories and models, the origin and evolution of the genetic code remain a mystery. The discoveries of the 21st and 22nd amino acids, selenocysteine and pyrrolysine, imply a certain flexibility of the genetic code, which is further confirmed by the co-translational incorporation of over 200 noncanonical amino acids (ncAAs) into proteins and culminates in the proteome-wide replacement of the canonical amino acid Trp with its structural analog L-β-(thieno[3,2-b]pyrrolyl)alanine ([3,2]Tpa). In this doctoral thesis, an attempt was made to turn the tolerance of this strain towards the synthetic amino acid [3,2]Tpa into a dependency, thereby further alienating the strain from a natural towards a synthetic organism. In this strain, designated TUB170, incorporation of the noncanonical amino acid relies on the fact that the endogenous enzyme tryptophanyl-tRNA synthetase charges the cognate tRNA not only with the natural substrate Trp but also with the analog [3,2]Tpa. By replacing this enzyme with one that can discriminate between the two amino acids, an organism could be created that depends entirely on a synthetic substrate for survival. To this end, several enzyme libraries from different organisms were designed and generated using various mutagenesis techniques. These libraries were then screened while varying diverse experimental parameters. However, no aminoacyl-tRNA synthetase that exclusively incorporates [3,2]Tpa could be selected. This may be attributable to the close structural similarity of the analog to its counterpart Trp, which guided the choice of this specific analog for the adaptation experiment. Another approach towards the creation of synthetic life could be the substitution of further canonical amino acids, which could advance our understanding of the genetic code and its flexibility, as well as of the interplay of various cellular processes. For the adaptation of E. coli to methionine analogs, a Met-auxotrophy that is robust under all cultivation conditions was established in the wild-type strain MG1655. Furthermore, this strain was optimized for the conversion of ethionine into S-adenosylethionine to support the substitution of transmethylation reactions by transethylations, as Met serves as the precursor of the main methyl donor. After 31 passages of continuous cultivation, an improvement in general fitness was observed in the form of a more stable number of colony-forming units in the presence of ethionine. These results suggest that an adaptation of E. coli to the synthetic substrate ethionine may be possible.

I Table of Contents

II Abbreviations
1 Introduction
  1.1 Genetic Code Origin and Evolution
    1.1.1 Trp and Met: Two of the Latest Additions to the Genetic Code
    1.1.2 S-Adenosylmethionine (SAM)
  1.2 Xenobiology
    1.2.1 Genetic Code Engineering and Expansion
    1.2.2 Biocontainment
  1.3 Adaptive Laboratory Evolution (ALE)
    1.3.1 Adaptation towards [3,2]Tp usage
2 Aim of this Study
  2.1 Evolution of bacterial strains toward methionine analog usage
  2.2 Biocontainment of TUB170
3 Results and Discussion
  3.1 Evolution of bacterial strains toward methionine analog usage
    3.1.1 Establishing Met-auxotrophy
    3.1.2 Choice of analogs
    3.1.3 ALE starting conditions
    3.1.4 Adaptive Laboratory Evolution (ALE)
    3.1.5 Plug and play with metK
    3.1.6 ALE 2.0
  3.2 Biocontainment of TUB170
    3.2.1 M. mazei PylRS Library
    3.2.2 E. coli TrpRS Library
    3.2.3 M. jannaschii TyrRS Library
4 Conclusion and Outlook
  4.1 Evolution of bacterial strains toward methionine analog usage
  4.2 Biocontainment of TUB170
5 Materials and Methods
  5.1 Materials
    5.1.1 Chemicals
    5.1.2 Media and supplements
    5.1.3 Strains
    5.1.4 Plasmids
    5.1.5 Primers
    5.1.6 Biomolecular reagents, , and kits
    5.1.7 Buffers and Solutions
    5.1.8 Miscellaneous
    5.1.9 Technical equipment
  5.2 Methods
    5.2.1 Polymerase chain reaction (PCR)
    5.2.2 DNA purification and Gel extraction
    5.2.3 Protein expression
    5.2.4 Protein purification
    5.2.5 Agarose gel electrophoresis
    5.2.6 Polyacrylamide gel electrophoresis
    5.2.7 Restriction digest
    5.2.8 Ligation
    5.2.9 Assembly of aaRS Libraries
    5.2.10 Double-sieve selection
    5.2.11 Fluorescence readout
    5.2.12 Expression of tryptophan synthase
    5.2.13 Enzymatic [3,2]Tpa synthesis
    5.2.14 Production of Competent Cells
    5.2.15 Bacterial transformation
    5.2.16 Isolation of plasmid DNA
    5.2.17 Genome engineering
    5.2.18 DNA concentration measurements
    5.2.19 Protein concentration measurements
    5.2.20 Sequencing
    5.2.21 Mass spectrometry (MS)
6 Bibliography
7 List of Figures
8 List of Tables
9 Appendix
  9.1 Heterologous MAT expression
  9.2 ALE 2.0
  9.3 M. mazei PylRS Library
  9.4 SCS of sfGFP(R2TAG)
  9.5 Enzymatic [3,2]Tpa synthesis

II Abbreviations

[3,2]Tp: thieno[3,2-b]pyrrole
[3,2]Tpa: L-β-(thieno[3,2-b]pyrrolyl)alanine
A: absorption
AA: amino acid
aaRS: aminoacyl-tRNA synthetase
ALE: adaptive laboratory evolution
Amp: ampicillin
AMP: adenosine monophosphate
ATP: adenosine triphosphate
bp: base pair
c: concentration
cAA: canonical amino acid
CAGO: CRISPR/Cas9-assisted gRNA-free one-step
Cas: CRISPR-associated
cat: chloramphenicol resistance gene
CAT: chloramphenicol acetyltransferase
CDS: coding DNA sequence
CFU: colony forming unit
Cm: chloramphenicol
CP1: connective peptide 1
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
°C: degree Celsius
Da: Dalton (1.66054 × 10^-24 g)
dATP: deoxyadenosine triphosphate
dCTP: deoxycytidine triphosphate
ddNTP: dideoxyribonucleotide triphosphate
dGTP: deoxyguanosine triphosphate
dH2O: autoclaved distilled water
DMSO: dimethyl sulfoxide
DNA: deoxyribonucleic acid
dNTP: deoxyribonucleotide triphosphate
ds: double-stranded
DTT: dithiothreitol
dTTP: deoxythymidine triphosphate
E. coli: Escherichia coli
Ec: Escherichia coli
EDTA: ethylenediaminetetraacetic acid
EF-Tu: elongation factor Tu
em: emission
EP-PCR: error-prone polymerase chain reaction
ESI: electrospray ionization
et al.: et alii
EtBr: ethidium bromide
EtOH: ethanol
ε280: molar extinction coefficient at λ = 280 nm
fwd: forward
g: gram
·g: multiples of the standard gravity
GMO: genetically modified organism
h: hour
HCl: hydrochloric acid
HGT: horizontal gene transfer
IPTG: isopropyl-beta-D-thiogalactopyranoside
k: kilo
Kan: kanamycin
kb: kilobase pairs
L: liter
LTEE: long-term evolution experiment
LUCA: last universal common ancestor
λ: wavelength
M: molar
m: milli
MAT: methionine adenosyltransferase
max: maximum, maximal
MetO: methionine sulfoxide
MetRS: methionyl-tRNA synthetase
min: minute
MS: mass spectrometry
MTase: methyltransferase
µ: micro
M. barkeri: Methanosarcina barkeri
Mb: Methanosarcina barkeri
M. jannaschii: Methanocaldococcus jannaschii
Mj: Methanocaldococcus jannaschii
M. mazei: Methanosarcina mazei
Mm: Methanosarcina mazei
n: nano
NaCl: sodium chloride
NaOH: sodium hydroxide
ncAA: non-canonical amino acid
nt: nucleotide
OD: optical density
OD600: optical density at λ = 600 nm
ori: origin of replication
OTS: orthogonal translation system
PAGE: polyacrylamide gel electrophoresis
PCR: polymerase chain reaction
PDB: Protein Data Bank
PLP: pyridoxal phosphate
PMSF: phenylmethylsulfonyl fluoride
PPi: pyrophosphate
Pyl: pyrrolysine
PylRS: pyrrolysyl-tRNA synthetase
pylT: pyrrolysyl-tRNA gene
pylS: pyrrolysyl-tRNA synthetase gene
R: resistant, resistance
RBS: ribosome binding site
REase: restriction endonuclease
rev: reverse
RF1: release factor 1
RNA: ribonucleic acid
RT: room temperature
s: second
SAM: S-adenosyl methionine
SCS: stop codon suppression
sfGFP: superfolder green fluorescent protein
SGC: standard genetic code
SPI: selective pressure incorporation
SSM: site-saturation mutagenesis
T: temperature
TfMet: trifluoromethionine
Tris: tris(hydroxymethyl)aminomethane
Trp: tryptophan
TrpRS: tryptophanyl-tRNA synthetase
tRNA: transfer ribonucleic acid
TyrRS: tyrosyl-tRNA synthetase
UV: ultraviolet
V: volume
v/v: volume per volume
w/v: weight per volume
w/w: weight per weight
wt: wild-type
XNA: xeno nucleic acid

1 Introduction

„Wir wollen nicht nur wissen, wie die Natur ist (und wie ihre Vorgänge ablaufen), sondern wir wollen auch nach Möglichkeit das vielleicht utopisch und anmassend erscheinende Ziel erreichen, zu wissen, warum die Natur so und nicht anders ist.“

“We not only want to know how nature is organized (and how natural phenomena proceed), but also as far as possible to gain the aim, which may look Utopian and impudent, to find out why the nature is just such and not another.”

Albert Einstein, Über den gegenwärtigen Stand der Feld-Theorie (in: „Festschrift Prof. Dr. A. Stodola zum 70. Geburtstag“)1

1.1 Genetic Code Origin and Evolution

The mechanisms of evolution yielded extant living organisms with immense biodiversity and the potential to adapt and even further evolve in continuously varying environments. At the basis of it lies protein translation according to the genetic code with its components: mRNA, tRNA, aminoacyl-tRNA synthetases (aaRSs), and the ribosome. A closer look at the genetic code prompts several questions: Why are amino acids encoded by triplets and why are the codon assignments what they are today? What is the mechanism behind these assignments? Why are there 20 universal proteinogenic amino acids and why are these specific amino acids used? Why is the standard genetic code (SGC) universal? These questions have occupied scientists for the past six decades and have led to numerous hypotheses regarding the origin as well as the evolution of the genetic code. Some of the most common theories are outlined below.

The stereochemical theory, first proposed by Gamow in 1954, aims to explain the assignment of amino acids to their cognate triplets2. It was further developed by Woese in 1966 upon the observation that the amino acids exhibit different mobility in paper chromatography with pyridine as the mobile phase. It was proposed that distinct amino acids might also display differing affinities towards the four nucleotides and that amino acids might therefore have been assigned to their cognate triplet (codon or anticodon) through direct interactions3,4. The theory draws support from experiments conducted with random aptamer libraries: after selection for amino acid binding, enrichment of cognate triplets was observed for some amino acids5–7. However, in some cases the codon and in other cases the anticodon was enriched, and the amino acids with the strongest affinities are those commonly thought to be late additions to the genetic code due to their complex biosyntheses.
While the larger and more complex side chains of these amino acids offer wonderful handles for interactions with nucleic acids, this contradicts the notion that amino acids first entered the genetic code because of their affinity to their cognate triplets8.

In his seminal paper from 1968, Crick presents his “frozen accident” theory, wherein he proposes that once the genetic code had been defined, even a single codon reassignment would have been deleterious to the translation process and thereby to the fidelity and viability of the organism9. This notion is supported by the fact that the SGC has barely changed since the emergence of the last universal common ancestor (LUCA) at least 3 billion years ago10,11. But why was there only one frozen accident? Why are there not multiple codes of high fitness, separated by valleys of low fitness12? The answer likely lies in the existence of horizontal gene transfer (HGT), i.e. the lateral, inter-organismal propagation of genetic information via plasmids, transposons, or viruses. While HGT confers vulnerability towards parasitic genetic elements, it provides an enormous advantage in changing environments, thereby promoting evolution13,14. The benefits seem to outweigh the disadvantages: slight variations in the SGC occur only through random drift in isolated microbial populations such as parasitic and endosymbiotic bacteria, as well as organelles, where no HGT can take place15–17. The “frozen” part of the theory attempts to explain why there are (only) 20 common proteinogenic amino acids; however, the “accident” part is challenged by the observation that the genetic code is clearly nonrandom18.

In 1975, Wong proposed the coevolution theory, arguing that the genetic code evolved in parallel with the amino acid metabolic pathways19–22. Accordingly, simple, abiogenic amino acids entered the code first, and as biosynthetic pathways for more complex amino acids evolved, these amino acids became available and were added to the code. Abiogenic amino acids are those that can be produced from inorganic material under plausible prebiotic conditions in prebiotic experiments, such as the famous Miller experiment23, and that can also be found in meteorites24. Their order of abundance in both meteorites and prebiotic experiments corresponds to their chemical complexity and thermodynamic stability: Gly, Ala, Asp, Glu, Val, Ser, Ile, Leu, Pro, Thr25–28. The initial (few) amino acids were encoded by large blocks of codons, which split to accommodate new additions. Thus, precursors made room for their products, which were likely only one catalytic step away19. In this context, it is worth considering that the biosynthesis of polymers, such as proteins and nucleic acids, requires a continuous supply of monomers, which can be provided more reliably by a stable metabolism than by abiogenic sources such as meteorites.
Thus, the role of the core metabolism, which is universal to life on Earth, should not be underestimated when reflecting on the emergence of life.

The coding coenzyme handle theory attempts to illustrate the advantage of amino acid acquisition in the RNA world leading to the origin of the genetic code. Amino acids, with their diverse chemical functionalities, might have served as catalysts at the catalytic core of ribozymes, thereby fulfilling the role of coenzymes. An amino acid on a nucleotide-based handle (e.g. a handle’s triplet end loop) might have been chemically attached to ribozymes as a sort of proto-tRNA handle29,30. There are 123 peptides spanning all main enzymatic classes that only have a single amino acid at their catalytic core, demonstrating that single amino acids are indeed capable of catalysis. While histidine is generally the most abundant amino acid at catalytic sites, aspartic acid and glutamic acid, which are thought to be among the earliest additions to the code, more frequently act alone18. Furthermore, the cofactor S-adenosylmethionine (SAM), which consists of an amino acid attached to an adenosine moiety, might very well be a relic from that time supporting the coenzyme theory31,32. On a different note, experiments with the dipeptide PheLeu show that upon attachment to vesicle membranes, the dipeptide recruits fatty acids, promoting vesicle growth and thereby demonstrating another adaptive virtue of amino acids during early evolution33. Nonetheless, the coding coenzyme handle theory fails to provide a sound reason for why amino acids became associated with triplets18.

When the GC contents of the first, second, and third codon positions are plotted against the overall GC contents of a variety of organisms, it becomes apparent that the extent of variation between organisms differs markedly between the codon positions.
While at position one there is a 31% variation in GC content between organisms, the variation is only 12% at position two and 80% at position three34,35. As advantageous mutations are selected for during evolution, while disadvantageous mutations are selected against, the low variation at position two might indicate mutational constraints and thereby the importance of this position35,36. While the second position specifies the type of amino acid (hydrophobic, hydrophilic, semipolar), the first position indicates a specific amino acid and the third position is often redundant (Figure 1) and allows for wobble base pairing, where only the type of nucleobase (purine or pyrimidine) is important37. These observations gave rise to the 2-1-3 model38 and the related 4-column theory39. According to these models, at first, only the middle codon position specified the amino acid. With G at position one stabilizing codon-anticodon interactions with its three hydrogen bonds, the primordial code could have looked like this: GGN=Gly, GAN=Glu/Asp, GCN=Ala, and GUN=Val. These amino acids are the five most abundant in prebiotic soups and meteorites. As more amino acids were added to the code, position one became coding, followed by position three8,37.
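The position-specific GC contents discussed above can be made concrete with a short calculation. The following Python sketch is purely illustrative and not part of the analyses cited here; the function names `gc_by_codon_position` and `gc_spread` are hypothetical, and the code assumes in-frame coding sequences whose length is a multiple of three.

```python
def gc_by_codon_position(cds):
    """Return the GC fraction at codon positions 1, 2 and 3 of an in-frame CDS."""
    cds = cds.upper().replace("T", "U")  # accept DNA or mRNA notation
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("sequence must be non-empty and a multiple of three")
    n_codons = len(cds) // 3
    fractions = []
    for position in range(3):
        bases = cds[position::3]  # every third base, starting at this position
        fractions.append(sum(base in "GC" for base in bases) / n_codons)
    return tuple(fractions)


def gc_spread(sequences):
    """Return max minus min GC fraction per codon position across sequences."""
    per_sequence = [gc_by_codon_position(seq) for seq in sequences]
    return tuple(
        max(values) - min(values)
        for values in zip(*per_sequence)  # regroup by codon position
    )
```

Applied to concatenated coding sequences of several genomes, `gc_spread` would yield one value per codon position, analogous to the position-wise variation quoted in the text. For the toy sequence `"AUGGCCUAA"`, `gc_by_codon_position` returns one third for positions one and two and two thirds for position three.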

[Figure 1: radial codon wheel. Sectors are labeled semipolar, hydrophilic, and hydrophobic; the stop codons opal and amber are marked.]

Figure 1 | Radial representation of the genetic code in mRNA format. The primary importance of the second codon position in determining the type of amino acid is emphasized. The first position determines the specific amino acid and the third (wobble) position demonstrates the degeneracy of the genetic code. The natural expansion of the genetic code at opal (SeC) and amber (Pyl) is illustrated in pink.
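The regularities visible in Figure 1 can also be checked programmatically. The sketch below builds the standard genetic code in mRNA notation from the widely used compact 64-letter representation (base order U, C, A, G; `*` denotes a stop codon) and groups amino acids by codon position. The helper names are assumptions made for illustration only.

```python
from collections import defaultdict

BASES = "UCAG"
# One-letter amino acid codes for UUU, UUC, UUA, UUG, UCU, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def residues_by_second_base(table):
    """Group the encoded amino acids by the base at codon position two."""
    groups = defaultdict(set)
    for codon, amino_acid in table.items():
        groups[codon[1]].add(amino_acid)
    return dict(groups)


def gnn_assignments(table):
    """List the amino acids encoded by each GNN block (G at position one)."""
    return {
        "G" + b2 + "N": sorted({table["G" + b2 + b3] for b3 in BASES})
        for b2 in BASES
    }


def degeneracy(table):
    """Count how many codons encode each amino acid (or stop signal)."""
    counts = defaultdict(int)
    for amino_acid in table.values():
        counts[amino_acid] += 1
    return dict(counts)
```

In this representation, `gnn_assignments(CODON_TABLE)` recovers the proposed primordial assignments (GGN for Gly, GAN for Asp/Glu, GCN for Ala, GUN for Val), and `residues_by_second_base(CODON_TABLE)["U"]` contains only the hydrophobic residues Phe, Leu, Ile, Met, and Val.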

Perusal of the codon table (Figure 1) shows that the codons of similar amino acids often only differ in one position. For example, all aliphatic hydrophobic amino acids are encoded by U at position two, and the two acidic amino acids Asp and Glu only differ by having either a pyrimidine or a purine at position three. Consequently, point mutations often only have a minimal effect on the physicochemical properties of the encoded amino acid, making the code very robust against errors in replication, transcription, and translation. The error minimization theory attributes these properties to evolution under selection for maximum robustness40. Critics of this theory point out that, in this case, error minimization should be the only observable pattern of the SGC; however, other patterns, such as biosynthetically close amino acids having similar codons, also exist10,41,42. Numerous cost functions to assess the robustness of the SGC and compare it to other codes have been computed and indicate that, while the SGC is well adapted to mitigate errors, many even more robust versions are conceivable12,36,43–50. In another study, evolution simulations of random codes compare evolution according to three different models: the 2-1-3 model, the precursor-product expansion model (coevolution), and the ambiguity reduction model. The latter model operates under the assumption that initially groups of amino acids were encoded ambiguously and that specificity increased over the course of evolution51. The codes resulting from the simulations of the 2-1-3 and the ambiguity reduction models were more robust than the standard genetic code, while the one resulting from the coevolution model was inferior45–47. Although the robustness of the SGC is undeniable, these findings suggest that it might have arisen as a selectively neutral by-product of evolution, as opposed to being its driving force8.

As each of these theories captures some, but not all, aspects of the SGC, they are not mutually exclusive, and newer theories draw on them, offering slight variations and piecing them together to supply a broader picture. Koonin and Novozhilov, for example, propose grouping the amino acids according to the free energy of their codon-anticodon interactions (which naturally correlates with the GC content): a) strong (-3.1 kcal/mol mean free energy): Gly, Ala, Pro, Arg; b) intermediate (-2.2 kcal/mol): Asp, Glu, Cys, Ser, Gln, His, Lys, Thr, Trp; and c) weak (-1.0 kcal/mol): Asn, Ile, Leu, Met, Phe, Tyr8. The interactions of the latter group are so weak that an extended anticodon loop with a modified tRNA base is required for stable codon-anticodon pairing, further supporting the idea of these amino acids being late additions to the code52.
Grosjean and Westhof also observe a correlation between the increase in complexity of the modifications of the anticodon bases 34 and 37 and the AU content of codon-anticodon pairs53. This order of appearance is well in line with the coevolution theory. Phylogenetic analyses of the Rossmann fold and biotin synthase superfamilies assert that their members had already evolved by the time the aaRS classes evolved their specificities54–56, suggesting high-fidelity translation prior to the introduction of aaRSs. Therefore, early mRNA decoding might have been handled by proto-tRNAs bearing unique pockets for amino acid attachment12, and the first anticodons could have been assigned randomly through a “frozen accident”. Driven by diversification of the repertoire with error minimization as a by-product, the code expanded via duplication of the proto-tRNAs, resulting in codon grouping of related amino acids8.

Borrowing from the coevolution theory as well, Hartman and Smith57 present a theory not too different from that of Koonin and Novozhilov. Looking at the ribosome and aminoacyl-tRNA synthetases and combining these observations with a metabolic metric, Hartman and Smith arrive at an amino acid order very similar to the one presented above. There are two classes of aaRSs defined by their catalytic domains, with ten amino acids belonging to each class. The class II enzymes bear a fold found in biotin carboxylases and are mostly associated with prebiotic amino acids, which led to the proposition that class II might be the older of the two classes, whereas class I aaRSs contain a Rossmann fold58. Furthermore, since not all aaRSs interact with the anticodon loop, but interaction with positions 1-2-3 in the acceptor arm is always necessary for the identification of cognate tRNAs, an ancient operational code consisting mostly of GC pairs was proposed59,60.
Taking this together with the observation that the oldest parts of ribosomal proteins, thought to be the most ancient peptides61, show a bias for the amino acids Gly, Arg, Pro, Lys, and Ala, and additionally considering the number of catalytic steps from the citric acid cycle, Hartman and Smith suggest a GC-GCA-GCAU scheme for the evolution of the genetic code. According to this scheme, the first amino acids, Gly, Ala, Pro, and a simpler Arg precursor, were encoded by triplets consisting solely of G and C. The resulting peptides were largely unstructured, but the positively charged arginine precursor allowed interaction with negatively charged RNA backbones. The addition of A to the triplets allowed the recruitment of catalytic and polar amino acids and led to the formation of α-helices. Finally, together with U, hydrophobic amino acids entered the SGC, enabling the formation of globular structures with a hydrophobic core, as well as interaction with membranes. This GC-GCA-GCAU model correlates not only with the metabolic significance of the amino acids but also with the hierarchy of protein folding57,62.

Further, Budisa and Kubyshkin observe a correlation between the predominant secondary structure elements (α-helices, β-sheets) and the identity of the monomers that compose our catalytic polymers. Building on the model proposed by Hartman and Smith, they formulate the “alanine-world” hypothesis, in which they recognize that all amino acids added to the SGC after the proposed GC phase are derivatives of alanine, the amino acid with the greatest α-helical propensity (31,32). Protein backbones are constituted by alanine moieties, whereby the attached side chain defines the chemical function. Thus, point mutations do not impact the backbone fold and only an accumulation of mutations impairs the secondary structure63.

Even after some six decades of research and the formulation of various theories and models, the origin and evolution of the standard genetic code remain an enigma, and a comprehensive and conclusive story has yet to be assembled. This might well be attributed, at least partly, to the ancient “chicken or egg” paradox: a functional translation system depends on proteins, yet there are no proteins without a translation system8. Nevertheless, there are a few characteristics of the code that are generally agreed upon. Firstly, the code is degenerate, with several codons coding for a single amino acid, and there is a rough correlation between the number of codons for one amino acid and its frequency in proteins64. The SGC is universal and nonrandom.
Billions of years of evolution have produced only slight variations in the code: the codon assignments remain mostly the same throughout all three domains of life, and mainly the frequency of codon usage differs from organism to organism. However, codon usage should not be neglected, as rare codons are involved in the regulation of co-translational protein folding, have an effect on covalent protein modifications during and after synthesis, and affect co- and posttranscriptional secretion65. Highly expressed genes contain many common codons with prevalent tRNAs and only few rare codons65,66. Lastly, the code is robust, but not optimal, and it expanded from a smaller set of simple, primordial amino acids to the current set of 20, with methionine and tryptophan among the most recent acquisitions. It should be noted that some organisms are capable of incorporating an additional amino acid into their proteins, either selenocysteine or pyrrolysine, thereby expanding their repertoire67–71. Pyrrolysine will be discussed in more detail later.
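
The GC-GCA-GCAU scheme described above can be checked against the present-day standard genetic code with a short script. The following is only an illustrative sketch, not Hartman and Smith's own analysis: it assumes that the modern codon assignments can stand in for the ancestral ones (the scheme itself posits a simpler Arg precursor in the GC phase), and the function name `amino_acids_for_alphabet` is a hypothetical helper introduced here.

```python
from itertools import product

# Standard genetic code in the conventional U, C, A, G ordering of all 64 codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def amino_acids_for_alphabet(alphabet):
    """Amino acids encoded by at least one codon built only from `alphabet`."""
    allowed = set(alphabet)
    return {
        aa
        for codon, aa in CODON_TABLE.items()
        if set(codon) <= allowed and aa != "*"  # "*" marks stop codons
    }


# Amino acids that become accessible in each phase of the scheme.
gc_phase = amino_acids_for_alphabet("GC")
gca_phase = amino_acids_for_alphabet("GCA") - gc_phase
gcau_phase = amino_acids_for_alphabet("GCAU") - gc_phase - gca_phase

for name, phase in [("GC", gc_phase), ("GCA", gca_phase), ("GCAU", gcau_phase)]:
    print(name, "".join(sorted(phase)))
```

Under these assumptions, the GC-only triplets yield Gly, Ala, Pro, and Arg; adding A yields polar and charged residues; and the mainly hydrophobic residues, including Met and Trp, require U.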


1.1.1 Trp and Met: Two of the Latest Additions to the Genetic Code

The amino acids methionine and tryptophan are thought to be two of the latest additions to the genetic code and are both activated and charged onto cognate tRNAs by class I aaRSs. They are among the more complex proteinogenic amino acids and those with the greatest metabolic cost. They are the only amino acids with a single codon, breaking the code’s degeneracy57, and the rarest, with an abundance in proteins of 1.4% for Trp72 and 2% for Met73. In comparison, leucine has an abundance of 9.1% in proteins72. Both amino acids are hydrophobic and contribute to protein stability. As the bulkiest amino acid, Trp affords a large surface for van der Waals interactions in protein hydrophobic cores72. Furthermore, it can participate in cation – π interactions (preferably with Arg, Figure 2 c), which belong to the most stable non-covalent interactions74–76, and with its indole nitrogen Trp can act as a hydrogen-bond donor. Aromatic – aromatic interactions, where often three or more aromatic side chains interact in the protein core, are a common feature of protein stabilization77. The most frequent conformation among aromatic side chains is the perpendicular edge-to-face conformation (Figure 2 a)78, followed by parallel-displaced (Figure 2 b) or offset-stacked interactions, whereas aromatic stacking causes repulsion of the π electrons and is therefore rare79,80. While Trp is considered to be hydrophobic, there are more than 40 hydrophobicity scales published with Trp in varying positions81,82. Therefore, together with Met, Trp shows no strict preference for protein interiors or surfaces.
Using the same interactions described above for protein stability, at the surface Trp plays a key role in enzyme-substrate binding, antigen-antibody recognition, receptor-ligand interactions, membrane anchoring, and, owing to its similarity to the nucleobases, DNA/RNA binding83,84. Owing to the increased electron density in the π-system from the nitrogen lone-pair and the electron delocalization across the only polyaromatic amino acid, tryptophan is easily oxidized and susceptible to electrophilic substitutions, such as alkylation, nitration, or halogenation, resulting in vast chemical diversity85,86. It is therefore not surprising that Trp is not only used in protein biosynthesis but also serves as a precursor for many complex natural products such as alkaloids, hormones, antibiotics, anticancer agents, and antifungals87,88. These attributes make tryptophan an attractive target for drug discovery and development88, as well as for studying protein function, structure, and dynamics89. With the unique biophysical properties of Trp and its analogs, and as the rarest amino acid, Trp is the ideal candidate for site-specific, intrinsic probes in proteins. Furthermore, the tryptophanyl-tRNA synthetase (TrpRS) is fairly permissive towards Trp analogs, enabling the incorporation of a variety of non-canonical amino acids (ncAAs). Most aaRSs did not evolve any editing mechanisms against synthetic amino acids, likely because these were not present during evolution and therefore did not impose selection pressures against their incorporation into proteins90. Incorporation of ncAAs will be discussed in more detail further below.

Figure 2 | Common Trp orientations. a) Edge-to-face orientation in a β-hairpin peptide (1LE0) b) Parallel-displaced orientation in a parallel β-sheet (2KI0) c) Cation-π interaction between Arg and Trp (2B2U).
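
The statement that Met and Trp are the only amino acids with a single codon can be verified by counting codons per amino acid in the standard genetic code. The snippet below is a minimal sketch; the variable names are illustrative and not part of any established library.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional U, C, A, G ordering of all 64 codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of sense codons assigned to each amino acid ("*" marks stop codons).
codon_counts = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# Amino acids that are encoded by exactly one codon, together with that codon.
single_codon = {
    aa: codon
    for codon, aa in CODON_TABLE.items()
    if aa != "*" and codon_counts[aa] == 1
}

print(single_codon)
print(codon_counts.most_common(3))
```

The two single-codon entries are AUG for Met and UGG for Trp, whereas Leu, Ser, and Arg each occupy six codons.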


As already mentioned above, Met contributes to protein stability, often via interactions with an aromatic side chain91. While sulfur – aromatic interactions are longer (5-7Å) than salt bridges (<4Å), both types of interaction have comparable energies92,93. In the TrpRS catalytic site, for example, Met129 plays a crucial role in Trp recognition with the sulfur pointing to the middle of the indole ring94. Moreover, due to its unbranched nature and the fact that the S-C bond affords little energetic difference between its rotamers, the Met sidechain is very flexible and capable of molding itself to diverse sequences95. Additionally, the sulfur can be reversibly oxidized by either Mical or methionine sulfoxide reductase A (MsrA) and reduced back to methionine by MsrA or MsrB (in bacteria). Sulfoxidation turns the apolar Met side chain into highly polar MetO, which in some instances causes drastic changes in the physicochemical properties of the entire protein and in other cases has little or no effect on the protein91. In the latter instances, Met has been proposed to serve as an antioxidant by scavenging reactive oxygen species (ROS)96. It has been observed that cells bearing norleucine in their proteins instead of methionine exhibit lower fitness under oxidative stress97. Thus, the fact that norleucine lacks the sulfur and the associated scavenging properties, as well as the flexibility, might be the reason why methionine asserted itself over norleucine in the genetic code over the course of evolution. In another study, where longevity was used as an inverse proxy to ROS production, it was observed that mitochondrial peptides from short-lived species contain more Met residues than their counterparts from long-lived species98. Indeed, in mammalian nuclear DNA, the Met codon appears with a 2% abundance, whereas in their mitochondria (where the respiratory activity is highest) the abundance rises to 6%, as there the AUA Ile codon is reprogrammed to code for Met73. 
In a phenomenon known as adaptive mistranslation, the MetRS is phosphorylated by ERK1/2 under oxidative stress, causing MetRS promiscuity and misacylation of non-methionyl tRNA with Met99. While under non-stress conditions about 1% of Met residues are misacylated, the number increases to 10% during oxidative stress100. Taken together, these observations suggest a protective role for methionine against ROS. As already indicated above, in other cases site-specific Met sulfoxidation provokes major changes in the affected protein and related pathways, suggesting a role as a posttranslational modification with concomitant signaling. For example, oxidation of actin Met44 by Mical results in depolymerization of F-actin, whereas subsequent reduction of MetO44 by MsrB induces G-actin polymerization101–103. Further, there are examples of indirect signaling, where sulfoxidation within phosphorylation motifs impedes phosphorylation104,105, or where kinases are activated/inactivated upon sulfoxidation106,107. A calculation of the probabilities of each amino acid being replaced by methionine over a certain evolutionary time period indicated Ile, Val, Leu, but also Gln, Lys, and Thr as the most probable amino acids108. Whereas Ile, Val, and Leu are the obvious candidates for Met substitution, Gln and Thr are good MetO mimics109,110, suggesting that these amino acids might have served as MetO predecessors and further hinting at a late arrival of Met in the genetic code, perhaps in response to local oxygenation111. Trp is not only the least abundant amino acid in proteins but also the least abundant free amino acid in cells112. Met, in contrast, while being rather scarce in proteins (about 2%), serves in the form of S-adenosylmethionine as the main methyl donor throughout all kingdoms of life.


1.1.2 S-Adenosylmethionine (SAM)

S-adenosylmethionine is the second most widely used enzyme substrate after ATP113. It is synthesized from methionine and ATP by the enzyme methionine adenosyltransferase (MAT, Figure 3a) and serves as a substrate in a variety of different cellular processes114. As the major methyl donor in all living organisms113,115, S-adenosylmethionine is involved in a plethora of transmethylation reactions with diverse substrates ranging from nucleic acids to hormones, neurotransmitters, phospholipids, and natural products (Figure 3e)116,117. Upon donation of its methyl group, methyltransferases convert SAM to S-adenosylhomocysteine (SAH), which is in turn hydrolyzed to adenosine and homocysteine by SAH hydrolase118. Homocysteine is then either turned into the antioxidant glutathione or recycled back to methionine119. Methylation of DNA occurs at adenine and cytosine nucleotides, namely at N6 of adenine, C5 of cytosine, and, in bacteria, also N4 of cytosine120,121. In bacteria, DNA methylation is primarily carried out by restriction-modification systems (RMS), which serve as a primitive immune system122. They consist of a methyltransferase (MTase), as well as a restriction endonuclease (REase)123, and protect the cell from foreign DNA by digesting unmethylated sequences, such as those derived from bacteriophages. Self-recognition is achieved by methylation of specific sequence motifs, rendering those sequences resistant to REase activity124. Furthermore, methylation plays an important role in DNA replication by modulating the affinity of replication-associated proteins to the chromosomal origin of replication (oriC)125, preventing re-initiation of replication, as well as assisting in correct chromosome distribution to the daughter cells via hemi-methylated chromosome binding to designated areas of the cell membrane126. During mismatch repair, methylation helps in distinguishing the correct template from the newly synthesized strand bearing the replication error127,128.
Methylation in promoter regions and protein binding sites affects the affinity of RNA polymerase and transcription regulators129,130, allowing for quick adjustment to environmental changes, for example via the RpoS-mediated stress response131. Gene expression is additionally regulated through methylation of histones in eukaryotes and histone-like proteins in bacteria, which also aid in the minimization of chromosome length via supercoiling132,133. Furthermore, DNA methylation regulates the virulence of human pathogens, as well as motility and adhesion134,135. There are 144 known types of RNA modification. Whereas the heavily modified tRNAs exhibit an extraordinary diversity of modifications, the most common modification in ribosomal RNA (rRNA) is methylation136. Methylation of rRNA occurs predominantly in the proximity of ribosome functional centers137, where it impacts rates and accuracy of translation138. For example, m2G966 in the 16S rRNA in bacteria interacts with the wobble base pair in the ribosomal P-site139, and loss of methylation at this position impairs translation initiation140,141. Further, rRNA methylation affects responses to metabolites142–144, and resistance of pathogenic bacteria to rRNA-targeting antibiotics is often achieved through methylation of key nucleotides145.
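
The self/non-self discrimination of restriction-modification systems described earlier in this section can be illustrated with a toy model. The sketch below is a strong simplification under stated assumptions: it models a single DNA strand, uses the EcoRI recognition sequence GAATTC purely as an example motif, and represents methylation as a set of protected motif positions. The function names `find_sites` and `digest` are hypothetical helpers.

```python
def find_sites(dna, motif):
    """Return the start index of every occurrence of `motif` in `dna`."""
    sites = []
    pos = dna.find(motif)
    while pos != -1:
        sites.append(pos)
        pos = dna.find(motif, pos + 1)
    return sites


def digest(dna, motif, cut_offset, methylated=frozenset()):
    """Toy REase: cut inside each motif occurrence that is not methylated."""
    fragments, start = [], 0
    for site in find_sites(dna, motif):
        if site in methylated:
            continue  # motif protected by the MTase, no cleavage
        fragments.append(dna[start:site + cut_offset])
        start = site + cut_offset
    fragments.append(dna[start:])
    return fragments


MOTIF = "GAATTC"  # EcoRI recognition sequence, chosen only as an example
CUT_OFFSET = 1    # EcoRI cleaves between G and A on the modeled strand

dna = "TTGAATTCAAGGAATTCTT"

# Foreign (unmethylated) DNA is cut at every motif.
phage_fragments = digest(dna, MOTIF, CUT_OFFSET)

# Host DNA: the MTase has methylated every motif, so the REase leaves it intact.
host_fragments = digest(dna, MOTIF, CUT_OFFSET, methylated=set(find_sites(dna, MOTIF)))

print(phage_fragments)
print(host_fragments)
```

The same sequence is thus fragmented when unmethylated and left intact when every recognition motif carries the host's methylation mark.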


Figure 3 | Overview of some biological processes with SAM participation. a) SAM biosynthesis. b) Donation of the amino group for biotin biosynthesis. c) Donation of ribosyl group for tRNA modification. d) Donation of aminoalkyl group for tRNA modification. e) Methyl group donation in a range of biological reactions involving DNA, RNA, proteins, and natural products. f) Aminoalkyl group used in polyamine synthesis. g) Donation of the aminoalkyl group in the synthesis of the quorum-sensing molecule N-acylhomoserine lactone. h) SAM aminoalkyl group utilized in 1-aminocyclopropane-1-carboxylic acid (ACC) synthesis (precursor of the hormone ethylene). i) SAM as a source of methylene groups in cyclopropane fatty acid (CFA) synthesis.


However, SAM does not only donate its methyl group to miscellaneous cellular processes; rather, every single functional group of this versatile molecule is used116. The methylene group is utilized during cyclopropane fatty acid (CFA) biosynthesis (Figure 3i)146,147, and SAM donates its amino group to biotin biosynthesis (Figure 3b)148,149. One of the numerous tRNA modifications mentioned above is queuosine, whose ribosyl group stems from SAM (Figure 3c) and which can be found at position 34 in the anticodon loop of asparaginyl-, aspartyl-, histidyl-, and tyrosyl-tRNA150. SAM’s aminoalkyl group is employed in the modification of phenylalanyl-tRNA (Figure 3d)151, as well as in the synthesis of the quorum-sensing molecule N-acylhomoserine lactone in bacteria (Figure 3g)152. During polyamine biosynthesis, decarboxy-SAM donates its aminoalkyl group to the conversion of putrescine to spermidine (Figure 3f)153. Even 5’-deoxyadenosyl radicals are derived from SAM and help radical SAM enzymes to carry out a multitude of biological reactions, usually by the abstraction of a substrate hydrogen atom154. Figure 3 provides an overview of some of the many biological pathways influenced by SAM and illustrates the importance of this molecule. Alteration of methylation has been implicated in cancer155, inflammation156,157, neurodegenerative and neuropsychiatric disorders158–160, metabolic disorders161, and drug resistance162–164. However, studying methyltransferases with spatial and temporal resolution, as well as their specificity and function, is challenging165. Therefore, SAM analogs provide a valuable tool to investigate epigenetic regulation166–168 and track protein169–171 and RNA methylation172–174.

1.2 Xenobiology

For thousands of years, humankind has been using microorganisms for baking and brewing. Scientific understanding of how microorganisms and cell extracts can be applied for useful biotransformations gave rise to the first wave of biocatalysis more than a century ago175. The second wave of biocatalysis (1970s-1980s) was shaped by emerging protein-engineering techniques, allowing for overexpression and isolation of enzymes176,177. With the inception of directed evolution in the 1990s, tailoring and optimization of protein activity, stability, and selectivity, as well as expansion of the substrate scope, became possible; these advances characterize the third wave of biocatalysis178–181. Nowadays, biocatalytic products can be found in a wide variety of goods ranging from medicines to vitamins, additives, biofuels, fragrances, and polymers178–186. However, although enzymatic transformations are regio-, enantio-, and chemoselective and achieve high rates and lifetimes, nature, employing mainly carbon, hydrogen, oxygen, nitrogen, sulfur, and phosphorus, is somewhat limited when compared to the vast richness of organic synthetic chemistry187,188. For example, boron, fluorine, and silicon, which are important in medicinal chemistry189,190, rarely occur in living organisms, and there is a plethora of industrial transformations that are not (yet) accessible through enzymatic reactions191. The chemical space of the macromolecules employed in nature is restricted by their building blocks, i.e. the four nucleotides adenosine, guanosine, cytidine, thymidine/uridine, and the 20 canonical amino acids. Therefore, scientists have started to introduce new-to-nature building blocks and cofactors, heralding the fourth wave of biocatalysis and bringing about the field of xenobiology192,193. All levels of the central dogma of molecular biology are subject to xenobiological research.
At the informational level, unnatural base pairs offer the possibility of expanding the genetic alphabet, while xeno nucleic acids (XNA) can replace DNA or RNA in storage and propagation194. Nucleobase modifications can alter base pairing properties, and modifications of the sugar or phosphate moiety can confer nuclease resistance195. Furthermore, at the interface of the informational and executional level, manipulation of nucleic acids can broaden the substrate scope of aptamers. Aptamers are short nucleic acids, which bind to their target molecule selectively and with high specificity196,197. For example, combining click chemistry with SELEX (systematic evolution of ligands by exponential enrichment) enables the selection of aptamers against previously inaccessible protein targets via the incorporation of an alkyne-modified nucleotide. Subsequent modification with an azide of choice offers the possibility of modularly choosing various modifications while retaining compatibility with the conventional steps of the selection procedure198,199. Finally, at the translational level, the incorporation of noncanonical amino acids into proteins is used to either expand or alter the genetic code. Xenobiology can serve several purposes. The budding field of functional xenobiology aims at endowing the resulting macromolecule with new abilities, such as catalysis of chemical reactions that do not occur in nature. For example, incorporation of (2,2’-bipyridin-5-yl)alanine (BpyA) into the lactococcal multidrug resistance regulator LmrR facilitates site-specific Cu(II)-binding. The enzyme modified in this way is capable of catalyzing Friedel-Crafts alkylation200, as well as water addition to enones201. Controlled insertion of bioorthogonal functional groups and markers provides valuable tools for the study of protein function and structure, while another field is concerned with biosafety via biocontainment of genetically modified organisms (GMO)202. Lastly, exploring the boundaries of the genetic code and experimentation with alternate building blocks can afford fundamental insights into the origin and evolution of life203.

1.2.1 Genetic Code Engineering and Expansion

First experiments on the incorporation of noncanonical amino acids into proteins were conducted as early as the 1950s, when Levine and Tarver fed rats with the methionine analog ethionine, which bears an ethyl group instead of a methyl group at the sulfur atom. Labeling of the methylene group of the ethyl residue with 14C led to the observation that the ncAA ethionine is indeed incorporated into rat proteins204. By now, hundreds of different ncAAs have been incorporated into specific protein targets202,205, ranging from synthetic substrates that are structurally similar to their canonical counterparts, designated as analogs, to those that exhibit more diverse sidechains and are classified as surrogates206. The introduction of cAA analogs enables precise manipulation of target proteins at the single-atom level207,208. For example, biologically abundant elements like sulfur and hydrogen can be substituted by elements that do not frequently appear in biological molecules, such as selenium, fluorine, or heavy atoms209–211; such modifications are helpful for the determination of protein structure via crystallography and 19F-NMR spectroscopy212–214. Further, redox reactions can be tuned by inserting electron-withdrawing/donating groups, such as nitro and methoxy groups215. Incorporation of ncAAs with chemoselective tags216,217 allows for site-specific protein labeling with fluorophores and probes, for example via click reactions218, thereby facilitating (among other things) protein localization studies219. UV-sensitive sidechains offer the possibility of spatio-temporally controlling functional moieties and inducing photoreactive groups, as well as using photoswitches to induce conformational changes219,220. Strategically placed spectroscopic probes also enable characterization of protein conformation by making use of Förster resonance energy transfer (FRET)221 or studying allosteric information transfer via vibrational energy transfer (VET)222. Furthermore, the role of amino acids in catalytic mechanisms can be deciphered by utilizing ncAAs as biophysical probes223–226. In contrast to techniques involving chemical peptide synthesis, co-translational incorporation of ncAAs has the advantage that the resulting enzymes are already a part of living systems, which facilitates further manipulations via directed evolution or integration into existing biosynthetic pathways192. There are two major approaches for the co-translational incorporation of ncAAs, namely selective pressure incorporation (SPI)227 and stop-codon suppression (SCS)228. The suitability of each technique for a given project depends on the choice of ncAA (analog or surrogate), as well as the nature of the desired modification (global or site-specific). Both techniques, however, depend on cellular uptake of the ncAA, usually via amino acid transporters or diffusion. As charged molecules are often unable to permeate cell membranes, delivery in the form of dipeptides can improve ncAA transport229. Alternatively, non-charged, permeable precursors can be transformed into the amino acid intracellularly230,231.

1.2.1.1 Selective Pressure Incorporation (SPI)

Selective pressure incorporation is most famously used for the incorporation of selenomethionine or azidohomoalanine for X-ray crystallography applications213,214 and in vivo protein labeling232. In SPI, a canonical amino acid is replaced by a structural analog in a residue-specific manner. This technique (Figure 4, top) exploits the substrate tolerance of an endogenous aaRS and the translation machinery towards isostructural, synthetic amino acids, resulting in their incorporation into a target protein in response to their canonical counterpart’s codon233,234. Bacterial strains that are auxotrophic for the amino acid of interest, i.e. strains where the biosynthesis pathway of this amino acid has been deleted, can be driven toward ncAA incorporation upon cultivation in defined synthetic media deprived of the canonical amino acid and instead supplied with the non-canonical counterpart. Typically, the cells are first cultivated in rich media until mid-log phase, allowing for the synthesis of essential cellular components under nutrient-rich conditions with the full set of canonical amino acids. After washing and transferring the cells into minimal media containing the ncAA and only 19 cAAs, production of the target protein is induced90. Deprived of one cAA and with no means of synthesizing that amino acid, the cells resort to incorporating the analog in the positions of the missing amino acid, culminating in the global replacement of this amino acid with its synthetic counterpart. This method of protein modification is also known as genetic code engineering. While this technique is relatively simple and requires no additional translational components, it relies on the promiscuity of aaRSs and is therefore limited to structurally similar amino acids. As all sense codons of the replaced amino acid are reassigned, multiple incorporations of ncAAs are possible, but no specific position can be targeted209.
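
The residue-specific, global nature of SPI can be illustrated with a toy translation routine. This is a minimal sketch under simplifying assumptions: replacement is treated as complete, the mRNA sequence is invented for illustration, and the `reassigned` parameter of the hypothetical `translate` function stands in for the endogenous aaRS charging its tRNA with the analog (here selenomethionine, abbreviated SeMet).

```python
from itertools import product

# Standard genetic code in the conventional U, C, A, G ordering of all 64 codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna, reassigned=None):
    """Translate `mrna` up to the first stop codon.

    `reassigned` maps a canonical residue to the analog that replaces it at
    every one of its codons, mimicking SPI in an auxotrophic strain.
    """
    reassigned = reassigned or {}
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(reassigned.get(aa, aa))
    return peptide


mrna = "AUGGCUAUGUGGAAAUAA"  # invented example: Met-Ala-Met-Trp-Lys-stop

wild_type = translate(mrna)
spi_product = translate(mrna, reassigned={"M": "SeMet"})

print(wild_type)
print(spi_product)
```

Every AUG codon is decoded as the analog, which reflects why SPI permits multiple incorporations but cannot target a single position.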


Figure 4 | Schematic overview of the two main techniques for ncAA incorporation: selective pressure incorporation (SPI, top) and stop-codon suppression (SCS, bottom). SPI panel labels: defined, synthetic media; ncAAs and 19 cAAs; endogenous aaRS/tRNA. SCS panel labels: rich media; ncAAs and 20 cAAs; orthogonal aaRS/tRNA with nonsense anticodon; nonsense codon on the mRNA.

1.2.1.2 Stop-Codon Suppression (SCS)

The first site-specific incorporation of a ncAA into a protein was achieved as early as 1989 by the Schultz lab in a cell-free system. Phe analogs were incorporated into β-lactamase in response to an amber stop codon by supplying chemically acylated tRNAs to an in vitro transcription-translation system235. Twelve years later, the same group published the first orthogonal translation system (OTS) and reported in vivo incorporation of O-methyl-tyrosine228. To date, more than 200 ncAAs have been site-specifically incorporated in a variety of proteins205. In contrast to SPI, where a canonical amino acid is replaced by an analog and the total number of amino acids stays the same, during stop-codon suppression an extra amino acid is added to the pool of 20 cAAs (Figure 4, bottom). Therefore, this method is also known as genetic code expansion and necessitates certain features: an unassigned or liberated codon that can be assigned to encode the non-canonical amino acid, an orthogonal aaRS/tRNA pair for the delivery of the ncAA to the ribosome, efficient transport of the non-canonical amino acid into the cell or its biosynthesis by the cell, as well as nontoxicity and metabolic stability of the ncAA. The orthogonal aaRS must not charge any endogenous tRNAs with the ncAA and should not charge its cognate tRNA with any canonical amino acids, while the orthogonal tRNA, as well as the ncAA, should not serve as a substrate for any endogenous aaRS. However, the orthogonal aaRS/tRNA pair still needs to be compatible with the ribosome and its elongation factors. As the name of the technique already indicates, stop codons are a popular choice for reassignment to ncAAs, due to the fact that they are not assigned to any cAAs. Most commonly reprogrammed for ncAA incorporation is the amber stop codon (UAG)205,236,237, since it is the rarest translation termination signal with a frequency of only 7-8% in E. coli238. Nevertheless, competition of release factor 1 (RF1) with suppressor tRNAs at amber codons can decrease suppression efficiency and lead to truncation products. Hence, to improve amber suppression, several strains have been constructed in which amber codons and RF1 have been deleted239–243. Further manipulations to increase efficiency and yield include optimization of elongation factor Tu (EF-Tu)244,245, the ribosome221,246,247, and OTS sequences and expression levels248–251. While SCS permits site-specific insertion of amino acids with sidechains that are quite distinct from those that can be found in nature, it requires the design of suitable orthogonal aminoacyl-tRNA synthetases. Enzymes derived from a different domain of life often meet the orthogonality criteria, as aaRS/tRNA recognition sequences are often species-specific252. Thus, the most widely used OTS are the Methanocaldococcus jannaschii tyrosyl-tRNA synthetase/tRNATyr (TyrRS/tRNATyr) and the pyrrolysyl-tRNA synthetase/tRNAPyl (PylRS/tRNAPyl), both derived from archaea. Although a few other OTS exist, more than 2⁄3 of the ncAAs have been incorporated by systems derived from these two237.
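
The difference between termination and amber suppression can be sketched with a toy decoder. The example below rests on simplifying assumptions: suppression is treated as all-or-nothing (competition with RF1 is ignored), the mRNA sequence is invented, and the `ncaa` argument of the hypothetical `translate_scs` function represents an orthogonal aaRS/tRNA pair charged with the ncAA (here O-methyl-tyrosine, abbreviated OMeY).

```python
from itertools import product

# Standard genetic code in the conventional U, C, A, G ordering of all 64 codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

AMBER = "UAG"


def translate_scs(mrna, ncaa=None):
    """Translate `mrna`; decode UAG as `ncaa` if an orthogonal pair is present.

    With `ncaa=None` the amber codon acts as a normal stop signal, which
    yields the truncated product.
    """
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == AMBER and ncaa is not None:
            peptide.append(ncaa)  # suppressor tRNA reads the amber codon
            continue
        aa = CODON_TABLE[codon]
        if aa == "*":
            break  # ochre, opal, or unsuppressed amber codon terminates
        peptide.append(aa)
    return peptide


mrna = "AUGGCUUAGUGGAAAUAA"  # invented example with an in-frame amber codon

truncated = translate_scs(mrna)
full_length = translate_scs(mrna, ncaa="OMeY")

print(truncated)
print(full_length)
```

In contrast to SPI, the set of canonical assignments is untouched and only the single in-frame UAG position carries the additional residue.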

1.2.1.3 PylRS System

The pyrrolysyl-tRNA synthetase (PylRS), a class II aaRS, enables the incorporation of the 22nd proteinogenic amino acid pyrrolysine (the 21st being selenocysteine) in some archaea and bacteria. It consists of a highly conserved 270 amino acid long catalytic core253 with a β-sheet core surrounded by several helices forming a fold for ATP binding. It is an obligate dimer, with each subunit harboring an active site254–258. The enzyme was discovered when Krzycki and co-workers sequenced gene clusters for methanogenesis from methylamines in methanogenic archaea and found that all methylamine methyltransferases harbor a well-conserved in-frame amber stop codon. With the help of a crystal structure, it became clear that these Methanosarcinales incorporate an amino acid with (4R,5R)-4-methyl-pyrroline-5-carboxylate linked to the ε-amine of a lysine side-chain (Figure 5, molecule 4) in response to that amber codon. Thus pyrrolysine, the 22nd amino acid, with the three-letter code Pyl and the one-letter code O, was discovered259–261. While selenocysteine (Sec) incorporation at opal positions requires a unique mRNA structure, metabolic conversion (Ser→Sec), and a special elongation factor, the amber codon alone is sufficient for pyrrolysine incorporation, and pyrrolysine is charged intact onto the cognate amber suppressor tRNA (tRNAPyl)71,262–266. In this way, PylRS:tRNAPyl is a natural orthogonal pair. All the essential information for the biosynthesis of pyrrolysine, as well as its co-translational incorporation into proteins, is encoded in the pyl gene cluster next to the methylamine methyltransferase gene cluster. It contains the genes for the unique tRNAPylCUA and for the PylRS, which has low sequence identity to all other aaRS in the database, as well as pylB, pylC, and pylD for the biosynthesis of pyrrolysine (Figure 5)267.


Figure 5 | Biosynthesis of pyrrolysine. The three enzymes PylB, PylC, and PylD catalyze the biosynthesis of pyrrolysine from two lysines.

The PylRS wild type enzyme, although binding pyrrolysine tightly268, exhibits a high tolerance toward other substrates254,269, among which even α-hydroxy acids can be found270,271. The substrate binding pocket of the enzyme (Figure 6) consists of a deep hydrophobic pocket with a bulky cavity for the accommodation of the pyrroline functionality of the pyrrolysine side chain, where substrate recognition occurs mostly via rather non-specific hydrophobic interactions. After binding its substrate, a flexible loop harboring Y384, which forms a hydrogen bond to the α-amine, folds back, forming a cap and closing the pocket255,256. In addition, the PylRS does not possess an editing domain, whereas most other aaRS undergo very specific interactions with their substrate side chains and α-amines for discrimination of structurally similar metabolites such as α-hydroxy acids. Many aaRS possess an editing domain and usually, the correct tRNA aminoacylation is kinetically favored272–277. As pyrrolysine is so much larger than the other amino acids, and there probably does not exist a structurally related metabolite in the cell, it is likely that the PylRS substrate binding pocket was never subject to much selection pressure. Therefore, the recognition of the substrate's rough shape and size, together with a high binding affinity for pyrrolysine, might be sufficient278. The crystal structure of the PylRS from Desulfitobacterium hafniense shows a distinct tRNA binding domain, a unique C-terminal tail, a loop region, and a bulge domain, which confer specific recognition of the anticodon, D and acceptor stems, though not directly of the anticodon nucleotides257. Experiments, where the anticodon of the amber suppressor tRNA from Methanosarcina mazei was mutated without impairing PylRS function, indicate that the PylRS indeed exhibits low selectivity toward the tRNA anticodon255,279. 
Most other aaRS, however, utilize the anticodon as a recognition element to evade misincorporation of amino acids280. Two exceptions are the LeuRS and the SerRS, where a single aaRS has to charge various isoacceptor tRNAs with different anticodons for the recognition of six different codons281–285. As the PylRS only has to charge one tRNA with a single distinct anticodon for only one codon, it is not quite clear why it did not evolve a stricter mechanism for the recognition of the correct anticodon. A possible explanation might be that the pyl gene cluster is tightly regulated and only expressed when methylamines are available. The enzyme is therefore likely not subjected to steady evolutionary pressure, which might have impaired the evolution of a stricter mechanism278. The high substrate promiscuity of the PylRS discussed above, together with its low selectivity toward the tRNA anticodon, makes the enzyme an excellent tool for genetic code expansion. Furthermore, the orthogonality of the PylRS in prokaryotes as well as in eukaryotes enables the selection of PylRS variants specific for a desired compound in fast-growing E. coli cells, followed by its direct transfer to eukaryotic cells205. In addition, the PylRS/tRNAPyl pair exhibits orthogonality toward the TyrRS/tRNATyr pairs from both E. coli and M. jannaschii, allowing the introduction of two distinct ncAAs at two specific sites inside the same protein in prokaryotic as well as in eukaryotic cells251,286,287. All these extraordinary features have already contributed to the incorporation of over 100 different ncAAs and α-hydroxy acids with PylRS mutants derived from Methanosarcina mazei and Methanosarcina barkeri, and still show great potential for the introduction of an even wider variety of compounds278.

Figure 6 I Substrate binding pocket of PylRS. The enzyme is shown in pink, bound Pyl-AMP is shown in sticks and Y384 is depicted in blue sticks (PDB entry: 2ZIM).

1.2.1.4 MjTyrRS System

The TyrRS/tRNATyr pair from the archaeon Methanocaldococcus jannaschii was the first orthogonal translation system (OTS) used for site-specific incorporation of a ncAA in vivo228. In order to find an aaRS/tRNA pair that is orthogonal in E. coli, the Schultz lab analyzed biochemical data from TyrRS/tRNATyr pairs from different organisms288 and found that eukaryotes and archaea recognize the identity element C1:G72, while prokaryotes recognize G1:C72289,290. Moreover, in vitro studies indicate that Saccharomyces cerevisiae and Homo sapiens tRNATyr are not aminoacylated by bacterial aaRS291,292. However, when testing amber suppressor tRNAs from these organisms in an in vivo complementation assay in E. coli, it was revealed that the G34 to C34 mutation of the anticodon (GUA, Tyr to CUA, amber) rendered both tRNAs susceptible to aminoacylation by endogenous aaRSs. The introduction of negative recognition elements to S. cerevisiae tRNATyr resulted in the loss of recognition by ScTyrRS. Therefore, Wang and coworkers started looking for tRNAs with recognition elements outside of the anticodon loop and came upon the MjtRNATyr.288 The MjTyrRS possesses a minimalist anticodon-loop-binding domain293 with only weak anticodon binding294; instead, A73 and C1:G72 in the acceptor stem are used as strong identity elements. In contrast, bacterial TyrRSs use the anticodon, G1:C72, and a long variable arm as identity elements (Figure 7 a)280,295,296. The MjTyrRS is a class I aaRS and a homodimer293 with each subunit divided into 5 regions: a short N-terminal domain followed by a Rossmann fold with connective peptide 1 (CP1) at the dimer interface and the class I consensus motif KMSKS297 linking the Rossmann fold to the C-terminal domain (Figure 7 b). Each of the two bound tRNAs spans both subunits of the dimer, with the acceptor stem bound by the Rossmann fold and CP1 domain of one subunit, and the C-terminal domain of the other subunit interacting with the anticodon loop298.
While mutation of the first anticodon base, G34, still allows for aminoacylation by the cognate TyrRS, it has the biggest effect of all three anticodon bases on aminoacylation294. This observation can be explained by the binding mechanism of the MjTyrRS to the anticodon loop: G34 is flipped out and sandwiched between Phe261 and His283. In addition, Asp286 forms two hydrogen bonds to N1 and N2 of G34, whereas U35 only forms one hydrogen bond with Cys231 and A36 has no direct interactions with the C-terminal domain of MjTyrRS298.


Figure 7 I Tyrosyl-tRNA synthetase and tRNATyr from Methanocaldococcus jannaschii. a) Secondary structures of tRNATyr from M. jannaschii and E. coli; identity elements are shown in red. An orthogonal MjtRNATyrCUA with mutated bases illustrated in red is also shown. b) Crystal structure of wildtype M. jannaschii TyrRS with bound tRNA. The N-terminal domain is shown in orange, the Rossmann fold in red, the CP1 domain in green, the KMSKS loop in yellow, and the C-terminal domain in blue.

As the mutation of the anticodon to amber places C34 too far away from Asp286 to form hydrogen bonds, Kobayashi and coworkers sought to improve anticodon recognition by replacing Asp286 with larger amino acids. Indeed, mutation of Asp286 to arginine led to an 8-fold increase in activity compared to the wildtype (wt) enzyme298. Further improvements of the M. jannaschii OTS include mutations of the suppressor tRNA. Initially, some misacylation of MjtRNATyrCUA by endogenous aaRSs from E. coli could be observed. Therefore, Wang and coworkers randomized 11 nucleotides that do not interact directly with the cognate TyrRS and subjected this tRNA library to negative and positive selections228 very similar to the double-sieve selection explained in more detail in section 1.2.1.6 (p. 18) below. The resulting tRNA exhibits increased orthogonality in E. coli. Moreover, the affinity to the elongation factor Tu from E. coli (EcEF-Tu) is enhanced by a GC-rich T-stem299. Further tuning of expression levels and copy number300 led to a highly efficient OTS that has been used to incorporate a variety of functionalized aromatic ncAAs301 in E. coli236, Salmonella302, Streptomyces venezuelae ATCC15439303, and Mycobacterium tuberculosis304. The high tolerance for substitutions of this thermophilic enzyme236,305, as well as the fact that it does not have an editing domain306, explain the success of this system in genetic code expansion. As it is mutually orthogonal to the PylRS system, even incorporation of multiple different ncAAs is possible287,307.

1.2.1.5 Library Design

For the site-specific incorporation of ncAAs that do not have an isostructural canonical counterpart and are therefore not recognized by the endogenous translation apparatus, the construction of an orthogonal aaRS/tRNA pair with a novel substrate specificity to accommodate the ncAA is desirable.


This is commonly achieved via directed evolution, where an enzyme library is created and then selected for variants with the desired function, in this case usually by employing a selection scheme called double-sieve selection (see next section 1.2.1.6)308. The enzyme libraries are often created by redesigning the active site and randomizing residues involved in substrate recognition236. Site-saturation mutagenesis (SSM) produces side-chain diversity by employing primers randomized to NNK or NNS, where N stands for any of the four nucleotides, K (keto) stands for G or T, and S (strong) stands for G or C. Due to the degeneracy of the genetic code and wobble base pairing, randomizations to NNK or NNS are sufficient to cover the entire set of amino acids while reducing the number of mutants compared to NNN randomizations. As the E. coli transformation efficiency lies in the range of 10^8-10^9 recombinants, the number of randomizations is limited to 5-8 residues309. These residues are rationally chosen based on crystal structures, where docking of the ncAA into the active site of the structure can help to choose the residues. Popular targets are first shell residues, i.e. side chains with a distance of <6 Å to the substrate308. However, these first shell residues intricately interact with second shell residues310, and mutations that change the active site architecture can have an impact on aaRS stability. Therefore, another popular approach for the creation of enzyme libraries is random mutagenesis, for example via error-prone PCR (EP-PCR). This technique exploits the inherent low fidelity of the Taq polymerase from Thermus aquaticus, which lacks proof-reading activity and therefore does not correct any mistakes that occur during DNA polymerization311. The fidelity is further decreased by altering the buffer composition through the addition of MnCl2 and by increasing MgCl2 concentrations312.
Under these conditions correct base pairing is hindered, resulting in random mutations. By unbalancing the nucleotide concentrations and increasing dCTP as well as dTTP concentrations, a bias towards GC enrichment in the amplified sequence can be minimized313. Depending on the choice of primers for the PCR reaction, a specific region, such as the catalytic site, can be targeted or the entire gene can be amplified with random mutations spread out throughout the whole sequence. The average number of mutations per template depends on the number of amplifications and can thus be tuned by adjusting the number of PCR cycles312.
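The codon arithmetic behind NNK and NNS randomization can be checked with a short script. The following is a minimal sketch, not part of the original work: the function names (`expand`, `summarize`, `library_size`) are hypothetical, the standard genetic code is assumed, and the library size is counted purely at the DNA level without any oversampling factor.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (assumption: no organism-specific deviations).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Degenerate nucleotide symbols used in the text: N = any base, K = G or T, S = G or C.
DEGENERATE = {"N": "ACGT", "K": "GT", "S": "CG"}


def expand(scheme):
    """Return all concrete codons encoded by a degenerate scheme such as 'NNK'."""
    return ["".join(c) for c in product(*(DEGENERATE[s] for s in scheme))]


def summarize(scheme):
    """Count codons, distinct amino acids, and stop codons for a scheme."""
    translated = [CODON_TABLE[c] for c in expand(scheme)]
    return {
        "codons": len(translated),
        "amino_acids": len(set(translated) - {"*"}),
        "stops": translated.count("*"),
    }


def library_size(scheme, positions):
    """DNA-level diversity when `positions` residues are randomized independently."""
    return len(expand(scheme)) ** positions


if __name__ == "__main__":
    for scheme in ("NNN", "NNK", "NNS"):
        print(scheme, summarize(scheme))
    for n in range(5, 9):
        print(f"{n} NNK positions: {library_size('NNK', n):.2e} DNA variants")
```

Comparing the printed library sizes with the transformation efficiency quoted above illustrates why the number of randomized positions has to stay small.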

1.2.1.6 Double-Sieve Selection

The most popular selection technique to obtain functional aaRS mutants capable of charging the desired ncAA to their cognate tRNA from the enzyme libraries is the double-sieve selection illustrated in Figure 8. It consists of iterative rounds of positive and negative selections, where the positive selections help to identify functional library members and the negative selections filter out members that charge any of the canonical amino acids228,314.

During positive selections, the cells harboring the aaRS library are incubated in the presence of all 20 cAAs, as well as the ncAA and a selection marker bearing one or more amber codons at permissive sites. Here, the chloramphenicol acetyl-transferase (CAT), which confers resistance to chloramphenicol, is commonly used. Only cells bearing library members capable of suppressing the in-frame amber codon(s), and thereby of reading through the resistance gene, can survive in the presence of chloramphenicol (Cm). The number of in-frame amber codons, as well as the Cm concentration, can be used to fine-tune stringency.


Figure 8 I Schematic overview of double-sieve selection. During positive selections, all functional library members are selected, while negative selections sieve out variants that are capable of charging cAAs. Typically, selection cycles are repeated around 2-3 times.

The functional members are then transferred to the negative selection. This time, the cells are incubated in the absence of the ncAA and the presence of a toxic selection marker, usually the bacterial ribonuclease barnase315, again bearing in-frame amber codons at permissive sites. Library members that charge cAAs to their cognate tRNA are able to produce barnase, resulting in their host cell’s death. Only cells harboring mutants that cannot charge cAAs and therefore do not produce the toxic gene product will survive301. Typically, two to three rounds of selection are repeated.
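The logic of the two sieves can be summarized in a small toy model. This is only an illustrative sketch under strong simplifying assumptions: each library member is reduced to two hypothetical boolean properties (`charges_ncaa`, `charges_caa`), and survival is treated as all-or-nothing, whereas real selections are graded by marker expression, antibiotic concentration, and the number of amber codons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Toy aaRS library member; both flags are illustrative simplifications."""
    name: str
    charges_ncaa: bool  # can aminoacylate the suppressor tRNA with the ncAA
    charges_caa: bool   # can aminoacylate the suppressor tRNA with a canonical amino acid


def positive_selection(library):
    """cAAs and ncAA present, amber-interrupted CAT plus chloramphenicol.

    Any variant that charges the tRNA (with ncAA or cAA) suppresses the amber
    codon(s) and survives; inactive variants are lost.
    """
    return [v for v in library if v.charges_ncaa or v.charges_caa]


def negative_selection(library):
    """ncAA absent, amber-interrupted barnase.

    Variants that charge a cAA produce barnase and kill their host cell.
    """
    return [v for v in library if not v.charges_caa]


def double_sieve(library, rounds=3):
    """Alternate positive and negative selection for the given number of rounds."""
    survivors = list(library)
    for _ in range(rounds):
        survivors = negative_selection(positive_selection(survivors))
    return survivors


if __name__ == "__main__":
    library = [
        Variant("inactive", False, False),
        Variant("cAA-only", False, True),
        Variant("promiscuous", True, True),
        Variant("ncAA-specific", True, False),
    ]
    print([v.name for v in double_sieve(library)])
```

In this boolean model a single round already isolates the ncAA-specific variant; the repeated rounds in the function mirror the practice described above, where enrichment is gradual rather than absolute.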

1.2.2 Biocontainment

With progress in genetic engineering rapidly advancing in the last decades, biosafety concerning genetically modified organisms (GMOs) has become an increasing concern. Due to the universality of the genetic code and concomitant horizontal gene transfer (HGT), genetic information could easily escape from a GMO and contaminate naturally evolved genetic pools. As biological systems are highly complex, the consequences of such a contamination are unforeseeable and could potentially lead to the alteration of entire ecosystems316. Thus, strategies akin to those outlined in the 1975 Asilomar conference for recombinant DNA are required317. Biocontainment, as a means of preventing the uncontrolled spread of such GMOs in a natural environment, provides a solution to these concerns. Biocontainment methods should take all possible GMO escape mechanisms, such as mutagenic drift, environmental supplementation, and HGT, into consideration318. The level of safety is hereby defined by the number of cells escaping containment relative to the cell number of the entire population and is set at 10^-8 for a system considered safe by the National Institutes of Health (NIH)319. Strategies targeting the central dogma of molecular biology320 and introducing dependencies on synthetic molecules are especially popular. Modifications at the information storage level yield semantically contained organisms321. Ideas for how to achieve this are plentiful and range from the reduction of degenerate codons322 and the construction of minimal genomes (not all genes are needed under laboratory conditions323) to changing the coding unit to quadruplets247,324. While the assembly or reconstruction of an entire genome is extremely difficult, escape through HGT or evolution of organisms modified in this way is very unlikely. Other efforts aim at changing the identity of the informational polymer itself, either by modifying the DNA backbone, as is the case with XNAs325, or by employing synthetic base pairs. For example, the Romesberg group demonstrated not only the chromosomal incorporation of a hydrophobic base pair but also its replication and assignment to a ncAA326,327. However, the need for an information storage polymer to retain its ability to replicate and engage in transcription, as well as translation, means that alterations of the central dogma at its basis are remarkably challenging328. Therefore, alienation of organisms via trophic containment329 might be more straightforward. As ncAAs do not occur in nature, rendering organisms dependent on the incorporation of these synthetic molecules, for example by introducing in-frame amber codons into essential genes, results in these organisms being unable to survive outside of the controlled laboratory environment330,331. However, escape from the containment should not be as easy as reversion of one mutation. Escape frequencies of a single UAG codon are in the range of 10^-6 to 10^-7 and with that do not meet the NIH safety standards318.
Insertion of multiple amber codons in multiple essential genes has proven successful330, but even then incorporation of cAAs in response to UAG codons can become a problem. For example, the Isaacs group applied multiplex automated genome engineering (MAGE)332 for the identification of three permissible TAG sites in three essential genes. The resulting strain escaped containment by creating a suppressor tRNA via mutation of the anticodon of one of the three existing tyrosyl-tRNAs, resulting in tyrosine incorporation at amber codons. Only deletion of two of the three tyrosyl-tRNAs yielded escape frequencies below the NIH threshold, as the sole remaining tRNATyr is needed for Tyr incorporation in response to tyrosine codons and cannot be easily mutated331. In some yeast species, the CUG codon, which traditionally codes for leucine, is reassigned to serine. This rather drastic reassignment from a hydrophobic amino acid to a polar one entails that heterologous gene expression in a similar organism gives rise to misfolded proteins333. These observations suggest that rather than using nonsense codons, sense codon reassignment might also be a conceivable biocontainment strategy. Although much progress has been made in recent years334,335, establishing robust biocontainment remains challenging.
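The safety criterion used in this section, escaping cells relative to the total population compared against the 10^-8 threshold, amounts to a simple ratio. The sketch below illustrates that calculation; the function names and the cell counts in the usage example are hypothetical and not taken from any experiment, and treating the threshold as inclusive is an assumption.

```python
NIH_THRESHOLD = 1e-8  # escape frequency considered safe, as quoted in the text


def escape_frequency(escapees, total_cells):
    """Fraction of the population that escaped containment."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return escapees / total_cells


def meets_nih_threshold(frequency, threshold=NIH_THRESHOLD):
    """True if the escape frequency does not exceed the threshold (inclusive by assumption)."""
    return frequency <= threshold


if __name__ == "__main__":
    # Hypothetical plating results: (escapee colonies, total cells plated).
    for escapees, total in [(3, 1e6), (1, 1e9)]:
        f = escape_frequency(escapees, total)
        print(f"{escapees} of {total:.0e}: {f:.1e} -> safe: {meets_nih_threshold(f)}")
```

With the single-UAG range quoted above (10^-6 to 10^-7), `meets_nih_threshold` returns `False` for both ends of the range, consistent with the statement that one amber codon alone is insufficient.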

1.3 Adaptive Laboratory Evolution (ALE)

Over the years, exploiting microbial biotransformations to obtain desirable products and to achieve more sustainable chemical syntheses has become more and more important336,337. However, due to the complexity of metabolic networks338,339, rational design in metabolic engineering can be exceedingly complicated and difficult. Therefore, mimicking Darwinian evolution to obtain a desired phenotype can be advantageous340. By cultivating cell cultures in a controlled environment for a prolonged period of time, adaptive laboratory evolution (ALE) harnesses natural selection of random mutations in order to enrich beneficial mutations341,342 in an unbiased way343. Owing to their simple nutrient requirements, facile laboratory cultivation, and fast growth, microbes are ideal candidates for ALE341. This technique requires no advance knowledge of the necessary genetic modifications and enriches mutants with increased fitness. Ideally, a fitness advantage causes the mutant to outcompete its ancestor and dominate the culture, though often a significant amount of genetic diversity arises within the culture, leading to intrapopulation competition and resulting in clonal interference344. The fitness gain is defined by the selection pressure applied, which is set by the cultivation environment. Hereby, the most frequently used cultivation methods are batch culture and continuous culture employing chemostats or turbidostats. These bioreactors allow for tight control of the environment, such as pH or oxygenation, and via set dilution rates and a constant influx of media continuous growth can be maintained345. In these experimental setups, where the cells are constantly kept in the exponential phase and the presence of excess nutrients, fitness is defined by the growth rate342. Batch culture, on the other hand, relies on serial inoculation of flasks with fresh media. While maintaining the cells in the exponential phase with this cultivation method is theoretically possible, the regular adaptation of the propagation frequency throughout the experiment is difficult without automation, and decreasing the inoculation volume results in tighter population bottlenecks. Therefore, often fixed volumes and fixed intervals are chosen for propagation346. The cells then go through all of the growth phases after each inoculation step and consume their resources, regularly experiencing resource excess followed by depletion. Hence, cells in these setups are often selected for a decreased lag phase and survival in stationary phase347.
Whereas this mode of propagation is simple and easy to handle, it may yield co-existing specialist strains that thrive in different phases348. The increasing availability of affordable high throughput DNA sequencing349 has greatly facilitated the identification of the mutations that are accumulated during ALE. For later characterization of the experiment, samples are periodically taken and frozen over the course of the ALE. To help identify adaptive mutations and distinguish them from neutral or hitchhiker mutations, independent replicates from the same ancestral strain are commonly evolved simultaneously350. The effect of an identified mutation can then be studied and further characterized by reintroducing the mutation in question to the ancestral strain351,352. Most studies focus on acquiring production strains for the industry by optimizing the growth rate353, increasing the tolerance towards stresses354,355 like high temperature356 or inhibitory product concentrations357, and by increasing metabolite production and nutrient uptake358–361. However, ALE can also afford valuable insights into evolutionary phenomena, such as clonal interference and regulatory rewiring344,362–364. A very well-characterized example is the long-term evolution experiment (LTEE) conducted by Lenski and coworkers346. Started in 1988 and still ongoing today, the study aims at following the fate and analyzing the repeatability of the evolutionary trajectories of 12 individual populations from the same ancestor cultivated in glucose minimal medium. The most profound observation was made when one population evolved the ability to utilize the citrate present in the medium as an alternative carbon source by 31,500 generations365. Under aerobic conditions, E. coli is normally unable to express the citrate transporter CitT and thus cannot utilize citrate as a carbon source. This obstacle was overcome by duplication of citT and placing it under the control of another promoter366.
Although the experiment shows that even after more than 30 years and 60,000 generations the fitness gain reaches no asymptotic limit, the rate of the fitness gain does decrease over time367, and most ALE studies only last around 100-500 generations. Hereby, the end of such an adaptation experiment is determined somewhat arbitrarily and largely depends on when the researcher feels their goal has been achieved.


The LTEE conducted by the Lenski group represents a milestone of experimental evolution and inspired many other studies.
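For batch cultures propagated by serial transfer, generation counts such as those quoted above are usually estimated from the dilution factor. The following sketch rests on one explicit assumption that is not stated in the text: after each transfer the culture regrows to the same final density, so a 1:D dilution corresponds to log2(D) doublings. The function names and the example dilution of 1:100 are illustrative.

```python
import math


def generations_per_transfer(dilution_factor):
    """Doublings needed to regrow after a 1:dilution_factor transfer (assumes equal final density)."""
    if dilution_factor <= 1:
        raise ValueError("dilution_factor must be greater than 1")
    return math.log2(dilution_factor)


def transfers_needed(target_generations, dilution_factor):
    """Smallest number of serial transfers that reaches the target generation count."""
    return math.ceil(target_generations / generations_per_transfer(dilution_factor))


if __name__ == "__main__":
    dilution = 100  # hypothetical 1:100 transfer regime
    print(f"{generations_per_transfer(dilution):.2f} generations per transfer")
    for target in (100, 500):
        print(f"{target} generations: {transfers_needed(target, dilution)} transfers")
```

Under this assumption, a smaller inoculum (larger dilution factor) yields more generations per transfer, at the cost of the tighter population bottleneck mentioned above.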

1.3.1 Adaptation towards [3,2]Tp usage

Inspired by experiments conducted by Bacher368 and Wong369, as well as Lenski, the Budisa group set out to explore the boundaries of the genetic code by changing its amino acid set and adapting Trp-auxotrophic E. coli strains to the usage of non-canonical Trp analogs. The first unambiguous proteome-wide trophic replacement of tryptophan residues in E. coli was achieved with the sulfurized isosteric analog L-β-(thieno[3,2-b]pyrrolyl)-alanine ([3,2]Tpa)231. The cells were cultivated in New Minimal Medium (NMM)370 containing all essential nutrients as well as glucose and, at the beginning of the experiment, also all canonical amino acids except Trp (NMM19 – Trp). To circumvent the need for transporter-mediated uptake of the ncAA, the cells were supplied with the metabolic precursors indole and thienopyrrole ([3,2]Tp) rather than with the amino acids themselves. After passive diffusion through the cell membrane371, indole and [3,2]Tp are intracellularly transformed into the corresponding amino acids by the tryptophan synthase TrpBA under consumption of serine (Figure 9)372,373.

Figure 9 I Reaction scheme of the reaction catalyzed by the tryptophan synthase TrpBA. In a simple metabolic conversion, indole (or its analog) reacts with serine to form tryptophan (or its analog [3,2]Tpa).

In addition to most of the Trp biosynthesis pathway, tnaA was deleted to avoid Trp and [3,2]Tpa catabolism, yielding the strain MG1655 ΔtnaA ΔtrpLEDC, which was designated as TUB00. By gradually decreasing the indole concentration in the media, while keeping the [3,2]Tp concentration constant, the strain was adapted to utilize the analog rather than Trp for survival. Finally, to avoid Trp contamination from commercial amino acid preparations, all other canonical amino acids were removed from the media, and the strain TUB170, which grows in the complete absence of Trp (or indole), was obtained. Genomic and proteomic analysis revealed that the strain re-regulates its RpoS-mediated stress response to accommodate the alien substrate [3,2]Tpa in its proteome. While at the beginning of the experiment the cells react to the presence of [3,2]Tpa by triggering the general stress response, the evolved strain adopts a more relaxed state, similar to that of the ancestral strain in the absence of the Trp analog. The proteomic data is supported by mutations in stress proteins, including the master regulator RpoS itself. Furthermore, mutations in proteases suggest that the strain lowers its protein quality management, presumably to tolerate protein misfolding due to [3,2]Tpa incorporation. Finally, removal of rpoS from the ancestral strain improves [3,2]Tp tolerance of the unadapted strain, demonstrating the key role of the general stress response in the adaptation (results not yet published). To date, two more strains that are capable of surviving in the complete absence of Trp by instead incorporating 4-FTrp and 5-FTrp have been adapted in the Budisa group374. These studies contribute a step towards understanding possible environmental causes of genetic changes and their relationship to evolution and prove that the genetic code of an entire organism can be altered.


2 Aim of this Study

This study aims at deepening the understanding of the mechanisms underlying changes to the standard genetic code in a proteome-wide manner. Furthermore, the next steps towards the alienation of life as we know it will be taken. To this end, efforts undertaken in this study will be twofold: the biocontainment of a strain already adapted to ncAA usage will be attempted, as will the adaptation of one or more strains capable of survival on methionine analogs.

2.1 Evolution of bacterial strains toward methionine analog usage

The proteome-wide replacement of tryptophan, the rarest amino acid and the latest addition to the standard genetic code, has been successfully demonstrated231,374, indicating a certain flexibility of the code. In order to probe the boundaries of that flexibility and to further our understanding of the genetic code and its evolution, the replacement of another amino acid will be attempted. As the only other amino acid with only a single codon, a rather low abundance in the proteome, and also considered to be a late addition, methionine seems to be a likely candidate. Therefore, ALE experiments analogous to the one described in section 1.3.1 (p. 22) will be conducted with the methionine analogs (Figure 10) trifluoromethionine (TfMet) and ethionine (Eth). During these experiments, a methionine-auxotrophic strain is continuously cultivated in defined synthetic media in the presence of the ncAA. Analog usage is encouraged by gradually decreasing the methionine concentration in the media during re-inoculation of the cultures.

Figure 10 I Chemical structures of methionine and its analogs used in this study.

The resulting strain(s) will ideally be able to grow on the non-canonical amino acid alone. As methionine is not only used in protein biosynthesis, but also as a precursor of SAM (see Figure 3), not only the proteome of such a strain will be affected, but a plethora of other (macro)molecules as well. It will be interesting to characterize how such an organism adapts to the usage of trifluoromethyl and ethyl as an alternative to the methyl group in transmethylation reactions. Furthermore, strains with such a broadened scope of bioconversions might potentially be useful in biotechnological applications375,376.


2.2 Biocontainment of TUB170

There are three strains in the Budisa lab that have been adapted to usage of the tryptophan analogs [3,2]Tpa231, 4-F-Trp, and 5-F-Trp374, respectively. All three strains are viable when grown in the complete absence of tryptophan, accepting the respective analog as a substitute instead. However, if these strains are supplied with tryptophan or indole, the canonical substrate is preferred over the non-canonical one, thus leading to tryptophan incorporation throughout the proteome. These strains harbor the E. coli tryptophanyl-tRNA synthetase (TrpRS), which is able to charge its cognate tRNATrp with the Trp analogs in addition to its natural substrate tryptophan.

Figure 11 I Schematic overview of the strategy to generate a biocontained organism. The E. coli trpS gene is to be replaced with an orthogonal aaRS/tRNA pair capable of discriminating between tryptophan and its analogs, rendering the strain dependent (“addicted”) on these unnatural substrates. The strain would not be able to survive outside the laboratory, as the ncAAs 4-F-Trp, 5-F-Trp, and [3,2]Tpa do not occur in a natural environment.


By replacing the endogenous TrpRS with an aaRS capable of discriminating between tryptophan and its analogs, biocontainment could be achieved, resulting in a synthetic organism with an altered genetic code (Figure 11). This synthetic strain would be completely dependent (“addicted”) on a non-canonical amino acid and would not be able to survive in nature, where these non-canonical amino acids do not occur, thereby impeding its accidental propagation in natural environments. In addition, such a strain would allow for the incorporation of tryptophan analogs in rich media, where bacteria grow significantly better than in minimal media. This could enhance product yield when utilizing these strains as platforms for the incorporation of tryptophan analogs, for example in antimicrobial peptides377. Hence, this study aims to evolve an orthogonal aaRS/tRNA pair for the specific incorporation of [3,2]Tpa, focusing on containing the most viable of the three strains TUB170, and to a lesser extent also for 4-F-Trp and/or 5-F-Trp.


3 Results and Discussion

3.1 Evolution of bacterial strains toward methionine analog usage

To probe the flexibility of the genetic code, ALE experiments analogous to the one described in section 1.3.1 (p. 22) were conducted with the methionine analogs trifluoromethionine (TfMet) and ethionine (Eth). Hereby, adaptation towards analog usage was encouraged by gradually decreasing the methionine concentration during cultivation in the presence of the ncAA. For these experiments, a methionine-auxotrophic strain is required.

3.1.1 Establishing Met-auxotrophy

The methionine biosynthesis pathway branches from the L-homoserine biosynthesis pathway, where O-succinyl-L-homoserine together with L-cysteine forms L-cystathionine, which is, in turn, converted into L-homocysteine378 (Figure 12). There are two different enzymes capable of converting homocysteine to methionine by using tetrahydrofolate as methyl donor: the cobalamin-independent homocysteine transmethylase encoded by metE and the methionine synthase encoded by metH (Figure 12, purple box). While the metE gene product does not depend on vitamin B12 (cobalamin) and can only use the triglutamate analog of N5-methyl-tetrahydrofolate (N5-methyl-tetrahydropteroyltri-L-glutamate) as cofactor, the metH gene product depends on exogenously provided vitamin B12379 and can utilize both the monoglutamate and the triglutamate analog of N5-methyl-tetrahydrofolate as cofactor380. The expression of metE is repressed by methionine and vitamin B12381, and it is post-transcriptionally regulated by an anaerobically induced small regulatory RNA called FnrS, so that only the metH gene product is active under anaerobic conditions382. Methionine is then either used for protein expression or converted to S-adenosyl methionine (SAM) by the actions of the metK gene product, the methionine adenosyl transferase (MAT). SAM serves as the major methyl donor and is converted to S-adenosyl homocysteine (SAH) upon methyl-donation, which is recycled back to homocysteine (Hcy) via two enzymatic steps383. The ultimate goal of this study was to adapt a strain for Met-analog utilization, in which ideally the respective SAM analog would be formed to serve as donor for transethylation/trifluoromethylation reactions. Such reactions would result in the formation of SAH, which would, in turn, be converted to Hcy.
Thus, in the presence of metE or metH, methionine could be synthesized from Hcy, and disruption of methionine biosynthesis by deletion of any of the genes involved in preceding biosynthetic steps (metA, metB, or metC) would not suffice. It is, therefore, necessary to delete both genes, metE and metH, to obtain a strain that is auxotrophic for methionine under all conditions (cobalamin present/not present, aerobic/anaerobic, Hcy formation via SAH recycling). Interestingly, there is a third enzyme capable of converting homocysteine to methionine in E. coli. In contrast to the metE and metH gene products, however, this enzyme, MmuM, uses (R)-SAM as methyl donor384,385. Therefore, even in the event of spontaneous epimerization of (S)-SAM (generated by MAT) to its R-enantiomer, no methionine biosynthesis would occur in the engineered adapted strain, as SAM would be substituted by S-adenosyl ethionine or S-adenosyl trifluoromethionine.
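The argument for the double deletion can be made explicit with a small reachability check. The following Python sketch is only an illustration and not part of the experimental work: it encodes the steps named above as a directed graph and asks whether L-methionine can still be formed once certain genes are deleted. The function name can_make_met is hypothetical, the labels methyl_transfer and SAH_recycling are placeholders for steps that are not assigned to specific genes here, and the assignment of metA, metB, and metC to the individual preceding steps follows the standard E. coli annotation rather than the text. Regulatory conditions (cobalamin availability, aerobic/anaerobic growth) and MmuM are deliberately not modeled.

```python
from collections import deque

# Each entry: (substrate, product, set of genes that can each catalyze the step).
# "methyl_transfer" and "SAH_recycling" are placeholder labels, not gene names.
REACTIONS = [
    ("L-homoserine", "O-succinyl-L-homoserine", {"metA"}),
    ("O-succinyl-L-homoserine", "L-cystathionine", {"metB"}),
    ("L-cystathionine", "L-homocysteine", {"metC"}),
    ("L-homocysteine", "L-methionine", {"metE", "metH"}),
    ("L-methionine", "SAM", {"metK"}),
    ("SAM", "SAH", {"methyl_transfer"}),
    ("SAH", "L-homocysteine", {"SAH_recycling"}),
]


def can_make_met(start_metabolites, deleted_genes):
    """Return True if L-methionine is reachable from the start metabolites."""
    deleted = set(deleted_genes)
    reachable = set(start_metabolites)
    queue = deque(reachable)
    while queue:
        current = queue.popleft()
        for substrate, product, genes in REACTIONS:
            # A step is possible if at least one catalyzing gene is still present.
            step_possible = bool(genes - deleted)
            if substrate == current and step_possible and product not in reachable:
                reachable.add(product)
                queue.append(product)
    return "L-methionine" in reachable


# De novo synthesis only, versus a cell in which SAH is also available
# (e.g. formed after alkyl donation from a SAM analog).
print(can_make_met({"L-homoserine"}, {"metB"}))
print(can_make_met({"L-homoserine", "SAH"}, {"metB"}))
print(can_make_met({"L-homoserine", "SAH"}, {"metE", "metH"}))
```

Under these assumptions, deleting metB blocks de novo synthesis from L-homoserine, but methionine remains reachable as soon as SAH is available for recycling; only the combined deletion of metE and metH removes every route.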

27 3.1 Evolution of bacterial strains toward methionine analog usage

Figure 12 | L-methionine de novo biosynthesis pathway in E. coli. The responsible genes and their products are denoted in dark purple. The genes that were deleted in this study to achieve full methionine-auxotrophy are highlighted by the purple box.

28 3 Results and Discussion

To this end, the metE deletion was established in MG1655 via transduction of phage P1 prepared from the strain JW3850-1 from the Keio collection386. Occasionally, instead of packing the viral genome, the bacteriophage P1 accidentally packs random parts of the bacterial genome in the virus particle (for more details on phage transduction see chapter 5.2.17.1, p. 112). This is exploited in the lab to transfer genetic markers, like antibiotic resistance genes, from one strain to another strain. Replacing a target gene with a selection marker is a popular laboratory technique for gene deletion, as it simultaneously enables the desired deletion, as well as provides a means of selection for those cells where the genetic manipulation was successful. The desired locus is targeted by supplying the marker with up- and downstream sequences homologous to the regions around the targeted gene.

[Figure 13: a) agarose gel of colony-PCR products for the metE and metH loci with the primer pairs C1+C2, C1+C5, and C1+C4 (dKO and wt lanes, marker M), with a primer-binding scheme and expected fragment lengths (metE: dKO 700 bp, wt 2800 bp; metH: dKO 600 bp, wt 4200 bp); b) sequence alignments of the metE::FRT and metH::FRT loci showing the upstream region, the FRT sequence, and the downstream region; c) OD(600) after 24 h and 48 h for cultures without (-M) and with (+M) Met; d) agar plates.]

Figure 13 | Genomic and phenotypic verification of Met-auxotrophy for the strain MG1655 ∆metEH::FRT. a) Agarose gel of a colony-PCR of ∆metEH::FRT. On the right is a schematic illustration of where the primers bind and which fragment lengths are expected. dKO: double knockout, wt: wildtype. b) Sequencing analysis of the metE and metH loci. The red bars annotate the respective up- and downstream regions of the targeted genes, the blue bars denote FRT sequences. c) Optical densities of cultures cultivated for 48 h in the absence (-M) and presence (+M) of 1 mM Met. d) Cells plated on agar without any Met (left) and supplemented with 1 mM Met (right).


In place of the metE gene, the Keio strain used here harbors a kanamycin resistance cassette as selection marker, which is flanked by FRT (Flp Recognition Target) sites. After successful recombineering of the selection marker at the desired chromosomal locus, these FRT sites can be used for the removal of the marker. Expression of the flippase (Flp) recombinase supplied on a helper plasmid results in the recombination of both FRT sites flanking the marker, thereby cutting out the marker and leaving behind only an FRT scar (Figure 13 b)387. After removal of the kanamycin resistance cassette, the metH deletion was established in the same way, using phage P1 prepared from the metH KO strain JW3979-1 from the Keio collection and kanamycin resistance as a selection marker. Again, the kanamycin resistance cassette was subsequently removed. After completion of the genome engineering, the helper plasmid carrying the Flp recombinase and harboring a temperature-sensitive ori was cured by incubation at 42°C. Met-auxotrophy was verified genomically (Figure 13 a and b) and phenotypically (Figure 13 c and d). Figure 13 a shows the agarose gel of a colony-PCR of the resulting double knockout (dKO) strain MG1655 ∆metEH::FRT compared to the wildtype (wt) MG1655. Three different primer combinations were tested for both loci (metE and metH): C1+C2 bind immediately up- and downstream of the targeted gene, resulting in a much smaller PCR fragment for the successful gene deletion, where only a small FRT scar remains, than for the wt; C1+C5 bind upstream and inside of the targeted gene sequence, so that the PCR can only succeed where the gene is still present (no band expected for dKO); and finally C1+C4 bind upstream and inside the kanamycin resistance cassette, so that no fragment was expected once the selection marker had been removed by the Flp recombinase. Sequencing analyses of both loci (Figure 13 b) additionally verify the gene deletions.
To ascertain that the removal of metE and metH indeed results in a Met-auxotrophic phenotype, the strain was cultivated in NMM19 lacking Met and a control culture supplemented with 1 mM Met for 48 h at 37°C (Figure 13 c). Furthermore, the strain was plated on agar plates likewise in the absence and presence of 1 mM Met (Figure 13 d). Growth could only be observed where Met was supplemented, thereby confirming auxotrophy.
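The decision logic behind the colony-PCR described above can be summarized in a few lines. The following Python sketch is an illustration under stated assumptions: the fragment lengths are those annotated in Figure 13 a and are taken here to refer to the C1+C2 products, the 15 % size tolerance is an arbitrary choice reflecting that gel bands are only read approximately, and the function name classify_locus and the returned labels are hypothetical.

```python
# Expected fragment lengths (bp) annotated in Figure 13 a,
# assumed here to be the C1+C2 products.
EXPECTED_C1_C2 = {
    "metE": {"dKO": 700, "wt": 2800},
    "metH": {"dKO": 600, "wt": 4200},
}


def classify_locus(locus, c1_c2_bp, c1_c5_band, c1_c4_band, tolerance=0.15):
    """Classify one locus from the results of the three primer pairs.

    c1_c2_bp   -- estimated size of the C1+C2 band in bp, or None if absent
    c1_c5_band -- True if a C1+C5 band is visible (gene still present)
    c1_c4_band -- True if a C1+C4 band is visible (KanR cassette still present)
    """
    expected = EXPECTED_C1_C2[locus]

    def matches(observed, target):
        return observed is not None and abs(observed - target) <= tolerance * target

    if c1_c4_band:
        return "marker still present"
    if matches(c1_c2_bp, expected["dKO"]) and not c1_c5_band:
        return "deletion with FRT scar"
    if matches(c1_c2_bp, expected["wt"]) and c1_c5_band:
        return "wildtype"
    return "inconclusive"


print(classify_locus("metE", 700, c1_c5_band=False, c1_c4_band=False))
print(classify_locus("metH", 4200, c1_c5_band=True, c1_c4_band=False))
```

A clone would only be carried forward if both loci are classified as a deletion with FRT scar; the phenotypic test (no growth without Met) remains the decisive confirmation.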

3.1.2 Choice of analogs

For the adaptation experiments, the methionine analogs trifluoromethionine (TfMet) and ethionine (Eth) were chosen (Figure 14).

Over the last few decades, the introduction of fluorine into organic compounds has become increasingly popular in medicinal chemistry as well as in crop and materials science388. Organofluorines exhibit a range of favorable biophysical, chemical, and biological properties, as evinced by the fact that more than 50 % of the blockbuster drugs and more than 20 % of pharmaceuticals in general contain fluorine389,390. With a van der Waals radius of 1.47 Å, fluorine is somewhat, but not considerably, larger than hydrogen (1.2 Å), and single fluorine-for-hydrogen substitutions generally do not cause great steric disturbances391. Contrary to a common misconception, however, fluorination is not automatically accompanied by an increase in hydrophobicity: a single fluorine-for-hydrogen exchange actually decreases hydrophobicity, as fluorine’s strong electronegativity results in a significantly higher and inverted dipole moment of the C-F bond when compared to the C-H bond. Perfluorination of hydrocarbons, on the other hand, causes a phenomenon known as the “fluorous effect”, where these highly fluorinated compounds segregate in a third phase distinct from both the hydrophilic and the lipophilic phase392. Thus, perfluorinated hydrocarbons are quite prevalent in catalysis and separation processes. Fluorine has a very low polarizability and is a poor hydrogen-bond acceptor, which can be explained by its high nuclear charge and tightly packed lone pairs393. Furthermore, fluorine exhibits a very strong inductive effect influencing the acidity/basicity, polarity, and reactivity of neighboring functional groups even across long distances394–396. The acidification of neighboring groups increases with the number of fluorine atoms, whereby in amino acids this effect is more pronounced at the amino group than at the carboxyl group395. Incorporation of fluorine into peptides and proteins via fluorinated amino acids can increase hydrophobicity and enhance protein stability397,398. Protein folding and stability rely on the “hydrophobic effect”, according to which hydrophobic amino acids are packed in the protein interior, away from the aqueous environment, whereas hydrophilic amino acids are exposed at the protein surface399,400. This hydrophobic collapse is amplified by the “fluorous effect” and could potentially lead to proteins resistant to denaturation by organic solvents394,401. Furthermore, fluorination of proteins and peptides can enhance drug/receptor binding398,402–404, peptide-protein recognition, interaction with biomembranes, and proteolytic stability405–412, and aid in structure/function characterization413–416 via 19F-NMR413,417,418.

Figure 14 | Comparison of the structures of trifluoromethionine (TfMet), methionine (Met), and ethionine (Eth). Structures are represented as ball and stick models and Connolly molecular surfaces are shown in transparent blue.

While fluorine is typically introduced to aromatic amino acids by single fluorine-for-hydrogen exchange, in the case of aliphatic amino acids introduction of trifluoromethyl groups is more common. The CF3 group maximizes the hydrophobic and minimizes the polar contributions of fluorine419–421. It is therefore a useful tool for fine-tuning protein hydrophobicity, and trifluoromethionine has been reported to be twice as hydrophobic as the most hydrophobic cAA, isoleucine422. This can be attributed to the CF3 group reducing electron density at the sulfur nucleus391,423,424. Whilst single F-for-H substitutions are almost isosteric, the difference in van der Waals hemispheres between the methyl group and the trifluoromethyl group adds up to 25.8 Å³ (CH3 = 16.8 Å³, CF3 = 42.6 Å³)391. Thus, the CF3 group is closer in size to the isopropyl group than to the methyl group425, which translates into a 15 % increase in steric bulk for TfMet in comparison to methionine (Figure 14)426. Nevertheless, TfMet is a substrate for the E. coli endogenous methionyl-tRNA synthetase (MetRS) and has been incorporated in a variety of different proteins and enzymes including GFP394, bacteriophage lambda lysozyme (LaL)424, and the KlenTaq DNA-polymerase427. Even multiple Met for TfMet substitutions of up to 14 residues have been achieved with an efficiency of 82 % and without loss of enzyme activity and fidelity427. TfMet is an excellent spectroscopic probe for 19F-NMR424,426–429 and has been utilized for redox protein modulation430, as well as the characterization of PRC2 inhibition. PRC2 is a methyltransferase involved in gene silencing, which is inhibited by a histone H3 mutation present in pediatric glioblastoma431.

Protein translation is always initiated with a methionine residue aminoacylated to an initiator tRNAi. In E. coli, after aminoacylation and prior to translation initiation, the methionyl-tRNAi^fMet is formylated by the enzyme methionyl-tRNA transformylase (Figure 15)432. The formyl-ester mimics a peptide bond and the initiator tRNAi is the only tRNA that binds directly at the ribosomal P-site. Met residues that are encoded in later positions are incorporated by an aminoacylated elongator tRNA^Met that is not formylated and binds at the ribosomal A-site (as is the case with all other aa)433. After translation, the N-terminal formyl-methionine is de-formylated by the enzyme peptide deformylase434 and, in most proteins, the first Met is removed by the methionine aminopeptidase (MAP). The nature of the second amino acid determines whether the N-terminal Met is removed or not; generally, large side chains inhibit N-terminal Met cleavage435,436.

[Figure 15: reaction scheme with the labels methionyl-tRNA transformylase, 10-formyl tetrahydrofolate, tetrahydrofolate, initiator tRNA (anticodon CAU), elongator tRNA, ribosomal E, P, and A sites on the mRNA (5’ to 3’), 1) peptide deformylase, and 2) methionine aminopeptidase (when 2nd aa is small).]

Figure 15 | Schematic overview of processes involved in formylation and deformylation of Met.

TfMet is charged to the elongator tRNA^Met as well as the initiator tRNAi^fMet, as demonstrated by its presence at the N-terminus of TfMet-containing proteins394,424,427. However, it seems to be cleaved off the N-terminus less efficiently than its canonical counterpart394. This is supported by similar observations of proteins containing the difluorinated analog (DfM)437. Taken together, the permissiveness of TfMet in protein expression and the relative scarcity of Met residues in the proteome (chapter 1.1.1, p. 6) make this analog a promising candidate for the adaptation experiment. In addition, methionine’s involvement in transmethylation reactions (chapter 1.1.2, p. 8) means that a strain adapted to TfMet utilization may be an excellent platform for “trifluoromethylation” of a wide variety of (macro)molecules, thus facilitating fine-tuning of their hydrophobicity. Lastly, such a strain offers a great opportunity for studying the impact of fluorine not only on protein structure and stability but also on an entire organism and therefore has the potential to further our knowledge of organofluorine compounds.

Ethionine was chosen for its close structural and electronic resemblance to methionine (Figure 14). It is one of the first synthetic amino acids to have been experimented with and was fed to rats on cystine-deficient diets as early as 1938. It was concluded that, in contrast to methionine, ethionine is not able to rescue the growth of these sulfur-deficient rats438. Only 13 years later, Levine and Tarver were able to show that Eth is incorporated into rat proteins, likely by the same means as “normal” protein synthesis, which led them to the conclusion that “the structure of a protein is not fixed and immutable”204. Since then, ethionine and its impact on protein synthesis as well as cellular metabolism have been studied extensively. Eth is aminoacylated to the elongator tRNA^Met as well as the initiator tRNAi^fMet by the methionyl-tRNA synthetase (MetRS)439–443. The ethionyl-tRNAi^fMet is formylated, and the resulting formyl-ethionyl-tRNAi^fMet initiates protein expression442. Furthermore, formyl-ethionine-bearing proteins are deformylated and N-terminal ethionine residues are removed similarly to N-terminal Met residues (Figure 15)444. Proteins in which Met residues have been substituted with Eth are catalytically active445–449, and substitution of eight Met residues by Eth has been reported with an efficiency of over 90 %450. In mice, rats451–453, and yeast439,454, S-adenosyl ethionine (SAE) formation has been observed, and ethylation of rat liver DNA455, tRNAs456–458, and nuclear proteins459 has been reported. Ethionine’s role in protein expression and one-carbon metabolism has been useful for methylation studies460. While Eth inhibits the growth of microorganisms, the inhibition can be reversed by Met460 and can therefore only be observed in minimal media461. Similar observations have been made with TfMet, and an accompanying decrease of the intracellular free Met pool has been reported in Saccharomyces cerevisiae439.
Therefore, a role of Met analogs as repressors of Met biosynthesis has been proposed. Met biosynthesis is carefully balanced by transcriptional repression and feedback inhibition (Figure 16)462, whereby feedback inhibition by Met and SAM affects the first step of Met biosynthesis (metA)463. It is therefore conceivable that analogs of Met might exert a similar effect. However, extensive analyses of Met analog-resistant E. coli strains revealed that Eth-resistance is always associated with mutations in the metJ gene464,465. MetJ is an apo-repressor that, together with its co-repressor SAM, represses transcription of metA, metBL, metC, metF, and metE465. Thus, Chattopadhyay and co-workers have suggested that SAE functions as a false co-repressor for MetJ, demonstrating in vitro turnover of Eth by the E. coli methionine adenosyl transferase (MAT)464. However, the majority of the reports466,467, including more recent ones468–471, agree that bacterial MATs are not very permissive towards Met analogs and demonstrate only minimal turnover of Eth by the E. coli MAT. Nakamori and co-workers hypothesize that the metJ mutation found in the Eth-resistant strain in their study likely diminishes DNA-binding of MetJ; the exact involvement of Eth, however, remains unclear465. Interestingly, while Eth-resistance is often connected to mutations in metJ, strains resistant to other Met analogs can exhibit different genotypes but are usually likewise characterized by enhanced Met biosynthesis, indicating different mechanisms of de-repression triggered by different analogs464. Since it correlates with high Met production, Eth-resistance has become a useful tool to screen for Met-overproducing microbes for the improvement of industrial L-Met production463,472. Met, as an essential amino acid for humans and livestock, is an important feed additive and is used for the treatment of liver disease463. However, chemical synthesis relies on hazardous materials and produces the racemate473.


Figure 16 | Schematic overview of feedback inhibition and repression of methionine biosynthesis in E. coli. Proteins and their genes are denoted in purple, repression is represented by dashed arrows, and feedback inhibition by double arrows. The aporepressor MetJ requires SAM as cofactor.

In another example, Eth-resistance was key to understanding the importance of potentiating and actualizing mutations365,366,474,475 during the evolution of interspecies cooperation. In this study, the survival of consortia of Met-auxotrophic E. coli and Salmonella enterica relies on Met overproduction by S. enterica to supply the Met biosynthesis-deficient E. coli strain. S. enterica in turn relies on carbon by-products from E. coli. Cooperation evolved significantly faster in populations that were screened for Eth-resistance and it was shown that potentiating mutations in metJ were necessary for actualizing mutations in metA to occur476.

Due to its involvement in protein expression as well as one-carbon metabolism, methionine is a fascinating amino acid to study, and substitution with structural analogs will not only further our understanding of the flexibility (and possibly evolution) of the genetic code, but also of cellular methylation processes. While TfMet, owing to fluorine’s intriguing properties, is the more exciting amino acid, the concomitant increase in steric bulk and its electronic differences might complicate the adaptation of an entire organism to its usage. Therefore, Eth, with its only slightly longer side chain and similar electronic properties, might be a more readily accepted substitute.

3.1.3 ALE starting conditions

Before starting the adaptation experiment toward Met analog usage, the optimal starting conditions were elucidated. Overnight cultures of the Met-auxotrophic strain MG1655 ∆metEH::FRT (from here on denoted as ∆metEH::FRT) were cultivated in LB media. The cells were washed twice with minimal media lacking Met to remove excess Met and then inoculated 1:500 in 10 mL minimal media supplied with an excess of all cAA except Met in 100 mL flasks. In the ALE experiment, analog usage is encouraged by limiting the Met supply and instead supplying the desired analog in excess. Therefore, to determine limiting Met concentrations, the cells were first cultivated with Met concentrations ranging from 5 µM to 1 mM, and OD600 was measured after 24 h (dashed lines) and 48 h (solid lines) (Figure 17 a). At concentrations of 0.5 mM and above, the cultures reached an OD600 of about 8, indicating that Met is no longer the limiting factor under these conditions. No significant differences between measurements after 24 h and 48 h were observed. To avoid any potential selection pressure on phosphate metabolism, a buffer that is not metabolized in E. coli was tested as an alternative to the standard potassium phosphate buffer employed in NMM.

For that purpose, 3-morpholinopropane-1-sulfonic acid (MOPS) with a pKa of 7.2 was chosen. MOPS-buffered minimal glucose media are known to support E. coli growth and sufficiently buffer these cultures (glucose metabolism accumulates acetate)477. Thus, minimal media analogous to NMM19 (19 cAA – Met) but buffered with 40 mM MOPS and a reduced potassium phosphate concentration of only 1 mM K2HPO4 were prepared. The inset in Figure 17 a compares NMM19 cultures with those of its MOPS-buffered counterpart for Met concentrations between 5 µM and 30 µM. No significant differences between cells cultivated in NMM19 and MOPS-buffered media (denoted as MOPS19) could be observed, and the MOPS version supplied with 20 µM Met was chosen for the following experiments. This concentration allowed the cultures to accumulate a fair amount of cell mass and reach mid-log phase (OD600 ≈ 0.6), where the cells are active, while Met is still the limiting nutrient, so that selection pressure for analog usage builds upon Met consumption. Next, ethionine concentrations ranging from 5 µM to 500 µM were tested (Figure 17 b). Culture densities increase with the amount of Eth until a plateau is reached at 100 µM Eth and an OD600 of approximately 1.6. The inset in Figure 17 b shows optical densities of cultures grown in the absence of Met and with Eth concentrations ranging from 5 µM to 250 µM. Almost no cell growth can be observed on Eth alone.


[Figure 17: three plots of OD(600) measured after 24 h and 48 h. a) OD(600) versus Met [µM] (0 to 1000 µM), with an inset comparing NMM19 and MOPS19 at 0 to 35 µM Met; b) OD(600) versus Eth [µM] (0 to 500 µM) with (+M) and without (-M) Met, with an inset for 0 to 250 µM Eth; c) OD(600) versus TfMet [µM] (0 to 5000 µM) with (+M) and without (-M) Met.]

Figure 17 | Elucidation of optimal ALE starting conditions. a) Optical densities of ∆metEH::FRT cultivated in minimal media with Met concentrations ranging from 5 µM to 1 mM measured after 24 h (dashed lines) and 48 h (solid lines). Inset compares phosphate-buffered minimal media (NMM) with its MOPS-buffered counterpart (MOPS). b) OD600 values of ∆metEH::FRT cultivated in MOPS-buffered minimal media with 20 µM Met and Eth concentrations ranging from 5 µM to 0.5 mM measured after 24 h (dashed lines) and 48 h (solid lines). The inset shows cultures cultivated with 5 µM-250 µM Eth without the addition of Met. c) OD600 values of ∆metEH::FRT cultivated in MOPS-buffered minimal media with 20 µM Met and TfMet concentrations ranging from 5 µM to 5 mM measured after 24 h (dashed lines) and 48 h (solid lines). The inset shows cultures cultivated without any Met. Values represent the mean of three cultures with the SD as error bars.

Similar to cultivation with Eth, TfMet alone only affords minimal growth. The addition of trifluoromethionine to MOPS19 + 20 µM Met increases cell densities only slightly, from an OD600 of 0.6 to about 0.8, and even concentrations as high as 5 mM seem to be well tolerated (Figure 17 c). Generally, Met analogs are known to have an inhibitory effect on growth, especially in minimal media. However, as already discussed in the previous chapter, this growth inhibition is associated with an inhibition of de novo Met biosynthesis. In this study, the biosynthetic pathway was disrupted and the cells rely on an external supply of the canonical amino acid, which explains the lack of inhibition observed here. Curiously though, Duewel and coworkers, although also working with a Met-auxotroph, observed severe growth inhibition depending on the TfMet to Met ratio424. In their experiments, supplementation of 5 mM TfMet proved to be significantly growth-inhibiting in the presence of 50 µM Met, and normal growth could only be restored in the presence of 500 µM Met, while in this study no adverse effects could be observed in the presence of 5 mM TfMet and as little as 20 µM Met. In light of these discrepancies, it is worth noting that, even though both strains are auxotrophic for Met, the strain used in this study is derived from a K-strain, while the other study employed a B-strain derivative (B834(DE3)), and thus considerable genotypic differences exist between these strains. It is possible that increasing TfMet to Met ratios would eventually yield adverse effects also in this strain, but these were not tested, as higher TfMet concentrations were not relevant for this study. For the ALE experiments, MOPS19 supplied with 20 µM Met and either 100 µM Eth or 500 µM TfMet was chosen as the starting condition, as these concentrations afforded maximal optical densities with Met as the limiting nutrient and the analogs supplied in excess.
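The analog-to-Met ratios compared in this paragraph are easier to follow when written out. The following Python sketch simply recomputes the molar ratios from the concentrations quoted in the text; the function name molar_ratio and the condition labels are chosen here for illustration only.

```python
def molar_ratio(analog_um, met_um):
    """Return the analog:Met molar ratio for concentrations given in µM."""
    if met_um <= 0:
        raise ValueError("Met concentration must be positive")
    return analog_um / met_um


# (analog concentration in µM, Met concentration in µM), as quoted in the text
CONDITIONS = {
    "Duewel and coworkers, inhibited (5 mM TfMet, 50 µM Met)": (5000, 50),
    "Duewel and coworkers, normal growth (5 mM TfMet, 500 µM Met)": (5000, 500),
    "this study, tolerated (5 mM TfMet, 20 µM Met)": (5000, 20),
    "ALE start, Eth (100 µM Eth, 20 µM Met)": (100, 20),
    "ALE start, TfMet (500 µM TfMet, 20 µM Met)": (500, 20),
}

for label, (analog_um, met_um) in CONDITIONS.items():
    print(f"{label}: {molar_ratio(analog_um, met_um):.0f}:1")
```

The highest ratio tolerated in this study (250:1) thus exceeds the ratio reported as inhibitory in the B-strain derivative (100:1), whereas the chosen ALE starting conditions correspond to much lower ratios of 5:1 (Eth) and 25:1 (TfMet).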

3.1.4 Adaptive Laboratory Evolution (ALE)

During the adaptive laboratory evolution, the cultivation conditions used for the pre-experiments described above were mirrored. Three different experiments with three cultures each were conducted: a) a control experiment without any analog supplementation, b) adaptation towards ethionine utilization, and c) adaptation towards trifluoromethionine usage. The control experiment was conducted to help distinguish between effects caused by prolonged incubation in minimal media with limiting Met concentrations and those responsible for adaptation to analog usage. Changes that occur in the control are likely to be neutral or due to adaptation to the media and passaging regime employed. Thus, comparison of the adapted strains with the control should facilitate identification of artifacts, such as fixation of neutral mutations due to genetic drift, and assist in focusing on analog-related characteristics.

The cells were incubated at 37°C and 200 rpm and every 48 h to 72 h an aliquot corresponding to 0.02 OD600 was transferred into fresh media. The choice of parameters for the serial passaging is not trivial (Figure 18). The passaging imposes a bottleneck on the adaptation experiment, where the size of the passage determines the number of cells allowed to grow and persist in the population until the next dilution event. Thus, beneficial mutations that are unable to gain dominance in the time frame of one transfer cycle may be lost during the next passaging. This outcome becomes increasingly likely when multiple beneficial mutations compete against each other, a phenomenon known as clonal interference. On the other hand, passaging is necessary to supply fresh nutrients and discard waste products that accumulate during cultivation, and large passages result in faster nutrient depletion and waste accumulation. In ALE setups where the population is kept in the exponential growth phase, large passage sizes result in shorter cultivation periods between transfers that need to be adjusted as the growth rate increases over the time course of the experiment478. In setups where the cells are allowed to go through all growth phases, the outcome is less stressful, but passage size determines the lengths of growth and stationary phases, which has an impact on adaptation. Upon transfer into fresh media, growth is initially slow and the cells need some time to accelerate their growth (lag phase) until they reach their maximal growth rate in the exponential phase, where nutrients are abundant. As nutrients are consumed and/or waste accumulates, growth decelerates until stationary phase is reached, where growth comes to a halt and some cells may die. When cells cycle through these periods of feast and famine during an adaptation experiment, all of these phases are under selection (as opposed to just the exponential phase in the other setup) and the lengths of the phases pose selective pressures366,479.
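The passaging arithmetic referred to at the beginning of this paragraph can be sketched as follows. This Python example is a minimal illustration under assumptions that are not stated explicitly in the text: the aliquot "corresponding to 0.02 OD600" is interpreted as the volume that sets the fresh culture to a starting OD600 of 0.02, the final culture volume is taken to be 10 mL as in the pre-experiments, and OD600 is assumed to be proportional to cell number. The function names are hypothetical.

```python
import math


def transfer_volume_ml(od_measured, od_start=0.02, culture_volume_ml=10.0):
    """Volume of the old culture that yields od_start in the fresh culture.

    Uses c1 * v1 = c2 * v2 with culture_volume_ml as the final volume.
    """
    if od_measured <= 0:
        raise ValueError("od_measured must be positive")
    return od_start * culture_volume_ml / od_measured


def generations_per_passage(od_final, od_start=0.02):
    """Number of doublings between inoculation and the next transfer."""
    if od_final <= 0 or od_start <= 0:
        raise ValueError("optical densities must be positive")
    return math.log2(od_final / od_start)


# Example with an end-point OD600 of 0.6, the value reached with 20 µM Met.
od_end = 0.6
print(f"transfer volume: {transfer_volume_ml(od_end) * 1000:.0f} µL")
print(f"doublings per passage: {generations_per_passage(od_end):.1f}")
```

With these assumptions, a culture that ends at a lower optical density requires a larger transfer volume and passes through fewer doublings per cycle, which illustrates how passage size and the length of the growth phase are coupled.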


[Figure 18: two schematic growth curves, OD(600) versus time [h], for a large and a small passage, with the lag, exponential, stationary, and death phases marked, and with symbols for waste, beneficial mutations, and a neutral mutation.]

Figure 18 | Schematic representation of the effects of small and large passage sizes during adaptive laboratory cultivation. Large passages are more likely to adequately represent the population, including colonies with different beneficial mutations. The larger the passage size, however, the smaller the dilution, which is necessary to discard accumulated waste products and to supply fresh nutrients. A large passage also results in shorter lag and exponential phases, while extending the stationary phase and possibly favoring the onset of the death phase. Small passages, on the other hand, result in longer lag and exponential phases with shorter stationary phases. If the passage size is too small, beneficial mutations may be lost.

Hence, by choosing a certain passage size, one does not only decide which portion of the population is allowed to propagate into the next passage, but one might also inadvertently select for duration of the lag phase or survival in the stationary phase. Furthermore, the maximal growth rate depends on the initial cell density of a culture, where larger passage sizes result in lower maximal growth rates480. Thus, the serial passaging concept may seem simple and straightforward at first glance, but the parameters chosen for the ALE can have complex repercussions that are difficult to foresee.
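How strongly the passage size affects the fate of a rare beneficial mutant can be estimated with a simple sampling argument. The following Python sketch is an illustration only and was not part of the analysis in this work: it assumes that the transferred cells are drawn at random from a well-mixed culture, so that the probability of transferring no mutant cell at all is (1 - f)^N for a mutant fraction f and N transferred cells. The function name loss_probability and the numerical values are hypothetical.

```python
def loss_probability(mutant_fraction, cells_transferred):
    """Probability that a transfer of N random cells contains no mutant cell."""
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must be between 0 and 1")
    if cells_transferred < 0:
        raise ValueError("cells_transferred must not be negative")
    return (1.0 - mutant_fraction) ** cells_transferred


# Hypothetical example: a mutant present in one out of a million cells.
fraction = 1e-6
for n_cells in (10**5, 10**6, 10**7):
    p_lost = loss_probability(fraction, n_cells)
    print(f"{n_cells:>10d} cells transferred: P(lost) = {p_lost:.3g}")
```

In this simplified picture a tenfold smaller passage can turn an almost certain transfer of the mutant into an almost certain loss, which is the bottleneck effect described above; selection within the transfer cycle and clonal interference are not captured by this estimate.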


[Figure 19: three plots of OD(600) versus passage number (0 to 55) for the replicate cultures ΔmetEH::FRT 1 to 3. a) 20 µM Met control; b) 100 µM ethionine with Met supplied at 20 µM, then 15 µM, then 10 µM; c) 500 µM TfMet with Met supplied at 20 µM, then 15 µM.]

Figure 19 | Adaptive laboratory evolution towards Met analog utilization. Optical densities (OD600) are plotted against the number of passages. a) Control experiment with 20 µM Met and no analog addition. b) Adaptation towards Eth usage. c) Adaptation towards TfMet usage.


Here, a setup where the cells cycle through feast and famine was chosen, as selection pressure for analog usage is only imposed upon Met consumption, when the concentration of the only limiting resource diminishes. As long as the canonical substrate is abundant, the cells prefer it over the analog. However, when Met is depleted, all other essential resources and the analog are still abundant, encouraging the cells to keep growing by utilizing the analog instead. Nevertheless, until enough beneficial mutations accumulate to allow proliferation on the analog, the cells will enter the stationary phase upon Met depletion. Intervals of 48 h-72 h between re-inoculation were chosen for extended intervals of selection pressure in the stationary phase. Each of the three ALE experiments was conducted in triplicate with the start cultures derived from the same three colonies of ∆metEH::FRT, labeled as ∆metEH::FRT 1, 2, and 3. Samples were taken on a weekly basis and stored at -80°C as “frozen fossils” for later characterization of the adaptation or emergency resurrection in case of any problems with the ALE. In Figure 19 the optical densities of each passage are plotted against the passage number. Curiously, after the first 3-5 passages the optical densities decrease considerably in both adaptation experiments (Figure 19 b and c) without any change in media composition or cultivation conditions. After 9 (Eth adaptation) or 10 (TfMet adaptation) passages the Met concentration was decreased from 20 µM to 15 µM to increase the selection pressure. In the case of the ALE experiment with Eth, another attempt at increasing the selection pressure was made by decreasing the Met supplementation further to 10 µM. However, even after 55 (Eth ALE) or 51 (TfMet ALE) passages and approximately 4 months of cultivation no increase in optical densities was observed, indicating no enhancement of general fitness.
As expected, the Met control exhibits stable optical densities, with a slight increase in the case of ∆metEH::FRT 1 towards the end (Figure 19 a), hinting perhaps at adaptation towards growth in minimal media. At this point, the adaptations were stopped and the experiments were assessed. In E. coli, only about 20 % of the intracellular Met pool is used for protein synthesis; the rest ends up in the form of SAM, which is involved in a plethora of cellular reactions, ranging from transmethylation of nucleic acids, proteins, and natural products to donating its amino group for biotin synthesis and its aminoalkyl group for polyamine biosynthesis (see also chapter 1.1.2, p. 8). As already mentioned in chapter 3.1.2 (p. 30), the E. coli methionine adenosyl transferase (MAT) is known not to be very permissive towards Met analogs and exhibits only minimal activity with Eth as substrate. It is therefore very likely that the enzyme is not capable of TfMet turnover. Since this reaction is so crucial for cellular metabolism, it is likely to pose a major bottleneck in the adaptation towards Met analog utilization, and the chance of beneficial mutations occurring in a reasonable time frame may be low. Some species, however, have MATs that are much more permissive towards Met analogs. Thorson and collaborators have tested MATs from E. coli, Homo sapiens, M. jannaschii, and Sulfolobus solfataricus for turnover of more than 40 different Met analogs469,470. All tested enzymes except the one from E. coli readily convert Eth to its S-adenosylated counterpart with an efficiency of at least 80 % of that of Met conversion (Met conversion set as 100 %). TfMet was not tested, but the enzyme from S. solfataricus is permissive towards Met analogs bearing an isopropyl and even an isobutyl group, making it the first tested MAT that is permissive towards branched Met analogs470.
As the isopropyl group structurally resembles the trifluoromethyl group, this enzyme was deemed promising. Therefore, rather than waiting for an adaptation event during ALE, it was decided to test metK genes from other organisms in E. coli, in the hope that they might alleviate the bottleneck posed by the inefficient endogenous conversion of Met analogs to their S-adenosylated counterparts.

40 3 Results and Discussion

3.1.5 Plug and play with metK

Two promising metK candidates were chosen: the gene from S. solfataricus, because of its ability to convert branched Met analogs, and the gene from M. jannaschii, as it is one of the most permissive variants towards a range of Met analogs. As both enzymes have already been shown to convert Eth efficiently to S-adenosyl ethionine (SAE)469,470, it was attempted to replace the endogenous metK gene in the strain ∆metEH::FRT with one of these genes. In another study, Parungao and coworkers successfully substituted the E. coli metK gene with bacterial and eukaryotic orthologs supplied on plasmids, demonstrating the ability of E. coli to survive with foreign MAT variants481. These results were encouraging for the approach described here. In parallel, both enzymes were overexpressed heterologously in E. coli to test in an in vitro assay whether one of these variants is also capable of TfMet turnover.

3.1.5.1 1st approach: Rescue plasmid

To this end, the metK genes from S. solfataricus and M. jannaschii were codon-optimized for expression in E. coli and ordered from GeneArt. To knock out the endogenous gene, the λ red system, which assists in recombination (for more details see chapter 5.2.17.2, p. 114), was employed in an approach analogous to the one described in chapter 3.2.2.1 (p. 66) for the knockout of the trpS gene. As metK is also an essential gene, a copy needs to be supplied on a rescue plasmid before the chromosomal gene can be replaced with a selection marker. To test whether the enzymes from S. solfataricus and/or M. jannaschii are capable of supporting E. coli growth, their genes were cloned into the rescue plasmid and it was attempted to replace the endogenous metK with a kanamycin resistance cassette (Figure 20, step 1). In the event of a successful knockout, the resistance cassette in the chromosome would then be replaced with the S. solfataricus or M. jannaschii gene (Figure 20, step 2). Finally, the rescue plasmid would be removed (Figure 20, step 3) in order to start another adaptation experiment with a plasmid-free setup. Analogously to the trpS rescue plasmid, these plasmids harbored three I-SceI restriction sites to allow for their removal. The successful assembly of all constructs was verified via sequencing analysis.


[Figure 20: schematic not reproduced. Steps shown, as far as recoverable from the extracted labels: Step 1, elimination of chromosomal metK (λ red recombination of a PCR-derived KanR resistance cassette in the presence of the Mj/Ss metK rescue plasmid); Step 2, insertion of Mj/Ss metK into the chromosome in place of KanR (PCR-derived metK cassette), followed by the question "survival?"; Step 3, transformation of the helper plasmid and removal of the superfluous rescue plasmid (and elimination of the λ red system); finally, elimination of the helper plasmid before ALE 2.0. Plasmid labels: pKD46 (PBAD, β, γ, exo; AmpR), rescue plasmid (Plac, Mj/Ss metK, I-SceI cleavage site; CmR), helper plasmid (PBAD, I-SceI; GentaR).]

Figure 20 I Overview of the 1st approach for the replacement of the E. coli metK with either the M. jannaschii or S. solfataricus metK. Mj: M. jannaschii, Ss: S. solfataricus. The plasmid harboring the λ red system and the helper plasmid have temperature-sensitive origins of replication and can be removed via incubation at 42°C. The linearized pieces of the rescue plasmid are digested by endogenous restriction enzymes.


However, even after several KO attempts, no colonies in which the chromosomal metK gene had been replaced with the kanamycin resistance cassette could be obtained. Figure 21 shows the agarose gel of the colony PCR of a representative colony (one of many picked colonies). Only bands corresponding to the wt (metK still present in the chromosome) can be seen, and the fragment corresponding to the kanamycin resistance cassette (C1+C4) is not formed, indicating that the KO attempt was unsuccessful.

[Figure 21: gel image and primer scheme not reproduced. Labels recoverable from the scheme: primers C1, C2, C4, and C5; fragment lengths of 1209 bp and 843 bp listed next to the Ec metK locus, and of 1544 bp and 530 bp listed next to the FRT-KanR-FRT locus; marker bands at 1500, 1200, 1000, and 500 bp; lanes 1-6 with ∆metK and ∆metEH samples.]

Figure 21 I Colony PCR of a representative clone from the first metK KO approach. Left: Agarose gel of the colony PCR. Bands are shown for a representative clone from the KO attempt (∆metK) as well as ∆metEH as wt ctrl. Right: Schematic overview showing where the primers bind and the fragment lengths of their PCR products.

This could mean either that neither foreign MAT copy supports the growth of E. coli, or that the expression of these genes from a plasmid is unsuitable, due to the plasmid copy number and/or expression from the lac promoter employed on the plasmid. In their study, Parungao and coworkers observed starvation for Met in their engineered strains. There is a very sensitive intracellular equilibrium between Met and SAM, as both products regulate their own biosynthesis via feedback inhibition. Met starvation was likely caused by the overproduction of SAM from the metK genes on high copy plasmids481.

3.1.5.2 2nd approach: CRISPR/Cas9

Thus, to rule out effects caused by the expression vector, another KO approach was tested, in which replacement of the E. coli gene by the S. solfataricus or M. jannaschii gene was attempted directly on the chromosome while retaining the original metK promoter. To this end, a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) based approach was used. The CRISPR/Cas system is an ancient bacterial immune system that protects against foreign DNA (e.g. from viruses). After the survival of an infection, bits of the foreign DNA are stored in a CRISPR array, which serves as a library and helps the cell “remember” foreign DNA for a quick response upon a recurring infection with the same pathogen. With the help of a trans-activating CRISPR RNA (tracrRNA) and a Cas protein, the foreign DNA is then precisely targeted and cleaved (for more details on CRISPR/Cas see section 5.2.17.2, p. 114). Emmanuelle Charpentier, who came across the system by studying pathogenic bacteria in the hope of discovering a new antibiotic, collaborated with Jennifer Doudna, who came across CRISPR as an expert for small regulatory RNA, to develop a genetic tool which they published in their seminal paper in 2012482. Since then, CRISPR/Cas has revolutionized genetic engineering, and Charpentier and Doudna were awarded the Nobel Prize in Chemistry 2020 “for the development of a method for genome editing”. By now, numerous CRISPR/Cas systems for a variety of applications and several Cas proteins for the targeting of a wide range of DNA sequences have been published483. Most systems, however, require the design of a new guide RNA (gRNA, a tracrRNA/CRISPR RNA chimera) for each target, which can be cumbersome despite the availability of several online tools, and Cas9 off-target effects limit the targetable sequence space484,485. Therefore, a more general system that does not require the design of specific gRNAs for each target was chosen here. Zhao and coworkers developed a technique they coined “the CRISPR/Cas9-assisted gRNA-free one-step (CAGO) genome editing technique”486. Critical for their approach was the design of a universal CRISPR/Cas9 recognition sequence (N20PAM) with minimal sequence homology to the E. coli genome to minimize Cas9 off-target effects. This universal N20PAM can be incorporated into any desired editing cassette and is targeted by its homologous gRNA, which is supplied on the pCAGO plasmid. The cas9 gene and the λ red recombination system are likewise supplied on the pCAGO plasmid.


[Figure 22: schematic not reproduced. Steps shown: Step 1, assembly of the editing cassette; Step 2, recombination into the target locus on the chromosome; Step 3, Cas9-mediated DNA cleavage (DSB) and λ red-mediated recombination, yielding a chromosome with the desired insert / mutation. Cassette elements shown: Left homo, Mj / Ss metK (ctrl: Ec metK), R short, CmR, N20PAM, Right homo.]

Figure 22 I Schematic overview of processes involved in the CAGO technique. Left homo: left homology region, R short: first 40-50 bp of the right homology region, CmR: chloramphenicol resistance cassette, Right homo: right homology region. The scissors represent Cas9-mediated DNA cleavage of the universal N20PAM sequence, DSB: double-strand break.

Step one of the CAGO editing technique is the design and assembly of a suitable editing cassette, which should be composed of three homology regions targeting the desired gene locus, a selection marker, and the universal N20PAM for later removal of the selection marker (Figure 22). In the case of a simple gene KO, no additional insert is needed. However, if editing of a gene locus or insertions of any kind are desired, an appropriate insert bearing the desired modification needs to be incorporated into the editing cassette. In this study, the goal was to replace the E. coli metK gene with a chloramphenicol resistance cassette followed by a copy of either the S. solfataricus or M. jannaschii metK gene. After λ red-mediated recombination of the editing cassette at the desired locus (Figure 22, step two), the success of the recombination is verified via colony PCR and/or sequencing analysis. Positive clones are then subjected to step three, where the selection marker is removed via Cas9-mediated DNA cleavage and λ red-assisted recombination. Finally, the pCAGO plasmid, which bears a temperature-sensitive ori, is cured by incubation at 42°C (step four, not shown in the figure). For the editing cassette, four PCR fragments with BsaI recognition sites were generated and the cassette was assembled in a one-pot Golden Gate reaction. In this reaction, alternating cycles of digestion with the BsaI restriction enzyme and ligation with the T4 DNA ligase yielded the desired cassette, which was further amplified in a final PCR reaction because electroporation and recombination require large amounts of DNA. Three different editing cassettes were generated: one bearing the S. solfataricus metK gene, one harboring the M. jannaschii metK gene, and a control harboring the E. coli gene to monitor any potential problems with this new editing technique. The successful assembly was verified via sequencing analysis. For the introduction of the editing cassette into the target organism, the λ red system was induced from pCAGO in ∆metEH::FRT, and the cells were made electrocompetent and electroporated with about 400 ng of the editing cassette. After approximately 2 h of recovery in rich media lacking glucose (to avoid repression of the lac promoter and thus of λ red expression), the cells were plated on agar plates with Cm and incubated overnight at 30 °C. Single colonies were picked and the success of the recombination was monitored via sequencing analysis.
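Because a Golden Gate assembly relies on BsaI cutting only at the sites introduced with the PCR primers, it can be useful to check a fragment for additional internal BsaI sites beforehand. The following is a minimal sketch of such a check and not the procedure used in this work; the function names and the toy sequence are hypothetical, and the only fact taken as given is the BsaI recognition sequence GGTCTC.

```python
# Sketch: locate BsaI recognition sites (GGTCTC) on both strands of a fragment.
# Function names and the example sequence are illustrative assumptions.

BSAI_SITE = "GGTCTC"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, site: str = BSAI_SITE) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every occurrence of a non-palindromic site.

    "+" means the site reads 5'->3' on the given strand,
    "-" means it is found as its reverse complement.
    """
    seq = seq.upper()
    hits = []
    for motif, strand in ((site, "+"), (reverse_complement(site), "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Hypothetical toy fragment with one site on each strand.
toy_fragment = "AAGGTCTCATTTTGAGACCAA"
sites = find_sites(toy_fragment)
print(sites)
```

A fragment intended for the assembly would be expected to return only the sites placed at its ends by the primers; any further hit would have to be removed, for example by a silent mutation.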

[Figure 23: sequence alignment not reproduced. Reference track: MG1655 metK region with the annotations left homology arm, EcMAT gene, R short, CmR from pSU18, N20PAM, right homology arm, and galP gene. Aligned reads: CAGO_M4_L, CAGO_S1_L, CAGO_M4_R, and CAGO_S1_R (read names truncated in the source).]

Figure 23 I Sequencing chromatogram of a representative colony from the CAGO attempt with the M. jannaschii and S. solfataricus editing cassettes. The sequencing chromatograms are aligned to the E. coli control cassette with its left homology arm (dark blue), the E. coli metK gene (green), R short fragment (light blue), Cm resistance cassette (grey), N20PAM (purple), and right homology arm (dark blue).

However, all sequenced colonies exhibited the control cassette with the E. coli metK gene, even those transformed with the M. jannaschii and S. solfataricus cassettes. To rule out any cross-contamination with the control cassette, the M. jannaschii and S. solfataricus cassettes were reassembled with fresh stocks and fresh water without handling the control cassette in parallel. Additionally, the freshly assembled editing cassettes were once again sequenced immediately before the transformation. Nevertheless, sequencing analyses after another recombination attempt revealed the same outcome as before: the E. coli metK gene followed by the homology regions, Cm resistance cassette, and N20PAM (Figure 23). As the correctness of the editing cassettes was confirmed and contamination with the control was ruled out, it seems likely that these archaeal genes are not able to support E. coli growth and cannot substitute for the endogenous metK. This is further supported by the lack of success of the first KO approach. Furthermore, bacteria and archaea belong to two entirely different domains of life and are therefore not closely related. It is perhaps not too surprising that an ortholog from another domain cannot replace such an essential gene. In fact, orthogonal aaRS/tRNA pairs for stop-codon suppression, which must not interact with any endogenous aaRSs, tRNAs, or amino acids, are commonly obtained from another domain of life308. The presence of the Cm resistance, however, is curious. The sequencing results look as if the cells retained their endogenous metK gene, while only the part of the editing cassette bearing the Cm resistance and the right homology regions was recombined into the chromosome. Such a recombination event may have been driven by the presence of Cm in the agar plates, but partial recombination of editing cassettes is highly unusual and a mechanism is not known. The recombination events leading to these results were not studied further.
Instead, it was decided to focus on a third approach.


3.1.5.3 3rd approach: Point mutation

Due to the complex nature of methionine’s involvement in translation initiation, elongation, and the synthesis of the major methyl donor SAM, it was decided to simplify the goal of this project and focus solely on the adaptation to the closer structural analog ethionine. The purification optimizations (Appendix 9.1, p. 156) of the M. jannaschii and S. solfataricus MATs for the in vitro assay to test for TfMet turnover were thus discontinued. In their study, Dippe and coworkers demonstrated that a single point mutation in the metK gene from Bacillus subtilis vastly improves its ethionine turnover to S-adenosyl ethionine471. With a sequence homology of 65%, the MAT enzymes from B. subtilis and E. coli are closely related, and the mutated residue (I302 in E. coli) is highly conserved throughout all tested MAT enzymes. Indeed, the crystal structure of the E. coli MAT suggests that replacing the isoleucine at position 302 with valine, which has a shorter side chain, likely affords more space to accommodate the ethyl group of ethionine (Figure 24). This mechanism has also been proposed by Dippe and coworkers for the success of their mutation in the B. subtilis enzyme471. Therefore, this point mutation was introduced into the metK gene of the strain ∆metEH::FRT to start a novel adaptation experiment.

Figure 24 I Crystal structure of the E. coli MAT with bound SAM (PDB: 1RG9). The isoleucine at position 302 is highlighted in red and bound SAM is shown as sticks colored by elements. The distance between the methyl group from SAM and the Ile side chain is shown in yellow (3.6 Å).

An editing cassette analogous to the control cassette described in the previous chapter was constructed, but bearing the desired I302V point mutation in the E. coli metK gene. The λ red system was induced and the editing cassette was transformed into ∆metEH::FRT. After incubation at 30°C overnight, single colonies were picked and the recombination of the editing cassette at the metK locus was verified via sequencing analysis. After induction of both the λ red system and Cas9, the cells were incubated at 30°C overnight in liquid culture and then spread on agar plates for the separation of single colonies. The removal of the resistance marker was verified via sequencing analysis (Figure 25 a). The pCAGO plasmid was cured via incubation at 42°C and removal of the plasmid was verified by streaking the cells on agar plates with and without ampicillin (Amp), the pCAGO selection marker. This step was repeated until no cell growth on Amp could be observed (Figure 25 b). The resulting strain was termed ∆metEH::FRT metK(I302V) (abbreviated as metK(I302V)) and bears the isoleucine to valine point mutation at position 302 of the E. coli metK gene.

[Figure 25 a: sequence alignment, positions 2,140-2,227, not reproduced in full. Ec metK I302 (MG1655 metK reference): ...TCC TAC GCA ATC GGC GTG GCT..., Frame 1: ...S Y A I G V A...; I302V recombinant and consensus: ...TCC TAC GCA GTC GGC GTG GCT..., Frame 1: ...S Y A V G V A... Figure 25 b: image of the agar plates not reproduced.]

Figure 25 I Establishing the strain ∆metEH::FRT metK(I302V). a) Verification of the isoleucine to valine point mutation at position 302 (ATC -> GTC) of the E. coli metK. b) LB-agar plate verifying the removal of the pCAGO plasmid. Left: ampicillin supplementation, right: without ampicillin.
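The effect of the ATC -> GTC exchange can be checked by translating the two reads. The snippet below is a small sketch written for illustration and is not part of the original analysis; the helper names are hypothetical, the two sequences are the 75-nt stretches shown in Figure 25 a, and the standard genetic code is assumed.

```python
# Sketch: translate the wild-type and I302V stretches from Figure 25 a
# and list the amino acid differences. Helper names are illustrative.

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; spaces are ignored."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def substitutions(ref: str, alt: str) -> list[tuple[int, str, str]]:
    """Return (0-based residue index in the fragment, ref residue, alt residue)."""
    pairs = zip(translate(ref), translate(alt))
    return [(i, r, a) for i, (r, a) in enumerate(pairs) if r != a]


WILD_TYPE = ("CGT TGT GAA ATT CAG GTT TCC TAC GCA ATC GGC GTG GCT "
             "GAA CCG ACC TCC ATC ATG GTA GAA ACT TTC GGT ACT")
I302V = ("CGT TGT GAA ATT CAG GTT TCC TAC GCA GTC GGC GTG GCT "
         "GAA CCG ACC TCC ATC ATG GTA GAA ACT TTC GGT ACT")

print(translate(WILD_TYPE))
print(translate(I302V))
print(substitutions(WILD_TYPE, I302V))
```

The index reported by `substitutions` refers to the position within the displayed fragment only; the numbering as residue 302 of the full-length protein is taken from the text.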

Next, the growth behavior of the new strain with the modified metK gene was compared to that of its ancestor ∆metEH::FRT, which was used in the first ALE experiment. Both strains were cultivated in NMM19 supplied with 15 µM Met and Eth concentrations ranging from 0-1000 µM, and OD600 measurements were taken over the course of 48 h (Figure 26). The optical densities increase with the Eth concentration; for ∆metEH::FRT there is a noticeable gap between cultures cultivated with up to 150 µM Eth and those cultivated with 500 µM Eth and 1000 µM Eth. In the case of metK(I302V), the optical densities do not vary as much; especially during exponential growth, the OD600 values are very similar regardless of the Eth concentration. While in the case of metK(I302V) as little as 15 µM Eth is enough for growth up to approximately OD600 = 1.1, the same amount of Eth only results in a maximal OD600 of around 0.9 for ∆metEH::FRT. At a concentration of 1 mM Eth, however, the ancestral strain reaches a higher optical density of approximately 1.7, while the new strain only reaches values of around 1.5. Thus, high ethionine concentrations seem to have less impact on the growth of the new strain than on its ancestor ∆metEH::FRT.

[Figure 26: plots not reproduced. Panels: a) ∆metEH, b) metK(I302V). Curves for 0, 15, 30, 50, 100, 150, 500, and 1000 µM Eth. Axes: OD(600) against time [h].]

Figure 26 I Comparison of optical densities of the new strain metK(I302V) and its ancestor ∆metEH::FRT in the presence of increasing Eth concentrations. a) OD600 values of ∆metEH::FRT cultivated in NMM19 supplied with 15 µM Met and Eth concentrations ranging from 0-1 mM measured over 48 h. b) OD600 values of metK(I302V) cultivated in NMM19 supplied with 15 µM Met and Eth concentrations ranging from 0-1 mM measured over 48 h.


Measuring optical densities at 600 nm is the most widely used and easiest way to monitor bacterial growth. However, these values make no statement about the viability of the measured cells, as dead cells are measured together with the live ones. Furthermore, changes in cell morphology may skew the correlation between OD600 and cell number, and starvation for SAM is known to result in longer cells, as DNA replication and cell division are regulated by methylation385. As Eth is not a very good substrate for bacterial MATs and Met is only supplied in limiting concentrations, the cells in this study may also be somewhat starved for SAM. Therefore, in addition to measuring OD600 values, the viability of the cells was assessed by counting colony forming units (CFU). As before, both strains were cultivated in NMM19 supplied with 15 µM Met as well as with 0 µM Eth, 15 µM Eth, 50 µM Eth, and 100 µM Eth. For the first 8-10 h, samples were taken every 2 h, with further samples taken after approximately 24 h, 30 h, and 48 h. OD600 values were measured and dilution series were spotted on agar plates with media compositions corresponding to those of the respective liquid culture. After incubation at 37°C, the CFU per mL of cell culture were counted and plotted with the corresponding OD600 values against the cultivation time (Figure 27).
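The conversion from colonies counted in a spot to CFU per mL of culture can be written down in a few lines. This is a generic sketch of the usual calculation and not the evaluation script used here; the function name is hypothetical, and the spot volume, dilution, and colony count in the example are assumed values, since they are not stated in this chapter.

```python
# Sketch: convert a colony count from a spotted dilution into CFU per mL.
# The example numbers (25 colonies, 10 µL spot, 1:100,000 dilution) are
# assumptions for illustration only.


def cfu_per_ml(colonies: int, dilution_fold: float, plated_volume_ml: float) -> float:
    """CFU/mL of the undiluted culture.

    colonies         -- colonies counted in the spot
    dilution_fold    -- total dilution of the plated sample (1e5 means 1:100,000)
    plated_volume_ml -- volume spotted onto the agar plate, in mL
    """
    if dilution_fold <= 0 or plated_volume_ml <= 0:
        raise ValueError("dilution_fold and plated_volume_ml must be positive")
    return colonies * dilution_fold / plated_volume_ml


example = cfu_per_ml(colonies=25, dilution_fold=1e5, plated_volume_ml=0.01)
print(f"{example:.2e} CFU/mL")
```

With these assumed inputs, the result lies within the range of the CFU/mL axis used in Figure 27, which is why they were chosen for the example.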

The number of CFU correlates with the corresponding OD600 values in cultures where no Eth is supplied. However, upon Eth supplementation, a clear discrepancy between OD600 values and CFU can be observed. The number of CFU starts to decrease after approximately 6 h, when the OD600 values are still increasing, indicating a diminished number of cells capable of reproduction while the cell mass is growing. This observation would be consistent with functional protein biosynthesis, accounting for the increase in cell mass, combined with defective methylation, hampering DNA replication487,488, gene regulation489, and cell division385,490. Between around 8 h and 22 h of cultivation, the decrease in CFU is especially noticeable, and this effect is even more pronounced for Eth concentrations of 50 µM and 100 µM. Therefore, for the second ALE experiment, a distinctly lower Eth concentration of only 15 µM was chosen, in contrast to 100 µM Eth during the first experiment. Furthermore, instead of the MOPS-buffered NMM used in the first ALE, it was decided to use regular, phosphate-buffered NMM, as it was successfully used in five different adaptation experiments in the Budisa lab (adaptation to [3,2]Tp231, 4-F-Trp, 5-F-Trp491, 6-F-Trp, 7-F-Trp [data not yet published]).


[Figure 27: plots not reproduced. Panels: a) ∆metEH with 0, 15, 50, and 100 µM Eth; b) metK(I302V) with 0, 15, 50, and 100 µM Eth. Axes: OD(600) (left) and CFU/mL (right) against time [h].]

Figure 27 I Comparison of OD600 values and CFU between ∆metEH::FRT and metK(I302V) for increasing Eth concentrations. a) Values for the strain ∆metEH::FRT with Eth concentrations increasing from 0-100 µM from top to bottom. b) Values for the strain metK(I302V) with Eth concentrations increasing from 0-100 µM from top to bottom. Values represent the mean of two (50 µM Eth, 100 µM Eth) and three (0 µM Eth, 15 µM Eth) experiments with the standard deviation as error bars.


3.1.6 ALE 2.0

A second attempt at adapting E. coli to growth on the Met analog ethionine was started with the newly engineered strain metK(I302V). As before, a control experiment with Met and in the absence of any analog was conducted (Figure 28 a) to help distinguish between effects associated with the presence of Eth and those associated with long-term cultivation in minimal media. Six colonies of metK(I302V), numbered 1, 2, 6, 7, 10, and 15, were used to start ALE experiments in the absence and presence of 15 µM Eth, resulting in a total of 12 different populations. The colonies were used to inoculate six precultures in LB, and after cultivation at 37°C overnight the cells were washed three times with NMM lacking Met. In order to handle a larger number of cultures more easily, this ALE experiment was conducted in uncoated 24-well plates sealed with a sterile, gas-permeable membrane, rather than in flasks covered with aluminum foil. Initially, 500 µL of NMM19 supplemented with 10 µM Met and with or without 15 µM Eth was inoculated with 0.02 OD600 of the washed preculture. As cultivation in this format led to the evaporation of roughly 50 % of the culture volume, the volume was increased to 1 mL starting with the fourth passage. The high evaporation ratio and the concomitant drastic decrease in culture volume at the time of the OD600 measurements account for the higher values of the first three passages that can be seen in Figure 28.
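Two simple relations help to read the passage data: the number of doublings per passage and the apparent OD600 increase caused by evaporation. The sketch below is an illustration under stated assumptions, not an analysis performed in this work: the inoculation is interpreted as a starting OD600 of 0.02, OD600 is assumed to scale linearly with cell concentration, only water is assumed to evaporate, and the end-point OD600 values are hypothetical.

```python
import math

# Sketch: doublings per passage and evaporation-corrected OD600.
# All end-point OD600 values below are hypothetical examples.


def generations(od_start: float, od_end: float) -> float:
    """Number of doublings needed to grow from od_start to od_end."""
    return math.log2(od_end / od_start)


def evaporation_corrected_od(od_measured: float,
                             volume_start_ul: float,
                             volume_end_ul: float) -> float:
    """OD600 the culture would show at its original volume.

    Assumes the cells are concentrated by the factor
    volume_start / volume_end when liquid evaporates.
    """
    return od_measured * volume_end_ul / volume_start_ul


# A culture that lost about 50 % of its 500 µL and reads OD600 = 2.0
# corresponds to OD600 = 1.0 at the original volume.
corrected = evaporation_corrected_od(2.0, volume_start_ul=500, volume_end_ul=250)

# Growth from OD600 = 0.02 to the corrected value is log2(50) doublings.
doublings = generations(0.02, corrected)
print(f"corrected OD600: {corrected:.2f}, doublings per passage: {doublings:.2f}")
```

Under these assumptions, a twofold loss of volume alone is enough to double the measured OD600, which is in line with the higher values reported for the first three passages.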

[Figure 28: plots not reproduced. Panels: a) 10 µM Met ctrl.; b) 15 µM ethionine, with 10 µM Met followed by 7 µM Met. Curves for the populations metK(I302V) 1, 2, 6, 7, 10, and 15. Axes: OD(600) against passage number.]

Figure 28 I Second adaptive laboratory evolution towards Met analog utilization. Optical densities (OD600) are plotted against the number of passages. a) Control experiment with 10 µM Met and no analog addition. b) Adaptation towards Eth usage.


Instead of passaging the cultures only every 48-72 h, this time they were passaged every 24 h, as the number of CFU noticeably decreased after 8-10 h of cultivation, indicating a decrease in viability. Apart from the evaporation-associated decrease in optical densities after the first three passages, a slight decrease in optical densities can be observed for the experiment with Eth within the first 6 or 7 passages without there being a change in the cultivation conditions (Figure 28 b). Similar observations were made with the previous TfMet (Figure 19 c) and Eth (Figure 19 b) ALE experiments. In all cases, the precultures were cultivated in Met-rich media and then washed in minimal media prior to inoculation of the ALE experiments. While washing the cells removes extracellular Met, it does not immediately affect the intracellular Met pool. In addition to the free intracellular Met pool, the amino acid may also be salvaged from protein catabolism and cleavage of the N-terminal Met. This may be responsible for the higher optical densities observed for the first few passages.

Curiously, starting from passage 7, the OD600 values of the population metK(I302V) 15 decrease distinctly compared to the other populations. A similar effect can be seen for metK(I302V) 7 starting from passage 9 or 10. After 15 passages, the Met concentration was reduced to 7 µM in an attempt to increase the selection pressure for Eth utilization. In passage 21 (after 3 weeks of cultivation), the optical density of population metK(I302V) 10 exhibits an exceptionally high value, indicating a possible adaptation event. Unfortunately, the effect has already disappeared by the next passage. The control cultures grow quite stably from the fourth passage on. Due to time constraints, this second ALE experiment was stopped after 31 passages, and the resulting populations were denoted as 31Eth 1, 2, 6, 7, 10, 15 for those cultivated in Eth-supplemented media and 31Met 1, 2, 6, 7, 10, 15 for those cultivated in NMM19 lacking the ncAA. To better assess possible fitness increases after these 31 passages, CFU were once again counted and compared to the data collected before the adaptation experiment.

[Figure 29: plots not reproduced. Medium: NMM19 + 7 µM Met + 15 µM Eth. Panels: a) 31Eth 2 (top) and 31Eth 7 (bottom); b) 31Met 2 (top) and 31Met 7 (bottom). Axes: OD(600) (left) and CFU/mL (right) against time [h].]

Figure 29 I Comparison of OD600 values and CFU between the adapted populations 2 (top) and 7 (bottom) and the corresponding control populations cultivated in media supplemented with Eth. a) Populations cultivated in Eth-supplemented media for 31 passages. b) Control populations cultivated in NMM19 lacking Eth for 31 passages and then cultivated in Eth-supplemented media for this experiment.


The CFU of the populations grown in the presence of Eth for 31 passages (31Eth) were evaluated in their native environment of NMM19 supplemented with 7 µM Met and 15 µM Eth, as well as in NMM19 solely supplied with 10 µM Met. Likewise, the CFU of the control populations (31Met) were evaluated in their native media lacking the analog and also in media supplemented with 15 µM Eth. Figure 29 shows representative plots of the populations 31Eth 2/31Met 2 and 31Eth 7/31Met 7 evaluated in Eth-supplemented media. Depicted in blue in Figure 29 a are the populations that were cultivated in the adaptive media, while Figure 29 b depicts their control populations in grey. After long-term cultivation in Eth-supplemented media, the OD600 and CFU curves resemble those of the start strain metK(I302V) cultivated in the absence of Eth. While prior to adaptation the number of CFU decreased noticeably after approximately 8 h (Figure 27), this is no longer the case after 31 passages of adaptation (Figure 29 a), suggesting an increase in general fitness. To eliminate the possibility that this fitness increase derives solely from an adaptation to long-term cultivation in NMM19, the CFU of the 31Met control populations were evaluated in Eth-supplemented media. Indeed, these curves (Figure 29 b) resemble those of the unadapted strain in the presence of ethionine (Figure 27) with a clear reduction of CFU after about 8 h, indicating that the fitness increase observed for the 31Eth populations is likely related to adaptation towards analog usage. Interestingly, despite growing to distinctly lower optical densities than 31Eth 2, the population 31Eth 7 produces a similar number of CFU, once again demonstrating that cell viability cannot necessarily be deduced from OD600 values. Similar observations were also made with the other populations and are summarized in Figure 30. Individual plots can be looked up in the appendix (chapter 9.2, p. 159).

[Figure 30: OD(600) (top) and CFU/mL (bottom) plotted against time [h] for populations cultivated in NMM19 + 7 µM Met + 15 µM Eth. Panel a): 31Eth 1, 2, 6, 7, 10, 15 ("31Eth in Eth"). Panel b): 31M 1, 2, 6, 7, 10, 15 ("31M in Eth").]

Figure 30 I Overview of OD600 values (top) and CFU (bottom) of all the adapted populations and the control populations cultivated in media supplemented with Eth. a) Populations cultivated in Eth-supplemented media for 31 passages. b) Control populations cultivated in NMM19 lacking Eth for 31 passages and then cultivated in Eth-supplemented media for this experiment. For better visibility, CFU/mL values are plotted as lines in these summary graphs.


While long-term cultivation in Eth-supplemented media resulted in populations with divergent maximal optical densities (Figure 30 a, top), they all produce a similar number of CFU (Figure 30 a, bottom). The control populations cultivated only in the presence of Met, on the other hand, all produce similar OD600 values, while the number of their corresponding CFU differs noticeably from population to population (Figure 30 b). For most of these populations, however, a general downward trend of the curve can be observed starting after approximately 10 h of cultivation.

The OD600 and CFU curves of the 31Met control populations assessed in NMM19 lacking Eth are similar to those taken under the same conditions prior to adaptation and can also be found in the appendix (p. 161). Taken together, a strain auxotrophic for Met under all conditions was established and further improved for ethionine-to-SAE turnover, and it shows improved fitness after 31 passages of cultivation in the presence of the synthetic amino acid ethionine. These results indicate that adaptation to methionine analog usage might be feasible and thus represent another step toward the alienation of life as we know it.


3.2 Biocontainment of TUB170

The unambiguous, proteome-wide replacement of Trp with several different analogs has already been shown231,491, demonstrating the flexibility of the genetic code. In the first part of this thesis, this flexibility is further tested by examining the replacement of another amino acid, methionine, with synthetic structural analogs. Here, another approach towards further alienation of life is discussed: the attempt to establish biocontainment in one of the already adapted, Trp-substituted strains. These efforts focus primarily on the fittest of the adapted strains, TUB170, and its substrate [3,2]Tp. To achieve [3,2]Tp dependency, so that the adapted strain TUB170 is not viable outside of a controlled laboratory environment, an aaRS capable of discriminating between Trp and [3,2]Tpa needs to be constructed. The so-called double-sieve selection (Figure 31, step 1) has proven successful for the selection of many orthogonal translation systems (OTS) for the incorporation of non-canonical amino acids over the years205,228,492. It consists of iterative rounds of positive and negative selections: positive selections select for viable library members in the presence of the target ncAA, and negative selections sort out variants that incorporate canonical amino acids in the absence of the ncAA. Both methods select for the survival of the host cell, and usually two or three rounds of consecutive positive and negative selection are performed (see also chapter 1.2.1.6, p. 18). After the double-sieve selection, selected variants are screened for promising candidates via two different assays (Figure 31, step 2): one relies on cell survival analogous to positive selections, and the other monitors the fluorescence of a reporter protein. In both assays, survival/fluorescence in the presence of the target ncAA is compared to survival/fluorescence in its absence as a negative control.
Promising candidates that exhibit enhanced growth and fluorescence in the presence of the ncAA are used for the expression of a reporter protein via stop-codon suppression, and the incorporated amino acid is identified via mass spectrometry (MS) analysis. All the above-mentioned steps rely on the amber codon as a “free” codon, unassigned to any of the cAAs and therefore available for the targeted incorporation of ncAAs. Finally, the endogenous TrpRS needs to be replaced with the discriminating [3,2]Tpa-specific aaRS (TpaRS) in the adapted strain (Figure 31, step 3). However, while the first two steps rely on the amber stop codon, in the adapted strain TUB170, [3,2]Tpa is incorporated in response to the Trp codon, as Trp is completely eradicated from the organism. Thus, the anticodon of the suppressor tRNA needs to be mutated from amber (CUA) to tryptophan (CCA) after selection and screening and prior to insertion into the adapted strain. For this purpose, the pyrrolysyl OTS was chosen: experimental data in which the amber suppressor tRNA anticodon was mutated without impairing PylRS function, as well as the crystal structure of Desulfitobacterium hafniense PylRS, indicate that this enzyme exhibits low selectivity toward the tRNA anticodon and is thus permissive towards anticodon mutations255,257,279.
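The logic of the double-sieve selection can be illustrated with a deliberately simplified sketch. The `Variant` class, its two boolean flags, and the four example library members are hypothetical and introduced only for illustration; real selections act on survival probabilities rather than on clean yes/no properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Toy aaRS library member (hypothetical model, not experimental data)."""
    name: str
    charges_ncaa: bool  # accepts the target ncAA
    charges_caa: bool   # accepts at least one canonical amino acid


def positive_selection(pool, ncaa_present=True):
    # Survivors suppress the amber codon in the resistance gene,
    # either with the supplied ncAA or with a canonical amino acid.
    return [v for v in pool
            if v.charges_caa or (ncaa_present and v.charges_ncaa)]


def negative_selection(pool):
    # The ncAA is absent: variants that charge a canonical amino acid
    # suppress the amber codon(s) in a toxic gene and are removed.
    return [v for v in pool if not v.charges_caa]


def double_sieve(pool, rounds=3):
    # Alternate positive and negative selection for a number of rounds.
    for _ in range(rounds):
        pool = negative_selection(positive_selection(pool))
    return pool


library = [
    Variant("inactive", charges_ncaa=False, charges_caa=False),
    Variant("cAA-only", charges_ncaa=False, charges_caa=True),
    Variant("promiscuous", charges_ncaa=True, charges_caa=True),
    Variant("ncAA-specific", charges_ncaa=True, charges_caa=False),
]

survivors = double_sieve(library)
print([v.name for v in survivors])
```

In this idealized model only the ncAA-specific member passes both sieves, which is the outcome the selection scheme is designed to enrich.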


[Figure 31: workflow schematic. 1. Double-sieve selection: an aaRS library is transformed and subjected to positive selection (amber codon(s), cAAs and ncAA present) and negative selection, with plasmid isolation and transformation between the steps. 2. Screening of selected variants (tRNA with amber anticodon, fluorescence). 3. Replacement of endogenous aaRS/tRNA genes (trpS on the chromosome of the adapted strain) with the selected variant (orthogonal tRNA with Trp anticodon).]

Figure 31 I Overview of the biocontainment approach. Step 1: Double-sieve selection of an aaRS library, consisting of consecutive rounds of positive and negative selections. During positive selections, the target ncAA is supplied and functional library members are selected. Negative selections take place in the absence of the ncAA to sort out library members incorporating cAAs. Step 2: Screening of the selected variants for promising candidates by comparing cell growth and fluorescence in the absence and presence of the ncAA. These steps require an amber anticodon. Step 3: Replacement of the endogenous trpS gene in the adapted strain with a selected aaRS/tRNA pair for suppression of Trp codons. The tRNA anticodon needs to be mutated from amber to Trp.
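The anticodon change named in step 3 follows directly from antiparallel Watson-Crick pairing between anticodon and codon. The short sketch below makes this explicit; the function name is illustrative, and wobble pairing is deliberately ignored.

```python
# RNA Watson-Crick complements (wobble pairing is not considered here)
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_for_anticodon(anticodon: str) -> str:
    """Return the mRNA codon (5'->3') read by a tRNA anticodon (5'->3')."""
    # Pairing is antiparallel, so the anticodon is reversed before complementing.
    return "".join(COMPLEMENT[base] for base in reversed(anticodon))


print(codon_for_anticodon("CUA"))  # UAG, the amber stop codon
print(codon_for_anticodon("CCA"))  # UGG, the Trp codon
```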


Efforts to select a TpaRS had already begun within the framework of the master's thesis “Directed Evolution of Orthogonal Pyrrolysyl-tRNA Synthetases (TpaRS)”. First selection experiments were conducted with an M. mazei PylRS library constructed by a colleague (Matthias Exner) for the incorporation of S-allyl-cysteine. However, even after several selection attempts and consecutive changes to experimental parameters such as selection plasmids, growth media, chloramphenicol (Cm) concentrations, incubation times, and digestion strategies, no promising candidates could be isolated. After selections with an M. barkeri PylRS library constructed by another colleague (Matthias Hauf) for the incorporation of o-nitrobenzyl-3,4-dihydroxyphenylalanine (ONB-Dopa) did not yield any promising mutants either, a new M. barkeri PylRS library was constructed. Nevertheless, even after plasmid backbone optimizations, the M. barkeri-based libraries yielded only very few colonies, and selection attempts were unsuccessful.

3.2.1 M. mazei PylRS Library

As the libraries constructed from the M. barkeri PylRS did not seem very viable, a new library based on the PylRS from M. mazei was created within the framework of this thesis. To decide which residues should be mutated, the Trp analog [3,2]Tpa was modeled into the M. mazei PylRS active site (Figure 32 a). Based on their proximity to [3,2]Tpa, the active site residues N346, C348, V401, W417, and G419 were chosen for randomization to NNS via site-saturation mutagenesis. The residues N346 and C348 are common targets for library design and are frequently mutated in PylRS mutants selected for the incorporation of ncAAs205. The library was assembled via golden gate cloning493,494 on a pJZ plasmid with ampicillin resistance (backbone constructed during the master's thesis). This cloning technique utilizes type IIS restriction endonucleases, which cut outside of their recognition sequence and thereby enable “scarless” cloning. As the recognition sequence is cut off during digestion, cut sites can be placed anywhere, e.g. in the middle of genes, and are commonly introduced in the overhang of the PCR primers. The quality of the library was surveyed via sequencing of the library mixture (Figure 32 b) as well as of randomly picked single colonies (Figure 32 c). Some bias towards cytosine can be seen, especially at position G419 with the codon GGG. This is not unexpected, as GC base pairs form three hydrogen bonds and are therefore often favored in the annealing step of PCR reactions. At the beginning of the PCR, only the unmutated wild-type sequence is available as a template; due to stronger hydrogen bonding, those primers that contain many cytosines will therefore preferentially anneal to the GGG codon of G419, producing C-rich templates for the following PCR cycles. Nevertheless, the sequencing results of the randomly picked colonies show good variety at the mutated positions.
It should be pointed out that, with library sizes commonly in the 10^6-10^8 range, sequencing of a few variants offers only a small snapshot and is not necessarily representative of the library. However, this small snapshot might at least reveal any major systematic errors that occurred during design and/or assembly. As the results presented below do not reveal any obvious problems, the library was deemed satisfactory.


[Figure 32: a) view of the M. mazei PylRS active site with the residues N346, C348, V401, W417, and G419 labeled; b) sequencing chromatogram of the pooled library aligned to the Mm PylRS reference at Asn346, Cys348, Val401, Trp417, and Gly419; c) sequence alignments of the library members Mm_lib_001 to Mm_lib_007 against Mm PylRS.]

Figure 32 I M. mazei PylRS library. a) Crystal structure of the M. mazei PylRS active site with bound ATP and [3,2]Tpa modeled in. ATP (pink) and [3,2]Tpa (blue) are shown as sticks. Residues chosen for randomization are colored in yellow and also shown as sticks. PDB: 3VQV. b) Chromatogram from sequencing of the whole library. c) Sequencing results of seven randomly picked library members.

The completed library was then used in double-sieve selections to identify members capable of distinguishing between Trp and [3,2]Tpa. With NNS randomizations at 5 positions, the library has a theoretical size of 3 × 10^7 members. To ensure coverage of the entire library, freshly prepared electrocompetent cells were used for the first round of double-sieve selections. Transformation efficiencies were assessed by plating a dilution series of the recovered transformation on LB agar plates containing only the library plasmid’s selection marker (no double-sieve selection conditions). Counting the colonies of the dilution that produced distinguishable single colonies revealed that, with 5 × 10^7 transformants, library coverage was achieved. To start selection with a low stringency, a positive selection plasmid harboring only one stop codon in the chloramphenicol acetyltransferase (CAT) gene was chosen. During this step, functional library members are selected, as only colonies capable of suppressing the amber codon and expressing the resistance gene can survive in the presence of the selection marker chloramphenicol. The selection plasmids originally harbored the same (low copy) origin of replication as the library plasmid (p15A). Plasmids with the same ori are thought to compete for replication factors, leading to one plasmid outgrowing its competitor495–497. Thus, the p15A ori on the selection plasmids was substituted with ColE1, which has a similar copy number, by golden gate cloning, and the success of the substitution was verified via sequencing analysis. The selection experiments were conducted on LB media, and the stringency was increased with every positive selection by raising the chloramphenicol (Cm) concentration from 37 µg/mL to 50 µg/mL, 70 µg/mL, and finally 90 µg/mL. To follow the progress of the selection, the cells were streaked on selection plates in the presence of 0.1 mM [3,2]Tp, but also on control plates, where no [3,2]Tp was added. Comparison of colony numbers from the selection plates with those from the control plates gives a first indication of whether the mutants that are enriched with each round of selection charge the desired ncAA or one of the canonical amino acids. If mutants charging the desired ncAA are enriched, colonies on the selection plates should outnumber those on the control plates. For the first positive selection, the plates were incubated at 30 °C for 72 h to allow even slowly growing cells harboring less efficient PylRS variants to form colonies. Subsequent positive selections were incubated at 30 °C for 48 h. To ensure the screening of all library members, three transformations were conducted for the first positive selection. The transformations were pooled and spread on five large LB agar plates, whereby one plate served as a control plate without [3,2]Tp supplementation.
As functional PylRS variants should be selected with each round of positive selection and thus the number of library members should decrease with each round, the number of transformations was likewise decreased with each round of positive selection. An overview of the experimental parameters is given in Table 1.
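The theoretical library size quoted above can be reproduced with a short calculation: an NNS codon allows 4 × 4 × 2 = 32 nucleotide combinations, and five randomized positions give 32^5 variants. The coverage estimate in the sketch below is an added illustration, not part of the original analysis; it assumes that every variant is equally abundant and that transformants sample the library independently (a Poisson approximation). The function names are hypothetical.

```python
import math

N_BASES = 4  # N = A, C, G or T
S_BASES = 2  # S = C or G


def nns_library_size(positions: int) -> int:
    """Number of distinct DNA sequences for NNS codons at `positions` sites."""
    codons_per_position = N_BASES * N_BASES * S_BASES  # 32
    return codons_per_position ** positions


def expected_coverage(transformants: float, library_size: int) -> float:
    """Expected fraction of variants sampled at least once.

    Assumption: uniform variant abundance and independent sampling.
    """
    return 1.0 - math.exp(-transformants / library_size)


size = nns_library_size(5)
print(size)  # 33554432, i.e. about 3 x 10^7
print(f"{expected_coverage(5e7, size):.2f}")
```

Because real libraries are rarely uniform (see the cytosine bias discussed for Figure 32), such an estimate is only a rough guide and does not replace the experimental assessment of transformation efficiency.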

Table 1 I Overview of experimental parameters of the positive selections.

Round   Media   Cm [µg/mL]   [3,2]Tp [mM]   No. of transform.   No. of sel. plates   No. of ctrl. plates
1       LB      37           0.1            3                   4                    1
2       LB      50           0.1            2                   3                    1
3       LB      70           0.1            1                   1                    1
4       LB      90           0.1            1                   1                    1

After each positive selection, a negative selection was performed. Here, [3,2]Tp was not supplied, so cells harboring PylRS mutants capable of charging any of the cAAs to their cognate tRNA read through both amber codons, express the toxic barnase gene, and die. Negative selections were incubated at 30 °C for only 14-15 h to avoid mutations of the barnase gene or inactivation of the selection plasmid. As barnase expression is highly toxic, the selection pressure to escape its expression is very high, and the exposure was therefore minimized. Expression of barnase was induced with 0.02 % arabinose and, like the positive selections, the negative selections were conducted on LB media. After each round of selection, the selection plasmids were digested to avoid carryover into the next round.


[Figure 33: bar chart of the number of colonies (y-axis, 0 to 4000) for the positive selections P1, P2, P3, and P4, each shown next to its control (ctrl).]

Figure 33 I Number of colonies on positive selection plates supplemented with 0.1 mM [3,2]Tp compared to control plates lacking [3,2]Tp. The colonies were counted using ImageJ, where every colony larger than 5 pixels was counted. The colony numbers per plate were normalized to the culture volume for better comparison. P1-4: positive selection 1-4, ctrl: control plates. P1 and P2 represent the mean of 4 and 3 plates, respectively, with the SD represented as an error bar.

Figure 33 shows the number of colonies on the selection plates compared to the number of colonies on the control plates after each round of positive selection (P1-4). The agar plates were scanned after incubation and colonies were counted using ImageJ, whereby every colony larger than 5 pixels was counted. It is of note that the first positive selection resulted in very diverse colony sizes with a few larger colonies (sel. plate: 65 ± 12, ctrl.: 44) and numerous very small colonies (sel. plate: 1074 ± 337, ctrl.: 333 total), while the following positive selections yielded colonies of similar size. This phenomenon can likely be attributed to the longer incubation period of the first positive selection, which allowed even very slow-growing colonies to form. Representative agar plates of the positive selections are shown in the appendix (chapter 9.3, p. 163). Starting from the second round of positive selection, the number of colonies on the selection plates remains similar to the number of colonies on the corresponding control plates, while the colony number increases with each round of selection, even though the stringency is increased. The overall increase of colony numbers after four rounds of positive selections and three rounds of negative selections indicates that certain functional PylRS variants were enriched. However, while after the first round of positive selection the control plate exhibited only about 31 % of the colony number of the selection plates, this proportion increases to 95 % after the fourth positive selection, indicating that the selected variants are capable of charging one (or more) of the canonical amino acids. In order to examine whether [3,2]Tpa-charging mutants were enriched during selection, 30 colonies were picked and spotted on NMM19 –Trp plates with and without the non-canonical amino acid precursor [3,2]Tp and with increasing Cm concentrations (Figure 34).
Minimal media lacking Trp were chosen, as the selection of a PylRS mutant capable of incorporating both Trp and [3,2]Tpa was deemed likely due to their great structural similarity. In this case, the presence of Trp could complicate the read-out of this assay.
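The control-to-selection comparison used above reduces to a simple ratio. The sketch below is illustrative only: the helper name is hypothetical, and using the small-colony counts of the first positive selection as inputs is an assumption about which numbers enter the comparison.

```python
def control_fraction(ctrl_colonies: float, sel_colonies: float) -> float:
    """Colonies on the control plate as a fraction of those on the selection plate."""
    if sel_colonies <= 0:
        raise ValueError("selection plate count must be positive")
    return ctrl_colonies / sel_colonies


# First positive selection, small colonies (values quoted in the text;
# choosing these as inputs is an assumption made for illustration).
round1 = control_fraction(ctrl_colonies=333, sel_colonies=1074)
print(f"{round1:.0%}")
```

A fraction well below 1 is what enrichment of ncAA-dependent variants would look like, whereas a value approaching 1, as reported for the fourth round, points to ncAA-independent amber suppression.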


[Figure 34: spotting plates without ncAA (- ncAA) and with 0.1 mM [3,2]Tp at Cm concentrations of 37, 70, and 120 µg/mL.]

Figure 34 I Screening for promising M. mazei PylRS library members. After the fourth positive selection, 30 colonies were resuspended in 50 µL sterile ddH2O and 2 µL of the suspension were spotted on NMM19 –Trp plates with and without 0.1 mM [3,2]Tp and with increasing Cm concentrations. The cells were incubated overnight at 37 °C.

Although all colonies tolerate even high Cm concentrations, there is no difference in growth between plates supplied with [3,2]Tp and plates without [3,2]Tp, suggesting that the selected PylRS mutants charge one of the canonical amino acids rather than [3,2]Tpa. Nevertheless, a few colonies from the fourth positive selection were sequenced in order to get an idea of what was enriched during selection. The sequencing results are summarized in Table 2. As one sequence was observed six times, a fluorescence assay was conducted to establish whether this mutant is able to charge [3,2]Tpa (or Trp) to its cognate tRNA. BL21(DE3) cells were co-transformed with this PylRS HLLNQ mutant and a plasmid harboring the gene for sfGFP with an amber stop codon at position 2. In this experiment, sfGFP can only be expressed and fluoresce when the amber stop codon is suppressed by the PylRS mutant. The experiment was conducted in a 96-well plate and fluorescence was detected by a plate reader. The cells were inoculated 1:100 and immediately induced with 1 mM IPTG.


Table 2 I Summary of the sequencing results of PylRS mutants isolated after the fourth positive selection. Six of eleven sequenced mutants have the same sequence (marked with *).

Clone #   Asn346   Cys348   Val401   Trp417   Gly419
1 *       His      Leu      Leu      Asn      Gln
2         Gly      Glu      Ser      His      Asp
3 *       His      Leu      Leu      Asn      Gln
4 *       His      Leu      Leu      Asn      Gln
5         Gly      Glu      Ser      His      Asp
6         Asn      Cys      Val      Trp      Gly
7         Arg      Gly      Leu      Pro      Ala
8         Leu      Gly      Asp      Val      Leu
9 *       His      Leu      Leu      Asn      Gln
10 *      His      Leu      Leu      Asn      Gln
11 *      His      Leu      Leu      Asn      Gln
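The enrichment visible in Table 2 can be tallied programmatically. The sketch below transcribes the table into one-letter codes (positions 346, 348, 401, 417, 419, in that order); the variable names are illustrative.

```python
from collections import Counter

# Residues at positions 346/348/401/417/419 for clones 1-11,
# transcribed from Table 2 into one-letter amino acid codes.
clones = [
    "HLLNQ", "GESHD", "HLLNQ", "HLLNQ", "GESHD", "NCVWG",
    "RGLPA", "LGDVL", "HLLNQ", "HLLNQ", "HLLNQ",
]
WILD_TYPE = "NCVWG"  # Asn346, Cys348, Val401, Trp417, Gly419

counts = Counter(clones)
top_sequence, top_count = counts.most_common(1)[0]

print(top_sequence, top_count, len(clones))  # HLLNQ 6 11
print(counts[WILD_TYPE])                     # 1 (clone 6 carries the wild-type residues)
```

The tally also shows that clone 6 carries the wild-type residues at all five randomized positions.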

The tested media compositions are depicted in Table 3.

Table 3 I Media compositions tested in the fluorescence assay.

Media          Amino acid/precursor    Conc. [mM]
LB             -                       -
LB             [3,2]Tp (precursor)     0.1
NMM19 –Trp     [3,2]Tp (precursor)     1
NMM19 –Trp     [3,2]Tp (precursor)     0.1
NMM19 –Trp     [3,2]Tp (precursor)     0.03
NMM19 –Trp     -                       -
NMM19 –Trp     Trp (aa)                0.03
NMM19 –Trp     Trp (aa)                0.1
NMM19 –Trp     Trp (aa)                1

The absorbance at 600 nm as well as the fluorescence at 511 nm was recorded overnight (Figure 35). The absorbance curves (Figure 35 a) reveal that [3,2]Tp concentrations of 0.1 mM and 1 mM are likely toxic in minimal media, as no cell growth can be observed for these cultures. Interestingly, cells cultured in minimal media supplied with 0.03 mM [3,2]Tp seem to have a very long lag phase and only start growing after some 15 h of incubation at 37 °C. The [3,2]Tp toxicity is less pronounced in rich media, and cells supplemented with 0.1 mM [3,2]Tp in LB grow normally. Figure 35 b shows the fluorescence after 15 h of incubation, when most cultures had reached the stationary phase. However, as one culture had not yet reached the stationary phase and two cultures did not grow at all, fluorescence was not normalized to the absorbance. No increase in fluorescence could be observed after 15 h of incubation, not even for NMM19 –Trp + 0.03 mM [3,2]Tp, where cell growth only started after 15 h. There is no difference in fluorescence between cells cultivated in the absence of [3,2]Tp or Trp and those supplemented with [3,2]Tp or Trp, indicating that the enriched PylRS mutant does not incorporate either of these two amino acids. As suggested by phylogenetic analyses, PylRS likely evolved from bacterial PheRS498, which has a structurally similar catalytic site. As a consequence, PylRS mutants that incorporate Phe are fairly easily selected499, which might have happened in this case as well. However, sequence comparison of the mutant enriched in this study with selected FRS mutants revealed only the C348L mutation as a match. In particular, the N346H mutation observed here differs greatly from the C348A or C348S mutations observed in the selected FRS mutants and cannot be found in any other published PylRS mutants205,278. Thus, the identity of the amino acid incorporated by the HLLNQ mutant enriched here remains a mystery and was not investigated further.

[Figure 35: panel a) absorbance at 600 nm plotted against time [h]; panel b) fluorescence. Conditions: LB; LB + 0.1 mM [3,2]Tp; NMM19 – Trp; NMM + 0.03 mM, 0.1 mM, or 1 mM [3,2]Tp; NMM + 0.03 mM, 0.1 mM, or 1 mM Trp.]

Figure 35 I Fluorescence assay of sfGFP(R2TAG) with the PylRS HLLNQ mutant in media supplemented with and without Trp or [3,2]Tp. a) Absorbance at 600 nm. b) Fluorescence after 15 h of incubation.

3.2.2 E. coli TrpRS Library

All the results above indicate that the double-sieve selection experiments conducted with the PylRS system were not successful and that no TpaRS could be found. Therefore, the choice of aaRS for library design was reconsidered. The PylRS system was chosen for this study because of its substrate promiscuity and especially its low selectivity toward the tRNAPyl anticodon. However, the substrate binding site of the enzyme is a deep hydrophobic pocket with a bulky cavity that accommodates the pyrroline functionality of the pyrrolysine side chain, and substrate recognition occurs mostly via rather non-specific hydrophobic interactions. The purpose of this study is to selectively incorporate the much smaller [3,2]Tpa and, beyond that, to find an aaRS that can distinguish between the very similar amino acids Trp and [3,2]Tpa (Figure 36). The PylRS system with its large binding pocket might therefore not be the best choice, and other enzymes might be better suited for this task.


Figure 36 I Comparison of the structures of pyrrolysine, tryptophan, and its analog [3,2]Tpa.

Here, the aim is to replace the endogenous TrpRS with an engineered one (TpaRS) rather than to introduce an additional aaRS. It is therefore not necessary to restrict selection experiments to aaRS/tRNA systems that are orthogonal in E. coli. As the E. coli TrpRS is already capable of charging [3,2]Tpa to its cognate tRNA, this enzyme might be a good starting point for the selection of an aaRS capable of distinguishing between Trp and [3,2]Tpa. The tryptophanyl-tRNA synthetase is a class I aaRS with a Rossmann dinucleotide-binding fold. It is an α2-dimer (Figure 37 a) in which each monomer has a mass of 37 kDa [94], making it the smallest aaRS in E. coli. Unfortunately, there is no crystal structure of the E. coli enzyme. Therefore, the crystal structure of the closely related (57% identical residues) Bacillus stearothermophilus TrpRS was used for library design [94,500]. The residues shown in red in Figure 37 b were chosen for randomization to NNS, as they interact with the indole side chain. The sulfur of Met129 points into the center of the five-membered ring and makes van der Waals contacts with each atom. The indole nitrogen atom forms a hydrogen bond with the carboxylate of Asp132, and the six-membered ring stacks against the edge of Phe5 (Figure 37 c) [94]. The sequence alignment of the E. coli gene and the B. stearothermophilus gene constructed by Hall and co-workers [501] was used to determine the corresponding residues in the E. coli gene.


[Figure 37: structure images in panels a), b), and c). Residues labeled in panel b): F5, G7, M129, D132, V141, V143.]

Figure 37 I The tryptophanyl-tRNA synthetase. a) Structure of the B. stearothermophilus TrpRS dimer with bound tryptophan (shown in pink). PDB: 1MB2. b) Active site of the B. stearothermophilus TrpRS with bound tryptophan. The residues chosen for randomization in the TrpRS library are highlighted in red and shown as sticks; bound Trp is shown in green. c) Residues that interact with bound Trp-AMP in the B. stearothermophilus crystal structure (reproduced from Doublié et al. [94]).

In accordance with the strategy used for the construction of the previous libraries, the TrpRS library was also constructed via Golden Gate cloning. The plasmid pQE80L_ecTrpRS has internal recognition sequences for BsmBI as well as for BsaI. To be able to use one of the type IIS restriction enzymes already present in the lab, the ampicillin resistance gene on the plasmid was replaced with a chloramphenicol resistance gene, which removed the recognition sequences for BsaI. Before library construction, the six residues of interest (F7, G9, M132, D135, V144, V146; Figure 37 b) were mutated to alanine (GCA) to avoid a background of the wild-type enzyme. The resulting alanine mutant of the TrpRS was then used as the template for the actual library construction.

[Figure 38: sequencing alignments against the pQE80L_ecTrpRS template. Panel a) marks the randomized positions Phe7, Gly9, Met132, Asp135, Val144, and Val146 in the library mixture. Panel b) aligns the randomly picked library clones c1, c2, c3, c5, c6, c7, c8, c9, and c10 with the template. The nucleotide and translation tracks are not recoverable from the extracted text.]

Figure 38 I Sequencing results of the E. coli TrpRS library. a) Sequencing chromatogram of the library mixture. b) Sequencing results of randomly picked colonies.

With six NNS randomizations and a theoretical size of 1.07 × 10⁹ variants, the library is fairly large and could not be covered with a transformation efficiency of only 1.5 × 10⁷. Furthermore, sequencing of the library revealed that the randomization of V144 and V146 was insufficient (Figure 38 a). Sequencing of randomly picked single colonies further showed a tendency toward codons containing G, resulting in a rather high number of Gly residues (Figure 38 b).
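The relation between library size and transformation efficiency can be made explicit with a short calculation. The following is a minimal sketch, assuming that the reported transformation efficiency can be read as the number of independent transformants and that every transformant carries a different variant (an optimistic upper bound). The function and variable names are illustrative only.

```python
# Sketch: theoretical size of an NNS library and the fraction of it that a
# single transformation can sample at best. Input numbers come from the text.

NNS_CODONS = 4 * 4 * 2        # N = A/C/G/T, S = C/G, so 32 codons per position
RANDOMIZED_POSITIONS = 6      # F7, G9, M132, D135, V144, V146
TRANSFORMANTS = 1.5e7         # assumption: efficiency read as transformant count


def library_size(variants_per_position: int, positions: int) -> int:
    """Number of distinct variants when every position is randomized independently."""
    return variants_per_position ** positions


def sampled_fraction(transformants: float, size: int) -> float:
    """Upper bound on the covered fraction, assuming every transformant is unique."""
    return min(1.0, transformants / size)


dna_variants = library_size(NNS_CODONS, RANDOMIZED_POSITIONS)
# Protein-level diversity, ignoring the single stop codon contained in NNS.
protein_variants = library_size(20, RANDOMIZED_POSITIONS)

print(f"DNA-level variants:     {dna_variants:.3g}")
print(f"Protein-level variants: {protein_variants:.3g}")
print(f"Fraction sampled (max): {sampled_fraction(TRANSFORMANTS, dna_variants):.1%}")
```

Under these assumptions only a small percentage of the DNA-level diversity can be present in the transformed population, which is why the library cannot be considered covered.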

3.2.2.1 An alternative selection system

The E. coli tRNATrp already has the desired anticodon for [3,2]Tpa incorporation in response to the Trp codon in TUB170. However, the double-sieve selection relies on the amber stop codon. Therefore, an alternative selection method analogous to the system developed by Billerbeck and Panke [502] was established. This system allows the in vivo selection of libraries of an essential gene via genetic replacement of its wild-type variant in E. coli (Figure 39). The approach requires the knockout (KO) of the essential gene of interest, in this case trpS.


[Figure 39: three-step scheme. Step 1: elimination of the chromosomal gene of interest (trpS) by λ red recombination (pKD46) with a PCR-derived KanR resistance cassette, in the presence of a rescue plasmid carrying trpS and an I-SceI cleavage site. Step 2: transformation of the I-SceI helper plasmid and elimination of the rescue plasmid harboring the gene of interest. Step 3: transformation of the library and selection of functional variants in the presence of the ncAA; only cells encoding functional library members can survive.]

Figure 39 I Overview of the processes involved in selection with a genetic replacement system. The gene of interest is replaced with an antibiotic resistance cassette while the rescue plasmid complements the chromosomal loss. Subsequent elimination of the rescue plasmid followed by transformation of the library yields functional library members, which can complement the loss of the wild-type gene of interest.

The trpS knockout was already achieved in the framework of the master thesis, and the resulting strain was denoted G2748 ∆trpS. As the cells would not be viable without this gene, a wild-type copy is provided in trans on a rescue plasmid (Figure 39, Step 1), which harbors I-SceI restriction sites for efficient elimination of the plasmid. The I-SceI mega-nuclease is suitable for this purpose, as the E. coli chromosome does not contain any I-SceI recognition sites and is thus not targeted by this enzyme. Transformation of the helper plasmid carrying the I-SceI mega-nuclease gene, followed by its induction, results in the elimination of the wild-type gene of interest on the rescue plasmid (Figure 39, Step 2). The cells remain viable long enough for the transformation of the library, in this case the E. coli TrpRS library. When the transformants are plated on plates supplied with [3,2]Tp, only cells encoding a functional library member should be able to decode Trp codons and survive (Figure 39, Step 3). The helper plasmid carrying the I-SceI mega-nuclease gene has a temperature-sensitive ori and can later be cured by incubation at 42°C.

During the master thesis, the genetic replacement system was successfully established, and rescue plasmid elimination was improved by adding two further I-SceI restriction sites, resulting in a total of three such sites. However, although the lack of growth on LB agar plates supplemented with chloramphenicol after selection suggested efficient elimination of the rescue plasmid, a colony PCR showed that the E. coli trpS was still present. These results suggest that, during the elimination step, parts of the linearized rescue plasmid might have recombined with the library plasmid or the chromosome, so that the cells retained the wild-type trpS gene, causing a large number of false positives.

In addition, extensive analyses of the elimination dynamics of the rescue plasmid showed that the time point of library introduction is crucial in this selection system. For the selection to be successful, the rescue plasmid needs to be fully eliminated by the time the library is introduced. However, if the rescue plasmid is eliminated and no copies of the wild-type TrpRS enzyme remain before the cells are made electrocompetent and the library is transformed, the cells will not survive and transformation will be impossible. Furthermore, almost all cells were dead 2.5 h after induction of I-SceI mega-nuclease expression, but only one hour later (3.5 h after induction) the cells started to recover. Sequencing analysis of one colony showed a point mutation in the I-SceI mega-nuclease gene. Another colony exhibited large insertions and deletions in the I-SceI mega-nuclease gene, indicating that mutation occurs randomly via multiple different mechanisms. Hence, it is of paramount importance to choose the time point of library transformation carefully.

To assess the optimal time points for induction and transformation, several different conditions were tested; the colony forming units (CFU) and OD600 values are summarized in Figure 40. Inoculation at 1:250 led to low cell densities and faster rescue plasmid elimination, while inoculation at 1:100 with immediate induction resulted in an OD600 of 0.475 2 h after induction and rescue plasmid elimination after 2.5 h. The latter condition was chosen for further experiments, as it provides a sufficient number of cells to make electrocompetent prior to rescue plasmid elimination.


Inoculation   Induction      Time after induction   OD600
1:100         immediately    1 h                    0.144
                             2.5 h                  0.806
1:100         immediately    1 h                    0.146
                             1.5 h                  0.414
                             2 h                    0.475
1:250         immediately    1 h                    0.039
                             2 h                    (no value)
                             2.5 h                  0.273
1:250         after 45 min   1 h                    0.132
                             2 h                    0.496
1:250         after 45 min   1 h                    0.168
                             1.5 h                  0.431

[The CFU column of the original figure contained plate images, which are not recoverable from the extracted text.]

Figure 40 I Assessment of the optimal time point of library transformation for the genetic replacement system. G2748 ∆trpS was cultivated in LB media at 30 °C and I-SceI mega-nuclease expression was induced at the indicated time points. Samples were taken at different time points, OD600 was measured and CFU were assessed by plating 2 µL of a dilution series of the samples on LB agar plates.
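Spot plating of a dilution series, as described in the caption of Figure 40, is usually converted into a viable cell titer with the standard relation CFU/mL = colonies × dilution factor / plated volume. The sketch below illustrates this conversion; the colony count and dilution in the example are hypothetical and are not taken from the experiments.

```python
def cfu_per_ml(colonies: int, dilution_factor: float, plated_volume_ul: float = 2.0) -> float:
    """Viable titer of the undiluted culture in CFU per mL.

    colonies:          colonies counted in one spot
    dilution_factor:   fold dilution of the plated sample (1e4 for a 1:10,000 dilution)
    plated_volume_ul:  plated volume in microliters (2 µL in the caption above)
    """
    if colonies < 0:
        raise ValueError("colony count cannot be negative")
    if dilution_factor <= 0 or plated_volume_ul <= 0:
        raise ValueError("dilution factor and volume must be positive")
    # 1000 µL per mL converts the plated volume to milliliters.
    return colonies * dilution_factor * 1000.0 / plated_volume_ul


# Hypothetical example: 12 colonies in the spot of the 1:10,000 dilution.
example_titer = cfu_per_ml(colonies=12, dilution_factor=1e4)
print(f"{example_titer:.2e} CFU/mL")
```

Comparing such titers across time points after induction gives a quantitative view of when the rescue plasmid is lost and the cells stop being viable.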

For the determination of false-positive variants due to recombination events, a plasmid with the E. coli trpS gene harboring an amber stop codon at position three was created. Only a recombination event can restore the trpS gene and therefore allow these cells to survive. To characterize the system, cells were inoculated 1:100 with an overnight culture harboring the rescue plasmid (trpS) and the helper plasmid (I-SceI mega-nuclease). I-SceI mega-nuclease expression was immediately induced with 0.5 % arabinose, samples were taken 1 h and 2 h after induction, and CFU (Figure 41 a) were assessed. Immediately after taking the last sample (2 h after induction), cells were made electrocompetent and transformed with three control plasmids. The cells were incubated at 37 °C overnight and CFU were assessed (Figure 41 b).


Figure 41 I Colony forming units of the samples taken during the final optimization experiment with the recombination system. a) Colony forming units of the samples taken 1 h and 2 h after induction of I-SceI mega-nuclease expression. b) Colony forming units of the transformations with three control plasmids; wildtype ecTrpRS on two different backbones and ecTrpRS with an amber stop codon at position 3 for assessment of false positives due to recombination events.

Two different backbones harboring the wild-type E. coli trpS gene were tested, as well as the control plasmid for false-positives. The cells could be rescued with the first two test plasmids, showing that the genetic replacement system works. The pQE80L backbone seemed to work better than the pJZ backbone, as it produced more colonies. Thus, pQE80L was chosen as the backbone for the library construction (see chapter 3.2.2). The third test plasmid pJZ_ecTrpRS_K3TAG yielded fewer and only very small colonies, suggesting slower growth of false-positive colonies, which might be avoided by reducing the overnight incubation time. As the library is derived from the E. coli TrpRS, using the same enzyme on the rescue plasmid likely contributes significantly to recombination events during selection. Therefore, a rescue plasmid harboring the trpS gene from the gram-negative bacterium Aquifex aeolicus was constructed. This organism was chosen, as its trpS gene was already present in the laboratory. While the genetic replacement system was established with TUB00 (MG1655 ∆tnaA, ∆trpLEDC), the ancestral strain of the [3,2]Tp-adapted TUB170, the actual library selection relies on TUB170, as the selection event depends on survival on [3,2]Tp. Functional library members will ideally incorporate [3,2]Tpa in response to Trp codons in a proteome-wide manner, which is only tolerated in the adapted strain. Thus, the A. aeolicus trpS rescue plasmid was transformed into TUB170 and the chromosomal trpS knockout was transferred via phage P1 (for more details on phage P1 transduction see chapter 5.2.17.1, p. 112). However, the resulting strain was not very robust with a lag phase of more than 48 h. While TUB170 affords robust growth on the non-canonical substrate [3,2]Tp, the transformation of large enzyme libraries requires exceptionally fit cells to cover all variants. 
The poor growth of the selection strain, the high number of false positives, and the insufficiently randomized TrpRS library together represent an accumulation of sub-optimal selection conditions that made a successful selection seem rather unlikely. At this point, a colleague (Fabian Schildhauer) had constructed a library based on the M. jannaschii TyrRS system and used it successfully for the selection of an aaRS capable of charging its cognate tRNA with the Trp analog β-(1-azulenyl)-L-alanine (AzAla). The TyrRS and the TrpRS are related [94], which might facilitate the selection of variants that accept Trp analogs. The MjTyrRS is a very efficient OTS with a well-established selection system (double-sieve selection). Furthermore, the MjTyrRS exhibits only weak anticodon binding, with only one hydrogen bond between U35 and Cys231 [294,298]. Therefore, using the MjTyrRS library in the double-sieve selection and mutating the anticodon from CUA (amber) to CCA (Trp) after successful selection might be feasible. Hence, rather than reconstructing the TrpRS library for use in a selection system that tends to produce false positives, with a selection strain that is less fit than common cloning strains, it was decided to use the already constructed TyrRS library for Trp analogs instead. Furthermore, while the TrpRS already recognizes [3,2]Tpa, so that it might be easier to find a variant that can discriminate between [3,2]Tpa and Trp, it is likely just as easy for such a variant to mutate back to the wild type. Normally, reversion to the wild-type sequence is not a big issue with OTS. In this case, however, the desired [3,2]TpaRS is supposed to replace the endogenous TrpRS in TUB170 to produce an organism that remains stably biocontained over numerous generations. In this scenario, reversion of a TrpRS-based [3,2]TpaRS to the wild type is a real risk. Therefore, it might be more reasonable to start from the TyrRS system, which is homologous to the TrpRS but does not incorporate Trp.

3.2.3 M. jannaschii TyrRS Library

In the MjTyrRS library constructed by Fabian Schildhauer, a total of 8 residues were mutated, resulting in a theoretical library size of 6.9 × 10⁸ variants: Tyr32VNK (all aa except Cys, Phe, Tyr, Trp), Ala67GBC, His70CTC, Asp158CTC (small aa: Ala, Gly, Leu, Val), Leu65NNS, Phe108NNS, Gln109NDT, Leu162VHG (all aa except Trp). In Figure 42, the mutated residues are shown as sticks and highlighted in red. In addition to variants capable of charging [3,2]Tp to the cognate tRNA, the library was simultaneously screened for variants capable of charging 4-F-indole. In the meantime, two more strains had been adapted in the Budisa lab to growth on the indole analogs 4-F-indole and 5-F-indole [491], respectively, with the former strain being the more robust of the two.
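The amino acid set encoded by a degenerate codon follows directly from the IUPAC nucleotide codes and the standard genetic code. The sketch below expands three of the codon schemes named above (NNS, VNK, NDT) and lists what each encodes; it is meant as an illustration of how such sets can be checked, not as the procedure used for library design. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (bases T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC ambiguity codes needed for the schemes discussed in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "S": "CG", "K": "GT",
    "V": "ACG", "D": "AGT", "H": "ACT", "B": "CGT",
}

ALL_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def expand(degenerate_codon: str) -> list[str]:
    """All concrete codons covered by a three-letter degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[base] for base in degenerate_codon))]


def encoded(degenerate_codon: str) -> set[str]:
    """One-letter amino acids ('*' = stop) encoded by a degenerate codon."""
    return {CODON_TABLE[codon] for codon in expand(degenerate_codon)}


for scheme in ("NNS", "VNK", "NDT"):
    residues = encoded(scheme)
    missing = "".join(sorted(ALL_AMINO_ACIDS - residues)) or "none"
    print(f"{scheme}: {len(expand(scheme))} codons, "
          f"{len(residues - {'*'})} amino acids, "
          f"stop codon included: {'*' in residues}, missing: {missing}")
```

For VNK, the expansion leaves out exactly Cys, Phe, Trp, and Tyr and contains no stop codon, which matches the description given for Tyr32 above.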

[Figure 42: structure image. Labeled residues: Y32, L65, A67, H70, F108, Q109, D158, L162.]

Figure 42 I Active site of the M. jannaschii TyrRS with bound tyrosine (green). Residues chosen for mutation are shown as sticks and highlighted in red. PDB: 1J1U

Prior to selection, the maximal concentrations of indole analogs tolerated by NEB10-beta cells under selection conditions were determined by comparing the colony growth of culture dilutions after the first round of positive selection on LB agar plates containing increasing concentrations of [3,2]Tp and 4-F-indole (Figure 43).


[Figure 43: plate images of dilution series for the indole analogs 4-F-indole and [3,2]Tp at concentrations of 0.03, 0.1, 0.5, and 1 mM.]

Figure 43 I Indole analog tolerance of NEB10-beta cells under selection conditions. After transformation with the MjTyrRS library, a dilution series was prepared from recovered NEB10-beta cells and 2 µL of each dilution were plated on LB agar plates containing 37 µg/mL Cm and increasing concentrations of 4-F-indole or [3,2]Tp.

While 4-F-indole only supports the growth of NEB10-beta cells at concentrations of up to 0.1 mM, the cells tolerated [3,2]Tp at all tested concentrations. Based on these results, concentrations of 0.1 mM 4-F-indole and 0.5 mM [3,2]Tp were chosen for the selection experiments. The concentration of [3,2]Tp was increased in comparison to prior selection experiments to increase the selection pressure toward [3,2]Tpa-incorporating aaRS variants. During the selection experiments with fluoro-indoles, a concentration of 0.03 µM 4-F-indole was sufficient to support the growth of the adapted strain to OD600 values of around 1.5. Thus, 0.1 mM 4-F-indole should provide a sufficient excess of the substrate to drive the selection experiments toward 4-F-Trp-incorporating aaRS variants. Transformation efficiencies of 1.6 × 10⁹ for [3,2]Tp and 6 × 10⁸ for 4-F-indole were achieved for the first positive selection with the TyrRS library. The chloramphenicol concentration was increased from 30 µg/mL in the first round of positive selection (P1) to 50 µg/mL in later rounds. After each positive selection, the number of colonies on the selection plates was counted and compared to the number of colonies on the control plate lacking the indole analog. The number of colonies on the selection plates was set as 100 %, and the colony number on the corresponding control plate was expressed as a percentage of it. Figure 44 depicts the (mean) colony numbers for one agar plate each. A volume of 450 µL of recovered transformation was spread on each plate, except for the first positive selection with [3,2]Tp, where 1 mL of culture volume was spread on each plate. Therefore, the numbers of colonies for P1 were adjusted to a culture volume of 450 µL for comparison with the following positive selections.
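The two normalizations described above (scaling the P1 counts from 1 mL to 450 µL, and expressing the control plate as a percentage of the selection plate) are simple proportional calculations. The following sketch shows them with hypothetical colony counts, chosen only to illustrate the arithmetic; the function names are not from the original work.

```python
def adjust_to_volume(colonies: float, plated_ul: float, reference_ul: float = 450.0) -> float:
    """Scale a colony count linearly to the reference plating volume."""
    if plated_ul <= 0 or reference_ul <= 0:
        raise ValueError("volumes must be positive")
    return colonies * reference_ul / plated_ul


def control_percent(selection_colonies: float, control_colonies: float) -> float:
    """Control-plate colonies as a percentage of the selection plate (= 100 %)."""
    if selection_colonies <= 0:
        raise ValueError("selection plate must have at least one colony")
    return 100.0 * control_colonies / selection_colonies


# Hypothetical P1 counts from plates that received 1 mL (1000 µL) of culture.
p1_selection = adjust_to_volume(4000, plated_ul=1000.0)
p1_control = adjust_to_volume(3440, plated_ul=1000.0)

print(f"P1 selection plate, adjusted to 450 µL: {p1_selection:.0f} colonies")
print(f"P1 control plate relative to selection: {control_percent(p1_selection, p1_control):.0f} %")
```

A control percentage close to 100 % means that growth does not depend on the indole analog, whereas a falling percentage over successive rounds points to enrichment of analog-dependent variants.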


[Figure 44: bar charts of the number of colonies for P1, P2, and P3, each shown next to its control plate (ctrl). Panel a) [3,2]Tp, y-axis from 0 to 3500; panel b) 4-F-indole, y-axis from 0 to 18000.]

Figure 44 I Number of colonies on positive selection plates supplemented with the indicated indole analog compared to control plates lacking the analog. The colonies were counted using ImageJ, where every colony larger than 5 pixels was counted. P1-3: positive selection 1-3, ctrl: control plates. P1 represents the mean of 3 ([3,2]Tp) and 5 (4-F-indole) agar plates with the SD represented as the error bar. a) Double-sieve selection with [3,2]Tp. The colony numbers for P1 were adjusted to the volume of recovered transformations spread on P2 and P3 plates (450 µL / plate). b) Selection experiments with 4-F-indole.

Curiously, in both experiments, the number of colonies per plate decreases drastically in the second positive selection (P2) and increases again in the third round of positive selection (P3). Usually, a steady increase in colony numbers would be expected as the selection progresses, because functional library members should be enriched. The drastic decrease in colony numbers observed here suggests a loss of functional library members. One possible reason is the very stringent negative selection, which might have killed promising candidates in addition to those that incorporate only cAAs. For example, a variant that incorporates the desired ncAA might also incorporate one or more cAAs. Such a variant would still be a promising candidate, as long as the ncAA outcompetes its canonical competitor(s), but it would be killed during the negative selection in the absence of the non-canonical substrate. Another reason might be a loss of library members during the steps between rounds: harvesting of the colonies from the selection plates, plasmid preparation (midiprep), digestion of the selection plasmid, purification of the library plasmids, and transformation for the next selection round. Finally, the high number of colonies per plate in the first positive selections might be attributed to a high number of variants that solely incorporate canonical amino acids and are then removed during the negative selection. This hypothesis is consistent with the similarly high colony numbers on the corresponding control plates.

Furthermore, in the experiment with [3,2]Tp, the difference between the colony numbers on the selection and control plates increases in the second round of positive selection and even more so in the third round, where the number of colonies increases drastically again. While the number of colonies on the selection plate is approximately 1.3-fold higher for P3 than for P1, the colony number on the corresponding control plate decreases from 86 % of the selection plate for P1 to 55 % for P3, indicating enrichment of MjTyrRS variants capable of charging [3,2]Tp to their cognate tRNA. In the experiment with 4-F-indole, the colony number on the selection plate increases around 7-fold in P3 compared to P1, but the colony number on the corresponding control plates stays at 90 % or more of that on the selection plates. After the second and third positive selections, 40 colonies were screened by plating on increasing Cm concentrations with and without the respective ncAA. Even after three rounds of positive selection and two rounds of negative selection, no promising candidates were obtained in the experiments with 4-F-indole.


[Figure 45: plate images of serial dilutions of variants 33 and 39 at Cm concentrations of 37, 70, 100, and 150 µg/mL, without ncAA (– ncAA) and with 0.5 mM [3,2]Tp.]

Figure 45 I Serial dilutions of the two promising MjTyrRS variants 33 and 39 on increasing Cm concentrations with and without [3,2]Tp.

However, screening of the colonies from the third positive selection with [3,2]Tp yielded two promising candidates, colony 33 and colony 39. Serial dilutions of these cultures were spotted on plates with and without [3,2]Tp at Cm concentrations of 37, 70, 100, and 150 µg/mL. As can be seen in Figure 45, both isolates, 33 and 39, exhibit enhanced growth on plates supplemented with the indole analog [3,2]Tp, suggesting that these variants may be able to charge [3,2]Tp to their cognate tRNA and thus suppress the stop codons in the cat gene. Sequencing analysis of the two isolates revealed the mutations indicated in Table 4.

Table 4 I Summary of mutations in isolates 33 and 39 as revealed through sequencing analysis.

Residue   32    65    67    70    108   109   158   163
wt        Tyr   Leu   Ala   His   Phe   Gln   Asp   Leu
33        Gly   Glu   Gly   Leu   His   Glu   Leu   Pro
39        Gly   Gly   Ala   Gly   Leu   His   Ala   Asp

In order to further characterize the variants’ ability to suppress amber codons, a fluorescence assay analogous to the one in chapter 3.2.1 (p. 63) was conducted. The two variants, 33 and 39, were transferred from the high copy library backbone (pBU18) to a low copy backbone (pBU16) harboring the cognate tRNA and co-transformed with sfGFP(R2TAG) into BL21(DE3) cells. Again, both absorption at 600 nm and fluorescence at 511 nm of six biological replicates were measured over 24 h. The cells were cultured in ZYP 5052 auto-induction media supplemented with either 0.5 mM [3,2]Tp, 0.1 mM 4-F-indole (to check for promiscuity of the variants), or no ncAA. As a control, wt sfGFP was expressed. This time, all cultures exhibited similar growth curves, with the exception of the wt sfGFP cultures, which reached almost twofold higher absorptions. This might be explained by these cells harboring only the sfGFP plasmid, while the other cells additionally harbor the MjTyrRS/tRNA plasmid. The cultures reached stationary phase after about 6 h of incubation at 37°C, and the wildtype control after approximately 12 h. The mean and the standard deviation were calculated and the fluorescence was normalized to the absorption. Figure 46 shows the normalized fluorescence after 15 h of incubation.
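The normalization described above can be sketched as follows. This is a minimal illustration, not the evaluation script used in this work: the function name `normalized_fluorescence`, the choice to normalize each replicate before averaging, the use of the sample standard deviation, and the example readings are all assumptions.

```python
from statistics import mean, stdev


def normalized_fluorescence(fluorescence, absorption):
    """Divide each replicate's fluorescence by its absorption at 600 nm,
    then return (mean, sample SD) over the replicates."""
    if len(fluorescence) != len(absorption):
        raise ValueError("one absorption value is needed per fluorescence value")
    ratios = [f / a for f, a in zip(fluorescence, absorption)]
    return mean(ratios), stdev(ratios)


# Hypothetical readings for six biological replicates (not measured data).
fl = [2100.0, 2250.0, 1980.0, 2300.0, 2150.0, 2050.0]
od = [0.70, 0.75, 0.66, 0.76, 0.72, 0.68]
avg, sd = normalized_fluorescence(fl, od)
```

Normalizing per replicate keeps the spread between cultures visible in the standard deviation; averaging raw fluorescence and absorption first would hide it.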

[Figure 46: two panels, a) and b), showing absorption (600 nm) over time [h] and normalized fluorescence for variants 33 and 39 cultivated with - ncAA, + [3,2]Tp, or + 4-F-indole, and for wt sfGFP.]

Figure 46 I Normalized fluorescence of MjTyrRS variant 33 and 39 after 15 h of incubation. Six biological replicates were cultivated in ZYP 5052 media supplied with the appropriate antibiotics, as well as either 0.5 mM [3,2]Tp, 0.1 mM 4-F-indole, or no ncAA at all. Fluorescence was normalized to absorption at 600 nm.

The controls cultivated in the absence of any ncAA exhibit a rather high fluorescence compared to the wt control, indicating background suppression. No distinct increase in fluorescence can be observed upon 4-F-indole supplementation; however, an approximately twofold increase in fluorescence can be seen upon [3,2]Tp supplementation. To identify the amino acid incorporated in response to the amber codon, the mass of the reporter protein was determined. BL21(DE3) cells harboring pET28a_sfGFP(R2TAG)-His and the MjTyrRS variants 33 and 39 were cultivated in DYT supplemented with the appropriate antibiotics and 0.1 % glucose to suppress leaky sfGFP expression before induction. At a cell density of OD600 = 1, the indole analog [3,2]Tp was added at a concentration of 0.5 mM and the culture was incubated for another 30 min at 37°C to allow uptake of [3,2]Tp and conversion to the amino acid [3,2]Tpa. sfGFP expression was induced with 1 mM IPTG and allowed to proceed for 4 h at 37°C. After purification via immobilized metal affinity chromatography (IMAC), the mass was determined via electrospray ionization mass spectrometry (ESI-MS). The MS spectra were extracted from the highest peak of the total ion count (TIC) chromatogram and deconvoluted using the Agilent Mass Hunter BioConfirm software (Figure 47).

[Figure 47: deconvoluted ESI-MS spectra, counts versus mass in Da (27500 to 28500). Inset in a): predicted masses [3,2]Tpa 27789.3 Da and Phe 27744.19 Da; main peak at 27744.8 Da. Inset in b): predicted masses [3,2]Tpa 27789.3 Da and Gln 27725.2 Da; main peak at 27725.8 Da.]

Figure 47 I Identification of the amino acid incorporated in response to the amber codon in sfGFP(R2TAG). The reporter protein sfGFP(R2TAG) was co-expressed with MjTyrRS variants 33 or 39 in the presence of [3,2]Tp. The ESI-MS spectra were deconvoluted using the Agilent Mass Hunter BioConfirm software for masses between 27 kDa and 30 kDa. The highest peak was normalized to 100. a) Deconvoluted spectrum of sfGFP(R2TAG) co-expressed with variant 33. The highest peak corresponds to phenylalanine incorporation. The smaller peaks are each approximately 22 Da apart and likely correspond to sodium adducts (Na = 22.99 Da). b) Deconvoluted spectrum of sfGFP(R2TAG) co-expressed with variant 39. The highest peak corresponds to glutamine incorporation. The smaller peaks are each approximately 22 Da apart and likely correspond to sodium adducts (Na = 22.99 Da).

In the case of stop-codon suppression with variant 33, the main peak corresponds to phenylalanine incorporation at position 2 of the reporter protein (predicted mass: 27744.2 Da, observed mass: 27744.8 Da). In the experiment with variant 39, the main peak corresponds to Gln (predicted mass: 27725.2 Da, observed mass: 27725.8 Da). Incorporation of glutamine at amber positions is common in the absence of an amber suppressor503, as its CAG codon is similar to the amber codon (UAG). The other peaks are each approximately 22 Da apart and likely represent sodium adducts (Na = 22.99 Da). The third peak in Figure 47 a (27788.4 Da) and the fourth peak in Figure 47 b (27790.9 Da) exhibit masses similar to that expected for [3,2]Tpa incorporation (27789.3 Da). These results might indicate a TyrRS variant promiscuous for aromatic side chains in the former case and, in the latter case, a variant that charges [3,2]Tpa to its cognate tRNA so inefficiently that the main expression product stems from background suppression of the UAG codon with glutaminyl-tRNA, resulting in protein mixtures with different amino acids at position 2. However, due to the regular pattern of the peaks, it is more likely that these represent sodium adducts. Nevertheless, as both the Cm assay and the fluorescence assay show a clear difference between the control and [3,2]Tp-supplemented samples, these variants were deemed a promising starting point for further improvement.
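The adduct argument can be checked numerically. The sketch below takes the peak masses of the variant 39 spectrum as labeled in Figure 47 b, computes the spacing between neighboring peaks, and compares it with the shift expected when one proton is replaced by one sodium ion. Treating the four peaks as a single series, the adduct model (22.99 Da for Na minus 1.01 Da for H), and the 1 Da tolerance are assumptions made for illustration.

```python
# Peak masses (Da) read from Figure 47 b; treating them as one series is an assumption.
peaks_b = [27725.8, 27747.5, 27769.2, 27790.9]

NA_MASS = 22.99   # Da, as given in the text
H_MASS = 1.01     # Da, proton replaced by Na+ (assumed adduct model)
ADDUCT_SHIFT = NA_MASS - H_MASS

TPA_PREDICTED = 27789.3  # Da, predicted mass for [3,2]Tpa incorporation


def spacings(peaks):
    """Differences between consecutive peaks, rounded to 0.1 Da."""
    return [round(b - a, 1) for a, b in zip(peaks, peaks[1:])]


def looks_like_adduct_ladder(peaks, tol=1.0):
    """True if every spacing is within tol Da of the Na-for-H shift."""
    return all(abs(d - ADDUCT_SHIFT) <= tol for d in spacings(peaks))


# Offset of the fourth peak from the mass predicted for [3,2]Tpa.
offset_from_tpa = round(peaks_b[3] - TPA_PREDICTED, 1)
```

For these values every spacing is 21.7 Da, while the fourth peak lies 1.6 Da above the predicted [3,2]Tpa mass, which fits the interpretation of a regular adduct ladder better than a distinct [3,2]Tpa-containing species.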

3.2.3.1 Error-prone PCR library

The semi-rational design of mutant aaRS libraries usually focuses on first shell active site residues, as transformation efficiencies limit the library size and thereby the number of mutable residues. However, these libraries often result in enzymes with low affinities for their ncAA substrates, as well as low overall aminoacylation237,504–506. Furthermore, the active site is sensitive to perturbations; first shell residues interact with other residues in a way that is not always fully understood and those interactions may be inadvertently perturbed by mutated sidechains310. Hence, only mutating first shell residues may not be enough to yield efficient enzymes507. The creation of randomly mutagenized libraries via error-prone PCR has therefore proven beneficial508. Thus, variant 33 was used as a template for error-prone PCR (epPCR), where the mutation rate of the Taq polymerase is further enhanced by the addition of MnCl2, resulting in a MjTyrRS library with random point mutations distributed throughout the entire enzyme. Variant 33 was chosen, as it produced higher fluorescence and increased colony growth on Cm in comparison to variant 39. Eight single colonies were sequenced and their mutations are summarized in Table 5.

Table 5 I Summary of the number and type of mutations found in eight colonies of the epPCR-based MjTyrRS library.

Type(s) of mutations              Frequency   Proportion of total
Transitions                       48          39.36 %
  A --> G, T --> C                34          27.88 %
  G --> A, C --> T                14          11.48 %
Transversions                     30          24.6 %
  A --> T, T --> A                22          18.04 %
  A --> C, T --> G                6           4.92 %
  G --> C, C --> G                1           0.82 %
  G --> T, C --> A                1           0.82 %
Insertions and deletions          4           3.28 %
  Insertions                      3           2.46 %
  Deletions                       1           0.82 %
Summary of bias
  Transitions / transversions     1.6         NA
  AT --> GC / GC --> AT           2.43        NA
  A --> N, T --> N                62          50.84 %
  G --> N, C --> N                16          13.12 %
Mutation rate
  Mutations per kb                11.13       NA
  Mutations per MjTyrRS gene      10.25       NA
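The bias indicators and mutation rates in Table 5 can be recomputed from the raw counts, which makes their definitions explicit. This is a minimal sketch: the variable names are illustrative, the number of sequenced colonies (eight) is taken from the text, and the gene length of 0.921 kb is an assumption chosen to be consistent with the two reported mutation rates rather than a value stated in this section.

```python
# Raw mutation counts copied from Table 5 (complementary changes are pooled).
transitions = {"AT->GC": 34, "GC->AT": 14}
transversions = {"AT->TA": 22, "AT->CG": 6, "GC->CG": 1, "GC->TA": 1}
indels = {"insertions": 3, "deletions": 1}

N_COLONIES = 8          # sequenced colonies, from the text
GENE_LENGTH_KB = 0.921  # assumed MjTyrRS gene length in kb (not stated here)

n_ts = sum(transitions.values())
n_tv = sum(transversions.values())
n_total = n_ts + n_tv + sum(indels.values())

# Bias indicators
ts_tv_ratio = n_ts / n_tv
at_gc_bias = transitions["AT->GC"] / transitions["GC->AT"]
at_to_n = transitions["AT->GC"] + transversions["AT->TA"] + transversions["AT->CG"]
gc_to_n = transitions["GC->AT"] + transversions["GC->CG"] + transversions["GC->TA"]

# Mutation rates
mutations_per_gene = n_total / N_COLONIES
mutations_per_kb = mutations_per_gene / GENE_LENGTH_KB
```

Under these assumptions the script reproduces the transition/transversion ratio, the AT/GC bias, and both mutation rates listed in the table.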

These data were used to estimate the library characteristics with PEDEL-AA509; the results are summarized in Table 6.

Table 6 I Summary of library characteristics of the error-prone PCR-based MjTyrRS library estimated with PEDEL-AA509.

Property                                                                 Estimate
Total library size                                                       1 x 10^7
Number of variants with no indels or stop codons                         4.48 x 10^6
Mean number of amino acid substitutions per variant                      7.28
Unmutated (wildtype) sequences (% of library, Poisson est.)              0.03091 %
Number of distinct full-length proteins in the library (Poisson est.)    4.46 x 10^6

With a transformation efficiency of 4.2 x 10^7, the estimated library size of 1 x 10^7 was covered. Double-sieve selection was performed as described above and a total of three consecutive rounds of positive and negative selections were conducted with Cm concentrations increasing from 30 µg/mL to 45 µg/mL and finally 70 µg/mL. From the beginning, a clear difference in colony numbers between the control plate and the selection plates could be observed, with the number of colonies on the control plate amounting to only 9 % of that on the corresponding selection plates (Figure 48). With the second positive selection, the overall number of colonies increased, indicating the accumulation of functional library members. However, the ratio of colonies on the control plate to those on the selection plates also increased noticeably to 26 %, indicating a possible enrichment of variants charging canonical amino acids to their cognate tRNAs. During the third positive selection, the number of colonies decreased drastically to levels lower than during the first round of selection, with the number of colonies on the control plate also decreasing slightly to 17 % of those on the selection plates. This decrease in colony numbers could suggest that 70 µg/mL Cm posed too high a selection pressure. However, during this third positive selection, a new selection plasmid was used, which additionally harbors the gene for sfGFP(R2TAG) to facilitate screening with the fluorescence assay without the need for cloning and transformations between Cm and fluorescence assays. It is possible that leaky sfGFP expression from the lac promoter, in addition to chloramphenicol acetyltransferase and MjTyrRS expression, placed too much strain on the cells.
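Whether a given number of transformants covers a library can be estimated with a simple sampling model. The sketch below assumes that every transformant is an independent, equally likely draw from the library, in which case the expected fraction of variants represented at least once is 1 - exp(-T/V) for T transformants and V variants. The model, the function name `expected_coverage`, and the reading of 4.2 x 10^7 as the number of transformants are assumptions; the numbers themselves are taken from the text.

```python
import math


def expected_coverage(transformants, library_size):
    """Expected fraction of library variants sampled at least once,
    assuming equiprobable variants and independent draws (Poisson model)."""
    return 1.0 - math.exp(-transformants / library_size)


# Values from the text: 4.2 x 10^7 transformants, library size 1 x 10^7.
coverage = expected_coverage(4.2e7, 1e7)  # about 0.985 under these assumptions
```

Real libraries are not equiprobable, so this figure is an upper estimate; unevenly represented variants lower the actual coverage.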

[Figure 48: bar chart, number of colonies (0 to 9000) for the P1, P2, and P3 selection plates and their corresponding control plates (ctrl).]

Figure 48 I Number of colonies on positive selection plates supplemented with [3,2]Tp compared to control plates lacking the analog during selections with the epPCR MjTyrRS 33 library. The colonies were counted using ImageJ, where every colony larger than 5 pixels was counted. P1-3: positive selection 1-3, ctrl: control plates. The numbers for the selection plates represent the mean of 5 (P1), 3 (P2), and 2 (P3) agar plates with the SD represented as error bars. The colony numbers were normalized to the culture volume.
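The control-to-selection percentages quoted for each round can be derived from the plate counts as sketched below. The function name `control_ratio_percent` and the colony counts are hypothetical and serve only to illustrate the calculation; they are not the counts underlying Figure 48.

```python
from statistics import mean, stdev


def control_ratio_percent(selection_counts, control_count):
    """Control-plate colonies as a percentage of the mean selection-plate count."""
    return 100.0 * control_count / mean(selection_counts)


# Hypothetical volume-normalized colony counts (not the data behind Figure 48).
selection_plates = [4100, 3900, 4000]
control_plate = 1040

ratio = control_ratio_percent(selection_plates, control_plate)
sd = stdev(selection_plates)  # spread of the replicate plates, shown as error bars
```

A low percentage indicates that growth depends on the supplied analog, whereas values approaching 100 % point to variants that charge canonical amino acids.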

Once again, single colonies were screened with and without [3,2]Tp on increasing Cm concentrations after the second and the third positive selection. Promising candidates were isolated and their serial dilutions were plated under the same conditions. Figure 49 provides an overview of promising variants isolated after the second positive selection.

[Figure 49: spot-assay image. Rows: variants 3, 4, 8, 24, 25, and 30 at Cm concentrations of 37, 70, and 120 µg/mL; columns: - ncAA and + 0.5 mM [3,2]Tp.]

Figure 49 I Serial dilutions of promising variants on increasing Cm concentrations with and without [3,2]Tp after the second round of positive selection with the epPCR MjTyrRS 33 library.

These variants grow in the absence of the analog even at very high Cm concentrations, but the number of CFU somewhat increases upon [3,2]Tp addition. The pronounced growth in the absence of the indole analog is in line with the high fluorescence of variant 33 in the absence of [3,2]Tp, which was used as a template for the error-prone PCR based library used here, and indicates incorporation of cAAs in response to the amber codons in the CAT. Nevertheless, the increase of cell growth upon [3,2]Tp supplementation also suggests incorporation of the target ncAA.

However, in the case of variants isolated after the third round of positive selection, almost no difference between the serial dilutions plated in the absence or presence of [3,2]Tp could be seen (Figure 50).

[Figure 50: spot-assay image. Rows: variants 16, 17, 19, 20, 38, 39, 40, 41, and 42 at Cm concentrations of 37 and 100 µg/mL; columns: - ncAA and + 0.5 mM [3,2]Tp.]

Figure 50 I Representative serial dilutions of promising variants on increasing Cm concentrations with and without [3,2]Tp after the third round of positive selection with the epPCR MjTyrRS 33 library.

Nevertheless, some of the more promising candidates picked after the second and third round were screened in the fluorescence assay with sfGFP(R2TAG) and are shown in Figure 51 a. BL21(DE3) cells transformed with sfGFP(R2TAG) but not with any of the MjTyrRS variants served as a control for background suppression. Unfortunately, no significant increase in fluorescence intensity upon analog addition could be observed for any of the tested variants.

In parallel, another epPCR library was constructed and screened for variants capable of charging 5-F-Trp to their cognate tRNA. This library was based on a promiscuous MjTyrRS variant (termed 14.2) found during selections for another Trp analog (β-(1-Azulenyl)-L-Alanine) in the Budisa laboratory. In addition to charging the target ncAA to the suppressor tRNA^Tyr_CUA, variant 14.2 was found to exhibit some activity for 5-F-Trp as well. Therefore, this variant was employed in a test to compare the efficacy of 5-F-indole and 5-F-Trp in the sfGFP(R2TAG) assay employed for the identification of promising aaRS variants throughout this study. During the numerous selection experiments with [3,2]Tp (and 4-F-indole), only the precursor was used rather than the amino acid [3,2]Tpa itself. The indole analog diffuses through the cell membrane and is intracellularly converted to the corresponding amino acid231. While this approach worked well for the adaptation experiments (chapter 1.3.1, p. 22), it might not work as well during double-sieve selection and the screening experiments described here. Expression of the orthogonal aaRS/tRNA pair, as well as the selection marker (CmR/sfGFP), already poses significant stress to the host cells. Having to additionally convert the precursor to the amino acid prior to selection marker expression might put too much strain on the cells. Hence, having access to an aaRS variant capable of incorporating Trp analogs was a good opportunity to test this hypothesis in a fluorescence assay. Figure 51 b presents the fluorescence of sfGFP(R2TAG) co-expressed with variant 14.2 in the absence and presence of increasing 5-F-indole and 5-F-Trp concentrations. Indeed, fluorescence is considerably higher when 5-F-Trp is supplied than with supplementation of the indole precursor.

[Figure 51: a) bar chart, normalized fluorescence (0 to 3200) with - ncAA and + 0.5 mM [3,2]Tp for variants P2_3, P2_4, P2_8, P2_24, P2_25, P2_30, P3_16, P3_17, P3_20, P3_38, P3_39, P3_40, P3_41, P3_42, P3_46, P3_47, P3_49, P3_56, P3_60, P3_61, P3_62, P3_63, P3_65, P3_66, P3_67, P3_68, and the sfGFP(R2TAG) control; b) bar chart, normalized fluorescence (0 to 1400) for - ncAA and increasing concentrations of 5-F-indole and 5-F-Trp.]

Figure 51 I Fluorescence assay of sfGFP(R2TAG) with different MjTyrRS variants in the absence and presence of ncAAs. a) Variants picked after the second and third round of positive selection with the epPCR library based on variant 33. P2_x denotes variants picked after the second positive selection, with x designating the number of the randomly picked colony. P3_x denotes variants picked after the third round. sfGFP(R2TAG): control transformed with sfGFP, but no MjTyrRS. b) Test of the efficacy of 5-F-indole compared to 5-F-Trp in the fluorescence assay with variant 14.2. Analog concentrations used were: 0.03 mM 5-F-indole, 0.1 mM 5-F-indole/Trp, 0.5 mM 5-F-indole/Trp, 1 mM 5-F-indole/Trp, and 5 mM 5-F-indole/Trp.

This might also well be the case with other indole analogs and their corresponding amino acids, such as [3,2]Tp/[3,2]Tpa. Furthermore, using the precursor might also be problematic during the much more stringent double-sieve selection, where cells are screened for survival.

Thus, the epPCR library based on MjTyrRS variant 14.2 was screened using 1 mM 5-F-Trp rather than 5-F-indole and Cm concentrations of 45 µg/mL and 70 µg/mL. While the number of colonies increased considerably with the second round of positive selection, the colony number on the corresponding control plate stayed similar to that on the selection plates (Figure 52 a), suggesting enrichment of variants primarily charging cAAs to their tRNA. These observations are well in line with the results of the screening experiments with single colonies from both positive selections on increasing Cm concentrations in the presence and absence of 1 mM 5-F-Trp (Figure 52 b). All tested variants grow even on high Cm concentrations independent from the presence of 5-F-Trp, further confirming incorporation of canonical amino acids rather than the target ncAA.

[Figure 52: a) bar chart, number of colonies (0 to 25000) for the P1 and P2 selection plates and their corresponding control plates (ctrl); b) spot-assay image of variants 19, 23, 25, 29, and 35 at Cm concentrations of 37 and 100 µg/mL with - ncAA and + 1 mM 5-F-Trp.]

Figure 52 I Screening of the epPCR library based on MjTyrRS variant 14.2. a) Number of colonies on positive selection plates supplemented with 5-F-Trp compared to control plates lacking the analog. The colonies were counted using ImageJ, where every colony larger than 5 pixels was counted. P1-2: positive selection 1-2, ctrl: control plates. The numbers for the selection plates represent the mean of 2 agar plates with the SD represented as error bars. The colony numbers were normalized to the culture volume. b) Representative serial dilutions of variants on increasing Cm concentrations with and without 5-F-Trp after the first and second round of positive selections with the epPCR MjTyrRS 14.2 library.

3.2.3.2 Screening with the amino acid [3,2]Tpa

To test whether the lack of success during the previous selection attempts with [3,2]Tp might be attributed to the application of the precursor, the amino acid was synthesized enzymatically from [3,2]Tp. The tryptophan synthase from Salmonella typhimurium was purified from E. coli CB149, as it is known to readily convert [3,2]Tp to the corresponding amino acid510. In an in vitro reaction511, [3,2]Tp and serine were converted to [3,2]Tpa for 24 h at 37°C and 80 rpm. The reaction was stopped via lyophilization and the sample was desalted via cation exchange chromatography. After purification via HPLC followed by removal of the solvent, the synthesis of [3,2]Tpa was verified by mass spectrometry (see appendix 9.5, p. 165) and cultivation of the adapted strain TUB170 in media lacking any cAA and solely supplied with [3,2]Tpa. In another fluorescence assay, the variants 33 and 39, as well as some of the more promising variants from the epPCR library, were tested in the absence and presence of the amino acid [3,2]Tpa. While the assay resulted in increased fluorescence upon addition of the precursor [3,2]Tp, supplementation of the corresponding amino acid yielded lower fluorescence compared to the control without ncAA supplementation (Figure 53). Therefore, the candidates 33 and 39, initially thought to be promising, do not seem to incorporate [3,2]Tpa and it is indeed likely that all smaller peaks in Figure 47 correspond to sodium adducts.

[Figure 53: bar chart, normalized fluorescence (0 to 1800) for variants 33, 39, P2_8, P2_25, P2_30, P3_41, P3_63, and the sfGFP(R2TAG) control with - ncAA, + 0.5 mM [3,2]Tp, + 0.5 mM [3,2]Tpa, and + 1 mM [3,2]Tpa.]

Figure 53 I Fluorescence assay of MjTyrRS variants in the presence of the non-canonical amino acid [3,2]Tpa. sfGFP(R2TAG): control transformed with sfGFP, but no MjTyrRS. Columns with error bars represent the mean of two or three values with the SD as error bars.

As variants 33 and 39 were selected with the precursor, a final selection attempt with the MjTyrRS library described at the beginning of chapter 3.2.3 (p. 71) was conducted, this time in the presence of the amino acid [3,2]Tpa. As the goal was to find an aaRS capable of distinguishing between [3,2]Tpa and Trp, double-sieve selection on minimal media lacking Trp was tested to reduce the amount of Trp present during the selection and ensure a high excess of the target ncAA [3,2]Tpa. A total of two rounds of positive selection with a negative selection in between were conducted. The number of colonies on the control plates after both the first and the second positive selection exceeds that on the corresponding selection plates, indicating variants that incorporate canonical amino acids (Figure 54 a). Furthermore, the overall number of colonies decreases from the first to the second round of positive selection by a factor of approximately 16, suggesting a decrease in functional library members rather than the intended enrichment. These observations imply that the negative selection might be too stringent, causing an excessive decimation of functional library members. These results may also reflect the close structural similarity between Trp and [3,2]Tpa. In the absence of Trp in the cultivation media, the number of functional library members is greatly decimated by the negative selection, and growth on Cm does not depend on the target analog, emphasizing the intricacy of selecting an aaRS mutant capable of discriminating between these two amino acids. After the second positive selection, single colonies were resuspended in sterile water and spotted on NMM19 -Trp agar plates with increasing Cm concentrations and in the absence and presence of the target amino acid [3,2]Tpa. None of the tested colonies exhibit improved growth upon [3,2]Tpa supplementation and most colonies grow even at high Cm concentrations in the absence of the ncAA (Figure 54 b), hinting at TyrRS mutants charging cAAs to their cognate tRNA.

[Figure 54: a) bar chart, number of colonies (0 to 2000) for the P1 and P2 selection plates and their corresponding control plates (ctrl); b) spot-assay image at Cm concentrations of 37 and 70 µg/mL with - ncAA and + 0.5 mM [3,2]Tpa.]

Figure 54 I Selection with the amino acid [3,2]Tpa on minimal media (NMM19 -Trp). a) Number of colonies on positive selection plates supplemented with [3,2]Tpa compared to control plates lacking the amino acid. The colonies were counted using ImageJ, where every colony larger than 5 pixels was counted. The numbers for the selection plates represent the mean of 2 agar plates with the SD represented as error bars. The colony numbers were normalized to the culture volume. Between both positive selections, a negative selection was performed. b) Single colonies from the first and second round of positive selections with the MjTyrRS library on increasing Cm concentrations with and without the amino acid [3,2]Tpa.

Sometimes reducing the selection pressure imposed on the enzyme library might be helpful to obtain highly active enzymes. In their study, Cooley and collaborators compare their first-generation nitroTyr-RS enzymes with second-generation variants obtained under relaxed selection conditions. They observe that, while the second-generation variants exhibit much higher background suppression in the absence of nitroTyr than the first-generation enzymes, they perform considerably better in the presence of the target ncAA, without any evidence of cAA incorporation. They argue that strict exclusion of cAAs in the absence of the target ncAA is unnecessary as long as the ncAA outcompetes any cAAs under the conditions used for ncAA incorporation512.

a)
               Approach 1                               Approach 2
Cm [µg/mL]     selection   screening   abbreviation     selection   screening   abbreviation
30             positive    -           P1               positive    -           P1
-              negative    -           N1               -           -
70             positive    +           NP2              positive    +           PP2
100            positive    +           NP3              positive    +           PP3

[Figure 55: b) bar chart, number of colonies (0 to 20000) for the P1, NP2, NP3, PP2, and PP3 selection plates and their corresponding control plates (ctrl); c) spot-assay image of colonies from NP2 and PP2 (70 µg/mL Cm) and from NP3 and PP3 (100 µg/mL Cm) for approaches 1 and 2, with - ncAA and + 0.5 mM [3,2]Tpa.]

Figure 55 I Selection with the amino acid [3,2]Tpa on LB media. a) Overview of the two different approaches employed during selection experiments with 0.5 mM [3,2]Tpa. b) Number of colonies on positive selection plates supplemented with [3,2]Tpa compared to control plates lacking the amino acid. The colonies were counted using ImageJ, where every colony larger than 5 pixels was counted. The numbers for the selection plates represent the mean of 2 agar plates with the SD represented as error bars. The colony numbers were normalized to the culture volume. c) Single colonies from the second and third rounds of positive selections with the MjTyrRS library on increasing Cm concentrations with and without the amino acid [3,2]Tpa.

Thus, in addition to the classic double-sieve selection, another approach with consecutive positive selections was tested. A summary of both approaches is indicated in Figure 55 a. The selections were performed on LB and colony numbers for each positive selection were once again monitored (Figure 55 b). Similar to the experiment with minimal media, the number of colonies was significantly reduced after the negative selection (approach 1, NP2), and the number of colonies on the control plates exceeded those on the corresponding selection plates. However, after the third positive selection, NP3 (directly following the second positive selection), the number of colonies rises by almost two orders of magnitude with colonies on the control plate decreasing to approximately 90 % of those on the corresponding selection plates. In the case of approach 2, the colony numbers increase during the second positive selection, PP2, to levels similar to those for NP3, with no difference in colony numbers on the control plate. The number of colonies on the selection plate stays similar during the third positive selection, PP3, but colony numbers decrease somewhat on the control plate to about 85 % compared to the selection plates. These results support the hypothesis of a too stringent negative selection, as colony numbers are only severely decimated after the negative selection and the approach without negative selection even results in the highest difference of colony numbers on the control plate compared to selection plates (PP3, approach 2). Nevertheless, not even approach 2, where no negative selections were performed, resulted in the isolation of promising candidates. Single colonies were screened on increasing Cm concentrations in the absence and presence of the ncAA after the second and third positive selections (Figure 55 c). 
No enhancement of growth could be observed upon [3,2]Tpa addition for any of the tested colonies, again indicating incorporation of canonical amino acids. At this point, selections for an aaRS capable of discriminating between the ncAA [3,2]Tpa and its canonical counterpart Trp were discontinued.

4 Conclusion and Outlook

4.1 Evolution of bacterial strains toward methionine analog usage

The aim of the projects presented here was to deepen our knowledge of the genetic code, including its evolution and flexibility, by using ncAAs and their incorporation on a proteome-wide scale as a tool. The unambiguous proteome-wide replacement of the latest addition to the genetic code, Trp, with a number of different analogs was already shown231,491. The second part of this thesis discusses attempts at further alienating one of these strains from life as we know it. The first part of this thesis aims at adapting E. coli to replace another late addition to the genetic code, Met, which is the only other amino acid encoded by a single codon. As Met is not only used in protein biosynthesis but is also a precursor for the major cellular methyl donor, such a strain would not only increase our understanding of the genetic code but also our knowledge of transmethylation reactions and pose a potential platform for intriguing trans-alkylation reactions. Furthermore, the more strains with replaced cAA there are and the better the underlying mechanisms are understood, the more options there are for combining them and creating organisms with multiple amino acid replacements, thus alienating them farther and farther (and making them safer). Such adaptation experiments require strains that are auxotrophic for the amino acid to be replaced. Therefore, an MG1655 derivative auxotrophic under all cultivation conditions (cobalamin present/not present, aerobic/anaerobic, Hcy formation via SAH recycling) was created by knocking out the last two genes in the methionine biosynthesis pathway, metE, and metH. The resulting strain, denoted as ∆metEH::FRT, was cultivated in the presence of Eth/TfM for approximately 4 months with decreasing Met concentrations. Unfortunately, even after more than 50 passages no increase in optical densities, indicating adaptation to the Met analogs, could be observed. 
In comparison, during the adaptation experiments with [3,2]Tp, 4-F-indole, and 5-F-indole, the Trp precursor indole was completely removed from the cultivation media within around 25 to 30 passages491,513. The lack of adaptation observed here was attributed to methionine’s role as a precursor for S-adenosyl methionine and the fact that the bacterial enzyme catalyzing this reaction (MAT) is not very permissive towards Met analogs. Thus, two promiscuous archaeal MAT enzymes were chosen to replace the endogenous variant. However, even after three different approaches, the endogenous metK gene could not be replaced with either of the two archaeal orthologs, suggesting that these orthologs do not support E. coli growth. Due to the complexities of methionine’s role as SAM precursor and concomitant difficulties with the experiments, the aim of the project was amended to adaptation towards utilization of the close structural analog ethionine. A metK point mutation known to increase Eth to SAE conversion in B. subtilis was established in ∆metEH::FRT, yielding the strain metK(I302V). Comparison of this new strain's cultivation characteristics with those of its ancestor revealed that optical densities of the ancestral strain depend much more strongly on the Eth concentration than those of metK(I302V). While as little as 15 µM Eth suffices for metK(I302V) to grow to optical densities of approximately 1.0, higher Eth concentrations do not increase OD600 values by much. In the case of ∆metEH::FRT, on the other hand, low Eth concentrations result in lower optical densities, while higher Eth concentrations result in higher optical densities than those of metK(I302V). A second adaptive laboratory evolution was conducted with the new strain, which was restricted to 31 passages due to time constraints. While one of the six populations exhibited a striking increase in OD600 in passage 21, indicating a possible adaptation event, this effect could no longer be observed in the following passages.
As discussed in chapter 3.1.4 (p. 37), the passage size chosen for a given ALE experiment is crucial to the outcome. The observations made in passage 21 and the following passages suggest that a beneficial mutation might have arisen in this population, but was lost again during
passaging. It is possible that the passage size of 0.02 OD600, which was chosen because it produced satisfying results during the ALE experiments with Trp, might be too small to allow accumulation of beneficial mutations for the adaptation towards Met analog usage in a reasonable time frame. Comparison of optical densities and CFU assessed after a month of cultivation to those taken before the adaptation experiment, revealed a slight increase in general fitness in the presence of Eth. The plots of the 31Eth populations resemble those of metK(I302V) cultivated in minimal media lacking Eth prior to the ALE experiment, while the plots of the 31Met control populations cultivated in the presence of the analog resemble those of the unadapted strain. These observations could be in line with the cells entering a state of stress upon the first confrontation with the synthetic amino acid, resulting in a noticeable drop of CFU after only 8 h of cultivation. After continuously being confronted with Eth for 31 passages, the number of CFU remains stable in the presence of Eth for at least 24 h. The control populations, on the other hand, exhibit the same drop of CFU after approximately 8 h of cultivation in the presence of Eth, indicating a response to Eth in 31Eth populations rather than simply adaptation to prolonged cultivation in minimal media. Characterization of the [3,2]Tp-adapted strain TUB170 revealed a relaxation of the RpoS-mediated general stress response as the key mechanism for adaptation (chapter 1.3.1, p. 22, data not yet published). The general stress response is triggered upon entering stationary phase and provides protection against a number of different stresses514,515. It is also connected to the stringent response, which is triggered by amino acid starvation516,517. 
It would certainly be interesting to investigate whether an inactivation or attenuation of the general stress response would also be beneficial for the adaptation towards Eth utilization and if RpoS might already be involved in the slight relaxation of the 31Eth populations.

Furthermore, it would be interesting to investigate why OD600 increases so strongly upon Eth addition, why this phenomenon can only be observed during the first few passages, and why there is such a large discrepancy in the number of corresponding CFU. It might be worthwhile to continue the ALE experiments with Eth by reviving the “frozen fossils” from passage 31. Although there was no general increase in optical densities after a month of cultivation, the stable number of CFU is promising and gives tentative hope that a complete adaptation to Eth utilization might yet be achievable with the Met-auxotrophic strain established in this study. Taken together, an E. coli strain auxotrophic for Met under all cultivation conditions was established and further improved for ethionine to S-adenosyl ethionine turnover. Finally, tolerance towards the ncAA ethionine could be improved within 31 passages of cultivation in the presence of this Met analog.

88 4 Conclusion and Outlook

4.1 Biocontainment of TUB170

This project aimed to achieve biocontainment of the [3,2]Tp-adapted strain TUB170. While this Trp-auxotrophic strain survives on the non-canonical substrate [3,2]Tp, the complete substitution of the canonical amino acid with the synthetic substrate is only possible in the absence of Trp or its precursor indole. In this strain, incorporation of the ncAA relies on the endogenous TrpRS, which is unable to discriminate between its natural substrate Trp and the close structural analog [3,2]Tpa231. However, by replacing this promiscuous enzyme with one that exclusively recognizes the synthetic amino acid, biocontainment could be achieved, bringing us one step closer to creating safe synthetic organisms unable to survive in a natural environment. Initial attempts to select an orthogonal translation system capable of distinguishing between Trp and [3,2]Tpa focused on the pyrrolysyl system due to its permissiveness towards anticodon mutations. While selection systems for orthogonal translation systems require stop codons, the goal of this project was to incorporate the target ncAA at Trp codons. Thus, after selection of a suitable aaRS, the anticodon of its cognate tRNA would need to be mutated for suppression of Trp codons in the adapted strain TUB170. However, as extensive selection experiments with several different PylRS libraries and a variety of selection conditions did not yield an enzyme capable of [3,2]Tpa incorporation, other systems more suitable for the incorporation of smaller, aromatic side chains were taken into consideration. The first enzyme that came to mind was TrpRS, which already recognizes the target amino acid and even charges a tRNA with the correct anticodon. Changing the active site in a way that enables the enzyme to exclusively recognize [3,2]Tpa, or at least to prefer it over the natural substrate, would result in a biocontained organism.
An alternative selection system for selections with sense codons was established and suitable controls were implemented. However, a sub-optimally randomized TrpRS library, coupled with the propensity of the alternative selection system for false-positive results, led to a reconsideration of this approach. Furthermore, the goal for biocontained organisms is to keep escape frequencies below the NIH-defined threshold. Using the natural system as a starting point might encourage the reversion of introduced mutations to the wild type and might thus not be the ideal strategy to obtain stable biocontainment. Therefore, the spotlight fell on a related enzyme that also incorporates an aromatic amino acid and for which a very successful OTS already exists: the M. jannaschii TyrRS/tRNATyr pair. Selection experiments with a MjTyrRS-derived library yielded two promising candidates that showed potential in two different screening assays. Unfortunately, MS analysis revealed Gln incorporation as the main product, which is a common phenomenon during stop-codon suppression in the absence of an OTS503. The ambiguous interpretation of smaller peaks in the mass spectrum, coupled with the promising results of the screening assays, led to the hypothesis that these mutants do incorporate the target amino acid [3,2]Tpa, but not efficiently enough to outcompete background suppression. However, efforts to improve variant 33 by creating an error-prone PCR-based library did not succeed. Finally, the amino acid [3,2]Tpa was synthesized enzymatically from its precursor [3,2]Tp, as having to convert the precursor to the corresponding amino acid in addition to expressing the OTS and selection markers might have placed too much stress on the host cells. Furthermore, as the target ncAA had to be produced intracellularly from the precursor, there might not have been enough of the target ncAA present at the onset of the selections to drive them towards [3,2]Tpa-incorporating variants.
This might have been the reason for the lack of success of the previous selection experiments. Reassessment of the promising variants from previous selection experiments in the presence of the amino acid revealed that these mutants likely do not incorporate [3,2]Tpa. A final round of selection experiments, this time in the presence of the amino acid rather than the precursor, failed to produce any promising candidates. It is worth mentioning that [3,2]Tpa and its precursor [3,2]Tp are sensitive to light, temperature, and oxidation, resulting in the formation of polymers518,519. Furthermore, the low water solubility of the indole analog resulted in low yields of the enzymatic reaction. Therefore, only small amounts of the ncAA were available for selection experiments, and polymerization of the amino acid during cultivation might have further decreased the concentration available for stop-codon suppression. Thus, it is possible that extensive selection with larger amounts of [3,2]Tpa might yet yield the desired TpaRS, as a large excess of ncAAs is known to be conducive to ncAA incorporation308. However, taken together, the results outlined above suggest that the creation of an enzyme capable of discriminating between Trp and [3,2]Tpa is unlikely to succeed. After all, this analog was chosen for the adaptation experiment precisely for its close structural resemblance to Trp and its good compatibility with the host protein expression system. This close resemblance and its concomitant minimal invasiveness in the proteome are likely the features that drove the success of the adaptation and at the same time represent the Achilles’ heel of the project described here. Nevertheless, the importance of biocontainment has steadily increased over the years, which is perhaps more obvious now than ever. The need for effective containment strategies to prevent the spread of SARS-CoV-2 has become widely recognized by the general public. However, the health sector is only one of numerous areas where biocontainment is critical, especially with the increasing application of GMOs as sustainable industrial workhorses. For example, bio-solar cell factories for the fixation of CO2 to value-added chemicals were recently developed520,521.
Such applications outside of a controlled laboratory environment increase the risk of GMOs escaping into nature, where the consequences cannot be foreseen. Therefore, Lee and coworkers have recently developed a biocontainment system for α-farnesene production by knocking out the CO2-concentrating mechanism. These engineered microbes are no longer able to survive under ambient CO2 concentrations and depend on high CO2 concentrations for survival and α-farnesene production. Farnesene is a renewable hydrocarbon building block that serves as a precursor for high-performance polymers and is a bio-jet fuel candidate522. Another area where biocontainment plays a significant role is agriculture. Our growing world population is accompanied by an increasing demand for food without an expansion of arable land523,524. To avoid further loss of our natural ecosystems, it is thus highly desirable to enhance food production without employing harmful chemicals. One promising approach is the utilization of plant growth-promoting rhizobacteria (PGPR)525, some of which are already commercially available526,527. A few methods have been developed to prevent them from spreading throughout the environment and colonizing off-target hosts. A synthetic signaling circuit enables plant-host-specific communication and limits the expression of plant growth-promoting genes to the presence of the desired target528. That same circuit can also be wired to activate essential genes529,530 or control genetic “kill switches”531,532. Genetic manipulation of plants relies on Agrobacterium tumefaciens as a transformation tool. Hence, prior to the release of transgenic plants, the bacterium must be completely eradicated from the plant, which is not always easy. Biocontainment strategies have the potential to facilitate Agrobacterium removal, but no truly satisfying systems have been generated to date533. Efforts towards Agrobacterium biocontainment include encoding the production of the sugar levan, which is toxic for Gram-negative bacteria, as a kill switch. However, a single mutation suffices to inactivate the toxic gene534. Another approach revolves around controlling the expression of the gene required for T-DNA transfer (virE2) by expressing it from an inducible promoter. However, this system is not completely tight and T-DNA transfer does not entirely depend on virE2 expression535.


More robust biocontainment could likely be achieved by combining multiple approaches, according to the principle “the farther the safer”. Thus, even though various biocontainment strategies for different fields and organisms have emerged in recent years, further improvement and the development of new strategies would be beneficial, and synthetic biology approaches show great promise here. For example, Rubini and Mayer recently reported a system that relies on the catalysis of abiotic reactions by biocompatible Pd and Ru catalysts. They employed an E. coli strain addicted to the ncAA 3-nitro-L-tyrosine (3nY) in the presence of ampicillin and managed to localize growth by spotting the catalyst on agar plates containing Amp and allyloxycarbonyl-protected 3nY (alloc-3nY). The strain depends on alloc-3nY deprotection by the catalyst for growth. Further, they were able to reuse a second catalyst by entrapping Pd nanoparticles in polystyrene beads, thus creating a modular tool for biocontainment that can be transferred to other organisms and combined with other biocontainment strategies536.


5 Materials and Methods

5.1 Materials

5.1.1 Chemicals

Standard chemicals were purchased from Carl Roth GmbH (Karlsruhe, Germany), Merck (Darmstadt, Germany), VWR International GmbH (Darmstadt, Germany), or Sigma-Aldrich (St. Louis, USA, now Merck). The non-canonical amino acids and precursors were obtained from the following vendors or academic partners: L-β-thieno[3,2-b]pyrrol ([3,2]Tp) was obtained from Patrick Durkin (Technische Universität Berlin, Germany), trifluoromethionine (TfM) was obtained from Tobias Schneider (Technische Universität Berlin, Germany), and ethionine (Eth) was purchased from Sigma-Aldrich (Taufkirchen, Germany).

5.1.2 Media and supplements

The media were prepared with dH2O and autoclaved at 120 °C for 20 min. Antibiotics were sterilized by filtration (Ø 0.33 µm) and subsequently added to the media. Ampicillin was added after the media had cooled below 50 °C. Super Optimal Broth with Catabolite Repression (SOC) medium was prepared freshly from Super Optimal Broth (SOB) medium by supplementation with 20 mM sterile glucose. LB agar was prepared by adding 1.5 % agar-agar to LB medium prior to autoclaving. 2X YT medium was purchased from Carl Roth. MOPS19 was prepared analogously to NMM19, with K2HPO4 reduced to 1 mM and KH2PO4 replaced by 40 mM 3-morpholinopropane-1-sulfonic acid (MOPS). Both NMM19 and MOPS19 were sterilized by filtration, and NMM agar plates were poured by mixing 2x concentrated NMM medium with 3 % (w/w) agar in dH2O. All components were purchased from Carl Roth GmbH (Karlsruhe, Germany) or Sigma-Aldrich (Taufkirchen, Germany).

Media

LB medium
    Bacto-tryptone    10 g/L
    Yeast extract     5 g/L
    NaCl              10 g/L

Trace elements (1000X)
    CuSO4             10 mg/L
    ZnCl2             10 mg/L
    MnCl2             10 mg/L
    (NH4)2MoO4        10 mg/L

SOB
    Yeast extract     5 g/L
    Tryptone          20 g/L
    NaCl              10 mM
    KCl               2.5 mM
    MgCl2             10 mM
    MgSO4             10 mM


ZYP-5052

Stock solution           Component         Final concentration    Volume
ZY                       Tryptone          1 %                    928 mL
                         Yeast extract     0.5 %
20 x P                   Na2HPO4           50 mM                  50 mL
                         KH2PO4            50 mM
                         (NH4)2SO4         25 mM
50 x 5052                Glycerol          0.5 %                  20 mL
                         Glucose           0.05 %
                         α-lactose         0.2 %
1 M MgSO4                MgSO4             2 mM                   2 mL
1000 x Trace elements    Trace elements    0.2 x                  0.2 mL

Component                          NMM19         MOPS19
(NH4)2SO4                          7.5 mM        7.5 mM
NaCl                               8.5 mM        8.5 mM
MOPS                               -             40 mM
KH2PO4                             22 mM         -
K2HPO4                             50 mM         1 mM
MgSO4                              1 mM          1 mM
D-Glucose                          20 mM         20 mM
All amino acids (except target)    50 mg/L       50 mg/L
Ca2+                               1 µg/mL       1 µg/mL
Fe2+                               1 µg/mL       1 µg/mL
Trace elements                     0.01 µg/mL    0.01 µg/mL
Thiamine                           10 µg/mL      10 µg/mL
Biotin                             10 µg/mL      10 µg/mL

Supplement         Conc. of stock solution    Final concentration
Ampicillin         100 mg/mL                  100 µg/mL
Kanamycin          50 mg/mL                   50 µg/mL
Chloramphenicol    37 mg/mL                   37 µg/mL
Gentamicin         10 mg/mL                   10 µg/mL
Tetracycline       10 mg/mL                   10 µg/mL
Streptomycin       50 mg/mL                   50 µg/mL
IPTG               1 M                        0.5-1.0 mM
Arabinose          20 %                       0.002-0.2 % (w/v)


5.1.3 Strains

Strain: E. coli G2748
Genotype: E. coli K12 F- λ- ilvG- rfb-50 rph-1 ΔtrpLEDC::FRT+ ΔtnaA::FRT+
Source: Dr. Volker Doering, Dr. Michael Hoesl

Strain: E. coli BL21 (DE3)
Genotype: E. coli B F- ompT hsdS(rB- mB-) dcm+ Tetr gal endA Hte
Source: Budisa group

Strain: E. coli NEB10-beta
Genotype: E. coli K12 Δ(ara-leu) 7697 araD139 fhuA ΔlacX74 galK16 galE15 e14- ϕ80dlacZΔM15 recA1 relA1 endA1 nupG rpsL (StrR) rph spoT1 Δ(mrr-hsdRMS-mcrBC)
Source: Budisa group

Strain: E. coli MG1655
Genotype: E. coli K12 F- λ- ilvG- rfb-50 rph-1
Source: Budisa group, wild-type K12 strain [M347]

Strain: E. coli G2748 ∆trpS
Genotype: E. coli K12 F- λ- ilvG- rfb-50 rph-1 ∆trpLEDC::FRT ∆tnaA::FRT ∆trpS::Kan pEVL648_trpS
Source: Master’s thesis

Strain: E. coli G2748 ∆trpS pEVL648_Aa_trpS
Genotype: E. coli K12 F- λ- ilvG- rfb-50 rph-1 ∆trpLEDC::FRT ∆tnaA::FRT ∆trpS::Kan pEVL648_Aa_trpS
Source: This study

Strain: E. coli CB149
Genotype: E. coli K12 C>DtrpEDCBA2 hsdR514 (rk- mk+) supE44 supF58 lacY1 lacU169 galT22 P>ID66 pEBA-10
Source: Kawasaki et al.537

Strain: E. coli JW3805-1
Genotype: E. coli K12 F- λ- rph-1 hsdR514 Δ(araD-araB)567 ΔlacZ4787(::rrnB-3) ΔmetE774::kan Δ(rhaD-rhaB)568
Source: CGSC

Strain: E. coli JW3979-1
Genotype: F-, Δ(araD-araB)567, ΔlacZ4787(::rrnB-3), λ-, rph-1, Δ(rhaD-rhaB)568, ΔmetH786::kan, hsdR514
Source: CGSC

Strain: E. coli ∆metEH::FRT
Genotype: E. coli K12 F- λ- ilvG- rfb-50 rph-1 ΔmetE774::FRT ΔmetH786::FRT
Source: This study

Strain: E. coli metK(I302V)
Genotype: E. coli K12 F- λ- ilvG- rfb-50 rph-1 ΔmetE774::FRT ΔmetH786::FRT metK(I302V)
Source: This study


5.1.4 Plasmids

Name    Ori    Resistance marker    Source
pJZ_Trp_Mm_PylS(Y349F)_Library_pylT_AmpR    p15a    Amp    This study
pBU26'_AraC_pAra_Barnase2xTAG_pylT_ColE1    ColE1    Kan    This study
pPAB26'_cat(Q98TAG,D181TAG)_MmPylT rev_ColE1    ColE1    Kan    This study
pET28a_sfGFP(R2TAG)_His    pBR322    Kan    Budisa group
pQE80L_EcTrpRS(Ala)_CmR    ColE1    Cm    This study
pQE80L_EcTrpRS_Library_CmR    ColE1    Cm    This study
pEVL648_3xSceI_AmpR_AaWRS    p15a    Amp    This study
pKD46    OriR101    Amp    Budisa group
pParaI_SceI    p15a    Genta    Budisa group
pEVL648_3xSceI_AmpR_EcWRS    p15a    Amp    Master’s thesis
pJZ_Trp AmpR_EcWRS    p15a    Amp    This study
pJZ_Trp AmpR_ecWRS(K3TAG)    p15a    Amp    This study
pQE80L_EcTrpRS_CmR    ColE1    Cm    This study
pBU181GK_MjTyrRSlib-Azu    pUC    Amp    Budisa group
pPAB26'_cat(Q98TAG, D181TAG) MjtRNATyropt    p15a    Kan    Budisa group
pBU26'_AraC_pAra_Barnase2xTAg_mjtRNATyrCUA_opt    p15a    Genta    Budisa group
pBU16_MjTyrRS_tRNA_33    p15a    Amp    This study
pBU16_MjTyrRS_tRNA_39    p15a    Amp    This study
pBU181GK_33_epLibrary    pUC    Amp    This study
pBU181GK_5FWRS_epLibrary    pUC    Amp    This study
pPAB26'_cat(Q98TAG,D181TAG)_MjtRNA_Tyropt_sfGFP    p15a    Kan    Budisa group
pCP20    OriR101    Amp    Budisa group
pQE80L_EcMAT    ColE1    Amp    This study
pQE80L_MjMAT    ColE1    Amp    This study
pQE80L_SsMAT    ColE1    Amp    This study
pEVL648_3xSceI_MjMAT    p15a    Cm    This study
pEVL648_3xSceI_SsMAT    p15a    Cm    This study
pKD4    R6Kg    Kan/Amp    Budisa group
pCAGO    pSC101    Amp    GenScript486


5.1.5 Primers

All primers were purchased from Sigma-Aldrich (Steinheim, Germany), resuspended in ddH2O to a stock concentration of 100 µM, and stored at -20 °C. From these stocks, the primers were diluted to a working concentration of 10 µM or 20 µM. Primers were generally purchased in desalted form; only primers exceeding about 50 bp in length were purchased in HPLC-purified form.
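The dilution from the 100 µM stock to a working concentration follows C1·V1 = C2·V2. The following is a minimal sketch of that arithmetic; the function name and the final volume of 100 µL are illustrative assumptions and are not taken from the protocol.

```python
def dilution_volumes(stock_uM, working_uM, final_volume_uL):
    """Return (stock volume, water volume) in µL, using C1*V1 = C2*V2."""
    if not 0 < working_uM <= stock_uM:
        raise ValueError("working concentration must be positive and not exceed the stock")
    stock_uL = working_uM * final_volume_uL / stock_uM
    return stock_uL, final_volume_uL - stock_uL


# Hypothetical example: 100 µL of working solution from the 100 µM stock.
for working in (10, 20):
    stock_uL, water_uL = dilution_volumes(100, working, 100)
    print(f"{working} µM: {stock_uL:.1f} µL stock + {water_uL:.1f} µL ddH2O")
```

The same helper applies to any other stock-to-working dilution, provided both concentrations are given in the same unit.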

Name    Sequence

MmPylRS library
Mm_lib_backb_fwd    GAAGGAGCGTCTCCGGTTTCGGGCTCGAACGCCT
Mm_lib_backb_rev    GAAGGAGCGTCTCCCTCTTCGAGGTGTTCTTTGCCG
Mm_lib_N346_C348_fwd    GAAGGAGCGTCTCGAGAGTTTACCATGCTGNNSTTCNNSCAGATGGGATCGGGA
Mm_lib_V401_fwd    TCAACCGCGTCTCGGAACTTTCCTCTGCANNSGTCGGACCCATACCG
Mm_lib_V401_rev    TCAACCGCGTCTCCGTTCCAGGTCTCCGTGCATTACATC
Mm_lib_W417_rev    GAAGGAGCGTCTCGAACCTGCCCCTATSNNGGGTTTATCAATACC

ColE1 to sel. plasmids
ColE1_into_560_backb_fwd    CAAGGTCTCGGACTAGCATGAGCGGATACATATTTG
ColE1_into_560_backb_rev    CAAGGTCTCGCCTACACTCCGCTAGCGCTG
ColE1_into_560_ins_fwd    CAAGGTCTCGTAGGCCGCGTTGCTGGCG
ColE1_into_560_ins_rev    CAAGGTCTCGAGTCATGACCAAAATCCCTTAACGTGAGTTTTCG
ColE1_into_645_backb_fwd    CGTGGTCTCGGACTCACTCCGCTAGCGCTG
ColE1_into_645_backb_rev    CGTGGTCTCGCCTAAGCATGAGCGGATACATATTTG

EcTrpRS to Ala
ecWRS_F7A_G9A_fwd    CACGGTCTCGCTAAGCCCATCGTTGCAAGTGCAGCACAGCCCTCAG
ecWRS_lib_rev_new    CACGGTCTCGTTAGTCATGGATCCGTGATGGTGATGGTG
ecWRS_M132A_D135A_rev    CACGGTCTCGTTGATACAGCAGGATTGCCGCTGCTGCCAGCACCGGATAGTC
ecWRS_V146A_fwd    CACGGTCTCGTCAAACTAATCTGGTACCGGCAGGTGAAGACCAGAAAC

EcTrpRS library
ecWRS_F7_G9_fwd    CACGGTCTCGCTAAGCCCATCGTTNNSAGTNNSGCACAGCCCTCAG
ecWRS_lib_rev_new    CACGGTCTCGTTAGTCATGGATCCGTGATGGTGATGGTG
ecWRS_M132_D135_rev    CACGGTCTCGTTGATACAGCAGGATSNNCGCTGCSNNCAGCACCGGATAGTC
ecWRS_V144_V146_fwd    CACGGTCTCGTCAAACTAATCTGNNSCCGNNSGGTGAAGACCAGAAAC

AaTrpRS rescue plasmid
pEVL648_backb_fwd    AATTCGTCTCCGAGCGGCCGCTAGGGATAAC
pEVL648_backb_rev    AATTCGTCTCCATTTAATTAACCTCCTTAGTTTAAACCTAGGC
pQE80L_aaTrpRS_fwd    AATTCGTCTCGAAATGCGAATAGTTAGCGGAATGAG
pQE80L_aaTrpRS_rev    AATTCGTCTCGGCTCAGGGAAGGTTCATGG

Control TrpRS sel. system
ecWRS(K3amber)_fwd    TCACGGATCCATGACTTAGCCCATCGTTTTTAGTGG
ecWRS(K3amber)_rev    CTAATTTCTAGATTACGGCTTCGCCACAAAACC
ecWRS_into_pJZ_fwd    CTGTAATAATCTAGAGTCGACCTGCAG
ecWRS_into_pJZ_rev    AATTTCTAGATTACGGCTTCGCC

V33/39 to pBU16
33_39_fwd    TGAGGGTCTCCCATATGGATGAATTTGAAATGATTAAACGC
33_39_rev    TGAGGGTCTCCAGTTACAGACGTTTACGAATC
pBU16Mj_fwd    TGAGGGTCTCGAACTGCAGTGATCATCTGACCG
pBU16Mj_rev    TGAGGGTCTCGTATGGGATTCCTCAAAGCGTA

error-prone PCR libraries
ep_backb18_fwd    GACAGGTCTCGTGCGCTACGCTTATCAGGCC
ep_backb18_rev    GACAGGTCTCGAACGTATAACGGCGTATGATATTAAAGC
ep_insert_fwd    GACAGGTCTCCCGTTGTTTACGCTTTAGGAGATATAC
ep_insert_rev    GACAGGTCTCCCGCATCAGGCAATTTAGCGTT

∆metEH verification
C1_metE    AATAAACTTGCCGCCTTC
C2_metE    GTATTACCACCCGGTTTG
C5_metE    ATCCAGTCCCAAACGTTCCT
C1_metH    ATCTGGGTTGAGCGTG
C2_metH    GCGCCCTGTTTGTTG
C5_metH    TTTATTCACTTCCCACGAGC

metK orthologs
bb_MAT_pQE_fwd    ATCAAGCTTAATTAGCTGAGC
bb_MAT_pQE_rev    ATGGATCCGTGATGGTG
Ec MAT_fwd    GCTTCTGGTCTCGCCATGGCAAAACACCTTTTTAC
Ec MAT_rev    GCTCTTGGTCTCGAGTTACTTCAGACCGGCAGCATC
Mj MAT_pQE_fwd    ATGGATCCTCAATGCGCAACATCATCG
Mj MAT_pQE_rev    ATCAAGCTTTTAAAAGGTGGTCACTTTACC
Ss MAT_pQE_fwd    TCTGGTCTCTCTCAATGCGAAATATTAACGT
Ss MAT_pQE_rev    ATCAAGCTTTTAAAACAGGGTCGCTTTAC

metK rescue plasmids
MjMAT_opt FWD    GATAGGTCTCCAATGCGCAACATCATCGTG
MjMAT_opt REV    GTGTGGTCTCCGCTTAAAAGGTGGTCACTTTACCT
SsMAT_opt FWD    TACAGGTCTCCAATGCGCAATATTAACGTGC
SsMAT_opt REV    TACAGGTCTCCGCTTAAAACAGGGTCGCTTTAC
pEVL648_3xSceI_FWD    TATAGGTCTCGAAGCGGCCGCTAGGG
pEVL648_3xSceI_REV    GAGTGGTCTCGCATTTAATTAACCTCCTTAGTTTAAACCTAGG

metK KO approach 1
metK KO_pKD4_fwd    ATCCACACAACAGTTTGAGCTAACCAAATTCTCTTTAGGTGATATTAAATTAGGCTGGAGCTGCTTC
metK KO_pKD4_rev    GGCCTTTGAACGCAGGTGAAGAAAGATTACTTCAGACCGGCAGCATCGCGTATCCTCCTTAGTTCCTATTCC
C1_metK_KO    GAGCTAACCAAATTCTCTTTAGGTG
C2_metK_KO    TGAACGCAGGTGAAGAAAGATTAC
C5_metK    TTTGATGGATCTTTACCAGAGAATGC
C4_Kan    CAGTCATAGCCGAATAGCCT

metK KO approach 2
EcMAT ed cassette fwd    TATGGTCTCCTATGGCAAAACACCTTTTTACGT
EcMAT ed cassette rev    TAAGGTCTCCGATTACTTCAGACCGGCAGC
L homo fwd    CCGAACGTAAGTGTGAAAGTTC
L homo rev    TAAGGTCTCCCATATTTAATATCACCTAAAGAGAATTTGGT
marker N20PAM rev    TATGGTCTCGGACCTTACTTCGGTTCGATGGACTATTACGCCCCGCCCTGC
MjMAT ed cassette fwd    TATGGTCTCCTATGCGCAACATCATCGTG
MjMAT ed cassette rev    TAAGGTCTCCGATTAAAAGGTGGTCACTTTACCT
R homo fwd    TATGGTCTCGGGTCTTTCTTCACCTGCGTTCAAAG
R homo rev    CGAAGTAATCTGGAATTCATCTGCAATAAACG
R short marker fwd    TATGGTCTCGAATCTTTCTTCACCTGCGTTCAAAGGCCAGCCTCGCGCTGGCTGGCGAAAATGAGACGTTGATCGGC
SsMAT ed cassette fwd    TATGGTCTCCTATGCGCAATATTAACGTGCAG
SsMAT ed cassette rev    TAAGGTCTCCGATTAAAACAGGGTCGCTTTACC

metK(I302V) (approach 3)
I302V_fwd    GCAGTCGGCGTGGCTG
I302V_rev    CAGCCACGCCGACTGC
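Several library primers in the list above carry the degenerate codon NNS (written SNN on reverse primers), where N is any base and S is C or G. One NNS codon therefore covers 4 x 4 x 2 = 32 codons, which encode all 20 canonical amino acids plus the amber stop codon TAG. The sketch below estimates the theoretical library size from the primer sequences. It assumes that every NNS/SNN motif randomizes one codon independently and that all randomized positions are combined in a single library; the function names are illustrative.

```python
import re


def randomized_codons(primer):
    """Count NNS codons (forward primers) and SNN codons (reverse primers)."""
    return len(re.findall(r"NNS|SNN", primer.upper()))


def library_size(primers):
    """Return (randomized positions, DNA-level variants, protein-level variants)."""
    n = sum(randomized_codons(p) for p in primers)
    return n, 32 ** n, 20 ** n


# Degenerate MmPylRS library primers, copied from the primer list.
mm_pylrs_primers = [
    "GAAGGAGCGTCTCGAGAGTTTACCATGCTGNNSTTCNNSCAGATGGGATCGGGA",  # Mm_lib_N346_C348_fwd
    "TCAACCGCGTCTCGGAACTTTCCTCTGCANNSGTCGGACCCATACCG",         # Mm_lib_V401_fwd
    "GAAGGAGCGTCTCGAACCTGCCCCTATSNNGGGTTTATCAATACC",           # Mm_lib_W417_rev
]

positions, dna_variants, protein_variants = library_size(mm_pylrs_primers)
print(positions, dna_variants, protein_variants)
```

Under these assumptions the theoretical diversity grows by a factor of 32 per randomized codon, so the number of transformants needed to cover a library rises steeply with each additional position.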

5.1.6 Biomolecular reagents, enzymes, and kits

Reagents
ROTI®Quant    Carl Roth (Karlsruhe, Germany)
Ethidium bromide solution 1 %    Carl Roth (Karlsruhe, Germany)
Unstained Protein Marker    Thermo Fisher Scientific (Waltham, USA)
dNTP mix    Thermo Fisher Scientific (Waltham, USA)
dCTP    Sigma-Aldrich (St. Louis, USA)
dTTP    Sigma-Aldrich (St. Louis, USA)
GeneRuler DNA Ladder Mix    Thermo Fisher Scientific (Waltham, USA)

Commercial buffers
6 x Loading Dye    Thermo Fisher Scientific (Waltham, USA)
FastDigest® Buffer    Thermo Fisher Scientific (Waltham, USA)
High Fidelity Buffer    New England Biolabs (Ipswich, USA)
CutSmart buffer    New England Biolabs (Ipswich, USA)
T4 DNA Ligase Buffer    Thermo Fisher Scientific (Waltham, USA)
Q5® Reaction Buffer    New England Biolabs (Ipswich, USA)
Phusion HF Buffer    Thermo Fisher Scientific (Waltham, USA)
DreamTaq™ Buffer (10X)    Thermo Fisher Scientific (Waltham, USA)
DreamTaq™ Green Buffer    Thermo Fisher Scientific (Waltham, USA)

Enzymes
Phusion High-Fidelity DNA Polymerase    Thermo Fisher Scientific (Waltham, USA)
Q5® High-Fidelity DNA Polymerase    New England Biolabs (Ipswich, USA)
T4 DNA Ligase    Thermo Fisher Scientific (Waltham, USA)
DreamTaq™ DNA Polymerase    Thermo Fisher Scientific (Waltham, USA)
Taq DNA Polymerase    AG Budisa
Restriction Enzymes    New England Biolabs (Ipswich, USA)
FastDigest Restriction Enzymes    Thermo Fisher Scientific (Waltham, USA)
Lysozyme    Carl Roth (Karlsruhe, Germany)
DNase    Carl Roth (Karlsruhe, Germany)
RNase    Carl Roth (Karlsruhe, Germany)

Kits
GeneJET™ Plasmid Miniprep Kit    Thermo Fisher Scientific (Waltham, USA)
GeneJET™ Plasmid Midiprep Kit    Thermo Fisher Scientific (Waltham, USA)
GeneJET™ Gel Extraction Kit    Thermo Fisher Scientific (Waltham, USA)
GeneJET™ PCR Purification Kit    Thermo Fisher Scientific (Waltham, USA)

5.1.7 Buffers and Solutions

All buffers were prepared with dH2O or ddH2O (MilliQ, Merck-Millipore).

Agarose gel electrophoresis:

Name and composition

50X TAE buffer
    2 M Tris
    2 M acetic acid
    10 % (v/v) 0.5 M EDTA, pH 8.0

6X DNA loading dye
    0.25 % bromophenol blue
    0.25 % xylene cyanol
    30 % glycerol

TE buffer
    10 mM Tris, pH 7.4
    1 mM EDTA, pH 8.0


Polyacrylamide gel electrophoresis:

Name and composition

5X SDS loading dye
    80 mM Tris, pH 6.8
    10 % SDS
    12.5 % glycerol
    4 % (v/v) mercaptoethanol
    0.2 % (w/v) bromophenol blue

SDS running buffer
    190 mM glycine
    25 mM Tris
    3.5 mM SDS

Resolving gel
    380 mM Tris-HCl, pH 8.8
    15 % acrylamide / bis-acrylamide (37.5:1)
    0.1 % SDS
    0.05 % APS
    0.05 % TEMED

Stacking gel
    125 mM Tris-HCl, pH 6.8
    5 % acrylamide / bis-acrylamide (37.5:1)
    0.1 % SDS
    0.05 % APS
    0.17 % TEMED

Coomassie staining solution
    1 g Coomassie Brilliant Blue R-250
    500 mL ethanol
    100 mL glacial acetic acid
    ad 1 L dH2O

Protein Purification:

Name and composition

IMAC

Lysis/Binding buffer
    50 mM sodium phosphate (NaH2PO4 + Na2HPO4 1:5)
    300 mM NaCl
    pH 8.0

Wash buffer
    50 mM sodium phosphate (NaH2PO4 + Na2HPO4 1:5)
    500 mM NaCl
    20 mM imidazole
    pH 8.0

Elution buffer
    50 mM sodium phosphate (NaH2PO4 + Na2HPO4 1:5)
    300 mM NaCl
    500 mM imidazole
    pH 8.0

Storage buffer
    50 mM sodium phosphate (NaH2PO4 + Na2HPO4 1:5)
    100 mM NaCl
    10 % glycerol
    pH 8.0

Stripping buffer
    20 mM sodium phosphate
    0.5 M NaCl
    50 mM EDTA
    pH 7.4

IEX

Wash buffer
    20 mM Tris
    10 mM NaCl
    pH 8.0

Elution buffer
    20 mM Tris
    1 M NaCl
    pH 8.0

Storage buffer
    20 mM Tris
    10 mM NaCl
    10 % glycerol
    pH 8.0

TrpBA Expression

Wash buffer
    50 mM Tris
    150 mM NaCl
    pH 7.8

Buffer T
    50 mM Tris
    5 mM EDTA
    10 mM β-mercaptoethanol
    0.1 mM pyridoxal phosphate (PLP)
    pH 7.8

5X Buffer B
    250 mM Bicine
    5 mM EDTA
    0.1 mM PLP
    5 mM DTT
    pH 7.8


5.1.8 Miscellaneous

Miscellaneous
HisTrap HP    GE Healthcare Life Sciences (München, Germany)
HiTrap Q HP    GE Healthcare Life Sciences (München, Germany)
Microsep™ Advance    Pall Laboratory (Dreieich, Germany)
PD-10 desalting columns    GE Healthcare Life Sciences (München, Germany)
UV cuvettes, z = 8.5    Brand (Wertheim, Germany)
Rotilabo® UV cuvettes, 1.6 mL    Carl Roth (Karlsruhe, Germany)
Dialysis membrane ZelluTrans, MWCO 3500    Carl Roth (Karlsruhe, Germany)
Electroporation cuvettes    VWR (Darmstadt, Germany)
Parafilm®    Pechiney Plastic Packaging (Chicago, USA)
Bottle top vacuum filtration systems, 0.22 µm    VWR (Darmstadt, Germany)
MF™-Membrane Filters, 0.025 µm VSWP    Merck (Darmstadt, Germany)
Rotilabo® Syringe Filters, 0.22 µm    Carl Roth (Karlsruhe, Germany)
Rotilabo® Syringe Filters, 0.45 µm    Carl Roth (Karlsruhe, Germany)

5.1.9 Technical equipment

Technical equipment

Balances:
TE 1502S    Sartorius (Göttingen, Germany)
GR-120    A&D (San Jose, CA, USA)
Mettler PE 3600 Deltarange    Mettler Toledo (Gießen, Germany)

Centrifuges:
Avanti J-26 XP    Beckman Coulter (Krefeld, Germany)
Centrifuge 5810 R    Eppendorf AG (Hamburg, Germany)
Centrifuge 5418 R    Eppendorf AG (Hamburg, Germany)
Heraeus Fresco 17    Thermo Fisher Scientific (Waltham, MA, USA)
MiniSpin plus    Eppendorf AG (Hamburg, Germany)

Gel electrophoresis:
Horizontal agarose gel system    Factory of the Max-Planck Institute for Biochemistry (Martinsried, Germany)
Electrophoresis unit    Hoefer Scientific Instruments (Holliston, MA, USA)
Vertical SDS-gel system    Factory of the Max-Planck Institute for Biochemistry (Martinsried, Germany)

Incubators, mixers & shakers:
Ecotron    Infors HT (Einsbach, Germany)
Multitron    Infors HT (Einsbach, Germany)
Incubator series B, KB    Binder (Tuttlingen, Germany)

Liquid chromatography:
ÄKTA purifier    GE Healthcare Life Sciences (München, Germany)
Peristaltic pump P1    Pharmacia Biotech (now: GE Healthcare Life Sciences, München)

Mass spectrometry:
Agilent 6530 Accurate-Mass Q-TOF    Agilent (Santa Clara, CA, USA)
LTQ Orbitrap XL    Thermo Fisher Scientific (Waltham, USA)

Spectroscopy:
Ultrospec 6300 pro    Amersham Biosciences (now: GE Healthcare, München, Germany)
BioPhotometer plus    Eppendorf AG (Hamburg, Germany)
Microplate reader Infinite M200    Tecan (Männedorf, Switzerland)

Thermocyclers:
Mastercycler Gradient    Eppendorf AG (Hamburg, Germany)
Peqstar 2x Gradient    Peqlab (Erlangen, Germany)

Thermomixer:
Thermomixer compact    Eppendorf AG (Hamburg, Germany)
Thermomixer 5437    Eppendorf AG (Hamburg, Germany)
Mixing Block MB-102    Bioer Technology (Binjiang, China)

Other devices:
Sonopuls HD 3200    Bandelin (Berlin, Germany)
Sonotrodes MS72, KE76    Bandelin (Berlin, Germany)
Microfluidizer M-110L    Microfluidics (Newton, MA, USA)
Power supply Power Pack P25 T    Biometra (Jena, Germany)
Power supply Consort EV 261 and E 143    Sigma-Aldrich (Taufkirchen, Germany)
Power supply Consort E 143    Sigma-Aldrich (Taufkirchen, Germany)
Orbital shaker Rotamax 120    Heidolph (Schwabach, Germany)
Microwave KOR-6305    Daewoo (Butzbach, Germany)
IKA Combimag RET    IKA (Staufen, Germany)
Vortex Genie™    Bender & Hobein AG (Zürich, Switzerland)
Ice machine Scotsman AF 80    Scotsman (Vernon Hills, IL, USA)
pH-Meter S20-SevenEasy™    Mettler Toledo (Gießen, Germany)
Gel-documentation system Felix 2050    Biostep (Jahnsdorf, Germany)
Scanner ViewPix 700    Biostep (Jahnsdorf, Germany)
Water bath VWB 12    VWR (Darmstadt, Germany)
MicroPulser™    Bio-Rad Laboratories GmbH (München, Germany)


5.2 Methods

5.2.1 Polymerase chain reaction (PCR)

The PCR reactions were always prepared on ice and the thermocycler was preheated to 98 °C to avoid side products due to unspecific interactions and amplification during heating of the cycler. In the case of longer primers prone to the formation of secondary structures, DMSO was often supplied to the reaction mixture to further avoid side product formation. DMSO decreases the melting temperature (Tm), thus destabilizing weaker hydrogen bonding in unspecific primer hybridization. For a standard PCR reaction, the following program was used:

Step                      Temperature    Time
Initial denaturation      98 °C          1 min
30 cycles:
    Denaturation          98 °C          10 s
    Annealing             55-72 °C       20 s
    Elongation            72 °C          30 s/kb
Final extension           72 °C          2 min
Store                     8 °C           forever

Annealing temperatures depend on the size and sequence of the primers and were calculated using the Tm calculator from New England Biolabs (NEB). When using primers with overhangs, a PCR program with two different cycles was used. In the beginning, only a part of the primer will anneal to the template. Only after a few rounds of amplification will there be a template that includes the overhang, so that the whole primer can anneal. Therefore, to accommodate both annealing temperatures (for only part of the primer and the whole primer), a program with two cycles with different annealing temperatures was written. This PCR program was written as follows:

Step                      Temperature    Time
Initial denaturation      98 °C          1 min
5 cycles:
    Denaturation          98 °C          10 s
    Annealing             55-72 °C       20 s
    Elongation            72 °C          30 s/kb
25 cycles:
    Denaturation          98 °C          10 s
    Annealing             55-72 °C       20 s
    Elongation            72 °C          30 s/kb
Final extension           72 °C          2 min
Store                     8 °C           forever
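The two-stage program can be written down as a small data structure in which the elongation time is derived from the amplicon length at 30 s/kb. This is a sketch only: the amplicon length of 1500 bp and the two annealing temperatures (58 °C for the partially annealing primer, 68 °C for the full primer) are hypothetical placeholders for values that would come from the Tm calculator, and the function names are illustrative.

```python
def elongation_time_s(amplicon_bp, s_per_kb=30):
    """Elongation time in whole seconds, rounded up (30 s/kb as in the table)."""
    return -(-amplicon_bp * s_per_kb // 1000)


def two_stage_program(amplicon_bp, anneal_partial_c, anneal_full_c):
    """Build the two-stage program as a list of blocks with (step, °C, seconds)."""
    ext = elongation_time_s(amplicon_bp)

    def stage(cycles, anneal_c):
        return {"cycles": cycles,
                "steps": [("Denaturation", 98, 10),
                          ("Annealing", anneal_c, 20),
                          ("Elongation", 72, ext)]}

    return [
        {"cycles": 1, "steps": [("Initial denaturation", 98, 60)]},
        stage(5, anneal_partial_c),    # only the template-binding part anneals
        stage(25, anneal_full_c),      # the whole primer, overhang included, anneals
        {"cycles": 1, "steps": [("Final extension", 72, 120)]},
    ]


def run_time_s(program):
    """Sum of all hold times, ignoring ramping between temperatures."""
    return sum(block["cycles"] * sum(t for _, _, t in block["steps"])
               for block in program)


# Hypothetical inputs: 1.5 kb amplicon, annealing at 58 °C and then 68 °C.
program = two_stage_program(1500, anneal_partial_c=58, anneal_full_c=68)
print(run_time_s(program) / 60, "min of hold time")
```

Keeping the two annealing temperatures as separate arguments mirrors the reasoning above: the first five cycles use the Tm of the template-binding part, the remaining 25 cycles the Tm of the full primer.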

Mutations were introduced via site-directed mutagenesis using mutagenic primers. These primers carried the mutation flanked by sequences of about 15 to 25 nucleotides complementary to the target sequence, so that the amplified sequences carried the desired mutation. The parental template DNA was then degraded using DpnI nuclease.
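As a quick consistency check on a mutagenic primer pair, one can verify that the reverse primer is the exact reverse complement of the forward primer. The sketch below does this for the I302V primers listed in section 5.1.5; the helper function is illustrative and not part of the original workflow.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Primer sequences copied from the primer list (section 5.1.5).
i302v_fwd = "GCAGTCGGCGTGGCTG"
i302v_rev = "CAGCCACGCCGACTGC"

print(reverse_complement(i302v_fwd) == i302v_rev)
```

For this pair the check passes, i.e. the two primers anneal to the same site on opposite strands.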


A typical PCR reaction was prepared as follows:

5x Reaction Buffer       5 µL
dNTP Mix (10 mM)         0.5 µL
Primer forward (fwd)     1.25 µL
Primer reverse (rev)     1.25 µL
Template DNA             0.5 ng
DNA polymerase           0.25 µL
(DMSO                    1.25 µL)
Nuclease-free water      ad 25 µL
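When several reactions are set up in parallel, the per-reaction volumes above can be scaled into a master mix. The following sketch shows one way to do the arithmetic. The template volume of 1 µL per reaction and the 10 % pipetting excess are assumptions made for illustration (the protocol specifies the template by mass only), and the function name is hypothetical.

```python
# Per-reaction volumes in µL, taken from the 25 µL recipe above.
PER_REACTION_UL = {
    "5x Reaction Buffer": 5.0,
    "dNTP Mix (10 mM)": 0.5,
    "Primer fwd": 1.25,
    "Primer rev": 1.25,
    "DNA polymerase": 0.25,
}


def master_mix(n_reactions, template_ul=1.0, with_dmso=False,
               excess=0.1, total_ul=25.0):
    """Scale the recipe to n reactions (template is added separately per tube)."""
    per_rxn = dict(PER_REACTION_UL)
    if with_dmso:
        per_rxn["DMSO"] = 1.25
    # Water fills each reaction up to the total volume.
    per_rxn["Nuclease-free water"] = total_ul - template_ul - sum(per_rxn.values())
    factor = n_reactions * (1 + excess)
    return {name: round(vol * factor, 2) for name, vol in per_rxn.items()}


for name, vol in master_mix(8).items():
    print(f"{name}: {vol} µL")
```

Each tube then receives the total volume minus the template volume from the master mix (24 µL under the assumptions above), followed by the template.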

The verification of a knockout or a more complex cloning step was done by colony PCR. A colony of interest was picked from the agar and resuspended in 50 µL of sterile water. The bacteria were lysed by heating the suspension to 95 °C for 10 min, releasing the DNA from the cells. This suspension was then used as a template for the PCR reaction, which was typically prepared as follows:

MgCl2                     0.6 µL
Taq Polymerase            0.6 µL
dNTP Mix                  0.5 µL
DreamTaq Green Buffer     2 µL
Primer fwd (20 µM)        0.5 µL
Primer rev (20 µM)        0.5 µL
Bacterial lysate          1 µL
Nuclease-free water       ad 20 µL

Step                      Temperature    Time
Initial denaturation      95 °C          2 min
30 cycles:
    Denaturation          95 °C          20 s
    Annealing             60 °C          20 s
    Elongation            72 °C          1 min/kb
Final extension           72 °C          5 min
Store                     8 °C           forever

5.2.2 DNA purification and Gel extraction

After PCR reactions or restriction digests the DNA was purified using the GeneJET™ PCR Purification Kit according to the manufacturer’s specifications. Longer DNA fragments of similar size were separated via agarose gel electrophoresis, the desired fragment was excised from the gel and purified using the GeneJET™ Gel Extraction Kit from Thermo Fisher Scientific (Waltham, USA).


5.2.3 Protein expression

Stop codon suppression of sfGFP(R2TAG)

An overnight culture of BL21(DE3) co-transformed with pET28a_sfGFP(R2TAG) and pBU16_MjTyrRS_tRNA_33 or pBU16_MjTyrRS_tRNA_39 was prepared in 2X YT medium supplemented with the appropriate selection markers and 1 % glucose to suppress any leaky expression. The overnight culture was diluted 1:200 into 50 mL of 2X YT medium supplied with the appropriate selection markers and 0.1 % glucose and grown at 37 °C and 220 rpm to an OD600 of approximately 0.8-1.0. [3,2]Tp was added at concentrations ranging from 0.5 mM to 1.5 mM, and the cultures were incubated for a further 30 min to allow conversion of [3,2]Tp to [3,2]Tpa. sfGFP expression was then induced with 1 mM IPTG. After 4 h of expression at 37 °C, the cells were harvested by centrifugation for 15 min at 5000 x g and 4 °C, and the cell pellet was stored at -80 °C.

Heterologous MAT expressions

500 mL of LB supplied with the appropriate antibiotics was inoculated 1:100 with an overnight culture of BL21(DE3) harboring pQE80L_EcMAT / MjMAT / SsMAT and grown until an OD600 of approximately 0.8-1.0 at 37°C. After induction with 0.5 mM IPTG, the MATs were overexpressed for 4 h at 30°C. The cells were harvested at 5,000 x g and 4°C for 15 min and the pellet was stored at -80°C.

5.2.4 Protein purification

Stop codon suppression of sfGFP(R2TAG)

The pellet was thawed on ice and washed once with lysis buffer. After resuspension in about 15 mL lysis buffer, 3 mM MgCl2 was added together with 1.5 mg lysozyme, 135 µg DNase, and 135 µg RNase per 1 g of cells. After 1 h incubation on ice with occasional inverting, the cells were lysed with a microfluidizer. Cell debris was pelleted at 13,500 x g and 4°C for 1 h and the supernatant was filtered (0.45 µm). The lysate was brought to 20 mM imidazole and loaded onto a HisTrap column pre-equilibrated with lysis buffer. IMAC purification was performed at RT using a peristaltic pump. After washing with 30-50 column volumes (CV) of wash buffer, sfGFP was eluted with elution buffer until no more green protein was left on the column. The eluate was collected in aliquots of approximately 0.5 mL and dialyzed three times in 1 L of storage buffer for several hours at 8°C. The progress of the purification was monitored via SDS-PAGE analysis (see appendix). The samples were stored at -80°C until MS analysis.

Heterologous MAT expressions

The cells were washed with lysis buffer, treated with lysozyme, DNase, and RNase, lysed, and filtered as described above, except that 1 mM PMSF was additionally added during lysis. Here, IMAC purification was performed using an ÄKTA purifier at 8°C. The lysates were brought to 5-20 mM imidazole prior to sample loading and washed with 20 CV of a high-salt wash buffer containing 15 mM imidazole. The proteins were eluted either with a linear gradient of 15 CV ranging from 15 mM to 450 mM imidazole or with a step elution consisting of 10 CV 30 mM imidazole, 10 CV 440 mM imidazole, and 10 CV 500 mM imidazole. The eluates were collected in 2-4 mL aliquots and dialyzed three times in 1 L storage buffer for several hours at 8°C. During the last dialysis step, 1 mM DTT was added to the buffer. The samples were stored at -80°C. For further purification, the samples were subjected to anion exchange chromatography using a 5 mL HiTrap Q HP column on an ÄKTA purifier at 8°C. The buffer was exchanged for the IEX wash buffer via dialysis. After sample loading and washing with 20 CV of wash buffer, the MATs were eluted with a 15 CV linear gradient ranging from 15 mM to 250 mM imidazole. The eluates were concentrated with centrifugal filter devices at 4°C and 5,000 x g and stored at -80°C. The progress of the purification was monitored via SDS-PAGE analysis (see appendix).
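For orientation, the imidazole concentration at any point of the linear IMAC gradient described above (15 mM to 450 mM over 15 CV) follows from simple linear interpolation. The Python sketch below is only an illustration; the function name and the defaults are assumptions, and the delay volume of the chromatography system is ignored.

```python
def gradient_concentration(cv_elapsed, start_mm=15.0, end_mm=450.0, gradient_cv=15.0):
    """Nominal imidazole concentration (mM) after `cv_elapsed` column volumes
    of a linear gradient. System delay volume is not taken into account."""
    if not 0 <= cv_elapsed <= gradient_cv:
        raise ValueError("cv_elapsed must lie within the gradient")
    return start_mm + (end_mm - start_mm) * cv_elapsed / gradient_cv


if __name__ == "__main__":
    for cv in (0, 5, 10, 15):
        print(f"{cv:2d} CV: {gradient_concentration(cv):.0f} mM imidazole")
```

Such a calculation can help to estimate at which nominal concentration a protein peak eluted.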

5.2.5 Agarose gel electrophoresis

All analytical gels were prepared with 1 % agarose and 1x TAE buffer and were usually run at 100 V for about 20-30 min. When separating larger fragments (2-4 kb) of similar size in preparative gels, such as linearized backbone from undigested backbone, 1.5 % agarose gels were used for better separation. These were run at 80 V for 1-1.5 h to increase resolution. Prior to loading, all samples were mixed with 1x Loading Dye from Thermo Fisher Scientific. The loading dye increases the density of the sample, causing the DNA to sink to the bottom of the gel pocket. It also contains bromophenol blue (migrates at 370 bp) and xylene cyanol FF (migrates at 4160 bp) to visualize the progress of the run. As a size standard, GeneRuler™ 1 kb DNA Ladder Mix from Thermo Fisher Scientific, containing DNA fragments of defined length, was loaded on the gels together with the samples. The DNA was visualized with ethidium bromide.

5.2.6 Polyacrylamide gel electrophoresis

Sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) is used to separate proteins according to their molecular mass. Prior to loading, the samples were mixed with 5X SDS loading dye and boiled at 95°C for 10 min. PageRuler unstained protein ladder was used as a molecular mass standard. The separation was performed in SDS running buffer at 80 V until the samples had passed through the stacking gel; the voltage was then increased to 100 V. The proteins were visualized by staining the gel with the Coomassie staining solution (60 min at RT), followed by destaining with hot dH2O.

5.2.7 Restriction digest

A typical restriction digest was prepared as follows:

Component                Amount
DNA                      1 µg
Restriction enzyme       1 µL
10 x Digestion buffer    2 µL
Nuclease-free water      ad 20 µL

The volume of the restriction enzyme depends on the amount of DNA in the reaction mixture. A volume of 1 µL of restriction enzyme (1 unit) can digest 1 µg of DNA in 1 h. When using the appropriate enzymes (FastDigest/High-Fidelity), digestion times could be increased to ensure complete digestion of higher amounts of DNA without having to increase the volume of the enzyme. All digestion reactions were incubated at the temperature indicated by the manufacturer (usually 37 °C).

5.2.8 Ligation

Ligation reactions were prepared as follows:

Component                Amount
Backbone                 x mol
Insert                   5 x mol
T4 DNA Ligase Buffer     2 µL
T4 DNA Ligase            1 µL
Nuclease-free water      ad 20 µL

Backbone and insert were used in a molar ratio of 1:5. The reactions were incubated either at 16 °C overnight or, in the case of larger sticky ends and less complicated ligations, for 1 h at room temperature.
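Because the 1:5 ratio refers to moles rather than mass, the insert mass has to be scaled by the length ratio of the two fragments. The following Python sketch shows this conversion under the common assumption that the mass of double-stranded DNA is proportional to its length in base pairs; the function name and the example values are hypothetical.

```python
def insert_mass_ng(backbone_ng, backbone_bp, insert_bp, insert_to_backbone=5.0):
    """Mass of insert (ng) needed for a given molar insert:backbone ratio.

    Assumes dsDNA mass is proportional to length, so the average mass per
    base pair cancels out of the calculation.
    """
    return backbone_ng * (insert_bp / backbone_bp) * insert_to_backbone


if __name__ == "__main__":
    # Hypothetical example: 50 ng of a 5 kb backbone and a 1 kb insert.
    print(f"{insert_mass_ng(50, 5000, 1000):.1f} ng insert")
```

In this hypothetical example, the 1 kb insert is one fifth as long as the backbone, so equal masses already correspond to a fivefold molar excess.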

5.2.9 Assembly of aaRS Libraries

5.2.9.1 Site-saturation mutagenesis

The libraries were constructed via golden gate cloning493,494. This method uses a type IIS restriction enzyme (e.g. BsaI or BsmBI), which cuts outside of its recognition sequence, allowing scarless and parallel DNA assembly. The libraries were created by randomizing the target residues in the substrate binding pocket to NNS. The randomizations were introduced by the primers, which carry the REase recognition sites as overhangs. After digestion and purification, all PCR fragments were ligated in a large one-pot ligation (10 x 50 µL aliquots to ensure library coverage) at 16°C overnight to ensure complete hybridization of the library fragments. The aliquots were pooled, purified, and transformed into freshly prepared electrocompetent NEB10-beta cells. The cells were spread on several large LB agar plates containing the appropriate selection marker and incubated at 37°C overnight. The cells were washed from the plates with 3 + 2 mL of LB and the library plasmids were isolated using the GeneJET™ Plasmid Midiprep Kit from Thermo Fisher Scientific.
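The need for many transformants can be made concrete with a short calculation. An NNS codon (N = A/C/G/T, S = C/G) has 4 x 4 x 2 = 32 variants, so the theoretical DNA diversity grows as 32 to the power of the number of randomized positions. The Python sketch below estimates this diversity and the number of transformants required for a chosen completeness, using the standard Poisson sampling approximation. The number of randomized positions and the 95 % target are assumptions for illustration; neither value is taken from this work.

```python
import math


def nns_diversity(n_positions):
    """Number of distinct DNA sequences for n fully randomized NNS codons."""
    return 32 ** n_positions


def transformants_for_coverage(diversity, completeness=0.95):
    """Transformants needed so that any given variant is present with the
    requested probability (Poisson approximation, equal variant frequencies)."""
    return math.ceil(-diversity * math.log(1.0 - completeness))


if __name__ == "__main__":
    for n in (3, 4, 5):  # hypothetical numbers of randomized positions
        d = nns_diversity(n)
        print(f"{n} positions: {d:.2e} variants, "
              f"{transformants_for_coverage(d):.2e} transformants for 95 %")
```

As a rule of thumb, roughly three times as many transformants as theoretical variants are needed for about 95 % completeness.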

5.2.9.2 Error-prone PCR

To obtain mutations distributed throughout the entire enzyme, the error-prone PCR-based libraries were generated with primers that bind at the beginning and the end of the gene. The error rate of the DreamTaq DNA polymerase (Thermo Fisher Scientific) was increased by adding MnCl2 and raising the MgCl2 concentration. A bias towards GC enrichment in the amplified sequence was minimized by increasing the dCTP and dTTP concentrations.

Component                Amount
DreamTaq Buffer          15 µL
dNTP Mix                 3 µL
dCTP (10 mM)             12 µL
dTTP (10 mM)             12 µL
Primer fwd (20 µM)       1.5 µL
Primer rev (20 µM)       1.5 µL
MgCl2 (50 mM)            9 µL
MnCl2 (25 mM)            1.2 µL
Template                 3 ng
DreamTaq DNA Pol.        6 µL
Nuclease-free water      ad 150 µL

Step                     Temperature   Time
Initial Denaturation     95 °C         2 min
20 Cycles:
  Denaturation           95 °C         30 s
  Annealing              61.5 °C       30 s
  Elongation             72 °C         1 min/kb
Final Extension          72 °C         5 min
Store                    8 °C          hold

Digestion, ligation, and plasmid isolation were carried out as described for the other libraries.
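The additives in the error-prone PCR recipe are given as stock volumes; their contribution to the final 150 µL reaction follows from C1 x V1 = C2 x V2. The Python sketch below performs this conversion for the listed stocks. It only covers what is added on top of the buffer and the dNTP mix, whose own MgCl2 and nucleotide content is not specified here, and the function name is an assumption.

```python
def final_concentration_mm(stock_mm, added_ul, total_ul=150.0):
    """Concentration (mM) contributed by a stock solution after dilution."""
    return stock_mm * added_ul / total_ul


# (stock concentration in mM, volume in µL) as listed in the recipe
ADDITIVES = {
    "MgCl2": (50.0, 9.0),
    "MnCl2": (25.0, 1.2),
    "dCTP": (10.0, 12.0),
    "dTTP": (10.0, 12.0),
}

if __name__ == "__main__":
    for name, (stock, volume) in ADDITIVES.items():
        print(f"{name}: +{final_concentration_mm(stock, volume):.2f} mM")
```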

5.2.10 Double-sieve selection

Double-sieve selections were always started with a round of positive selection. For the first positive selection, the transformation efficiency was assessed by spotting 2 µL of a serial dilution of the recovered transformation on agar plates under non-selective conditions. These plates were incubated at 37 °C overnight. The rest of the recovered transformation was spread on large agar plates, with 330-660 µL of culture per plate. For each positive selection, one control plate without the target ncAA (or precursor) was prepared. After incubation at 30 °C for 48-72 h, the plates were scanned and the colonies were counted using the ImageJ software. The cells were harvested by rinsing each agar plate with 3 mL of medium using a 1 mL pipette. The process was repeated with another 2 mL of medium to remove any residual bacteria, and all cells from one round of positive selection were pooled. Depending on the culture volume and density, plasmids were extracted using the GeneJET™ Plasmid Miniprep or Midiprep Kit. The selection plasmid was digested and the library plasmid was purified via PCR purification or gel extraction. Unless otherwise stated, positive selections were followed by a round of negative selection. Here, 500 µL of recovered transformation were spread on each large LB agar plate containing 0.02 % arabinose for induction of barnase expression. The target ncAA was not added to negative selection plates. These plates were incubated at 30 °C for only 14-16 h to avoid inactivation of the toxic barnase gene. The cells were harvested, the plasmids were extracted, and the selection plasmid was digested and the library plasmid purified as described above. Iterative rounds of positive and negative selection were performed two to three times. Usually, after the second and/or third round of positive selection, single colonies were screened for promising aaRS variants. To this end, each colony was resuspended in 50 µL of sterile water and 2 µL of each suspension were spotted on agar plates with increasing Cm concentrations in the absence and presence of the target ncAA. They were additionally spotted on a backup plate under non-selective conditions. The plates were incubated at 37 °C overnight and scanned. Promising candidates (improved growth upon ncAA supplementation) were isolated and further characterized by repeating the assay with dilution series. They were additionally screened in a fluorescence assay.
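The transformation efficiency check described above (2 µL spots of a serial dilution) translates into a transformant count as sketched below in Python. The recovery volume of 1 mL and the example colony count are assumptions for illustration, as are the function names.

```python
def cfu_per_ml(colonies, dilution_fold, spot_volume_ul=2.0):
    """Colony-forming units per mL of undiluted recovery culture.

    `dilution_fold` is the fold dilution of the counted spot, e.g. 1e4 for
    a 10^-4 dilution.
    """
    return colonies * dilution_fold * 1000.0 / spot_volume_ul


def total_transformants(colonies, dilution_fold, recovery_volume_ml=1.0,
                        spot_volume_ul=2.0):
    """Total transformants in the recovery culture (volume is an assumption)."""
    return cfu_per_ml(colonies, dilution_fold, spot_volume_ul) * recovery_volume_ml


if __name__ == "__main__":
    # Hypothetical count: 12 colonies in the spot of the 10^-4 dilution.
    print(f"{total_transformants(12, 1e4):.1e} transformants")
```

Comparing this number with the theoretical library diversity indicates whether the library is likely to be covered.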

5.2.11 Fluorescence readout

Promising library members were screened via a fluorescence readout using a superfolder green fluorescent protein (sfGFP) gene with an internal codon mutated to amber. For this, the library plasmid was isolated, the selection plasmid was digested, and the aaRS mutant was cloned into a low-copy backbone harboring the cognate tRNA. This plasmid was co-transformed with pET28a_sfGFP(R2TAG)_His into BL21(DE3) cells. The assay was performed in a black 96-well plate with a clear bottom, which avoids cross-talk between the wells but allows absorption measurements (at 600 nm) from the bottom in addition to fluorescence measurements (at 511 nm). ZYP medium, a medium for auto-induction of protein expression, was inoculated 1:100 with overnight cultures containing the different constructs and the sfGFP gene with the amber stop codon. The cultures were incubated in the Infinite® M200 plate reader (Tecan Group AG, Männedorf, Switzerland) for 24 h (200 µL culture/well, with a gas-permeable membrane covering the plate). The fluorescence, which is proportional to the amount of expressed sfGFP, was recorded over time. Excitation was set to 481 nm and emission was collected at 511 nm. In parallel, the absorption at 600 nm was recorded.
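Since fluorescence and absorption at 600 nm are recorded in parallel, a common way to compare constructs is to relate the fluorescence signal to the cell density. The Python sketch below shows one such normalization. Blank subtraction, the minimum density threshold, and all names are assumptions; the text does not state how the raw data were processed.

```python
def normalized_fluorescence(fluorescence, od600, fluor_blank=0.0,
                            od_blank=0.0, min_od=0.05):
    """Blank-corrected fluorescence divided by blank-corrected OD600.

    Returns None for time points where the corrected OD600 is below
    `min_od`, because the ratio is dominated by noise there.
    """
    result = []
    for f, od in zip(fluorescence, od600):
        od_corrected = od - od_blank
        if od_corrected < min_od:
            result.append(None)
        else:
            result.append((f - fluor_blank) / od_corrected)
    return result


if __name__ == "__main__":
    # Hypothetical readings of one well at three time points.
    print(normalized_fluorescence([120, 1100, 5100], [0.11, 0.6, 1.1],
                                  fluor_blank=100, od_blank=0.1))
```

The ratio of the normalized signals with and without the ncAA would then indicate how strongly amber suppression depends on the supplied amino acid.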

5.2.12 Expression of tryptophan synthase

The Trp synthase from Salmonella typhimurium was overexpressed as previously reported511. 1 L of TB medium supplemented with Amp was inoculated 1:200 with an overnight culture of E. coli CB149 harboring the plasmid pEBA10 for overexpression of S. typhimurium tryptophan synthase. The cells were cultivated at 37°C until an OD600 of approximately 0.8 was reached and expression of the Trp synthase was induced with 1 mM IPTG. The temperature was reduced to 30°C for overnight expression. The cells were harvested (20 min, 5,000 x g, 4°C) and washed once with 1 L wash buffer (50 mM TRIS, 150 mM NaCl, pH 7.8). After resuspension in buffer T (50 mM TRIS, 5 mM EDTA, 10 mM β-mercaptoethanol, 0.1 mM pyridoxal phosphate (PLP), pH 7.8) supplemented with 1 mM PMSF and lysozyme (1.5 mg per 1 g of cells) and 20 min incubation on ice, the cells were disrupted via homogenization. Cell debris was pelleted at 17,000 x g and 4°C for 1 h and the supernatant was supplemented with 5 mM spermine and 6 % PEG 8000. After another 5 min of centrifugation, the supernatant was left to crystallize for 60 h at 4°C. The crystals were pelleted (17,000 x g, 4°C, 20 min) and washed with 5 mL buffer T containing 5 mM spermine and 6 % PEG 8000. The precipitate was resuspended in 2 mL of buffer B (50 mM Bicine, 1 mM EDTA, 0.02 mM PLP, 1 mM DTT, pH 7.8) at 37°C for 10 min and dialyzed in 1 L buffer B for several hours. The sample was transferred to a fresh batch of buffer B (4 L) and dialyzed overnight at 4°C. After centrifugation at 17,000 x g and 4°C for 15 min, the supernatant was brought to 10 % glycerol and stored at -80°C.

5.2.13 Enzymatic [3,2]Tpa synthesis

[3,2]Tpa was enzymatically synthesized from [3,2]Tp and serine as previously reported510. 200 mL of 100 mM phosphate buffer supplied with 0.2 mM freshly added PLP as co-factor, 3 mM serine, 2 mM [3,2]Tp, and 1 mL of S. typhimurium tryptophan synthase solution was incubated at 37°C and 80 rpm for 24 h. The reaction was monitored via thin-layer chromatography (TLC) with 1-butanol, acetic acid, and water in a 2:1:1 ratio as mobile phase and indole as well as Trp as controls. After stopping the reaction by freeze-drying, the crude product was resuspended in 0.1 M HCl and desalted via cation exchange chromatography (Dowex 50WX8 resin). After lyophilization, the sample was resuspended in 10 % acetonitrile and purified via reversed-phase chromatography on a preparative C18 column (gradient of 10 % B to 30 % B over 25 min, 100 % B for 3 min, A = H2O + 0.1 % formic acid, B = acetonitrile). [3,2]Tpa synthesis was verified via TLC, ESI-MS analysis (LTQ Orbitrap XL, Thermo Fisher Scientific), and cultivation of the adapted strain in minimal media lacking any cAA and solely supplied with [3,2]Tpa.

5.2.14 Production of Competent Cells

For standard transformations, chemically competent cells were used. Electrocompetent cells were used where a high transformation efficiency was required (e. g. library transformation).

Electrocompetent Cells

250 mL of SOB medium with appropriate antibiotics were inoculated with 1 mL of an overnight culture. The culture was incubated at 37 °C and 200 rpm until an optical density at λ = 600 nm (OD600) of about 0.5-0.6 was reached. The cells were chilled on ice for 10 min and then transferred to cold 50 mL centrifuge tubes. From here on, the cells were strictly handled on ice. After centrifugation at 3,000 x g for 10 min at 4 °C, the supernatant was discarded and the cells were resuspended in 30 mL of 10 % glycerol (sterile, 4 °C). When the cells were used immediately without being frozen for storage, milliQ water was used instead of 10 % glycerol. The cells were washed three times with 30 mL of 10 % glycerol to remove salts that would cause short circuits during electroporation; each wash was followed by centrifugation at 3,000 x g for 10 min at 4 °C. During the washing steps, the cells were pooled, and after the last step, the supernatant was discarded. The cells were resuspended in the remaining drop of 10 % glycerol and stored at -80 °C in 50 µL aliquots.

Chemically Competent Cells

200 mL of LB medium with the appropriate antibiotics were inoculated 1:100 with an overnight culture and incubated at 37 °C and 200 rpm. The cells were grown to an OD600 of about 0.5-0.6 and then chilled on ice for 10 min. After centrifugation at 5,000 x g for 5 min at 4 °C, the cells were resuspended in 10 mL of cold 100 mM MgCl2 and then incubated on ice for 20-30 min. After centrifugation at 5,000 x g for 5 min at 4 °C, the supernatant was discarded and the cells were resuspended in 1 mL of cold 100 mM CaCl2/15 % glycerol. After preparing 100 µL aliquots, the cells were stored at -80 °C.

5.2.15 Bacterial transformation

To introduce a desired plasmid into the cells, the bacteria were transformed either via electroporation or via chemical transformation.

Electroporation

Electroporation cuvettes were pre-chilled on ice and the electrocompetent cells were thawed on ice. 20 µL of the cells were mixed with 0.5-1 µg of DNA. The DNA must be dissolved in water or dialyzed before electroporation to avoid short circuits. The cell suspension was placed between the electrodes of the cuvette. The outside of the cuvette was dried to avoid short circuits and an electrical pulse of 1.8 kV was applied. Immediately afterward, 980 µL of SOC medium were added to the cells and they were incubated in a thermomixer at 37 °C for 1 h. The cells were centrifuged at 5,000 x g for 2 min and resuspended in about 100-200 µL of the medium. The bacteria were plated on LB agar plates with the appropriate antibiotics and usually incubated at 37 °C overnight. However, during positive selections, the cells were incubated at 30 °C for 48 h, and during negative selections, they were incubated at 30 °C for 14-16 h.

Chemical Transformation

After thawing the chemically competent cells on ice, 50 µL of the cells were mixed with 1-5 µL of DNA. The suspension was incubated on ice for 20 min and then heat-shocked at 42 °C for 2 min. After cooling the cells briefly on ice, 950 µL of SOC or LB medium were added. The cells were incubated at 37 °C for 1 h and 50-100 µL were plated on LB agar with the appropriate antibiotics. The plates were incubated at 37 °C overnight.

5.2.16 Isolation of plasmid DNA

For the isolation of plasmid DNA from standard overnight cultures (5-6 mL), the GeneJET™ Plasmid Miniprep Kit was used according to the manufacturer's instructions. For larger culture volumes, the GeneJET™ Plasmid Midiprep Kit from Thermo Fisher Scientific was used.

5.2.17 Genome engineering

5.2.17.1 Transfer of knockouts using phage P1

Phage P1 is a temperate bacteriophage that can choose between the lysogenic cycle, in which it exists inside the bacterial cell without producing progeny, and the lytic cycle, during which the virus produces new particles, resulting in the death of the host cell. During lysogeny, the phage DNA exists inside the bacterial cell in an independent circular form. The desired phage P1 particles can be produced by infecting the donor strain with the phage in the presence of CaCl2, as phage P1 needs calcium for infectivity. The phage will enter the lytic cycle and produce viral particles, into which random bacterial DNA is occasionally packaged instead of the viral genome. This is exploited in the lab to transfer genetic markers, such as antibiotic resistance genes, from one strain to another. These particles are then used to transduce the recipient strain. Cells infected with particles containing the viral genome will produce new particles and die. Particles containing bacterial DNA are incapable of virus production. Once injected into the host, the bacterial DNA will insert into the host genome by homologous recombination, replacing the gene containing homologous regions. By cultivating the infected recipient strain, e.g. on the appropriate antibiotic and in the presence of sodium citrate, cells transduced with the genetic marker can be selected. Citrate decreases the concentration of free calcium by chelation, thus preventing functional phage P1 particles from infecting the surrounding cells. Only those cells transduced with the genetic marker will survive538,539. In this study, strains from the Keio collection386 were used as donor strains (supplied by the Coli Genetic Stock Center, CGSC). The Keio collection is a collection of strains containing single-gene deletions of non-essential genes, which were knocked out via the Datsenko and Wanner method540. In place of the target gene, these strains harbor a kanamycin resistance cassette for selection, which is flanked by FLP recognition target (FRT) sites. By expressing the FLP recombinase, the resistance gene can later be eliminated through recombination of these FRT sites.

Preparation of Phages

5 mL of LB medium with the appropriate antibiotics were inoculated with 50 µL of an overnight culture of the donor strain. The cells were incubated at 37 °C and 200 rpm until an OD600 of about 0.6 was reached. After the addition of 5 mM CaCl2, the cells were incubated at 37 °C and 200 rpm for another 30 min. The donor cell suspension was then mixed 1:1 with a solution of phage P1 (100 µL phage P1 (10⁻³) + 100 µL donor cell suspension). Three microcentrifuge tubes were prepared in this way. The mixtures were incubated at 37 °C (without shaking) for 20 min and then added to 4 mL of 0.6 % soft agar with 5 mM CaCl2. The soft agar mixture was subsequently poured on LB agar plates (4 mL soft agar per plate) and incubated at 37 °C overnight. The soft agar containing plaques was removed with a Drigalski spatula and 800 µL of chloroform was added per plate. The soft agar/chloroform mixture was then vortexed until only small agar pieces remained. It was centrifuged at 15,422 x g for 10 min and the supernatant was transferred to a glass bottle with a lid. A few drops of chloroform were added and the phage solution was stored at 4 °C.

Transduction

5 mL of LB medium with the appropriate antibiotics were inoculated with 50 µL of an overnight culture of the recipient strain and incubated at 37 °C and 200 rpm until an OD600 of 0.6-0.7 was reached. After the addition of CaCl2 (end concentration of 5 mM), the culture was incubated at 37 °C and 200 rpm for another 30 min. The following mixtures were prepared:

1) 3 µL donor phage solution + 1 mL recipient strain suspension
2) 30 µL donor phage solution + 1 mL recipient strain suspension
3) 1 mL recipient strain suspension (control)
4) 10 µL donor phage solution + 1 mL LB medium + 5 µL 1 M CaCl2 (control)

The mixtures were incubated at 37 °C for 15 min (without shaking). They were vortexed, centrifuged at 7,000 rpm for 1 min, and resuspended in 1 mL LB medium with 0.1 M sodium citrate. The samples were incubated at 37 °C for 45 min and plated on LB agar plates with the appropriate antibiotics. The plates were incubated at 37 °C overnight and the knockout was verified via colony PCR.

The Kan resistance cassette was removed ("flipped out") by transforming the cells with pCP20, which encodes the FLP recombinase. After successful KanR removal, the plasmid was cured by incubation at 42°C.

5.2.17.2 CRISPR/Cas9

The CRISPR/Cas system is an ancient bacterial immune system that protects against foreign DNA (e.g. from viruses). After an infection has been survived, fragments of the foreign DNA (called spacers) are stored in a CRISPR array, which serves as a library. Interspersed between the spacers in the CRISPR array are repeat regions. After the entire CRISPR array is transcribed into one long CRISPR RNA, another RNA molecule, the trans-activating CRISPR RNA (tracrRNA), binds to the repeat regions of the CRISPR array and recruits Cas9. The CRISPR array is then processed by RNase III, yielding functional CRISPR/Cas9 units containing a single spacer (and tracrRNA). Upon a recurring infection with the same pathogen, its DNA will be targeted by the CRISPR/Cas9 unit bearing the homologous RNA fragment (spacer) and the foreign DNA will be degraded. The region of the foreign DNA that is homologous to the spacer is called the protospacer and requires a protospacer adjacent motif (PAM) to be targeted by CRISPR/Cas9. This requirement is a safeguard against self-cleavage483. The CRISPR/Cas system can be exploited in the laboratory for genome engineering. For this purpose, scientists have developed a single guide RNA (gRNA or sgRNA), which combines the spacer and the attached tracrRNA into one molecule. Various genome engineering techniques make use of the CRISPR/Cas system; in this study, the CAGO technique486 was chosen, which employs a universal gRNA. This universal gRNA is encoded on the pCAGO plasmid and targets the N20PAM fragment, which is inserted into the editing cassette. Also encoded on the pCAGO plasmid are the Cas9 protein and the λ Red system. The latter comprises the genes γ, β, and exo from bacteriophage λ. The γ gene product, Gam, inhibits the host RecBCD exonuclease V, which would otherwise degrade the linear editing cassette. The gene products of β and exo, Bet and Exo, can then bind to the ends of the PCR product to promote recombination and replace the gene of interest with the editing cassette harboring an antibiotic resistance gene for selection. After successful recombination, the selection marker is removed. Cas9 bound to the universal gRNA targets the N20PAM sequence embedded in the editing cassette and induces a double-strand break, which is repaired by λ Red-mediated recombination of the R short region of the editing cassette with its homologous region in the right homology arm of the cassette. Thus, the selection marker between these regions is cut out.
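The PAM requirement described above can be illustrated with a short sequence search. The Python sketch below lists all 20-nt protospacers on the forward strand that are directly followed by a PAM. It assumes the 5'-NGG-3' PAM of Streptococcus pyogenes Cas9, which is not stated explicitly in the text; the function name and the test sequence are hypothetical, and the reverse strand is deliberately not searched.

```python
import re


def find_protospacers(sequence, spacer_length=20):
    """Return (position, protospacer, PAM) for every N20-NGG site on the
    forward strand. The NGG PAM is an assumption (S. pyogenes Cas9)."""
    sequence = sequence.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_length)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    demo = "A" * 20 + "TGG" + "CCCC"  # hypothetical sequence with one site
    for position, protospacer, pam in find_protospacers(demo):
        print(position, protospacer, pam)
```

In the CAGO approach, such a target site does not have to be found in the genome, because the universal N20PAM sequence is supplied by the editing cassette itself.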

Figure 56 | Schematic overview of processes involved in the CRISPR/Cas9 response. Figure supplied by © Johan Jarnestad / The Royal Swedish Academy of Sciences.

The editing cassette was assembled from four PCR fragments in a one-pot golden gate assembly according to the following program:

Component                Amount
N20PAM_marker            100 ng
L_homo                   75 ng
Insert                   140 ng
R_homo                   68 ng
T4 buffer                1.5 µL
FD Eco31I                1 µL
T4 DNA ligase            1 µL
ddH2O                    ad 15 µL

Step                       Temperature   Time
25 Cycles:
  Digestion                37 °C         3 min
  Ligation                 16 °C         4 min
Inactivation of enzymes    80 °C         5 min
Store                      8 °C          hold

The editing cassette was amplified via PCR and purified via gel extraction as described above. 5 mL of LBAmp were inoculated 1:100 with an overnight culture of ∆metEH::FRT transformed with pCAGO. The culture was incubated at 30 °C until an OD600 of approximately 0.2 was reached and the λ Red system was induced with 1 mM IPTG. The cells were further cultivated at 30 °C until an OD600 of approximately 0.6 was reached. The cells were made electrocompetent as described above and resuspended in 100 µL of cold ddH2O, and 50 µL aliquots were prepared. Approximately 400 ng of the editing cassette were transformed via electroporation and the cells were recovered for 1 h 45 min at 30 °C. After recovery, the cells were spread on LB agar plates containing the appropriate antibiotics and 1 % glucose to suppress leaky expression of Cas9 from the arabinose promoter. Single colonies were picked and the recombination of the editing cassette was verified via sequencing. Positive clones were incubated overnight at 30 °C in 5 mL LBAmp containing 1 mM IPTG for expression of the λ Red system and 10 mM arabinose for Cas9 expression. The cultures were spread on LB agar plates containing Amp and incubated at 30 °C overnight. Once again, single colonies were picked and the removal of the selection marker was verified via sequencing. The plasmid pCAGO was cured by incubation at 42 °C until no more growth on Amp could be observed.

5.2.18 DNA concentration measurements

The concentration of DNA can be measured using UV/Vis spectroscopy, where the sample is irradiated with monochromatic light. The sample absorbs the light and the decreased light intensity is detected by the spectrometer. The aromatic nucleobases of the DNA absorb at 260 nm. Within a certain range of concentrations, the Lambert-Beer law can be applied:

$$A = \log\left(\frac{I_0}{I}\right) = \varepsilon_\lambda \times c \times d$$

Where
A = absorbance
I_0 = incident intensity
I = transmitted intensity
ε_λ = molar absorption coefficient [L/(mol·cm)]
c = concentration of the sample [mol/L]
d = path length [cm]

This law can be used to calculate the concentration of a sample from the absorbance, the molar absorption coefficient, and the width of the cuvette.

5.2.19 Protein concentration measurements

Protein concentrations were calculated by measuring the absorbance of the aromatic side chains at 280 nm. The molar extinction coefficient (εM) at 280 nm was calculated (ProtParam, Expasy Proteomics Server, http://web.expasy.org/protparam/) using the Edelhoch method541. The concentration of sfGFP was determined at 488 nm, where the chromophore absorbs, using the published extinction coefficient542. The extinction coefficients and molecular masses of the proteins used in this study are shown below.

Protein    ε [µM⁻¹ cm⁻¹]    Molecular mass [Da]
EcMAT      0.04137          43787.61
MjMAT      0.02636          47088.18
SsMAT      0.02785          46499.29
sfGFP      0.0833           27725.19
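With the values tabulated above, a measured absorbance converts into a concentration via the Lambert-Beer law (c = A / (ε × d)). The Python sketch below performs this conversion; the 1 cm path length, the optional dilution factor, and the function names are assumptions for illustration. Note that the sfGFP coefficient refers to 488 nm, whereas the MAT coefficients refer to 280 nm.

```python
# Extinction coefficients [µM^-1 cm^-1] and molecular masses [Da] from the table.
EXTINCTION = {"EcMAT": 0.04137, "MjMAT": 0.02636, "SsMAT": 0.02785, "sfGFP": 0.0833}
MASS_DA = {"EcMAT": 43787.61, "MjMAT": 47088.18, "SsMAT": 46499.29, "sfGFP": 27725.19}


def concentration_um(absorbance, protein, path_cm=1.0, dilution_fold=1.0):
    """Protein concentration in µM from a blank-corrected absorbance."""
    return absorbance * dilution_fold / (EXTINCTION[protein] * path_cm)


def concentration_mg_per_ml(absorbance, protein, path_cm=1.0, dilution_fold=1.0):
    """Protein concentration in mg/mL (µM x Da gives µg/L; 1e6 µg/L = 1 mg/mL)."""
    micromolar = concentration_um(absorbance, protein, path_cm, dilution_fold)
    return micromolar * MASS_DA[protein] / 1e6


if __name__ == "__main__":
    # Hypothetical reading: A280 = 0.5 for a tenfold diluted EcMAT sample.
    print(f"{concentration_um(0.5, 'EcMAT', dilution_fold=10):.1f} µM")
    print(f"{concentration_mg_per_ml(0.5, 'EcMAT', dilution_fold=10):.2f} mg/mL")
```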

Additionally, protein concentrations were determined via the Bradford method543. At acidic pH, the dye Coomassie Brilliant Blue G-250 binds to the cationic and apolar hydrophobic side chains of proteins and forms complexes, whereby its absorption maximum shifts from 470 nm (free form) to 595 nm (complexed form). The increase in absorption at 595 nm can therefore be used to measure protein concentrations. BSA was used as the standard protein for calibration.

5.2.20 Sequencing

Sequencing is a method for the determination of the nucleotide sequence of nucleic acids. The Sanger method works analogously to PCR, with the distinction that, apart from the regular dNTPs, labeled dideoxyribonucleotides (ddNTPs), which lack both the 2’ and the 3’ OH group, are also supplied in the reaction mixture. Whenever the polymerase incorporates a ddNTP, the elongation is terminated due to the missing 3’ OH group, resulting in a mixture of oligonucleotides of different lengths. The identity of the last incorporated nucleotide can be determined by labeling each kind of ddNTP with a different fluorescent dye. Thus, after the separation of the fragments by size, the color pattern indicates the nucleotide sequence.
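The read-out principle can be mimicked in a few lines of Python. In this strongly simplified sketch, every possible chain-terminated fragment is represented by its length and the label of its terminal ddNTP; sorting the fragments by length, as the size separation does, recovers the sequence. The function names and the example sequence are hypothetical, and real signal intensities or incomplete termination are not modeled.

```python
import random


def termination_fragments(synthesized_strand, seed=0):
    """Return shuffled (length, terminal base) pairs, one per possible
    termination position of the newly synthesized strand."""
    fragments = [(length, synthesized_strand[length - 1])
                 for length in range(1, len(synthesized_strand) + 1)]
    random.Random(seed).shuffle(fragments)  # the reaction yields a mixture
    return fragments


def read_sequence(fragments):
    """Sort fragments by length and read the terminal labels in order."""
    return "".join(base for _, base in sorted(fragments))


if __name__ == "__main__":
    mixture = termination_fragments("GATTACA")
    print(mixture)
    print(read_sequence(mixture))
```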

Sequencing of the samples in this study was handled by Seqlab laboratories in Göttingen.

5.2.21 Mass spectrometry (MS)

Molecular masses of purified intact proteins were measured by electrospray LC-MS on an Agilent 6530 quadrupole time-of-flight (QTOF) instrument (Agilent, Santa Clara, CA, USA) coupled with an Agilent 1260 HPLC system, after external calibration. Samples were separated at a flow rate of 0.3-0.5 mL min⁻¹ with a gradient from 5 % acetonitrile with 0.1 % formic acid in water to 80 % acetonitrile with 0.1 % formic acid in water over 20 minutes on a Discovery Bio Wide Pore C5 column, 2.1 x 100 mm, 3 µm (Supelco analytical, Sigma-Aldrich, St. Louis, USA). Spectra deconvolution was performed with the Agilent MassHunter Qualitative Analysis software, version B.06.00, Bioconfirm Intact Mass module, employing the maximum entropy deconvolution algorithm.

The molecular mass of [3,2]Tpa was measured by the MS service of the TU Berlin with an LTQ Orbitrap XL from Thermo Fisher Scientific (Waltham, USA).

6 Bibliography

1. Einstein, A. Über den gegenwärtigen Stand der Feld-Theorie. in Festschrift Prof. Dr. A. Stodola zum 70. Geburtstag (ed. Honegger, E.) 126 (Zürich und Leipzig, Orell Füssli, 1929).
2. Gamow, G. Possible relation between deoxyribonucleic acid and protein structures. Nature 173, 318 (1954).
3. Woese, C. R., Dugre, D. H., Dugre, S. A., Kondo, M. & Saxinger, W. C. On the fundamental nature and evolution of the genetic code. Cold Spring Harb. Symp. Quant. Biol. 31, 723–736 (1966).
4. Woese, C. R., Dugre, D. H., Saxinger, W. C. & Dugre, S. A. The molecular basis for the genetic code. Proc. Natl. Acad. Sci. U. S. A. 55, 966–974 (1966).
5. Yarus, M. Amino acids as RNA ligands: A direct-RNA-template theory for the code’s origin. J. Mol. Evol. 47, 109–117 (1998).
6. Yarus, M., Caporaso, J. G. & Knight, R. Origins of the genetic code: the escaped triplet theory. Annu. Rev. Biochem. 74, 179–198 (2005).
7. Yarus, M., Widmann, J. J. & Knight, R. RNA-amino acid binding: a stereochemical era for the genetic code. J. Mol. Evol. 69, 406–429 (2009).
8. Koonin, E. V. & Novozhilov, A. S. Origin and Evolution of the Universal Genetic Code. Annu. Rev. Genet. 51, 45–62 (2017).
9. Crick, F. H. C. The origin of the genetic code. J. Mol. Biol. 38, 367–379 (1968).
10. Knight, R. D., Freeland, S. J. & Landweber, L. F. Selection, history and chemistry: the three faces of the genetic code. Trends Biochem. Sci. 24, 241–247 (1999).
11. Keeling, P. J. Genomics: Evolution of the Genetic Code. Curr. Biol. 26, R851–R853 (2016).
12. Novozhilov, A. S., Wolf, Y. I. & Koonin, E. V. Evolution of the genetic code: partial optimization of a random code for robustness to translation error in a rugged fitness landscape. Biol. Direct 2, 24 (2007).
13. Aggarwal, N., Bandhu, A. V. & Sengupta, S. Finite population analysis of the effect of horizontal gene transfer on the origin of an universal and optimal genetic code. Phys. Biol. 13, 36007 (2016).
14. Vetsigian, K., Woese, C. & Goldenfeld, N. Collective evolution and the genetic code. Proc. Natl. Acad. Sci. U. S. A. 103, 10696–10701 (2006).
15. Santos, M. A. S., Moura, G., Massey, S. E. & Tuite, M. F. Driving change: the evolution of alternative genetic codes. Trends Genet. 20, 95–102 (2004).
16. Sengupta, S. & Higgs, P. G. Pathways of Genetic Code Evolution in Ancient and Modern Organisms. J. Mol. Evol. 80, 229–243 (2015).
17. Ling, J., O’Donoghue, P. & Söll, D. Genetic code flexibility in microorganisms: novel mechanisms and impact on physiology. Nat. Rev. Microbiol. 13, 707–721 (2015).
18. Kun, Á. & Radványi, Á. The evolution of the genetic code: Impasses and challenges. BioSystems 164, 217–225 (2018).
19. Di Giulio, M. An extension of the coevolution theory of the origin of the genetic code. Biol. Direct 3, 37 (2008).

20. Di Giulio, M. The lack of foundation in the mechanism on which are based the physico-chemical theories for the origin of the genetic code is counterposed to the credible and natural mechanism suggested by the coevolution theory. J. Theor. Biol. 399, 134–140 (2016). 21. Wong, J. T. A co-evolution theory of the genetic code. Proc. Natl. Acad. Sci. U. S. A. 72, 1909– 1912 (1975). 22. Wong, J. T.-F., Ng, S.-K., Mat, W.-K., Hu, T. & Xue, H. Coevolution Theory of the Genetic Code at Age Forty: Pathway to Translation and Synthetic Life. Life (Basel, Switzerland) 6, (2016). 23. Miller, S. L. A production of amino acids under possible primitive earth conditions. Science (80- . ). 117, 528–529 (1953). 24. Patel, B. H., Percivalle, C., Ritson, D. J., Duffy, C. D. & Sutherland, J. D. Common origins of RNA, protein and lipid precursors in a cyanosulfidic protometabolism. Nat. Chem. 7, 301–307 (2015). 25. Zaia, D. A. M., Zaia, C. T. B. V & De Santana, H. Which amino acids should be used in prebiotic chemistry studies? Orig. life Evol. Biosph. J. Int. Soc. Study Orig. Life 38, 469–488 (2008). 26. Pizzarello, S. The chemistry of life’s origin: a carbonaceous meteorite perspective. Acc. Chem. Res. 39, 231–237 (2006). 27. Cleaves, H. J. 2nd. The origin of the biologically coded amino acids. J. Theor. Biol. 263, 490–498 (2010). 28. Burton, A. S., Stern, J. C., Elsila, J. E., Glavin, D. P. & Dworkin, J. P. Understanding prebiotic chemistry through the analysis of extraterrestrial amino acids and nucleobases in meteorites. Chem. Soc. Rev. 41, 5459–5472 (2012). 29. Szathmáry, E. Coding coenzyme handles: a hypothesis for the origin of the genetic code. Proc. Natl. Acad. Sci. U. S. A. 90, 9916–9920 (1993). 30. Szathmáry, E. The origin of the genetic code: amino acids as cofactors in an RNA world. Trends Genet. 15, 223–229 (1999). 31. Jadhav, V. R. & Yarus, M. Coenzymes as coribozymes. Biochimie 84, 877–888 (2002). 32. Saran, D., Frank, J. & Burke, D. H. 
The tyranny of adenosine recognition among RNA aptamers to coenzyme A. BMC Evol. Biol. 3, 26 (2003). 33. Adamala, K. & Szostak, J. W. Competition between model protocells driven by an encapsulated catalyst. Nat. Chem. 5, 495–501 (2013). 34. Krawiec, S. & Riley, M. Organization of the bacterial chromosome. Microbiol. Rev. 54, 502–539 (1990). 35. Muto, A. & Osawa, S. The guanine and cytosine content of genomic DNA and bacterial evolution. Proc. Natl. Acad. Sci. 84, 166 LP – 169 (1987). 36. Freeland, S. J. & Hurst, L. D. The genetic code is one in a million. J. Mol. Evol. 47, 238–248 (1998). 37. Saier, M. H. Understanding the Genetic Code. J. Bacteriol. 201, 1–12 (2019). 38. Massey, S. E. A sequential ‘2-1-3’ model of genetic code evolution that explains codon constraints. Journal of molecular evolution vol. 62 809–810 (2006). 39. Higgs, P. G. A four-column theory for the origin of the genetic code: tracing the evolutionary pathways that gave rise to an optimized code. Biol. Direct 4, 16 (2009). 40. Archetti, M. Selection on codon usage for error minimization at the protein level. J. Mol. Evol. 59, 400–415 (2004).

41. Taylor, F. J. & Coates, D. The code within the codons. Biosystems. 22, 177–187 (1989). 42. Di Giulio, M. Some pungent arguments against the physico-chemical theories of the origin of the genetic code and corroborating the coevolution theory. J. Theor. Biol. 414, 1–4 (2017). 43. Goodarzi, H., Nejad, H. A. & Torabi, N. On the optimality of the genetic code, with the consideration of termination codons. Biosystems. 77, 163–173 (2004). 44. Haig, D. & Hurst, L. D. A quantitative measure of error minimization in the genetic code. J. Mol. Evol. 49, 708 (1999). 45. Massey, S. E. A neutral origin for error minimization in the genetic code. J. Mol. Evol. 67, 510– 516 (2008). 46. Massey, S. E. Genetic code evolution reveals the neutral emergence of mutational robustness, and information as an evolutionary constraint. Life (Basel, Switzerland) 5, 1301–1332 (2015). 47. Massey, S. E. The neutral emergence of error minimized genetic codes superior to the standard genetic code. J. Theor. Biol. 408, 237–242 (2016). 48. Salinas, D. G., Gallardo, M. O. & Osorio, M. I. Local conditions for global stability in the space of codons of the genetic code. Biosystems. 150, 73–77 (2016). 49. Torabi, N., Goodarzi, H. & Shateri Najafabadi, H. The case for an error minimizing set of coding amino acids. J. Theor. Biol. 244, 737–744 (2007). 50. Zhu, W. & Freeland, S. The standard genetic code enhances adaptive evolution of proteins. J. Theor. Biol. 239, 63–70 (2006). 51. Fitch, W. M. & Upper, K. The phylogeny of tRNA sequences provides evidence for ambiguity reduction in the origin of the genetic code. Cold Spring Harb. Symp. Quant. Biol. 52, 759–767 (1987). 52. Allnér, O. & Nilsson, L. Nucleotide modifications and tRNA anticodon-mRNA codon interactions on the ribosome. RNA 17, 2177–2188 (2011). 53. Grosjean, H. & Westhof, E. An integrated, structure- and energy-based view of the genetic code. Nucleic Acids Res. 44, 8020–8040 (2016). 54. Artymiuk, P. J., Rice, D. W., Poirrette, A. R. 
& Willett, P. A tale of two synthetases. Nat. Struct. Biol. 1, 758–760 (1994). 55. Anantharaman, V., Koonin, E. V & Aravind, L. Comparative genomics and evolution of proteins involved in RNA metabolism. Nucleic Acids Res. 30, 1427–1464 (2002). 56. Aravind, L., Anantharaman, V. & Koonin, E. V. Monophyly of class I aminoacyl tRNA synthetase, USPA, ETFP, photolyase, and PP-ATPase nucleotide-binding domains: implications for protein evolution in the RNA world. Proteins 48, 1–14 (2002). 57. Hartman, H. & Smith, T. F. The evolution of the ribosome and the genetic code. Life 4, 227–249 (2014). 58. Klipcan, L. & Safro, M. Amino acid biogenesis, evolution of the genetic code and aminoacyl-tRNA synthetases. J. Theor. Biol. 228, 389–396 (2004). 59. Schimmel, P., Giegé, R., Moras, D. & Yokoyama, S. An operational RNA code for amino acids and possible relationship to genetic code. Proc. Natl. Acad. Sci. U. S. A. 90, 8763–8768 (1993). 60. Rodin, S., Rodin, A. & Ohno, S. The presence of codon-anticodon pairs in the acceptor stem of tRNAs. Proc. Natl. Acad. Sci. U. S. A. 93, 4537–4542 (1996).

61. Hsiao, C. et al. Molecular paleontology: a biochemical model of the ancestral ribosome. Nucleic Acids Res. 41, 3373–3385 (2013). 62. Kubyshkin, V. & Budisa, N. Anticipating alien cells with alternative genetic codes: away from the alanine world! Curr. Opin. Biotechnol. 60, 242–249 (2019). 63. Kubyshkin, V. & Budisa, N. The alanine world model for the development of the amino acid repertoire in protein biosynthesis. Int. J. Mol. Sci. 20, (2019). 64. King, J. L. & Jukes, T. H. Non-Darwinian evolution. Science 164, 788–798 (1969). 65. Chaney, J. L. & Clark, P. L. Roles for Synonymous Codon Usage in Protein Biogenesis. Annu. Rev. Biophys. 44, 143–166 (2015). 66. Gutman, G. A. & Hatfield, G. W. Nonrandom utilization of codon pairs in Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 86, 3699–3703 (1989). 67. Ambrogelly, A., Palioura, S. & Söll, D. Natural expansion of the genetic code. Nat. Chem. Biol. 3, 29–35 (2007). 68. Mukai, T. et al. Facile Recoding of Selenocysteine in Nature. Angew. Chem. Int. Ed. Engl. 55, 5337–5341 (2016). 69. Lobanov, A. V, Turanov, A. A., Hatfield, D. L. & Gladyshev, V. N. Dual functions of codons in the genetic code. Crit. Rev. Biochem. Mol. Biol. 45, 257–265 (2010). 70. Yuan, J. et al. Distinct genetic code expansion strategies for selenocysteine and pyrrolysine are reflected in different aminoacyl-tRNA formation systems. FEBS Lett. 584, 342–349 (2010). 71. Zhang, Y., Baranov, P. V, Atkins, J. F. & Gladyshev, V. N. Pyrrolysine and selenocysteine use dissimilar decoding strategies. J. Biol. Chem. 280, 20740–20751 (2005). 72. Santiveri, C. M. & Jiménez, M. A. Tryptophan residues: scarce in proteins but strong stabilizers of β-hairpin peptides. Biopolymers 94, 779–790 (2010). 73. Bender, A., Hajieva, P. & Moosmann, B. Adaptive antioxidant methionine accumulation in respiratory chain complexes explains the use of a deviant genetic code in mitochondria. Proc. Natl. Acad. Sci. U. S. A. 105, 16496–16501 (2008). 74. Ma, J. C. & Dougherty, D. 
A. The Cation-π Interaction. Chem. Rev. 97, 1303–1324 (1997). 75. Gallivan, J. P. & Dougherty, D. A. Cation-π interactions in structural biology. Proc. Natl. Acad. Sci. U. S. A. 96, 9459–9464 (1999). 76. Samanta, U., Pal, D. & Chakrabarti, P. Environment of tryptophan side chains in proteins. Proteins 38, 288–300 (2000). 77. Burley, S. K. & Petsko, G. A. Aromatic-aromatic interaction: a mechanism of protein structure stabilization. Science 229, 23–28 (1985). 78. Samanta, U., Pal, D. & Chakrabarti, P. Packing of aromatic rings against tryptophan residues in proteins. Acta Crystallogr. D. Biol. Crystallogr. 55, 1421–1427 (1999). 79. Guvench, O. & Brooks, C. L. 3rd. Tryptophan side chain electrostatic interactions determine edge-to-face vs parallel-displaced tryptophan side chain geometries in the designed beta-hairpin ‘trpzip2’. J. Am. Chem. Soc. 127, 4668–4674 (2005). 80. Hunter, C. A. & Sanders, J. K. M. The Nature of π-π Interactions. J. Am. Chem. Soc. 112, 5525–5534 (1990).

81. Trinquier, G. & Sanejouand, Y. H. Which effective property of amino acids is best preserved by the genetic code? Protein Eng. 11, 153–169 (1998). 82. Mant, C. T., Kovacs, J. M., Kim, H.-M., Pollock, D. D. & Hodges, R. S. Intrinsic amino acid side- chain hydrophilicity/hydrophobicity coefficients determined by reversed-phase high- performance liquid chromatography of model peptides: comparison with other hydrophilicity/hydrophobicity scales. Biopolymers 92, 573–595 (2009). 83. Lepthien, S., Wiltschi, B., Bolic, B. & Budisa, N. In vivo engineering of proteins with nitrogen- containing tryptophan analogs. Appl. Microbiol. Biotechnol. 73, 740–754 (2006). 84. Pérez-Cañadillas, J. M. Grabbing the message: structural basis of mRNA 3’UTR recognition by Hrp1. EMBO J. 25, 3167–3178 (2006). 85. Naismith, J. H. Tryptophan oxygenation: Mechanistic considerations. Biochem. Soc. Trans. 40, 509–514 (2012). 86. Simat, T. J. & Steinhart, H. Oxidation of Free Tryptophan and Tryptophan Residues in Peptides and Proteins. J. Agric. Food Chem. 46, 490–498 (1998). 87. Chan, D. I., Prenner, E. J. & Vogel, H. J. Tryptophan- and arginine-rich antimicrobial peptides: structures and mechanisms of action. Biochim. Biophys. Acta 1758, 1184–1202 (2006). 88. Alkhalaf, L. M. & Ryan, K. S. Biosynthetic manipulation of tryptophan in bacteria: Pathways and mechanisms. Chem. Biol. 22, 317–328 (2015). 89. Budisa, N. & Pal, P. P. Designing novel spectral classes of proteins with a tryptophan-expanded genetic code. Biol. Chem. 385, 893–904 (2004). 90. Budisa, N. et al. Toward the experimental codon reassignment in vivo: protein building with an expanded amino acid repertoire. FASEB J. Off. Publ. Fed. Am. Soc. Exp. Biol. 13, 41–51 (1999). 91. Aledo, J. C. Methionine in proteins: The Cinderella of the proteinogenic amino acids. Protein Sci. 28, 1785–1796 (2019). 92. Orabi, E. A. & English, A. M. Modeling Protein S-Aromatic Motifs Reveals Their Structural and Redox Flexibility. J. Phys. Chem. 
B 122, 3760–3770 (2018). 93. Orabi, E. A. & English, A. M. Predicting structural and energetic changes in Met-aromatic motifs on methionine oxidation to the sulfoxide and sulfone. Phys. Chem. Chem. Phys. 20, 23132– 23141 (2018). 94. Doublié, S., Bricogne, G., Gilmore, C. & Carter, C. W. Tryptophanyl-tRNA synthetase crystal structure reveals an unexpected homology to tyrosyl-tRNA synthetase. Structure 3, 17–31 (1995). 95. Janin, J. & Wodak, S. Conformation of amino acid side-chains in proteins. J. Mol. Biol. 125, 357– 386 (1978). 96. Levine, R. L., Mosoni, L., Berlett, B. S. & Stadtman, E. R. Methionine residues as endogenous antioxidants in proteins. Proc. Natl. Acad. Sci. U. S. A. 93, 15036–15040 (1996). 97. Luo, S. & Levine, R. L. Methionine in proteins defends against oxidative stress. FASEB J. Off. Publ. Fed. Am. Soc. Exp. Biol. 23, 464–472 (2009). 98. Aledo, J. C., Li, Y., de Magalhães, J. P., Ruíz-Camacho, M. & Pérez-Claros, J. A. Mitochondrially encoded methionine is inversely related to longevity in mammals. Aging Cell 10, 198–207 (2011). 99. Lee, J. Y. et al. Promiscuous methionyl-tRNA synthetase mediates adaptive mistranslation to

protect cells against oxidative stress. J. Cell Sci. 127, 4234–4245 (2014). 100. Wiltrout, E., Goodenbour, J. M., Fréchin, M. & Pan, T. Misacylation of tRNA with methionine in Saccharomyces cerevisiae. Nucleic Acids Res. 40, 10494–10506 (2012). 101. Lee, B. C. et al. MsrB1 and MICALs regulate actin assembly and macrophage function via reversible stereoselective methionine oxidation. Mol. Cell 51, 397–404 (2013). 102. Hung, R.-J., Pak, C. W. & Terman, J. R. Direct redox regulation of F-actin assembly and disassembly by Mical. Science 334, 1710–1713 (2011). 103. Hung, R.-J., Spaeth, C. S., Yesilyurt, H. G. & Terman, J. R. SelR reverses Mical-mediated oxidation of actin to regulate F-actin dynamics. Nat. Cell Biol. 15, 1445–1454 (2013). 104. Veredas, F. J., Cantón, F. R. & Aledo, J. C. Methionine residues around phosphorylation sites are preferentially oxidized in vivo under stress conditions. Sci. Rep. 7, 40403 (2017). 105. Hardin, S. C., Larue, C. T., Oh, M.-H., Jain, V. & Huber, S. C. Coupling oxidative signals to protein phosphorylation via methionine oxidation in Arabidopsis. Biochem. J. 422, 305–312 (2009). 106. Erickson, J. R. et al. A dynamic pathway for calcium-independent activation of CaMKII by methionine oxidation. Cell 133, 462–474 (2008). 107. Carruthers, N. J. & Stemmer, P. M. Methionine oxidation in the calmodulin-binding domain of calcineurin disrupts calmodulin binding and calcineurin activation. Biochemistry 47, 3085–3095 (2008). 108. Dayhoff, M., Schwartz, R. & Orcutt, B. A model of evolutionary change in proteins. Atlas Protein Seq Struct. 5, 345–352 (1978). 109. Marimoutou, M., Springer, D. A., Liu, C., Kim, G. & Levine, R. L. Oxidation of Methionine 77 in Calmodulin Alters Mouse Growth and Behavior. Antioxidants (Basel, Switzerland) 7, (2018). 110. Elmallah, M. I. Y., Borgmeyer, U., Betzel, C. & Redecke, L. Impact of methionine oxidation as an initial event on the pathway of human prion protein conversion. Prion 7, 404–411 (2013). 111. 
Granold, M., Hajieva, P., Toşa, M. I., Irimie, F.-D. & Moosmann, B. Modern diversification of the amino acid repertoire driven by oxygen. Proc. Natl. Acad. Sci. U. S. A. 115, 41–46 (2018). 112. Bennett, B. D. et al. Absolute metabolite concentrations and implied enzyme active site occupancy in Escherichia coli. Nat. Chem. Biol. 5, 593–599 (2009). 113. Cantoni, G. L. Biological methylation: selected aspects. Annu. Rev. Biochem. 44, 435–451 (1975). 114. Markham, G. D., Hafner, E. W., Tabor, C. W. & Tabor, H. S-Adenosylmethionine synthetase from Escherichia coli. J. Biol. Chem. 255, 9082–9092 (1980). 115. Chiang, P. K. et al. S-Adenosylmethionine and methylation. FASEB J. Off. Publ. Fed. Am. Soc. Exp. Biol. 10, 471–480 (1996). 116. Fontecave, M., Atta, M. & Mulliez, E. S-adenosylmethionine: nothing goes to waste. Trends Biochem. Sci. 29, 1–7 (2004). 117. Huber, T. D., Johnson, B. R., Zhang, J. & Thorson, J. S. AdoMet analog synthesis and utilization: current state of the art. Curr. Opin. Biotechnol. 42, 189–197 (2016). 118. Palmer, J. L. & Abeles, R. H. The mechanism of action of S-adenosylhomocysteinase. J. Biol. Chem. 254, 1217–1226 (1979). 119. Ludwig, M. L. & Matthews, R. G. Structure-based perspectives on B12-dependent enzymes. Annu. Rev. Biochem. 66, 269–313 (1997).

120. Wion, D. & Casadesús, J. N6-methyl-adenine: an epigenetic signal for DNA-protein interactions. Nat. Rev. Microbiol. 4, 183–192 (2006). 121. Kumar, R. & Rao, D. N. Role of DNA methyltransferases in epigenetic regulation in bacteria. Subcell. Biochem. 61, 81–102 (2013). 122. Vasu, K. & Nagaraja, V. Diverse functions of restriction-modification systems in addition to cellular defense. Microbiol. Mol. Biol. Rev. 77, 53–72 (2013). 123. Roberts, R. J., Vincze, T., Posfai, J. & Macelis, D. REBASE--a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res. 38, D234-6 (2010). 124. Wilson, G. G. Organization of restriction-modification systems. Nucleic Acids Res. 19, 2539– 2566 (1991). 125. Campbell, J. L. & Kleckner, N. E. coli oriC and the dnaA gene promoter are sequestered from dam methyltransferase following the passage of the chromosomal replication fork. Cell 62, 967–979 (1990). 126. Ogden, G. B., Pratt, M. J. & Schaechter, M. The replicative origin of the E. coli chromosome binds to cell membranes only when hemimethylated. Cell 54, 127–135 (1988). 127. Glickman, B. W. & Radman, M. Escherichia coli mutator mutants deficient in methylation- instructed DNA mismatch correction. Proc. Natl. Acad. Sci. U. S. A. 77, 1063–1067 (1980). 128. Claverys, J. P. & Lacks, S. A. Heteroduplex deoxyribonucleic acid base mismatch repair in bacteria. Microbiol. Rev. 50, 133–165 (1986). 129. Tavazoie, S. & Church, G. M. Quantitative whole-genome analysis of DNA-protein interactions by in vivo methylase protection in E. coli. Nat. Biotechnol. 16, 566–571 (1998). 130. Marinus, M. G. & Casadesus, J. Roles of DNA adenine methylation in host-pathogen interactions: mismatch repair, transcriptional regulation, and more. FEMS Microbiol. Rev. 33, 488–503 (2009). 131. Kahramanoglou, C. et al. Genomics of DNA cytosine methylation in Escherichia coli reveals its role in stationary phase transcription. Nat. Commun. 3, 886 (2012). 132. Grainger, D. C. 
Structure and function of bacterial H-NS protein. Biochem. Soc. Trans. 44, 1561– 1569 (2016). 133. Dorman, C. J. & Deighan, P. Regulation of gene expression by histone-like proteins in bacteria. Curr. Opin. Genet. Dev. 13, 179–184 (2003). 134. Kumar, R., Mukhopadhyay, A. K., Ghosh, P. & Rao, D. N. Comparative transcriptomics of H. pylori strains AM5, SS1 and their hpyAVIBM deletion mutants: possible roles of cytosine methylation. PLoS One 7, e42303 (2012). 135. Sánchez-Romero, M. A., Cota, I. & Casadesús, J. DNA methylation in bacteria: from the methyl group to the methylome. Curr. Opin. Microbiol. 25, 9–16 (2015). 136. Machnicka, M. A. et al. MODOMICS: a database of RNA modification pathways--2013 update. Nucleic Acids Res. 41, D262-7 (2013). 137. Sloan, K. E. et al. Tuning the ribosome: The influence of rRNA modification on eukaryotic ribosome biogenesis and function. RNA Biol. 14, 1138–1152 (2017). 138. Baudin-Baillieu, A. et al. Nucleotide modifications in three functionally important regions of the Saccharomyces cerevisiae ribosome affect translation accuracy. Nucleic Acids Res. 37, 7665– 7677 (2009).

139. Polikanov, Y. S., Melnikov, S. V, Söll, D. & Steitz, T. A. Structural insights into the role of rRNA modifications in protein synthesis and ribosome assembly. Nat. Struct. Mol. Biol. 22, 342–344 (2015). 140. Burakovsky, D. E. et al. Impact of methylations of m2G966/m5C967 in 16S rRNA on bacterial fitness and translation initiation. Nucleic Acids Res. 40, 7885–7895 (2012). 141. Arora, S. et al. Distinctive contributions of the ribosomal P-site elements m2G966, m5C967 and the C-terminal tail of the S9 protein in the fidelity of initiation of translation in Escherichia coli. Nucleic Acids Res. 41, 4963–4975 (2013). 142. Prokhorova, I. V et al. Modified nucleotides m(2)G966/m(5)C967 of Escherichia coli 16S rRNA are required for attenuation of tryptophan operon. Sci. Rep. 3, 3236 (2013). 143. van Buul, C. P., Visser, W. & van Knippenberg, P. H. Increased translational fidelity caused by the antibiotic kasugamycin and ribosomal ambiguity in mutants harbouring the ksgA gene. FEBS Lett. 177, 119–124 (1984). 144. Okamoto, S. et al. Loss of a conserved 7-methylguanosine modification in 16S rRNA confers low- level streptomycin resistance in bacteria. Mol. Microbiol. 63, 1096–1106 (2007). 145. Arenz, S. & Wilson, D. N. Blast from the Past: Reassessing Forgotten Translation Inhibitors, Antibiotic Selectivity, and Resistance Mechanisms to Aid Drug Development. Mol. Cell 61, 3– 14 (2016). 146. Taylor, F. R. & Cronan, J. E. J. Cyclopropane fatty acid synthase of Escherichia coli. Stabilization, purification, and interaction with phospholipid vesicles. Biochemistry 18, 3292–3300 (1979). 147. Grogan, D. W. & Cronan, J. E. J. Cyclopropane ring formation in membrane lipids of bacteria. Microbiol. Mol. Biol. Rev. 61, 429–441 (1997). 148. Stoner, G. L. & Eisenberg, M. A. Purification and properties of 7, 8-diaminopelargonic acid aminotransferase. J. Biol. Chem. 250, 4029–4036 (1975). 149. Stoner, G. L. & Eisenberg, M. A. 
Biosynthesis of 7, 8-diaminopelargonic acid from 7-keto-8-aminopelargonic acid and S-adenosyl-L-methionine. The kinetics of the reaction. J. Biol. Chem. 250, 4037–4043 (1975). 150. Iwata-Reuyl, D. Biosynthesis of the 7-deazaguanosine hypermodified nucleosides of transfer RNA. Bioorg. Chem. 31, 24–43 (2003). 151. Nishimura, S., Taya, Y., Kuchino, Y. & Oashi, Z. Enzymatic synthesis of 3-(3-amino-3-carboxypropyl)uridine in Escherichia coli phenylalanine transfer RNA: transfer of the 3-amino-acid-3-carboxypropyl group from S-adenosylmethionine. Biochem. Biophys. Res. Commun. 57, 702–708 (1974). 152. Val, D. L. & Cronan, J. E. J. In vivo evidence that S-adenosylmethionine and fatty acid synthesis intermediates are the substrates for the LuxI family of autoinducer synthases. J. Bacteriol. 180, 2644–2651 (1998). 153. Bowman, W. H., Tabor, C. W. & Tabor, H. Spermidine biosynthesis. Purification and properties of propylamine transferase from Escherichia coli. J. Biol. Chem. 248, 2480–2486 (1973). 154. Bauerle, M. R., Schwalm, E. L. & Booker, S. J. Mechanistic Diversity of Radical S-Adenosylmethionine (SAM)-dependent Methylation. J. Biol. Chem. 290, 3995–4002 (2015). 155. Campbell, R. M. & Tummino, P. J. Cancer epigenetics drug discovery and development: the challenge of hitting the mark. J. Clin. Invest. 124, 64–69 (2014). 156. Doyle, H. A., Yang, M.-L., Raycroft, M. T., Gee, R. J. & Mamula, M. J. Autoantigens: novel forms

and presentation to the immune system. Autoimmunity 47, 220–233 (2014). 157. Grolleau-Julius, A., Ray, D. & Yung, R. L. The role of epigenetics in aging and autoimmunity. Clin. Rev. Allergy Immunol. 39, 42–50 (2010). 158. Coşar, A., Ipçioğlu, O. M., Ozcan, O. & Gültepe, M. Folate and homocysteine metabolisms and their roles in the biochemical basis of neuropsychiatry. Turkish J. Med. Sci. 44, 1–9 (2014). 159. Gapp, K., Woldemichael, B. T., Bohacek, J. & Mansuy, I. M. Epigenetic regulation in neurodevelopment and neurodegenerative diseases. Neuroscience 264, 99–111 (2014). 160. Tremolizzo, L. et al. Novel therapeutic targets in neuropsychiatric disorders: the neuroepigenome. Curr. Pharm. Des. 20, 1831–1839 (2014). 161. Vickers, M. H. Early life nutrition, epigenetics and programming of later life disease. Nutrients 6, 2165–2178 (2014). 162. Bonnin, R. A., Nordmann, P. & Poirel, L. Screening and deciphering antibiotic resistance in Acinetobacter baumannii: a state of the art. Expert Rev. Anti. Infect. Ther. 11, 571–583 (2013). 163. Lötsch, J. et al. Common non-epigenetic drugs as epigenetic modulators. Trends Mol. Med. 19, 742–753 (2013). 164. Bottiglieri, T. S-Adenosyl-L-methionine (SAMe): from the bench to the bedside--molecular basis of a pleiotrophic molecule. Am. J. Clin. Nutr. 76, 1151S–7S (2002). 165. Huber, T. D. et al. Functional AdoMet Isosteres Resistant to Classical AdoMet Degradation Pathways. ACS Chem. Biol. 11, 2484–2491 (2016). 166. Vranken, C. et al. Super-resolution optical DNA Mapping via DNA methyltransferase-directed click chemistry. Nucleic Acids Res. 42, e50 (2014). 167. Dalhoff, C., Lukinavicius, G., Klimasăuskas, S. & Weinhold, E. Direct transfer of extended groups from synthetic cofactors by DNA methyltransferases. Nat. Chem. Biol. 2, 31–32 (2006). 168. Lukinavičius, G., Tomkuvienė, M., Masevičius, V. & Klimašauskas, S. Enhanced chemical stability of adomet analogues for improved methyltransferase-directed labeling of DNA. ACS Chem. Biol. 
8, 1134–1139 (2013). 169. Zhang, Y. et al. In vivo protein allylation to capture protein methylation candidates. Chem. Commun. (Camb). 52, 6689–6692 (2016). 170. Bothwell, I. R. & Luo, M. Large-scale, protection-free synthesis of Se-adenosyl-L- selenomethionine analogues and their application as cofactor surrogates of methyltransferases. Org. Lett. 16, 3056–3059 (2014). 171. Guo, H. et al. Profiling substrates of protein arginine N-methyltransferase 3 with S-adenosyl-L- methionine analogues. ACS Chem. Biol. 9, 476–484 (2014). 172. Plotnikova, A., Osipenko, A., Masevičius, V., Vilkaitis, G. & Klimašauskas, S. Selective covalent labeling of miRNA and siRNA duplexes using HEN1 methyltransferase. J. Am. Chem. Soc. 136, 13550–13553 (2014). 173. Motorin, Y. et al. Expanding the chemical scope of RNA:methyltransferases to site-specific alkynylation of RNA for click labeling. Nucleic Acids Res. 39, 1943–1952 (2011). 174. Tomkuviene, M., Clouet-d’Orval, B., Cerniauskas, I., Weinhold, E. & Klimasauskas, S. Programmable sequence-specific click-labeling of RNA using archaeal box C/D RNP methyltransferases. Nucleic Acids Res. 40, 6765–6773 (2012).

175. Sedlaczek, L. Biotransformations of steroids. Crit. Rev. Biotechnol. 7, 187–236 (1988). 176. Lindberg, R. L. & Negishi, M. Alteration of mouse cytochrome P450coh substrate specificity by mutation of a single amino-acid residue. Nature 339, 632–634 (1989). 177. Bornscheuer, U. T. et al. Engineering the third wave of biocatalysis. Nature 485, 185–194 (2012). 178. Romero, P. A. & Arnold, F. H. Exploring protein fitness landscapes by directed evolution. Nat. Rev. Mol. Cell Biol. 10, 866–876 (2009). 179. Denard, C. A., Ren, H. & Zhao, H. Improving and repurposing biocatalysts via directed evolution. Curr. Opin. Chem. Biol. 25, 55–64 (2015). 180. Reetz, M. T. Laboratory evolution of stereoselective enzymes: a prolific source of catalysts for asymmetric reactions. Angew. Chem. Int. Ed. Engl. 50, 138–174 (2011). 181. Reetz, M. T. Biocatalysis in organic chemistry and biotechnology: past, present, and future. J. Am. Chem. Soc. 135, 12480–12496 (2013). 182. Nestl, B. M., Hammer, S. C., Nebel, B. A. & Hauer, B. New generation of biocatalysts for organic synthesis. Angew. Chem. Int. Ed. Engl. 53, 3070–3095 (2014). 183. Bommarius, A. S. Biocatalysis: A Status Report. Annu. Rev. Chem. Biomol. Eng. 6, 319–345 (2015). 184. Jäckel, C. & Hilvert, D. Biocatalysts by evolution. Curr. Opin. Biotechnol. 21, 753–759 (2010). 185. Turner, N. J. Directed evolution drives the next generation of biocatalysts. Nat. Chem. Biol. 5, 567–573 (2009). 186. Schmid, A. et al. Industrial biocatalysis today and tomorrow. Nature 409, 258–268 (2001). 187. Walsh, C. T., Garneau-Tsodikova, S. & Gatto, G. J. J. Protein posttranslational modifications: the chemistry of proteome diversifications. Angew. Chem. Int. Ed. Engl. 44, 7342–7372 (2005). 188. Kubyshkin, V. & Budisa, N. Synthetic alienation of microbial organisms by using genetic code engineering: Why and how? Biotechnol. J. 12, (2017). 189. Yong, J. H., Barth, R. F., Wyzlic, I. M., Soloway, A. H. & Rotaru, J. H. 
In vitro and in vivo evaluation of o-carboranylalanine as a potential boron delivery agent for neutron capture therapy. Anticancer Res. 15, 2033–2038 (1995). 190. Müller, K., Faeh, C. & Diederich, F. Fluorine in pharmaceuticals: looking beyond intuition. Science 317, 1881–1886 (2007). 191. Vornholt, T. & Jeschek, M. The Quest for Xenobiotic Enzymes: From New Enzymes for Chemistry to a Novel Chemistry of Life. ChemBioChem 1–10 (2020) doi:10.1002/cbic.202000121. 192. Jeschek, M., Panke, S. & Ward, T. R. Artificial Metalloenzymes on the Verge of New-to-Nature Metabolism. Trends Biotechnol. 36, 60–72 (2018). 193. Bornscheuer, U. T. The fourth wave of biocatalysis is approaching. Philos. Trans. Ser. A, Math. Phys. Eng. Sci. 376, (2018). 194. Taylor, A. I., Arangundy-Franklin, S. & Holliger, P. Towards applications of synthetic genetic polymers in diagnosis and therapy. Curr. Opin. Chem. Biol. 22, 79–84 (2014). 195. Eremeeva, E. & Herdewijn, P. Reprint of: Non Canonical Genetic Material. Curr. Opin. Biotechnol. 1–9 (2019) doi:10.1016/j.copbio.2019.11.009. 196. Ellington, A. D. & Szostak, J. W. In vitro selection of RNA molecules that bind specific ligands.

Nature 346, 818–822 (1990). 197. Tuerk, C. & Gold, L. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249, 505–510 (1990). 198. Pfeiffer, F. et al. Identification and characterization of nucleobase-modified aptamers by click-SELEX. Nat. Protoc. 13, 1153–1180 (2018). 199. Tolle, F., Brändle, G. M., Matzner, D. & Mayer, G. A Versatile Approach Towards Nucleobase-Modified Aptamers. Angew. Chem. Int. Ed. Engl. 10971–10974 (2015) doi:10.1002/anie.201503652. 200. Drienovská, I., Rioz-Martínez, A., Draksharapu, A. & Roelfes, G. Novel artificial metalloenzymes by in vivo incorporation of metal-binding unnatural amino acids. Chem. Sci. 6, 770–776 (2015). 201. Drienovská, I. et al. Design of an enantioselective artificial metallo-hydratase enzyme containing an unnatural metal-binding amino acid. Chem. Sci. 8, 7228–7235 (2017). 202. Agostini, F. et al. Biocatalysis with Unnatural Amino Acids: Enzymology Meets Xenobiology. Angew. Chem. Int. Ed. Engl. 9680–9703 (2017) doi:10.1002/anie.201610129. 203. Budisa, N., Kubyshkin, V. & Schmidt, M. Xenobiology: A Journey towards Parallel Life Forms. ChemBioChem 1–5 (2020) doi:10.1002/cbic.202000141. 204. Tarver, H. & Levine, M. Studies on ethionine III. Incorporation of ethionine into rat proteins. J. Biol. Chem. 192, 835–850 (1951). 205. Dumas, A., Lercher, L., Spicer, C. D. & Davis, B. G. Designing logical codon reassignment: Expanding the chemistry in biology. Chem. Sci. 6, 50–69 (2015). 206. Lepthien, S., Hoesl, M. G., Merkel, L. & Budisa, N. Azatryptophans endow proteins with intrinsic blue fluorescence. Proc. Natl. Acad. Sci. U. S. A. 105, 16095–16100 (2008). 207. Budisa, N. et al. Atomic mutations in annexin V: thermodynamic studies of isomorphous protein variants. Eur. J. Biochem. 253, 1–9 (1998). 208. Minks, C., Huber, R., Moroder, L. & Budisa, N. Atomic mutations at the single tryptophan residue of human recombinant annexin V: effects on structure, stability, and activity. 
Biochemistry 38, 10649–10659 (1999). 209. Budisa, N. Prolegomena to future experimental efforts on genetic code engineering by expanding its amino acid repertoire. Angew. Chemie - Int. Ed. 43, 6426–6463 (2004). 210. Young, T. S. & Schultz, P. G. Beyond the canonical 20 amino acids: expanding the genetic lexicon. J. Biol. Chem. 285, 11039–11044 (2010). 211. Merkel, L. & Budisa, N. Organic fluorine as a polypeptide building element: in vivo expression of fluorinated peptides, proteins and proteomes. Org. Biomol. Chem. 10, 7241–7261 (2012). 212. Seifert, M. H. J. et al. Slow exchange in the chromophore of a green fluorescent protein variant. J. Am. Chem. Soc. 124, 7932–7942 (2002). 213. Cowie, D. B. & Cohen, G. N. Biosynthesis by Escherichia coli of active altered proteins containing selenium instead of sulfur. BBA - Biochim. Biophys. Acta 26, 252–261 (1957). 214. Hendrickson, W. A., Horton, J. R. & LeMaster, D. M. Selenomethionyl proteins produced for analysis by multiwavelength anomalous diffraction (MAD): a vehicle for direct determination of three-dimensional structure. EMBO J. 9, 1665–1672 (1990). 215. Amir, L. et al. Surface display of a redox enzyme and its site-specific wiring to gold electrodes. J. Am. Chem. Soc. 135, 70–73 (2013).

216. Exner, M. P. et al. Design of S-Allylcysteine in Situ Production and Incorporation Based on a Novel Pyrrolysyl-tRNA Synthetase Variant. Chembiochem 18, 85–90 (2017). 217. Chalker, J. M., Bernardes, G. J. L., Lin, Y. A. & Davis, B. G. Chemical modification of proteins at cysteine: opportunities in chemistry and biology. Chem. Asian J. 4, 630–640 (2009). 218. Prescher, J. A. & Bertozzi, C. R. Chemistry in living systems. Nat. Chem. Biol. 1, 13–21 (2005). 219. Neumann, H. Rewiring translation - Genetic code expansion and its applications. FEBS Lett. 586, 2057–2064 (2012). 220. Spicer, C. D. & Davis, B. G. Selective chemical protein modification. Nat. Commun. 5, 4740 (2014). 221. Wang, K. et al. Optimized orthogonal translation of unnatural amino acids enables spontaneous protein double-labelling and FRET. Nat. Chem. 6, 393–403 (2014). 222. Baumann, T. et al. Site-Resolved Observation of Vibrational Energy Transfer Using a Genetically Encoded Ultrafast Heater. Angew. Chemie - Int. Ed. 58, 2899–2903 (2019). 223. Stubbe, J., Nocera, D. G., Yee, C. S. & Chang, M. C. Y. Radical initiation in the class I ribonucleotide reductase: long-range proton-coupled electron transfer? Chem. Rev. 103, 2167– 2201 (2003). 224. Seyedsayamdost, M. R., Reece, S. Y., Nocera, D. G. & Stubbe, J. Mono-, di-, tri-, and tetra- substituted fluorotyrosines: new probes for enzymes that use tyrosyl radicals in catalysis. J. Am. Chem. Soc. 128, 1569–1579 (2006). 225. Minnihan, E. C., Young, D. D., Schultz, P. G. & Stubbe, J. Incorporation of fluorotyrosines into ribonucleotide reductase using an evolved, polyspecific aminoacyl-tRNA synthetase. J. Am. Chem. Soc. 133, 15942–15945 (2011). 226. Oyala, P. H. et al. Biophysical Characterization of Fluorotyrosine Probes Site-Specifically Incorporated into Enzymes: E. coli Ribonucleotide Reductase As an Example. J. Am. Chem. Soc. 138, 7951–7964 (2016). 227. Minks, C., Alefelder, S., Moroder, L., Huber, R. & Budisa, N. 
Towards new protein engineering: In vivo building and folding of protein shuttles for drug delivery and targeting by the selective pressure incorporation (SPI) method. Tetrahedron 56, 9431–9442 (2000). 228. Wang, L., Brock, A., Herberich, B. & Schultz, P. G. Expanding the genetic code of Escherichia coli. Science 292, 498–500 (2001). 229. Luo, X. et al. Genetically encoding phosphotyrosine and its nonhydrolyzable analog in bacteria. Nat. Chem. Biol. 13, 845–849 (2017). 230. Zhang, M. S. et al. Biosynthesis and genetic encoding of phosphothreonine through parallel selection and deep sequencing. Nat. Methods 14, 729–736 (2017). 231. Hoesl, M. G. et al. Chemical Evolution of a Bacterial Proteome. Angew. Chemie - Int. Ed. 54, 10030–10034 (2015). 232. Dieterich, D. C. et al. In situ visualization and dynamics of newly synthesized proteins in rat hippocampal neurons. Nat. Neurosci. 13, 897–905 (2010). 233. Fan, C., Ho, J. M. L., Chirathivat, N., Söll, D. & Wang, Y.-S. Exploring the substrate range of wild-type aminoacyl-tRNA synthetases. Chembiochem 15, 1805–1809 (2014). 234. Munier, R. & Cohen, G. N. [Incorporation of structural analogues of amino acids into bacterial proteins during their synthesis in vivo]. Biochim. Biophys. Acta 31, 378–391 (1959).

235. Noren, C. J., Anthony-Cahill, S. J., Griffith, M. C. & Schultz, P. G. Method for Site-Specific. Science 244, 182–188 (1989). 236. Liu, C. C. & Schultz, P. G. Adding New Chemistries to the Genetic Code. Annu. Rev. Biochem. 79, 413–444 (2010). 237. Vargas-Rodriguez, O., Sevostyanova, A., Söll, D. & Crnković, A. Upgrading aminoacyl-tRNA synthetases for genetic code expansion. Curr. Opin. Chem. Biol. 46, 115–122 (2018). 238. Drienovská, I. & Roelfes, G. Expanding the enzyme universe with genetically encoded unnatural amino acids. Nat. Catal. 3, 193–202 (2020). 239. Mukai, T. et al. Codon reassignment in the Escherichia coli genetic code. Nucleic Acids Res. 38, 8188–8195 (2010). 240. Johnson, D. B. F. et al. RF1 knockout allows ribosomal incorporation of unnatural amino acids at multiple sites. Nat. Chem. Biol. 7, 779–786 (2011). 241. Johnson, D. B. F. et al. Release factor one is nonessential in Escherichia coli. ACS Chem. Biol. 7, 1337–1344 (2012). 242. Lajoie, M. J. et al. Genomically recoded organisms expand biological functions. Science 342, 357–360 (2013). 243. Mukai, T. et al. Highly reproductive Escherichia coli cells with no specific assignment to the UAG codon. Sci. Rep. 5, 9699 (2015). 244. Park, H.-S. et al. Expanding the genetic code of Escherichia coli with phosphoserine. Science 333, 1151–1154 (2011). 245. Fan, C., Ip, K. & Söll, D. Expanding the genetic code of Escherichia coli with phosphotyrosine. FEBS Lett. 292, 3040–3047 (2016). 246. Wang, K., Neumann, H., Peak-Chew, S. Y. & Chin, J. W. Evolved orthogonal ribosomes enhance the efficiency of synthetic genetic code expansion. Nat. Biotechnol. 25, 770–777 (2007). 247. Neumann, H., Wang, K., Davis, L., Garcia-Alai, M. & Chin, J. W. Encoding multiple unnatural amino acids via evolution of a quadruplet-decoding ribosome. Nature 464, 441–444 (2010). 248. Schmied, W. H., Elsässer, S. J., Uttamapinant, C. & Chin, J. W. 
Efficient multisite unnatural amino acid incorporation in mammalian cells via optimized pyrrolysyl tRNA synthetase/tRNA expression and engineered eRF1. J. Am. Chem. Soc. 136, 15577–15583 (2014). 249. Amiram, M. et al. Evolution of translation machinery in recoded bacteria enables multi-site incorporation of nonstandard amino acids. Nat. Biotechnol. 33, 1272–1279 (2015). 250. Zheng, Y., Lewis, T. L. J., Igo, P., Polleux, F. & Chatterjee, A. Virus-Enabled Optimization and Delivery of the Genetic Machinery for Efficient Unnatural Amino Acid Mutagenesis in Mammalian Cells and Tissues. ACS Synth. Biol. 6, 13–18 (2017). 251. Chatterjee, A., Sun, S. B., Furman, J. L., Xiao, H. & Schultz, P. G. A versatile platform for single- and multiple-unnatural amino acid mutagenesis in Escherichia coli. Biochemistry 52, 1828– 1837 (2013). 252. Kwok, Y. & Wong, J. T. Evolutionary relationship between Halobacterium cutirubrum and eukaryotes determined by use of aminoacyl-tRNA synthetases as phylogenetic probes. Can. J. Biochem. 58, 213–218 (1980). 253. Yanagisawa, T., Ishii, R., Fukunaga, R., Nureki, O. & Yokoyama, S. Crystallization and preliminary X-ray crystallographic analysis of the catalytic domain of pyrrolysyl-tRNA synthetase from the

methanogenic archaeon Methanosarcina mazei. Acta Crystallogr. Sect. F, Struct. Biol. Cryst. Commun. 62, 1031–1033 (2006). 254. Mukai, T. et al. Adding l-lysine derivatives to the genetic code of mammalian cells with engineered pyrrolysyl-tRNA synthetases. Biochem. Biophys. Res. Commun. 371, 818–822 (2008). 255. Yanagisawa, T. et al. Crystallographic studies on multiple conformational states of active-site loops in pyrrolysyl-tRNA synthetase. J. Mol. Biol. 378, 634–652 (2008). 256. Kavran, J. M. et al. Structure of pyrrolysyl-tRNA synthetase, an archaeal enzyme for genetic code innovation. Proc. Natl. Acad. Sci. U. S. A. 104, 11268–11273 (2007). 257. Nozawa, K. et al. Pyrrolysyl-tRNA synthetase-tRNA(Pyl) structure reveals the molecular basis of orthogonality. Nature 457, 1163–1167 (2009). 258. Yanagisawa, T., Sumida, T., Ishii, R. & Yokoyama, S. A novel crystal form of pyrrolysyl-tRNA synthetase reveals the pre- and post-aminoacyl-tRNA synthesis conformational states of the adenylate and aminoacyl moieties and an asparagine residue in the catalytic site. Acta Crystallogr. D. Biol. Crystallogr. 69, 5–15 (2013). 259. Paul, L., Ferguson, D. J. J. & Krzycki, J. A. The trimethylamine methyltransferase gene and multiple dimethylamine methyltransferase genes of Methanosarcina barkeri contain in-frame and read-through amber codons. J. Bacteriol. 182, 2520–2529 (2000). 260. Burke, S. A., Lo, S. L. & Krzycki, J. A. Clustered genes encoding the methyltransferases of methanogenesis from monomethylamine. J. Bacteriol. 180, 3432–3440 (1998). 261. Hao, B. et al. A new UAG-encoded residue in the structure of a methanogen methyltransferase. Science 296, 1462–1466 (2002). 262. Longstaff, D. G., Blight, S. K., Zhang, L., Green-Church, K. B. & Krzycki, J. A. In vivo contextual requirements for UAG translation as pyrrolysine. Mol. Microbiol. 63, 229–241 (2007). 263. Namy, O. et al. Adding pyrrolysine to the Escherichia coli genetic code. FEBS Lett. 581, 5282– 5288 (2007). 264. 
Polycarpo, C. et al. An aminoacyl-tRNA synthetase that specifically activates pyrrolysine. Proc. Natl. Acad. Sci. U. S. A. 101, 12450–12454 (2004). 265. Blight, S. K. et al. Direct charging of tRNA(CUA) with pyrrolysine in vitro and in vivo. Nature 431, 333–335 (2004). 266. Hao, B. et al. Reactivity and chemical synthesis of L-pyrrolysine- the 22(nd) genetically encoded amino acid. Chem. Biol. 11, 1317–1324 (2004). 267. Srinivasan, G., James, C. M. & Krzycki, J. A. Pyrrolysine encoded by UAG in Archaea: charging of a UAG-decoding specialized tRNA. Science 296, 1459–1462 (2002). 268. Polycarpo, C. R. et al. Pyrrolysine analogues as substrates for pyrrolysyl-tRNA synthetase. FEBS Lett. 580, 6695–6700 (2006). 269. Yanagisawa, T. et al. Multistep engineering of pyrrolysyl-tRNA synthetase to genetically encode N(epsilon)-(o-azidobenzyloxycarbonyl) lysine for site-specific protein modification. Chem. Biol. 15, 1187–1197 (2008). 270. Kobayashi, T., Yanagisawa, T., Sakamoto, K. & Yokoyama, S. Recognition of non-alpha-amino substrates by pyrrolysyl-tRNA synthetase. J. Mol. Biol. 385, 1352–1360 (2009). 271. Li, Y.-M. et al. Ligation of expressed protein α-hydrazides via genetic incorporation of an α-

hydroxy acid. ACS Chem. Biol. 7, 1015–1022 (2012). 272. Ibba, M. et al. Substrate selection by aminoacyl-tRNA synthetases. Nucleic Acids Symp. Ser. 40– 42 (1995). 273. Woese, C. R., Olsen, G. J., Ibba, M. & Söll, D. Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiol. Mol. Biol. Rev. 64, 202–236 (2000). 274. Baldomà, L. & Aguilar, J. Involvement of lactaldehyde dehydrogenase in several metabolic pathways of Escherichia coli K12. J. Biol. Chem. 262, 13991–13996 (1987). 275. HANSEN, R. W. & HAYASHI, J. A. Glycolate metabolism in Escherichia coli. J. Bacteriol. 83, 679– 687 (1962). 276. Weber, W. W. & Zannoni, V. G. Reduction of aromatic alpha-keto acids by lactic dehydrogenase isozymes and aromatic alpha-keto acid reductase. Ann. N. Y. Acad. Sci. 151, 627–637 (1968). 277. Delarue, M. Aminoacyl-tRNA synthetases. Curr. Opin. Struct. Biol. 5, 48–55 (1995). 278. Wan, W., Tharp, J. M. & Liu, W. R. Pyrrolysyl-tRNA synthetase: an ordinary enzyme but an outstanding genetic code expansion tool. Biochim. Biophys. Acta 1844, 1059–1070 (2014). 279. Ambrogelly, A. et al. Pyrrolysine is not hardwired for cotranslational insertion at UAG codons. Proc. Natl. Acad. Sci. U. S. A. 104, 3141–3146 (2007). 280. Giegé, R., Sissler, M. & Florentz, C. Universal rules and idiosyncratic features in tRNA identity. Nucleic Acids Res. 26, 5017–5035 (1998). 281. Normanly, J., Ogden, R. C., Horvath, S. J. & Abelson, J. Changing the identity of a transfer RNA. Nature 321, 213–219 (1986). 282. Breitschopf, K., Achsel, T., Busch, K. & Gross, H. J. Identity elements of human tRNA(Leu): structural requirements for converting human tRNA(Ser) into a leucine acceptor in vitro. Nucleic Acids Res. 23, 3633–3637 (1995). 283. Palencia, A. et al. Structural dynamics of the aminoacylation and proofreading functional cycle of bacterial leucyl-tRNA synthetase. Nat. Struct. Mol. Biol. 19, 677–684 (2012). 284. Normanly, J., Ollick, T. & Abelson, J. 
Eight base changes are sufficient to convert a leucine-inserting tRNA into a serine-inserting tRNA. Proc. Natl. Acad. Sci. U. S. A. 89, 5680–5684 (1992). 285. Asahara, H. et al. Escherichia coli seryl-tRNA synthetase recognizes tRNA(Ser) by its characteristic tertiary structure. J. Mol. Biol. 236, 738–748 (1994). 286. Xiao, H. et al. Genetic incorporation of multiple unnatural amino acids into proteins in mammalian cells. Angew. Chem. Int. Ed. Engl. 52, 14080–14083 (2013). 287. Wan, W. et al. A facile system for genetic incorporation of two different noncanonical amino acids into one protein in Escherichia coli. Angew. Chem. Int. Ed. Engl. 49, 3211–3214 (2010). 288. Wang, L., Magliery, T. J., Liu, D. R. & Schultz, P. G. A new functional suppressor tRNA/aminoacyl-tRNA synthetase pair for the in vivo incorporation of unnatural amino acids into proteins. J. Am. Chem. Soc. 122, 5010–5011 (2000). 289. Lee, C. P. & RajBhandary, U. L. Mutants of Escherichia coli initiator tRNA that suppress amber codons in Saccharomyces cerevisiae and are aminoacylated with tyrosine by yeast extracts. Proc. Natl. Acad. Sci. U. S. A. 88, 11378–11382 (1991). 290. Quinn, C. L., Tao, N. & Schimmel, P. Species-specific microhelix aminoacylation by a eukaryotic pathogen tRNA synthetase dependent on a single base pair. Biochemistry 34, 12489–12495

(1995). 291. Kleeman, T. A., Wei, D., Simpson, K. L. & First, E. A. Human tyrosyl-tRNA synthetase shares amino acid sequence homology with a putative cytokine. J. Biol. Chem. 272, 14420–14425 (1997). 292. Wakasugi, K., Quinn, C. L., Tao, N. & Schimmel, P. Genetic code in evolution: switching species- specific aminoacylation with a peptide transplant. EMBO J. 17, 297–305 (1998). 293. Steer, B. A. & Schimmel, P. Major anticodon-binding region missing from an archaebacterial tRNA synthetase. J. Biol. Chem. 274, 35601–35606 (1999). 294. Fechter, P., Rudinger-Thirion, J., Tukalo, M. & Giegé, R. Major tyrosine identity determinants in Methanococcus jannaschii and Saccharomyces cerevisiae tRNA(Tyr) are conserved but expressed differently. Eur. J. Biochem. 268, 761–767 (2001). 295. Marck, C. & Grosjean, H. tRNomics: analysis of tRNA genes from 50 genomes of Eukarya, Archaea, and Bacteria reveals anticodon-sparing strategies and domain-specific features. RNA 8, 1189–1232 (2002). 296. Himeno, H., Hasegawa, T., Ueda, T., Watanabe, K. & Shimizu, M. Conversion of aminoacylation specificity from tRNA(Tyr) to tRNA(Ser) in vitro. Nucleic Acids Res. 18, 6815–6819 (1990). 297. Eriani, G., Delarue, M., Poch, O., Gangloff, J. & Moras, D. Partition of tRNA synthetases into two classes based on mutually exclusive sets of sequence motifs. Nature 347, 203–206 (1990). 298. Kobayashi, T. et al. Structural basis for orthogonal tRNA specificities of tyrosyl-tRNA synthetases for genetic code expansion. Nat. Struct. Biol. 10, 425–432 (2003). 299. Guo, J., Melançon, C. E., Lee, H. S., Groff, D. & Schultz, P. G. Evolution of amber suppressor tRNAs for efficient bacterial production of proteins containing nonnatural amino acids. Angew. Chemie - Int. Ed. 48, 9148–9151 (2009). 300. Smolskaya, S. & Andreev, Y. A. Site-specific incorporation of unnatural amino acids into escherichia coli recombinant protein: Methodology development and recent achievement. Biomolecules 9, (2019). 301. Santoro, S. 
W., Wang, L., Herberich, B., King, D. S. & Schultz, P. G. An efficient system for the evolution of aminoacyl-tRNA synthetase specificity. Nat. Biotechnol. 20 (2002). 302. Gan, Q., Lehman, B. P., Bobik, T. A. & Fan, C. Expanding the genetic code of Salmonella with non-canonical amino acids. Sci. Rep. 6, 39920 (2016). 303. He, J., Van Treeck, B., Nguyen, H. B. & Melançon, C. E. 3rd. Development of an Unnatural Amino Acid Incorporation System in the Actinobacterial Natural Product Producer Streptomyces venezuelae ATCC 15439. ACS Synth. Biol. 5, 125–132 (2016). 304. Wang, F., Robbins, S., Guo, J., Shen, W. & Schultz, P. G. Genetic incorporation of unnatural amino acids into proteins in Mycobacterium tuberculosis. PLoS One 5, e9354 (2010). 305. Turner, J. M., Graziano, J., Spraggon, G. & Schultz, P. G. Structural plasticity of an aminoacyl-tRNA synthetase active site. Proc. Natl. Acad. Sci. U. S. A. 103, 6483–6488 (2006). 306. Jakubowski, H. & Goldman, E. Editing of errors in selection of amino acids for protein synthesis. Microbiol. Rev. 56, 412–429 (1992). 307. Wu, B., Wang, Z., Huang, Y. & Liu, W. R. Catalyst-free and site-specific one-pot dual-labeling of a protein directed by two genetically incorporated noncanonical amino acids. Chembiochem 13, 1405–1408 (2012).

308. Baumann, T. et al. Computational aminoacyl-tRNA synthetase library design for photocaged tyrosine. Int. J. Mol. Sci. 20, (2019). 309. Kille, S. et al. Reducing codon redundancy and screening effort of combinatorial protein libraries created by saturation mutagenesis. ACS Synth. Biol. 2, 83–92 (2013). 310. Perona, J. J. & Hadd, A. Structural diversity and protein engineering of the aminoacyl-tRNA synthetases. Biochemistry 51, 8705–8729 (2012). 311. Cline, J., Braman, J. C. & Hogrefe, H. H. PCR fidelity of pfu DNA polymerase and other thermostable DNA polymerases. Nucleic Acids Res. 24, 3546–3551 (1996). 312. Wilson, D. S. & Keefe, A. D. Random mutagenesis by PCR. Curr. Protoc. Mol. Biol. Chapter 8, 1– 9 (2001). 313. Mccullum, E. O., Williams, B. a R., Zhang, J. & Chaput, J. C. Error Based PCR Mutagenesis Protocols. Methods Mol. Biol. 634, 103–109 (2010). 314. Melançon, C. E. 3rd & Schultz, P. G. One plasmid selection system for the rapid evolution of aminoacyl-tRNA synthetases. Bioorg. Med. Chem. Lett. 19, 3845–3847 (2009). 315. Liu, D. R. & Schultz, P. G. Progress toward the evolution of an organism with an expanded genetic code. Proc. Natl. Acad. Sci. U. S. A. 96, 4780–4785 (1999). 316. Noble, C., Adlam, B., Church, G. M., Esvelt, K. M. & Nowak, M. A. Current CRISPR gene drive systems are likely to be highly invasive in wild populations. Elife 7, (2018). 317. Berg, P., Baltimore, D., Brenner, S., Roblin, R. O. & Singer, M. F. Summary statement of the Asilomar conference on recombinant DNA molecules. Proc. Natl. Acad. Sci. U. S. A. 72, 1981– 1984 (1975). 318. Simon, A. J. & Ellington, A. D. Recent advances in synthetic biosafety. F1000Research 5, (2016). 319. Wilson, D. J. NIH guidelines for research involving recombinant DNA molecules. Account. Res. 3, 177–185 (1993). 320. Crick, F. Central dogma of molecular biology. Nature 227, 561–563 (1970). 321. Tizei, P. A. G., Csibra, E., Torres, L. & Pinheiro, V. B. 
Selection platforms for directed evolution in synthetic biology. Biochem. Soc. Trans. 44, 1165–1175 (2016). 322. Ostrov, N. et al. Design, synthesis, and testing toward a 57-codon genome. Science 353, 819– 822 (2016). 323. Pósfai, G. et al. Emergent properties of reduced-genome Escherichia coli. Science 312, 1044– 1046 (2006). 324. Anderson, J. C. et al. An expanded genetic code with a functional quadruplet codon. Proc. Natl. Acad. Sci. U. S. A. 101, 7566–7571 (2004). 325. Liu, C. et al. Phosphonomethyl Oligonucleotides as Backbone-Modified Artificial Genetic Polymers. J. Am. Chem. Soc. 140, 6690–6699 (2018). 326. Zhang, Y. et al. A semi-synthetic organism that stores and retrieves increased genetic information. Nature 551, 644–647 (2017). 327. Zhang, Y. & Romesberg, F. E. Semisynthetic Organisms with Expanded Genetic Codes. Biochemistry 57, 2177–2178 (2018). 328. Torres, L., Krüger, A., Csibra, E., Gianni, E. & Pinheiro, V. B. Synthetic biology approaches to biological containment: pre-emptively tackling potential risks. Essays Biochem. 60, 393–410

(2016). 329. De Simone, A., Acevedo-Rocha, C. G., Hoesl, M. G. & Budisa, N. Towards reassignment of the methionine codon aug to two different noncanonical amino acids in bacterial translation. Croat. Chem. Acta 89, 243–253 (2016). 330. Mandell, D. J. et al. Biocontainment of genetically modified organisms by synthetic protein design. Nature 518, 55–60 (2015). 331. Rovner, A. J. et al. Recoded organisms engineered to depend on synthetic amino acids. Nature 518, 89–93 (2015). 332. Wang, H. H. et al. Programming cells by multiplex genome engineering and accelerated evolution. Nature 460, 894–898 (2009). 333. Santos, M. A., Cheesman, C., Costa, V., Moradas-Ferreira, P. & Tuite, M. F. Selective advantages created by codon ambiguity allowed for the evolution of an alternative genetic code in Candida spp. Mol. Microbiol. 31, 937–947 (1999). 334. Diwo, C. & Budisa, N. Alternative biochemistries for alien life: Basic concepts and requirements for the design of a Robust biocontainment system in genetic isolation. Genes (Basel). 10, (2019). 335. Mayer, C. Selection, Addiction and Catalysis: Emerging Trends for the Incorporation of Noncanonical Amino Acids into Peptides and Proteins in Vivo. ChemBioChem 20, 1357–1364 (2019). 336. Wohlgemuth, R. Biocatalysis--key to sustainable industrial chemistry. Curr. Opin. Biotechnol. 21, 713–724 (2010). 337. Erb, T. J., Jones, P. R. & Bar-Even, A. Synthetic metabolism: metabolic engineering meets enzyme design. Curr. Opin. Chem. Biol. 37, 56–62 (2017). 338. Si, T., Xiao, H. & Zhao, H. Rapid prototyping of microbial cell factories via genome-scale engineering. Biotechnol. Adv. 33, 1420–1432 (2015). 339. Esvelt, K. M. & Wang, H. H. Genome-scale engineering for systems and synthetic biology. Mol. Syst. Biol. 9, 641 (2013). 340. Cao, M., Tran, V. G. & Zhao, H. Unlocking nature’s biosynthetic potential by directed genome evolution. Curr. Opin. Biotechnol. 66, 95–104 (2020). 341. Dragosits, M. & Mattanovich, D. 
Adaptive laboratory evolution - principles and applications for biotechnology. Microb. Cell Fact. 12, 64 (2013). 342. Sandberg, T. E. et al. Evolution of Escherichia coli to 42 °C and subsequent genetic engineering reveals adaptive mechanisms and novel mutations. Mol. Biol. Evol. 31, 2647–2662 (2014). 343. Portnoy, V. A., Bezdan, D. & Zengler, K. Adaptive laboratory evolution - harnessing the power of biology for metabolic engineering. Curr. Opin. Biotechnol. 22, 590–594 (2011). 344. Maddamsetti, R., Lenski, R. E. & Barrick, J. E. Adaptation, Clonal Interference, and Frequency-Dependent Interactions in a Long-Term Evolution Experiment with Escherichia coli. Genetics 200, 619–631 (2015). 345. Gresham, D. & Dunham, M. J. The enduring utility of continuous culturing in experimental evolution. Genomics 104, 399–405 (2014). 346. Lenski, R. E., Rose, M. R., Simpson, S. C. & Tadler, S. C. Long-Term Experimental Evolution in Escherichia coli. I. Adaptation and Divergence During 2,000 Generations. Am. Nat. 138, 1315–1341 (1991).
347. Wiser, M. J. & Lenski, R. E. A Comparison of Methods to Measure Fitness in Escherichia coli. PLoS One 10, e0126210 (2015). 348. Rozen, D. E., Philippe, N., Arjan de Visser, J., Lenski, R. E. & Schneider, D. Death and cannibalism in a seasonal environment facilitate bacterial coexistence. Ecol. Lett. 12, 34–44 (2009). 349. Shendure, J. et al. DNA sequencing at 40: past, present and future. Nature 550, 345–353 (2017). 350. Bailey, S. F., Rodrigue, N. & Kassen, R. The effect of selection environment on the probability of parallel evolution. Mol. Biol. Evol. 32, 1436–1448 (2015). 351. Herring, C. D. et al. Comparative genome sequencing of Escherichia coli allows observation of bacterial evolution on a laboratory timescale. Nat. Genet. 38, 1406–1412 (2006). 352. Liu, R., Bassalo, M. C., Zeitoun, R. I. & Gill, R. T. Genome scale engineering techniques for metabolic engineering. Metab. Eng. 32, 143–154 (2015). 353. Choe, D. et al. Adaptive laboratory evolution of a genome-reduced Escherichia coli. Nat. Commun. 10, (2019). 354. Qi, Y., Liu, H., Chen, X. & Liu, L. Engineering microbial membranes to increase stress tolerance of industrial strains. Metab. Eng. 53, 24–34 (2019). 355. Wang, S., Sun, X. & Yuan, Q. Strategies for enhancing microbial tolerance to inhibitors for biofuel production: A review. Bioresour. Technol. 258, 302–309 (2018). 356. Huang, C.-J., Lu, M.-Y., Chang, Y.-W. & Li, W.-H. Experimental Evolution of Yeast for High-Temperature Tolerance. Mol. Biol. Evol. 35, 1823–1839 (2018). 357. Pereira, R. et al. Adaptive laboratory evolution of tolerance to dicarboxylic acids in Saccharomyces cerevisiae. Metab. Eng. 56, 130–141 (2019). 358. Görke, B. & Stülke, J. Carbon catabolite repression in bacteria: many ways to make the most out of nutrients. Nat. Rev. Microbiol. 6, 613–624 (2008). 359. Ingram, L. O. & Doran, J. B. Conversion of cellulosic materials to ethanol. FEMS Microbiol. Rev. 
16, 235–241 (1995). 360. Wang, X. et al. GREACE-assisted adaptive laboratory evolution in endpoint fermentation broth enhances lysine production by Escherichia coli. Microb. Cell Fact. 18, 106 (2019). 361. Wen, Z., Ledesma-Amaro, R., Lin, J., Jiang, Y. & Yang, S. Improved n-Butanol Production from Clostridium cellulovorans by Integrated Metabolic and Evolutionary Engineering. Appl. Environ. Microbiol. 85, (2019). 362. Kao, K. C. & Sherlock, G. Molecular characterization of clonal interference during adaptive evolution in asexual populations of Saccharomyces cerevisiae. Nat. Genet. 40, 1499–1504 (2008). 363. Oud, B. et al. Genome duplication and mutations in ACE2 cause multicellular, fast-sedimenting phenotypes in evolved Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. U. S. A. 110, E4223-31 (2013). 364. Sniegowski, P. D. & Gerrish, P. J. Beneficial mutations and the dynamics of adaptation in asexual populations. Philos. Trans. R. Soc. London. Ser. B, Biol. Sci. 365, 1255–1263 (2010). 365. Blount, Z. D., Borland, C. Z. & Lenski, R. E. Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 105,

7899–7906 (2008). 366. Blount, Z. D., Barrick, J. E., Davidson, C. J. & Lenski, R. E. Genomic analysis of a key innovation in an experimental Escherichia coli population. Nature 489, 513–518 (2012). 367. Lenski, R. E. et al. Sustained fitness gains and variability in fitness trajectories in the long-term evolution experiment with Escherichia coli. Proceedings. Biol. Sci. 282, 20152292 (2015). 368. Bacher, J. M. & Ellington, A. D. Selection and characterization of Escherichia coli variants capable of growth on an otherwise toxic tryptophan analogue. J. Bacteriol. 183, 5414–5425 (2001). 369. Tze-Fei Wong, J. Coevolution of genetic code and amino acid biosynthesis. Trends Biochem. Sci. 6, 33–36 (1981). 370. Budisa, N. et al. High-level Biosynthetic Substitution of Methionine in Proteins by its Analogs 2-Aminohexanoic Acid, Selenomethionine, Telluromethionine and Ethionine in Escherichia coli. Eur. J. Biochem. 230, 788–796 (1995). 371. Piñero-Fernandez, S., Chimerel, C., Keyser, U. F. & Summers, D. K. Indole transport across Escherichia coli membranes. J. Bacteriol. 193, 1793–1798 (2011). 372. Crawford, I. P. & Yanofsky, C. On the separation of the tryptophan synthetase of Escherichia coli into two protein components. Proc. Natl. Acad. Sci. U. S. A. 44, 1161–1170 (1958). 373. Lane, A. N. & Kirschner, K. Mechanism of the Physiological Reaction Catalyzed by Tryptophan Synthase from Escherichia coli. Biochemistry 30, 479–484 (1991). 374. Agostini, F. et al. Laboratory evolution of Escherichia coli enables life based on fluorinated amino acids. bioRxiv 665950 (2019) doi:10.1101/665950. 375. Zhang, J. & Zheng, Y. G. SAM/SAH Analogs as Versatile Tools for SAM-Dependent Methyltransferases. ACS Chem. Biol. 11, 583–597 (2016). 376. Luo, H. et al. Coupling S-adenosylmethionine–dependent methylation to growth: Design and uses. PLoS Biol. 17, 1–13 (2019). 377. Kuthning, A. et al. 
Towards Biocontained Cell Factories: An Evolutionarily Adapted Escherichia coli Strain Produces a New-to-nature Bioactive Lantibiotic Containing Thienopyrrole-Alanine. Sci. Rep. 6, 1–7 (2016). 378. Ferla, M. P. & Patrick, W. M. Bacterial methionine biosynthesis. Microbiology 160, 1571–1584 (2014). 379. Banerjee, R. V, Johnston, N. L., Sobeski, J. K., Datta, P. & Matthews, R. G. Cloning and sequence analysis of the Escherichia coli metH gene encoding cobalamin-dependent methionine synthase and isolation of a tryptic fragment containing the cobalamin-binding domain. J. Biol. Chem. 264, 13888–13895 (1989). 380. Foster, M. A., Tejerina, G., Guest, J. R. & Woods, D. D. Two enzymic mechanisms for the methylation of homocysteine by extracts of Escherichia coli. Biochem. J. 92, 476–488 (1964). 381. Old, I. G. et al. Cloning and characterization of the genes for the two homocysteine transmethylases of Escherichia coli. Mol. Gen. Genet. 211, 78–87 (1988). 382. Boysen, A., Møller-Jensen, J., Kallipolitis, B., Valentin-Hansen, P. & Overgaard, M. Translational regulation of gene expression by an anaerobically induced small non-coding RNA in Escherichia coli. J. Biol. Chem. 285, 10690–10702 (2010).


7 List of Figures

Figure 1 | Radial representation of the genetic code in mRNA format. The primary importance of the second codon position in determining the type of amino acid is emphasized. The first position determines the specific amino acid and the third (wobble) position demonstrates the degeneracy of the genetic code. The natural expansion of the genetic code at opal (SeC) and amber (Pyl) is illustrated in pink.
Figure 2 | Common Trp orientations. a) Edge-to-face orientation in a β-hairpin peptide (1LE0). b) Parallel-displaced orientation in a parallel β-sheet (2KI0). c) Cation-π interaction between Arg and Trp (2B2U).
Figure 3 | Overview of some biological processes with SAM participation. a) SAM biosynthesis. b) Donation of the amino group for biotin biosynthesis. c) Donation of ribosyl group for tRNA modification. d) Donation of aminoalkyl group for tRNA modification. e) Methyl group donation in a range of biological reactions involving DNA, RNA, proteins, and natural products. f) Aminoalkyl group used in polyamine synthesis. g) Donation of the aminoalkyl group in the synthesis of the quorum-sensing molecule N-acylhomoserine lactone. h) SAM aminoalkyl group utilized in 1-aminocyclopropane-1-carboxylic acid (ACC) synthesis (precursor of the plant hormone ethylene). i) SAM as a source of methylene groups in cyclopropane fatty acid (CFA) synthesis.
Figure 4 | Schematic overview of the two main techniques for ncAA incorporation: selective pressure incorporation (SPI, top) and stop-codon suppression (SCS, bottom).
Figure 5 | Biosynthesis of pyrrolysine. The three enzymes PylB, PylC, and PylD catalyze the biosynthesis of pyrrolysine from two lysines.
Figure 6 | Active site of PylRS. The enzyme is shown in pink, bound Pyl-AMP is shown in sticks and Y384 is depicted in blue sticks (PDB entry: 2ZIM).
Figure 7 | Tyrosyl-tRNA synthetase and tRNATyr from Methanocaldococcus jannaschii. a) Tyrosyl tRNA secondary structures from M. jannaschii and E. coli; identity elements are shown in red. An orthogonal Mj tRNATyr with mutated bases illustrated in red is also shown. b) Crystal structure of wildtype M. jannaschii TyrRS with bound tRNA. The N-terminal domain is shown in orange, the Rossmann fold in red, the CP1 domain in green, the KMSKS loop in yellow, and the C-terminal domain in blue.
Figure 8 | Schematic overview of double-sieve selection. During positive selections, all functional library members are selected, while negative selections sieve out variants that are capable of charging cAAs. Typically, selection cycles are repeated around 2-3 times.
Figure 9 | Reaction scheme of the reaction catalyzed by the tryptophan synthase TrpBA. In a simple metabolic conversion, indole (or its analog) reacts with serine to form tryptophan (or its analog [3,2]Tpa).
Figure 10 | Chemical structures of methionine and its analogs used in this study.
Figure 11 | Schematic overview of the strategy to generate a biocontained organism. The E. coli trpS gene is to be replaced with an orthogonal aaRS/tRNA pair capable of discriminating between tryptophan and its analogs, rendering the strain dependent (“addicted”) on these unnatural substrates. The strain would not be able to survive outside the laboratory, as the ncAAs 4-F-Trp, 5-F-Trp, and [3,2]Tpa do not occur in a natural environment.

Figure 12 | L-methionine de novo biosynthesis pathway in E. coli. The responsible genes and their products are denoted in dark purple. The genes that were deleted in this study to achieve full methionine-auxotrophy are highlighted by the purple box.
Figure 13 | Genomic and phenotypic verification of Met-auxotrophy for the strain MG1655 ∆metEH::FRT. a) Agarose gel of a colony-PCR of ∆metEH::FRT. On the right is a schematic illustration of where the primers bind and which fragment lengths are expected. dKO: double knockout, wt: wildtype. b) Sequencing analysis of the metE and metH loci. The red bars annotate the respective up- and downstream regions of the targeted genes, the blue bars denote FRT sequences. c) Optical densities of cultures cultivated for 48 h in the absence (-M) and presence (+M) of 1 mM Met. d) Cells plated on agar without any Met (left) and supplemented with 1 mM Met (right).
Figure 14 | Comparison of the structures of trifluoromethionine (TfMet), methionine (Met), and ethionine (Eth). Structures are represented as ball and stick models and Connolly molecular surfaces are shown in transparent blue.
Figure 15 | Schematic overview of processes involved in formylation and deformylation of Met.
Figure 16 | Schematic overview of feedback inhibition and repression of methionine biosynthesis in E. coli. Proteins and their genes are denoted in purple, repression is represented by dashed arrows, and feedback inhibition by double arrows. The aporepressor MetJ requires SAM as cofactor.
Figure 17 | Elucidation of optimal ALE starting conditions. a) Optical densities of ∆metEH::FRT cultivated in minimal media with Met concentrations ranging from 5 µM to 1 mM measured after 24 h (dashed lines) and 48 h (solid lines). Inset compares phosphate-buffered minimal media (NMM) with its MOPS-buffered counterpart (MOPS). b) OD600 values of ∆metEH::FRT cultivated in MOPS-buffered minimal media with 20 µM Met and Eth concentrations ranging from 5 µM to 0.5 mM measured after 24 h (dashed lines) and 48 h (solid lines). The inset shows cultures cultivated with 5 µM-250 µM Eth without the addition of Met. c) OD600 values of ∆metEH::FRT cultivated in MOPS-buffered minimal media with 20 µM Met and TfMet concentrations ranging from 5 µM to 5 mM measured after 24 h (dashed lines) and 48 h (solid lines). The inset shows cultures cultivated without any Met. Values represent the mean of three cultures with the SD as error bars.
Figure 18 | Schematic representation of the effects of small and large passage sizes during adaptive laboratory cultivation. Large passages are more likely to adequately represent the population, including colonies with different beneficial mutations. The larger the passage size, however, the smaller the dilution necessary to discard accumulated waste products and for supplementation of fresh nutrients. It also results in shorter lag and exponential phases, while increasing stationary phase and possibly favoring the onset of the death phase. Small passages, on the other hand, result in longer lag and exponential phases with shorter stationary phases. If the passage size is too small, beneficial mutations may be lost.
Figure 19 | Adaptive laboratory evolution towards Met analog utilization. Optical densities (OD600) are plotted against the number of passages. a) Control experiment with 20 µM Met and no analog addition. b) Adaptation towards Eth usage. c) Adaptation towards TfMet usage.
Figure 20 | Overview of the 1st approach for the replacement of the E. coli metK with either the M. jannaschii or S. solfataricus metK. Mj: M. jannaschii, Ss: S. solfataricus. The plasmid harboring the λ red system and the helper plasmid have temperature-sensitive origins of replication and can be removed via incubation at 42°C. The linearized pieces of the rescue plasmid are digested by endogenous restriction enzymes.


Figure 21 | Colony PCR of a representative clone from the first metK KO approach. Left: Agarose gel of the colony PCR. Bands are shown for a representative clone from the KO attempt (∆metK) as well as ∆metEH as wild-type control. Right: Schematic overview showing where the primers bind and the fragment lengths of their PCR products.

Figure 22 | Schematic overview of processes involved in the CAGO technique. Left homo: left homology region, R short: first 40-50 bp of the right homology region, CmR: chloramphenicol resistance cassette, Right homo: right homology region. The scissors represent Cas9-mediated DNA cleavage of the universal N20PAM sequence, DSB: double-strand break.

Figure 23 | Sequencing chromatogram of a representative colony from the CAGO attempt with the M. jannaschii and S. solfataricus editing cassettes. The sequencing chromatograms are aligned to the E. coli control cassette with its left homology arm (dark blue), the E. coli metK gene (green), R short fragment (light blue), Cm resistance cassette (grey), N20PAM (purple), and right homology arm (dark blue).

Figure 24 | Crystal structure of the E. coli MAT with bound SAM (PDB: 1RG9). The isoleucine at position 302 is highlighted in red and bound SAM is shown as sticks colored by elements. The distance between the methyl group from SAM and the Ile side chain is shown in yellow (3.6 Å).

Figure 25 | Establishing the strain ∆metEH::FRT metK(I302V). a) Verification of the isoleucine to valine point mutation at position 302 (ATC -> GTC) of the E. coli metK. b) LB-agar plate verifying the removal of the pCAGO plasmid. Left: ampicillin supplementation, right: without ampicillin.

Figure 26 | Comparison of optical densities of the new strain metK(I302V) and its ancestor ∆metEH::FRT in the presence of increasing Eth concentrations. a) OD600 values of ∆metEH::FRT cultivated in NMM19 supplied with 15 µM Met and Eth concentrations ranging from 0-1 mM measured over 48 h. b) OD600 values of metK(I302V) cultivated in NMM19 supplied with 15 µM Met and Eth concentrations ranging from 0-1 mM measured over 48 h.

Figure 27 | Comparison of OD600 values and CFU between ∆metEH::FRT and metK(I302V) for increasing Eth concentrations. a) Values for the strain ∆metEH::FRT with Eth concentrations increasing from 0-100 µM from top to bottom. b) Values for the strain metK(I302V) with Eth concentrations increasing from 0-100 µM from top to bottom. Values represent the mean of two (50 µM Eth, 100 µM Eth) and three (0 µM Eth, 15 µM Eth) experiments with the standard deviation as error bars.

Figure 28 | Second adaptive laboratory evolution towards Met analog utilization. Optical densities (OD600) are plotted against the number of passages. a) Control experiment with 10 µM Met and no analog addition. b) Adaptation towards Eth usage.

Figure 29 | Comparison of OD600 values and CFU between the adapted populations 2 (top) and 7 (bottom) and the corresponding control populations cultivated in media supplemented with Eth. a) Populations cultivated in Eth-supplemented media for 31 passages. b) Control populations cultivated in NMM19 lacking Eth for 31 passages and then cultivated in Eth-supplemented media for this experiment.

Figure 30 | Overview of OD600 values (top) and CFU (bottom) of all the adapted populations and the control populations cultivated in media supplemented with Eth. a) Populations cultivated in Eth-supplemented media for 31 passages. b) Control populations cultivated in NMM19 lacking Eth for 31 passages and then cultivated in Eth-supplemented media for this experiment. For better visibility, CFU/mL values are plotted as lines in these summary graphs.

Figure 31 | Overview of the biocontainment approach. Step 1: Double-sieve selection of an aaRS library, consisting of consecutive rounds of positive and negative selections. During positive selections, the target ncAA is supplied and functional library members are selected. Negative selections take place in the absence of the ncAA to sort out library members incorporating cAAs. Step 2: Screening of the selected variants for promising candidates by comparing cell growth and fluorescence in the absence and presence of the ncAA. These steps require an amber anticodon. Step 3: Replacement of the endogenous trpS gene in the adapted strain with a selected aaRS/tRNA pair for suppression of Trp codons. The tRNA anticodon needs to be mutated from amber to Trp.

Figure 32 | M. mazei PylRS library. a) Crystal structure of the M. mazei PylRS active site with bound ATP and modeled-in [3,2]Tpa. ATP (pink) and [3,2]Tpa (blue) are shown as sticks. Residues chosen for randomization are colored in yellow and also shown as sticks. PDB: 3VQV. b) Chromatogram of the whole library sequencing. c) Sequencing results of seven randomly picked library members.

Figure 33 | Number of colonies on positive selection plates supplemented with 0.1 mM [3,2]Tp compared to control plates lacking [3,2]Tp. The colonies were counted using ImageJ, where every colony larger than 5 pixels was counted. The colony numbers per plate were normalized to the culture volume for better comparison. P1-4: positive selection 1-4, ctrl: control plates. P1 and P2 represent the mean of 4 and 3 plates, respectively, with the SD represented as an error bar.

Figure 34 | Screening for promising M. mazei PylRS library members. After the fourth positive selection, 30 colonies were resuspended in 50 µL sterile ddH2O and 2 µL of the suspension were spotted on NMM19 –Trp plates with and without 0.1 mM [3,2]Tp and with increasing Cm concentrations. The cells were incubated overnight at 37 °C.

Figure 35 | Fluorescence assay of sfGFP(R2TAG) with the PylRS HLLNQ mutant in media with and without Trp or [3,2]Tp. a) Absorbance at 600 nm. b) Fluorescence after 15 h of incubation.

Figure 36 | Comparison of the structures of pyrrolysine, tryptophan, and its analog [3,2]Tpa.

Figure 37 | The tryptophanyl-tRNA synthetase. a) Structure of the B. stearothermophilus TrpRS dimer with bound tryptophan (shown in pink). PDB: 1MB2. b) Active site of the B. stearothermophilus TrpRS with bound tryptophan. Highlighted in red and shown as sticks are the residues chosen for randomization in the TrpRS library, bound Trp is shown in green. c) Residues that interact with bound Trp-AMP in the B. stearothermophilus crystal structure (reproduced from Doublié et al.⁹⁴).

Figure 38 | Sequencing results of the E. coli TrpRS library. a) Sequencing chromatogram of the library mixture. b) Sequencing results of randomly picked colonies.

Figure 39 | Overview of the processes involved in selection with a genetic replacement system. The gene of interest is replaced with an antibiotic resistance cassette while the rescue plasmid complements the chromosomal loss. Subsequent elimination of the rescue plasmid followed by transformation of the library yields functional library members, which can complement the loss of the wild-type gene of interest.

Figure 40 | Assessment of the optimal time point of library transformation for the genetic replacement system. G2748 ∆trpS was cultivated in LB media at 30 °C and I-SceI meganuclease expression was induced at the indicated time points. Samples were taken at different time points, OD600 was measured and CFU were assessed by plating 2 µL of a dilution series of the samples on LB agar plates.

Figure 41 | Colony forming units of the samples taken during the final optimization experiment with the recombination system. a) Colony forming units of the samples taken 1 h and 2 h after induction of I-SceI meganuclease expression. b) Colony forming units of the transformations with three control plasmids: wild-type ecTrpRS on two different backbones and ecTrpRS with an amber stop codon at position 3 for assessment of false positives due to recombination events.

Figure 42 | Active site of the M. jannaschii TyrRS with bound tyrosine (green). Residues chosen for mutation are shown as sticks and highlighted in red. PDB: 1J1U.

Figure 43 | Indole analog tolerance of NEB10-beta cells under selection conditions. After transformation with the MjTyrRS library, a dilution series was prepared from recovered NEB10-beta cells and 2 µL of each dilution were plated on LB agar plates containing 37 µg/mL Cm and increasing concentrations of 4-F-indole or [3,2]Tp.

Figure 44 | Number of colonies on positive selection plates supplemented with the indicated indole analog compared to control plates lacking the analog. The colonies were counted using ImageJ, where every colony larger than 5 pixels was counted. P1-3: positive selection 1-3, ctrl: control plates. P1 represents the mean of 3 ([3,2]Tp) and 5 (4-F-indole) agar plates with the SD represented as the error bar. a) Double-sieve selection with [3,2]Tp. The colony numbers for P1 were adjusted to the volume of recovered transformations spread on P2 and P3 plates (450 µL / plate). b) Selection experiments with 4-F-indole.

Figure 45 | Serial dilutions of the two promising MjTyrRS variants 33 and 39 on increasing Cm concentrations with and without [3,2]Tp.

Figure 46 | Normalized fluorescence of MjTyrRS variants 33 and 39 after 15 h of incubation. Six biological replicates were cultivated in ZYP 5052 media supplied with the appropriate antibiotics, as well as either 0.5 mM [3,2]Tp, 0.1 mM 4-F-indole, or no ncAA at all. Fluorescence was normalized to absorption at 600 nm.

Figure 47 | Identification of the amino acid incorporated in response to the amber codon in sfGFP(R2TAG). The reporter protein sfGFP(R2TAG) was co-expressed with MjTyrRS variants 33 or 39 in the presence of [3,2]Tp. The ESI-MS spectra were deconvoluted using the Agilent MassHunter BioConfirm software for masses between 27 kDa and 30 kDa. The highest peak was normalized to 100. a) Deconvoluted spectrum of sfGFP(R2TAG) co-expressed with variant 33. The highest peak corresponds to phenylalanine incorporation. The smaller peaks are each approximately 22 Da apart and likely correspond to sodium adducts (Na = 22.99 Da). b) Deconvoluted spectrum of sfGFP(R2TAG) co-expressed with variant 39. The highest peak corresponds to glutamine incorporation. The smaller peaks are each approximately 22 Da apart and likely correspond to sodium adducts (Na = 22.99 Da).

Figure 48 | Number of colonies on positive selection plates supplemented with [3,2]Tp compared to control plates lacking the analog during selections with the epPCR MjTyrRS 33 library. The colonies were counted using ImageJ, where every colony larger than 5 pixels was counted. P1-3: positive selection 1-3, ctrl: control plates. The numbers for the selection plates represent the mean of 5 (P1), 3 (P2), and 2 (P3) agar plates with the SD represented as error bars. The colony numbers were normalized to the culture volume.

Figure 49 | Serial dilutions of promising variants on increasing Cm concentrations with and without [3,2]Tp after the second round of positive selection with the epPCR MjTyrRS 33 library.

Figure 50 | Representative serial dilutions of promising variants on increasing Cm concentrations with and without [3,2]Tp after the third round of positive selection with the epPCR MjTyrRS 33 library.


Figure 51 | Fluorescence assay of sfGFP(R2TAG) with different MjTyrRS variants in the absence and presence of ncAAs. a) Variants picked after the second and third round of positive selection with the epPCR library based on variant 33. P2_x denotes variants picked after the second positive selection, with x designating the number of the randomly picked colony. P3_x denotes variants picked after the third round. sfGFP(R2TAG): control transformed with sfGFP, but no MjTyrRS. b) Test of the efficacy of 5-F-indole compared to 5-F-Trp in the fluorescence assay with variant 14.2. Analog concentrations used were: 0.03 mM 5-F-indole, 0.1 mM 5-F-indole/Trp, 0.5 mM 5-F-indole/Trp, 1 mM 5-F-indole/Trp, and 5 mM 5-F-indole/Trp.

Figure 52 | Screening of the epPCR library based on MjTyrRS variant 14.2. a) Number of colonies on positive selection plates supplemented with 5-F-Trp compared to control plates lacking the analog. The colonies were counted using ImageJ, where every colony larger than 5 pixels was counted. P1-2: positive selection 1-2, ctrl: control plates. The numbers for the selection plates represent the mean of 2 agar plates with the SD represented as error bars. The colony numbers were normalized to the culture volume. b) Representative serial dilutions of variants on increasing Cm concentrations with and without 5-F-Trp after the first and second round of positive selections with the epPCR MjTyrRS 14.2 library.

Figure 53 | Fluorescence assay of MjTyrRS variants in the presence of the non-canonical amino acid [3,2]Tpa. sfGFP(R2TAG): control transformed with sfGFP, but no MjTyrRS. Columns with error bars represent the mean of two or three values with the SD as error bars.

Figure 54 | Selection with the amino acid [3,2]Tpa on minimal media (NMM19 -Trp). a) Number of colonies on positive selection plates supplemented with [3,2]Tpa compared to control plates lacking the amino acid. The colonies were counted using ImageJ, where every colony larger than 5 pixels was counted. The numbers for the selection plates represent the mean of 2 agar plates with the SD represented as error bars. The colony numbers were normalized to the culture volume. Between both positive selections, a negative selection was performed. b) Single colonies from the first and second round of positive selections with the MjTyrRS library on increasing Cm concentrations with and without the amino acid [3,2]Tpa.

Figure 55 | Selection with the amino acid [3,2]Tpa on LB media. a) Overview of the two different approaches employed during selection experiments with 0.5 mM [3,2]Tpa. b) Number of colonies on positive selection plates supplemented with [3,2]Tpa compared to control plates lacking the amino acid. The colonies were counted using ImageJ, where every colony larger than 5 pixels was counted. The numbers for the selection plates represent the mean of 2 agar plates with the SD represented as error bars. The colony numbers were normalized to the culture volume. c) Single colonies from the second and third rounds of positive selections with the MjTyrRS library on increasing Cm concentrations with and without the amino acid [3,2]Tpa.

Figure 56 | Schematic overview of processes involved in the CRISPR/Cas9 response. Figure supplied by © Johan Jarnestad / The Royal Swedish Academy of Sciences.

Figure 57 | Nucleotide and amino acid sequence of the M. jannaschii methionine adenosyl transferase. The sequence was codon-optimized for expression in E. coli by GeneArt (Thermo Fisher Scientific).

Figure 58 | Nucleotide and amino acid sequence of the S. solfataricus methionine adenosyl transferase. The sequence was codon-optimized for expression in E. coli by GeneArt (Thermo Fisher Scientific).

Figure 59 | SDS PAGE of EcMAT overexpression after IMAC purification. E1-E4 = eluate fraction 1-4.


Figure 60 | SDS PAGE of heterologous MjMAT (left) and SsMAT (right) expression after IMAC purification. E1-E2 = eluate fraction 1-2, pre E = small peak eluted prior to the main peak.

Figure 61 | SDS PAGE of MjMAT (left) and EcMAT (right) after IEX purification.

Figure 62 | Comparison of OD600 values and CFU between the adapted populations 1, 2, 6 and the corresponding control populations cultivated in media supplemented with Eth. a) Populations cultivated in Eth-supplemented media for 31 passages. b) Control populations cultivated in NMM19 lacking Eth for 31 passages and then cultivated in Eth-supplemented media for this experiment.

Figure 63 | Comparison of OD600 values and CFU between the adapted populations 7, 10, 15 and the corresponding control populations cultivated in media supplemented with Eth. a) Populations cultivated in Eth-supplemented media for 31 passages. b) Control populations cultivated in NMM19 lacking Eth for 31 passages and then cultivated in Eth-supplemented media for this experiment.

Figure 64 | Comparison of OD600 values and CFU of the control populations cultivated in media lacking Eth for 31 passages.

Figure 65 | Overview of OD600 values (a) and CFU (b) of all the control populations cultivated in media lacking Eth.

Figure 66 | Representative agar plates of the positive selections with the M. mazei PylRS library. P1 was incubated for 72 h at 30 °C and exhibits colonies of varying size. All other positive selections were incubated for 48 h, resulting in colonies of uniform size.

Figure 67 | SDS PAGE of SCS with sfGFP (≈ 28 kDa) and MjTyrRS variant 33 (top) as well as variant 39 (bottom). E1-E3 = eluate fraction 1-3.

Figure 68 | TLC chromatograms following the progress of [3,2]Tpa synthesis.

Figure 69 | Mass spectrum of [3,2]Tpa after HPLC purification.


8 List of Tables

Table 1 | Overview of experimental parameters of the positive selections.

Table 2 | Summary of the sequencing results of PylRS mutants isolated after the fourth positive selection. Six of eleven sequenced mutants have the same sequence (shown in bold).

Table 3 | Media compositions tested in the fluorescence assay.

Table 5 | Summary of mutations in isolates 33 and 39 as revealed through sequencing analysis.

Table 6 | Summary of the number and type of mutations found in eight colonies of the epPCR-based MjTyrRS library.

Table 7 | Summary of library characteristics of the error-prone PCR-based MjTyrRS library estimated with PEDEL-AA509.


9 Appendix

9.1 Heterologous MAT expression

Residues 1-57
MRNIIVKKLDVEPIEERPTEIVERKGLGHPDSICDGIAESVSRALCKMYMEKFGTIL
Mj MAT opt GCGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGCGATGTGAACGGCCATAAATTT...

Residues 58-114
HHNTDQVELVGGHAYPKFGGGVMVSPIYILLSGRATMEILDKEKNEVIKLPVGTTAV
Mj MAT opt GCGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGCGATGTGAACGGCCATAAA...

Residues 115-171
KAAKEYLKKVLRNVDVDKDVIIDCRIGQGSMDLVDVFERQKNEVPLANDTSFGVGYA
Mj MAT opt GCGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGCGATGTGAACGGCCATAAA...

Residues 172-228
PLSTTERLVLETERFLNSDELKNEIPAVGEDIKVMGLREGKKITLTIAMAVVDRYVK
Mj MAT opt GCGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGCGATGTGAACGGCCATAAA...

Residues 229-285
NIEEYKEVIEKVRKKVEDLAKKIADGYEVEIHINTADDYERESVYLTVTGTSAEMGD
Mj MAT opt GCGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGCGATGTGAACGGCCATAAA...

Residues 286-342
DGSVGRGNRVNGLITPFRPMSMEAASGKNPVNHVGKIYNILANLIANDIAKLEGVKE
Mj MAT opt GCGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGCGATGTGAACGGCCATAAA...

Residues 343-399
CYVRILSQIGKPINEPKALDIEIITEDSYDIKDIEPKAKEIANKWLDNIMEVQKMIV
Mj MAT opt GCGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGCGATGTGAACGGCCATAAA...

Residues 400-406
EGKVTTF
Mj MAT opt G..

Figure 57 | Nucleotide and amino acid sequence of the M. jannaschii methionine adenosyl transferase. The sequence was codon-optimized for expression in E. coli by GeneArt (Thermo Fisher Scientific).


Residues 1-57
MRNINVQLNPLSDIEKLQVELVERKGLGHPDYIADAVAEEASRKLSLYYLKKYGVIL
Ss MAT opt GCGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGCGATGTGAACGGCCATAAATTT...

Residues 58-114
HHNLDKTLVVGGQATPRFKGGDIIQPIYIIVAGRATTEVKTESGIDQIPVGTIIIES
Ss MAT opt GCGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGCGATGTGAACGGCCATAAA...

Residues 115-171
VKEWIRNNFRYLDAERHVIVDYKIGKGSSDLVGIFEASKRVPLSNDTSFGVGFAPLT
Ss MAT opt GCGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGCGATGTGAACGGCCATAAA...

Residues 172-228
KLEKLVYETERHLNSKQFKAKLPEVGEDIKVMGLRRGNEVDLTIAMATISELIEDVN
Ss MAT opt GCGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGCGATGTGAACGGCCATAAA...

Residues 229-285
HYINVKEQVRNQILDLASKIAPGYNVRVYVNTGDKIDKNILYLTVTGTSAEHGDDGM
Ss MAT opt GCGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGCGATGTGAACGGCCATAAA...

Residues 286-342
TGRGNRGVGLITPMRPMSLEATAGKNPVNHVGKLYNVLANLIANKIAQEVKDVKFSQ
Ss MAT opt GCGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGCGATGTGAACGGCCATAAA...

Residues 343-399
VQVLGQIGRPIDDPLIANVDVITYDGKLTDETKNEISGIVDEMLSSFNKLTELILEG
Ss MAT opt GCGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGCGATGTGAACGGCCATAAA...

Residues 400-404
KATLF
Ss MAT opt

Figure 58 | Nucleotide and amino acid sequence of the S. solfataricus methionine adenosyl transferase. The sequence was codon-optimized for expression in E. coli by GeneArt (Thermo Fisher Scientific).
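Since Figures 57 and 58 pair a codon-optimized nucleotide sequence with its amino acid sequence, one simple consistency check is to translate the DNA and compare the result with the protein. The following Python sketch illustrates this under stated assumptions: it uses the standard genetic code, and the function names `translate` and `encodes` as well as the short example sequence are hypothetical and not taken from this work.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    protein = []
    # Ignore an incomplete trailing codon, if any.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def encodes(dna: str, protein: str) -> bool:
    """Return True if the DNA translates to exactly the given protein."""
    return translate(dna) == protein.replace(" ", "").upper()


if __name__ == "__main__":
    # Hypothetical 12-nt example, not the GeneArt-optimized gene.
    print(translate("ATGCGTAACATT"))
```

Spaces are stripped from both inputs so that sequences copied from a formatted figure can be compared directly.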

Figure 59 | SDS PAGE of EcMAT overexpression after IMAC purification. E1-E4 = eluate fraction 1-4.


Figure 60 | SDS PAGE of heterologous MjMAT (left) and SsMAT (right) expression after IMAC purification. E1-E2 = eluate fraction 1-2, pre E = small peak eluted prior to the main peak.

Figure 61 | SDS PAGE of MjMAT (left) and EcMAT (right) after IEX purification.


9.2 ALE 2.0

Panels: growth curves of populations metK(I302V) 1, 2, and 6 (top to bottom), with a) "Eth in Eth" and b) "M in Eth". Each panel plots OD(600) (left axis, 0 to 1.4) and CFU/mL (right axis, 0 to 3.5 × 10^8) against time [h].

Figure 62 | Comparison of OD600 values and CFU between the adapted populations 1, 2, 6 and the corresponding control populations cultivated in media supplemented with Eth. a) Populations cultivated in Eth-supplemented media for 31 passages. b) Control populations cultivated in NMM19 lacking Eth for 31 passages and then cultivated in Eth-supplemented media for this experiment.
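The CFU/mL values plotted alongside OD600 are derived from colony counts on dilution-series plates. The following Python sketch shows that conversion, assuming, as described for the CFU assessment in Figure 40, that 2 µL of each dilution is plated. The function name, the dilution factor, and the example colony count are hypothetical.

```python
def cfu_per_ml(colonies: int, dilution_factor: float,
               plated_volume_ul: float = 2.0) -> float:
    """Convert a colony count into CFU per mL of undiluted culture.

    dilution_factor is the total fold dilution of the plated sample
    (e.g. 1e4 for a 10^-4 dilution); plated_volume_ul is in microliters.
    The 2 µL default is an assumption taken from the Figure 40 description.
    """
    if colonies < 0 or dilution_factor <= 0 or plated_volume_ul <= 0:
        raise ValueError("count must be >= 0; dilution and volume must be > 0")
    plated_volume_ml = plated_volume_ul / 1000.0
    return colonies * dilution_factor / plated_volume_ml


if __name__ == "__main__":
    # Hypothetical example: 30 colonies from 2 µL of a 10^-4 dilution.
    print(f"{cfu_per_ml(30, 1e4):.2e} CFU/mL")
```

With these hypothetical numbers the result is 1.5 × 10^8 CFU/mL, which lies within the range of the CFU axes shown in the panels.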


Panels: growth curves of populations metK(I302V) 7, 10, and 15 (top to bottom), with a) "Eth in Eth" and b) "M in Eth". Each panel plots OD(600) (left axis, 0 to 1.4) and CFU/mL (right axis, 0 to 3.5 × 10^8) against time [h].

Figure 63 | Comparison of OD600 values and CFU between the adapted populations 7, 10, 15 and the corresponding control populations cultivated in media supplemented with Eth. a) Populations cultivated in Eth-supplemented media for 31 passages. b) Control populations cultivated in NMM19 lacking Eth for 31 passages and then cultivated in Eth-supplemented media for this experiment.


Panels: growth curves of the control populations 31Met 1, 2, 6, 7, 10, and 15 in NMM19 + 10 µM Met. Each panel plots OD(600) (left axis, 0 to 1.4) and CFU/mL (right axis, 0 to 3.5 × 10^8) against time [h].

Figure 64 | Comparison of OD600 values and CFU of the control populations cultivated in media lacking Eth for 31 passages.


Panels: summary graphs "31M in M" for the control populations 31M 1, 2, 6, 7, 10, and 15 in NMM19 + 10 µM Met. a) OD(600) (0 to 1.4) against time [h]. b) CFU/mL (0 to 3.5 × 10^8) against time [h].

Figure 65 | Overview of OD600 values (a) and CFU (b) of all the control populations cultivated in media lacking Eth.


9.3 M. mazei PylRS Library

Plate images: control and selection plate for each round of positive selection. P1: 37 µg/mL Cm, P2: 50 µg/mL Cm, P3: 70 µg/mL Cm, P4: 90 µg/mL Cm.

Figure 66 | Representative agar plates of the positive selections with the M. mazei PylRS library. P1 was incubated for 72 h at 30 °C and exhibits colonies of varying size. All other positive selections were incubated for 48 h, resulting in colonies of uniform size.


9.4 SCS of sfGFP(R2TAG)

Gel images: marker bands are labeled at 40 kDa, 30 kDa, and 20 kDa on both gels.

Figure 67 | SDS PAGE of SCS with sfGFP (≈ 28 kDa) and MjTyrRS variant 33 (top) as well as variant 39 (bottom). E1-E3 = eluate fraction 1-3.


9.5 Enzymatic [3,2]Tpa synthesis

TLC plate labels: test reaction, scale-up, after cation exchange chr., after HPLC; prior to reaction, after reaction; UV, after ninhydrin.

Figure 68 | TLC chromatograms following the progress of [3,2]Tpa synthesis.

Figure 69 | Mass spectrum of [3,2]Tpa after HPLC purification.
