
Generalization of Genetic Code Expansion


Citation Stork, Devon. 2020. Generalization of Genetic Code Expansion. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Citable link https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37368951

Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

HARVARD UNIVERSITY Graduate School of Arts and Sciences

DISSERTATION ACCEPTANCE CERTIFICATE

The undersigned, appointed by the Department of Molecular and Cellular Biology have examined a dissertation entitled Generalization of Genetic Code Expansion

presented by Devon Stork candidate for the degree of Doctor of Philosophy and hereby certify that it is worthy of acceptance.

Signature: Richard Losick (Sep 15, 2020 15:40 EDT)

Typed name: Prof. Richard Losick

Signature: Vlad Denic (Sep 17, 2020 14:52 EDT)

Typed name: Prof. Vladimir Denic

Signature: Abhishek Chatterjee (Sep 23, 2020 13:28 EDT)

Typed name: Prof. Abhishek Chatterjee

Date: September 15, 2020

Generalization of Genetic Code Expansion

A dissertation presented

by

Devon Stork

to

The Department of Molecular and Cellular Biology

In partial fulfillment of the requirements for the degree of

Doctor of Philosophy

in the subject of

Biochemistry

Harvard University

Cambridge, Massachusetts

September 2020

© 2020 Devon Stork

All rights reserved

Dissertation Advisors: Dr. Ethan Garner and Dr. George Church

Devon Stork

Generalization of Genetic Code Expansion

ABSTRACT

The standard genetic code directs the assembly of the 20 standard amino acids into

proteins and defines function in biology. Through the central dogma, DNA is transcribed into

RNA which is translated by the well-understood machinery of the ribosome and accompanying

tRNA, using the genetic code to create the proteins that accomplish most tasks in life. The field

of genetic code expansion has focused on incorporating synthetic ‘non-standard amino acids’

(nsAAs) with novel chemical structures into the genetic code. This is done by engineering an

aminoacyl-tRNA synthetase to conjugate an externally provided nsAA onto an engineered tRNA

in vivo such that it will proceed to the ribosome for standard translation, being incorporated

into a growing polypeptide chain. Once incorporation has been achieved, nsAAs allow for site-specific encoding of a defined chemical function, without the limitations of the standard genetic code or the requirement of complex protein engineering. With over 150 nsAAs demonstrated in the literature, a broad array of functions is available for experiment and application. However, the contexts in which they can be used are limited. In this thesis, I investigate ways to broaden the applications of existing genetic code expansion tools.

I begin with a description of a post-translational proofreading tool capable of distinguishing between proteins successfully charged with a ‘correct’ nsAA and proteins with an

‘incorrect’ nsAA or standard amino acid. We repurposed a natural protein degradation

pathway, the N-end rule, to degrade proteins that were not properly charged with the target nsAA. This system could be tuned by engineering an adaptor protein to change the desired nsAA profile, allowing different versions of post-translational proofreading to check for distinct

nsAAs. Finally, we demonstrated that this tool improved the purity of desired product for

promiscuous genetic code expansion systems and facilitated the directed evolution of more

specific genetic code expansion systems.

Next, I explore genetic code expansion beyond the optimal conditions of strains

specifically engineered to enhance nsAA incorporation. My coauthors and I investigate the use

of peptides derived from honeybee antimicrobial molecules which could transiently inhibit

competition with genetic code expansion. These peptides allow improved nsAA incorporation into various biotechnologically relevant E. coli strains as well as facilitate the expansion of the

Agrobacterium tumefaciens genetic code for the first time.

Finally, I apply the tools of genetic code expansion to the bacterium Bacillus subtilis and demonstrate that nearly any nsAA used in E. coli can be applied in B. subtilis using identical synthetases. I show that nsAA incorporation into native stop codons is much more common than in E. coli, suggesting differences in translational termination between the two organisms. I also utilize nsAAs for translational titration and photocrosslinking in B. subtilis, showing that these tools can be easily utilized for novel kinds of experiments. Together, these tools will help expand the scope of genetic code expansion beyond specifically engineered strains and nsAAs.

Acknowledgements

This thesis and all the work behind it would not have been possible without a huge amount of effort on the part of others in training and supporting me.

As a graduate student, I’ve received quality mentoring from many people: professors, postdocs and graduate students. My advisors Ethan Garner and George Church each supported and mentored me in their own way, and I am thankful for their hands-off but responsive advising style. I was allowed to wander until lost and then helped to find my place again. They also encouraged me to follow my interests wherever they led me, even if that was outside of academia. I also thank my committee, Rich Losick, Abhishek Chatterjee and Vlad Denic, for their helpful advice and support.

I owe much to the various postdocs I’ve interacted with over the years. Most notably

Aditya Kunjapur, who took me under his wing when my project was going nowhere and has had so much to teach me about genetic code expansion, professionalism, and the true scientific process. We’ve continued collaborating as he became a professor and I wait with bated breath to see what his lab will accomplish with these technologies. I’ve also worked closely with Erkin

Kuru, an irrepressible spirit of unrivalled creativity and brilliance. I believe in his projects even when they don’t work the first time. Many other postdocs have given me invaluable advice and feedback, including Alex Bisson, Cory Smith, Jorge Marchand, and Kamesh Narasimhan.

My fellow students have provided an important feeling of camaraderie. My MCO class doing homework together was vital for the first years of graduate school, and staying in touch through D&D has been an important place to socialize. I still owe Mary Morrison, Korleki Akiti and Andrew Kane the finale to the story of the Dragonriders. The fellow students in the Garner

and Church labs have been great people to complain about science to and get advice on tough experiments. Georgia Squyres has been a role model of mine for her incredible way of analyzing the literature and planning good experiments since my second year. Max Shubert, Gabe

Filsinger, George Chao and Sean Wilson have been important sounding boards and sources of support for failed experiments.

Outside of the lab, I’ve had a great deal of support from friends, family and previous educators. My parents and especially my father, Christof Stork, encouraged my curiosity and interest in science from a young age, while my mother Terri Olson kept it tempered by practicality. I’ve had such a wonderful series of amazing teachers at every level of education that I really do wonder if I can take any credit for what I’ve learned. My long-term friends

Andrew Gibiansky and Conrad de Kerkhove have shaped my views of the world, even if we only spend time together intermittently. Sarah Scheffler has been an incredible best friend and I’ve looked forward to every time we hang out. Finally, my partner Amanda Lemire has been my daily source of support, conversation, and fortitude, and I cannot thank her enough for helping me find my center even on bad days.

Table of contents

ABSTRACT
Acknowledgements
Table of contents
List of Figures
List of Tables
Abbreviations
Scientific contributions to this Thesis:
Chapter 1: Introduction
    1.1 Genetic Code Expansion
    1.2 Engineered tRNA & tRNA-synthetase diversity
    1.3 Orthogonality and specificity of genetic code expansion
    1.4 Expansion of the Genetic Code to Novel organisms
    1.5 Overview
Chapter 2: Engineering posttranslational proofreading to discriminate nonstandard amino acids
    2.1 Abstract
    2.2 Introduction
    2.3 Results
        2.3.1 Evaluation of the Biphenylalanine (BipA) OTS promiscuity
        2.3.2 Application of the N-end rule to commonly used nsAAs
        2.3.3 Engineering of the N-end rule for altered recognition of nsAAs
        2.3.4 Application of proofreading for selective OTS evolution
        2.3.5 Demonstration of enhanced biocontainment using the more selective OTS
    2.4 Discussion
Chapter 3: Release Factor Inhibiting Antimicrobial Peptides Improve Nonstandard Amino Acid Incorporation in Wild-type Bacterial Cells
    3.1 Abstract
    3.2 Introduction
    3.3 Results
        3.3.1 Apidaecins improve nsAA incorporation in a cell-free translation system
        3.3.2 Apidaecins preferably inhibit RF1 in bacteria
        3.3.3 Apidaecins improve nsAA-dependent sfGFP expression in different bacteria
        3.3.4 A new auto-inducible system to encode nsAAs
        3.3.5 Apidaecins improve specific nsAA incorporation
        3.3.6 In-cell expression of Api1b dramatically improves nsAA incorporation
        3.3.7 Partial recoding promotes apidaecin tolerance and nsAA incorporation
        3.3.8 In-cell expression of new apidaecin variants improve nsAA incorporation
    3.4 Conclusion
Chapter 4: Broad and Efficient Genetic Code Expansion in Bacillus subtilis
    4.1 Abstract
    4.2 Introduction
    4.3 Results
        4.3.1 Synthetase activity
        4.3.2 Genomic TAG incorporation
        4.3.3 Cellular uptake of nsAAs
        4.3.4 Translational titration with nsAAs
        4.3.5 Photocrosslinking
    4.4 Discussion
Chapter 5: Discussion
    5.1 Summary of Results
    5.2 New Synthetase development
    5.3 Genetic code expansion in novel organisms
    5.4 Biotechnology applications
    5.5 The future of generalization of GCE
APPENDIX
    Chapter 2 Appendix
        Additional Findings
        Funding, COI and acknowledgements:
        Methods
    Chapter 3 Appendix
        Funding, COI and acknowledgements:
        Methods
    Chapter 4 Appendix
        Funding, COI and acknowledgements:
        Methods
REFERENCES

List of Figures

Figure 1 Basic Genetic Code Expansion machinery
Figure 2 Evaluation of OTS promiscuity
Figure 3 Posttranslational proofreading proof of concept
Figure 4 Evaluation of OTS promiscuity for 6 OTS/nsAA sets
Figure 5 Proofreading tunability achieved through rational ClpS engineering
Figure 6 Sequence alignment sampling natural diversity of ClpS
Figure 7 Characterization of select ClpS variants on broader panels of nsAAs
Figure 8 Selective BipA OTS evolution using proofreading
Figure 9 FACS data from BipARS EP-PCR library exposed to negative screens of differing stringency
Figure 10 Confirmation of BipA incorporation by mass spectrometry (MS)
Figure 11 Spontaneous tRNA mutations observed in sorted variants and effect on selectivity
Figure 12 Sample images of plates depicting biocontainment escape frequency estimation
Figure 13 Effect of evolved BipA OTS on biocontainment strain escape and fitness
Figure 14 Function of Apidaecins in Genetic Code Expansion
Figure 15 Apidaecins improved nsAA incorporation in a cell-free translation system
Figure 16 Apidaecins improve nsAA-dependent reporter expression in vitro and in different bacteria
Figure 17 Apidaecins inhibit colony formation of different E. coli strains where RF1 function is essential
Figure 18 Apidaecins are toxic to different Gram-negative bacteria where RF1 function is essential
Figure 19 Apidaecins confer conditional phage resistance and improve nsAA incorporation in E. coli cells with redundant RF1 functionality
Figure 20 Exogenously added apidaecins improve nsAA-dependent sfGFP signal increase in different bacteria
Figure 21 Media effects on nsAA incorporation combined with api treatment
Figure 22 Apidaecins improve specific, multi-site incorporation of Cou
Figure 23 Apidaecins improve specific multi-site incorporation of a fluorescent nsAA
Figure 24 Tweaking expression level and autoinduction improves nsAA incorporation
Figure 25 In-cell autoinduction of apidaecin-like antimicrobial peptides improve nsAA incorporation in different E. coli strains
Figure 26 RF1 inhibition by apidaecins can facilitate recoding efforts toward improved nsAA incorporation
Figure 27 In-cell autoinduction allows evolution of new apidaecin-like peptides that show improved nsAA incorporation and decreased cell toxicity
Figure 28 Genetic code expansion via nsAA incorporation in B. subtilis
Figure 29 Extended nsAA incorporation in B. subtilis
Figure 30 Mass-spectrometry of nsAA-containing peptides
Figure 31 Fluorescence imaging using CouAA
Figure 32 Incorporation into genomic amber stop codons
Figure 33 Genomic incorporation and growth curves
Figure 34 Fluorescence and OD time courses for various nsAAs
Figure 35 LCMS analysis of nsAA levels in cells
Figure 36 Protein titration with nsAAs
Figure 37 Extended titration data
Figure 38 Incorporation of pAzF in sporulating B. subtilis cells
Figure 39 In vivo photocrosslinking
Figure 40 Additional initial characterization of Post-Translational Proofreading (PTP) system
Figure 41 PTP is generalizable to other reporter proteins
Figure 42 Single UAG suppression sensitivity assay with and without PTP

List of Tables

Table 1 Sequences of evolved BipA OTS variants
Table 2 Growth of biocontained adk.d6/tyrS.d8 strain on 100 μM non-cognate nsAAs as represented by average incubation time required to achieve OD 0.05 (h)
Table 3 Amino acid residues of the api1b gene, of the apidaecin-like peptide library, and of enriched variants that are pursued further in this work
Table 4 Sequences of key from chapter 2
Table 5 Quantitative characteristics of different promoters in B. sub
Table 6 Sequences of key constructs used in chapter 2
Table 7 Oligonucleotides used in Chapter 2
Table 8 Sequences of new constructs from chapter 3

Abbreviations

Genetic Code Expansion – GCE

Nonstandard amino acid – nsAA

Standard amino acid – sAA

Aminoacyl tRNA synthetase – AARS

Elongation factor thermo unstable – EF-Tu

Orthogonal translation system – OTS

Release factor 1 – RF1

Scientific contributions to this Thesis:

Contributions for Chapter 2:

Author contributions: A.M.K. designed research; A.M.K., D.A.S., E.K., O.V.-R., and M.L. performed research; A.M.K., D.A.S., O.V.-R., D.S., and G.M.C. analyzed data; A.M.K., D.A.S., and

D.S. wrote the paper; and D.S. and G.M.C. supervised research

Contributions for Chapter 3:

E.K., K.No., R.M. and were involved in plasmid and strain construction. E.K. performed non-standard amino acid incorporation, microscopy and growth experiments with help from K.No., R.M., D.A.S. and K.Na. D.W., J.R. and E.K. purified proteins necessary for in vitro experiments and provided synthetic chemistry support. E.K., D.A.S. and G.M.C. wrote the manuscript with feedback from all other authors.

Contributions for Chapter 4:

D.S., G.S. and A.J. performed strain construction. D.S. performed non-standard amino acid incorporation and growth experiments with help from E.K., A.K. and A.J. D.S. and E.K. purified proteins and performed biochemical experiments. D.S. and J.R. performed cell import assays. G.S. carried out microscopy experiments and analyzed data. K.G. and B.B. performed photocrosslinking assays and analysis. D.S., E.K. and E.G. wrote the manuscript with feedback from all other authors.

Chapter 1: Introduction

1.1 Genetic Code Expansion

In biology, most functional tasks are carried out by proteins. They form structural elements,

generate mechanical force, catalyze metabolism, respond to stimuli, assemble the molecules of life

into functional structures and much, much more. These complex biomolecules are linear strings of

amino acids, often several hundred or thousand monomers long. DNA stores the information to

make proteins, where every 3-nucleotide ‘codon’ stands for one amino acid. The ‘genetic code’

specifies which of the 20 amino acids corresponds to each codon. All of life uses the same amino

acids, which have little chemical complexity and are lacking many chemistries that are common in

organic chemistry. Expanding the genetic code with new chemistries by using new amino acids

beyond the standard 20 is an attractive way to engineer biology at the most fundamental level.

To incorporate novel amino acids, we must understand and reprogram how life specifies its

genetic code. Nature stipulates which amino acids correspond to which codon with enzymes called aminoacyl-tRNA synthetases (AARSs), which are ubiquitous across the tree of life. AARSs conjugate

uncharged tRNAs to free amino acids by aminoacylation. In this reaction, amino acids are attached to either the 3’ or 2’ hydroxyl groups of the tRNA acceptor end (5’-CCA-OH-3’). Aminoacylated

tRNAs are then able to bind to elongation factor thermo unstable (EF-Tu) and engage in ribosomal

translation, specifically decoding an mRNA into protein by base pairing between codon and

anticodon1. The specificity of AARSs for their cognate amino acids and tRNAs determines the

genetic code2. tRNAs with anticodons that match the next codon on a translating strand of mRNA

will attach their acylated amino acid to the growing polypeptide chain. This reaction is thought to

occur without regard for the identity of that amino acid, and is solely specified by anticodon-codon base pairing1. As shown in 2001 by Wang and Schultz’s landmark paper, using an engineered AARS to charge a unique tRNA capable of ribosomal translation with a nonstandard amino acid (nsAA) in

vivo was sufficient to introduce that amino acid to the genetic code of E. coli3 (Figure 1).

While the core of a Genetic Code Expansion (GCE) system is the AARS/tRNA pair that

charges the nsAA, often known as the orthogonal translation system (OTS), it is common for other

components of the host cell to require engineering for efficient nsAA incorporation4–7. To

successfully incorporate a new amino acid into the genetic code, the nsAA-charged tRNA must be

able to efficiently pair with the target codon. In practice, this requires overwriting an existing

codon, using a codon not used in biology or creating an orthogonal translation system. One widely

used approach is to overwrite the UAG codon, or ‘amber’ stop codon. The amber stop codon has

been shown to be the least-used codon across much of biology8–10.
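The codon reassignment described above can be illustrated with a minimal Python sketch. It builds the standard genetic code as a lookup table and then overwrites the UAG entry with a placeholder residue. The function names and the symbol `X` for the nsAA are illustrative assumptions, not notation used in this thesis, and the sketch ignores competition with release factors entirely.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expanded_code(nsaa_symbol="X"):
    """Return a copy of the standard code with the amber codon (UAG) reassigned.

    The symbol "X" is a hypothetical placeholder for the nsAA.
    """
    code = dict(STANDARD_CODE)
    code["UAG"] = nsaa_symbol
    return code


def translate(mrna, code):
    """Translate an mRNA string codon by codon until a stop codon ("*")."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = code[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)
```

Under the standard code, a message with an in-frame UAG is truncated at that codon. With the reassigned table, translation reads through it and continues to the next stop codon.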

Without additional intervention, amber stop codon replacement must suppress the native termination process by successfully competing with release factor 1 (RF1), or peptide chain release

factor (prfA) in bacteria11. Unsuccessful competition decreases the efficiency of GCE and extensive

strain engineering efforts have attempted to address this issue. For example, whole genome

recoding7 and synthesis12,13 have been used to generate strains lacking native amber codons and

corresponding release factors that compete with amber codon nsAA incorporation. Newer

strategies have looked to develop codons outside of the natural 64 codons found in the standard

genetic code. We can gain access to additional usable codons by switching from triplet to

quadruplet codons14,15 or using unnatural nucleotide base pairs16–18 (non-ATGC) to gain access to

an expanded codon space. Other approaches to improve GCE seek to engineer ribosomal factors, ranging from simply mutating EF-Tu to accept structurally varied nsAAs19–21 to approaches aimed at

generating orthogonal ribosome systems with flexible genetic codes5,22,23. While these alternative approaches represent the next generation of expanding the genetic code, the basic approach of competing with the release factor for amber codon incorporation remains a viable strategy for general applications of GCE (Figure 1).

Figure 1 Basic Genetic Code Expansion machinery. For successful genetic code expansion, a heterologously expressed AARS binds to its cognate tRNACUA and a synthetic nsAA, resulting in a charged tRNACUA, which proceeds to the ribosome through EF-Tu interaction to translate the UAG amber stop codon as the nsAA. In non-recoded bacteria this requires competing with release factor 1 (RF1). Figure modified from O’Donoghue et al. Nat. Chem. Biol. 2013
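To make the cost of release factor competition concrete, the following Python sketch uses a deliberately simple toy model. It assumes that, at each UAG codon, suppression and termination are competing first-order events, and that multiple UAG codons in one transcript behave independently. Both assumptions, along with the rate parameters and function names, are hypothetical simplifications introduced here for illustration and are not measurements or models from this work.

```python
def suppression_probability(k_suppress, k_release):
    """Probability that a UAG codon is decoded by the charged tRNA rather than by RF1.

    Toy kinetic partitioning: two competing first-order events with
    hypothetical rates k_suppress (nsAA-tRNA decoding) and k_release
    (RF1-mediated termination), in arbitrary but matching units.
    """
    return k_suppress / (k_suppress + k_release)


def full_length_fraction(p_site, n_sites):
    """Fraction of transcripts read through all n_sites UAG codons.

    Assumes each site is suppressed independently with probability p_site.
    """
    return p_site ** n_sites
```

In this toy picture, evenly matched rates give 50% read-through at a single site but only 12.5% full-length product for a protein with three UAG codons. Setting `k_release` to zero, as a stand-in for a recoded strain lacking RF1, gives complete read-through.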

Over 150 different nsAAs have been shown to incorporate into proteins in the literature.

These nsAAs include a very diverse set of functional groups suitable for a broad range of

applications24. While the potentially expanded protein design space with nsAAs bearing chemistries not available to the standard 20 amino acids is very exciting25–28, the field of protein

design has not yet advanced far enough to allow for full utilization of these novel amino acids in newly designed proteins. Further development of protein design tools is necessary before nsAAs can be fully utilized to expand the protein design space. However, nsAA incorporation is very

effective at encoding a single, understood chemical function at a specific position in a protein

without requiring extensive understanding of protein structure.

The primary use of current state-of-the-art GCE is nonperturbative addition of emergent

functions to proteins by site-specific addition of custom chemistries. One of the most exciting

examples is the bioorthogonal conjugation of arbitrary small molecules to specific positions on

proteins29,30. Researchers have made therapeutic biologics with enhanced function by improving

pharmacokinetics or specifically delivering drug payloads31,32. In another example, GCE has been

used to incorporate a photo-crosslinker on a known or suspected binding surface to identify native

binding partners and locate specific interactions with angstrom-level precision33–35. Furthermore, photocaged amino acids placed in or near the active site of a target protein can give researchers highly precise temporal and spatial control of protein function36–38. Given the broad array of

functional groups, many more applications exist, including fluorescence39, specific insertion of post-

translational modifications40,41 and installation of probes to aid in X-ray and NMR techniques42,43.

For the past 20 years, GCE has been leveraged in biological studies across many organisms including E.

coli, yeast and mammalian cells. Moving forward, GCE application areas are beginning to make their way into biomedical and materials engineering.

1.2 Engineered tRNA & tRNA-synthetase diversity

Many different versions of AARS/tRNA pairs that make up the core of a GCE system have

been developed. These can be broken down into families, where each family is derived from a single naturally existing AARS/tRNA pair. Each AARS family contains specific variants which have

been engineered to charge specific target nsAAs. Often, the engineered and wildtype AARSs are

different only by a few mutations in or around the amino acid binding pocket. Typically, families

share general levels of activity and requirements for successful function such that if one member of

a family is functional in a given condition with its target nsAA, another member of that family will

also be functional in that condition with its target nsAA. There are two primary AARS families, each

broadly used with many variants, and several secondary AARS families with fewer, more targeted

applications.

The first developed3 and still widely used synthetase family for GCE is the

Methanocaldococcus jannaschii (formerly Methanococcus jannaschii) Tyrosyl-tRNA synthetase

(MjTyrRS) system28. Sourced from the autotrophic hyperthermophilic M. jannaschii, the MjTyrRS

family has shown itself to be a robust and flexible platform for genetic code expansion, tolerating

tRNACUA mutations and a wide array of binding pocket mutations that allow the many MjTyrRSs to bind an extremely broad set of tyrosine-based nsAAs. Though MjTyrRSs are only orthogonal to

native translational machinery in bacteria and not in eukaryotes, they allow high-efficiency UAG

suppression of over 50 nsAAs24 in E. coli.

The second of the two prominent synthetase families is based on the archaeal pyrrolysine system from the Methanosarcina genus. Unlike other AARS systems, the wildtype pyrrolysine

AARS/tRNA natively suppresses the UAG amber stop codon. Heterologously expressing the

Methanosarcina barkeri pyrrolysine-tRNA synthetase (MbPylRS) and accompanying tRNA is sufficient to accomplish amber suppression with pyrrolysine in E. coli44. Since its initial discovery,

the MbPylRS and homologous Methanosarcina mazei pyrrolysine-tRNA synthetase (MmPylRS) have

been engineered to incorporate over a hundred novel pyrrolysine-based nsAAs with broad chemical diversity24,45, aided by the promiscuity of the PylRS for various pyrrolysine-derived substrates46–49.

Additionally, the PylRS genetic code expansion is functional in both prokaryotes41 and eukaryotes50,

including in live animals51,52. It has been noted that the N-terminal tRNA-binding domain of the PylRSs is

insoluble, especially in bacteria53, and new PylRS homologs lacking this domain54,55 may increase

activity and broaden applications of pyrrolysine-based GCE approaches56,57.

Several additional synthetase systems for GCE exist, but they are limited in both scope and

number of available nsAAs compared to the MjTyrRS and PylRS systems. Both the E. coli tyrosyl-58

and leucyl-59 tRNA synthetases (EcTyrRS and EcLeuRS respectively) have been adapted for GCE in

eukaryotic cells. The EcLeuRS contains more engineered variants, capable of incorporating

approximately 20 different nsAAs, while the EcTyrRS system has been used for 9 different nsAAs as of 201524. A final system of note is the Saccharomyces cerevisiae tryptophan synthetase (ScWRS),

which functions in E. coli and is capable of charging 4 different nsAAs60,61. Several other orphan GCE

families exist, with one or two characterized variants and nsAAs, including the Methanosarcina acetivorans Tyrosine synthetase62, Pyrococcus horikoshii lysine synthetase63 and the

Methanococcus maripaludis O-phosphoserine synthetase64.

Future work will undoubtedly see additional nsAAs added to the repertoire of GCE, likely through both new variants of synthetase families described here and addition of new families with new substrate ranges.

1.3 Orthogonality and specificity of genetic code expansion

Though the initial primary challenge in GCE is ensuring that the AARS successfully charges

the introduced tRNA with the target nsAA, a similar challenge lies in ensuring the orthogonality and

specificity of genetic code expansion systems. Introduced AARS/tRNA pairs must not interact with native AARS/tRNAs. Any interaction would result in native tRNAs being charged with the nsAA or the introduced tRNA being charged with standard amino acids, yielding nonspecific incorporation at the target site and elsewhere. To avoid these interactions, AARS families are sourced from phylogenetically distant organisms, a requirement that determines the host ranges of the different synthetase families (Section 1.2)65,66.

Using distant synthetases is a good start, but given the conservation of tRNA structure, specific engineering of the tRNA/AARS is often necessary to avoid crosstalk with native translation systems. Such methods often use libraries of tRNA/AARSs and dual-selection approaches to remove unwanted activity with host systems while maintaining orthogonal activity67,68. Alternatively, specific identification of the AARS-tRNA interactions responsible for cross-talk can allow rational engineering to generate truly orthogonal GCE systems60,69.

In addition to establishing that an AARS/tRNA pair does not cross-talk with host synthetases,

the AARS itself must lack activity against standard amino acids. AARSs are typically developed for

GCE with directed evolution methods involving rounds of positive and negative selection70,71,

though the limited dynamic range of negative selection25,72 often leaves synthetases with a low level of activity against their parental standard amino acids73,74. This requires work-arounds like

minimal media, or acceptance of some level of mis-incorporation of standard amino acids75, which

can interfere with downstream applications such as biocontainment76,77.
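The logic of alternating positive and negative selection can be sketched as a simple two-step filter in Python. The reporter signals, cutoffs, and variant names here are hypothetical and serve only to illustrate the idea. Real selections act on populations through growth or sorting, not on tabulated values, so this is a conceptual sketch and not the procedure used in the cited studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A hypothetical synthetase variant with reporter readouts in two conditions."""
    name: str
    signal_with_nsaa: float      # reporter signal when the target nsAA is supplied
    signal_without_nsaa: float   # reporter signal with standard amino acids only


def dual_selection(library, positive_cutoff, negative_cutoff):
    """Keep variants that pass a positive round and then a negative round.

    Positive round: the variant must be active when the nsAA is present.
    Negative round: the variant must be inactive when the nsAA is absent,
    i.e. it must not charge the tRNA with standard amino acids.
    """
    active = [v for v in library if v.signal_with_nsaa >= positive_cutoff]
    return [v for v in active if v.signal_without_nsaa <= negative_cutoff]
```

A variant with residual background signal below `negative_cutoff` still passes this filter, which mirrors how a negative selection with limited dynamic range can leave synthetases with low-level activity on standard amino acids.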

1.4 Expansion of the Genetic Code to Novel organisms

GCE has been widely available as a chemical biology tool in some model systems, but only in

recent years have nsAAs started being incorporated in broader contexts. After being originally

developed in E. coli, GCE was quickly applied in mammalian cells78, S. cerevisiae58 and C. elegans79.

However, only recently has the genetic code of other organisms been expanded, including gram-negative80,81, gram-positive82–84, mycobacterial85 and photoautotrophic organisms86,87. These

approaches are useful for both basic science and industrial applications of nsAAs. For example,

enabling GCE in Rhodobacter sphaeroides allows usage of nsAA-based chemical biology tools to

investigate the function of photosynthetic reaction centers in their native context. GCE in the

industrially relevant protein producers, such as Bacillus cereus and Bacillus subtilis, may be the first

steps toward nsAA-containing antimicrobial peptides or industrial-scale production of proteins

using GCE83,84.

There are many potential challenges to expanding the genetic code of an organism for the first time. Approaches thus far have not needed to engineer novel AARS/tRNA pairs, instead using pre-existing pairs from the literature, but face challenges in adapting GCE in their organism. At the

very least, oftentimes expression levels and conditions for both the tRNA and the AARS need to be optimized, which is non-trivial in less-tractable organisms85,87. Additionally, issues of orthogonality

and compatibility (Section 1.3) with the host translational system may require significant

engineering to overcome87. Finally, efficiency is often limited by competition with native release

factors83,86, which may be more active and functional in novel bacteria than in E. coli or other organisms used for GCE8.

The most commonly used AARS family for GCE in novel organisms is the Mm/MbPylRS family,

which has been widely touted as an orthogonal and effective tool for genetic code expansion of a

wide range of useful nsAAs45,46,88. However, some of these applications have been limited by the

generally lower efficiency of the Mm/MbPylRSs than the MjTyrRSs83. Additionally, it is common in

these studies to use only a single synthetase family, and often only a single member of that family,

for the first-time expansion of the genetic code83,84,86. This approach does not provide a clear

roadmap to further expansion of the genetic code for broader applications, or even clearly

demonstrate that the rest of the family of AARSs used will work in the novel organism in question.

Current and future work will hopefully devote effort to explaining more general rules for expanding the genetic code of both the organism of interest to the study and novel organisms generally.

1.5 Overview

Genetic code expansion has succeeded in incorporating many useful nsAAs in vivo with an ever-expanding set of orthogonal AARSs. While work to further expand the repertoire of available nsAAs is valuable, there are currently many opportunities to improve the accuracy and reliability of nsAA applications and the further expansion of the genetic code to novel contexts, such as new organisms. These are challenges that may be best approached with novel tools from outside the field of genetic code expansion, instead of relying on further selection-based directed evolution of already highly engineered AARSs. In this thesis, I describe new tools and techniques with the focus of improving existing GCE techniques or making them usable in novel contexts.

In Chapter 2, I describe work previously published in Proceedings of the National Academy of Sciences. We develop a method for post-translational proofreading of nsAA incorporation, then use that method to address low-specificity AARSs to improve biocontainment applications of

GCE. This chapter addresses issues of orthogonality and specificity in existing GCE systems and describes a new tool to refine and develop these systems in the future.

Chapter 3 contains work previously published in ACS Chemical Biology, in which we apply a release-factor inhibiting peptide to temporarily promote suppression of the amber stop codon in non-recoded bacteria. This improves nsAA incorporation efficiency and facilitates the expansion of the genetic code in Agrobacterium tumefaciens for the first time. This chapter explains a general route to GCE in new bacterial organisms, bypassing recoding to overcome release factor competition.

In Chapter 4, I explore in-preparation work in which I expand the genetic code of Bacillus

subtilis with three different families of AARSs, demonstrate three significant applications of nsAAs

and lay a roadmap for incorporation of any demonstrated nsAA into this important bacterium. This

chapter demonstrates the steps necessary to fully open the toolkit of GCE in a new organism, and

the benefits of doing so.

In Chapter 5, I summarize these efforts and describe their potential impact and future

directions. I conclude that as new organisms become genetically tractable, GCE will be developed in them to facilitate basic and applied projects.

Chapter 2: Engineering posttranslational proofreading to discriminate nonstandard

amino acids

This chapter was previously published as “Engineering post-translational proofreading to

discriminate non-standard amino acids” with the following authors: Aditya M. Kunjapur, Devon A.

Stork, Erkin Kuru, Oscar Vargas-Rodriguez, Matthieu Landon, Dieter Söll, and George M. Church

PNAS January 16, 2018 115 (3) 619-624

2.1 Abstract

Incorporation of nsAAs leads to chemical diversification of proteins, which is an important tool for

the investigation and engineering of biological processes. However, the aminoacyl-tRNA

synthetases crucial for this process are polyspecific in regard to nsAAs and standard amino acids.

Here we develop a quality control system called “post-translational proofreading” to more

accurately and rapidly evaluate nsAA incorporation. We achieve this proofreading by hijacking a natural pathway of protein degradation known as the N-end rule, which regulates the lifespan of a protein based on its amino-terminal residue. We find that proteins containing certain desired N-

terminal nsAAs have much longer half-lives compared to those proteins containing undesired amino acids. We use the post-translational proofreading system to further evolve a

Methanocaldococcus jannaschii tyrosyl-tRNA synthetase (TyrRS) variant and a tRNATyr species for

improved specificity of the nsAA biphenylalanine in vitro and in vivo. Our newly evolved

biphenylalanine incorporation machinery enhances the biocontainment and growth of genetically

engineered strains that depend on biphenylalanine incorporation. Finally, we show

that our post-translational proofreading system can be designed for incorporation of other nsAAs

by rational engineering of the ClpS protein, which mediates the N-end rule. Taken together, our post-translational proofreading system for in vivo protein sequence verification presents a new paradigm for molecular recognition of amino acids and is a major advance in our ability to accurately expand the genetic code.

Significance:

Accurate incorporation of nsAAs is central for genetic code expansion to increase the chemical diversity of proteins. However, aminoacyl-tRNA synthetases are polyspecific and facilitate incorporation of multiple nsAAs. We investigated and repurposed a natural protein degradation pathway, the N-end rule pathway, to devise a novel system for rapid assessment of the accuracy of nsAA incorporation. Using this tool to monitor incorporation of the nsAA biphenylalanine allowed the identification of TyrRS variants with improved amino acid specificity. The evolved TyrRS variants enhanced our ability to contain unwanted proliferation of genetically modified organisms. This post-translational proofreading system will aid the evolution of orthogonal translation systems for specific incorporation of diverse nsAAs.

2.2 Introduction

The ability to incorporate chemically diverse nsAAs broadens the structural and functional

diversity of proteins24,89. nsAAs with varied sidechains can serve as photo-crosslinking groups,

spectroscopic and fluorescent probes, or reactive handles for conjugation24. nsAA incorporation has

also been applied to control proliferation of genetically modified organisms by introducing nsAA-dependency in essential proteins76,77. These biocontainment approaches, known as “synthetic auxotrophy”, are important safeguards as we advance towards assembling genomes that contain large deviations from the standard genetic code12, which can provide hosts with increased

resistance7,90.

Site-specific incorporation of nsAAs requires a dedicated aminoacyl-tRNA synthetase

(AARS)-tRNA pair, also known as orthogonal translation system (OTS), which must not cross-react

with the host’s tRNAs and AARSs. The substrate specificity of an AARS is normally engineered

through rounds of site-directed evolution to recognize a desired nsAA while discriminating against any

other AA in the cell2. However, many engineered AARSs fail to effectively discern between cognate

nsAA substrates and standard AAs and other nsAAs. Thus, engineered AARSs with low specificity

can generate target proteins with different amino acids at the desired positions. Currently, most nsAAs are incorporated in response to a nonsense codon (e.g. TAG) within a gene encoding a target protein. Incorporation of nsAAs is then monitored by production of the full-length reporter protein

as evaluated by standard gel electrophoresis or by fluorimetry. However, promiscuous AARS-tRNA

pairs can produce full-length target proteins even in the absence of the nsAA74,74,91–93.

The AARSs that facilitate nsAA incorporation may exhibit overlap of substrate specificities,

which limits their simultaneous use for synthesis of proteins with different nsAAs47,49,94. In contrast,

the AARS enzymes are highly specific for their natural cognate AA, and together with other pre-

translational quality control processes allow a mistranslation rate of only 1 in 10^4 (ref. 2). Cross-talk of

nsAA incorporation machinery with standard AAs may lower the effectiveness of previously

demonstrated synthetic auxotrophy76 as promiscuous activity of the biphenylalanine (BipA)

incorporation machinery95 can promote escape. Many other nsAA applications, such as protein

double labelling, Förster resonance energy transfer (FRET), and antibody conjugation, require high

fidelity incorporation to avoid heterogeneous .

Currently, the identity of an incorporated amino acid is best determined by mass

spectrometry of the desired recombinant protein. We sought to develop a new detection system

with the following design criteria: (i) the ability to controllably mask and unmask misincorporation in vivo; (ii) compatibility with different reporter proteins; and (iii) customizability for most commonly used nsAAs. Here, we report how the N-end rule pathway of protein degradation, a

natural protein regulatory and quality control pathway conserved across prokaryotes and

eukaryotes96–98, applies to commonly used nsAAs. The N-end rule states that the half-life of a

protein is determined by its amino-terminal residue. Because components of the N-end rule

pathway interact specifically with a subset of AAs, we hypothesized that nsAAs may be N-end

stabilizing, whereas their standard AA analogs (Tyr/Phe/Trp/Leu/Lys/Arg) that are the most likely

culprits for misincorporation are known to be N-end destabilizing residues in E. coli, which result in

protein half-lives on the timescale of minutes98. We tested the effect of incorporation of commonly used nsAAs at the N-terminus and used our findings to develop “post-translational proofreading”,

which enables high-accuracy discrimination of nsAA incorporation in vivo. Post-translational

proofreading is a remarkably modular, generalizable, and tunable system for specific protein

recognition based on the identity of a single amino acid at the N-terminus, which is a position

increasingly targeted for applications in chemical biology99. We demonstrated that the ability to

optionally degrade proteins containing standard AA misincorporation events dramatically facilitates

directed evolution for selective nsAA incorporation machinery.

2.3 Results

2.3.1 Evaluation of the Biphenylalanine (BipA) OTS promiscuity

Four primary OTS families have been developed for nsAA incorporation by suppression of

UAG codons in targeted sequences76,77 (Figure 2a). We began by evaluating the promiscuity of the

BipA OTS, which is comprised of the BipARS aminoacyl-tRNA synthetase and tRNATyr(CUA) derived from the Methanocaldococcus jannaschii Tyr OTS (MjTyrRS and tRNATyr(CUA))95. We performed experiments where each standard AA was introduced individually at an elevated concentration in minimal media

lacking BipA. These experiments suggested that Tyr and Leu were being misincorporated by the

BipA OTS in the absence of BipA (Figure 2b). Similarly, we determined by mass spectrometry that

target peptides produced upon expression of the BipA OTS but in the absence of BipA contained

90%+ Tyr/Leu/Phe at the target site, with glutamine (Gln) also present due to expected near-

cognate suppression4,100 (Figure 2c). This result confirmed that the BipA OTS was causing incorporation of standard AAs in the absence of an nsAA. As hypothesized, most of the standard

AAs that we observed to be incorporated were expected to destabilize proteins if present at the N-terminus according to the N-end rule pathway.

Figure 2 Evaluation of OTS promiscuity (A) Most common OTS families described in literature. The Trp OTS derived from S. cerevisiae, which is marked by an asterisk, is not one of the primary OTS families reviewed but has been used by multiple labs. (B) Minimal media sAA spiking experiment to investigate identity of misincorporated amino acid, where B represents BipA. (C) Mass spectrometry result showing the percentage of traces containing indicated sAAs in position X of trypsin digested Ub-X-GFP peptide grown in 2XYT.

2.3.2 Application of the N-end rule to commonly used nsAAs

To investigate how the N-end rule applies to nsAAs, we constructed a reporter consisting of a cleavable ubiquitin domain (Ub) followed by one UAG codon, a conditionally strong N-degron101,102,

and a super-folder green fluorescent protein (sfGFP) with a C-terminal His6x-tag (Figure 3a). This

reporter is designed such that nsAA incorporation is targeted at a site that is subsequently exposed

as the N-terminus. Depending on the identity of the incorporated AA in a given GFP protein, the

protein will either be stabilized or destabilized. We genomically integrated this reporter into an E.

coli strain that was genomically recoded to be devoid of UAG codons and their associated release

factor (strain “C321.ΔA”)7. The use of only one UAG codon increases assay sensitivity for promiscuity compared to the use of multi-UAG codon reporters73, and genomic integration of the

reporter increases reproducibility by eliminating plasmid copy number effects103. We began

experiments with a focus on BipA but eventually tested a panel of commonly used nsAAs (Figure

3b). As a proof-of-concept, we tested co-expression of different components of the BipA OTS

(“incorporation machinery”) with different components of the N-end rule pathway (“proofreading

machinery”) in the presence and absence of BipA (BipA- or BipA+). Expression of the orthogonal

tRNATyr(CUA) alone was responsible for a moderate amount of GFP accumulation in cells based on normalized FL/OD signal (Figure 3c). Expression of the BipARS together with tRNATyr(CUA) resulted in nearly equivalent signal in BipA- or BipA+ cases. Expression of an N-terminally truncated yeast

ubiquitin cleavase protein (UBP1)104,105 to expose the target residue as the N-terminal residue

caused ~4-fold reduction of the BipA- signal but no significant change in the BipA+ signal. The

decrease in only the BipA- signal supported our hypothesis that BipA would be N-end stabilizing.

BipA- signal decreased further upon overexpression of ClpS, the adaptor protein that directly binds

to N-terminal destabilizing residues (Tyr/Phe/Trp/Leu) on protein substrates and delivers them to

the ClpAP AAA+ protease complex for unfolding and degradation106. ClpS overexpression may

decrease substrate competition for proteins targeted for proofreading because ClpS is known to

inhibit ClpAP-dependent degradation of other substrates such as SsrA-tagged proteins107. Because

ClpS overexpression resulted in lower growth rates in LB medium, we performed subsequent

experiments in 2XYT medium, where we observed no differences in growth rates.
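The FL/OD comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and all plate-reader values are hypothetical and are not data from this study. They were chosen so that turning proofreading on gives a 4-fold drop in the BipA- signal, matching the approximate magnitude reported above.

```python
from statistics import mean, stdev


def fl_od(fluorescence, od):
    """Normalize one well's raw fluorescence by its optical density."""
    if od <= 0:
        raise ValueError("optical density must be positive")
    return fluorescence / od


def summarize(replicates):
    """Return (mean, SD) of FL/OD over (fluorescence, OD) replicate pairs."""
    values = [fl_od(f, od) for f, od in replicates]
    return mean(values), stdev(values)


def fold_reduction(off_mean, on_mean):
    """Fold decrease in signal when proofreading is switched on."""
    return off_mean / on_mean


# Hypothetical readings (fluorescence, OD), n = 3, for the BipA- condition.
bipa_minus_off = [(4100, 1.0), (7800, 2.0), (2000, 0.5)]  # no UBP1/ClpS
bipa_minus_on = [(1050, 1.0), (1900, 2.0), (500, 0.5)]    # UBP1 expressed

off_mean, off_sd = summarize(bipa_minus_off)
on_mean, on_sd = summarize(bipa_minus_on)
print(f"off: {off_mean:.0f} +/- {off_sd:.0f}; on: {on_mean:.0f} +/- {on_sd:.0f}")
print(f"fold reduction: {fold_reduction(off_mean, on_mean):.1f}")
```

Applying the same calculation to the BipA+ condition, where the text reports no significant change, would give a fold reduction near 1.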


Figure 3 Posttranslational proofreading proof of concept (A) Scheme for proofreading consisting of N-end exposure and recognition steps applied to synthetic substrates. Ub is cleaved by ubiquitin cleavase UBP1 to expose the target site as N-terminal. ClpS is the native N-recognin in E. coli and ClpAP forms an AAA+ protease complex for degradation by the N-end rule pathway. (B) nsAAs used in this study (full chemical names in SI Appendix). (C) Incorporation assay showing fluorescence resulting from GFP expression normalized by optical density (FL/OD) in the absence/presence of BipA and expression of various OTS or N-end rule components. “Over” indicates overexpression of natively expressed components. Error bars represent SD, n = 3. (D) Heatmap of FL/OD signals obtained from an nsAA panel arranged roughly in descending size from left to right without proofreading occurring in Top row and with proofreading occurring in Bottom row. Left reflects activity of the Bipyridylalanine OTS and Right reflects activity of the p-acetyl-phenylalanine OTS. Heatmap values here and elsewhere are average of n = 3.

To examine how the N-end rule applies to a larger set of nsAAs, we used the Bipyridylalanine

OTS to screen 11 nsAAs because of its low nsAA- signal in our reporter assay (Figure 4). However, this OTS resulted in observable nsAA+ signal for only 5 out of 11 tested phenyl-nsAAs, with preference for large hydrophobic side chains at the para position of phenylalanine (Figure 3d).

Notably, nsAA+ signal for these 5 nsAAs was unaffected by proofreading. The p-Acetyl-phenylalanine OTS was used to test incorporation of the 6 remaining nsAAs and appeared to broadly increase signal for these 6 nsAAs with proofreading “off” (i.e., no expression of UBP1/ClpS). We observed marked differences in signal between proofreading “off” and “on” states based roughly on nsAA size. For p-Iodo-phenylalanine and larger nsAAs, signal did not significantly change, and therefore p-Iodo-phenylalanine and larger nsAAs appear N-end stabilizing. However, for p-Bromo-phenylalanine and other smaller or polar phenyl-nsAAs such as p-Azido-phenylalanine, signal was significantly diminished when proofreading was “on” relative to when it was “off”. The data suggest that smaller deviations from Tyr/Phe are tolerated by the ClpS binding pocket, making smaller nsAAs such as p-Bromo-phenylalanine and p-Azido-phenylalanine appear N-end destabilizing.

Figure 4 Evaluation of OTS promiscuity for 6 OTS/nsAA sets Before performing a more comprehensive evaluation of how the N-end rule applies to commonly used nsAAs, we evaluated the promiscuity of 6 OTS/nsAA sets. Preliminary examination of two different families of OTSs (MjTyrRS and ScTrpRS) without PTP revealed that every OTS except for BipyARS/tRNA()* exhibited high nsAA- signal. With PTP, we observed significant decrease of nsAA- signal for all OTSs. However, we also observed significant decrease of nsAA+ signal for two nsAAs: p-Azido-phenylalanine (pAzF) and 5-Hydroxy-tryptophan (5OHW). We also observed degradation of pAzF and 5OHW in the PTP “On” state. This result motivated a more comprehensive examination of the N-end rule classification for commonly used nsAAs.

2.3.3 Engineering of the N-end rule for altered recognition of nsAAs

We hypothesized that we could engineer ClpS to alter N-end rule classification of these smaller nsAAs. We targeted four hydrophobic residues in the ClpS binding pocket for single point mutagenesis covering F/L/I/V using NTC-containing oligos (Figure 5a). Sequence alignments of ClpS homologs across prokaryotes and eukaryotes showed conservation of these residues among related hydrophobic amino acids (Figure 6). By screening the resulting 12 single mutants in a ClpS-deficient version of our reporter strain with select nsAAs and the p-Acetyl-phenylalanine OTS, we identified a variant (ClpSV65I) that resulted in stabilization of all screened N-end phenyl nsAAs while still degrading standard AAs (Figure 5b). In addition, we identified a variant (ClpSL32F) that resulted in complete degradation of all but the two largest screened N-end phenyl nsAAs (Figure 7).

Figure 5 Proofreading tunability achieved through rational ClpS engineering (A) Cartoon generated from crystal structure of E. coli ClpS binding N-end Phe peptide (PDB ID code: 3O2B) showing four hydrophobic ClpS residues subjected to single-point mutations that sampled F/L/I/V. (B) Heatmap of FL/OD signals obtained using a ClpS− host expressing UBP1, the p-acetyl-phenylalanine OTS, and variants of ClpS in the presence of different nsAAs. (C) Cartoon generated from crystal structure of C. crescentus ClpS binding N-end Trp peptide (PDB ID code: 3GQ1). (D) FL/OD heatmap resulting from expression of UBP1, the 5-hydroxytryptophan OTS, and ClpS variants in the presence/absence of 5-hydroxytryptophan. Scale as in B. (E) FL/OD heatmap resulting from expression of UBP1/ClpS in strains with Ub-X-GFP reporter genes expressing standard AAs in place of X.

Figure 6 Sequence alignment sampling natural diversity of ClpS (A) Ten bacterial sequences were obtained from UniProt and aligned using Clustal Omega (http://www.ebi.ac.uk/Tools/msa/clustalo/). (B) E. coli ClpS sequence alignment with UBR1 homologs present in yeast and humans. The four candidate positions for engineering that were identified using the crystal structure appear here to be conserved. However, three positions (32, 43, and 65) show capacity for substitution with other hydrophobic amino acids.

We also attempted to distinguish tryptophanyl analogs from Trp using the 5-Hydroxy-tryptophan

(5OHW) OTS (Figure 5c). Although 5OHW appeared N-end destabilizing with wild-type ClpS, we

observed that ClpSV43I and ClpSV65I improved discrimination of 5OHW from Trp in the ClpS-deficient

strain (Figure 5d). Given the desirable properties of ClpSV65I, we wanted to examine whether it

alters N-end rule classification for standard AAs. We substituted the UAG codon in our GFP reporter for codons encoding a representative panel of standard AAs and found that ClpSV65I affects stability

of these N-end standard AAs no differently than ClpS (Figure 5e). Rational designs from our small

library can precisely distinguish small modifications on a variety of chemical templates, such as

nsAAs with phenyl as well as indole sidechains, showcasing the remarkable tunability of the

proofreading strategy. Interestingly, overexpression of either ClpS or ClpSV65I leads to degradation

of N-end I/V, residues that were previously shown to be only weakly N-end destabilizing in vitro108.

Proteolysis of native proteins containing N-terminal I/V may contribute to the toxicity observed

from ClpS overexpression.
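As a brief illustration of why NTC-containing oligos sample exactly F/L/I/V in the ClpS single-point library described in this section: in the standard genetic code, the four codons of the form NTC (N = T, C, A, or G) encode Phe, Leu, Ile, and Val. The sketch below enumerates the resulting single mutants. The helper names are hypothetical, and only the three positions named in this chapter (L32, V43, V65, inferred from the variant names ClpSL32F, ClpSV43I, and ClpSV65I) are listed; the fourth targeted residue is not identified in the text and is deliberately left out.

```python
# Standard genetic code assignments for the four NTC codons.
NTC_CODONS = {"TTC": "F", "CTC": "L", "ATC": "I", "GTC": "V"}


def ntc_substitutions(wild_type):
    """Residues reachable through an NTC codon, excluding the wild type."""
    return sorted(aa for aa in NTC_CODONS.values() if aa != wild_type)


def single_mutants(positions):
    """Enumerate single-point variants for {residue number: wild-type residue}."""
    return [
        f"{wt}{pos}{new}"
        for pos, wt in sorted(positions.items())
        for new in ntc_substitutions(wt)
    ]


# Positions named in this chapter; the fourth library position is omitted
# because it is not specified in the text.
named_positions = {32: "L", 43: "V", 65: "V"}
variants = single_mutants(named_positions)
print(len(variants), variants)
```

Each position whose wild-type residue is one of F/L/I/V contributes three substitutions, so three positions give 9 variants here and four such positions give the 4 × 3 = 12 single mutants screened above.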


Figure 7 Characterization of select ClpS variants on broader panels of nsAAs Left panel indicates that ClpSL32F recognizes all tested N-end nsAAs except BipA and pBnzylF. Right panel indicates that these ClpS variants decrease the degradation rate of peptides with N-end pClF, pFF, or pNitroF.

2.3.4 Application of proofreading for selective OTS evolution

The ability of proofreading to discriminate incorporation of intended nsAAs from related standard AAs is useful for high-throughput screening of OTS libraries. To demonstrate this, we integrated the UBP1-clpSV65I expression cassette into our ClpS-deficient reporter strain and used this strain (“C321.Nend”) to improve the selectivity of the parental BipA OTS.

Figure 8 Selective BipA OTS evolution using proofreading (A) FACS evolution scheme with error-prone PCR aminoacyl-tRNA synthetase libraries transformed into hosts with posttranslational proofreading (“PTP”, using ClpSV65I) genomically integrated, followed by three sorting rounds. (B) Evaluation of enriched evolved BipARS variants in clean backgrounds on a panel of nsAAs ([BipA] = 100 μM, [rest] = 1 mM, which are their standard concentrations). The parental variant is noted as “P.” (C) In vitro amino acid substrate specificity profile of BipA OTS variants. Error bars = SD, n = 3.

Previous efforts to engineer MjTyrRS variants like BipARS focused on site-directed mutagenesis on positions near the

amino acid binding pocket3,92. To generate a novel BipARS library, we used error-prone PCR to

introduce 2-4 mutations throughout the bipARS gene. These libraries were transformed into

C321.Nend and screened with three rounds of fluorescence-activated cell sorting (FACS): (i) positive

sort for GFP+ cells in BipA+; (ii) negative sort for GFP- cells in BipA- to expose the tendency for

misincorporation; (iii) final positive sort for GFP+ cells in BipA+ (Figure 8a). To obtain variants

exhibiting decreased promiscuity against other nsAAs, we altered the negative screening stringency

by varying addition of undesired nsAAs, which changed the profile of isolated variants (Figure 9).

Figure 9 FACS data from BipARS EP-PCR library exposed to negative screens of differing stringency Addition of nsAAs whose incorporation is undesired during intermediate rounds of selection in order to alter specificity profile of resulting selection.

Purification and retransformation of the 11 most enriched variants into strain C321.Ub-UAG-sfGFP

(which lacks proofreading machinery) showed that most of our variants increased BipA+ signal and

decreased BipA- signal compared to the parental OTS (Figure 8b and Table 1, Variants 1-6). Supplementation with undesired nsAAs enriched for mutants with even greater selectivity against standard AAs and undesired nsAAs (Variants 4, 9-11) but also gave rise to an extremely promiscuous variant (Variant 8), suggesting that these conditions may be nearly too harsh and facilitate emergence of cheaters.

Table 1 Sequences of evolved BipA OTS variants

OTS Variant    BipARS Mutations                       tRNA Mutations
1              N157K, I255F                           -
2              R257G                                  -
3              R181C, E259V                           A22G
4              I153V, A214T                           C67A
5              P37A                                   -
6              K76R                                   -
7              -                                      A22G
8              I49F, A130V, A233V                     C26T
9              L55M, G158S                            C29A
10             D61V, H70Q                             G51T
11             N117D, D200Y, G210S, E237V, D286Y      G23A

One mutant only isolated in higher stringencies, Variant 10, exhibited high activity on BipA and no observable activity on any other nsAAs except tert-Butyl-tyrosine (tBtylY), whose structure is very similar to BipA and contains the inert tert-Butyl protecting group typically removed for further modification (Figure 8b). SDS-PAGE of Ub-X-GFP resulting from the Variant 10 OTS after expression and affinity purification showed no observable BipA- protein production in contrast to the parental

BipA OTS, which shows a distinct BipA- band (Figure 10a). Furthermore, mass spectrometry confirmed site-specific BipA incorporation in the BipA+ condition (Figure 10b-d).


Figure 10 Confirmation of BipA incorporation by mass spectrometry (MS) (A) SDS-PAGE gel of Ni-NTA purified Ub-X-GFP reporter proteins. (B) MS trace indicating incorporation of tyrosine in position X in peptide GGXLFVQELASK (positions 75-86 of Ub-X-GFP) using WT BipA OTS and no addition of BipA. (C) MS trace indicating incorporation of BipA in position X of the same peptide using WT BipA OTS in the presence of BipA. (D) MS trace indicating incorporation of BipA in position X of the same peptide using BipA 10 OTS in the presence of BipA.

We sequenced the aminoacyl-tRNA synthetase and tRNA regions of all enriched OTS variants and discovered spontaneous tRNA mutations in our most selective variants, such as 4, 9, and 10, perhaps arising because of our use of a MutS-deficient host (Table 1 and Figure 11a). None of the observed tRNA mutations occurred at the anticodon region, indicating that these tRNAs still lead to

incorporation at the UAG site and do not cause off-target incorporation. Because OTS variants 4, 9, and 10 caused limited fluorescent protein production in the absence of BipA, we can also confidently state that these tRNAs are not being acylated by endogenous synthetases.

Figure 11 Spontaneous tRNA mutations observed in sorted variants and effect on selectivity (A) Positions of observed tRNA mutations on the predicted MjTyrRS tRNAopt structure. Note that the position of the BipA OTS Variant 10 tRNA is the most influential for interaction with elongation factor Tu (EF-Tu). (B) FL/OD measurements after cloning each combination of BipARS and tRNA variant. Each of the 3 variant tRNAs confers selectivity against standard amino acids (represented by the “No nsAA” case) regardless of the BipARS pairing. Variant 10 BipARS with Variant 10 tRNA is the most selective for BipA compared to the other nsAAs shown above. (C) In vitro amino acid substrate specificity of Variant 9 BipARS with WT tRNA or Variant 9 tRNA. (D) In vitro amino acid substrate specificity of Variant 10 BipARS with WT tRNA or Variant 10 tRNA.

When we reverted these tRNA mutations, each corresponding BipA OTS became more promiscuous (Figure

11b), suggesting that observed tRNA mutations increase selectivity. The G51 position (G50 in E. coli

nomenclature) mutated in tRNA Variant 10 is the most significant base pair in determining acylated

tRNA binding affinity to elongation factor Tu (EF-Tu), which influences ribosomal incorporation

selectivity downstream of the aminoacyl-tRNA synthetase109,110. To more rigorously assess OTS

selectivity, we purified BipARS and tRNA for the parental, Variant 9, and Variant 10 OTSs. The

observed in vitro substrate specificity as determined by tRNA aminoacylation is in excellent

agreement with our in vivo assays (Figure 8c), and the data suggest that BipARS and tRNA variants

each contribute to selectivity improvements (Figure 11c-d). The Variant 10 OTS exhibited the

highest selectivity for BipA and was chosen for subsequent applications.

2.3.5 Demonstration of enhanced biocontainment using the more selective OTS

To demonstrate the utility of a more selective OTS for biocontainment based on synthetic

auxotrophy, we substituted the parental BipA OTS construct previously used in three biocontained

strains that exhibit observable escape frequencies with containing either parental or

Variant 10 OTS. These three biocontained strains (adk.d6, tyrS.d8, and adk.d6/tyrS.d8) harbor

computational redesigns of two essential genes (adk and tyrS) to make their stability dependent on

BipA76. The effectiveness of synthetic biocontainment is evaluated by growing cells in permissive

media that contains nsAA and subsequently plating on non-permissive media that does not contain nsAA. In this manner, the fraction of cells that gain the undesired ability to grow without nsAA can be measured and this is the escape frequency.
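The escape frequency calculation described above can be sketched as follows. This is a minimal illustration, assuming total CFU are estimated from a diluted plating on permissive media and escapees are counted on non-permissive media. The function names and all colony counts, dilutions, and volumes are hypothetical placeholders rather than values from this work.

```python
def cfu_per_ml(colonies, dilution_factor, volume_plated_ml):
    """Estimate viable cell density from colonies counted on permissive media."""
    if volume_plated_ml <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor / volume_plated_ml


def escape_frequency(escapee_colonies, cells_plated):
    """Fraction of plated cells that form colonies without the nsAA."""
    if cells_plated <= 0:
        raise ValueError("number of plated cells must be positive")
    return escapee_colonies / cells_plated


def detection_limit(cells_plated):
    """Lowest measurable frequency: one colony among all plated cells."""
    return 1 / cells_plated


# Hypothetical example: 150 colonies from 0.1 mL of a 10^6-fold dilution
# on permissive media; 1 mL of undiluted culture on non-permissive media.
density = cfu_per_ml(colonies=150, dilution_factor=10**6, volume_plated_ml=0.1)
cells_plated = density * 1.0  # cells in the 1 mL plated without nsAA

print(f"escape frequency: {escape_frequency(3, cells_plated):.2e}")
print(f"detection limit:  {detection_limit(cells_plated):.2e}")
```

When no colonies appear on non-permissive media, the result is best reported as below the detection limit rather than as zero.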


Figure 12 Sample images of plates depicting biocontainment escape frequency estimation (A) Total CFU estimation on permissive media. (B) Escapee estimation on non-permissive media.

Figure 13 Effect of evolved BipA OTS on biocontainment strain escape and fitness (A) Escape frequencies over time for adk.d6 strains transformed with constructs indicated in legend below plots. Green and yellow circles compare escape frequencies of parent and evolved variants and are most relevant for this study. Navy circles represent previously published data (from ref. 3). Gray circles for adk.d6 represent our repeat of previously published data. KA, kanamycin+arabinose; SCA, SDS+chloramphenicol+arabinose. Error bars in A–C represent SEM, n = 3. (B) Escape frequencies over time for tyrS.d8 strains. Lines represent assay detection limit in cases where no colonies were observed. (C) Escape frequencies over time for adk.d6/tyrS.d8 strains. (D) Doubling time for biocontained strains with parental (P) or variant 10 OTS. Error bars = SD, n = 3.

We monitored escape frequencies on non-permissive media for seven days and observed lower escape frequencies for strains containing the Variant 10 OTS at all measured time-points (Figure 12, Figure 13c-d). The difference in escape frequency was most apparent for the adk.d6/tyrS.d8 strain,

which exhibited a 7-day escape frequency of 7.4 × 10^-9, a value more than two orders of magnitude lower than observed for any recoded E. coli strain containing only two genes engineered to depend

on an nsAA. Furthermore, the fitness of all three strains improved with the Variant 10 OTS, with

doubling time decreasing by nearly 2-fold (Figure 13d). The improved fitness is likely due to

decreased formation of destabilized Adk.d6 and TyrS.d8 proteins containing misincorporation

events, which should reduce burden on degradation machinery. Finally, Variant 10 also delayed

onset of growth of adk.d6/tyrS.d8 on non-cognate nsAAs (Table 2). We expect these benefits to

carry over to all strains which employ Variant 10 over the parental OTS. The increase in

containment efficacy and growth rate is significant for potential industrial uses of the biocontained

strain because it translates into the ability to safely grow 100-fold more cells in a reactor volume

without concern for an escapee that could propagate upon accidental environmental release.

Furthermore, these biocontained cells will grow more rapidly than first generation biocontained

cells, which can accelerate the rate of industrial production of metabolites or proteins.

Table 2 Growth of biocontained adk.d6/tyrS.d8 strain on 100 μM non-cognate nsAAs as represented by average incubation time required to achieve OD 0.05 (h)

100 μM nsAA    WT      10
-              DNO     DNO
BipA           10.1    10.9
pBnzylF        46.4    DNO
tBtylY         10.4    13.3
NapA           34.1    40.3
pAcF           DNO     DNO
pAzF           DNO     DNO

DNO: Did not observe within a 48 hour incubation period

2.4 Discussion

We have demonstrated how the N-end rule pathway of protein degradation applies to commonly used nsAAs and how the pathway can be engineered for altered N-end rule classification of these molecules. We harnessed these findings to develop our post-translational proofreading

method, which eliminates most false positive protein expression (at the N-terminus) and therefore

improves the ability to determine and increase the selectivity of OTSs used for nsAA incorporation.

Furthermore, the capability of proofreading to distinguish among nsAAs will facilitate future efforts

to simultaneously harness more than 21 amino acids. We validated proofreading during evolution

of the BipA OTS, which resulted not only in greater selectivity in vivo and in vitro but also in

enhanced biocontainment efficacy and greater strain fitness in all tested biocontained strains.

Compared to strategies that feature toxin-antitoxin systems or metabolic auxotrophs111–115, the

strategy of synthetic auxotrophy has been the most effective biocontainment strategy reported in

the literature in terms of limiting cell growth to conditions that are not naturally available and

resulting in escape rates below a detection limit of 10⁻¹² (ref. 76). Our work shows how OTS selectivity

influences the effectiveness of synthetic auxotrophy and generates the most selective OTS for the most effective biocontainment strategy available.

In addition to providing a new paradigm for OTS evaluation and evolution, post-translational proofreading can be transformative for applications in which the identity of a single amino acid is critical, such as screening of natural synthetases for nsAA acceptance116, sense codon

reassignment117,118, post-translational modifications119, and for industrial uses where purity is

extremely important, such as nsAA-containing biologics120. The ability to discriminate between

nsAAs and standard AAs may prove especially useful for reassigning sense codons, where novel screening and analytical methods are required because UAG readthrough is independent of the nature of the AA. For industrial production of proteins containing nsAAs, significant cost savings may be obtained through biosynthesis of nsAAs rather than supplementation121, and the use of

nsAA biosynthetic pathways adds additional motivation to the need for selective OTSs that do not

recognize structurally similar precursors to the desired nsAA. Proofreading may also find use in

translational regulation and as an orthogonal biocontainment strategy.

It may be possible to expand the applicability of proofreading to all 20 standard AAs given that

they are all known to be N-end destabilizing under certain contexts122. The feasibility of increasing

the set of N-end destabilizing AAs by engineering or importing conditionally expressed N-recognins across organisms has only begun to be explored. An engineered methionine aminopeptidase with broader substrate specificity has been expressed successfully in E. coli, presumably increasing the number of native substrates of N-end rule degradation123. In addition, aspartate and glutamate

have previously been converted to N-end destabilizing residues in E. coli using the bacterial

aminoacyl-transferase Bpt from Vibrio vulnificus124. These results suggest that E. coli can tolerate some increases in the number of native proteins that are likely subject to N-end degradation

despite the impact that potential degradation of essential proteins may have on cell viability.

Future attempts to engineer the N-end rule may shed additional light on how pathway components

evolved. Because small changes in N-recognin binding pocket size can strongly influence

recognition of unnatural analogs, our work suggests that natural N-recognin homologs may vary

considerably in their nsAA recognition profiles. Characterization of natural diversity may increase

the number of useful modules for future proofreading efforts.

Chapter 3: Release Factor Inhibiting Antimicrobial Peptides Improve

Nonstandard Amino Acid Incorporation in Wild-type Bacterial Cells

This chapter was previously published as “Release Factor Inhibiting Antimicrobial Peptides Improve

Nonstandard Amino Acid Incorporation in Wild-type Bacterial Cells” with the following authors:

Erkin Kuru, Rosa-Maria Määttälä, Karen Noguera, Devon A. Stork, Kamesh Narasimhan, Jonathan

Rittichier, Daniel Wiegand, and George M. Church

ACS Chemical Biology 2020 15 (7), 1852-1861

3.1 Abstract

We report a tunable chemical approach for enhancing genetic code expansion in different wild-type bacterial strains that employs apidaecin-like, anti-microbial peptides observed to

temporarily sequester and thereby inhibit Release Factor 1 (RF1). In a concentration-dependent

manner, these peptides granted a conditional lambda phage resistance to a recoded Escherichia coli

strain with non-essential RF1 activity and promoted multi-site non-standard amino acid (nsAA)

incorporation at in-frame amber stop codons in vivo and in vitro. When exogenously added, the

peptides stimulated specific nsAA incorporation in a variety of sensitive, wild-type (RF1+) strains

including Agrobacterium tumefaciens, a species in which nsAA incorporation has not been

previously reported. Improvement in nsAA incorporation was typically 2–15-fold in E. coli BL21,

MG1655, and DH10B strains and A. tumefaciens, with a >20-fold improvement observed in probiotic E.

coli Nissle 1917. In-cell expression of these peptides promoted multi-site nsAA incorporation in

transcripts with up to 6 amber codons, with a >35-fold increase in BL21 showing moderate toxicity.

Leveraging this RF1 sensitivity allowed multiplexed partial recoding of MG1655 and DH10B that

rapidly resulted in resistant strains that showed an additional ~2 fold boost to nsAA incorporation independent of the peptide. Finally, in-cell expression of an apidaecin-like peptide library allowed

the discovery of new peptide variants with reduced toxicity that still improved multi-site nsAA

incorporation >25-fold. In parallel to genetic reprogramming efforts, these new approaches can

facilitate genetic code expansion technologies in a variety of wild-type bacterial strains.

3.2 Introduction

Proteins are translated by ribosomal decoding of messenger RNA into polypeptides. This

process continues until a stop codon is reached and a specific release factor (RF) is recruited to

terminate translation. For each new residue, the correct amino acid is selected using specific

transfer RNAs (tRNAs) as adaptors, which are selectively charged with one of the naturally

occurring amino acids by aminoacyl tRNA synthetases (aaRS). The protein translation machinery of

numerous bacterial and eukaryotic species has been successfully engineered to site-specifically

encode a variety of different synthetic, non-standard amino acids (nsAAs) in target proteins (Figure

14)3,24,100,125,126. This technology enables the expansion of the protein chemistry space with a

diverse set of new synthetic functionalities that include fluorescence, photo crosslinking,

bioorthogonal tags and post-translational modifications among many others24,67,100,125–128.

Figure 14 Function of Apidaecins in Genetic Code Expansion. Genetic code expansion enables incorporation of non-standard amino acids (nsAAs) into proteins by providing orthogonal aminoacyl tRNA synthetase (aaRS)/tRNACUA pairs to insert nsAAs at in-frame amber stop codons. a) Amber codon suppression allows nsAA incorporation at UAG codons, but such incorporation competes with RF1 which terminates translation. Our strategy uses apidaecins to inhibit RF1. b) Amino acid alignment of a naturally occurring Apidaecin 1b (Api1b) and its potent synthetic analog Api137 used in this work. (O: ornithine, gu: N,N,N′,N′-tetramethylguanidino)

Typically, genetic code expansion technologies rely on the use of engineered orthogonal

aaRSs that do not charge native tRNAs or natural amino acids, but instead charge a specific nsAA to an amber-suppressing cognate tRNACUA, which is not recognized by native aaRSs. As a result, the

ribosome site-specifically incorporates this nsAA into target proteins with in-frame amber (UAG)

stop codons (Figure 14 a). In bacteria, this approach is used to encode nsAAs of different sizes and

functionalities3,24,60,126,127,129 without any significant incorporation to native/off-target amber stop

codons126,130. Due to extensive optimizations over the last decades, encoding a single nsAA into a target protein via amber suppression can now provide expression yields close to native expression in wild-type (RF1+) E. coli3,92,131,132. However, efficiencies of incorporation begin to drop when a certain nsAA is encoded in a multi-site manner in a single polypeptide. This is chiefly because amber codon-specific RF1 competes with the nsAA-charged tRNACUA, which can also lead to formation of undesired, truncated protein products11. Multi-site incorporation of an nsAA can amplify the desired new chemical property in a target protein toward a variety of new applications, such as improving the biostability of protein therapeutic agents31, facilitating the development of new antibody-drug conjugates133,134, or modulating the brightness of a fluorescent nsAA-labeled protein.

Therefore, efficient cellular production of proteins with multiple nsAAs is an outstanding challenge for realizing the potential of genetic code expansion.

This limitation can be addressed in strains in which RF1 can be deleted4,7,135–137. For example, whole-genome recoding of all 321 native amber stop codons (TAG) to ochre (TAA) in

Escherichia coli MG16557 allowed for the unconditional deletion of RF1 resulting in the strain

C321.ΔA (Addgene #48998, referred to here as C321.ΔRF1) and enabling efficient and accurate multi-site nsAA incorporation73. However, such recoding approaches are resource-intensive, require the availability of powerful genetic tools and an advanced understanding of genome architecture and function, and have been accomplished in only a few E. coli strains. Instead of genetically knocking out RF1, we sought to develop a modular, chemical genetics approach relying on an RF1-inhibiting small molecule. Recently, apidaecins, a broad class of proline-rich antimicrobial peptides first isolated from honeybees138–140 (Figure 14), were shown to have RF1- and RF2-inhibiting activity141–143. Inspired by this newly elucidated mechanism of action, here we show that apidaecin-like peptides serve as an effective agent to improve site-specific, multi-site nsAA

incorporation in a variety of wild-type bacterial species. This approach allows for rapid, transient, and tunable RF1 inhibition in a variety of strains where the alternative option of recoding may be technically unfeasible.

3.3 Results

3.3.1 Apidaecins improve nsAA incorporation in a cell-free translation system

In wild-type bacteria, premature translation termination by RF1 limits amber suppression and incorporation of nsAAs into target proteins11. Recent work with bacterial cell-free translation systems tied the antimicrobial activity of apidaecins to their ability to trap and deplete release factors RF1 and RF2 in E. coli (Figure 14)141–143. Given the limited abundance of RF1 relative to RF2 and ribosomes in the bacterial translation machinery10, we hypothesized that apidaecins could promote nsAA incorporation in response to amber suppression in a dose-dependent manner. As a first test of this idea, we expressed a superfolder GFP (sfGFP) reporter DNA template carrying an in-frame amber codon (T7-(UAG)1-sfGFP) in a modified real-time, cell-free translation monitoring system144,145 based on PURExpress® in the presence of an orthogonal aaRS/tRNACUA pair (MjBpaRS and tRNA_CUA^Tyr) and its cognate nsAA, Bpa. We found we could obtain sfGFP protein at levels comparable to a control template without an amber codon (T7-(UAG)0-sfGFP) when the system lacks RF1 (using the specialized PURExpress® ΔRF123 kit) (Figure 15a). However, the addition of

RF1 reduced the signal ~18 fold (Figure 15a). When RF1 was present, both naturally occurring Api1b and its synthetic analog Api137 (Figure 15b) promoted an nsAA-dependent increase of T7-(UAG)1-sfGFP signal (Figure 15b). sfGFP yields were concentration dependent, with Api137 having a significantly greater effect than the same concentrations of Api1b (Figure 15b).

Figure 15 Apidaecins improved nsAA incorporation in a cell-free translation system. a) In the cell-free protein translation system PURE, addition of purified MjBpaRS/tRNA_CUA^Tyr pair and their cognate nsAA, Bpa expressed an in-frame amber containing sfGFP construct (T7-(UAG)1-sfGFP) comparably to a construct without ambers (T7-(UAG)0-sfGFP). Api137 inhibited cell-free translation at concentrations higher than 80 μg mL-1. b) At the same concentrations, Api137, a synthetic apidaecin analog, promoted nsAA-dependent increase of T7-(UAG)1-sfGFP signal significantly better than the naturally occurring Api1b. Maximum relative fluorescence units (RFU) for each condition are shown on right panels. ****, P <0.0001

Moreover, 80 µg mL-1 Api137 promoted a ~10-fold increase of the nsAA-dependent expression of a sfGFP reporter carrying two in-frame amber codons (T7-(UAG)2-sfGFP) (Figure 16a).

Higher concentrations of Api137 (>80 µg mL-1) appeared to inhibit the translation of both the

reporter and the control templates (Figure 15a, Figure 16a). These results are consistent with

Api137 dependent depletion of ribosomes in conjunction with RF1 and RF2 at high

concentrations141. These results also suggest that apidaecin dosage could be modulated to favor

RF1 depletion in cell lysates or other in vitro translation systems from other RF1+ organisms146,147.

3.3.2 Apidaecins preferably inhibit RF1 in bacteria

Figure 16 Apidaecins improve nsAA-dependent reporter expression in vitro and in different bacteria. a) In a dose dependent manner, Api137 improved nsAA-dependent (UAG)2-sfGFP expression in an RF1+ cell-free translation system, PURExpress®. The ΔRF1 condition represents the same conditions applied in a specialized PURExpress® ΔRF123 kit lacking RF1. b) 3 distinct orthogonal aaRS systems and their cognate nsAAs used in this work. Cou is a fluorescent nsAA. c–e) In a dose dependent manner and with modest effect on final cell density, Api137 improved nsAA-dependent reporter expression indicative of increased c) AbK incorporation in E. coli BL21, d) Bpa incorporation in E. coli Nissle 1917 and e) in Agrobacterium tumefaciens C58 cells. ****, P <0.0001

A variety of Gram-negative bacterial species, including E. coli and Agrobacterium

tumefaciens are known to be sensitive to apidaecins10,138. In liquid medium, both Api1b and Api137

inhibited the total cell mass and the growth rate in a concentration dependent manner of common

wild-type E. coli strains MG1655, BL21, DH10B, and Nissle 1917 (a standard probiotic strain), and A. tumefaciens C58 (Figure 18). As a general trend, Api137 was a significantly more potent inhibitor than Api1b. In addition, the growth of the engineered E. coli strains in which all UAG codons were replaced by UAAs (C321.RF1 and C321.ΔRF1, derived from MG16557, where RF1 was retained as wild-type in C321.RF1 but deleted in C321.ΔRF1) was minimally inhibited even at the highest

Api137 concentrations tested (2560 µg mL-1, Figure 18). On solid media, 125 µg mL-1 Api137 inhibited colony formation of wild-type E. coli strains MG1655, BW25113, BL21, and DH10B.

However, C321.ΔRF1 was resistant to Api137 even at the highest concentration tested, 750 µg mL-1

(Figure 17). These results suggest that apidaecins preferably inhibit RF1 in E. coli and do not cause significant cell toxicity when RF1 function is redundant.

Figure 17 Apidaecins inhibit colony formation of different E. coli strains where RF1 function is essential. In LB solid media, Api137 (down to 125 μg mL-1) was toxic to cells from different E. coli strains, except the recoded MG1655 lacking native UAGs and RF1 (C321.ΔRF1). ECNR2gO* is the MG1655 non-recoded and RF1+ parental strain of C321.

C321.ΔRF1 is also resistant to different E. coli-specific phages, such as λ phage, that require RF1 activity in order to accurately express lytic genes ending with the UAG codon (Figure 19a)7,90. In contrast, the recoded E. coli that still contains RF1 (C321.RF1) is sensitive to phages (Figure 19b)7,90. Upon the induction of the λ phage lytic cycle, Api137 did not affect the growth of the C321.ΔRF1 with genomically integrated λ, C321.ΔRF1 (λcl857); however, Api137 rescued the growth of otherwise isogenic C321.RF1 (λcl857) (Figure 19a–b). In this strain, Api137 inhibited the λ phage lytic cycle and conferred a ‘conditional’ phage resistance in a dose-dependent manner (Figure 19b), a phenotype that could be exploited as a conditional biocontainment system in genetically modified organisms76.

Figure 18 Apidaecins are toxic to different Gram-negative bacteria where RF1 function is essential. In a dose-dependent manner, apidaecins inhibited growth of different Gram-negative bacteria in liquid media. This inhibition typically manifested itself as a reduction in final cell mass or as a prolonged doubling time (right panels). In general, Api137 was a more potent inhibitor than Api1b, and RF1+ strains (including C321.RF1) were more sensitive than C321.ΔRF1. Among E. coli strains tested, Nissle 1917 was the strain that was the most sensitive to apidaecins followed by BL21, DH10B, MG1655, C321.RF1 and C321.ΔRF1. ****, P <0.0001; **, P ≤ 0.007; *, P = 0.0116

3.3.3 Apidaecins improve nsAA-dependent sfGFP expression in different bacteria

We next tested if we could utilize preferential RF1-inhibiting activity of apidaecins to promote nsAA incorporation in response to amber stop codon(s) in live bacteria. We focused on three previously engineered primary classes of orthogonal aaRS and their cognate tRNACUA pairs.

These include a Methanocaldococcus jannaschii (Mj) tyrosyl-RS, MjBpaRS148, previously evolved to charge an orthogonal tRNA_CUA^Tyr with Bpa, a photo-crosslinker nsAA. We also utilized a variant of Saccharomyces cerevisiae (Sc) tryptophanyl-RS, Sc5OHWRS, that charges an orthogonal tRNA_CUA^Trp with 5OHW60,127, a serotonin precursor. We also adopted a variant of Methanosarcina barkeri (Mb) pyrrolysyl-RS, MbAbKRS, that can charge an orthogonal tRNA_CUA^Pyl with both AbK, a photo-crosslinker nsAA, and BocK, a chemically protected lysine derivative that is more readily available than AbK129.

Figure 19 Apidaecins confer conditional phage resistance and improve nsAA incorporation in E. coli cells with redundant RF1 functionality. a) In liquid media, C321.ΔRF1 is resistant to λ phage-induced lysis and to Api137. b) In a dose dependent manner, Api137 rescued lysis of C321.RF1 (λcl857) cells upon λ phage lytic cycle induction. c) Api1b stimulated AbK-dependent sfGFP signal increase in C321.RF1, but not in C321.ΔRF1 cells expressing the MbAbKRS/tRNA_CUA^Pyl system. d) Api1b improved 5OHW- and Bpa-dependent sfGFP signal increase in C321.RF1 expressing Sc5OHWRS/tRNA_CUA^Trp and MjBpaRS/tRNA_CUA^Tyr systems. ****, P <0.0001.

Finally, in order to directly measure the nsAA-dependent fluorescence signal increase, we also included MjCouRS126 evolved to charge an orthogonal tRNA_CUA^Tyr with a fluorescent nsAA, Cou (Figure 16b). To probe site-specific nsAA incorporation in live cells of various strains, we expressed these aaRS/tRNACUA pairs constitutively on one plasmid and in parallel an inducible sfGFP reporter on another and used a range of sfGFP constructs containing different numbers of in-frame

UAG codons. We quantified nsAA incorporation using established bulk culture plate-reader assays

and by normalizing sfGFP signal to final optical density127 or via more sensitive and higher

information single-cell microscopy149,150.
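The plate-reader quantification described above can be illustrated with a short calculation. This is a minimal sketch in Python, assuming that each well's sfGFP fluorescence is divided by its final OD600 and that fold change is the ratio of mean normalized signals. The function names and all numeric values are hypothetical and are not taken from the experiments reported here.

```python
from statistics import mean


def normalized_signal(fluorescence, od600):
    """Divide each well's fluorescence by that well's final OD600."""
    if len(fluorescence) != len(od600):
        raise ValueError("fluorescence and od600 must have the same length")
    return [f / od for f, od in zip(fluorescence, od600)]


def fold_change(treated, untreated):
    """Ratio of mean normalized signal with versus without peptide."""
    return mean(treated) / mean(untreated)


# Hypothetical triplicate wells (arbitrary fluorescence units, final OD600).
no_peptide = normalized_signal([1000.0, 1100.0, 900.0], [1.0, 1.1, 0.9])
with_peptide = normalized_signal([1800.0, 1900.0, 1700.0], [0.9, 0.95, 0.85])

print(f"Fold change in OD-normalized sfGFP signal: {fold_change(with_peptide, no_peptide):.2f}")
```

Normalizing before averaging keeps a drop in final cell density, such as the one caused by peptide toxicity, from being misread as a drop in per-cell reporter expression.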

We first tested the effect of Api1b on nsAA incorporation in C321.ΔRF1 and C321.RF1,

already shown to be resistant to apidaecins. Unsurprisingly, the presence of RF1 in C321.RF1

caused a ~80% reduction in nsAA-dependent sfGFP signal in comparison to C321.ΔRF1 (Figure 19c).

Exogenously added Api1b partially rescued nsAA-dependent sfGFP signal in C321.RF1 but not in

C321.ΔRF1 (Figure 19c–d). This effect was dose-dependent and improved the reporter expression

1.4–2-fold in C321.RF1 co-expressing MbAbKRS, Sc5OHWRS, or MjBpaRS systems (Figure 19c–d).

These results suggest that apidaecins preferably inhibit RF1 in live cells and therefore stimulate

nsAA incorporation. As apidaecins also inhibit RF2141 but are not toxic to C321.ΔRF1, these peptides

may also facilitate nsAA incorporation into the UAA/UGA codons in parallel to the UAG codon in

this strain151.

We hypothesized that by tuning the apidaecin dosage and exposure, we could promote

nsAA incorporation also in sensitive bacterial strains that have not been recoded and retain their

native UAG codons and RF1. When added in the late-exponential phase, Api137 promoted AbK–

dependent sfGFP signal increase ~2-fold, indicative of increased nsAA incorporation, in E. coli BL21 in a dose-dependent manner with minimal inhibitory effect on the final cell density (Figure 16c).

Consistent with its higher inhibitory potential140, Api137 improved nsAA-dependent signal more

than Api1b (Figure 16c and Figure 20a). These results motivated us to test the effects of apidaecins

in other, sensitive strains where nsAA technology has not been demonstrated. In probiotic E. coli

Nissle 1917 with plasmids expressing inducible (UAG)2-sfGFP and constitutive MjBpaRS/tRNA_CUA^Tyr, Api137 improved Bpa-dependent sfGFP expression dramatically (>23–fold, Figure 16d), potentially due to inherently higher activity of RF1 in this strain.

Figure 20 Exogenously added apidaecins improve nsAA-dependent sfGFP signal increase in different bacteria. a) In a dose-dependent manner and with minimal effect on final cell density (manifested by the drop of the relative final OD600 compared to no drug controls), Api1b improved AbK-dependent sfGFP signal increase in E. coli BL21 cells. b) A. tumefaciens (constitutively expressing MjBpaRS/tRNA_CUA^Tyr) expressed (UAG)1-sfGFP optimally at 50 μM Bpa. c–d) Tandem mass spectrometry (MS/MS) fragmentation analysis of the trypsin digested (UAG)1-sfGFP construct from A. tumefaciens (grown in LB) confirmed Bpa incorporation at the expected amber position. c) An MS/MS spectrum for a representative Bpa-containing peptide. d) Relative abundances are based on ion count of detected relevant peptides with Bpa and the other natural amino acids.

Similarly, double transformation of the Gram-negative bacterium, A. tumefaciens, with plasmids expressing codon optimized, inducible (UAG)1-sfGFP and constitutive MjBpaRS/tRNA_CUA^Tyr showed a Bpa-dependent increased sfGFP expression suggestive of nsAA incorporation (Figure 20b). Tandem mass spectrometry (MS/MS) analysis confirmed the incorporation of Bpa at the expected UAG codon (~63% of the total ions) alongside high-confidence evidence for glutamate incorporation (~26% of the total ions) and lower-confidence evidence for tyrosine (~9% of the total ions), lysine and methionine (Figure 20c–d). These results demonstrated the expansion of the nsAA technology to A. tumefaciens, to our knowledge, for the first time. In this strain too, Api137 improved Bpa-dependent sfGFP expression in a dose-dependent manner (up to ~3.1–fold) with moderate toxicity (Figure 16e). These results suggest that apidaecins could facilitate functional nsAA experiments, such as site-specific probing of protein-protein interactions by photo-crosslinker nsAAs in previously uncharacterized bacterial strains.

As a general trend, the presence of apidaecins increased background sfGFP signal also in the absence of added nsAAs (Figure 16c–d, Figure 20a–b). This is likely a consequence of RF1 inhibition, as this phenomenon is widely reported in ΔRF1 strains127,152. Although the exact reasons for this background signal in ΔRF1 strains are still unclear, the contributing factors are linked to a combination of promiscuity of engineered aaRSs for natural amino acids, increased near-cognate suppression and codon skipping in the absence of cognate nsAAs in the medium127,130,152.

3.3.4 A new auto-inducible plasmid system to encode nsAAs

In RF1+ cells, nsAA-dependent protein expression efficiency is known to decay significantly if multiple nsAAs are encoded in a polypeptide11. We wanted to examine if apidaecins could address this problem, but to accurately observe expected low levels of protein expression we switched to a new system that features lower background noise and higher experimental reproducibility. We first cloned our aaRS/tRNACUA pairs into a pDule plasmid (p15A origin, TcR, aaRS and tRNA genes constitutively expressed).

Figure 21 Media effects on nsAA incorporation combined with api treatment. a) The Sc5OHWRS/tRNA_CUA^Trp in LB results in high background (no nsAA) signal occluding the effects of apidaecins. b) The new auto-inducible reporter system in GMML minimal media results in strong signal (+ nsAA) over background (no nsAA). In a dose dependent manner, apidaecins improve nsAA-dependent sfGFP expression in both BL21 (a–d) and DH10B (e–g) expressing PopZ-(UAG)2-sfGFP reporter and Sc5OHWRS/tRNA_CUA^Trp (a,b,g), or MbAbKRS/tRNA_CUA^Pyl (c,e), or MjBpaRS/tRNA_CUA^Tyr (d,f) systems. b) In BL21 cells expressing the Sc5OHWRS/tRNA_CUA^Trp and PopZ-(UAG)2-sfGFP, Api137 or Api1b improved sfGFP signal up to ~4 or ~5 fold, respectively. c) In BL21 cells expressing the MbAbKRS/tRNA_CUA^Pyl and PopZ-(UAG)2-sfGFP, Api137 improved OD600 normalized sfGFP signal up to ~10 fold. d) In BL21 cells expressing the MjBpaRS/tRNA_CUA^Tyr and PopZ-(UAG)2-sfGFP, Api137 or Api1b improved sfGFP signal up to ~13 or ~14 fold, respectively. e) In DH10B expressing the MbAbKRS/tRNA_CUA^Pyl and PopZ-(UAG)2-sfGFP, Api137 improved sfGFP signal up to ~3 fold. f) In DH10B expressing the MjBpaRS/tRNA_CUA^Tyr and PopZ-(UAG)2-sfGFP, Api137 improved sfGFP signal up to ~2 fold. g) In DH10B expressing the Sc5OHWRS/tRNA_CUA^Trp and PopZ-(UAG)2-sfGFP, Api137 or Api1b improved sfGFP signal up to ~16 or ~3 fold, respectively. ****, P <0.0001; ***, P ≤ 0.005; **, P ≤ 0.0076.

In addition, into a pBAD plasmid, we cloned sfGFP reporters with 2, 6, or 8 UAG in-frame codons (in addition to a 0 UAG

control) as a C-terminal fusion of the arabinose-inducible Caulobacter PopZ that forms polar, sub-

cellular foci in E. coli153. This new plasmid system allows late-exponential auto-induction of

reporters in glycerol minimal media (GMML) supplemented with glucose and arabinose154,155. The

auto-induction in GMML increased the reproducibility of nsAA incorporation experiments and

reduced the background sfGFP signal in the absence of added nsAAs, a non-specific signal that is

known to be exacerbated by the excess natural amino acids in rich media127. For example, BL21

cells expressing Sc5OHWRS/tRNA_CUA^Trp in GMML had less ‘no nsAA’ signal than when the same experiment was performed in rich medium (Figure 21a–b). In addition, apidaecins improved

5OHW–dependent sfGFP signal only ~1.3 fold in LB and up to ~5 fold in GMML (Figure 21a–b). One possible explanation for the improved expression observed in this system is that growth in minimal

media can increase uptake of peptides like apidaecins and/or of nsAAs127,156. Added to late-

exponential cells, apidaecins improved nsAA-dependent reporter expression in both BL21 and

DH10B indicative of increased nsAA incorporation: In BL21 cells with the PopZ-(UAG)2-sfGFP

reporter and expressing MbAbKRS, Sc5OHWRS, or MjBpaRS systems, Api137 or Api1b improved

nsAA–dependent sfGFP increase 4-14 fold, depending on the AARS/nsAA used (Figure 21b–d). In

DH10B cells this improvement varied between 2 to 16-fold (Figure 21e–g).

3.3.5 Apidaecins improve specific nsAA incorporation

To directly and quantitatively link the nsAA-dependent sfGFP signal increase in the presence of apidaecins to increased nsAA incorporation, we utilized the MjCouRS/tRNA_CUA^Tyr system encoding the fluorescent nsAA, Cou. In a bulk plate reader assay with DH10B, Api137 treatment improved both Cou and Cou–dependent PopZ-(UAG)2-sfGFP signals (Figure 22). These results directly supported that apidaecins increase nsAA-dependent sfGFP signal because they promote nsAA incorporation.

Figure 22 Apidaecins improve specific, multi-site incorporation of Cou. a) Bulk measurements of DH10B expressing MjCouRS/tRNA_CUA^Tyr and PopZ-(UAG)2-sfGFP show that Api137 treatment increases spectrally distinct sfGFP and Cou signals to a comparable extent. b) Micrographs showing subcellular signals from Cou (false colored in red) and PopZ-(UAG)6-sfGFP fusion (false colored in green) colocalized at the poles of the DH10B E. coli cells imaged in phase, DAPI, and EGFP channels and overlaid on phase (false colored in blue). c) Violin plots of single cell quantification by light microscopy showed that Api137 improved both Cou and sfGFP signals comparably. An exception was DH10B cells expressing PopZ-(UAG)6-sfGFP treated with the highest concentration of Api137 tested (100 µg mL-1). Under these conditions, the Cou signal improvement was ~2 fold, but the sfGFP signal improvement was ~5 fold. The scale bars are 2 µm. ****, P <0.0001; ***, P ≤ 0.004; **, P = 0.0035.

For a given aaRS/tRNACUA system, C321.ΔRF1 cells represent the current limit of high nsAA incorporation efficiencies7,11. In order to estimate the extent to which apidaecins can promote multi-site Cou incorporation, we next compared the case with C321.ΔRF1 to its RF1+ and UAG+ parent, MG1655. Similar to previous estimations11, in C321.ΔRF1, Cou–dependent PopZ-(UAG)6-sfGFP expression was around 12% of PopZ-(UAG)0-sfGFP and was minimally affected by Api137

(Figure 23a). Under the same conditions, in MG1655, Api137 improved the expression levels of the

PopZ-(UAG)6-sfGFP from ~1.8% to ~4.4% of the PopZ-(UAG)0-sfGFP, a ~2.4-fold increase (Figure

23a). These results suggest that apidaecins can remarkably promote multi-site nsAA incorporation in unrecoded strains retaining RF1.
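The fold increase quoted above follows directly from the two reported percentages. A minimal sketch in Python, using only the values stated in the text (~1.8%, ~4.4% and ~12% of the PopZ-(UAG)0-sfGFP control); the function and variable names are illustrative assumptions, and the comparison to the C321.ΔRF1 level is an added way of reading the same numbers rather than a result reported here.

```python
def fold_improvement(after: float, before: float) -> float:
    """Ratio of relative expression after treatment to that before treatment."""
    if before <= 0:
        raise ValueError("before must be positive")
    return after / before


# Relative expression of PopZ-(UAG)6-sfGFP, as a percentage of PopZ-(UAG)0-sfGFP.
mg1655_untreated = 1.8   # MG1655 without Api137
mg1655_api137 = 4.4      # MG1655 with Api137
c321_delta_rf1 = 12.0    # C321.ΔRF1, taken here as the practical ceiling

fold = fold_improvement(mg1655_api137, mg1655_untreated)
fraction_of_ceiling = mg1655_api137 / c321_delta_rf1

print(f"Api137 improvement in MG1655: {fold:.1f}-fold")
print(f"Fraction of the C321.ΔRF1 level reached: {fraction_of_ceiling:.2f}")
```

Read this way, Api137 treatment brings the unrecoded strain to roughly a third of the recoded, RF1-deleted level for a six-site reporter.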

In order to test if apidaecins cause significant non-specific signal, we devised a quantitative, single cell fluorescence microscopy approach that is sensitive enough to resolve sub-cellular nsAA incorporation. As opposed to diffuse localization of typical GFP constructs157, PopZ fusion recruits the nsAA incorporated sfGFP specifically to cell poles, as demonstrated here by the colocalization of spectrally distinct signals from fluorescent nsAA, Cou and sfGFP in E. coli MG1655 expressing PopZ-(UAG)6-sfGFP and MjCouRS/tRNA_CUA^Tyr (Figure 23b). Two lines of evidence distinctly supported that Api137 did not cause significant non-specific signal. First, Api137 did not change the Cou signal ratio of the ‘fluorescent poles’ to the rest of the cell body in either C321.ΔRF1 or MG1655 cells expressing PopZ-(UAG)6-sfGFP (Figure 23c). Because C321.ΔRF1 represents the case without any

UAG containing native genes, the lack of change in this ratio also in the UAG+ parent suggests that apidaecins do not cause significant non-specific Cou incorporation into genomic amber codons in wild-type E. coli.3 This is consistent with previous observations about the stop codon context effect preventing nsAA incorporation into native amber stop codons130.

Second, single cell quantification by light microscopy in DH10B cells expressing the

MjCouRS/tRNA_CUA^Tyr and PopZ-(UAG)2-sfGFP or PopZ-(UAG)6-sfGFP, showed that Api137 treatment improved both Cou and sfGFP signals comparably (~2 fold) without affecting the colocalization of Cou or sfGFP signals (Figure 23d–e and Figure 22b–c). These results suggest that apidaecins do not cause significant undesired amber-suppression in a target protein, e.g. by promoting non-specific incorporation of natural amino acids into PopZ-(UAG)2-sfGFP reporter. Altogether, these results indicate that apidaecins improve multi-site nsAA incorporation and cause minimal non-specific incorporation events.


Figure 23 Apidaecins improve specific multi-site incorporation of a fluorescent nsAA a) Exogenously added Api137 doubles multi-Cou incorporation into a PopZ-(UAG)6-sfGFP reporter in MG1655 cells, but not in C321.ΔRF1. b) Signals from the PopZ-(UAG)6-sfGFP fusion (false colored in green) and Cou (false colored in red) colocalized at the poles of MG1655 E. coli cells imaged in phase, DAPI, and EGFP channels. c) Exogenously added Api137 does not change the ratio of Cou signal at the poles to the rest of the cell in either MG1655 or C321.ΔRF1. d) Micrographs showing subcellular signals from Cou (false colored in red) and PopZ-(UAG)2-sfGFP fusions (false colored in green) colocalized at the poles of DH10B E. coli cells imaged in phase, DAPI, and EGFP channels and overlaid on phase (false colored in blue). e) Violin plots of single-cell quantification by light microscopy show that Api137 improved both Cou and sfGFP signals ~2 fold. The scale bars are 2 µm. ****, P <0.0001; **, P = 0.0059.

3.3.6 In-cell expression of Api1b dramatically improves nsAA incorporation

Encouraged by the moderate toxicity of exogenously added apidaecins on late-exponential cells, we hypothesized that cells would tolerate auto-induction of a gene for Api1b (Table 3) under

arabinose control. To test this, we cloned api1b into the plasmid that constitutively expresses the

MbAbKRS/tRNA^Pyl_CUA system. Optimization of Api1b expression under different ribosome binding sequences (RBS) led to two such sequences, ParaB-RBS1-api1b and ParaB-RBS2-api1b, which

improved BocK incorporation in BL21 cells up to ~37 fold compared to no peptide condition despite

a significant decrease of the final cell density (Figure 24a). Growth curves confirmed that the late-

exponential autoinduction of these peptides mainly reduced the final cell mass (Figure 24b).

Nevertheless, in-cell autoinduction of ParaB-RBS2-api1b in BL21 improved expression levels of the

PopZ-(UAG)6-sfGFP and PopZ-(UAG)8-sfGFP from undetectable levels (<0.02% even with

exogenously added Api137) to ~6% and to ~2% of PopZ-(UAG)0-sfGFP (Figure 25a). These results

also suggest that the modest improvements by exogenously added Api1b might be due to its

limited cellular uptake (Figure 20a). Consistently, exogenously added Api1b inhibited the growth of

the api1b expressing cells more severely than the control cells (Figure 24b). The dramatic

improvement of nsAA incorporation by api1b expression was strain-specific: in DH10B in-cell

autoinduction of api1b merely doubled the expression levels of the PopZ-(UAG)6-sfGFP (from

~0.05% to only ~0.1% of PopZ-(UAG)0-sfGFP, Figure 24c). Growth curves confirmed that in-cell

autoinduction of api1b did not affect the growth in this strain, suggesting that DH10B (and its RF1)

is resistant to Api1b at these expression levels (Figure 24d).

52

Figure 24 Tweaking expression level and autoinduction improves nsAA incorporation a) Three ribosome binding sequences of different strengths (RBS2>RBS1>RBS3) were designed using the RBS calculator tool (https://salislab.net/software/)12. Arabinose-operated autoinduction of api1b improved BocK incorporation up to ~37 fold while showing significant toxicity depending on the gene’s RBS strength. b) Growth curves (left), generation times (middle) and final OD (right) of BL21 cells auto-inducibly expressing MbAbKRS and ParaB-RBS1-api1b or ParaB-RBS2-api1b. Exogenously added Api1b further sensitized api1b-expressing cells but did not affect the growth of MbAbKRS control cells. 1.25 μg mL-1 Api1b reduced the growth rate of MbAbKRS control cells by only 2%, but it reduced the growth rate of api1b-expressing cells by ~30%. Growth of these api1b-expressing cells was completely inhibited at the higher Api1b concentrations tested (>10 μg mL-1). c) In-cell autoinduction of api1b genes had a dramatically smaller effect on nsAA incorporation in DH10B compared to BL21. d) ParaB-RBS1-api1b or ParaB-RBS2-api1b expression did not inhibit the growth of DH10B. The RBS sequences are RBS1: GGAGGTAAAAA, RBS2: GGAGTTAAGGAGGTAAAAA, and RBS3: GGAGGTAAAAAATGCCCGTTTTTAAGGAGGTAAAAA. ****, P <0.0001; **, P ≤ 0.007.

Table 3 Amino acid residues of the api1b gene, of the apidaecin-like peptide library, and of enriched variants that are pursued further in this work. Divergence from the original Api1b sequence is highlighted. Each construct starts with an additional initiating N-terminal formyl-methionine that is not present in natural apidaecins. The library also contains a degenerate termination codon (TRR) sampling each of the 3 stop codons and Trp.

Position  api1b  Library                      apiB5  apiB8  apiB10  apiC3
1         M      Fixed                        M      M      M       M
2         G      A,E,G,K,R,T                  K      E      A       A
3         N      Fixed                        N      N      N       N
4         N      Fixed                        N      N      N       N
5         R      A,G,R,T                      A      A      T       A
6         P      Fixed                        P      P      P       P
7         V      A,I,T,V                      I      I      V       V
8         Y      Fixed                        Y      Y      Y       Y
9         I      I,V                          V      V      V       V
10        P      P,S                          P      S      S       P
11        Q      A,E,G,K,P,Q,R,T              Q      G      Q       K
12        P      Fixed                        P      P      P       P
13        R      Fixed                        R      R      R       R
14        P      Fixed                        P      P      P       P
15        P      Fixed                        P      P      P       P
16        H      Fixed                        H      H      H       H
17        P      Fixed                        P      P      P       P
18        R      K,R                          K      R      K       R
19        L      I,L                          L      I      I       I
STOP      UAG    UAG, UAA, UGA, UGG (W)       UAA    UGA    UAA     UGG
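The degenerate termination codon TRR mentioned in the Table 3 legend can be expanded with the standard IUPAC nucleotide codes (R = A or G). The sketch below is illustrative only; the helper name `expand` is an assumption and not something used in the original work.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Meaning of the four codons reachable from TRR in the standard genetic code.
TRR_MEANING = {"TAA": "stop", "TAG": "stop", "TGA": "stop", "TGG": "Trp"}


def expand(degenerate_codon: str) -> list[str]:
    """List every concrete codon encoded by a degenerate codon."""
    choices = [IUPAC[base] for base in degenerate_codon.upper()]
    return ["".join(combo) for combo in product(*choices)]


trr_codons = expand("TRR")
for codon in trr_codons:
    print(codon, TRR_MEANING[codon])
```

The same helper applies to other degenerate codons, for example `len(expand("NNS"))` gives the number of codons sampled by an NNS position.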


Figure 25 In-cell autoinduction of apidaecin-like antimicrobial peptides improves nsAA incorporation in different E. coli strains a) In-cell autoinduction of an api1b gene in BL21 dramatically improved multi-BocK incorporation into 6- and 8-UAG containing reporters despite significantly reducing the final cell density. b) Selection with exogenously added apidaecins yielded ‘partially recoded’, Api137-resistant DH10B cells that incorporated BocK more efficiently than the wild-type, even absent Api137. c-d) In-cell autoinduction of enriched apidaecin-like variants in DH10B dramatically improved multi-BocK incorporation into (c) 2-UAG or (d) 6-UAG containing reporters while being non-toxic. ****, P <0.0001; **, P = 0.004.

3.3.7 Partial recoding promotes apidaecin tolerance and nsAA incorporation.

In order to improve apidaecin-dependent nsAA incorporation in other strains and reduce

their toxicity, we pursued two distinct approaches: First, we explored if we could utilize RF1

inhibition by apidaecins as a selection for multiplexed recoding of essential UAG+ genes in different

E. coli wild-type strains. E. coli DH10B and MG1655 are both known to have the same 13 essential genes ending with UAG (Table 4)7,158. We were able to recover

resistant transformants on Api137-containing selection plates by only performing 3 cycles of recombineering159 with a mixture of 13 oligos which would recode the stop codons of these genes

(UAG to UAA) (Figure 26a). Our attempts to delete RF1 in multiple isolates were unsuccessful

suggesting that RF1 was still essential in these strains. Nevertheless, growth curves with two

selected isolates confirmed that each of the partially recoded isolates was significantly more

resistant to Api137 than their parents (Figure 26b). Mismatch amplification mutation assay

polymerase chain reaction7 revealed that atpE and coaD for the DH10B and atpE and lolA for the

MG1655 isolate were recoded. Moreover, these Api137 resistant, ‘partially recoded’ isolates

showed higher BocK-dependent (UAG)2-sfGFP signals even in the absence of apidaecins, which was

further improved in the presence of Api137 in DH10B (Figure 25b) and in MG1655 (Figure 26c).

Using apidaecins as selection for rapid recoding can be broadly applicable to strains that are

apidaecin sensitive, such as A. tumefaciens.
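At the sequence level, the recoding step described above replaces a terminal UAG (TAG in DNA) with UAA (TAA). The following sketch shows only that substitution on toy sequences; it does not model recombineering, and the gene names and sequences are hypothetical placeholders rather than the real essential genes.

```python
def recode_amber_stop(cds: str) -> str:
    """Return the CDS with a terminal TAG stop replaced by TAA; other CDSs are unchanged."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    if cds.endswith("TAG"):
        return cds[:-3] + "TAA"
    return cds


# Hypothetical toy coding sequences (not real E. coli genes).
toy_genes = {"geneA": "ATGAAACCCTAG", "geneB": "ATGGGTTTTTAA"}
recoded = {name: recode_amber_stop(seq) for name, seq in toy_genes.items()}

for name, seq in recoded.items():
    print(name, toy_genes[name][-3:], "->", seq[-3:])
```

Only the stop codon changes, so the encoded protein is unaffected while termination no longer depends on RF1 recognition of UAG.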

Table 4 Sequences of key oligonucleotides from chapter 2

Name              Sequence
Pri1              GTAATACGACTCACTATAGGGTTAACTTTAAGAAGGAGATATACATATGCAGATTTTTGTG
Pri2              TTAGTGGTGGTGGTGGTGGTG
Ultra1            GGTGGATAAAAATCTCTGCTTGAGGCCAATGCTTNNSCCGACTCTGDNSAACTATVNSCGAAAACTCGATAGGATTTTACCAGGCCCAATAAAAATTTTCGAAGTCGGACCTTGTTACCGGAAAGAGTCTGACGGCAAAGAGCACCTGGAAGAATTTACTATGGTGRVCTTCDSCCAGDBSGGTTCGGGATGTACTCGGG
Pri3              CGGAAGATCTGTTTTGAAAGTTC
Pri4              AAAATCTTGAAGCTCTCATCAAAG
Pri5              GATACTGAACTTTCAAAACAGATCTTCCGGGTGGATAAAAATCTCTGCTTG
Pri6              CCAGAAACTCTTTGATGAGAGCTTCAAGATTTTCCCGAGTACATCCCG
MAGE-Neg control  T*C*ATGTTGCTTCATGTGATCTGGATAGCGGGAAAAACATTGTACATACAGAGTAGTTACGAGAGTTGGCCATGGTACTGGGAGCTTGCCA
MAGE_murF_TAA     g*t*taaagccggaataatatttgaccaaatgttcggccagccaaaTtaacatgtcccattctcctgtaaagcgcgtactacctcttcca*t*g
MAGE_pgpA_TAA     g*t*cgtaggatttaaataagagtccaggcctgatgagacgtgacaagcgtcacatcaggcatcggtgcacaattacgacagaataccca*g*c
MAGE_sucB_TAA     g*a*taatgccttatccggtctacagtgcaggtgaaacttaaactattacacgtccagcagcagacgcgtcggatcttccagcaactctt*t*g
MAGE_lolA_TAA     a*a*agtattatccgaaaaatcgagcgacagattgctcactcaggtgcctttacttacgttgatcatctaccgtgacgccttgcggcggg*g*t
MAGE_lpxK_TAA     a*g*cgacattcatgactccatcaatcgaacgctgccgcggcgtaattagttgccagaagccagcaaggttagttgcgtaagcagtttcg*c*t
MAGE_fabH_TAA     c*c*agggaacacaaatgcaaattgcgtcatgttttaatccttatcttagaaacgaaccagcgcggagccccaggtgaatccaccgccaa*a*g
MAGE_hemA_TAA     c*a*taggcgtaaatgcaccctgtaaaaaaagaaaatgatgtactgttactccagcccgaggctgtcgcgcagaatattcaggcgttcgt*t*a
MAGE_fliN_TAA     g*a*tatcattactccgtctgagcgaatgcgccgcctgagccgttaatgatgaataaccacgctactgtgcaatcttccgcgccggtttc*t*g
MAGE_hda_TAA      a*g*ttcggataaggcgttcgcgccgcatccgacaataaacaccttatttacaacttcagaatttctttcacaaacggaatggtcagctt*a*c
MAGE_mreC_TAA     a*g*caacagcgcaatgaggaaagagagccagattacccagcgtccctggctacgatagctcgccattattgccctcccggcgcacgcgc*a*g
MAGE_coaD_TAA     a*t*cagggcgatgtcacccatttcctgccggagaatgtccatcaggcgctgatggcgaagttagcgtaacgtttatgccggatggtatg*c*c
MAGE_atpE_TAA     t*a*gttaacgttctgatattgctctttaaataaaagcaacgcttattacgcgacagcgaacatcacgtacagacccagacctacagcga*t*c
MAGE_yafF_TAA     t*t*taggttgtggtgagtgggggttgtgtttaaggacggggagagtcggggtattattacgaaagcccgctccccgcaaggactgacgc*c*a


Figure 26 RF1 inhibition by apidaecins can facilitate recoding efforts toward improved nsAA incorporation a) Cells recoded by a mixture of oligos specifically targeting essential genes formed colonies on Api137-containing selective plates. The same experimental set-up with a random oligo did not yield spontaneously resistant mutants. b) Growth curves (left) and generation times (right) of ‘partially recoded’ DH10B and MG1655 in the presence of different [Api137] show that partially recoded strains are more apidaecin-resistant than the wild-type parents. c) Partially recoded MG1655 cells incorporated BocK more efficiently than the wild-type even in the absence of Api137. ****, P <0.0001; **, P ≤ 0.0082; *, P ≤ 0.0366.

3.3.8 In-cell expression of new apidaecin variants improves nsAA incorporation

Second, in search of more potent novel apidaecin isoforms, we designed a focused peptide library based on natural and synthetic apidaecin-type, proline-rich antibacterial peptide sequences.

This library covered isoforms like Api2, Api3 and other related antibacterial peptides first isolated

from bumblebee (Bb+A), cicada killer (CkA) or bald-faced hornet (Ho+)139,140,160. In addition to the

termination codon, we varied 8 positions with residues of known diversity, resulting in a library of

~5.5 x 10^4 theoretical diversity (Table 3). We assembled this library (with at least 10X coverage)

using the plasmid expressing ParaB-RBS1-api1b and the MbAbKRS/tRNA^Pyl_CUA system as the template. We validated the diversity and quality of this library by randomly sequencing 50 library variants.
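The theoretical diversity of a combinatorial library is the product of the number of options at each varied position. The sketch below counts the amino acid and termination options as listed in Table 3; this peptide-level count comes out at the same order of magnitude as the figure quoted in the text, and the exact number depends on whether options are counted at the codon or the amino acid level, which the text does not specify. The variable and function names are illustrative assumptions.

```python
from math import prod

# Options per varied position, transcribed from Table 3 (amino acid level).
LIBRARY_OPTIONS = {
    2: "AEGKRT",
    5: "AGRT",
    7: "AITV",
    9: "IV",
    10: "PS",
    11: "AEGKPQRT",
    18: "KR",
    19: "IL",
}
# Degenerate TRR termination codon: three stop codons plus Trp.
TERMINATION_OPTIONS = ["UAG", "UAA", "UGA", "UGG"]

peptide_level_diversity = (
    prod(len(options) for options in LIBRARY_OPTIONS.values())
    * len(TERMINATION_OPTIONS)
)
print(f"Peptide-level diversity from Table 3: {peptide_level_diversity:,}")


def clones_for_coverage(diversity: float, fold_coverage: float) -> float:
    """Minimum number of independent clones for a given fold coverage."""
    return diversity * fold_coverage


# Using the diversity quoted in the text (~5.5 x 10^4) and 10X coverage.
print(f"Clones needed for 10X coverage: {clones_for_coverage(5.5e4, 10):,.0f}")
```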

Next, we grew the cells expressing the pooled library in GMML autoinduction media in the

presence of BocK and we sorted DH10B cells for (UAG)2-sfGFP signal via fluorescence-activated cell

sorting (FACS). We plated the most fluorescent ~0.0005% (stringent sort) and ~0.02% (loose sort) of

cells on selective plates. Variants sequenced from these plates did not converge on common

sequences (Figure 27a–b). The 31 most enriched variants were purified, retransformed, and

further characterized in subsequent plate-reader assays. Of these, none of the 12 variants from the

loose sort increased BocK+ / no nsAA signal. However, 20 out of the 21 variants from the stringent

sort significantly increased BocK+ / no nsAA signal with varying levels of final cell densities (Figure

27c–d). Based on their reduced toxicity, but increased BocK+ signals, 4 optimal variants, apiB5,

apiB8, apiB10, and apiC3 were selected and further pursued (Table 3). Despite their sequence

diversity (Figure 27e–f) each of these variants dramatically improved BocK incorporation in DH10B

cells compared to api1b (Figure 24c and Figure 25c–d). In-cell autoinduction of these novel

peptides improved expression levels of the PopZ-(UAG)2-sfGFP and PopZ-(UAG)6-sfGFP from ~0.9%

and ~0.1% up to ~38.9% and ~9.1% of PopZ-(UAG)0-sfGFP (Figure 25c-d). This represented a ~43-91

fold increase of the BocK+ signal without the peptide and an increase up to ~645 fold of the true

nsAA signal over the background incorporation, i.e. no nsAA condition (Figure 25c–d). Growth

curves suggested that the late-exponential autoinduction of these peptides did not slow the

apparent growth rate, but rather significantly reduced the final cell mass (Figure 27g). Given the conservation of RF1 across bacteria, this approach may be utilized to raise new apidaecin-like peptides in bacterial species that are resistant to the natural apidaecins.


Figure 27 In-cell autoinduction allows evolution of new apidaecin-like peptides that show improved nsAA incorporation and decreased cell toxicity a–b) Enriched apidaecin-like peptide variants from the loose (a) or stringent (b) sorts do not converge to common sequences. c-d) In-cell autoinduction of variants from the stringent sort improved BocK incorporation but also affected cell growth to different extents in DH10B cells. e–f) A neighbor-joining phylogenetic tree without distance corrections (e), and multiple sequence alignment of relevant apidaecin variants (f) generated using the Clustal Omega Multiple Sequence Alignment tool13. g) Growth curves (left), generation times (middle) and final OD (right) of DH10B cells auto-inducibly expressing MbAbKRS and apiB5, apiB8, apiB10, or apiC3. ****, P <0.0001; *, P = 0.0113.

3.4 Conclusion

The transient and tunable inhibition of RF1 activity by apidaecin-like peptides

could be used to improve nsAA based genetic code expansion applications and recoding projects in

a variety of wild-type bacterial species. For example, by promoting multi-site incorporation of small fluorescent nsAAs, these peptides can rapidly amplify the brightness of a target protein and may

find use in sensitive applications where large fluorescent protein fusions cannot be used161. Further

characterization of new apidaecin-like peptides is ongoing.

Chapter 4: Broad and Efficient Genetic Code Expansion in Bacillus subtilis

This chapter was a manuscript in preparation for submission with the following authors:

Devon Stork, Georgia Squyres, Erkin Kuru, Kasia Gromek, Jonathan Rittichier, Aditya Jog, Briana

Burton, Aditya Kunjapur, Ethan Garner, and George M. Church

4.1 Abstract

Genetic code expansion allows chemical diversification of proteins, facilitating many experimental and practical approaches. These tools have been widely developed and applied in E. coli and mammalian systems, but rarely in gram-positive systems. Here we demonstrate broad and efficient genetic code expansion in the primary gram-positive model bacterium Bacillus subtilis by incorporating 20 nonstandard amino acids using 3 different genetic code expansion systems and both the UAG and UAGA codon. The current limitations include incorporation into sites in the B. subtilis genome and conditional uptake of nonstandard amino acids. However, we were able to use genomic genetic code expansion constructs for translational titration and photocrosslinking assays in B. subtilis. The broad functionality of this approach demonstrates an effective roadmap for genetic code expansion in B. subtilis and other bacteria.

4.2 Introduction

Genetic code expansion through site-specific incorporation of nonstandard amino acids

(nsAAs) widens the chemistry of biology outside of the standard amino acids, allowing use of diverse chemical functional groups. Across biology, nsAA incorporation enables new experimental approaches involving probing, labelling and controlling proteins in a minimally disruptive manner89,162. More than 100 different nsAAs have been demonstrated and are primarily used in E.

coli and mammalian cell culture. Several different families of nsAAs exist, including tyrosine, pyrrolysine, serine, leucine and tryptophan-derived nsAAs, which are incorporated by engineered variants of the corresponding aminoacyl tRNA synthetase (AARS) and cognate tRNA24. However, these powerful tools have been difficult to effectively and broadly utilize in bacterial systems outside of E. coli, especially in gram-positive organisms for basic science and industrial applications83,84.

One such organism is B. subtilis, a gram-positive soil bacterium often found in the plant rhizosphere163,164 and with tremendous utility for basic and applied research165. In fundamental research, B. subtilis is a model for the study of endospore formation166, asymmetric cell division167, biofilm formation168, and multicellular behavior169. In applied research, B. subtilis generates high interest given its ability to serve as a probiotic for both plants and animals, thus spanning agricultural, nutritional, and medical applications. It is an endophytic bacterium that resides within plant tissues and often stimulates plant growth170,171. B. subtilis spores also have GRAS (Generally Recognized As Safe) status and tolerate the gastrointestinal environment well, with the ability to germinate, proliferate, and resporulate within the gut172. As a result, B. subtilis has a history of being consumed as a component of nutritional or medical supplements internationally in aquaculture, veterinary use, and human use173,174. The ability for safe consumption and high tolerance of the GI tract enables more creative engineering uses, such as demonstration of B. subtilis as an oral vaccine delivery vehicle175.

Here we show broad and efficient nsAA incorporation in B. subtilis using 3 families of stable, genomically integrated AARS constructs to efficiently incorporate 20 different nsAAs with functions ranging from bioorthogonal tagging or photocrosslinking to fluorescence. We demonstrate that unlike in non-recoded E. coli7,130,176, nsAAs incorporate efficiently into amber stop codons in native B. subtilis genes, and we demonstrate quadruplet codon incorporation as an attempt to address this issue. Additionally, nsAAs can be used to modulate translation rates, in this case allowing tight titration of filament cappers in vivo to demonstrate the effects of partially inhibiting cell division machinery. Finally, specific photocrosslinking experiments have demonstrated binding interactions of secreted gram-positive virulence factors. These results and our deposited strains will facilitate the use of nsAAs for general use in B. subtilis across research and industrial applications.

The success of our framework for general nsAA incorporation with different synthetases

suggests that any synthetase system functional in bacteria could be adapted for use in B. subtilis

with minimal optimization. Our results showing that import into the cell is limiting for large,

nonpolar nsAAs suggest paths for future general optimizations. Combined with B. subtilis’s status

as an industrial protein producer177,178, future research may see B. subtilis nsAA incorporation

become the default for production of chemically modified proteins179.

4.3 Results

4.3.1 Synthetase activity

To enable genetic code expansion in B. subtilis, we began by integrating at the LacA locus a codon optimized Methanococcus jannaschii-tyrosine synthetase (MjTyrRS) and corresponding tRNA with a panel of constitutive promoters driving AARS and tRNA expression. Incorporation activity was read out with a separate cassette integrated at the AmyE locus, expressing the mNeongreen fluorescent protein with an IPTG-inducible promoter and containing a TAG stop codon at position 2 (Figure 28a). Initial experiments determined that a pVeg/pSer AARS/tRNA promoter combination yielded incorporation (Figure 29a). Follow-up experiments found a high background signal due to a secondary start codon at Methionine10 of mNeongreen, driven by a ribosomal binding site contained in residues 4-8 (Figure 29b-c). These findings may drive reinterpretation of subcellular microscopy analysis using C-terminal or sandwich fusion mNeongreen tags180–182, as the mNeongreen sequence commonly used is capable of initiating translation independently. An M10S mutation suppressed the background and reported a 30-50 fold increase in mNeongreen fluorescence upon addition of the nsAA (Figure 29b). All subsequent usage of mNeongreen used the M10S variant.

[Figure 28 panels: A) scheme showing AARS, tRNA, nsAA, mRNA with a UAG codon, and genome. B) structures of p-azidophenylalanine (pAzF), biphenylalanine (bipA), benzoylphenylalanine (BpA), coumarin-nsAA (couAA), boc-lysine (boc-K) and 5-hydroxytryptophan (5OHW). C) E. coli nsAA incorporation and D) B. subtilis nsAA incorporation, bar charts of % of 0 UAG expression (0-100) with + nsAA and - nsAA bars.]

Figure 28 Genetic code expansion via nsAA incorporation in B. subtilis A) Scheme of nsAA incorporation, with genomically integrated AARS and tRNA constructs incorporating externally provided nsAA into a genomically expressed gene containing an in-frame TAG amber stop codon. B) Chemical structure of 6 nsAAs of primary interest to this study. C&D) nsAA incorporation in C) E. coli and D) B. subtilis. nsAA used is indicated above each bar, synthetase below each bar. In all cases, signal is normalized to an identical reporter containing a TAC Tyr codon in place of a TAG amber codon. Average of triplicates shown with standard deviation as error bar. C) nsAA incorporation in C321 recoded E. coli from plasmid-based AARS & tRNA and a genomic sfGFP reporter containing a single TAG codon in an N-terminal linker. D) nsAA incorporation in B. subtilis from genomically integrated AARS & tRNA and a genomic mNeongreen reporter containing a single TAG codon at position 2, immediately following the start codon.
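A secondary start site like the Met10 codon described above can be flagged by scanning a coding sequence for in-frame ATG codons downstream of the annotated start. The sketch below uses a made-up toy sequence, not the real mNeongreen reporter, and the function name and the `max_codon` cutoff are assumptions for illustration. It does not assess ribosome binding site strength, which the text evaluated separately.

```python
def internal_start_codons(cds: str, max_codon: int = 30) -> list[int]:
    """1-based codon positions of in-frame ATG codons after the annotated start."""
    cds = cds.upper()
    positions = []
    for codon_index in range(1, min(max_codon, len(cds) // 3)):
        if cds[3 * codon_index: 3 * codon_index + 3] == "ATG":
            positions.append(codon_index + 1)
    return positions


# Hypothetical toy reporter: start codon, TAG at position 2, an ATG at codon 10.
toy_reporter = "ATG" + "TAG" + "GTT" * 7 + "ATG" + "GCT" * 5
print(internal_start_codons(toy_reporter))

# Replacing codon 10 with a serine codon (an M10S-style change) removes the hit.
toy_m10s = toy_reporter[:27] + "TCT" + toy_reporter[30:]
print(internal_start_codons(toy_m10s))
```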

To increase the number of nsAAs available in B. subtilis, we used the same promoter architecture and built multiple AARS cassettes, including additional MjTyrRSs as well as the

Saccharomyces cerevisiae tryptophan synthetase (ScWRS)60, and both Methanosarcina barkeri pyrrolysine synthetase (MbPylRS)129 and Methanomethylophilus alvus pyrrolysine synthetase

(MaPylRS)54,55. All demonstrated activity in B. subtilis with the exception of the MbPylRS, which is known to be poorly soluble in bacteria due to the N-terminal domain that the homologous MaPylRS lacks53,131 (Figure 29d). The incorporation activity of these diverse synthetases on their corresponding nsAAs reflected activity in E. coli24 (Figure 28b-d). We utilized the promiscuity of the

MjTyrRSs for multiple nsAAs to incorporate many more nsAAs, bringing the total incorporated in B. subtilis up to 20 (Figure 29e-f). These nsAAs cover most of the applications of genetic code expansion and suggest that nearly any function of genetic code expansion in E. coli can be moved into B. subtilis.

Figure 29 Extended nsAA incorporation in B. subtilis For all bar graphs, average of three replicates shown with standard deviation as error bar. A) Assay of 5 AARS/tRNA promoter combinations, with identity of the promoter indicated below the bars. Reported by an inducible mNeongreen containing a TAG codon in an N-terminal linker. Normalized to maximum fluorescence from the experiment. B) Assay of two mNeongreen reporters with and without associated synthetase. The reported TAG or TAC codon is at position 2, after the start methionine. C) Schematic of first 17 residues of TAG-mNeongreen reporter, with the M10 capable of secondary translational start indicated. Secondary RBS strength was calculated as approximately 1/3 the strength of the pHyperspank RBS with the Salis Lab Ribosome Binding Site calculator (https://www.denovodna.com/software/). D) Comparison of MaPylRS and MbPylRS activity with the TAG-mNeongreen reporter, normalized to TAC-mNeongreen. E) Assay of 3 different synthetases using a sensitive IPTG-inducible TAG-nanoluciferase reporter capable of reporting over 5 orders of magnitude. F) Structures and names of additional nsAAs incorporated in G) by promiscuous napARS and bipARS synthetases. 100 µM of nsAA used in all cases.

Further exploration of general nsAA incorporation in B. subtilis with the extremely sensitive nanoluciferase reporter183 showed subtleties of incorporation efficiency and promiscuous background incorporation. The MjTyrRSs are known to promiscuously incorporate native amino acids in the absence of the target nsAAs127, and the presence of napARS increased TAG-luciferase expression 22-fold even in the absence of nsAA. The addition of nsAA increased the expression another 25-fold, for a total of 556-fold increase over the reporter alone. Both the ScWRS and MaPylRS showed minimal background incorporation, with synthetase expression causing 1.3- and 3.6-fold increases over the reporter alone in the absence of nsAA, respectively. Addition of the corresponding nsAAs increased expression 11- and 295-fold over the reporter alone, respectively (Figure 29e).
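The napARS numbers above combine two multiplicative steps: a background increase from the synthetase alone, then a further nsAA-dependent increase. A minimal sketch of that arithmetic follows, using the rounded fold values from the text; the reported 556-fold total presumably reflects unrounded measurements, so the product of rounded factors is only expected to agree approximately. The function name is an illustrative assumption.

```python
from math import log10


def total_fold(background_fold: float, nsaa_fold: float) -> float:
    """Overall fold increase over the reporter alone from two sequential fold changes."""
    return background_fold * nsaa_fold


# Rounded values from the text for napARS with the TAG-nanoluciferase reporter.
napars_total = total_fold(22, 25)
print(f"napARS total: ~{napars_total:.0f}-fold "
      f"(~{log10(napars_total):.1f} orders of magnitude)")
```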

To confirm nsAA incorporation, we expressed FLAG-tagged mNeongreen with a UAG incorporated into an elastin-like peptide optimized for mass-spectrometry detection of nsAAs184.

Purification and analysis of peptides demonstrated incorporation of all the tyrosine-based nsAAs shown in Figure 28b (Figure 30). Lysine, rather than boc-K, was detected at the indicated position. This is likely due to the deprotection of boc during the chromatographic step of peptide identification185. Additionally, while no 5OHW was detected, this is likely due to the inability to purify sufficient protein due to low expression levels (Figure 29G).


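[Figure 30 panels: A) gel with lanes labeled by synthetase (napARS, none, bipARS, bpaRS, napARS, CouRS, MaPylRS, ScwRS) and nsAA (none, pAzF, bipA, pBpA, pAzF, couAA, boc-K, 5OHW). B) peptide sequence VPGAGVPG#GVPGVGK. C-H) mass spectra.]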

Figure 30 Mass-spectrometry of nsAA-containing peptides A) Gel image showing purified mNeongreen containing nsAAs. B) Sequence of the 16-residue elastin-like peptide found in mass-spec, with position of nsAA indicated by #. C-H) Mass spectra of peptides containing C) Tyrosine D) pAzF E) bipA F) pBpA G) CouAA H) Lysine (likely resulting from deprotection of boc-lysine)

4.3.2 Genomic TAG incorporation

An exciting application of nsAA incorporation is fluorescent tagging for subcellular microscopy30,126. We attempted to use coumarin incorporation to specifically localize small, dynamic cell-division components that have not previously been tagged. However, high background fluorescence hindered this goal (Figure 31). Whole protein lysate revealed that the fluorescent nsAA was being incorporated into more than 40 different proteins (Figure 32a). The fluorescent coumarin nsAA remained in the cell only when the corresponding CouRS synthetase was present but did not require the reporter to be present (Figure 32b). Upon incorporation of pAzF, a click-chemistry capable nsAA, subsequent click-enrichment and mass-spectrometry analysis of the pull-down fraction revealed that the nsAA was highly enriched in proteins ending with a UAG stop codon as compared to UAA or UGA (Figure 33a). Despite the suppression of genomic UAG stop codons, no significant decrease in growth rate was observed (Figure 33b-c).

Figure 31 Fluorescence imaging using CouAA Images of B. subtilis taken in phase, GFP (mNeongreen) and DAPI (CouAA) wavelengths. Each column has identical imaging conditions and brightness/contrast settings.

[Figure 32 panels: A) SDS-PAGE with 10-250 kD markers, Coomassie stain and CouAA fluorescence. B) relative CouAA fluorescence with 1 mM CouAA. C) fraction of TAC-nanoluciferase signal on a log scale, with - nsAA, + pAzF and - pAzF conditions. D-E) OD600 versus time (min) in s750, s750 + 1% Plu F-68, and s750 Ammonia.]

Figure 32 Incorporation into genomic amber stop codons A) Whole-cell lysate of cells grown with CouAA run on SDS-PAGE gels. Imaged with Coomassie whole protein stain (left) and CouAA fluorescence (right). B) Fluorescence of the CouAA amino acid remaining in bulk cells after washing. Average of triplicates shown with standard deviation as the error bar. C) High-sensitivity detection of nanoluciferase reporter with TAC, TAG or TAGA codons inserted at position 2. Average of triplicates shown with standard deviation as the error bar. Normalized to fraction of expression of TAC construct. D&E) Growth curves of reporter strains for D) bipA and E) pAzF incorporation in the presence of corresponding nsAAs. Timepoint where significant fluorescence of a TAG-mNeongreen reporter protein was initially detected is denoted with a vertical line.

[Figure 33 panels: A) fraction of protein ending with TAG (0-0.25), by presence of napARS and UAG-mNeon. B) growth curves versus time (min) for TAC napARS + pAzF, TAG napARS + pAzF and TAG napARS - pAzF. C) growth curves versus time (min) for TAC bipARS + bipA, TAG bipARS + bipA and TAG bipARS - bipA.]

Figure 33 Genomic incorporation and growth curves A) Comparison in number of peptides detected from B. subtilis proteins ending in the TAG codon from click-pulldown of lysate of cells grown in s750 + 100 µM pAzF. B-C) Growth curves of cells with and without nsAA.

One approach to reduce the level of background nsAA incorporation and to further extend nsAAs in B. subtilis would be to use a codon rarer than the 569 UAG codons in B. subtilis, such as the quadruplet UAGA codon, which occurs 217 times. To accomplish this, we cloned a ‘qNapARS’ cassette, containing a UCUA-tRNA to bind to the UAGA codon, and a napARS synthetase with the

F261S and D286E mutations which have been shown to encourage UCUA-tRNA aminoacylation186. The quadruplet system was able to successfully incorporate into the UAGA codon with low efficiency, but also incorporated into the UAG codon with similar efficiency (Figure 32c).
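The comparison between UAG and UAGA frequencies above amounts to checking the base that follows each amber stop codon, and the UCUA anticodon is the reverse complement of UAGA. The sketch below illustrates both on toy sequences; the input format (each ORF given with exactly one downstream base) and the function names are assumptions for illustration, and the sequences are not from the B. subtilis genome.

```python
def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def count_amber_contexts(orfs_with_downstream: list[str]) -> dict[str, int]:
    """Count ORFs ending in UAG, and how many of those are followed by A.

    Each input is an mRNA-sense ORF plus exactly one downstream base.
    """
    counts = {"UAG": 0, "UAGA": 0}
    for seq in orfs_with_downstream:
        seq = seq.upper().replace("T", "U")
        if seq[-4:-1] == "UAG":
            counts["UAG"] += 1
            if seq[-1] == "A":
                counts["UAGA"] += 1
    return counts


# Hypothetical toy ORFs: two end in UAG (one followed by A), one ends in UAA.
toy_orfs = ["AUGAAAUAGA", "AUGCCCUAGC", "AUGGGGUAAU"]
amber_counts = count_amber_contexts(toy_orfs)
print(amber_counts)
print(reverse_complement_rna("UAGA"))
```

By construction, every UAGA site is also a UAG site, which is consistent with the observation that the quadruplet system still decoded UAG.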

4.3.3 Cellular uptake of nsAAs

Efficient nsAA incorporation was found to depend on cell state, media type and the specific

nsAA used. In the case of bipA, a standard nsAA used for biocontainment187, incorporation was

delayed until stationary phase (Figure 32d & Figure 34a). For the smaller, more hydrophilic nsAA

pAzF, equal incorporation was observed in exponential and stationary phase (Figure 32e & Figure 34b).

BpA, another bulky hydrophobic amino acid, also showed limited exponential incorporation, but

the pyrrolysine analogue boc-K incorporated throughout the cell cycle (Figure 34c-d). We

hypothesized that nsAA uptake into the cell was limited and inhibited by high concentrations of

standard amino acids. Pluronic F-68, a surfactant shown to be nontoxic and to help bulky molecules

cross the cell membrane188 increased nsAA incorporation, as did removing all amino acids from the cell culture media (Figure 34e-f). Rich media delays incorporation significantly for both bipA and pAzF (Figure 34g-h), possibly due to the high concentrations of amino acids present in the media.

[Figure 34, panels A-H: normalized fluorescence and OD versus time (0 to 1000 min). Each panel shows four traces: WT + nsAA, TAC + nsAA, TAG with synthetase + nsAA, and TAG with synthetase without nsAA. Panels A, E, F and G use bipA with bipARS; panels B and H use pAzF with napARS; panel C uses bpA with bipARS; panel D uses boc-K with maPylRS.]

Figure 34 Fluorescence and OD time courses for various nsAAs. Fluorescence and OD curves for different nsAAs and media conditions. A-D) s750 minimal media. E) s750 minimal media plus 1% w/w Pluronic F-68. F) s750 media modified to replace all amino acids with 0.3% ammonium sulfate. G-H) CH rich media.

To verify that import into the cell was limiting, we performed LCMS experiments measuring internal nsAA concentrations under different conditions. Reflecting the time course data, bipA was only present in very low concentrations in cells during exponential phase, with higher concentrations in minimal media and stationary phase. pAzF was present at high concentrations inside cells during both exponential and stationary phase in minimal media, but the concentrations were greatly reduced in rich media (Figure 35).

[Figure 35 plot: LCMS data for pAzF and bipA in exponential and stationary phase, for cells grown in s750, s750 + Plu F-68 and LB. Axes: internal [pAzF]/uM and internal [bipA]/uM, with tick ranges 0 to 50 and 0 to 0.10.]

Figure 35 LCMS analysis of nsAA levels in cells. Lysate of cells grown in indicated conditions analyzed by quantitative LCMS.

4.3.4 Translational titration with nsAAs

If a transcript contains an in-frame TAG stop codon, the presence of the nsAA is necessary for translation of the full-length protein. Titrating pAzF titrated translation of an mNeongreen reporter over a wide linear range, with a dynamic range of approximately 50-fold (Figure 36a, Figure 37a). Notably, while most transcriptionally inducible promoters show cooperative-like behavior and have steep induction curves with high Hill coefficients, nsAA-based titration shows no such cooperativity, with a Hill coefficient not significantly different from 1 (Table 5).

Figure 36 Protein titration with nsAAs. A) Comparison between titration curves of standard IPTG promoters and nsAA titration with pAzF. Hill coefficients and dynamic ranges of sigmoidal curves found in Table 5. Inset shows sigmoidal curve fits at low expression levels. All values are averages of 3 replicates with standard deviation error bars. B) 2-dimensional titration of UAG-mNeongreen with both IPTG and pAzF. All values are averages of 3 replicates; errors are standard deviation. C) Titration of MciZ with pHyperspank and UAG. Top is phase, bottom is epifluorescence of a tagged FtsA to image the Z-ring.
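The practical meaning of the Hill coefficients in Table 5 can be illustrated with a short sketch. It assumes the standard Hill equation as the sigmoidal model, which may differ from the exact curve fitted in this work, and it uses a hypothetical EC50. For a Hill curve, the fold increase in inducer needed to move from 10% to 90% of the induced range is 81^(1/n), so a coefficient near 1 spreads induction over a much wider concentration window than a coefficient near 3.

```python
def hill(x, ec50, n):
    """Fraction of maximal response for a standard Hill curve (assumed model)."""
    return x**n / (ec50**n + x**n)


def inducer_for_fraction(f, ec50, n):
    """Inducer concentration giving fractional response f (inverse of hill)."""
    return ec50 * (f / (1.0 - f)) ** (1.0 / n)


def fold_10_to_90(n):
    """Fold change in inducer between 10% and 90% response: 81 ** (1 / n)."""
    return 81.0 ** (1.0 / n)


# Hill coefficients taken from Table 5; the EC50 below is hypothetical.
table5_hill = [("UAG / pAzF", 1.08), ("pHyperspank", 1.22), ("pHyperspank*", 3.27)]
for name, n in table5_hill:
    print(f"{name}: n = {n}, 10%-to-90% inducer fold range = {fold_10_to_90(n):.1f}")

example_ec50 = 5.0  # hypothetical, arbitrary units
x10 = inducer_for_fraction(0.1, example_ec50, 1.08)
x90 = inducer_for_fraction(0.9, example_ec50, 1.08)
```

A wider 10%-to-90% window means intermediate expression levels can be reached without very fine control of inducer concentration, which is the property the near-unity Hill coefficient provides.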

Because the transcriptionally inducible promoters act by controlling mRNA levels, and the

UAG titration controls translation, it is possible to use both at once for ‘2-dimensional titration’,

allowing for extreme fine-tuning of protein expression from completely undetectable levels to

overexpression (Figure 36b, Figure 37b).
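One way to picture 2-dimensional titration is as the product of a transcriptional term and a translational term. The sketch below is an assumption made for illustration, not a model fitted in this work: it treats the two controls as independent Hill curves, ignores leakiness, uses hypothetical EC50 values, and borrows the Hill coefficients for pHyperspank and UAG from Table 5 purely as example parameters. The inducer levels are those listed in the Figure 37 legends.

```python
def hill(x, ec50, n):
    """Fraction of maximal response for a standard Hill curve (assumed model)."""
    return x**n / (ec50**n + x**n)


def two_dimensional_titration(iptg_um, pazf_um,
                              iptg_ec50=10.0, iptg_n=1.22,
                              pazf_ec50=10.0, pazf_n=1.08):
    """Relative expression grid, rows = IPTG levels, columns = pAzF levels.

    Assumes expression = (mRNA level set by IPTG) x (read-through set by pAzF).
    EC50 values are hypothetical placeholders.
    """
    grid = []
    for iptg in iptg_um:
        mrna = hill(iptg, iptg_ec50, iptg_n)
        grid.append([mrna * hill(pazf, pazf_ec50, pazf_n) for pazf in pazf_um])
    return grid


iptg_levels = [0, 1, 2.5, 5, 10, 25, 50, 1000]  # uM
pazf_levels = [0, 0.1, 1, 10, 100]              # uM
grid = two_dimensional_titration(iptg_levels, pazf_levels)
for iptg, row in zip(iptg_levels, grid):
    print(iptg, [round(v, 3) for v in row])
```

Under this multiplicative assumption, the accessible range of expression is roughly the product of the two individual dynamic ranges, which is why combining the two controls reaches levels neither achieves alone.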

[Figure 37 plots, panels A and B. Recoverable axis and legend labels: [pAzF]/uM (0, 0.1, 1, 10, 100), [Xylose]/mM, Fluor/OD (normalized), pXylose-mNeongreen, TAG-mNeon, TAG + napARS (+/-), and IPTG at 0, 1, 2.5, 5, 10, 25 and 50 uM and 1 mM.]

Figure 37 Extended titration data. A) Titration and sigmoidal fit of pXylose-mNeongreen, with pAzF-induced TAG-mNeongreen from Figure 36a overlaid. B) 2-dimensional titration data, alternate display for the dataset in Figure 36b.

Table 5 Quantitative characteristics of different promoters in B. subtilis. Characteristics were determined by fitting a sigmoidal curve to the induction curve. Leakiness is measured as % of pHyperspank max expression.

Promoter       Inducer   Dynamic Range   Leakiness (absolute)   Hill coefficient     95% CI on Hill coefficient
pHyperspank    IPTG      13.7            7.1                    1.22                 1.01 to 1.48
pSpank         IPTG      4.1             3.6                    2.28                 1.67 to 3.30
pSpac          IPTG      4.3             4.5                    2.33                 1.15 to 6.45
pHyperspank*   IPTG      5.4             1.7                    3.27                 2.09 to 7.12
pXyl           Xylose    ~10             11                     Could not saturate   Could not saturate
UAG            pAzF      49.9            1.3                    1.08                 0.97 to 1.2

To apply this novel method of protein expression, we looked to places where tight titration of proteins would allow for ‘in vivo biochemistry’. The B. subtilis FtsZ protein forms a treadmilling filament that is essential to cell division, but the relationship between filament dynamics and cell division is not entirely understood180. We titrated low levels of FtsZ filament cappers to determine how sub-inhibitory levels of cappers would change FtsZ dynamics and cell division, and captured an intermediate filament state with present but diffuse Z-rings (Figure 36c). Analysis of the altered dynamics of these Z-rings is ongoing.

4.3.5 Photocrosslinking

A primary application of nsAAs is the use of UV-photocrosslinkers to probe protein structure and assembly in vivo. Previous work has shown that the YukE protein, homolog of the mycobacterial virulence factor EsxA, requires homodimerization for efficient translocation by the Early secretory antigen (Esx) pathway during sporulation189. We demonstrated that pAzF can be incorporated during sporulation (Figure 38) and used the photocrosslinking capabilities of pAzF to demonstrate specific crosslinking between YukE monomers in vivo in Bacillus. Reflecting experiments in E. coli, B. subtilis photocrosslinking is capable of distinguishing short-range interactions: placing the photocrosslinker nsAA on the interface of the homodimer (W44TAG) resulted in high-efficiency crosslinking, while placing it on the external face (V21TAG) did not yield any detectable crosslinking (Figure 39).

[Figure 38 image panels: spI-GFP27TAG phase and spI-GFP27TAG (overexposure), shown for +pAzF and -pAzF samples.]

Figure 38 Incorporation of pAzF in sporulating B. subtilis cells. Cells with gfp(F27tag) under control of a mother-cell specific promoter, PIIE, were induced to sporulate by resuspension. At the time of resuspension, the culture was split in two and pAzF was added to the experimental sample. An example image of sporulating cells at 150 minutes after resuspension shows mNeonGreen signal in the mother cell compartment of engulfed sporulating cells. The control sample shows fluorescent signal at background levels when pAzF is not present.

Figure 39 In vivo photocrosslinking. A) Positions of V21 (orange space fill) and W44 (purple space fill) indicated. B) Immunoblotting results from in vivo crosslinking experiment. Cells producing YukE with pAzF incorporated at either position 21 or 44 were treated with UV for indicated times. Top panel: anti-SigA loading control immunoblot. Bottom panel: anti-YukE immunoblot with monomeric and dimeric YukE positions noted.

4.4 Discussion

Here we demonstrate general nsAA incorporation tools in the primary gram-positive model organism B. subtilis. We enabled tyrosine, pyrrolysine and tryptophan-derived nsAAs in this gram-positive organism, together constituting 85% of demonstrated nsAAs incorporated in any organism24, and demonstrated applications of several of these nsAAs. Characterization of the limitations of nsAAs in this organism demonstrated extensive incorporation into native B. subtilis proteins, a phenomenon not observed in E. coli7,130,176. Additionally, nsAA import can be limiting in rich media and with large, hydrophobic nsAAs. While these issues limit the immediate applications of genetic code expansion in B. subtilis, we demonstrate both in vivo photocrosslinking and translational titration of protein expression with favorable induction characteristics. Combining transcriptional and translational titration allows us to do ‘in vivo biochemistry’, titrating cappers of the cell-division filament FtsZ to see the effect on cell division. We believe that enabling genetic code expansion tools in B. subtilis will not only allow use of nsAA-based techniques developed in E. coli, but also serve as the foundation for developing new technologies taking advantage of B. subtilis’s status as an industrial and GRAS organism.

This work also has implications for better understanding genetic code expansion systems and their use in diverse organisms. While Mb/Mm pyrrolysine-based systems are often used to expand the genetic code of novel organisms82,83,190, this work shows that such efforts may be inhibited by the low solubility of the N-terminal domain53,131 and that the MaPylRS can be used instead54,55. The ease of high-efficiency incorporation with tyrosine-based systems was also notable, and their activity and the broad array of functional nsAAs suggest that MjTyrRS-based systems should be the first attempt in expansion of the genetic code in bacteria. Finally, it was observed that previously demonstrated UAGA quadruplet incorporation systems186 are capable of incorporating into the triplet UAG codon at nearly identical efficiency to the quadruplet codon, complicating efforts to incorporate multiple nsAAs at once.

Future work developing the potential of nsAAs in B. subtilis can focus on features of nsAAs specific to Bacillus or use previous E. coli work as a roadmap to enable new applications. Duplicating the feat of genome recoding to remove the TAG stop codon7 would expand the horizons of genetic code expansion in B. subtilis dramatically. In addition to advances in whole-genome synthesis13,191, new technologies that take advantage of B. subtilis’s natural competence for multiplexed genome editing192,193 and select for recoding194 will dramatically speed the recoding process. Encouraging uptake of large, hydrophobic nsAAs may be accomplished by heterologous expression of a panel of appropriate membrane transporters195, though minimal protein engineering may aid in maximizing uptake196. Further engineering of the incorporation system for higher efficiency, higher yield197 or incorporation into multiple sites73 will enable more practical applications, especially for pharmaceuticals177,179.

Highly efficient and controlled nsAA incorporation technology will transform and enable applications of B. subtilis. While genetic code expansion enables additional research tools including fluorescence, crosslinking and photocontrol38 of proteins, the greater impact will likely be in practical applications. Because B. subtilis is a GRAS microorganism used as a probiotic for humans, livestock and plants170,174, genetic code expansion will allow for its biocontainment via synthetic auxotrophy76. Advances in metabolic engineering enable in vivo synthesis of nsAAs198 and their use to improve biocatalysis processes199, enhancing the already capable B. subtilis chassis for metabolic engineering200.

Chapter 5: Discussion

5.1 Summary of Results

The approaches presented here generalize previously existing GCE techniques to new

systems and new applications. Specifically, this thesis does not focus on the evolution of novel

AARS/tRNA pairs, but rather the development of tools surrounding nsAAs to enhance them for new applications.

We find in Chapter 2 that several AARS/tRNA pairs promiscuously incorporate multiple nsAAs and low levels of standard amino acids. Under some conditions this is acceptable and even advantageous, but absent competition with RF1 it causes high background and unwanted products. These products can also sabotage biocontainment approaches. We invented a tunable method to post-translationally degrade these unwanted products and used it to help select for more specific OTSs, which improved specificity in a broad range of media conditions and aided biocontainment approaches.

While a lack of competition with RF1 can result in high background, competition with RF1

inhibits GCE, and makes it difficult to incorporate multiple nsAAs into a single protein. This is

especially a problem in strains optimized for high levels of protein expression, such as BL21. In

Chapter 3 our solution was to develop an antimicrobial peptide to transiently inhibit RF1 to encourage efficient multi-site nsAA incorporation. This approach works generally, enhancing GCE

across different OTSs and across organisms.

While there are many examples of expanding the genetic code of novel organisms, such

work often uses a single nsAA in one condition for a specific application and further GCE in that

organism must re-develop incorporation for a different nsAA or condition. In Chapter 4 we expanded the genetic code of Bacillus subtilis with three different families of synthetases to

incorporate over 20 different amino acids in various media and conditions. We then used these nsAAs for three different practical applications. This approach validated a standard path to incorporate any existing nsAA in B. subtilis and demonstrated the issues and next steps for improving GCE in this organism.

Together, these results demonstrate approaches for expanding the domain of GCE beyond specialist strains and optimized conditions to more general uses across biology.

5.2 New Synthetase development

Development of new AARSs for charging nsAAs onto tRNAs is a technically demanding task. The large number of functional nsAAs that can be incorporated by existing systems decreases the demand for future work in this area. There are still opportunities for new nsAA development, such as the development of fluorogenic nsAAs to replace fluorescent protein tags201, but generally synthetase development has transitioned to focusing on improved activity and orthogonality.

Since engineered AARSs usually are significantly less active than wild-type AARSs6,73, further development is needed for OTSs with wild-type levels of activity. Additionally, many current OTSs are promiscuous, incorporating available sAAs or multiple nsAAs at once127. Fixing these problems requires extensive AARS engineering, especially if the goal is robust multi-site incorporation73 or high specificity, as shown in Chapter 2. OTS-independent tools to select for high activity and specificity, coupled with powerful methods for directed evolution and protein design, are important parts of this process202,203.

In addition to improving the characteristics of AARSs, new approaches are opening up multiple codons for use in encoding nsAAs13,15,16,186. These techniques allow incorporation of multiple different nsAAs at once and may allow ribosomal translation of entirely nonstandard proteins. Such approaches would require many mutually orthogonal AARS/tRNA pairs, and recent work has discovered many potential candidates that require engineering to be able to fulfill this promise204.

While development of AARSs to incorporate novel nsAAs has become less significant, it has

been replaced by the refinement of existing systems and the development of mutually orthogonal

sets of AARS/tRNA pairs to enable simultaneous incorporation of multiple nsAAs.

5.3 Genetic code expansion in novel organisms

There is continuing development of genetic tools in non-model organisms for basic and applied science, as many of these organisms can provide fresh insight into biological processes and provide new industrial capabilities. As more of these organisms are made genetically tractable, it will become possible to expand their genetic code. Because of the legacy of GCE tool development and use in E. coli and mammalian systems, getting GCE working in a new organism allows immediate access to a very broad range of new tools for biochemistry, cell biology, chemical biology and more.

For the applications of nsAAs in E. coli and mammalian cells to carry over to new organisms, we should focus on using PylRS or MjTyrRS-based incorporation systems. The choice of synthetase family for a new organism should weigh the desired function, target efficiency and potential orthogonality of the GCE system.

The Mm/Mb PylRS systems have been touted as ideal for application to new bacteria and are often used for GCE in a new organism, primarily due to their general orthogonality79,82,83. However, the lower solubility of the N-terminal domain of the Mm/Mb PylRSs in bacteria53,131 may mean that new classes of PylRSs lacking the N-terminal domain are better candidates for applying pyrrolysine nsAAs in new organisms. Work in this area was spearheaded by the MaPylRS demonstrated in chapter 4. While it does seem possible to use the Mm/Mb PylRSs in bacteria by overexpressing tRNA83, as the tRNA-bound synthetase remains soluble, using an a priori soluble AARS is a better solution. Thus far, transferring mutations from the Mm/Mb PylRSs to the MaPylRS seems to transfer expanded nsAA substrate specificity, allowing direct replacement of Mm/Mb PylRSs with MaPylRS in GCE approaches. Finally, PylRS systems struggle to reach a high efficiency of nsAA incorporation, often failing to compete with a native release factor.

The greater efficiency, lower cost of nsAAs and lack of solubility problems often mean that MjTyrRS-based systems of nsAA incorporation are a better choice for expanding into a novel organism. MjTyrRS systems are usually only orthogonal in bacterial systems and generally will have unacceptable levels of crosstalk with native translational systems in eukaryotic and archaeal hosts, where they should likely not be used. Additionally, the greater structural flexibility of pyrrolysine-based nsAAs means there are more pyrrolysine nsAAs with a broader set of functions, such as copper-free click-chemistry176. However, in cases using a bacterial host and not requiring pyrrolysine-specific functionality, we found in chapters 3 & 4 that tyrosine-based GCE systems generally have better characteristics than the pyrrolysine systems in novel organisms.

After selecting an AARS family, the AARS, tRNA and target gene need to be expressed in the new host organism. In our experiments in chapters 3 & 4 we found moderate constitutive expression of the AARS & tRNA to be effective, from either genomic integrations or plasmid-based systems. It has often been reported that tRNA expression can be limiting, so higher tRNA expression or use of multiple copies of the tRNA may increase incorporation efficiency, though this may also be solved by using a better AARS or translational engineering. It is common but not necessary to inducibly express the target gene containing a codon targeted for incorporation. Finally, attempting incorporation in a range of media and growth conditions as in chapter 4 is essential to discover optimal conditions and potential limiting factors for nsAA incorporation.

Ideally, additional host engineering would not be necessary to achieve broad and efficient GCE, but various organisms may present additional barriers to nsAA incorporation. The level of UAG suppression varies widely across bacteria8, and in some cases interfering with RF1 may significantly increase the efficiency of nsAA incorporation, as shown in chapter 3 with Agrobacterium tumefaciens. This may be an especially effective approach for incorporating multiple copies of an nsAA into a single protein. While in chapter 3 we presented RF1-inhibiting peptides as a general method for bacterial inhibition of RF1, other approaches to suppressing competition for a target codon are also viable, ranging from knockouts of accessory termination factors205 to recoding and deletion of the release factor itself7. Additionally, limited uptake of nsAAs may hinder proper incorporation, and can be solved by media perturbations as shown in chapter 4, or by engineering of membrane proteins to encourage uptake196.

I hope that the guidelines above will aid in transferring the rich tools of genetic code expansion into new organisms to help take advantage of novel biology.

5.4 Biotechnology applications

In addition to experimental tools, there are many approaches to applying GCE in biotechnology today. Beyond the general potential of expanding protein design space with new letters of the genetic code, specific bioconjugation and synthetic auxotrophy are some of the most exciting areas in this space. However, few of these applications have seen industrial success in large

part due to the limited host space and efficiency of current GCE techniques. As these techniques

are further generalized to broader conditions and engineered for higher efficiency and specificity,

these applications will become more relevant to modern biotechnology and enable new

approaches.

Arguably the most exciting near-term application of GCE is specific bioconjugation, where a protein, typically an antibody or nanobody, is site-specifically conjugated with a chemical cargo. These cargos can enhance immunocompatibility of a protein drug by blocking antigen recognition, increase target specificity with a second binding interaction or convey new activity with a small-molecule effector31,120. To get these conjugates, an nsAA is incorporated into the target protein during expression in a genetically expanded host or cell-free system capable of GCE, and after purification the nsAA is conjugated with the desired chemical cargo via one of a number of fast, biocompatible but bioorthogonal reactions134. There are other ways to attach chemical cargos, including chemical conjugation to cysteines and enzymatic approaches, but nsAAs offer residue-specific positioning and have the potential to produce more homogeneous products134,206. One of the largest barriers to using GCE for bioconjugation arises from low titers derived from low-efficiency nsAA incorporation. This is further complicated because some therapeutic proteins such as antibodies must be produced in mammalian cell culture to be properly glycosylated. Increases in efficiency of GCE in mammalian cells207 and a movement toward nanobodies that can be expressed from optimized bacterial hosts208 are some approaches to dealing with this problem. Further issues arise from incompletely optimized nsAA incorporation, which produces heterogeneous products as mistranslation and termination compete with correct suppression and AARSs incorporate standard amino acids. Methods discussed in chapters 2 & 3 to increase suppression efficiency and decrease incorrect translation are examples of ways to reduce this heterogeneity in nsAA incorporation.

Finally, expansion of the genetic code to novel organisms is very likely to help improve the

use of GCE for bioconjugation, as model organisms such as E. coli are often not ideal bioproduct

production strains. Bacillus strains are especially noted for their industrial protein production

capabilities177,208,209, so expansion of their genetic code in chapter 4 and other work82–84 is directly

relevant. GCE approaches in bioconjugation for production of therapeutic peptides will be

dramatically improved in coming years as improvements and generalizations allow more efficient

and more general nsAA incorporation.

While nsAAs allow diverse chemical modifications to target proteins, there is also potential for impressive control of engineered strains through synthetic auxotrophy. In this scheme strains are made to require a synthetic nsAA for survival. Usually this involves incorporating an nsAA into an essential protein such that replacement with a standard amino acid causes misfolding or inability to perform some essential function187,210. Such approaches have extremely low escape

rates and will be increasingly significant as engineered organisms are deployed in uncontrolled

environments for therapeutic, bioremediation or other purposes. However, as shown in chapter 2,

such approaches require powerful and precise GCE technology to avoid growth limitations and

issues with biocontainment. Additionally, ideal probiotic and bioremediation strains are likely to be non-model strains and can only be subjected to synthetic auxotrophy by extension of GCE using methods such as those described in chapters 3 & 4. Though the need for synthetic auxotrophy is not supremely pressing, it is one of the most promising routes to the kind of inescapable biocontainment that will allow deployment of the most powerful engineered biological technologies.

Finally, the expansion of the genetic code beyond the standard 20 amino acids holds incredible promise for the future of engineered proteins. Making all stable chemical groups available as easily as standard amino acids to protein engineers will enable development of proteins capable of nearly any theoretically possible activity. Possible applications range from arbitrary enzymatic activity to structural and force-generating properties211,212. With proper application, these engineered biopolymers have the potential to transform every aspect of science and life. However, these starry-eyed possibilities are still decades in the future, requiring significant advancements in protein engineering, genetic code expansion and general biological sciences.

Nonetheless, keeping in mind the eventual goal of these technologies is important to understanding the follow-up work that is necessary to get there.

5.5 The future of generalization of GCE

For the past 20 years, GCE has been extensively developed in a few, limited contexts. A huge stable of functional nsAAs is available for efficient and specific incorporation in E. coli (especially in recoded strains) and mammalian cells, but in few other places. Expanding the hosts and contexts in which GCE can be used is currently the most important part of realizing its potential.

There are several approaches to expanding the context of GCE, highlighted in the previous sections of this thesis. In chapter 2 and section 5.2 I describe new approaches and methods to develop better AARSs with higher activity and specificity, able to be used in rich media and broader growth conditions. In chapters 3 & 4 and sections 5.3 & 5.4 I cover ways to use GCE in new organisms and how that will improve biotechnological applications of GCE. Traditional approaches to improve AARSs and recode strains are still significant, but development of accessory approaches

that generalize the application of GCE is just as important for the future development of the

technology.

It is difficult to predict what future GCE generalization will look like, beyond careful and thorough transitioning to new organisms. Creative approaches separate from the core nsAA charging systems were significant for Chapters 2 & 3, and further creative approaches to improving nsAA incorporation are likely to play a large role in being able to use nsAAs in new contexts. Some possibilities involve host-independent import systems to encourage uptake, or ways to alter suppression efficiency for some instances of a codon but not others using mRNA structure or separate ribosomal factors. A more ambitious goal is a fully orthogonal ribosome, complete with a full set of tRNA/AARS pairs, translating separate RNA from the native ribosome. The ability to freely engineer such a system without needing to preserve host activity and use it in multiple host systems with different translational architecture would be a great boon toward making GCE technology more accessible and general. Whatever techniques are used, the potential of genetic code expansion to engineer the fundamentals of biology is vast.

APPENDIX

Chapter 2 Appendix

Additional Findings

Additional characterization of PTP. Reporter degradation is abrogated by the substitution of

an N-end stabilizing residue (Figure 40a), and the incorporation of BipA at another position in addition to the N-terminus does not change our results (Figure 40b). We also show that PTP

can be easily extended to other target proteins such as markers (Figure 41).


Figure 40 Additional initial characterization of Post-Translational Proofreading (PTP) system (A) Control constructs with hard-coded serine and tyrosine in place of X in the Ub-X-GFP construct. Behavior for serine (known N-end stabilizing) and tyrosine (known N-end destabilizing) is as expected. (B) Control experiments showing that an additional position internal to GFP can be targeted for BipA incorporation if the N-terminus is also targeted and similar behavior results.

Figure 41 PTP is generalizable to other reporter proteins (A) Results using a genomically integrated Ub-UAG-Bla selectable marker, where Bla is a beta-lactamase conferring resistance to carbenicillin if fully expressed and stabilized. (B) Results using a genomically integrated Ub-UAG-Cat selectable marker, where Cat is a chloramphenicol acetyltransferase conferring resistance to chloramphenicol if fully expressed and stabilized.

Comparison of recent evolution strategies for improving selectivity. Improving selectivity or activity need not be conflicting goals, but different reporters and screening schemes may be better suited for one or the other. For example, another recent evolution method (12) used a reporter containing many UAG sites and a genome-integrated OTS system. The resulting higher ratio of UAG sites to OTS expression compared to this study can produce OTSs capable of expressing protein containing as many as 30 UAGs. However, we tested this evolved OTS (pAcFRS.2.t1) on our genome-integrated Ub-X-GFP reporter and observed remarkably high promiscuity (Figure 42). In fact, the promiscuity increased substantially from the parental construct, which offers insight into the potential tradeoff between selectivity and activity and the importance of methods capable of achieving either aim.

[Figure 42 bar chart: vertical axis 0 to 20000. Conditions: No NSAA, No NSAA w PTP, 1 mM pAcF w PTP, 1 mM pAzF w PTP. Series: pAcFRS_D286R, pAzFRS.1.t1, pAcFRS.2.t1.]

Figure 42 Single UAG suppression sensitivity assay with and without PTP. Using ClpS_V65I, which does not degrade pAcF or pAzF, reveals that AARSs evolved using a strategy geared towards multi-UAG suppression (Ref. 24) display very low fidelity for single UAG sites. Progenitor: pAcFRS_D286R. Evolved strains: pAzFRS.1.t1 and pAcFRS.2.t1.

Funding, COI and acknowledgements:

Funding Sources:

This work was supported by Life Sciences Research Foundation Fellowship awarded to E.K. Work in

the Church laboratory was supported by US Department of Energy Grant DE-FG02-02ER63445.

Conflict of Interest Disclosure:

Conflict of interest statement: G.M.C. has related financial interests in ReadCoor, EnEvolv, and GRO

Biosciences. A.M.K. and G.M.C. have filed a provisional patent on posttranslational proofreading,

and A.M.K., D.S., E.K., and G.M.C. have filed a provisional patent on evolved BipA OTS variants. For

a complete list of G.M.C.’s financial interests, please visit arep.med.harvard.edu/gmc/tech.html.

Acknowledgements:

We thank Dr. Daniel J. Mandell (Harvard), Dr. Ethan Garner (Harvard), Dr. Karl Schmitz

[Massachusetts Institute of Technology (MIT)], Georgia Squyres (Harvard), Alex Bisson (Harvard),

Bernardo Cervantes (MIT), Dr. Yekaterina Tarasova (MIT), Dr. Abubakar Jalloh, Juhee Park (MIT),

and Dr. Irene M. B. Reizman (Rose-Hulman) for insightful discussions or manuscript comments;

Chad Araneo for FACS assistance; and Dr. Bogdan Budnik for mass spectrometry assistance. This

project was graciously funded by US Department of Energy Grant DE-FG02-02ER63445 (to G.M.C.)

and National Institutes of Health Grants R01GM22854 and R35GM122560 (to D.S.).

Methods

Plasmids and plasmid construction.

Two copies of orthogonal MjTyrRS-derived AARSs and tRNA^Tyr_CUA opt were kindly provided in pEVOL plasmids by Dr. Peter Schultz (Scripps Institute)92. Engineered aminoacyl-tRNA synthetases

(AARSs) used in this study were the following: BipARS95, BipyARS95, pAcFRS213, pAzFRS214, and

NapARS215. The pEVOL plasmids were maintained using chloramphenicol. Original plasmids

harboring two AARS copies were used for synthetase promiscuity comparison experiments (Figure 5 & Figure 8). For generation and characterization of synthetase variants, plasmids harboring only

one AARS copy under inducible expression were constructed using Gibson assembly216. The ScWRS-

R3-13 AARS was synthesized as codon-optimized for expression in E. coli and cloned into the pEVOL plasmid along with its associated tRNA60,217. In all cases, tRNA is constitutively expressed and AARS

expression is either arabinose inducible or constitutive.

An N-terminally truncated form of the UBP1 gene from Saccharomyces cerevisiae104,105

(ScUBP1trunc or simply UBP1) was synthesized as codon-optimized for expression in E. coli and

cloned into the pZE21 vector (Kanamycin resistance, ColE1 origin, TET promoter) (Expressys). The E.

coli genes clpS and clpP were PCR amplified from E. coli MG1655 and cloned into artificial operons downstream of the UBP1 gene in the pZE21 vector using Gibson assembly. Artificial operons were created by inserting the following RBS sequence between the UBP1 and clp genes:

TAATAAAAGGAGATATACC. This RBS was originally designed using the RBS calculator218 and

previously validated in the context of another artificial operon219. Rational engineering of ClpS

variants was performed by dividing the clpS gene into two amplicons where the second amplicon

contained a degenerate NTC or NTT sequence in the oligo corresponding to each codon of interest.

The four initial positions of interest in the clpS gene correspond to amino acids 32, 43, 65, and 99.

In each case, Gibson assembly was used to ligate both amplicons and the backbone plasmid. The

pZE/UBP1/ClpS and pZE/UBP1/ClpS_V65I plasmids are available from Addgene (Cat. No. 98566 and

98567).
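As a quick check on the degenerate NTC and NTT codons described above, the following Python sketch enumerates the codons each one expands to and the amino acids they encode under the standard genetic code. The function name and the reduced codon table are illustrative and are not taken from the original work.

```python
from itertools import product

# IUPAC nucleotide codes needed for this example (N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}

# Only the standard-code entries relevant to NTC/NTT codons.
CODON_TABLE = {
    "ATC": "Ile", "CTC": "Leu", "GTC": "Val", "TTC": "Phe",
    "ATT": "Ile", "CTT": "Leu", "GTT": "Val", "TTT": "Phe",
}

def expand_degenerate(codon):
    """Return {codon: amino acid} for every codon a degenerate triplet encodes."""
    choices = [IUPAC[base] for base in codon.upper()]
    return {"".join(c): CODON_TABLE["".join(c)] for c in product(*choices)}

for degenerate in ("NTC", "NTT"):
    print(degenerate, expand_degenerate(degenerate))
```

Both degenerate codons sample the same four hydrophobic residues (Phe, Leu, Ile, Val), so each targeted ClpS position is restricted to a small, conservative set of substitutions.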

Three reporter constructs were initially cloned into pZE21 vectors before use as templates for

PCR amplification and genomic integration. The first of these consists of a Ubiquitin-*-LFVQEL-

sfGFP-His6x fusion (“Ub-UAG-sfGFP”) downstream of the TET promoter. The second has an

additional UAG codon internal to the sfGFP at position Y151* (“Ub-UAG-sfGFP_151UAG”). The third

has an ATG codon (encoding Met) in place of the first UAG (“Ub-M-sfGFP_151UAG”).
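To make the reporter layout concrete, the short Python sketch below locates in-frame UAG codons (TAG in DNA) in a coding sequence. The helper name is hypothetical, and the example input is only the ubiquitin/N-degron junction fragment that appears in the Table 6 reporter sequence, not the full reporter.

```python
def in_frame_codons(seq, target="TAG"):
    """Return 1-based codon positions where `target` occurs in reading frame 1."""
    seq = seq.upper()
    return [i // 3 + 1 for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] == target]

# Junction fragment from the Ub-UAG-sfGFP reporter: R L R G G * L F V Q E L
junction = "CGTCTGCGTGGAGGATAGTTGTTTGTGCAGGAGCTT"
print(in_frame_codons(junction))
```

The same helper could be run on a full reporter sequence to confirm how many UAG sites a construct carries before and after MAGE edits.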

Strains and strain engineering.

E. coli strain C321.∆A (NCBI Accession No. CP006698.1), which was previously engineered to be devoid of UAG codons and RF1, was the starting strain used for this study7.

The TET promoter and Ub-UAG-sfGFP expression cassette was genomically integrated using λ

Red recombineering220 and tolC negative selection using Colicin E1221,222. This resulted in strain

C321.Ub-UAG-sfGFP. Please see Table 6 for sequences of key constructs such as the reporter

construct. Multiplex automatable genome engineering (MAGE)158 was used to inactivate the

endogenous mutS and clpS genes when needed and to add or remove UAG codons in the

integrated reporter. For MAGE, saturated overnight cultures were diluted 100-fold into 3 mL

LBL containing appropriate antibiotics and grown at 34 °C until mid-log. The integrated Lambda

Red cassette in C321.∆A-derived strains was induced in a shaking water bath (42 °C, 300 rpm,

15 minutes), followed by cooling culture tubes on ice for at least two minutes.

Table 6 Sequences of key constructs used in chapter 2

Construct Name: Ubiquitin-*-LFVQEL-sfGFP-His6x
Sequence:
ATGCAGATTTTTGTGAAGACTTTAACAGGTAAGACGATTACCCTGGAGG
TGGAGTCCTCGGACACCATCGATAATGTAAAATCAAAAATCCAAGATAA
GGAAGGAATCCCTCCAGACCAGCAACGTCTGATTTTCGCAGGTAAACA
ACTGGAGGATGGTCGCACGCTTTCGGACTACAACATCCAGAAAGAATC
TACCCTTCATTTGGTTCTGCGTCTGCGTGGAGGATAGTTGTTTGTGCAG
GAGCTTgcatccaagggcgaggagctctttactggcgtagtaccaattctcgtagagctcgatggcg
atgtaaatggccataagttttccgtacgcggcgagggcgagggcgatgcaactaacggcaagctcact
ctcaagtttatttgtactactggcaagctcccagtaccatggccaactctcgtaactactctgacctatggcg
tacaatgtttttcccgctatccagatcacatgaagcaacatgatttttttaagtccgcaatgccagagggcta
tgtacaagagcgcactattagctttaaggatgatggcacctataagactcgcgcagaggtaaagtttgag
ggcgatactctcgtaaatcgcattgagctcaagggcattgattttaaggaggatggcaatattctcggccat
aagctggagtataatttcaattcccataatgtatacattaccgcagataagcaaaagaatggcattaagg
cgaattttaagattcgccataatgtggaggatggctccgtacaactcgcagatcattatcaacaaaatact
ccaattggcgatggcccagtactcctcccagataatcattatctctccactcaatccgtgctctccaaagat
ccaaatgagaagcgcgatcacatggtactcctggagtttgtaactgcagcaggcattactcatggcatgg
atgagctctataagctcgagcaccaccaccaccaccactaa

Construct Name: pAzFRS.1.t1 gBlock
Sequence:
GTTATGcactacGATggtgttgacgttTACgttggtggtatggaacagcgtaaaatccacatgctg
gcgcgtgaactgctgccgaaaaaagttgtttgcatccacaacccggttctgaccggtctggacggtgaag
gtaaaatgtcttcttctaaaggtaacttcatcgcggttgacgactctccggaagaaatccgtgcgaaaatc
aaaaaagcgtactgcccggcgggtgttgttgaaggtaacccgatcatggaaatcgcgaaatacttcctg
gaatacccgctgaccatcaaaGGT

Construct Name: ScUBP1trunc, or UBP1
Sequence:
ATGGGGAGTGGGTCTTTCATTGCTGGGCTTGTCAACGATGGTAATACG
TGTTTTATGAACTCGGTTCTTCAGTCCCTTGCTAGTAGCCGTGAACTTA
TGGAGTTTTTGGATAATAATGTAATCCGTACATATGAAGAAATTGAACA
GAACGAGCACAATGAGGAAGGTAATGGCCAAGAGAGCGCACAAGATG
AGGCAACTCACAAAAAAAACACTCGCAAGGGAGGTAAGGTCTATGGGA
AGCATAAAAAGAAATTAAACCGCAAATCTTCTAGCAAGGAAGACGAAGA
AAAGTCGCAAGAACCAGACATTACGTTTTCGGTGGCGTTGCGTGATCT
GCTGAGCGCATTAAATGCTAAGTATTATCGCGACAAACCCTACTTTAAG
ACTAACTCTTTATTAAAAGCGATGAGCAAGTCCCCGCGCAAAAATATCT
TGCTTGGGTACGATCAAGAAGACGCTCAGGAATTTTTTCAAAACATTCT
TGCGGAGTTAGAATCTAATGTCAAGTCGTTAAACACAGAAAAGCTTGAT
ACTACACCGGTAGCCAAGTCCGAACTTCCAGACGATGCTCTGGTTGGC
CAATTAAACCTTGGTGAGGTAGGCACCGTGTACATTCCCACAGAACAA
ATTGACCCCAATTCGATTTTACATGACAAATCGATTCAAAACTTTACCCC
CTTTAAACTGATGACCCCGTTGGATGGGATCACGGCTGAGCGCATCGG
CTGCCTGCAATGCGGAGAGAACGGGGGAATTCGCTACAGTGTTTTCAG
CGGATTAAGTTTGAACCTGCCGAATGAAAATATTGGAAGCACTCTTAAA
CTGTCCCAGTTACTGTCCGATTGGTCGAAACCCGAGATTATCGAGGGT
GTTGAATGCAACCGTTGCGCTTTAACAGCTGCGCACTCACACTTGTTTG
GCCAATTAAAGGAGTTTGAGAAGAAACCTGAAGGCTCGATTCCCGAAA
AACTTATTAATGCCGTAAAGGACCGCGTGCACCAGATCGAAGAGGTCT
TGGCAAAGCCGGTTATCGACGATGAAGATTATAAAAAATTGCATACTGC
GAATATGGTCCGCAAGTGTTCAAAAAGTAAACAAATTCTTATCTCTCGT
CCACCACCTTTGTTGTCTATTCATATCAACCGCTCTGTTTTCGACCCGC
GCACCTACATGATTCGCAAGAACAACTCCAAGGTTTTGTTCAAGTCACG
CTTGAACCTGGCACCCTGGTGCTGTGATATCAACGAAATCAATCTTGAC
GCACGCCTTCCGATGTCGAAGAAGGAAAAAGCAGCTCAACAAGATTCT
TCTGAAGACGAGAACATTGGCGGAGAGTACTATACTAAATTGCATGAA
CGTTTTGAGCAGGAGTTTGAAGATTCTGAAGAAGAGAAGGAATACGAT
GATGCAGAGGGTAATTATGCATCGCATTATAACCATACCAAGGACATCT
CCAACTACGATCCATTGAATGGAGAAGTCGACGGTGTGACTTCCGATG
ATGAGGATGAATACATTGAAGAGACAGACGCGTTGGGGAATACCATCA
AAAAACGTATTATTGAACACTCCGACGTGGAGAACGAAAACGTGAAGG
ATAATGAAGAACTTCAGGAGATCGATAACGTTAGCTTGGATGAGCCAAA
AATTAATGTCGAGGACCAGCTTGAAACGAGTTCTGATGAGGAAGACGT
TATTCCTGCTCCACCCATCAACTACGCTCGCAGCTTTAGTACGGTCCCA
GCGACCCCTTTAACTTACTCTTTGCGCAGCGTCATCGTGCACTATGGG
ACTCACAACTACGGACATTATATTGCATTTCGCAAGTATCGTGGATGTT
GGTGGCGCATCTCCGATGAGACGGTCTATGTGGTAGATGAGGCCGAA
GTACTGTCAACACCGGGGGTATTTATGCTTTTCTACGAGTATGATTTCG
ACGAGGAGACCGGAAAAATGAAAGACGACTTAGAAGCTATCCAGAGCA
ATAATGAGGAAGATGACGAGAAAGAACAGGAACAGAAGGGTGTCCAG
GAGCCAAAAGAATCCCAGGAGCAAGGCGAAGGCGAAGAACAAGAAGA
AGGGCAAGAGCAAATGAAATTTGAGCGTACGGAGGATCATCGCGACAT
TTCAGGGAAGGATGTGAATTAA

These cells were made electrocompetent at 4 °C by pelleting 1 mL of culture (16,000 rcf, 20 seconds) and washing twice with 1 mL ice cold deionized water (dH2O).

Electrocompetent pellets were resuspended in 50 μL of dH2O containing the desired DNA. For

MAGE oligonucleotides, 5 μM of each was used. Please see Table 7 for a list of

all oligonucleotides used in this study. For integration of dsDNA cassettes, 50 ng was used.

Allele-specific colony PCR was used to identify desired colonies resulting from MAGE as previously described223. Colony PCR was performed using Kapa 2G Fast HotStart ReadyMix following manufacturer protocols and Sanger sequencing was performed by Genewiz to verify strain engineering. The strains C321.Ub-UAG-sfGFP and C321.ΔClpS.Ub-UAG-sfGFP are available from Addgene (Cat. No. 98564 and 98565). Ub-X-GFP reporters containing codons encoding standard AAs in place of UAG were generated from Ub-UAG-GFP by PCR and Gibson

assembly, and they were subsequently cloned into the pOSIP-TT vector for Clonetegration

(one-step cloning and chromosomal integration) into NEB5α strains224. The UBP1/clpS_V65I was also placed under weak constitutive expression and integrated into

C321.ΔClpS.Ub-UAG-sfGFP using Clonetegration. This strain (C321.Nend) was used as the host for FACS experiments.

Table 7 Oligonucleotides used in Chapter 2

Oligo Name    Sequence
pZE21-seq-F    CCATTATTATCATGACATTAACC
pZE21-seq-R    GGATTTGTCCTACTCAGGAG
AARS-seq-F    CTTTTTATCGCAACTCTC
Ubiquitin+N-degron-F    TTAAAGAGGAGAAATTAACTATGCAGATTTTTGTGAAGACT
Ubiquitin+N-degron-R    AGCTCCTCGCCCTTGGATGCAAGCTCCTGCACAAACAAGT
pEVOLbbone_Ubp1-F    CAGGGAAGGATGTGAATTAATAAGTCGACCATCATCATCA
pEVOLbbone_Ubp1-R    ATGAAAGACCCACTCCCCATAGATCTAATTCCTCCTGTTAGC
Ubp1-P1-F    TAACAGGAGGAATTAGATCTATGGGGAGTGGGTCTTTCAT
Ubp1-P1-R    TCAAGCGTGACTTGAACAAAACCTTGGAGTTGTTCTTGCG
Upb1-P2-F    CGCAAGAACAACTCCAAGGTTTTGTTCAAGTCACGCTTGA
Upb1-P2-R    TGATGATGATGGTCGACTTATTAATTCACATCCTTCCCTGA
pUbi-*-Ndeg-GFP-F    TGCGTCTGCGTGGAGGATAGTTGTTTGTGCAGGAGCTTGC
pUbi-*-Ndeg-GFP-R    AAGCTCCTGCACAAACAACTATCCTCCACGCAGACGC
Ubp1_int-seq-F    GCTTGGGTACGATCAAGAAG
Ubp1_int-seq-R    CCTTGGTATGGTTATAATGCG
pZE21bbone4Ubp1-F    CAGGGAAGGATGTGAATTAAAAGCTTGATGGGGGATCCCA
pZE21bbone4Ubp1-R    ATGAAAGACCCACTCCCCATGGTACCTTTCTCCTCTTTAATGAAT
Ubp1-ins-F    TTAAAGAGGAGAAAGGTACCATGGGGAGTGGGTCTTTCAT
Ubp1-ins-R    TGGGATCCCCCATCAAGCTTTTAATTCACATCCTTCCCTGA
UbiGFPins-F    TAAAGAGGAGAAAGGTACCATGCAGATTTTTGTGAAGACTTTAAC
UbiGFPins-R    TGGGATCCCCCATCAAGCTTTTAGTGGTGGTGGTGGTGGT
pZEbbone4UbiGFP-F    ACCACCACCACCACCACTAAAAGCTTGATGGGGGATCCCA
pZEbbone4UbiGFP-R    GTCTTCACAAAAATCTGCATGGTACCTTTCTCCTCTTTAATGAAT
reporter_to_genome-F    TTACGGGCTAATTACAGGCAGAAATGCGTGATGTGTGCCACACTTGTTGATCCCTATCAGTGATAGAGATTGAC
reporter_to_genome-R    CCAGCGGGCTAACTTTCCTCGCCGGAAGAGTGGTTAACAAAATAGTAACGTCACCGACAAACAACAGATAAAAC
SIR-seq-F    CCAAAGTGAGTTGAGTATAAC
SIR-seq-R    TTTCTCCTTATTATCAATGC
r2g-extend-F    GCCGCAGCAAGCCAAAGTGAGTTGAGTATAACGCAAATTTGCTACTGGTCCGATGGGTGCAATGGTCTGAATTACGGGCTAATTACAGGC
r2g-extend-R    AACGCAATCGCAACCGCTAAACCACTGGCCATGTGCACGAGTTTCATTCATTTCTCCTTATTATCAATGCACCAGCGGGCTAACTTTC
MAGE_*toS    t*a*aagagctcctcgcccttggatgcAAGCTCCTGCACAAACAACgATCCTCCACGCAGACGCAGAACCAAATGAAGGGTAGATTCTTTCT
asPCR-S-F    CGTCTGCGTGGAGGATC
asPCR-*-F    CGTCTGCGTGGAGGATA
pZE-Ubp1bbone4ClpP-F    TTCTGACCCATCGTAATTAAaagcttgatgggggatccca
pZE-Ubp1bbone4ClpP-R    tGGTATATCTCCTTTTATTATTAATTCACATCCTTCCCTGAAAT
clpPins-F    GTGAATTAATAATAAAAGGAGATATACCatgTCATACAGCGGCGA
clpPins-R    tgggatcccccatcaagcttTTAATTACGATGGGTCAGAATCG
pEVOLtRNA-p1-F    ctgccaacttactgatttagtgtatgatggtgtttttgagg
pEVOLtRNA-p1-R    gccgcttagttagccgtgcaaacttatatcgtatggggctg
pEVOLtRNA-p2-F    agccccatacgatataagtttgcacggctaactaagcggc
pEVOLtRNA-p2-R    ctcaaaaacaccatcatacactaaatcagtaagttggcagcatca
pZE-Ubp1bbone4ClpS-F    TGTGTACGCTAGAAAAAGCCTAAaagcttgatgggggatc
pZE-Ubp1bbone4ClpS-R    GTTCGTTTTACCcatGGTATATCTCCTTTTATTATTAATTCACAT
ClpSins-F    ATAATAAAAGGAGATATACCatgGGTAAAACGAACGACTG
ClpSins-R    gatcccccatcaagcttTTAGGCTTTTTCTAGCGTACACA
AARSlibraryins-F    tactgtttctccatacccgtttttttgggctaacaggaggaattagatct
pEVOLbbone4lib-R    agatctaattcctcctgttagcc
mutS_null_mut-2*    A*C*CCCATGAGTGCAATAGAAAATTTCGACGCCCATACGCCCATGATGCAGCAGTGATAGTCGCTGAAAGCCCAGCATCCCGAGATCCTGC
mutS_null_revert-2*    A*C*CCCATGAGTGCAATAGAAAATTTCGACGCCCATACGCCCATGATGCAGCAGTATCTCAGGCTGAAAGCCCAGCATCCCGAGATCCTGC
mutS-2_ascPCR_wt-F    CCATGATGCAGCAGTATCTCAG
mutS-2_ascPCR_mut-F    CCATGATGCAGCAGTGATAGTC
mutS-2_ascPCR-R    AGGTTGTCCTGACGCTCCTG
ASPCR-151UAG-F    GTATAATTTCAATTCCCATAATGTATAG
ASPCR-151UAC-F    GTATAATTTCAATTCCCATAATGTATAC
ASPCR-151-R    ctcgagcttatagagctcatc
Remove151UAG-MAGE_corrected    c*t*taaaattcgccttaatgccattcttttgcttatctgcggtaatgtatacattatgggaattgaaattatactccagcttatggccgag
ClpS.inact-MAGE    C*T*TTTTCTTCCGCCAGTTGATCAAAGTCCAGCCAGTCGTTCtaTTatCaCATTGTCAGTTATCATCTTCGGTTACGGTTATCGGCAGAAC
ASPCR-ClpS_WT-F    CCGATAACCGTAACCGAAGATGATAACTGACAATGG
ASPCR-ClpS.inact-F    CCGATAACCGTAACCGAAGATGATAACTGACAATGT
ASPCR-ClpS-R    CGTACTTGTTCACCATCGCCACTTTGGT
pZE-U-bbone4ClpS2_At-F    CGACTGAGCCCGAGGAGTAAaagcttgatgggggatccca
pZE-U-bbone4ClpS2_At-R    TCAACAGGACTATCAGACATGGTATATCTCCTTTTATTATTAATTCACATCC
ClpS2_At-ins-F    ATAATAAAAGGAGATATACCATGTCTGATAGTCCTGTTGACTT
ClpS2_At-ins-R    tgggatcccccatcaagcttTTACTCCTCGGGCTCAGTCG
ClpS_M40A-F    ATGATGATTACACTCCGGCGGAGTTTGTTATTGACGTGT
ClpS_M40A-R    CGTCAATAACAAACTCCGCCGGAGTGTAATCATCATTGAC
pOSIPbbone-F    taacctaaactgacaggcat
pOSIPbbone-R    ttccgatccccaattcct
pEVOL-araC-seq-1    GGATCATTTTGCGCTTCAG
pEVOL-araC-seq-2    GAATATAACCTTTCATTCCC
pEVOLCmR-seq-R    caacagtactgcgatgag
upstreamClpS-F    GCAAATAAGCTCTTGTCAGC
ClpS_L32-NTC-F    CATCTATGTATAAAGTGATANTCGTCAATGATGATTACACTCCG
ClpS_32-R    TATCACTTTATACATAGATG
ClpS-V43-NTT-F    ATTACACTCCGATGGAGTTTNTTATTGACGTGTTACAAAAATTC
ClpS_43-R    AAACTCCATCGGAGTGTAAT
ClpS_V65-NTT-F    CAACGCAATTGATGCTCGCTNTTCACTACCAGGGGAAGG
ClpS_65-R    AGCGAGCATCAATTGCGTTG
ClpS_L99-NTC-F    CGAGGGAGAATGAGCATCCANTCCTGTGTACGCTAGAAAAAGC
ClpS_99-R    TGGATGCTCATTCTCCCTCG
Alt_ClpS-R_forL99    gcggatttgtcctactcag
AARS-inducible-only-F    gctaacaggaggaattagatct
AARS-inducible-only-R    ttgataatctaacaaggattatggg
pEVOLbbone-Ind-only-F    cccataatccttgttagattatcaaaggcattttgctattaaggg
pEVOL-bbone-ind-only-R    agatctaattcctcctgttagc
protosens-bbone-F    TAACTCGAGGCTGTTTTGG
protosens-bbone-R    CATATGTATATCTCCTTGTGCATC
Ubp1ClpS4protosens-F    GATGCACAAGGAGATATACATATGGGGAGTGGGTCTTTCAT
Ubp1ClpS4protosens-R    CCAAAACAGCCTCGAGTTAGGCTTTTTCTAGCGTACA
pAzFRS.1.t1-ins-F    acccgatcatgcaggttaacGTTATGcactacGATggtgt
pAzFRS.1.t1-ins-R    tcaccaccgaatttttccggACCtttgatggtcagcg
bbone4pAzFRS.1.t1-F    ccggaaaaattcggtggtga
bbone4pAzFRS.1.t1-R    gttaacctgcatgatcgggt
pZEbbone4tetR-F    acgctctcctgagtaggac
pZEbbone4tetR-R    tcaccgacaaacaacagataaaac
TetR-ins-F    tatctgttgtttgtcggtgaacgtctcattttcgccagat
TetR-ins-R    gtcctactcaggagagcgtagtgtcaactttatggctagc
libraryINS-seq-R    CGCATCAGGCAATTTAGC
pEVOLbbone4libv2-F    ctgcagtttcaaacgctaaattg
AARSlibraryinsv2-R    taggcctgataagcgtagcgcatcaggcaatttagcgtttgaaactgcag
pAcFRS.2.t1_corrected_ins-R    TAGCGTTTGAAACTGCAGTTATAATCTCTTTCTAATTGGCTCTAA
pEVOLbbone4Kan-F    cattttagcttccttagctcctg
pEVOLbbone4Kan-R    taatttttttaaggcagttattggtgc
KanRins4pEVOL-F    taactgccttaaaaaaattagaagaactcgtcaagaaggc
KanRins4pEVOL-R    gagctaaggaagctaaaatgattgaacaagatggattgcac
TrpRS-int-seq    ACAAAACGCCATGTCTTATC
TrpRS-int2-seq    GTTTACGAAATGGTTGCAAG
TrpRS-int3-seq    CTTCATCGCGAATTAGGC
pZE-Ub-bbone-F    caccaccaccaccaccac
pZE-Ub-bbone-R    AAGCTCCTGCACAAACAAC
pOSIPbbone-F    taacctaaactgacaggcat
pOSIPbbone-R    ttccgatccccaattcct
Ubi-NNN-GFP-F    TGCGTCTGCGTGGAGGANNNTTGTTTGTGCAGGAGCTTGC
Ubi-NNN-GFP-R    TCCTCCACGCAGACGC
Reporter2pOSIP-F    AGGAATTGGGGATCGGAATCCCTATCAGTGATAGAGATTGAC
Reporter2pOSIP-R    ATGCCTGTCAGTTTAGGTTATCACCGACAAACAACAGATAAAAC
Ubiquitin-seq-F    CAGGTAAACAACTGGAGG
pET20.R    ATCATTTCGAATTCGTCCATATGTATATCTCCTTCTTAAAGTTAAACAAAATTATTTCTAGAGGG
pET20.F2    CGATCCGTAAACGTCTGGCGCTTGCGGCCGCACT
BipRS.F    TTTAAGAAGGAGATATACATATGGACGAATTCGAAATGATCAAACGTAACAC
BipRS.R2    TGCTCGAGTGCGGCCGCAAGCGCCAGACGTTTACGGATCGG
pUCbip_F    AAATCCCCTCCGCCGGACCAGGCATAAGCTTGGCG
pUCbip_R    CCCTGCTGAACTACCGCCGGTATAGTGAGTCGTATTAGGATCCCCG
tBip_F    TCCTAATACGACTCACTATACCGGCGGTAGTTCAGCAGGGCAGAACGGCGGACTCTAAATCCGCATGGCAGGGGTTCAAATCCCCTCCGCCGGACCAGGCATAAGCTTGGCGTAATC
tBip_R    GATTACGCCAAGCTTATGCCTGGTCCGGCGGAGGGGATTTGAACCCCTGCCATGCGGATTTAGAGTCCGCCGTTCTGCCCTGCTGAACTACCGCCGGTATAGTGAGTCGTATTAGGA
tBip9_F    TCCTAATACGACTCACTATACCGGCGGTAGTTCAGCAGGGCAGAACGGAGGACTCTAAATCCGCATGGCAGGGGTTCAAATCCCCTCCGCCGGACCAGGCATAAGCTTGGCGTAATC
tBip9_R    GATTACGCCAAGCTTATGCCTGGTCCGGCGGAGGGGATTTGAACCCCTGCCATGCGGATTTAGAGTCCTCCGTTCTGCCCTGCTGAACTACCGCCGGTATAGTGAGTCGTATTAGGA
tBip10_F    TCCTAATACGACTCACTATACCGGCGGTAGTTCAGCAGGGCAGAACGGCGGACTCTAAATCCGCATGGCATGGGTTCAAATCCCCTCCGCCGGACCAGGCATAAGCTTGGCGTAATC
tBip10_R    GATTACGCCAAGCTTATGCCTGGTCCGGCGGAGGGGATTTGAACCCATGCCATGCGGATTTAGAGTCCGCCGTTCTGCCCTGCTGAACTACCGCCGGTATAGTGAGTCGTATTAGGA

Culture Conditions.

Cultures for general culturing, for experiments in Figure 2, for FACS screening, and

for biocontainment escape assays were grown in LB-Lennox medium (LBL: 10 g/L bacto

tryptone, 5 g/L sodium chloride, 5 g/L yeast extract). Cultures for all other experiments in

Figures 2 and 3 were grown in 2X YT medium (2XYT: 16 g/L bacto tryptone, 10 g/L bacto

yeast extract, 5 g/L sodium chloride) given improved observed final culture densities

compared to LBL upon expression of ClpS variants. Unless otherwise indicated, all cultures

were grown in biological triplicate in 96-well deep-well plates in 300 µL culture volumes at 34 °C and 400 rpm.

nsAA Incorporation Assays.

Strains harboring integrated GFP reporters and AARS/tRNA plasmids were inoculated from frozen stocks in biological triplicate and grown to confluence overnight in deep well plates. Experimental cultures were inoculated at 1:100 dilution in either LBL or

2XYT media supplemented with chloramphenicol, arabinose, and the appropriate nsAA.

Cultures were incubated at 34 °C to an OD600 of 0.5–0.8 in a shaking plate incubator at 400

rpm (~4-5 h). GFP expression was induced by addition of anhydrotetracycline, and cells

were incubated at 34 °C for an additional 16-20 h before measurement.

All assays were performed in 96-well plate format. Cells were centrifuged at

5,000g for 3 min, washed with PBS, and resuspended in PBS after a second spin. GFP fluorescence was measured on a Biotek spectrophotometric plate reader using excitation and emission wavelengths of 485 and 525 nm (Gain = 80). Fluorescence signals were corrected for autofluorescence as a linear function of OD600 using the parent C321.∆A strain that does not contain a reporter. Fluorescence was then normalized by the OD600

reading to obtain FL/OD.
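The autofluorescence correction and normalization described above can be sketched in Python as follows. This is a minimal illustration that assumes autofluorescence is fitted by ordinary least squares on wells of the reporter-free parent strain; the function names and all numeric values are hypothetical and do not come from the original data.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fl_per_od(fluorescence, od600, slope, intercept):
    """Subtract OD-dependent autofluorescence, then normalize by OD600."""
    corrected = fluorescence - (slope * od600 + intercept)
    return corrected / od600

# Hypothetical plate-reader values for the reporter-free parent strain.
parent_od = [0.2, 0.4, 0.6, 0.8]
parent_fl = [150.0, 250.0, 350.0, 450.0]
slope, intercept = fit_line(parent_od, parent_fl)

# Hypothetical reporter well: raw fluorescence 5300 at OD600 0.5.
print(fl_per_od(5300.0, 0.5, slope, intercept))
```

Fitting the background against OD600, rather than subtracting a single blank value, keeps the correction proportional to cell density in each well.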

Chemicals.

nsAAs and SAAs used in this study were purchased from PepTech Corporation,

Sigma Aldrich, Santa Cruz Biotechnology, Bachem, and Toronto Research Chemicals. The following amino acids were purchased: L-4,4-Biphenylalanine (BipA), L-4-

Benzoylphenylalanine (pBnzylF), O-tert-Butyl-L-tyrosine (tBtylY), L-2-Naphthylalanine

(NapA), L-4-Acetylphenylalanine (pAcF), L-4-Iodophenylalanine (pIF), L-4-

Bromophenylalanine (pBrF), L-4-Chlorophenylalanine (pClF), L-4-Fluorophenylalanine (pFF),

L-4-Azidophenylalanine (pAzF), L-4-Nitrophenylalanine (pNitroF), L-3-Iodophenylalanine

(mIF), L-phenylalanine, L-tyrosine, L-tryptophan, and 5-Hydroxytryptophan (5OHW).

Solutions of amino acids (50 or 100 mM) were made in 10-50 mM NaOH.

Minimal Media SAA Spiking Experiments.

Minimal media adapted C321.ΔA strains225 harboring either (i) pZE21/Ub-M-sfGFP_151UAG only, (ii) pZE21/Ub-M-sfGFP_151UAG and pEVOL/Mj tRNA^Tyr_CUA opt, (iii) pZE21/Ub-M-sfGFP_151UAG and pEVOL/bipARS_WT-tRNA_WT, or (iv) pZE21/Ub-M-sfGFP_151UAG and pEVOL/bipARS_10-tRNA_10 were inoculated from frozen stocks in at least experimental duplicates. A 1X M9 salt medium containing 6.78 g/L Na2HPO4∙7H2O,

3 g/L KH2PO4, 1 g/L NH4Cl, and 0.5 g/L NaCl, supplemented with 2 mM MgSO4, 0.1 mM

CaCl2, 1% glycerol, trace elements, 0.25 μg/L D-biotin, and carbenicillin was used as the culture medium. The trace element solution (100X) used contained 5 g/L EDTA, 0.83 g/L

FeCl3∙6H2O, 84 mg/L ZnCl2, 10 mg/L CoCl2∙6H2O, 13 mg/L CuCl2∙2H2O, 1.6 mg/L MnCl2∙2H2O

and 10 mg/L H3BO3 dissolved in water as previously used for metabolic engineering226.

Inocula were grown to confluence overnight in deep 96-well plates containing medium supplemented with 0.2% arabinose and chloramphenicol and/or kanamycin. Experimental cultures were inoculated at 1:7 dilution in the same media supplemented with each of the

20 standard amino acids or bipA to 1 mM or 100 uM, respectively. Cultures were incubated at 34 °C to an OD600 of 0.5–0.8 in a shaking plate incubator at 1050 rpm (~4-5 h). GFP

expression was induced by addition of anhydrotetracycline, and cells were incubated at 34

°C for an additional 16-20 h before measurement. All assays were performed in 96-well

plate format. Cells were centrifuged at 5,000g for 5 min, washed with 1 x PBS, and resuspended in 1 x PBS after a second spin. GFP fluorescence was measured on a Biotek spectrophotometric plate reader using excitation and emission wavelengths of 485 and

525 nm. Fluorescence was then normalized by the OD600 reading to obtain FL/OD.

Average normalized FL/OD from 3 independent experiments were plotted.

Library Generation.

Error-prone PCR (EP-PCR) was performed using the GeneMorph II Random

Mutagenesis Kit (Stratagene Catalog #200550), following manufacturer instructions to

obtain an average of approximately 2-4 DNA mutations per library member. To generate

libraries of MjTyrRS-derived AARSs, roughly 175 ng of PCR template was used in each 25

uL of PCR mix containing primers that have roughly 40 base pairs of homology flanking

the AARS coding region. The reaction mixture was subject to 30 cycles with Tm of 63°C

and extension time of 1 min. Four separate 25 uL EP-PCR reactions were performed per

AARS and then pooled. Plasmid backbone PCRs were performed using KOD Xtreme Hot

Start Polymerase (Millipore Catalog #71795). Both PCR products were isolated by 1% agarose gel electrophoresis, DpnI digested, and Gibson assembled in 8 parallel 20 uL volumes per library. Assemblies were pooled, washed by ethanol precipitation, and resuspended in 50 μL of dH2O, which was drop dialyzed (EMD Millipore, Billerica, MA) and electroporated into “E. cloni” supreme cells (Lucigen, Middleton, WI). Libraries were expanded in culture and miniprepped (Qiagen, Valencia, CA). 1 μg of library was drop dialyzed and electroporated into C321.Nend for subsequent FACS experiments. Colony counts of dilutions of each transformation plated on appropriate antibiotic within one doubling time after transformation revealed library sizes of roughly 1 x 10^6 for AARS libraries in E. cloni hosts and 1 x 10^7 in C321.Nend hosts. 20 colonies were picked from

each plate to confirm library diversity and 5-15% parent construct was observed.

Transformation of Gibson assembly into E. cloni hosts was the bottleneck that

determined library size, and efforts were made during all subsequent steps to ensure

oversampling.
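The library-size estimate and the idea of oversampling can be illustrated with a short Python sketch. It assumes library members are equally represented, so that sampling follows a Poisson approximation; the plate counts, volumes, and function names below are hypothetical.

```python
import math

def library_size(colonies, dilution_factor, plated_volume_ul, total_volume_ul):
    """Estimate total transformants from a colony count on one dilution plate."""
    return colonies * dilution_factor * (total_volume_ul / plated_volume_ul)

def expected_coverage(cells_sampled, library_members):
    """Fraction of library members seen at least once (Poisson approximation)."""
    return 1.0 - math.exp(-cells_sampled / library_members)

# Hypothetical plate: 100 colonies from 100 uL of a 1:1,000 dilution of a 1 mL recovery.
size = library_size(100, 1_000, 100, 1_000)
print(f"estimated library size: {size:.1e}")

# Carrying a 1e6-member library forward with 1e7 cells (10-fold oversampling).
print(f"expected coverage: {expected_coverage(1e7, 1e6):.5f}")
```

Under these assumptions, roughly 10-fold oversampling at each downstream step leaves very few library members unsampled, which is why the smallest transformation sets the effective library size.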

Flow Cytometry and Cell Sorting.

AARS libraries were subject to three rounds of fluorescence activated sorting in a

Beckman Coulter MoFlo Astrios EQ cell sorter. Prior to each round, the nsAA incorporation assay procedure detailed above was followed such that cells would express

GFP reporter proportional to the activity of the AARS library member. One notable

deviation from that procedure was the use of a higher and variable inoculum volume, up to 25 uL, to avoid bottlenecking the library. Cells displaying the top 0.5% of fluorescence activation (50k cells) were collected after Round 1, expanded overnight, and used to inoculate experimental cultures for the next round. Because the next round was a negative screening round, the desired nsAA was not added into culture medium. The rest of the nsAA incorporation assay procedure was followed to eliminate cells that exhibited fluorescence due to promiscuous AARS activity on standard amino acids. In the second sort, cells displaying the lowest 10%-20% of visible fluorescence (500k cells) were collected. Cells passing the second round were expanded overnight and used to inoculate the third and final round of sorting. The experimental cultures for the third round were treated as the first round and were sorted for the upper 0.05% of fluorescence activation

(1k cells). The final cells collected were expanded overnight and plated for sequencing and downstream testing.
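The three gating fractions described above can be illustrated with a small Python sketch. It applies each gate to one simulated population purely to show how the fractions translate into collected events; in the actual workflow, cells are regrown and re-induced between rounds. The log-normal distribution and function names are assumptions made for illustration.

```python
import random

def gate_top(values, fraction):
    """Return events in the brightest `fraction` of the population."""
    k = max(1, round(len(values) * fraction))
    return sorted(values, reverse=True)[:k]

def gate_bottom(values, fraction):
    """Return events in the dimmest `fraction` of the population."""
    k = max(1, round(len(values) * fraction))
    return sorted(values)[:k]

random.seed(0)
# Hypothetical log-normal fluorescence values standing in for sorter events.
events = [random.lognormvariate(6, 1) for _ in range(100_000)]

round1 = gate_top(events, 0.005)     # positive sort with nsAA: top 0.5%
round2 = gate_bottom(events, 0.10)   # negative sort without nsAA: lowest 10%
round3 = gate_top(events, 0.0005)    # final positive sort: top 0.05%
print(len(round1), len(round2), len(round3))
```

Alternating a stringent positive gate with a permissive negative gate enriches for variants that are active on the nsAA while discarding those that charge standard amino acids.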

Libraries were frozen at each stage before and after sorting. FlowJo X software was used to analyze the flow cytometry data. Constructs of interest were grown overnight, miniprepped, and transformed into C321.∆A.Ubiq-UAG-sfGFP for further analysis in plate

reader assays.

Reporter Purification.

Strains harboring integrated GFP reporters and AARS/tRNA plasmids were inoculated from frozen stocks and grown to confluence overnight in 5 mL 2XYT containing chloramphenicol. Saturated cultures were used to inoculate 500 mL experimental cultures

of 2XYT supplemented with chloramphenicol, arabinose, and appropriate nsAAs. Cultures

were incubated at 34 °C to an OD600 of 0.5–0.8 in a shaking incubator at 250 rpm. GFP

expression was induced by addition of anhydrotetracycline, and cells were incubated at

34 °C for an additional 24 h before measurement. Cells were centrifuged in a Sorvall RC 5C

Plus at 10,000 g for 20 minutes. Pellets were frozen at -20 °C before lysis and purification.

Lysis of resuspended pellets was performed under denaturing conditions in 10 mL 7 M

urea, 0.1 M Na2PO4, 0.01 M Tris-Cl, pH 8.0 buffer with 450 units of Benzonase (Novagen,

cat. no. 70664-3) using 15 minutes of sonication in ice using a QSonica Q125 sonicator.

Lysate was distributed into microcentrifuge tubes and centrifuged for 20 minutes at

20,000 g at room temperature, and then protein-containing supernatant was removed. 2

mL supernatant with 7.5 uM imidazole was added to 250 uL Ni-NTA resin (Qiagen Cat no.

30210) and equilibrated at 4°C overnight. Columns were washed with 7x 1 mL washes using 8 M urea, 0.1 M Na2PO4, 0.01 M Tris-Cl. Wash 1 and 2 were adjusted to pH 6.3 and

contained no imidazole. Washes 3-7 were adjusted to pH 6.1 and

contained imidazole at concentrations of 10 mM, 25 mM, 40 mM, 60 mM and 80 mM

respectively. Protein was eluted with two 150 uL elutions using elution buffer (8 M urea,

0.1 M Na2PO4, 0.01 M Tris-Cl, pH 4.5, 300 mM imidazole). Gels demonstrated that wash 5

eluted the protein, and for several samples the wash 5 fraction was concentrated ~20X

using Amicon Ultra 0.5 mL 10K spin concentrators. Protein gels were loaded with 30 uL

wash or elution volumes along with 10 uL Nu-PAGE loading dye in Nu-PAGE 10% Bis-Tris

Gels (ThermoFisher Cat. no NP0301). Protein gels were run at 180 V for 1 h, washed 3x

with DI water, and stained with coomassie (Invitrogen Cat. no LC6060) for one hour. Gels were

destained overnight in water on a shaker at room temperature and images were taken

with a BioRad ChemiDoc MP imaging system.

Mass Spectrometry.

Samples were submitted for single LC-MS/MS experiments that were performed on a LTQ Orbitrap Elite (Thermo Fisher) equipped with a Waters (Milford, MA)

NanoAcquity HPLC pump. Trypsin-digested peptides were separated onto a 100 µm inner diameter microcapillary trapping column packed first with approximately 5 cm of C18

Reprosil resin (5 µm, 100 Å, Dr. Maisch GmbH, Germany) followed by an analytical column of

~20 cm of Reprosil resin (1.8 µm, 200 Å, Dr. Maisch GmbH, Germany). Separation was achieved through applying a gradient from 5–27% ACN in 0.1% formic acid over 90 min at

200 nl min−1.

Electrospray ionization was enabled through applying a voltage of 2.0 kV using a home-made electrode junction at the end of the microcapillary column and sprayed from fused silica pico tips (New Objective, MA). The LTQ Orbitrap Elite was operated in the data-dependent mode for the mass spectrometry methods. The mass spectrometry survey scan was performed in the Orbitrap in the range of 395–1,800 m/z at a resolution of 6 × 10^4, followed by the selection of the twenty most intense ions (TOP20) for CID-MS2

fragmentation in the Ion trap using a precursor isolation width window of 2 m/z, AGC

setting of 10,000, and a maximum ion accumulation of 200 ms.

Singly charged ion species were not subjected to CID fragmentation. Normalized collision energy was set to 35 V and an activation time of 10 ms, AGC was set to 50,000, the maximum ion time was 200 ms. Ions in a 10 ppm m/z window around ions selected

for MS2 were excluded from further selection for fragmentation for 60 s.
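The dynamic exclusion rule above can be sketched in Python as follows. The sketch assumes the 10 ppm window is applied symmetrically (plus or minus 10 ppm) around each selected precursor, which the text does not specify; the class and function names are hypothetical and do not represent instrument software.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds of a +/- ppm window around mz."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta

class DynamicExclusion:
    """Skip precursors near a recently fragmented m/z for a fixed duration."""

    def __init__(self, ppm=10.0, duration_s=60.0):
        self.ppm = ppm
        self.duration_s = duration_s
        self._entries = []  # list of (mz, time_selected_s)

    def is_excluded(self, mz, now_s):
        for ref_mz, selected_s in self._entries:
            low, high = ppm_window(ref_mz, self.ppm)
            if now_s - selected_s < self.duration_s and low <= mz <= high:
                return True
        return False

    def select(self, mz, now_s):
        """Record a precursor as fragmented; return False if it was excluded."""
        if self.is_excluded(mz, now_s):
            return False
        self._entries.append((mz, now_s))
        return True

exclusion = DynamicExclusion()
print(exclusion.select(500.000, 0.0))   # first selection of this precursor
print(exclusion.select(500.004, 30.0))  # within 10 ppm and within 60 s
print(exclusion.select(500.004, 61.0))  # exclusion has expired
```

At m/z 500, a 10 ppm tolerance corresponds to 0.005 m/z, so the second call falls inside the exclusion window while the third falls outside the 60 s duration.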

Mass Spectrometry Analysis.

Raw data were submitted for analysis in Proteome Discoverer 2.1.0.81 (Thermo

Scientific) software. Assignment of MS/MS spectra was performed using the Sequest HT algorithm by searching the data against a user-provided protein sequence database as well as all entries from the E. coli Uniprot database and other known contaminants such as human keratins and common lab contaminants. Sequest HT searches were performed using a 20 ppm precursor ion tolerance and requiring each peptide's N-/C-termini to adhere

with Trypsin protease specificity while allowing up to two missed cleavages. Cysteine

carbamidomethyl (+57.021) was set as a static modification while methionine oxidation

(+15.99492 Da) was set as a variable modification. MS2 spectra assignment false discovery

rate (FDR) of 1% on protein level was achieved by applying the target-decoy database search. Filtering was performed using Percolator. For quantification, a 0.02 m/z window centered on the theoretical m/z value of each of the six reporter ions was used, and the intensity of the signal closest to the theoretical m/z value was recorded. Reporter ion intensities were exported in the result file of the Proteome Discoverer 2.1 search engine as Excel tables. All fold changes were analyzed after normalization between samples based on total unique peptide ion signal.
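The target-decoy idea behind the 1% FDR threshold can be illustrated with a simple counting estimate in Python. This is only a sketch: Percolator itself rescores matches with a semi-supervised model, which is not reproduced here, and the function name and toy scores are hypothetical.

```python
def score_cutoff_at_fdr(matches, max_fdr=0.01):
    """Return the lowest score cutoff whose decoy/target ratio stays <= max_fdr.

    `matches` is a list of (score, is_decoy) tuples; higher scores are better.
    Returns None if no cutoff satisfies the requested FDR.
    """
    targets = decoys = 0
    best = None
    for score, is_decoy in sorted(matches, key=lambda m: m[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything accepted at this cutoff.
        if targets and decoys / targets <= max_fdr:
            best = score
    return best

# Toy data: 100 target matches scored 100..1 and two decoy matches.
matches = [(float(s), False) for s in range(100, 0, -1)]
matches += [(50.5, True), (20.5, True)]
print(score_cutoff_at_fdr(matches))
```

Walking down the ranked list and tracking the decoy-to-target ratio gives the most permissive score cutoff that still meets the requested error rate.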

In vitro Aminoacylation Assays.

Wild-type BipARS, BipARS9, and BipARS10 DNA template was amplified from the pEVOL.BipARS plasmid and cloned into pET20b using Gibson assembly (New England

Biolabs) with primers pET20.F2 and pET20.R for linearization of pET20b and BipRS.F and

BipRS.R2 for amplification of BipARS. The BipARS.pET20b plasmids were transformed into

BL21(DE3) cells. A 25-mL overnight culture was used to inoculate 500 mL of fresh LB

media containing ampicillin. Cells were grown at 37 °C to an OD600 of approximately 0.6, and protein overexpression was induced with 1 mM IPTG for 4 h. Cells were harvested by centrifugation at 4 °C for 20 minutes at 6000 rpm. Cells were lysed using 50 mM Tris

(pH 7.5), 300 mM NaCl, 3 mM 2-mercaptoethanol and 5 mM imidazole followed by sonication. Lysed cells were centrifuged at 18000 x g for 1 h at 4 °C. The supernatant was run through TALON resin and BipARS was eluted using an imidazole concentration gradient. The proteins were stored in 50 mM HEPES (pH 7.3), 50 mM KCl, and 1 mM dithiothreitol (DTT). Protein concentration was calculated using the Bradford assay

(BioRad).

The tRNA genes were cloned into pUC18 using Gibson Assembly. pUC18 was linearized using primers pUCbip_F and pUCbip_R. The tRNA gene fragment was prepared

by annealing 2 uM of primers tBip_F and tBip_R for WT tRNA, tBip9_F and tBip9_R for tRNA variant 9, and tBip10_F and tBip10_R for tRNA variant 10. tRNAs were obtained by in vitro transcription using T7 RNA polymerase. ~100 ug of resulting plasmid was digested with BstNI overnight at 55 °C, and the digestion reaction was used to start in vitro transcription by adding transcription buffer (40 mM Tris-HCl, pH 8, 6 mM MgCl2, 1 mM spermidine, 0.01% Triton, 0.005 mg/mL BSA, and 5 mM dithiothreitol), 4 mM NTPs (ATP,

GTP, UTP, and CTP), 20 mM MgCl2, 5 mM DTT, 2 units/mg of pyrophosphatase (Roche), and 0.75 mg/mL T7 RNA polymerase. The reaction was incubated for 6-7 h at 37 °C. The tRNA was purified using an 8 M urea/12% acrylamide gel and extracted from the gel using

a solution containing 0.5 M sodium acetate and 1 mM EDTA (pH 8) overnight at 30 °C

followed by ethanol precipitation.

For aminoacylation reactions, tRNAs were radiolabeled at the 3’-end using CCA-adding enzyme as previously described227. Reactions were carried out with 5 μM tRNA (with a trace amount of 32P-labeled tRNA), 2.5 mM amino acid, and 5 μM BipARS in buffer containing 50 mM HEPES (pH 7.3), 4 mM ATP, 20 mM MgCl2, 0.1 mg/mL BSA, and 1 mM DTT. Reactions were incubated for 30 min at 37 °C. Right after enzyme addition and again after 30 min, 2 μL of reaction mixture was quenched in 5 μL of 0.1 U/μL P1 nuclease (Sigma) in 200 mM sodium acetate (pH 5). The quenched time points were incubated at room temperature for 1 h, and 1 μL of each solution was run on PEI-cellulose thin-layer chromatography sheets. The fraction of aminoacylated tRNA was determined as described previously227. All assays were repeated three times. Figures were generated using Prism 7 (GraphPad Software).

Biocontainment Escape Frequency Assays.

Escape assays were performed nearly as previously described76. All strains were grown in permissive conditions and harvested in late exponential phase. Cells were washed twice in LB and resuspended in LB. Viable CFU were calculated from the mean and standard error of the mean (SEM) of three technical replicates of tenfold serial dilutions on permissive media. Three technical replicates were plated on non-permissive media and monitored for 7 days. Synthetic auxotrophs were plated on two different non-permissive media conditions: SCA (LB with SDS, chloramphenicol, and arabinose) for previously published strains, and KA (LB with kanamycin and arabinose) for strains generated in this study.

The latter strains were isolated by transformation with pEVOL vectors harboring kanamycin resistance markers instead of chloramphenicol resistance markers. Passaging and replica plating were used to ensure that isolated strains had lost chloramphenicol resistance and thus the original OTS construct used in the previous study. If synthetic auxotrophs exhibited escape frequencies above the detection limit (lawns) on non-permissive media at days 2, 5, or 7, escape frequencies for those days were calculated from additional platings at lower density. The SEM across technical replicates of the cumulative escape frequency was calculated as previously indicated.
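The arithmetic behind an escape frequency can be sketched as follows. This is a minimal illustration, not the analysis script used in this work: it assumes that the escape frequency is the number of colonies on non-permissive media divided by the viable CFU plated, and every function name, colony count, and plated volume is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return the mean and standard error of the mean of replicate values."""
    sem = stdev(values) / sqrt(len(values)) if len(values) > 1 else 0.0
    return mean(values), sem


def viable_cfu_per_ml(colony_counts, dilution_factor, plated_volume_ml):
    """Viable titer (CFU/mL) from replicate counts at one countable dilution."""
    titers = [c * dilution_factor / plated_volume_ml for c in colony_counts]
    return mean_and_sem(titers)


def escape_frequency(escapee_counts, viable_cfu_plated):
    """Escapees per viable CFU plated, across replicate non-permissive plates."""
    return mean_and_sem([n / viable_cfu_plated for n in escapee_counts])


# Hypothetical numbers, for illustration only: three replicate counts
# from the 10^-6 dilution, with 0.1 mL spread per permissive plate.
titer, titer_sem = viable_cfu_per_ml([48, 52, 50], dilution_factor=1e6,
                                     plated_volume_ml=0.1)
# Assumption: 1 mL of washed culture was spread on each non-permissive plate.
freq, freq_sem = escape_frequency([1, 0, 2], viable_cfu_plated=titer * 1.0)
print(f"titer = {titer:.2e} CFU/mL, escape frequency = {freq:.1e}")
```

When a plate shows a lawn, the same calculation would be applied to the additional platings at lower density, using the correspondingly smaller number of viable CFU plated.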

Biocontained strain doubling time measurement.

Doubling times for biocontained strains were measured in triplicate by plate reader as indicated earlier for growth assays. Doubling time assays for biocontained strains in the presence of only non-cognate nsAAs were performed as follows: cells grown to mid-log in permissive media were washed twice in LB and diluted to OD ~0.1 before 300-fold dilution into three 150 μL volumes of LB+nsAA for each nsAA. These cultures were incubated in the Eon plate reader at conditions described earlier.
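One common way to turn a plate-reader trace into a doubling time is a log-linear fit over the exponential phase. The sketch below assumes that approach; the fitting window (`od_min`, `od_max`) and the synthetic data are illustrative choices, not values taken from this work.

```python
from math import log


def doubling_time(times_h, od_values, od_min=0.02, od_max=0.3):
    """Estimate doubling time (hours) from the log-linear part of a growth curve.

    Only points with od_min <= OD <= od_max are used; this window is an
    assumed stand-in for the exponential phase.
    """
    pts = [(t, log(od)) for t, od in zip(times_h, od_values)
           if od_min <= od <= od_max]
    if len(pts) < 3:
        raise ValueError("too few points in the exponential window")
    n = len(pts)
    t_bar = sum(t for t, _ in pts) / n
    y_bar = sum(y for _, y in pts) / n
    sxx = sum((t - t_bar) ** 2 for t, _ in pts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in pts)
    slope = sxy / sxx  # specific growth rate, per hour
    return log(2) / slope


# Synthetic trace that doubles every 0.75 h (illustrative data only).
times = [0.25 * i for i in range(17)]
ods = [0.02 * 2 ** (t / 0.75) for t in times]
td = doubling_time(times, ods)
print(f"doubling time = {td * 60:.1f} min")
```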


Chapter 3 Appendix

Funding, COI and acknowledgements:

Funding Sources:

This work was supported by a Life Sciences Research Foundation Fellowship awarded to E.K. Work in the Church laboratory was supported by US Department of Energy Grant DE-FG02-02ER63445.

Conflict of Interest Disclosure:

G.M.C. has related financial interests in 64-x, EnEvolv, and GRO Biosciences. For a complete list

of G.M.C.’s financial interests, please visit arep.med.harvard.edu/gmc/tech.html.

Acknowledgements:

We thank J. Aach, T. Bernhardt, C. Fuqua, D. Boyd, H. C. Lim, E. M. Appleton, T. M. Wannier, M. Schubert, A. Kunjapur and J.A.B. Marchand for their help with the project design; J. Aach for his help improving the readability of the manuscript; C. Fuqua for providing A. tumefaciens plasmids and for his advice and assistance in experiments involving A. tumefaciens; A. Kunjapur for providing the plasmid constructs for the initial experiments; D. Boyd for sharing λ cI857 strains and for his help with phage lysis experiments; H. C. Lim for sharing the PopZ constructs and for his related insights; G. Kuziel for his advice and assistance with in vitro experiments; and J.A.B. Marchand for his advice with fluorescence cell-sorting experiments.

Methods

Reagents

Antibiotics and nsAAs were purchased from Sigma, except for AbK, which was purchased from TOCRIS. Apidaecin 1b was purchased from AnaSpec. Api137 was purchased from NovoPro Biosciences Inc. N-3-oxo-octanoyl-L-homoserine lactone (NHL) was purchased from Cayman Chemical, and stock solutions of it were made in ethyl acetate (acidified with 0.01% acetic acid) at 1 mg mL⁻¹. nsAA stock solutions were prepared in water with minimal base or acid (e.g. 0.3 M KOH to prepare a 0.2 M Bpa stock solution), except for the Cou stock, which was prepared in DMSO at concentrations of 100–200 mM. Aqueous apidaecin stocks (5–20 mg mL⁻¹) and nsAA stock solutions were filter-sterilized and stored at -20 °C before use. DNA oligonucleotides and gBlocks were synthesized by IDT.

In Vitro Protein Translation Assay.

The recombinant MjBpaRS was prepared as previously described148. tRNA^Tyr_CUA was prepared by in vitro transcription and purified as previously described228. PURExpress® and PURExpress® Δ RF123 Kit were purchased from NEB, and the cell-free translation experiments were set up following the manufacturer’s instructions, supplemented with 20 ng μL⁻¹ linearized DNA templates (T7-(UAG)0-sfGFP, T7-(UAG)1-sfGFP, or T7-(UAG)2-sfGFP, see also below), MjBpaRS (to 10 μM final) and tRNA^Tyr_CUA (to 5 μM final) in 5 μL reactions per condition. 4 μL of these reaction mixtures were transferred to a Corning® 384 Well flat bottom, low flange, white polystyrene assay plate, and relative fluorescence units for sfGFP were measured at excitation/emission wavelengths of 485 nm/528 nm using a Biotek spectrophotometric plate reader at 37 °C over 8 hours. The signal values were normalized to the peak fluorescence magnitude within an experiment, and the graph was plotted with the standard deviation between repeats shown as shading. Graphs were plotted and analyzed in Prism 8.2.1 for Windows, GraphPad Software, www.graphpad.com.
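The normalization and the shaded error band described above can be sketched in a few lines. This is an illustration only: it assumes that "peak fluorescence magnitude within an experiment" means the single highest reading across all traces of that experiment, and the function names and readings are hypothetical.

```python
from statistics import mean, stdev


def normalize_to_peak(repeats):
    """Scale every trace by the highest signal seen in the experiment.

    `repeats` is a list of equal-length fluorescence traces, one per repeat.
    """
    peak = max(max(trace) for trace in repeats)
    return [[value / peak for value in trace] for trace in repeats]


def mean_and_sd(repeats):
    """Per-time-point mean and sample standard deviation across repeats."""
    columns = list(zip(*repeats))
    return [mean(c) for c in columns], [stdev(c) for c in columns]


# Hypothetical raw readings (RFU) for three repeats at three time points.
raw = [[0, 500, 1000], [0, 400, 800], [0, 450, 900]]
normalized = normalize_to_peak(raw)
means, sds = mean_and_sd(normalized)
for m, s in zip(means, sds):
    print(f"{m:.3f} +/- {s:.3f}")
```

The `means` list gives the plotted line and `means[i] ± sds[i]` gives the bounds of the shaded region at each time point.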

Growth Media and Growth Conditions.

Unless otherwise noted, cultures were grown in 2xYT medium (16 g L⁻¹ bacto tryptone, 10 g L⁻¹ bacto yeast extract, 5 g L⁻¹ sodium chloride) supplemented with antibiotics to retain the plasmids. nsAA incorporation experiments with autoinduction (e.g. Figure 21) were done in GMML minimal media [1× M9 (Sigma-Aldrich M-6030) / 1 mM MgSO4 / 0.1 mM CaCl2 / 8.5 mM NaCl / 5 μM Fe2SO4 / 1% v/v glycerol / 0.3 mM leucine] supplemented with 10% 2xYT, 0.05% glucose and 0.05% arabinose. Typically, Agrobacterium tumefaciens C58 cells were grown in LB at 30 °C, C321 strains were grown at 34 °C, and the rest of the E. coli strains at 37 °C. To check the sensitivity of a given species to the apidaecins, overnight cultures were adjusted to OD600 ~0.5 and serially diluted (2 x 10-1 dilutions). 2 μL of each dilution was spotted on solid media (e.g. LB) containing Api137 at different concentrations up to 750 μg mL⁻¹.

Growth Curves.

Overnight cultures (grown in 2xYT) were diluted to OD600 = 0.05 into media with different concentrations of apidaecins, either 2xYT (for experiments in Figure 18 and 9b) or GMML supplemented with appropriate antibiotics, 10% 2xYT (final), 0.05% glucose and 0.05% arabinose (for experiments in Figure 24b, d, Figure 27g), in a Corning® 96 Well clear flat bottom plate. OD600 was recorded every minute using a Biotek spectrophotometric plate reader set to 30 °C with continuous shaking over at least 18 hours. At least three technical and two biological repeats were plotted (with the standard deviation between repeats shown as shading). Exceptions were the high apidaecin concentrations where the availability of the peptide was limiting, e.g. 1280 μg mL⁻¹ Api1b. The growth curves were analyzed in Prism 8.2.1 for Windows, GraphPad Software, www.graphpad.com. The growth parameters were estimated by fitting the growth data to logistic growth models, and the two-tailed P values were determined by t-test.
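For readers unfamiliar with the logistic growth model, a simplified fit can be sketched as below. This is not the Prism procedure used here, which performs nonlinear least squares: the sketch assumes the plateau OD `k` is supplied by the caller and fits the remaining parameters by linear regression on the transformed data, using ln(k/N - 1) = ln(k/n0 - 1) - r·t. All names and data are illustrative.

```python
from math import exp, log


def logistic(t, n0, k, r):
    """Logistic growth: OD at time t for initial OD n0, plateau k, rate r."""
    return k / (1.0 + (k / n0 - 1.0) * exp(-r * t))


def fit_logistic_linearized(times, ods, k):
    """Estimate (n0, r) for an assumed plateau k by linearizing the model."""
    # Points at or above the plateau cannot be log-transformed and are skipped.
    pts = [(t, log(k / od - 1.0)) for t, od in zip(times, ods) if 0.0 < od < k]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points below the plateau")
    t_bar = sum(t for t, _ in pts) / n
    y_bar = sum(y for _, y in pts) / n
    sxx = sum((t - t_bar) ** 2 for t, _ in pts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in pts)
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    r = -slope                          # growth rate, per hour
    n0 = k / (1.0 + exp(intercept))     # initial OD implied by the intercept
    return n0, r


# Noise-free synthetic curve over 18 h (illustrative parameters only).
times = [0.5 * i for i in range(37)]
ods = [logistic(t, n0=0.05, k=1.2, r=0.8) for t in times]
n0_fit, r_fit = fit_logistic_linearized(times, ods, k=1.2)
print(f"n0 = {n0_fit:.3f}, r = {r_fit:.3f} per h, "
      f"minimum doubling time = {log(2) / r_fit:.2f} h")
```

With noisy plate-reader data the linearization weights points near the plateau heavily, which is one reason a nonlinear fit is usually preferred in practice.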

Lambda Phage lysis assay

To induce the C321 λ cI857 lysogens, freezer stocks of the cells were streaked on LB agar plates and incubated overnight at 30 °C. Several colonies were screened for temperature sensitivity at 42 °C. Parallel liquid cultures were set up in LB supplemented with 5 mM MgSO4 at 30 °C. Overnight cultures from the temperature-sensitive isolates were diluted 1:100 in the same medium containing Api137 at the indicated concentrations. Once the cells reached OD600 ~0.1 (grown at 30 °C with good aeration), the temperature was shifted to 42 °C for 15 min. The cells were then diluted to OD600 ~0.05 in a Corning® 96 Well clear flat bottom plate. OD600 was recorded every minute using a Biotek spectrophotometric plate reader set to 37 °C with continuous shaking.

Cloning and strain engineering

For routine PCR and Gibson assembly procedures, Q5® High-Fidelity 2X Master Mix and Gibson Assembly® Master Mix from NEB were used and primers were designed following the manufacturer’s instructions. The T7-(UAG)2-sfGFP DNA template was generated by linearizing and amplifying the pBAD-Ub-UAG-sfGFP_151UAG plasmid with primers Pri1 and Pri2 (Table 4 & Table 8). The template was cleaned up and concentrated by phenol-chloroform extraction and ethanol precipitation before use in cell-free translation experiments. Routinely, new plasmids were constructed using parts from existing plasmids, e.g. p006-GFP-pBAD (Addgene Plasmid #108315) as the plasmid backbone for the new pBAD-PopZ plasmids (Table 8), or gBlocks (IDT) via Gibson assembly and cloning into NEB® 5-alpha Competent E. coli.

Of note, a shortened backbone from pDULE-ABK (Addgene Plasmid #49086, with total vector size of 7590 bp) was used to construct the new pDule plasmids, e.g. pDule-MbAbKRS-2xtRNA with total vector size of 4577 bp. The pDule-MbAbKRS-2xtRNA plasmid series contains two copies of tRNA^Pyl_CUA genes under proK and lpp promoters.

Simple site-directed mutagenesis of reporter plasmids, e.g. PopZ-(UAG)2-sfGFP to PopZ-(UAG)6-sfGFP, was performed using the Q5® Site-Directed Mutagenesis Kit from NEB® following the manufacturer’s instructions.

Strains used in the nsAA incorporation experiments were generated by transforming electrocompetent cells: BL21 (E. cloni EXPRESS BL21(DE3), Lucigen), DH10B (E. cloni 10G, Lucigen), or other strains, including E. coli Nissle 1917 and A. tumefaciens, that were made electrocompetent and handled as described229,230. Cultures from at least 3 separate colonies were frozen and used for the nsAA incorporation assays as biological replicates. Routinely, new sequences were verified via Sanger sequencing by Genewiz and NGS-based complete plasmid sequencing by MGH DNA Core (Table 8).
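When checking a sequenced reporter such as a (UAG)n-sfGFP construct, it can be useful to confirm how many in-frame amber codons the ORF carries. The sketch below is a hypothetical helper, not part of the workflow used here: it assumes the input is the coding strand of an ORF that starts at the start codon and ends at its terminal stop codon, and it uses short toy sequences in place of the constructs in Table 8.

```python
def in_frame_amber_codons(orf):
    """Return 0-based nucleotide positions of in-frame TAG codons.

    The final codon is treated as the terminal stop and is not reported.
    """
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    return [3 * i for i, codon in enumerate(codons[:-1]) if codon == "TAG"]


# Toy ORF: ATG GCA TAG GGT TAG TAA (two in-frame amber codons).
two_amber = in_frame_amber_codons("ATGGCATAGGGTTAGTAA")
# Toy ORF: ATG CTA GGC TAA (contains 'TAG' only out of frame).
no_amber = in_frame_amber_codons("ATGCTAGGCTAA")
print(two_amber, no_amber)
```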

Table 8 Sequences of new constructs from chapter 3 Promoter sequences – blue highlight; RBS sequences – purple highlight; relevant ORFs – red; stop codons underlined; terminator sequences – green highlight. In-frame amber stop codons are highlighted in black Na Sequence me T7- GTAATACGACTCACTATAGGGTTAACTTTAAGAAGGAGATATACATATGCAGATTTTTGTGAAGACTTTAACAGGTAAGACGATTACCCTGGAGG (UA TGGAGTCCTCGGACACCATCGATAATGTAAAATCAAAAATCCAAGATAAGGAAGGAATCCCTCCAGACCAGCAACGTCTGATTTTCGCAGGTAA ACAACTGGAGGATGGTCGCACGCTTTCGGACTACAACATCCAGAAAGAATCTACCCTTCATTTGGTTCTGCGTCTGCGTGGAGGATAGTTGTTTG G)2- TGCAGGAGCTTGCATCCAAGGGCGAGGAGCTCTTTACTGGCGTAGTACCAATTCTCGTAGAGCTCGATGGCGATGTAAATGGCCATAAGTTTTCC sfGF GTACGCGGCGAGGGCGAGGGCGATGCAACTAACGGCAAGCTCACTCTCAAGTTTATTTGTACTACTGGCAAGCTCCCAGTACCATGGCCAACTC P TCGTAACTACTCTGACCTATGGCGTACAATGTTTTTCCCGCTATCCAGATCACATGAAGCAACATGATTTTTTTAAGTCCGCAATGCCAGAGGGCT ATGTACAAGAGCGCACTATTAGCTTTAAGGATGATGGCACCTATAAGACTCGCGCAGAGGTAAAGTTTGAGGGCGATACTCTCGTAAATCGCAT TGAGCTCAAGGGCATTGATTTTAAGGAGGATGGCAATATTCTCGGCCATAAGCTGGAGTATAATTTCAATTCCCATAATGTATAGATTACCGCAG ATAAGCAAAAGAATGGCATTAAGGCGAATTTTAAGATTCGCCATAATGTGGAGGATGGCTCCGTACAACTCGCAGATCATTATCAACAAAATACT CCAATTGGCGATGGCCCAGTACTCCTCCCAGATAATCATTATCTCTCCACTCAATCCGTGCTCTCCAAAGATCCAAATGAGAAGCGCGATCACATG GTACTCCTGGAGTTTGTAACTGCAGCAGGCATTACTCATGGCATGGATGAGCTCTATAAGCTCGAGCACCACCACCACCACCACTAA pTD GTAAAACGACGGCCAGTGAGCGCGCGTAATACGACTCACTATAGGGCGAATTGGAGCTCACGTGCAGATCTGCACATAGCCACACCCTGAATGA 114_ GATGTTTTCTCTCCGCTACGTTTCTTGGGCTAGCCCGAAAGAGGAGAAATTAACTATGGCATCCAAGGGCGAGGAGCTCTTTACTGGCGTAGTAC CAATTCTCGTAGAGCTCGATGGCGATGTAAATGGCCATAAGTTTTCCGTACGCGGCGAGGGCGAGGGCGATGCAACTAACGGCAAGCTCACTCT sfGF CAAGTTTATTTGTACTACTGGCAAGCTCCCAGTACCATGGCCAACTCTCGTAACTACTCTGACCTATGGCGTACAATGTTTTTCCCGCTATCCAGAT P- CACATGAAGCAACATGATTTTTTTAAGTCCGCAATGCCAGAGGGCTATGTACAAGAGCGCACTATTAGCTTTAAGGATGATGGCACCTATAAGAC TCGCGCAGAGGTAAAGTTTGAGGGCGATACTCTCGTAAATCGCATTGAGCTCAAGGGCATTGATTTTAAGGAGGATGGCAATATTCTCGGCCAT

119 1AT AAGCTGGAGTATAATTTCAATTCCCATAATGTATAGATTACCGCAGATAAGCAAAAGAATGGCATTAAGGCGAATTTTAAGATTCGCCATAATGT G GGAGGATGGCTCCGTACAACTCGCAGATCATTATCAACAAAATACTCCAATTGGCGATGGCCCAGTACTCCTCCCAGATAATCATTATCTCTCCAC TCAATCCGTGCTCTCCAAAGATCCAAATGAGAAGCGCGATCACATGGTACTCCTGGAGTTTGTAACTGCAGCAGGCATTACTCATGGCATGGATG AGCTCTATAAGCTCGAGCACCACCACCACCACCACTAAGGTCTAGAGGATCCTTGGCCACCTCACTCAAAAGCTGGTGAACTGCCTCACGGGCG GCATCCGCACTCTTGTAGAAGTAGTGTCGAACTCCGTCCTCATCCCAGCGTGGCGTACGTAAGGTGGAGGCGTGCAAATGACTGTACAACGTGT AGGGCAACGCAATTGACACTAGCAGACCTCCCGGATCGCAGTTTCGAAACTTGAACTTCGAACTCTCAGATGAGTTTCCGCCGGATGGCGAGCG CGGTAAGATGGGCCTTGCTGCGGACGTCGAAGCGCTTCATGGCTTCGCGTAGCTTGACGCGGACGCTGTTGTACTTGACCCCTTCGACGTCGGC GATCTCCTCCATCGTCTTGCCGACGGCAATCCATCTCAGATAGGTGGCCTCCTTCGGATCGAGCCATGCGGCATCTTCCGCGGTAGGGGTGGTGC GAAGGAATGAGATGCGGGCATGGATCTGCCCGATGGTTGCAGCGGCTGCGACTGCATCGATCTCCCGATCGAGATCGATCACCGGCTTGTCCGA TGCCATCGTGAACATCGACATAAAGCCGTTGGCGGTCTTGATGGGTATTGTGATGCCGGAGCGGATGCCGAAATCGGATGCGTGGTCATAGAA GGCACGCTCGTCCTTCGACAGCGTCGGCCGCTCGTGCTCGCCCGACCAGGTGAAGATGTGCTTCCGGGACCTCGCGCGTTTGACGACCGGATCG AGCGCTTCGAACTTCTTGTCGAAGTAGGTTGATTGCCATTGGCGGTGATAGTTGGTAACGGCGGTGATGTGCCTGTGCTGGATATGAAGGTAGG CATAGCCGGTGAAGCCGAAATGGTCGGCGATGTCCGCCAGCCCGGTCTTCAGGATGCACTCATCGCCTTCGATCGCGGCAAGATCAGTCAGCTT GTCCAGCCAGTGCTGCATTCCATACCTCCTATGCGGTGTCAGTAGCCTCTTCGTTGCTAGTCTCTGCAGGAATTCGATATCAAGCTTATCGATACC GTCGACCTCGAGGGGGGGCCCGGTACCCAGCTTTTGTTCCCTTTAGTGAGGGTTAATTGCGCGCTTGGCGTAATCATGGTCATAGCTGTTTCCTG TGTGAAATTGTTATCCGCTCACAATTCCACACAACATACGAGCCGGAAGCATAAAGTGTAAAGCCTGGGGTGCCTAATGAGTGAGCTAACTCACA TTAATTGCGTTGCGCTCACTGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCGGCCAACGCGCGGGGAGAGGCGGTT TGCGTATTGGGCGCATGCATAAAAACTGTTGTAATTCATTAAGCATTCTGCCGACATGGAAGCCATCACAAACGGCATGATGAACCTGAATCGCC AGCGGCATCAGCACCTTGTCGCCTTGCGTATAATATTTGCCCATGGACGCACACCGTGGAAACGGATGAAGGCACGAACCCAGTTGACATAAGC CTGTTCGGTTCGTAAACTGTAATGCAAGTAGCGTATGCGCTCACGCAACTGGTCCAGAACCTTGACCGAACGCAGCGGTGGTAACGGCGCAGTG 
GCGGTTTTCATGGCTTGTTATGACTGTTTTTTTGTACAGTCTATGCCTCGGGCATCCAAGCAGCAAGCGCGTTACGCCGTGGGTCGATGTTTGATG TTATGGAGCAGCAACGATGTTACGCAGCAGCAACGATGTTACGCAGCAGGGCAGTCGCCCTAAAACAAAGTTAGGTGGCTCAAGTATGGGCAT CATTCGCACATGTAGGCTCGGCCCTGACCAAGTCAAATCCATGCGGGCTGCTCTTGATCTTTTCGGTCGTGAGTTCGGAGACGTAGCCACCTACT CCCAACATCAGCCGGACTCCGATTACCTCGGGAACTTGCTCCGTAGTAAGACATTCATCGCGCTTGCTGCCTTCGACCAAGAAGCGGTTGTTGGC GCTCTCGCGGCTTACGTTCTGCCCAAGTTTGAGCAGCCGCGTAGTGAGATCTATATCTATGATCTCGCAGTCTCCGGAGAGCACCGGAGGCAGG GCATTGCCACCGCGCTCATCAATCTCCTCAAGCATGAGGCCAACGCGCTTGGTGCTTATGTGATCTACGTGCAAGCAGATTACGGTGACGATCCC GCAGTGGCTCTCTATACAAAGTTGGGCATACGGGAAGAAGTGATGCACTTTGATATCGACCCAAGTACCGCCACCTAACAATTCGTTCAAGCCGA GATCGGCTTCCCGGCCGCGGAGTTGTTCGGTAAATTGTCACAACGCCGCAGGTGGCACTTTTCGGGGAAATGTGCGCGCCCGCGTTCCTGCTGG CGCTGGGCCTGTTTCTGGCGCTGGACTTCCCGCTGTTCCGTCAGCAGCTTTTCGCCCACGGCCTTGATGATCGCGGCGGCCTTGGCCTGCATATCC CGATTCAACGGCCCCAGGGCGTCCAGAACGGGCTTCAGGCGCTCCCGAAGGTCTCGGGCCGTCTCTTGGGCTTGATCGGCCTTCTTGCGCATCTC ACGCGCTCCTGCGGCGGCCTGTAGGGCAGGCTCATACCCCTGCCGAACCGCTTTTGTCAGCCGGTCGGCCACGGCTTCCGGCGTCTCAACGCGC TTTGAGATTCCCAGCTTTTCGGCCAATCCCTGCGGTGCATAGGCGCGTGGCTCGACCGCTTGCGGGCTGATGGTGACGTGGCCCACTGGTGGCC GCTCCAGGGCCTCGTAGAACGCCTGAATGCGCGTGTGACGTGCCTTGCTGCCCTCGATGCCCCGTTGCAGCCCTAGATCGGCCACAGCGGCCGC AAACGTGGTCTGGTCGCGGGTCATCTGCGCTTTGTTGCCGATGAACTCCTTGGCCGACAGCCTGCCGTCCTGCGTCAGCGGCACCACGAACGCG GTCATGTGCGGGCTGGTTTCGTCACGGTGGATGCTGGCCGTCACGATGCGATCCGCCCCGTACTTGTCCGCCAGCCACTTGTGCGCCTTCTCGAA GAACGCCGCCTGCTGTTCTTGGCTGGCCGACTTCCACCATTCCGGGCTGGCCGTCATGACGTACTCGACCGCCAACACAGCGTCCTTGCGCCGCT TCTCTGGCAGCAACTCGCGCAGTCGGCCCATCGCTTCATCGGTGCTGCTGGCCGCCCAGTGCTCGTTCTCTGGCGTCCTGCTGGCGTCAGCGTTG GGCGTCTCGCGCTCGCGGTAGGCGTGCTTGAGACTGGCCGCCACGTTGCCCATTTTCGCCAGCTTCTTGCATCGCATGATCGCGTATGCCGCCAT GCCTGCCCCTCCCTTTTGGTGTCCAACCGGCTCGACGGGGGCAGCGCAAGGCGGTGCCTCCGGCGGGCCACTCAATGCTTGAGTATACTCACTA GACTTTGCTTCGCAAAGTCGTGACCGCCTACGGCGGCTGCGGCGCCCTACGGGCTTGCTCTCCGGGCTTCGCCCTGCGCGGTCGCTGCGCTCCCT 
TGCCAGCCCGTGGATATGTGGACGATGGCCGCGAGCGGCCACCGGCTGGCTCGCTTCGCTCGGCCCGTGGACAACCCTGCTGGACAAGCTGAT GGACAGGCTGCGCCTGCCCACGAGCTTGACCACAGGGATTGCCCACCGGCTACCCAGCCTTCGACCACATACCCACCGGCTCCAACTGCGCGGC CTGCGGCCTTGCCCCATCAATTTTTTTAATTTTCTCTGGGGAAAAGCCTCCGGCCTGCGGCCTGCGCGCTTCGCTTGCCGGTTGGACACCAAGTGG AAGGCGGGTCAAGGCTCGCGCAGCGACCGCGCAGCGGCTTGGCCTTGACGCGCCTGGAACGACCCAAGCCTATGCGAGTGGGGGCAGTCGAA GGGCGAAGCCCGCCCGCCTGCCCCCCGAGCCTCACGGCGGCGAGTGCGGGGGTTCCAAGGGGGCAGCGCCACCTTGGGCAAGGCCGAAGGCC GCGCAGTCGATCAACAAGCCCCGGAGGGGCCACTTTTTGCCGGAGGGGGAGCCGCGCCGAAGGCGTGGGGGAACCCCGCAGGGGTGCCCTTC TTTGGGCACCAAAGAACTAGATATAGGGCGAAATGCGAAAGACTTAAAAATCAACAACTTAAAAAAGGGGGGTACGCAACAGCTCATTGCGGC ACCCCCCGCAATAGCTCATTGCGTAGGTTAAAGAAAATCTGTAATTGACTGCCACTTTTACGCAACGCATAATTGTTGTCGCGCTGCCGAAAAGTT GCAGCTGATTGCGCATGGTGCCGCAACCGTGCGGCACCCCTACCGCATGGAGATAAGCATGGCCACGCAGTCCAGAGAAATCGGCATTCAAGC CAAGAACAAGCCCGGTCACTGGGTGCAAACGGAACGCAAAGCGCATGAGGCGTGGGCCGGGCTTATTGCGAGGAAACCCACGGCGGCAATGC TGCTGCATCACCTCGTGGCGCAGATGGGCCACCAGACCCACGGCGGCAATGCTGCTGCATCACCTCGTGGCGCAGATGGGCCACCAGAACGCC GTGGTGGTCAGCCAGAAGACACTTTCCAAGCTCATCGGACGTTCTTTGCGGACGGTCCAATACGCAGTCAAGGACTTGGTGGCCGAGCGCTGGA TCTCCGTCGTGAAGCTCAACGGCCCCGGCACCGTGTCGGCCTACGTGGTCAATGACCGCGTGGCGTGGGGCCAGCCCCGCGACCAGTTGCGCCT GTCGGTGTTCAGTGCCGCCGTGGTGGTTGATCACGACGACCAGGACGAATCGCTGTTGGGGCATGGCGACCTGCGCCGCATCCCGACCCTGTAT CCGGGCGAGCAGCAACTACCGACCGGCCCCGGCGAGGAGCCGCCCAGCCAGCCCGGCATTCCGGGCATGGAACCAGACCTGCCAGCCTTGACC GAAACGGAGGAATGGGAACGGCGCGGGCAGCAGCGCCTGCCGATGCCCGATGAGCCGTGTTTTCTGGACGATGGCGAGCCGTTGGAGCCGCC GACACGGGTCACGCTGCCGCGCCGGTAGCACTTGGGTTGCGCAGCAACCCGTAAGTGCGCTGTTCCAGACTATCGGCTGTAGCCGCCTCGCCGC CCTATACCTTGTCTGCCTCCCCGCGTTGCGTCGCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGGAT TCACCGTTTTTATCAGGCTCTGGGAGGCAGAATAAATGATCATATCGTCAATTATTACCTCCACGGGGAGAGCCTGAGCAAACTGGCCTCAGGCA TTTGAGAAGCACACGGTCACACTGCTTCCGGTAGTCAATAAACCGGTAAACCAGCAATAGACATAAGCGGCTATTTAACGACCCTGCCCTGAACC GACGACCGGGTCGAATTTGCTTTCGAATTTCTGCCATTCATCCGCTTATTATCACTTATTCAGGCGTAGCAACCAGGCGTTTAAGGGCACCAATAA 
CTGCCTTAAAAAAATTACGCCCCGCCCTGCCACTCATCGCAGTACGGCCTATTGGTTAAAAAATGAGCTGATTTAACAAAAATTTAACGCGAATTT TAACAAAATATTAACGCTTACAATTTCCATTCGCCATTCAGGCTGCGCAACTGTTGGGAAGGGCGATCGGTGCGGGCCTCTTCGCTATTACGCCA GCTGGCGAAAGGGGGATGTGCTGCAAGGCGATTAAGTTGGGTAACGCCAGGGTTTTCCCAGTCACGACGTT

120 pYW CATGGCTCGAGAAATCATAAAAAATTTATTTGCTTTGTGAGCGGATAACAATTATAATAGATTCAATTGTGAGCGGATAACAATTTCACACAGAA 15c_ TTCATTAAAGAGGAGAAATTAACTATGGATGAATTCGAGATGATCAAGCGTAATACATCTGAAATCATCAGTGAAGAGGAATTACGTGAGGTGT TGAAAAAAGATGAGAAATCCGCTGGCATTGGATTTGAGCCTTCCGGTAAGATTCATCTTGGGCACTATCTTCAGATTAAAAAGATGATCGACTTA MjB CAAAATGCCGGGTTCGACATCATCATCCTGTTGGCCGACTTACATGCGTATTTAAATCAGAAGGGAGAACTTGACGAAATTCGCAAGATTGGCG paRS ATTACAACAAGAAGGTATTTGAGGCGATGGGACTGAAGGCGAAGTATCTTTATGGCTCACCTTTTCAGTTGGATAAGGACTACACTTTAAATGTA TATCGTCTGGCTTTAAAGACTACCCTGAAGCGTGCGCGCCGCTCGATGGAGCTTATCGCGCGTGAGGACGAAAACCCAAAAGTAGCCGAAGTGA TCTATCCAATCATGCAAGTGAATACCTCACATTATCTTGGTGTTGACGTCGCCGTGGGCGGAATGGAGCAGCGTAAAATCCACATGTTAGCTCGT GAGTTACTTCCCAAAAAGGTGGTCTGTATCCACAATCCTGTTCTTACAGGGCTGGACGGTGAAGGCAAAATGAGTTCATCCAAAGGCAACTTTAT CGCAGTGGATGATAGTCCTGAAGAGATTCGCGCCAAGATTAAAAAGGCCTATTGTCCCGCCGGAGTTGTCGAGGGGAATCCTATTATGGAAATC GCCAAATACTTCCTGGAATATCCTTTAACCATCAAACGTCCAGAGAAGTTTGGAGGAGACCTGACGGTAAATTCGTACGAAGAGCTTGAATCCCT GTTTAAGAACAAAGAACTGCACCCGATGGACTTGAAAAACGCCGTAGCCGAAGAGCTTATCAAAATTTTAGAGCCAATCCGTAAGCGTCTTTAA CTGCAGTTTCAAACGCTAAATTGCCTGAGAATTCAAAAAAGCCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCTTAACTATGAGAGGATTGCA CGGCTAACTAAGCGGCCTGCTGACTTTCTCGCCGATCAAAAGGCATTTTGCTATTAAGGGATTGACGAGGGCGTATCTGCGCAGTAAGATGCGC CCCGCATTCCGGCGGTAGTTCAGCAGGGCAGAACGGCGGACTCTAAATCCGCATGGCAGGGGTTCAAATCCCCTCCGCCGGACCAAATTCGAAA AGCCTGCTCAACGAGCAGGCTTTTTTGCATGCCCGCATGCGAGCTCGGTACCCCGGGTCGACCTGCAGCCAAGCTTAATTAGCTGACCATGGTGC GCAATTCTTGAAGACGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTCATGATAATAATGGTTTCTTAGACGTCAGGTGGCACTTTTC GGGGAAATGTGCGCGGAACCCCTATTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTCATGAGACAATAACCCTGATAAATGCTTCAAT AATATTGAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGA AACGCTGGTGAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCCTTGAGAG TTTTCGCCCCGAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTGTTGACGCCGGGCAAGAGCAAC 
TCGGTCGCCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTA TGCAGTGCTGCCATAACCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAA CATGGGGGATCATGTAACTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGCAGC AATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTT GCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAGCGTGGGTCTCGCGGTATCATTGCAGC ACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGA GATAGGTGCCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTTTACTCATATATACTTTAGATTGATTTAAAACTTCATTTTTAATTTAAAAGG ATCTAGGTGAAGATCCTTTTTGATAATCTCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAA GGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAG CTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTCCTTCTAGTGTAGCCGTAGTTAGGCCACCACTTCAAG AACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCA AGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACT GAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAG GAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTG ATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGTTCT TTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTCGCCGCAGCCGAACGACCGAGCGCAGCGAG TCAGTGAGCGAGGAAGCGGAAGAGCGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGAGCGCCTGATGCGGTATTTTC TCCTTACGCATCTGTGCGGTATTTCACACCGCATATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGTATACACTCCGCT ATCGCTACGTGACTGGGTCATGGCTGCGCCCCGACACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACA GACAAGCTGTGACCGTCTCCGGGAGCTGCATGTGTCAGAGGTTTTCACCGTCATCACCGAAACGCGCGAGGCAGGCGTCCATCAGCTCGCCCCG 
ATCTTCGGGCAGTGAGGCGGCAATGACTGCCGCCTTTTCCTGCCTTTCGGTCTTGGGTTGGTGCTTTCTGCTCACGGCGTCGGCTCCGGCTCAATA CTCGTGGGGCAGCATGGCTACCGTGTGCGAACGGTCGCCATCGGTCACGATCAGCACTCGCACACCGTGCGGGTGGTAAACCGACACGATGGT GCCGCACTCCTTCACGGCTTCCTCGTTGGCCTCGCGCTGCGCATCGTCCACGTCGCCCCAGTCCCCGGAAACGTGGCGGTCGATCAGCGGCCCTG CTACGCCCACGTGTTCTGCCAACATAACGGCGGATGCTTCATCACGGGAAAGCACCGGAAACACGGCGACCTCATGGGCCAAAAGCCACATTGA ACGGGTCATGCCATCAGTGAAACACACCGCGAAATGGCCCAGAATCATGGTCGGGCTTACGTCAGCAAGCGGAACGGGGCTGTCGATGCCCAC GGCAAAATGCGCGGCGCTCGACGGATATTTCCGGTCTTTAGGCATGTCCTGCTTCAACGGGCAGGCTTGAGAATCCTCGACGCCTAACGCCAAC CACGCTCGATAGAAAGCCAAGCCATTCACATACACAATGAAATCGGACTGTGCGGCATTTGAAGCTGTCATAAAGGCGTCAGGATGGTTCGGCA CGGGCAGCCGCCAAACAACCTCATTCCAAGCATTCGGCACGATTTGAGGCAATTCGAGGGGCAACCCGCTCGACTGCGCAAGCTCTTTATCAAG CCGCCGAATTTCCAGCCAAGCGCGACACCGCCGCCGTAGCAACATCAGCATCGTTAGCGCCCATGCTCTCGGCCTGCGCCGGTTTTTTCCATCTGC CGCTGTCGTTGATGTTGCGCGGCCTGTGGGTTGCGGACGATTCCGGCGCGACCGTGCGGATAACTGATTTCCGCATGGTCGTTCATCTTCACTTT CTCCGCGCCCGAAACAGCGGCGCGTTCATGCAGAATGATCGTGCCTGGGCGGCTATCGTCGGCCTGAATGACATAGCTATTGGTGCGACCGATC ACACGGCCCCGGTAAGTTTCGCCCGGTTCTGCGTGGCGGTGAAATTCATCTGCGCGGGCTTTCTCCCTGCCTACTGCCTCGCGCACCTGTTGCGC CTCGATGATGGCCCTTTCTACGTCGCGCTCAAGCCGGTTGGCGTTGGTGCTCTTAGTGGCGTCTGGCGTGGTTTCTAGCTCTTTGCGGGTCTCTTG TAGTCCGCTCTTGTAGTTCTCAAGCAAGGGTTCATTGCCTTTGGCTTCTTTCGGAATTTCGGCCAATTGGCGGCCTGCCTCGAAAGCCAGCAGGC GCATGGTTTCCGCTTCATTCTGGCGCGACTGCTCTTGCAGCTTCCGGCGTTCGTCTTTCTCCTGCTCGCGGCGCGTGTCGGTCATGGTCTTTCCTCC ATCGTGTCAAGCTCGCTTCGCGTGCCATAGTGCTCGCGCCGTTCAACGGCGTTGCCGTCCTCCACTACGTTGCGGCCCTTCGGGTGACTCGGCGC TGTGCGCTATGGCGATCTGGAAGTGTCGGGACGAGGGGCGTCCCGGCGGTAAATTCGACGTGCCTGCGGCGCGTCGAACAAGGGGCGTGCCC GGTGTCAGTTCGTGGCATTTTTCGAGGCGCGACGCCATTTCCAAGGCTCCTGAGCATTCGGGTCTGACCAAAGGCCGAGCCGTTGGCGGCGGGC CTCTTTTTCATACTCGTACATCTGCGCGTCGGTTGGTTTGCCGTAATAACGGTAAGCCCAGGCCATGCCTTCTTGAACCATGATCGCATTGATGTT GGTGAGTTGTGTTTGGCCGCCGGGGTATTGCAACGGCGCGTAAACGACCCCAAGAGTGCGGCCATACCGATCAACCTCTTTTTCGGTCACTTGA 
ACCTCTTGGCGAAAGGTCAAGTCGGCGAGCCGTTGGCGAGCACGGGAGCCGAAGGCTTGGCCGCTTTCCGGTGCGTCAATATCGGCCAATCTCA CGCGGATGGTCTGACGGTTCACCAAAACGTCGATAGTGTCACCGTCAAGGATTCGGACGACTTCACCCCGGAAGTCGGCCCAAGCGGGCACACT GACGATTAGGACGACAGCGGCCGCGACCGCGCGAAGGGCGGCAAGGGCGCTTTTCATTGTTTGCCTCCTGTTTTCAAGACGGCTGTGAGATTG GCGACCTGCTCTTTGAGGGCTTCCACCTGACCTTGCAGACTGGCGGCGCGCTCGATGGCCTCCTTGGCCTGTTTGCGGGCCTCGATAGCCTCGTT ATCGCGTTGGGTGAGCTTTTCCATGCAGCGGTTTAGTTCTTCGCCGCTGCGGCGTTTCACTTCGGCAAGCTGGTCGGCCAGCTTGTCGCGCTCGC GTTCCATAGGTTCGAGCTGATTCACTCGTTCGCGGAGCTGGTCGTTTTCGCGGGTGAAGGTGTCGGCTAGTTCGATTGCTTCGGCAAGCTGCTGG

121 CTGATGGCCGCTTTGTCGGCCTCGATCTGTTTCCGATCTTCGTCAAACCGGGCGTTGGCGTGCGCCAGGGCGATAGCCCATAGCGCATTGCCAAG CTCGGCAAGATGCTCGTTGACTGCAACCGGCAATGGGTCTGATGAGGGCAGGGTGGCGGTCTTGCGGTTTTTCCATTCAGCCATTGCATCGGAA ATGGTTGTGAAGCTACCGCTTCCGAGTTTCTTGCGCACGGCGGCCAAAGTGGGCCGGATGCCTTCGGCGTCCAGTTCGTCGGCTGCTCGCCAAA TGTCTTGTTTAGTGATTGCCATTCTTGCGGGCCTCTGTACTGTAGTATGTTGTATGATACTACATACTACAACAATTTAACAGAGCCATCTTGGAAT CTGGTGTCTCTGCGCCTATAATTCTGGAACAGCTACTTTCCGAACGACTCCTGCGTTGATCGGAAATCCAGAAGCCCGAGAGGTTGCCGCCTTTC GGGCTTTTTCTTTTTCAAAAAAAAAAATTTATAAAACGATCTGTTGCGGCCGCCGGGTTGTGGGCAAAGGCGCTCGACGGTGGGCAACCGCTTG CGGTTGTCCACGGGCGGAGCCGGTGCGCGTAGCGCATTGTCCACAAGCCAAGGGCGACCAATAATTGATATATATATTCATAATTGAAAAGCTA ATTGAACATACTACTTGCTGTAACTACTTGCCGGAGCGAGGGGTGTTTGCAAGCTGTTGATCTGAAAGGGCTATTAGCGTTCTCACGTGCCTTTTT GATTAGCGATTTCACGTGACCTTATTAGCGATTTCACGTACTCCGATTAGCGATTTCACGTACCCTGATTAGCGATTTCACGTGGATAGTTTTTGG AGCGGGCCGGAAAGCCCCGTGAATCAAGGCTTTGCGGGGCATTAGCGGTTTCACGTGGATAACTACCCTCTATCCACAGGCTTCCGGGGATAAA AAAGCCCGCTCGACGGCGGGCTGTTGGATGGGAAGGCTTGACCAAGCCAAGCGTAGCGTTGGCCTGGTCAAGTCGGAGGGGGGCCGATGCGA GCGCCCTTGCCGGGTGCGCGGGTGACATGCAGGCGTGTGGATTTGATGCGCAGGCATTCGCCGTCATCTTCGATGCAGTCGCTTGCCTCGGGAT AGACAATCAACACTTCGCGTAGGCGCTTTTTGAAGTTGTATTTGAAGCTGGCGAGTGCTGCCCGCTCTGCCCGCTCTCGGGCCTTATCGTCCAGTT CGGGCGAGTTGCGTGCGCGGCTGCCATAGGATGAGCCGAATTGCGCTTGCAGGGCGACCCAAGGGATTTGCACGAAGGGGCGGCCCTTGGCC CGCAACAGGAACACGCGATAGGTCAGCCACGTGTAAATGTCCATCGCAAGCGGAGACTGCCGCAAGGCATGCAGGTAGTCGATTCGGATAGGA ACCGGTGAGCGGGTGACTTCCTCGAAGAAATCGCCTGTGAGGGTGAGGGTGCTATCCCATAGCGCCCGATCTTCTGGCCGCTTGGGATTCCAGA ATAGAAAAGCGCGCTTGGCAATGACGACGTTCTCAATGCCGAAGTCATTGCCTTGCTCGCCGGCAAGCGAAATCATGGATGAAAACAGGCGTTG CGCCTGATTGCGAAGGGTGGCCGTGTAACGGCCATCGGTGTGCATTCCGAGCCTTTGTAGAAATTCCGATTGCGACCGGCCAAGGTTCAACACG GGGTCTTTCGTTCGCACGGCCTCGGTGCATATCCAAGCAAGCAAGGTGCGCGGCATAGAACCGTAGGGCAGGCCGATGCTCGGCTTGCCCATGA TCGACAAGGTGACGATGCCATTGGTGCGCTCAAAGTAGCTGGTCTTGGGGTCGGTGTGGGGCATGGTCGCTTGCACAAGGCAACGGGCCATGT 
AGCCGACTAAGCCAGCTTCGCGGGCATCCTCCATTTCGAGCGCGAGGCTCGTCTTGATGATCTCGTTGATACGATGGCCGGGGGCTTTGTTGTTC TTAGGCATGTTGTTCCCTCCCCGGCATGGTGATGGTTGGTCTAGTGTTTGTGGGTTTGATGTTCCGGCGTTTGATGAACAGGCGCAAGGTGTGAG GGCTGACGCCTAACAACTCGGCTGCGCGACTTTGCGGCAAGCCAAGGTTCACGTATGCCTGTACTTCATCAATACGGCTGTCCAGCTTCAAGGCG CTCGATTTGCTGCCCTTGGGTCGCCCGAGCGTCTTGCCGCGCTCTCTGGCGACTTGTAGCGCCTCGGTGGTACGTGCCTGAATGAAATGCCGCTC GATCTGTGCAGCCAAGCCAAGCACGGTTGCCATGATGTCGCTTTGTAGGCTGCCGTCCATGATGATCTTCTGTTTGGTCACATGGACGATTAGGC CGCGCTCGCTCGCCGCTTTGAGAATTTCCAAGGCGGCGAGGGCGGAACCGGCAATGCGCGTAATCTCCGGCGTCAGTAGCACGTCGCCACGCTC GGCCTTTTCGATGATTGCTCCGAGCTTGCGCTTGCGCCAGTCCTTTGCTCTGCTGGCAATTTCTTCCTCGATCTGTAGCGGCGCGAAGCCTTTGGC GTTCGCGTATTCGAGCAAACCGTATTTTTGGTTTTCCGGGTCTTGGCCGTCACGCGAAACCCGGAGATAGGCATAGTATTTTGGCATTTGCAGGG AAAACGTCAGATTCGGTTAAACATGCCTCATTCTAGCGCAGATTAAATAGGAATTAAATACCCTGTTGCGGTATAGATAAAACGTTGGTTTGTTCT GCCCTATGAGCGTACAAAAAAGGCCGGGTGAGTGGCCCGGCCTTCGTTTAGGTGCTGAATAGGATTGGTTCTGGTGCCAGCCTCATGAGAAGC GCGTCATAAAACCACATGAGGGCCGACGCACCAAGGCCGACGCCTGCGACCGATAGCATGATGTGGGTCTTATTGGCCGAGTCCAGCCCAAGC CACATGATCGGTAGGGTGATGAGACTGGCGAACGAAGCCAAGCCGAGAAGAAAGCGCACCGGGCCGCGCAGCCACCAGAGAATGAGGAACAC CAGATGTCGTTTTCAGAAGACGGCTGCACTGAACGTCAGAAGCCGACTGCACTATAGCAGCGGAGGGGTTGGATCCATCAGGCAACGACGGGC TGCTGCCGGCCATCAGCGGACGCAGGGAGGACTTTCCGCAACCGGCCGTTCGATGCGGCACCGATGGCCTTCGCGCAGGGGTAGTGAATCCGC CAGGATTGACTTGCGCTGCCCTACCTCTCACTAGTGAGGGGCGGCAGCGCATCAAGCGGTGAGCGCACTCCGGCACCGCCAACTTTCAGCACAT GCGTGTAAATCATCGTCGTAGAGACGTCGGAATGGCCGAGCAGATCCTGCACGGTTCGAATGTCGTAACCGCTGCGGAGCAAGGCCGTCGCGA ACGAGTGGCGGAGGGTGTGCGGTGTGGCGGGCTTCGTGATGCCTGCTTGTTCTACGGCACGTTTGAAGGCGCGCTGAAAGGTCTGGTCATACA TGTGATGGCGACGCACGACACCGCTCCGTGGATCGGTCGAATGCGTGTGCTGCGCAAAAACCCAGAACCACGGCCAGGAATGCCCGGCGCGCG GATACTTCCGCTCAAGGGCGTCGGGAAGCGCAACGCCGCTGCGGCCCTCGGCCTGGTCCTTCAGCCACCATGCCCGTGCACGCGACAAAGCTCA TCAGCGTGGTCGTGAAGCGATTCACAGATGTCTGCCTGTTCATCCGCGTCCAGCTCGTTGAGTTTCTCCAGAAGCGTTAATGTCTGGCTTCTGATA AAGCGGGCCATGTTAAGGGCGGTTTTTTCCTGTTTGGTCACTGATGCCTCCGTGTAAGGGGGATTTCTGTTCATGGGGGTAATGATACCGATGAA 
ACGAGAGAGGATGCTCACGATACGGGTTACTGATGATGAACATGCCCGGTTACTGGAACGTTGTGAGGGTAAACAACTGGCGGTATGGATGCG GCGGGACCAGAGAAAAATCACTCAGGGTCAATGCCAGCGCTTCGTTAATACAGATGTAGGTGTTCCACAGGGTAGCCAGCAGCATCCTGCGATG CAGATCCGGAACATAATGGTGCAGGGCGCTGACTTCCGCGTTTCCAGACTTTACGAAACACGGAAACCGAAGACCATTCATGTTGTTGCTCAGGT CGCAGACGTTTTGCAGCAGCAGTCGCTTCACGTTCGCTCGCGTATCGGTGATTCATTCTGCTAACCAGTAAGGCAACCCCGCCAGCCTAGCCGGG TCCTCAACGACAGGAGCACGATCATGCGCACCCGTGGCCAGGACCCAACGCTGCCCGAGATGCGCCGCGTGCGGCTGCTGGAGATGGCGGACG CGATGGATATGTTCTGCCAAGGGTTGGTTTGCGCATTCACAGTTCTCCGCAAGAATTGATTGGCTCCAATTCTTGGAGTGGTGAATCCGTTAGCG AGGTGCCGCCGGCTTCCATTCAGGTCGAGGTGGCCCGGCTCCATGCACCGCGACGCAACGCGGGGAGGCAGACAAGGTATAGGGCGGCGCCT ACAATCCATGCCAACCCGTTCCATGTGCTCGCCGAGGCGGCATAAATCGCCGTGACGATCAGCGGTCCAATGATCGAAGTTAGGCTGGTAAGAG CCGCGAGCGATCCTTGAAGCTGTCCCTGATGGTCGTCATCTACCTGCCTGGACAGCATGGCCTGCAACGCGGGCATCCCGATGCCGCCGGAAGC GAGAAGAATCATAATGGGGAAGGCCATCCAGCCTCGCGTCGCGAACGCCAGCAAGACGTAGCCCAGCGCGTCGGCCGCCATGCCGGCGATAAT GGCCTGCTTCTCGCCGAAACGTTTGGTGGCGGGACCAGTGACGAAGGCTTGAGCGAGGGCGTGCAAGATTCCGAATACCGCAAGCGACAGGCC GATCATCGTCGCGCTCCAGCGAAAGCGGTCCTCGCCGAAAATGACCCAGAGCGCTGCCGGCACCTGTCCTACGAGTTGCATGATAAAGAAGACA GTCATAAGTGCGGCGACGATAGTCATGCCCCGCGCCCACCGGAAGGAGCTGACTGGGTTGAAGGCTCTCAAGGGCATCGGTCGAGGAGGC pBA GGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCC D- GAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATC TTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAAC Ub- GCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTT UAG TGAGTGAGCTGATACCGCTCGCCGCAGCCGAACGACCGAGCGCAGCGAGTCAGTGAGCGAGGAAGCGGAAGGCGAGAGTAGGGAACTGCCA - GGCATCAAACTAAGCAGAAGGCCCCTGACGGATGGCCTTTTTGCGTTTCTACAAACTCTTTCTGTGTTGTAAAACGACGGCCAGTCTTAAGCTCG sfGF GGCCCCCTGGGCGGTTCTGATAACGAGTAATCGTTAATCCGCAAATAACGTAAAAACCCGCTTCGGCGGGTTTTTTTATGGGGGGAGTTTAGGG 
AAAGAGCATTTGTCAGAATATTTAAGGGCGCCTGTCACTTTGCTTGATATATGAGAATTATTTAACCTTATAAATGAGAAAAAAGCAACGCACTTT P_15 AAATAAGATACGTTGCTTTTTCGATTGATGAACACCTATAATTAAACTATTCATCTATTATTTATGATTTTTTGTATATACAATATTTCTAGTTTGTTA 1UA AAGAGAATTAAGAAAATAAATCTCGAAAATAATAAAGGGAAAATCAGTTATGACAACTTGACGGCTACATCATTCACTTTTTCTTCACAACCGGC G ACGGAACTCGCTCGGGCTGGCCCCGGTGCATTTTTTAAATACCCGCGAGAAATAGAGTTGATCGTCAAAACCAACATTGCGACCGACGGTGGCG ATAGGCATCCGGGTGGTGCTCAAAAGCAGCTTCGCCTGGCTGATACGTTGGTCCTCGCGCCAGCTTAAGACGCTAATCCCTAACTGCTGGCGGA

122 AAAGATGTGACAGACGCGACGGCGACAAGCAAACATGCTGTGCGACGCTGGCGATATCAAAATTGCTGTCTGCCAGGTGATCGCTGATGTACT GACAAGCCTCGCGTACCCGATTATCCATCGGTGGATGGAGCGACTCGTTAATCGCTTCCATGCGCCGCAGTAACAATTGCTCAAGCAGATTTATC GCCAGCAGCTCCGAATAGCGCCCTTCCCCTTGCCCGGCGTTAATGATTTGCCCAAACAGGTCGCTGAAATGCGGCTGGTGCGCTTCATCCGGGCG AAAGAACCCCGTATTGGCAAATATTGACGGCCAGTTAAGCCATTCATGCCAGTAGGCGCGCGGACGAAAGTAAACCCACTGGTGATACCATTCG CGAGCCTCCGGATGACGACCGTAGTGATGAATCTCTCCTGGCGGGAACAGCAAAATATCACCCGGTCGGCAAACAAATTCTCGTCCCTGATTTTT CACCACCCCCTGACCGCGAATGGTGAGATTGAGAATATAACCTTTCATTCCCAGCGGTCGGTCGATAAAAAAATCGAGATAACCGTTGGCCTCAA TCGGCGTTAAACCCGCCACCAGATGGGCATTAAACGAGTATCCCGGCAGCAGGGGATCATTTTGCGCTTCAGCCATACTTTTCATACTCCCGCCA TTCAGAGAAGAAACCAATTGTCCATATTGCATCAGACATTGCCGTCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTT ATTAAAAGCATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATT ATTTGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGTTTCT CCATACCCGTTTTTAAGGAGGTAAAAAATGCAGATTTTTGTGAAGACTTTAACAGGTAAGACGATTACCCTGGAGGTGGAGTCCTCGGACACCAT CGATAATGTAAAATCAAAAATCCAAGATAAGGAAGGAATCCCTCCAGACCAGCAACGTCTGATTTTCGCAGGTAAACAACTGGAGGATGGTCGC ACGCTTTCGGACTACAACATCCAGAAAGAATCTACCCTTCATTTGGTTCTGCGTCTGCGTGGAGGATAGTTGTTTGTGCAGGAGCTTGCATCCAA GGGCGAGGAGCTCTTTACTGGCGTAGTACCAATTCTCGTAGAGCTCGATGGCGATGTAAATGGCCATAAGTTTTCCGTACGCGGCGAGGGCGAG GGCGATGCAACTAACGGCAAGCTCACTCTCAAGTTTATTTGTACTACTGGCAAGCTCCCAGTACCATGGCCAACTCTCGTAACTACTCTGACCTAT GGCGTACAATGTTTTTCCCGCTATCCAGATCACATGAAGCAACATGATTTTTTTAAGTCCGCAATGCCAGAGGGATATGTACAAGAGCGCACTAT TAGCTTTAAGGATGATGGCACCTATAAGACTCGCGCAGAGGTAAAGTTTGAGGGCGATACTCTCGTAAATCGCATTGAGCTCAAGGGCATTGAT TTTAAGGAGGATGGCAATATTCTCGGCCATAAGCTGGAGTATAATTTCAATTCCCATAATGTATAGATTACCGCAGATAAGCAAAAGAATGGCAT TAAGGCGAATTTTAAGATTCGCCATAATGTGGAGGATGGCTCCGTACAACTCGCAGATCATTATCAACAAAATACTCCAATTGGCGATGGCCCAG TACTCCTCCCAGATAATCATTATCTCTCCACTCAATCCGTGCTCTCCAAAGATCCAAATGAGAAGCGCGATCACATGGTACTCCTGGAGTTTGTAA 
CTGCAGCAGGCATTACTCATGGCATGGATGAGCTCTATAAGCTCGAGCACCACCACCACCACCACTAACCCCAAGGGCGACACCCCCTAATTAGC CCGGGCGAAAGGCCCAGTCTTTCGACTGAGCCTTTCGTTTTATTTGATGCCTGGCAGTTCCCTACTCTCGCATGGGGAGTCCCCACACTACCATCG GCGCTACGGCGTTTCACTTCTGAGTTCGGCATGGGGTCAGGTGGGACCACCGCGCTACTGCCGCCAGGCAAACAAGGGGTGTTATGAGCCATAT TCAGGTATAAATGGGCTCGCGATAATGTTCAGAATTGGTTAATTGGTTGTAACACTGACCCCTATTTGTTTATTTTTCTAAATACATTCAAATATGT ATCCGCTCATGAGACAATAACCCTGATAAATGCTTCAATAATATTGAAAAAGGAAGAATATGAGCCATATTCAACGGGAAACGTCGAGGCCGCG ATTAAATTCCAACATGGATGCTGATTTATATGGGTATAAATGGGCTCGCGATAATGTCGGGCAATCAGGTGCGACAATCTATCGCTTGTATGGGA AGCCCGATGCGCCAGAGTTGTTTCTGAAACATGGCAAAGGTAGCGTTGCCAATGATGTTACAGATGAGATGGTCAGACTAAACTGGCTGACGGA ATTTATGCCACTTCCGACCATCAAGCATTTTATCCGTACTCCTGATGATGCATGGTTACTCACCACTGCGATCCCCGGAAAAACAGCGTTCCAGGT ATTAGAAGAATATCCTGATTCAGGTGAAAATATTGTTGATGCGCTGGCAGTGTTCCTGCGCCGGTTGCACTCGATTCCTGTTTGTAATTGTCCTTT TAACAGCGATCGCGTATTTCGCCTCGCTCAGGCGCAATCACGAATGAATAACGGTTTGGTTGATGCGAGTGATTTTGATGACGAGCGTAATGGCT GGCCTGTTGAACAAGTCTGGAAAGAAATGCATAAACTTTTGCCATTCTCACCGGATTCAGTCGTCACTCATGGTGATTTCTCACTTGATAACCTTA TTTTTGACGAGGGGAAATTAATAGGTTGTATTGATGTTGGACGAGTCGGAATCGCAGACCGATACCAGGATCTTGCCATCCTATGGAACTGCCTC GGTGAGTTTTCTCCTTCATTACAGAAACGGCTTTTTCAAAAATATGGTATTGATAATCCTGATATGAATAAATTGCAATTTCATTTGATGCTCGATG AGTTTTTCTAAGCGGCGCGCCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGTCAATTCAGGGTGGTGAATA TGAAACCAGTAACGTTATACGATGTCGCAGAGTATGCCGGTGTCTCTTATCAGACCGTTTCCCGCGTGGTGAACCAGGCCAGCCACGTTTCTGCG AAAACGCGGGAAAAAGTGGAAGCGGCGATGGCGGAGCTGAATTACATTCCCAACCGCGTGGCACAACAACTGGCGGGCAAACAGTCGTTGCT GATTGGCGTTGCCACCTCCAGTCTGGCCCTGCACGCGCCGTCGCAAATTGTCGCGGCGATTAAATCTCGCGCCGATCAACTGGGTGCCAGCGTG GTGGTGTCGATGGTAGAACGAAGCGGCGTCGAAGCCTGTAAAGCGGCGGTGCACAATCTTCTCGCGCAACGCGTCAGTGGGCTGATCATTAAC TATCCGCTGGATGACCAGGATGCCATTGCTGTGGAAGCTGCCTGCACTAATGTTCCGGCGTTATTTCTTGATGTCTCTGACCAGACACCCATCAAC AGTATTATTTTCTCCCATGAGGACGGTACGCGACTGGGCGTGGAGCATCTGGTCGCATTGGGTCACCAGCAAATCGCGCTGTTAGCGGGCCCAT 
TAAGTTCTGTCTCGGCGCGTCTGCGTCTGGCTGGCTGGCATAAATATCTCACTCGCAATCAAATTCAGCCGATAGCGGAACGGGAAGGCGACTG GAGTGCCATGTCCGGTTTTCAACAAACCATGCAAATGCTGAATGAGGGCATCGTTCCCACTGCGATGCTGGTTGCCAACGATCAGATGGCGCTG GGCGCAATGCGCGCCATTACCGAGTCCGGGCTGCGCGTTGGTGCGGATATCTCGGTAGTGGGATACGACGATACCGAAGATAGCTCATGTTATA TCCCGCCGTTAACCACCATCAAACAGGATTTTCGCCTGCTGGGGCAAACCAGCGTGGACCGCTTGCTGCAACTCTCTCAGGGCCAGGCGGTGAA GGGCAATCAGCTGTTGCCAGTCTCACTGGTGAAAAGAAAAACCACCCTGGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCAT TAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGACTCATGACCAAAATCCCTTAACGTGAGTTACGCGCGCGTCGTTCCAC TGAGCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTA CCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTTCTTCTA GTGTAGCCGTAGTTAGCCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGC GATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACG
pBAD-PopZ-(UAG)0-sfGFP
GATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAGGA GAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATG CTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCC TGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTCGCCGCAGCCGAACGACCGAGCGCAGCGAGTCAG TGAGCGAGGAAGCGGAAGGCGAGAGTAGGGAACTGCCAGGCATCAAACTAAGCAGAAGGCCCCTGACGGATGGCCTTTTTGCGTTTCTACAAA CTCTTTCTGTGTTGTAAAACGACGGCCAGTCTTAAGCTCGGGCCCCCTGGGCGGTTCTGATAACGAGTAATCGTTAATCCGCAAATAACGTAAAA ACCCGCTTCGGCGGGTTTTTTTATGGGGGGAGTTTAGGGAAAGAGCATTTGTCAGAATATTTAAGGGCGCCTGTCACTTTGCTTGATATATGAGA ATTATTTAACCTTATAAATGAGAAAAAAGCAACGCACTTTAAATAAGATACGTTGCTTTTTCGATTGATGAACACCTATAATTAAACTATTCATCTA TTATTTATGATTTTTTGTATATACAATATTTCTAGTTTGTTAAAGAGAATTAAGAAAATAAATCTCGAAAATAATAAAGGGAAAATCAGTTATGAC AACTTGACGGCTACATCATTCACTTTTTCTTCACAACCGGCACGGAACTCGCTCGGGCTGGCCCCGGTGCATTTTTTAAATACCCGCGAGAAATAG 
AGTTGATCGTCAAAACCAACATTGCGACCGACGGTGGCGATAGGCATCCGGGTGGTGCTCAAAAGCAGCTTCGCCTGGCTGATACGTTGGTCCT CGCGCCAGCTTAAGACGCTAATCCCTAACTGCTGGCGGAAAAGATGTGACAGACGCGACGGCGACAAGCAAACATGCTGTGCGACGCTGGCGA TATCAAAATTGCTGTCTGCCAGGTGATCGCTGATGTACTGACAAGCCTCGCGTACCCGATTATCCATCGGTGGATGGAGCGACTCGTTAATCGCT TCCATGCGCCGCAGTAACAATTGCTCAAGCAGATTTATCGCCAGCAGCTCCGAATAGCGCCCTTCCCCTTGCCCGGCGTTAATGATTTGCCCAAAC AGGTCGCTGAAATGCGGCTGGTGCGCTTCATCCGGGCGAAAGAACCCCGTATTGGCAAATATTGACGGCCAGTTAAGCCATTCATGCCAGTAGG CGCGCGGACGAAAGTAAACCCACTGGTGATACCATTCGCGAGCCTCCGGATGACGACCGTAGTGATGAATCTCTCCTGGCGGGAACAGCAAAAT

ATCACCCGGTCGGCAAACAAATTCTCGTCCCTGATTTTTCACCACCCCCTGACCGCGAATGGTGAGATTGAGAATATAACCTTTCATTCCCAGCGG TCGGTCGATAAAAAAATCGAGATAACCGTTGGCCTCAATCGGCGTTAAACCCGCCACCAGATGGGCATTAAACGAGTATCCCGGCAGCAGGGGA TCATTTTGCGCTTCAGCCATACTTTTCATACTCCCGCCATTCAGAGAAGAAACCAATTGTCCATATTGCATCAGACATTGCCGTCACTGCGTCTTTT ACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGCATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAA AGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGAT CCTACCTGACGCTTTTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTTTTAAGGAGGTAAAAAATGTCCGATCAGTCTCAAGAACCTACAATGG AGGAAATCCTCGCCTCCATTCGACGCATCATCTCGGAGGATGACGCGCCGGCGGAGCCTGCGGCCGAAGCGGCGCCCCCGCCGCCGCCGGAAC CCGAACCTGAACCGGTGTCGTTCGACGACGAGGTTCTGGAATTGACGGATCCGATCGCGCCCGAGCCCGAGCTGCCGCCGCTGGAGACTGTCG GCGACATCGACGTCTATTCGCCGCCGGAACCTGAGTCGGAACCGGCCTACACGCCGCCGCCGGCGGCTCCGGTGTTTGATCGCGACGAAGTCGC CGAGCAGCTGGTCGGCGTTTCGGCCGCTTCGGCCGCGGCGAGCGCCTTCGGCAGCCTGAGCTCGGCCCTGCTGATGCCCAAGGACGGTCGGAC GCTGGAAGACGTCGTACGCGAGCTGCTGCGCCCGCTGCTCAAGGAGTGGCTGGACCAGAACCTGCCGCGCATCGTCGAGACCAAGGTTGAGGA AGAAGTGCAGCGTATCTCTCGGGGACGCGGCGGTGGTGGTGGTTCTGGTACCGCATCCAAGGGCGAGGAGCTCTTTACTGGCGTAGTACCAAT TCTCGTAGAGCTCGATGGCGATGTAAATGGCCATAAGTTTTCCGTACGCGGCGAGGGCGAGGGCGATGCAACTAACGGCAAGCTCACTCTCAAG TTTATTTGTACTACTGGCAAGCTCCCAGTACCATGGCCAACTCTCGTAACTACTCTGACCTATGGCGTACAATGTTTTTCCCGCTATCCAGATCACA TGAAGCAACATGATTTTTTTAAGTCCGCAATGCCAGAGGGCTATGTACAAGAGCGCACTATTAGCTTTAAGGATGATGGCACCTATAAGACTCGC GCAGAGGTAAAGTTTGAGGGCGATACTCTCGTAAATCGCATTGAGCTCAAGGGCATTGATTTTAAGGAGGATGGCAATATTCTCGGCCATAAGC TGGAGTATAATTTCAATTCCCATAATGTATACATTACCGCAGATAAGCAAAAGAATGGCATTAAGGCGAATTTTAAGATTCGCCATAATGTGGAG GATGGCTCCGTACAACTCGCAGATCATTATCAACAAAATACTCCAATTGGCGATGGCCCAGTACTCCTCCCAGATAATCATTATCTCTCCACTCAA TCCGTGCTCTCCAAAGATCCAAATGAGAAGCGCGATCACATGGTACTCCTGGAGTTTGTAACTGCAGCAGGCATTACTCATGGCATGGATGAGCT CTATAAGCTCGAGCACCACCACCACCACTAACCCCAAGGGCGACACCCCCTAATTAGCCCGGGCGAAAGGCCCAGTCTTTCGACTGAGCCTTTCG 
TTTTATTTGATGCCTGGCAGTTCCCTACTCTCGCATGGGGAGTCCCCACACTACCATCGGCGCTACGGCGTTTCACTTCTGAGTTCGGCATGGGGT CAGGTGGGACCACCGCGCTACTGCCGCCAGGCAAACAAGGGGTGTTATGAGCCATATTCAGGTATAAATGGGCTCGCGATAATGTTCAGAATTG GTTAATTGGTTGTAACACTGACCCCTATTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTCATGAGACAATAACCCTGATAAATGCTTCA ATAATATTGAAAAAGGAAGAATATGAGCCATATTCAACGGGAAACGTCGAGGCCGCGATTAAATTCCAACATGGATGCTGATTTATATGGGTAT AAATGGGCTCGCGATAATGTCGGGCAATCAGGTGCGACAATCTATCGCTTGTATGGGAAGCCCGATGCGCCAGAGTTGTTTCTGAAACATGGCA AAGGTAGCGTTGCCAATGATGTTACAGATGAGATGGTCAGACTAAACTGGCTGACGGAATTTATGCCACTTCCGACCATCAAGCATTTTATCCGT ACTCCTGATGATGCATGGTTACTCACCACTGCGATCCCCGGAAAAACAGCGTTCCAGGTATTAGAAGAATATCCTGATTCAGGTGAAAATATTGT TGATGCGCTGGCAGTGTTCCTGCGCCGGTTGCACTCGATTCCTGTTTGTAATTGTCCTTTTAACAGCGATCGCGTATTTCGCCTCGCTCAGGCGCA ATCACGAATGAATAACGGTTTGGTTGATGCGAGTGATTTTGATGACGAGCGTAATGGCTGGCCTGTTGAACAAGTCTGGAAAGAAATGCATAAA CTTTTGCCATTCTCACCGGATTCAGTCGTCACTCATGGTGATTTCTCACTTGATAACCTTATTTTTGACGAGGGGAAATTAATAGGTTGTATTGATG TTGGACGAGTCGGAATCGCAGACCGATACCAGGATCTTGCCATCCTATGGAACTGCCTCGGTGAGTTTTCTCCTTCATTACAGAAACGGCTTTTTC AAAAATATGGTATTGATAATCCTGATATGAATAAATTGCAATTTCATTTGATGCTCGATGAGTTTTTCTAAGCGGCGCGCCATCGAATGGCGCAA AACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGTCAATTCAGGGTGGTGAATATGAAACCAGTAACGTTATACGATGTCGCAGAGTAT GCCGGTGTCTCTTATCAGACCGTTTCCCGCGTGGTGAACCAGGCCAGCCACGTTTCTGCGAAAACGCGGGAAAAAGTGGAAGCGGCGATGGCG GAGCTGAATTACATTCCCAACCGCGTGGCACAACAACTGGCGGGCAAACAGTCGTTGCTGATTGGCGTTGCCACCTCCAGTCTGGCCCTGCACGC GCCGTCGCAAATTGTCGCGGCGATTAAATCTCGCGCCGATCAACTGGGTGCCAGCGTGGTGGTGTCGATGGTAGAACGAAGCGGCGTCGAAGC CTGTAAAGCGGCGGTGCACAATCTTCTCGCGCAACGCGTCAGTGGGCTGATCATTAACTATCCGCTGGATGACCAGGATGCCATTGCTGTGGAA GCTGCCTGCACTAATGTTCCGGCGTTATTTCTTGATGTCTCTGACCAGACACCCATCAACAGTATTATTTTCTCCCATGAGGACGGTACGCGACTG GGCGTGGAGCATCTGGTCGCATTGGGTCACCAGCAAATCGCGCTGTTAGCGGGCCCATTAAGTTCTGTCTCGGCGCGTCTGCGTCTGGCTGGCT GGCATAAATATCTCACTCGCAATCAAATTCAGCCGATAGCGGAACGGGAAGGCGACTGGAGTGCCATGTCCGGTTTTCAACAAACCATGCAAAT 
GCTGAATGAGGGCATCGTTCCCACTGCGATGCTGGTTGCCAACGATCAGATGGCGCTGGGCGCAATGCGCGCCATTACCGAGTCCGGGCTGCGC GTTGGTGCGGATATCTCGGTAGTGGGATACGACGATACCGAAGATAGCTCATGTTATATCCCGCCGTTAACCACCATCAAACAGGATTTTCGCCT GCTGGGGCAAACCAGCGTGGACCGCTTGCTGCAACTCTCTCAGGGCCAGGCGGTGAAGGGCAATCAGCTGTTGCCAGTCTCACTGGTGAAAAG AAAAACCACCCTGGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCATTAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAA GCGGGCAGTGACTCATGACCAAAATCCCTTAACGTGAGTTACGCGCGCGTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAAGGATCT TCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCA ACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTTCTTCTAGTGTAGCCGTAGTTAGCCCACCACTTCAAGAACTCT GTAGCACCGCCTACATACCTCGCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACG ATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGA PopZ ATGTCCGATCAGTCTCAAGAACCTACAATGGAGGAAATCCTCGCCTCCATTCGACGCATCATCTCGGAGGATGACGCGCCGGCGGAGCCTGCGG - CCGAAGCGGCGCCCCCGCCGCCGCCGGAACCCGAACCTGAACCGGTGTCGTTCGACGACGAGGTTCTGGAATTGACGGATCCGATCGCGCCCG AGCCCGAGCTGCCGCCGCTGGAGACTGTCGGCGACATCGACGTCTATTCGCCGCCGGAACCTGAGTCGGAACCGGCCTACACGCCGCCGCCGG (UA CGGCTCCGGTGTTTGATCGCGACGAAGTCGCCGAGCAGCTGGTCGGCGTTTCGGCCGCTTCGGCCGCGGCGAGCGCCTTCGGCAGCCTGAGCTC G)2- GGCCCTGCTGATGCCCAAGGACGGTCGGACGCTGGAAGACGTCGTACGCGAGCTGCTGCGCCCGCTGCTCAAGGAGTGGCTGGACCAGAACCT sfGF GCCGCGCATCGTCGAGACCAAGGTTGAGGAAGAAGTGCAGCGTATCTCTCGGGGACGCGGCTAGGGTGGTGGTTCTGGTACCTAGGCATCCAA P GGGCGAGGAGCTCTTTACTGGCGTAGTACCAATTCTCGTAGAGCTCGATGGCGATGTAAATGGCCATAAGTTTTCCGTACGCGGCGAGGGCGAG GGCGATGCAACTAACGGCAAGCTCACTCTCAAGTTTATTTGTACTACTGGCAAGCTCCCAGTACCATGGCCAACTCTCGTAACTACTCTGACCTAT ORF GGCGTACAATGTTTTTCCCGCTATCCAGATCACATGAAGCAACATGATTTTTTTAAGTCCGCAATGCCAGAGGGCTATGTACAAGAGCGCACTATT (rest AGCTTTAAGGATGATGGCACCTATAAGACTCGCGCAGAGGTAAAGTTTGAGGGCGATACTCTCGTAAATCGCATTGAGCTCAAGGGCATTGATT is TTAAGGAGGATGGCAATATTCTCGGCCATAAGCTGGAGTATAATTTCAATTCCCATAATGTATACATTACCGCAGATAAGCAAAAGAATGGCATT sam 
AAGGCGAATTTTAAGATTCGCCATAATGTGGAGGATGGCTCCGTACAACTCGCAGATCATTATCAACAAAATACTCCAATTGGCGATGGCCCAGT ACTCCTCCCAGATAATCATTATCTCTCCACTCAATCCGTGCTCTCCAAAGATCCAAATGAGAAGCGCGATCACATGGTACTCCTGGAGTTTGTAAC e as TGCAGCAGGCATTACTCATGGCATGGATGAGCTCTATAAGCTCGAGCACCACCACCACCACTAA pBA D-

124 PopZ - (UA G)0- sfGF P) PopZ ATGTCCGATCAGTCTCAAGAACCTACAATGGAGGAAATCCTCGCCTCCATTCGACGCATCATCTCGGAGGATGACGCGCCGGCGGAGCCTGCGG - CCGAAGCGGCGCCCCCGCCGCCGCCGGAACCCGAACCTGAACCGGTGTCGTTCGACGACGAGGTTCTGGAATTGACGGATCCGATCGCGCCCG AGCCCGAGCTGCCGCCGCTGGAGACTGTCGGCGACATCGACGTCTATTCGCCGCCGGAACCTGAGTCGGAACCGGCCTACACGCCGCCGCCGG (UA CGGCTCCGGTGTTTGATCGCGACGAAGTCGCCGAGCAGCTGGTCGGCGTTTCGGCCGCTTCGGCCGCGGCGAGCGCCTTCGGCAGCCTGAGCTC G)6- GGCCCTGCTGATGCCCAAGGACGGTCGGACGCTGGAAGACGTCGTACGCGAGCTGCTGCGCCCGCTGCTCAAGGAGTGGCTGGACCAGAACCT sfGF GCCGCGCATCGTCGAGACCAAGGTTGAGGAAGAAGTGCAGCGTATCTCTCGGGGACGCGGCTAGGGTGGTGGTTCTGGTACCTAGACCTAGG P GTTAGGGCTAGGCATCCAAGGGCGAGGAGCTCTTTACTGGCGTAGTACCAATTCTCGTAGAGCTCGATGGCGATGTAAATGGCCATAAGTTTTC CGTACGCGGCGAGGGCGAGGGCGATGCAACTAACGGCAAGCTCACTCTCAAGTTTATTTGTACTACTGGCAAGCTCCCAGTACCATGGCCAACT ORF CTCGTAACTACTCTGACCTATGGCGTACAATGTTTTTCCCGCTATCCAGATCACATGAAGCAACATGATTTTTTTAAGTCCGCAATGCCAGAGGGC (rest TATGTACAAGAGCGCACTATTAGCTTTAAGGATGATGGCACCTATAAGACTCGCGCAGAGGTAAAGTTTGAGGGCGATACTCTCGTAAATCGCA is TTGAGCTCAAGGGCATTGATTTTAAGGAGGATGGCAATATTCTCGGCCATAAGCTGGAGTATAATTTCAATTCCCATAATGTATAGATTACCGCA sam GATAAGCAAAAGAATGGCATTAAGGCGAATTTTAAGATTCGCCATAATGTGGAGGATGGCTCCGTACAACTCGCAGATCATTATCAACAAAATA CTCCAATTGGCGATGGCCCAGTACTCCTCCCAGATAATCATTATCTCTCCACTCAATCCGTGCTCTCCAAAGATCCAAATGAGAAGCGCGATCACA e as TGGTACTCCTGGAGTTTGTAACTGCAGCAGGCATTACTCATGGCATGGATGAGCTCTATAAGCTCGAGCACCACCACCACCACCACTAA pBA D- PopZ - (UA G)0- sfGF P) PopZ ATGTCCGATCAGTCTCAAGAACCTACAATGGAGGAAATCCTCGCCTCCATTCGACGCATCATCTCGGAGGATGACGCGCCGGCGGAGCCTGCGG - CCGAAGCGGCGCCCCCGCCGCCGCCGGAACCCGAACCTGAACCGGTGTCGTTCGACGACGAGGTTCTGGAATTGACGGATCCGATCGCGCCCG AGCCCGAGCTGCCGCCGCTGGAGACTGTCGGCGACATCGACGTCTATTCGCCGCCGGAACCTGAGTCGGAACCGGCCTACACGCCGCCGCCGG (UA CGGCTCCGGTGTTTGATCGCGACGAAGTCGCCGAGCAGCTGGTCGGCGTTTCGGCCGCTTCGGCCGCGGCGAGCGCCTTCGGCAGCCTGAGCTC G)8- GGCCCTGCTGATGCCCAAGGACGGTCGGACGCTGGAAGACGTCGTACGCGAGCTGCTGCGCCCGCTGCTCAAGGAGTGGCTGGACCAGAACCT sfGF 
GCCGCGCATCGTCGAGACCAAGGTTGAGGAAGAAGTGCAGCGTATCTCTCGGGGACGCGGCTAGGGTGGTGGTTCTGGTACCTAGACCTAGG P GTTAGGGCTAGTCCTAGGGATAGGCATCCAAGGGCGAGGAGCTCTTTACTGGCGTAGTACCAATTCTCGTAGAGCTCGATGGCGATGTAAATG GCCATAAGTTTTCCGTACGCGGCGAGGGCGAGGGCGATGCAACTAACGGCAAGCTCACTCTCAAGTTTATTTGTACTACTGGCAAGCTCCCAGTA ORF CCATGGCCAACTCTCGTAACTACTCTGACCTATGGCGTACAATGTTTTTCCCGCTATCCAGATCACATGAAGCAACATGATTTTTTTAAGTCCGCAA (rest TGCCAGAGGGCTATGTACAAGAGCGCACTATTAGCTTTAAGGATGATGGCACCTATAAGACTCGCGCAGAGGTAAAGTTTGAGGGCGATACTCT is CGTAAATCGCATTGAGCTCAAGGGCATTGATTTTAAGGAGGATGGCAATATTCTCGGCCATAAGCTGGAGTATAATTTCAATTCCCATAATGTAT sam AGATTACCGCAGATAAGCAAAAGAATGGCATTAAGGCGAATTTTAAGATTCGCCATAATGTGGAGGATGGCTCCGTACAACTCGCAGATCATTA TCAACAAAATACTCCAATTGGCGATGGCCCAGTACTCCTCCCAGATAATCATTATCTCTCCACTCAATCCGTGCTCTCCAAAGATCCAAATGAGAA e as GCGCGATCACATGGTACTCCTGGAGTTTGTAACTGCAGCAGGCATTACTCATGGCATGGATGAGCTCTATAAGCTCGAGCACCACCACCACCACC pBA ACTAA D- PopZ - (UA G)0- sfGF P) pDul GCGCCGGTTAAGGCTAAACTGAAAGGACAAGTTTTGGTGACTGCGCTCCTCCAAGCCAGTTACCTCGGTTCAAAGAGTTGGTAGCTCAGAGAAC e- CTTCGAAAAACCGCCCTGCAAGGCGGTTTTTTCGTTTTCAGAGCAAGAGATTACGCGCAGACCAAAACGATCTCAAGAAGATCATCTTATTAATC AGATAAAATATTTCTAGATTTCAGTGCAATTTATCTCTTCAAATGTAGCACCTGAAGTCAGCCCCATACGATATAAGTTGTAATTCTCATGTTTGAC MjB AGCTTATCATCGATAAGCTTTAATGCGGTAGTTTATCACAGTTAAATTGCTAACGCAGTCAGGCACCGTGTATGAAATCTAACAATGCGCTCATCG paRS TCATCCTCGGCACCGTCACCCTGGATGCTGTAGGCATAGGCTTGGTTATGCCGGTACTGCCGGGCCTCTTGCGGGATATCGTCCATTCCGACAGC (incl ATCGCCAGTCACTATGGCGTGCTGCTAGCGCTATATGCGTTGATGCAATTTCTATGCGCACCCGTTCTCGGAGCACTGTCCGACCGCTTTGGCCGC udes CGCCCAGTCCTGCTCGCTTCGCTACTTGGAGCCACTATCGACTACGCGATCATGGCGACCACACCCGTCCTGTGGATCCTCTACGCCGGACGCATC GTGGCCGGCATCACCGGCGCCACAGGTGCGGTTGCTGGCGCCTATATCGCCGACATCACCGATGGGGAAGATCGGGCTCGCCACTTCGGGCTC a ATGAGCGCTTGTTTCGGCGTGGGTATGGTGGCAGGCCCCGTGGCCGGGGGACTGTTGGGCGCCATCTCCTTGCATGCACCATTCCTTGCGGCGG copy CGGTGCTCAACGGCCTCAACCTACTACTGGGCTGCTTCCTAATGCAGGAGTCGCATAAGGGAGAGCGTCGACCGATGCCCTTGAGAGCCTTCAA 
CCCAGTCAGCTCCTTCCGGTGGGCGCGGGGCATGACTATCGTCGCCGCACTTATGACTGTCTTCTTTATCATGCAACTCGTAGGACAGGTGCCGG

125 of CAGCGCTCTGGGTCATTTTCGGCGAGGACCGCTTTCGCTGGAGCGCGACGATGATCGGCCTGTCGCTTGCGGTATTCGGAATCTTGCACGCCCTC GCTCAAGCCTTCGTCACTGGTCCCGCCACCAAACGTTTCGGCGAGAAGCAGGCCATTATCGCCGGCATGGCGGCCGACGCACTGGGCTACGTCT TGCTGGCGTT CGCGACGCGAGGCTGGATGGCCTTCCCCATTATGATTCTTCTCGCTTCCGGCGGCATCGGGATGCCCGCGTTGCAGGCCATGCTG gene 𝑇𝑇 TCCAGGCAGGTAGATGACGACCATCAGGGACAGCTTCAAGGATCGCTCGCGGCTCTTACCAGCCTAACTTCGATCATTGGACCGCTGATCGTCAC 𝐶𝐶 )𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡 GGCGATTTATGCCGCCTCGGCGAGCACATGGAACGGGTTGGCATGGATTGTAGGCGCCGCCCTATACCTTGTCTGCCTCCCCGCGTTGCGTCGC GGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGGATTCACCACTCCAAGAATTGGAGCCAATCAATTCTT GCGGAGAACTGTGAATGCGCAAACCAACCCTTGGCAGAACATATCCATCGCGTCCGTATAAGATCATACGCCGTTATACGTTGTTTACGCTTTGA GGAATCCCATATGGACGAATTTGAAATGATAAAGAGAAACACATCTGAAATTATCAGCGAGGAAGAGTTAAGAGAGGTTTTAAAAAAAGATGA AAAATCTGCTGGTATAGGTTTTGAACCAAGTGGTAAAATACATTTAGGGCATTATCTCCAAATAAAAAAGATGATTGATTTACAAAATGCTGGAT TTGATATAATTATATTGTTGGCTGATTTACACGCCTATTTAAACCAGAAAGGAGAGTTGGATGAGATTAGAAAAATAGGAGATTATAACAAAAAA GTTTTTGAAGCAATGGGGTTAAAGGCAAAATATCTTTATGGAAGTCCTTTCCAGCTTGATAAGGATTATACACTGAATGTCTATAGATTGGCTTTA AAAACTACCTTAAAAAGAGCAAGAAGGAGTATGGAACTTATAGCAAGAGAGGATGAAAATCCAAAGGTTGCTGAAGTTATCTATCCAATAATGC AGGTTAATACGAGTCATTATCTGGGCGTTGATGTTGCAGTTGGAGGGATGGAGCAGAGAAAAATACACATGTTAGCAAGGGAGCTTTTACCAAA AAAGGTTGTTTGTATTCACAACCCTGTCTTAACGGGTTTGGATGGAGAAGGAAAGATGAGTTCTTCAAAAGGGAATTTTATAGCTGTTGATGACT CTCCAGAAGAGATTAGGGCTAAGATAAAGAAAGCATACTGCCCAGCTGGAGTTGTTGAAGGAAATCCAATAATGGAGATAGCTAAATACTTCCT TGAATATCCTTTAACCATAAAAAGGCCAGAAAAATTTGGTGGAGATTTGACAGTTAATAGCTATGAGGAGTTAGAGAGTTTATTTAAAAATAAGG AATTGCATCCAATGGATTTAAAAAATGCTGTAGCTGAAGAACTTATAAAGATTTTAGAGCCAATTAGAAAGAGATTATAATAAGTCGACCATCAT CATCATCATCCACCTGGCGGACCTGCACGCGTACCTGAACCAGAAAGGTGAACTGGACGAAATCCGTAAAATCGGTGACTACAACAAAAAAGTT TTCGAAGCGATGGGTCTGAAAGCGAAATACGTTTACGGTTCTGAATGGATGCTGGACAAAGACTACACCCTGAACGTTTACCGTCTGGCGCTGA AAACCACCCTGAAACGTGCGCGTCGTTCTATGGAACTGATCGCGCGTGAAGACGAAAACCCGAAAGTTGCGGAAGTTATCTACCCGATCATGCA 
GGTTAACGGTATCCACTACAAAGGTGTTGACGTTGCGGTTGGTGGTATGGAACAGCGTAAAATCCACATGCTGGCGCGTGAACTGCTGCCGAAA AAAGTTGTTTGCATCCACAACCCGGTTCTGACCGGTCTGGACGGTGAAGGTAAAATGTCTTCTTCTAAAGGTAACTTCATCGCGGTTGACGACTC TCCGGAAGAAATCCGTGCGAAAATCAAAAAAGCGTACTGCCCGGCGGGTGTTGTTGAAGGTAACCCGATCATGGAAATCGCGAAATACTTCCTG GAATACCCGCTGACCATCAAAGGTCCGGAAAAATTCGGTGGTGACCTGACCGTTAACTCTTACGAAGAACTGGAATCTCTGTTCAAAAACAAAG AACTGCACCCGATGGACCTGAAAAACGCGGTTGCGGAAGAACTGATCAAAATCCTGGAACCGATCCGTAAACGTCTGTAACTGCAGTTTCAAAC GCTAAATTGCCTGATGCGCTACGCTTATCAGGCCTACATGATCTCTGCAATATATTGAGTTTGCGTGCTTTTGTAGGCCGGATAAGGCGTTCACGC CGCATCCGGCAAGAAACAGCAAACAATCCAAAACGCCGCGTTCAGCGGCGTTTTTTCTGCTTTTCTTCGCGAATTAATTCCGCTTCGCAACATGTG AGCACCGGTTTATTGACTACCGGAAGCAGTGTGACCGTGTGCTTCTCAAATGCCTGAGGCCAGTTTGCTCAGGCTCTCCCCGTGGAGGTAATAAT TGACGATATGATCAGTGCACGGCTAACTAAGCGGCCTGCTGACTTTCTCGCCGATCAAAAGGCATTTTGCTATTAAGGGATTGACGAGGGCGTAT CTGCGCAGTAAGATGCGCCCCGCATTCCGGCGGTAGTTCAGCAGGGCAGAACGGCGGACTCTAAATCCGCATGGCAGGGGTTCAAATCCCCTCC GCCGGACCAAATTCGAAAAGCCTGCTCAACGAGCAGGCTTTTTTGCATGCTCGAGCAGCTCAGGGTCGTTTCAAACGCTAAATTGCCTGATGCGC TACGCTTATCAGGCCTACATGATCTCTGCAATATATTGAGTTTGCGTGCTTTTGTAGGCCGGATAAGGCGTTCACGCCGCATCCGGCAAGAAACA GCAAACAATCCAAAACGCCGCGTTCAGCGGCGTTTTTTCTGCTTTTCTTCGCGAATTAATTCCGCTTCGCACATGTGAGCAAAAGGCCAGCAAAA GGCCAGGAACCGCTCGAGCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCCTGAGTAGGACAAATCCGCCGGGAGCTGTCCCTCCTGTTCAGCT ACTGACGGGGTGGTGCGTAACGGCAAAAGCACCGCCGGACATCAGCGCTAGCGGAGTGTATACTGGCTTACTATGTTGGCACTGATGAGGGTG TCAGTGAAGTGCTTCATGTGGCAGGAGAAAAAAGGCTGCACCGGTGCGTCAGCAGAATATGTGATACAGGATATATTCCGCTTCCTCGCTCACT GACTCGCTACGCTCGGTCGTTCGACTGCGGCGAGCGGAAATGGCTTACGAACGGGGCGGAGATTTCCTGGAAGATGCCAGGAAGATACTTAAC AGGGAAGTGAGAGGGCCGCGGCAAAGCCGTTTTTCCATAGGCTCCGCCCCCCTGACAAGCATCACGAAATCTGACGCTCAAATCAGTGGTGGC GAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGCGGCTCCCTCGTGCGCTCTCCTGTTCCTGCCTTTCGGTTTACCGGTGTCATTC CGCTGTTATGGCCGCGTTTGTCTCATTCCACGCCTGACACTCAGTTCCGGGTAGGCAGTTCGCTCCAAGCTGGACTGTATGCACGAACCCCCCGTT 
CAGTCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGAAAGACATGCAAAAGCACCACTGGCAGCAGCCACTGGTAATTG ATTTAGAGGAGTTAGTCTTGAAGTCAT pDul GCGCCGGTTAAGGCTAAACTGAAAGGACAAGTTTTGGTGACTGCGCTCCTCCAAGCCAGTTACCTCGGTTCAAAGAGTTGGTAGCTCAGAGAAC e- CTTCGAAAAACCGCCCTGCAAGGCGGTTTTTTCGTTTTCAGAGCAAGAGATTACGCGCAGACCAAAACGATCTCAAGAAGATCATCTTATTAATC AGATAAAATATTTCTAGATTTCAGTGCAATTTATCTCTTCAAATGTAGCACCTGAAGTCAGCCCCATACGATATAAGTTGTAATTCTCATGTTTGAC MjC AGCTTATCATCGATAAGCTTTAATGCGGTAGTTTATCACAGTTAAATTGCTAACGCAGTCAGGCACCGTGTATGAAATCTAACAATGCGCTCATCG ouRS TCATCCTCGGCACCGTCACCCTGGATGCTGTAGGCATAGGCTTGGTTATGCCGGTACTGCCGGGCCTCTTGCGGGATATCGTCCATTCCGACAGC (incl ATCGCCAGTCACTATGGCGTGCTGCTAGCGCTATATGCGTTGATGCAATTTCTATGCGCACCCGTTCTCGGAGCACTGTCCGACCGCTTTGGCCGC udes CGCCCAGTCCTGCTCGCTTCGCTACTTGGAGCCACTATCGACTACGCGATCATGGCGACCACACCCGTCCTGTGGATCCTCTACGCCGGACGCATC GTGGCCGGCATCACCGGCGCCACAGGTGCGGTTGCTGGCGCCTATATCGCCGACATCACCGATGGGGAAGATCGGGCTCGCCACTTCGGGCTC a ATGAGCGCTTGTTTCGGCGTGGGTATGGTGGCAGGCCCCGTGGCCGGGGGACTGTTGGGCGCCATCTCCTTGCATGCACCATTCCTTGCGGCGG copy CGGTGCTCAACGGCCTCAACCTACTACTGGGCTGCTTCCTAATGCAGGAGTCGCATAAGGGAGAGCGTCGACCGATGCCCTTGAGAGCCTTCAA of CCCAGTCAGCTCCTTCCGGTGGGCGCGGGGCATGACTATCGTCGCCGCACTTATGACTGTCTTCTTTATCATGCAACTCGTAGGACAGGTGCCGG CAGCGCTCTGGGTCATTTTCGGCGAGGACCGCTTTCGCTGGAGCGCGACGATGATCGGCCTGTCGCTTGCGGTATTCGGAATCTTGCACGCCCTC GCTCAAGCCTTCGTCACTGGTCCCGCCACCAAACGTTTCGGCGAGAAGCAGGCCATTATCGCCGGCATGGCGGCCGACGCACTGGGCTACGTCT gene 𝑇𝑇 TGCTGGCGTTCGCGACGCGAGGCTGGATGGCCTTCCCCATTATGATTCTTCTCGCTTCCGGCGGCATCGGGATGCCCGCGTTGCAGGCCATGCTG 𝐶𝐶 )𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡 TCCAGGCAGGTAGATGACGACCATCAGGGACAGCTTCAAGGATCGCTCGCGGCTCTTACCAGCCTAACTTCGATCATTGGACCGCTGATCGTCAC GGCGATTTATGCCGCCTCGGCGAGCACATGGAACGGGTTGGCATGGATTGTAGGCGCCGCCCTATACCTTGTCTGCCTCCCCGCGTTGCGTCGC GGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGGATTCACCACTCCAAGAATTGGAGCCAATCAATTCTT GCGGAGAACTGTGAATGCGCAAACCAACCCTTGGCAGAACATATCCATCGCGTCCGTATAATATCATACGCTGTTATACGTTGTTTACGCTTTGA 
GGAATCCCATATGGACGAATTCGAAATGATCAAACGTAACACCTCTGAAATCATCTCTGAAGAAGAACTGCGTGAAGTTCTGAAAAAAGACGAA AAATCTGCGGAAATCGGTTTCGAACCGTCTGGTAAAATCCACCTGGGTCACTACCTGCAGATCAAAAAAATGATCGACCTGCAGAACGCGGGTT TCGACATCATCATCCATCTGGGTGACCTGGGAGCGTACCTGAACCAGAAAGGTGAACTGGACGAAATCCGTAAAATCGGTGACTACAACAAAAA AGTTTTCGAAGCGATGGGTCTGAAAGCGAAATACGTTTACGGTTCTGAATATCATCTGGACAAAGACTACACCCTGAACGTTTACCGTCTGGCGC TGAAAACCACCCTGAAACGTGCGCGTCGTTCTATGGAACTGATCGCGCGTGAAGACGAAAACCCGAAAGTTGCGGAAGTTATCTACCCGATCAT GCAGGTTAACGGTATCCACTACGGTGGTGTTGACGTTGCGGTTGGTGGTATGGAACAGCGTAAAATCCACATGCTGGCGCGTGAACTGCTGCCG

AAAAAAGTTGTTTGCATCCACAACCCGGTTCTGACCGGTCTGGACGGTGAAGGTAAAATGTCTTCTTCTAAAGGTAACTTCATCGCGGTTGACGA CTCTCCGGAAGAAATCCGTGCGAAAATCAAAAAAGCGTACTGCCCGGCAGGTGTTGTTGAAGGTAACCCGATCATGGAAATCGCGAAATACTTC CTGGAATACCCGCTGACCATCAAACGTCCGGAAAAATTCGGTGGTGACCTGACCGTTAACTCTTACGAAGAACTGGAATCTCTGTTCAAAAACAA AGAACTGCACCCGATGGACCTGAAAAACGCGGTTGCGGAAGAACTGATCAAAATCCTGGAACCGATCCGTAAACGTCTGTAACTGCAGTTTCAA ACGCTAAATTGCCTGATGCGCTACGCTTATCAGGCCTACATGATCTCTGCAATATATTGAGTTTGCGTGCTTTTGTAGGCCGGATAAGGCGTTCAC GCCGCATCCGGCAAGAAACAGCAAACAATCCAAAACGCCGCGTTCAGCGGCGTTTTTTCTGCTTTTCTTCGCGAATTAATTCCGCTTCGCAACATG TGAGCACCGGTTTATTGACTACCGGAAGCAGTGTGACCGTGTGCTTCTCAAATGCCTGAGGCCAGTTTGCTCAGGCTCTCCCCGTGGAGGTAATA ATTGACGATATGATCAGTGCACGGCTAACTAAGCGGCCTGCTGACTTTCTCGCCGATCAAAAGGCATTTTGCTATTAAGGGATTGACGAGGGCGT ATCTGCGCAGTAAGATGCGCCCCGCATTCCGGCGGTAGTTCAGCAGGGCAGAACGGCGGACTCTAAATCCGCATGGCAGGGGTTCAAATCCCCT CCGCCGGACCAAATTCGAAAAGCCTGCTCAACGAGCAGGCTTTTTTGCATGCTCGAGCAGCTCAGGGTCGTTTCAAACGCTAAATTGCCTGATGC GCTACGCTTATCAGGCCTACATGATCTCTGCAATATATTGAGTTTGCGTGCTTTTGTAGGCCGGATAAGGCGTTCACGCCGCATCCGGCAAGAAA CAGCAAACAATCCAAAACGCCGCGTTCAGCGGCGTTTTTTCTGCTTTTCTTCGCGAATTAATTCCGCTTCGCACATGTGAGCAAAAGGCCAGCAAA AGGCCAGGAACCGCTCGAGCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCCTGAGTAGGACAAATCCGCCGGGAGCTGTCCCTCCTGTTCAGC TACTGACGGGGTGGTGCGTAACGGCAAAAGCACCGCCGGACATCAGCGCTAGCGGAGTGTATACTGGCTTACTATGTTGGCACTGATGAGGGT GTCAGTGAAGTGCTTCATGTGGCAGGAGAAAAAAGGCTGCACCGGTGCGTCAGCAGAATATGTGATACAGGATATATTCCGCTTCCTCGCTCAC TGACTCGCTACGCTCGGTCGTTCGACTGCGGCGAGCGGAAATGGCTTACGAACGGGGCGGAGATTTCCTGGAAGATGCCAGGAAGATACTTAA CAGGGAAGTGAGAGGGCCGCGGCAAAGCCGTTTTTCCATAGGCTCCGCCCCCCTGACAAGCATCACGAAATCTGACGCTCAAATCAGTGGTGGC GAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGCGGCTCCCTCGTGCGCTCTCCTGTTCCTGCCTTTCGGTTTACCGGTGTCATTC CGCTGTTATGGCCGCGTTTGTCTCATTCCACGCCTGACACTCAGTTCCGGGTAGGCAGTTCGCTCCAAGCTGGACTGTATGCACGAACCCCCCGTT CAGTCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGAAAGACATGCAAAAGCACCACTGGCAGCAGCCACTGGTAATTG ATTTAGAGGAGTTAGTCTTGAAGTCAT pDul 
GTTCGCGACGCGAGGCTGGATGGCCTTCCCCATTATGATTCTTCTCGCTTCCGGCGGCATCGGGATGCCCGCGTTGCAGGCCATGCTGTCCAGGC e- AGGTAGATGACGACCATCAGGGACAGCTTCAAGGATCGCTCGCGGCTCTTACCAGCCTAACTTCGATCATTGGACCGCTGATCGTCACGGCGATT TATGCCGCCTCGGCGAGCACATGGAACGGGTTGGCATGGATTGTAGGCGCCGCCCTATACCTTGTCTGCCTCCCCGCGTTGCGTCGCGGTGCAT Sc5O GGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGGATTCACCACTCCAAGAATTGGAGCCAATCAATTCTTGCGGAGA HWR ACTGTGAATGCGCAAACCAACCCTTGGCAGAACATATCCATCGCGTCCGTATAAGATCATACGCCGTTATACGTTGTTTACGCTTTGAGGAATCCC S ATATGTCAAATGATGAGACTGTTGAGAAAGTTACGCAGCAGGTATCGGAGCTTAAATCAACGGACGTGAAGGAGCAAGTAGTGACTCCGTGGG (incl ATGTTGAGGGAGGTGTGGACGAGCAAGGACGTGCGCAAAACATTGATTACGATAAGTTAATTAAACAATTCGGAACTAAACCCGTCAATGAGG AGACACTTAAGCGCTTTAAGCAGGTGACAGGTCGTGAGCCACACCATTTCTTACGCAAAGGGCTGTTCTTCTCGGAACGTGATTTCACAAAGATC udes CTTGACCTGTATGAGCAAGGTAAACCCTTCTTCTTATATTGCGGCCGCGGGCCGTCTTCAGATTCAATGCACCTGGGCCACATGATTCCCTTTGTG a TTTACGAAATGGTTGCAAGAGGTGTTCGATGTTCCCTTGGTTATCGAGCTGACAGATGACGAAAAATTCTTATTTAAACACAAGCTTACTATCAAC copy GATGTTAAAAATTTTGCCCGTGAAAATGCAAAGGATATTATCGCGGTCGGTTTCGACCCAAAGAATACCTTCATCTTCAGTGATTTGCAATATATG of GGCGGTGCATTTTATGAGACGGTCGTGCGTGTTTCACGTCAGATTACGGGCAGCACGGCGAAGGCCGTTTTCGGCTTCAACGACTCTGATTGTAT CGGTAAATTTCACTTTGCTAGTATTCAGATCGCCACTGCCTTCCCATCATCATTTCCGAATGTTCTGGGTTTGCCCGACAAAACGCCATGTCTTATC ACGGCAGCGATCGATCAAGATCCGTACTTCCGTGTATGCCGTGAC GTTGCGGACAAATTGAAATATTCGAAACCCGCACTTTTACACTCGCGTTTT gene 𝑇𝑇 TTCCCTGCCTTGCAAGGGTCAACAACAAAGATGTCAGCCAGTGATGATACCACAGCCATCTTCATGACTGATACGCCCAAGCAGATCCAAAAAAA 𝐶𝐶 )𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡 AATTAACAAGTACGCCTTCAGTGGCGGTCAGGTCTCAGCGGACCTTCATCGCGAATTAGGCGGGAATCCAGACGTAGATGTTGCTTACCAGTATC TGAGTTTCTTTAAGGACGACGACGTTTTTCTGAAGGAGTGTTATGATAAATACAAGTCGGGTGAACTGTTAAGCGGGGAAATGAAAAAGTTGTG CATTGAAACTCTTCAAGAGTTTGTCAAAGCATTCCAAGAGCGCCGTGCTCAAGTGGACGAAGAAACTTTGGACAAATTTATGGTACCTCACAAGC TTGTCTGGGGCGAAAAAGAACGTCTTGTCGCGCCCAAGCCTAAAACTAAGCAAGAAAAAAAATAAGTCGACCATCATCATTGAGTTTAAACGGT CTCCAGCTTGGCTGTTTTGGCGGATGAGAGAAGATTTTCAGCCTGATACAGATTAAATCAGAACGCAGAAGCGGTCTGATAAAACAGAATTTGC 
CTGGCGGCAGTAGCGCGGTGGTCCCACCTGACCCCATGCCGAACTCAGAAGTGAAACGCCGTAGCGCCGATGGTAGTGTGGGGTCTCCCCATG CGAGAGTAGGGAACTGCCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTGTTTGTGAGCTCCCGGTCATCAATCATAATTC CGCTTCGCAACATGTGAGCACCGGTTTATTGACTACCGGAAGCAGTGTGACCGTGTGCTTCTCAAATGCCTGAGGCCAGTTTGCTCAGGCTCTCC CCGTGGAGGTAATAATTGACGATATGATCAGTGCACGGCTAACTAAGCGGCCTGCTGACTTTCTCGCCGATCAAAAGGCATTTTGCTATTAAGGG ATTGACGAGGGCGTATCTGCGCAGTAAGATGCGCCCCGCATTGAAGCGGTGGCTCAAGGGTAGAGCTGGCGCCTCTAAAGCGCCTGGTTGCAG GTTCAAGTCCTGTCCGCTTCACCAAATTCGAAAAGCCTGCTCAACGAGCAGGCTTTTTTGCATGCTCGAGCAGCTCAGGGTCGAATTTGCTTTCGA ATTTCTGCCATTCATCCGTTTCAAACGCTAAATTGCCTGATGCGCTACGCTTATCAGGCCTACATGATCTCTGCAATATATTGAGTTTGCGTGCTTT TGTAGGCCGGATAAGGCGTTCACGCCGCATCCGGCAAGAAACAGCAAACAATCCAAAACGCCGCGTTCAGCGGCGTTTTTTCTGCTTTTCTTCGC GAATTAATTCCGCTTCGCACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGCTCGAGCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCC TGAGTAGGACAAATCCGCCGGGAGCTGTCCCTCCTGTTCAGCTACTGACGGGGTGGTGCGTAACGGCAAAAGCACCGCCGGACATCAGCGCTA GCGGAGTGTATACTGGCTTACTATGTTGGCACTGATGAGGGTGTCAGTGAAGTGCTTCATGTGGCAGGAGAAAAAAGGCTGCACCGGTGCGTC AGCAGAATATGTGATACAGGATATATTCCGCTTCCTCGCTCACTGACTCGCTACGCTCGGTCGTTCGACTGCGGCGAGCGGAAATGGCTTACGAA CGGGGCGGAGATTTCCTGGAAGATGCCAGGAAGATACTTAACAGGGAAGTGAGAGGGCCGCGGCAAAGCCGTTTTTCCATAGGCTCCGCCCCC CTGACAAGCATCACGAAATCTGACGCTCAAATCAGTGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGCGGCTCCCT CGTGCGCTCTCCTGTTCCTGCCTTTCGGTTTACCGGTGTCATTCCGCTGTTATGGCCGCGTTTGTCTCATTCCACGCCTGACACTCAGTTCCGGGTA GGCAGTTCGCTCCAAGCTGGACTGTATGCACGAACCCCCCGTTCAGTCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGG AAAGACATGCAAAAGCACCACTGGCAGCAGCCACTGGTAATTGATTTAGAGGAGTTAGTCTTGAAGTCATGCGCCGGTTAAGGCTAAACTGAAA GGACAAGTTTTGGTGACTGCGCTCCTCCAAGCCAGTTACCTCGGTTCAAAGAGTTGGTAGCTCAGAGAACCTTCGAAAAACCGCCCTGCAAGGC GGTTTTTTCGTTTTCAGAGCAAGAGATTACGCGCAGACCAAAACGATCTCAAGAAGATCATCTTATTAATCAGATAAAATATTTCTAGATTTCAGT GCAATTTATCTCTTCAAATGTAGCACCTGAAGTCAGCCCCATACGATATAAGTTGTAATTCTCATGTTTGACAGCTTATCATCGATAAGCTTTAATG 
CGGTAGTTTATCACAGTTAAATTGCTAACGCAGTCAGGCACCGTGTATGAAATCTAACAATGCGCTCATCGTCATCCTCGGCACCGTCACCCTGG ATGCTGTAGGCATAGGCTTGGTTATGCCGGTACTGCCGGGCCTCTTGCGGGATATCGTCCATTCCGACAGCATCGCCAGTCACTATGGCGTGCTG CTAGCGCTATATGCGTTGATGCAATTTCTATGCGCACCCGTTCTCGGAGCACTGTCCGACCGCTTTGGCCGCCGCCCAGTCCTGCTCGCTTCGCTA CTTGGAGCCACTATCGACTACGCGATCATGGCGACCACACCCGTCCTGTGGATCCTCTACGCCGGACGCATCGTGGCCGGCATCACCGGCGCCAC AGGTGCGGTTGCTGGCGCCTATATCGCCGACATCACCGATGGGGAAGATCGGGCTCGCCACTTCGGGCTCATGAGCGCTTGTTTCGGCGTGGGT

127 ATGGTGGCAGGCCCCGTGGCCGGGGGACTGTTGGGCGCCATCTCCTTGCATGCACCATTCCTTGCGGCGGCGGTGCTCAACGGCCTCAACCTAC TACTGGGCTGCTTCCTAATGCAGGAGTCGCATAAGGGAGAGCGTCGACCGATGCCCTTGAGAGCCTTCAACCCAGTCAGCTCCTTCCGGTGGGC GCGGGGCATGACTATCGTCGCCGCACTTATGACTGTCTTCTTTATCATGCAACTCGTAGGACAGGTGCCGGCAGCGCTCTGGGTCATTTTCGGCG AGGACCGCTTTCGCTGGAGCGCGACGATGATCGGCCTGTCGCTTGCGGTATTCGGAATCTTGCACGCCCTCGCTCAAGCCTTCGTCACTGGTCCC GCCACCAAACGTTTCGGCGAGAAGCAGGCCATTATCGCCGGCATGGCGGCCGACGCACTGGGCTACGTCTTGCTGGC pDul TAACTATATGCGAAAACTCGATAGGATTTTACCAGGCCCAATAAAAATTTTCGAAGTCGGACCTTGTTACCGGAAAGAGTCTGACGGCAAAGAG e- CACCTGGAAGAATTTACTATGGTGAACTTCGCTCAGATGGGTTCGGGATGTACTCGGGAAAATCTTGAAGCTCTCATCAAAGAGTTTCTGGACTA TCTGGAAATCGACTTCGAAATCGTAGGAGATTCCTGTATGGTCTTTGGGGATACTCTTGATATAATGCACGGGGACCTGGAGCTTTCTTCGGCAG MbA TCGTCGGGCCAGTTTCTCTTGATAGAGAATGGGGTATTGACAAACCATGGATAGGTGCAGGTTTTGGTCTTGAACGCTTGCTCAAGGTTATGCAC bKRS GGCTTTAAAAACATTAAGAGGGCATCAAGGTCCGAATCTTACTATAATGGGATTTCAACCAATCTATAACTGCAGTTTCAAACGCTAAATTGCCT - GATGCGCTACGCTTATCAGGCCTACATGATCTCTGCAATATATTGAGTTTGCGTGCTTTTGTAGGCCGGATAAGGCGTTCACGCCGCATCCGGCA 2xtR AGAAACAGCAAACAATCCAAAACGCCGCGTTCAGCGGCGTTTTTTCTGCTTTTCTTCGCGAATTAATTCCGCTTCGCACATGTGAGCAAAAGGCC AGCAAAAGGCCAGGAACCGCTCGAGCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCCTGAGTAGGACAAATCCGCCGGGAGCTGTCCCTCCTG NA TTCAGCTACTGACGGGGTGGTGCGTAACGGCAAAAGCACCGCCGGACATCAGCGCTAGCGGAGTGTATACTGGCTTACTATGTTGGCACTGATG (incl AGGGTGTCAGTGAAGTGCTTCATGTGGCAGGAGAAAAAAGGCTGCACCGGTGCGTCAGCAGAATATGTGATACAGGATATATTCCGCTTCCTCG udes CTCACTGACTCGCTACGCTCGGTCGTTCGACTGCGGCGAGCGGAAATGGCTTACGAACGGGGCGGAGATTTCCTGGAAGATGCCAGGAAGATA two CTTAACAGGGAAGTGAGAGGGCCGCGGCAAAGCCGTTTTTCCATAGGCTCCGCCCCCCTGACAAGCATCACGAAATCTGACGCTCAAATCAGTG GTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGCGGCTCCCTCGTGCGCTCTCCTGTTCCTGCCTTTCGGTTTACCGGTGT copi CATTCCGCTGTTATGGCCGCGTTTGTCTCATTCCACGCCTGACACTCAGTTCCGGGTAGGCAGTTCGCTCCAAGCTGGACTGTATGCACGAACCCC es of CCGTTCAGTCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGAAAGACATGCAAAAGCACCACTGGCAGCAGCCACTGGT 
AATTGATTTAGAGGAGTTAGTCTTGAAGTCATGCGCCGGTTAAGGCTAAACTGAAAGGACAAGTTTTGGTGACTGCGCTCCTCCA AGCCAGTTAC CTCGGTTCAAAGAGTTGGTAGCTCAGAGAACCTTCGAAAAACCGCCCTGCAAGGCGGTTTTTTCGTTTTCAGAGCAAGAGATTACGCGCAGACC gene 𝑃𝑃 𝐶𝐶 AAAACGATCTCAAGAAGATCATCTTATTAATCAGATAAAATATTTCTAGATTTCAGTGCAATTTATCTCTTCAAATGTAGCACCTGAAGTCAGCCCC )𝑡𝑡 𝑅𝑅𝑅𝑅𝑅𝑅 ATACGATATAAGTTGTAATTCTCATGTTTGACAGCTTATCATCGATAAGCTTTAATGCGGTAGTTTATCACAGTTAAATTGCTAACGCAGTCAGGC ACCGTGTATGAAATCTAACAATGCGCTCATCGTCATCCTCGGCACCGTCACCCTGGATGCTGTAGGCATAGGCTTGGTTATGCCGGTACTGCCGG GCCTCTTGCGGGATATCGTCCATTCCGACAGCATCGCCAGTCACTATGGCGTGCTGCTAGCGCTATATGCGTTGATGCAATTTCTATGCGCACCCG TTCTCGGAGCACTGTCCGACCGCTTTGGCCGCCGCCCAGTCCTGCTCGCTTCGCTACTTGGAGCCACTATCGACTACGCGATCATGGCGACCACA CCCGTCCTGTGGATCCTCTACGCCGGACGCATCGTGGCCGGCATCACCGGCGCCACAGGTGCGGTTGCTGGCGCCTATATCGCCGACATCACCG ATGGGGAAGATCGGGCTCGCCACTTCGGGCTCATGAGCGCTTGTTTCGGCGTGGGTATGGTGGCAGGCCCCGTGGCCGGGGGACTGTTGGGCG CCATCTCCTTGCATGCACCATTCCTTGCGGCGGCGGTGCTCAACGGCCTCAACCTACTACTGGGCTGCTTCCTAATGCAGGAGTCGCATAAGGGA GAGCGTCGACCGATGCCCTTGAGAGCCTTCAACCCAGTCAGCTCCTTCCGGTGGGCGCGGGGCATGACTATCGTCGCCGCACTTATGACTGTCTT CTTTATCATGCAACTCGTAGGACAGGTGCCGGCAGCGCTCTGGGTCATTTTCGGCGAGGACCGCTTTCGCTGGAGCGCGACGATGATCGGCCTG TCGCTTGCGGTATTCGGAATCTTGCACGCCCTCGCTCAAGCCTTCGTCACTGGTCCCGCCACCAAACGTTTCGGCGAGAAGCAGGCCATTATCGC CGGCATGGCGGCCGACGCACTGGGCTACGTCTTGCTGGCGTTCGCGACGCGAGGCTGGATGGCCTTCCCCATTATGATTCTTCTCGCTTCCGGCG GCATCGGGATGCCCGCGTTGCAGGCCATGCTGTCCAGGCAGGTAGATGACGACCATCAGGGACAGCTTCAAGGATCGCTCGCGGCTCTTACCAG CCTAACTTCGATCATTGGACCGCTGATCGTCACGGCGATTTATGCCGCCTCGGCGAGCACATGGAACGGGTTGGCATGGATTGTAGGCGCCGCC CTATACCTTGTCTGCCTCCCCGCGTTGCGTCGCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGGATT CACCACTCCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGGCAGAACATATCCATCGCGTCCGCCATCT CCAGCAGCCGCACGCGGCGCATCTCGGGCTCCTTGCATGCACCATTCCTTGCGGCGGCGGTGCTCAACGGCCTCAACCTACTACTGGGCTGCTTC CTAATGCAGGAGTCGCATAAGGGAGAGCGTCTGGCGAAAGGGGGATGTGCTGCAAGGCGATTAAGTTGGGTAACGCCAGGGTTTTCCCAGTCA 
CGACGTTGTAAAACGACGGCCAGTGCCAAGCTTAAAAAAAATCCTTAGCTTTCGCTAAGGATCTGCAGTGGCGGAAACCCCGGGAATCTAACCC GGCTGAACGGATTTAGAGTCCATTCGATCTACATGATCAGGTTTCCGAATTCAGCGTTACAAGTATTACACAAAGTTTTTTATGTTGAGAATATTT TTTTGATGGGAGGCATTTTGCTATTAAGGGATTGACGAGGGCGTATCTGCGCAGTAAGATGCGCCCCGCATTGGAAACCTGATCATGTAGATCG AATGGACTCTAAATCCGTTCAGCCGGGTTAGATTCCCGGGGTTTCCGCCAAATTCGAAAAGCCTGCTCAACGAGCAGGCTTTTTTGCATGCCCCTC GGGTTGTCAGCCTGTCCCGCTTATAAGATCATACGCCGTTATACGTTGTTTACGCTTTGAGGAATCCCATATGGATAAAAAACCATTAGATGTTTT AATATCTGCGACCGGGCTCTGGATGTCCAGGACTGGCACGCTCCACAAAATCAAGCACCATGAGGTCTCAAGAAGTAAAATATACATTGAAATG GCGTGTGGAGACCATCTTGTTGTGAATAATTCCAGGAGTTGTAGAACAGCCAGAGCATTCAGACATCATAAGTACAGAAAAACCTGCAAACGAT GTAGGGTTTCGGACGAGGATATCAATAATTTTCTCACAAGATCAACCGAAAGCAAAAACAGTGTGAAAGTTAGGGTAGTTTCTGCTCCAAAGGT CAAAAAAGCTATGCCGAAATCAGTTTCAAGGGCTCCGAAGCCTCTGGAAAATTCTGTTTCTGCAAAGGCATCAACGAACACATCCAGATCTGTAC CTTCGCCTGCAAAATCAACTCCAAATTCGTCTGTTCCCGCATCGGCTCCTGCTCCTTCACTTACAAGAAGCCAGCTTGATAGGGTTGAGGCTCTCTT AAGTCCAGAGGATAAAATTTCTCTAAATATGGCAAAGCCTTTCAGGGAACTTGAGCCTGAACTTGTGACAAGAAGAAAAAACGATTTTCAGCGG CTCTATACCAATGATAGAGAAGACTACCTCGGTAAACTCGAACGTGATATTACGAAATTTTTCGTAGACCGGGGTTTTCTGGAGATAAAGTCTCC TATCCTTATTCCGGCGGAATACGTGGAGAGAATGGGTATTAATAATGATACTGAACTTTCAAAACAGATCTTCCGGGTGGATAAAAATCTCTGCT TGAGGCCAATGCTTGCCCCGACTCTGTA pDul TAACTATATGCGAAAACTCGATAGGATTTTACCAGGCCCAATAAAAATTTTCGAAGTCGGACCTTGTTACCGGAAAGAGTCTGACGGCAAAGAG e- CACCTGGAAGAATTTACTATGGTGAACTTCGCTCAGATGGGTTCGGGATGTACTCGGGAAAATCTTGAAGCTCTCATCAAAGAGTTTCTGGACTA TCTGGAAATCGACTTCGAAATCGTAGGAGATTCCTGTATGGTCTTTGGGGATACTCTTGATATAATGCACGGGGACCTGGAGCTTTCTTCGGCAG MbA TCGTCGGGCCAGTTTCTCTTGATAGAGAATGGGGTATTGACAAACCATGGATAGGTGCAGGTTTTGGTCTTGAACGCTTGCTCAAGGTTATGCAC bKRS GGCTTTAAAAACATTAAGAGGGCATCAAGGTCCGAATCTTACTATAATGGGATTTCAACCAATCTATAACTGCAGTTTCAAACGCTAAATTGCCT - GATGCGCTACGCTTATCAGGCCTACATGATCTCTGCAATATATTGAGTTTGCGTGCTTTTGTAGGCCGGATAAGGCGTTCACGCCGCATCCGGCA 2xtR AGAAACAGCAAACAATCCAAAACGCCGCGTTCAGCGGCGTTTTTTCTGCTTTTCTTCGCGAATTAATTCCGCGAGAAGAAACCAATTGTCCATATT 
GCATCAGACATTGCCGTCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGCATTCTGTAACAAAGCGGGA NA- CCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCC RBS1 ATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTTTTAAGGAGGTAAAAAA - TGGGTAATAATCGCCCCGTATACATTCCTCAGCCACGCCCGCCGCACCCACGCCTTTAGTTCGCACATGTGAGCAAAAGGCCAGCAAAAGGCCAG GAACCGCTCGAGCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCCTGAGTAGGACAAATCCGCCGGGAGCTGTCCCTCCTGTTCAGCTACTGAC

128 api1 GGGGTGGTGCGTAACGGCAAAAGCACCGCCGGACATCAGCGCTAGCGGAGTGTATACTGGCTTACTATGTTGGCACTGATGAGGGTGTCAGTG b AAGTGCTTCATGTGGCAGGAGAAAAAAGGCTGCACCGGTGCGTCAGCAGAATATGTGATACAGGATATATTCCGCTTCCTCGCTCACTGACTCG CTACGCTCGGTCGTTCGACTGCGGCGAGCGGAAATGGCTTACGAACGGGGCGGAGATTTCCTGGAAGATGCCAGGAAGATACTTAACAGGGAA (incl GTGAGAGGGCCGCGGCAAAGCCGTTTTTCCATAGGCTCCGCCCCCCTGACAAGCATCACGAAATCTGACGCTCAAATCAGTGGTGGCGAAACCC udes GACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGCGGCTCCCTCGTGCGCTCTCCTGTTCCTGCCTTTCGGTTTACCGGTGTCATTCCGCTGTT two ATGGCCGCGTTTGTCTCATTCCACGCCTGACACTCAGTTCCGGGTAGGCAGTTCGCTCCAAGCTGGACTGTATGCACGAACCCCCCGTTCAGTCC copi GACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGAAAGACATGCAAAAGCACCACTGGCAGCAGCCACTGGTAATTGATTTAG AGGAGTTAGTCTTGAAGTCATGCGCCGGTTAAGGCTAAACTGAAAGGACAAGTTTTGGTGACTGCGCTCCTCCAAGCCAGTTACCTCGGTTCAAA es of GAGTTGGTAGCTCAGAGAACCTTCGAAAAACCGCCCTGCAAGGCGGTTTTTTCGTTTTCAGAGCAAGAGATTACGCGCAGACCAAAACGATCTC AAGAAGATCATCTTATTAATCAGATA AAATATTTCTAGATTTCAGTGCAATTTATCTCTTCAAATGTAGCACCTGAAGTCAGCCCCATACGATATAA gene 𝑃𝑃 GTTGTAATTCTCATGTTTGACAGCTTATCATCGATAAGCTTTAATGCGGTAGTTTATCACAGTTAAATTGCTAACGCAGTCAGGCACCGTGTATGA 𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝐶𝐶 AATCTAACAATGCGCTCATCGTCATCCTCGGCACCGTCACCCTGGATGCTGTAGGCATAGGCTTGGTTATGCCGGTACTGCCGGGCCTCTTGCGG ) GATATCGTCCATTCCGACAGCATCGCCAGTCACTATGGCGTGCTGCTAGCGCTATATGCGTTGATGCAATTTCTATGCGCACCCGTTCTCGGAGCA CTGTCCGACCGCTTTGGCCGCCGCCCAGTCCTGCTCGCTTCGCTACTTGGAGCCACTATCGACTACGCGATCATGGCGACCACACCCGTCCTGTG GATCCTCTACGCCGGACGCATCGTGGCCGGCATCACCGGCGCCACAGGTGCGGTTGCTGGCGCCTATATCGCCGACATCACCGATGGGGAAGAT CGGGCTCGCCACTTCGGGCTCATGAGCGCTTGTTTCGGCGTGGGTATGGTGGCAGGCCCCGTGGCCGGGGGACTGTTGGGCGCCATCTCCTTGC ATGCACCATTCCTTGCGGCGGCGGTGCTCAACGGCCTCAACCTACTACTGGGCTGCTTCCTAATGCAGGAGTCGCATAAGGGAGAGCGTCGACC GATGCCCTTGAGAGCCTTCAACCCAGTCAGCTCCTTCCGGTGGGCGCGGGGCATGACTATCGTCGCCGCACTTATGACTGTCTTCTTTATCATGCA ACTCGTAGGACAGGTGCCGGCAGCGCTCTGGGTCATTTTCGGCGAGGACCGCTTTCGCTGGAGCGCGACGATGATCGGCCTGTCGCTTGCGGTA TTCGGAATCTTGCACGCCCTCGCTCAAGCCTTCGTCACTGGTCCCGCCACCAAACGTTTCGGCGAGAAGCAGGCCATTATCGCCGGCATGGCGGC 
CGACGCACTGGGCTACGTCTTGCTGGCGTTCGCGACGCGAGGCTGGATGGCCTTCCCCATTATGATTCTTCTCGCTTCCGGCGGCATCGGGATGC CCGCGTTGCAGGCCATGCTGTCCAGGCAGGTAGATGACGACCATCAGGGACAGCTTCAAGGATCGCTCGCGGCTCTTACCAGCCTAACTTCGAT CATTGGACCGCTGATCGTCACGGCGATTTATGCCGCCTCGGCGAGCACATGGAACGGGTTGGCATGGATTGTAGGCGCCGCCCTATACCTTGTC TGCCTCCCCGCGTTGCGTCGCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGGATTCACCACTCCAAG AATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGGCAGAACATATCCATCGCGTCCGCCATCTCCAGCAGCCGCA CGCGGCGCATCTCGGGCTCCTTGCATGCACCATTCCTTGCGGCGGCGGTGCTCAACGGCCTCAACCTACTACTGGGCTGCTTCCTAATGCAGGAG TCGCATAAGGGAGAGCGTCTGGCGAAAGGGGGATGTGCTGCAAGGCGATTAAGTTGGGTAACGCCAGGGTTTTCCCAGTCACGACGTTGTAAA ACGACGGCCAGTGCCAAGCTTAAAAAAAATCCTTAGCTTTCGCTAAGGATCTGCAGTGGCGGAAACCCCGGGAATCTAACCCGGCTGAACGGAT TTAGAGTCCATTCGATCTACATGATCAGGTTTCCGAATTCAGCGTTACAAGTATTACACAAAGTTTTTTATGTTGAGAATATTTTTTTGATGGGAG GCATTTTGCTATTAAGGGATTGACGAGGGCGTATCTGCGCAGTAAGATGCGCCCCGCATTGGAAACCTGATCATGTAGATCGAATGGACTCTAA ATCCGTTCAGCCGGGTTAGATTCCCGGGGTTTCCGCCAAATTCGAAAAGCCTGCTCAACGAGCAGGCTTTTTTGCATGCCCCTCGGGTTGTCAGC CTGTCCCGCTTATAAGATCATACGCCGTTATACGTTGTTTACGCTTTGAGGAATCCCATATGGATAAAAAACCATTAGATGTTTTAATATCTGCGA CCGGGCTCTGGATGTCCAGGACTGGCACGCTCCACAAAATCAAGCACCATGAGGTCTCAAGAAGTAAAATATACATTGAAATGGCGTGTGGAGA CCATCTTGTTGTGAATAATTCCAGGAGTTGTAGAACAGCCAGAGCATTCAGACATCATAAGTACAGAAAAACCTGCAAACGATGTAGGGTTTCG GACGAGGATATCAATAATTTTCTCACAAGATCAACCGAAAGCAAAAACAGTGTGAAAGTTAGGGTAGTTTCTGCTCCAAAGGTCAAAAAAGCTA TGCCGAAATCAGTTTCAAGGGCTCCGAAGCCTCTGGAAAATTCTGTTTCTGCAAAGGCATCAACGAACACATCCAGATCTGTACCTTCGCCTGCA AAATCAACTCCAAATTCGTCTGTTCCCGCATCGGCTCCTGCTCCTTCACTTACAAGAAGCCAGCTTGATAGGGTTGAGGCTCTCTTAAGTCCAGAG GATAAAATTTCTCTAAATATGGCAAAGCCTTTCAGGGAACTTGAGCCTGAACTTGTGACAAGAAGAAAAAACGATTTTCAGCGGCTCTATACCAA TGATAGAGAAGACTACCTCGGTAAACTCGAACGTGATATTACGAAATTTTTCGTAGACCGGGGTTTTCTGGAGATAAAGTCTCCTATCCTTATTCC GGCGGAATACGTGGAGAGAATGGGTATTAATAATGATACTGAACTTTCAAAACAGATCTTCCGGGTGGATAAAAATCTCTGCTTGAGGCCAATG CTTGCCCCGACTCTGTA pDul 
GAGAAGAAACCAATTGTCCATATTGCATCAGACATTGCCGTCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTA e- AAAGCATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATTT MbA GCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGTTTCTCCAT bKRS ACCCGTTTTTAAGGAGTTAAGGAGGTAAAAAATGGGTAATAATCGCCCCGTATACATTCCTCAGCCACGCCCGCCGCACCCACGCCTTTAGTTCG CACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGCTCGAGCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCCTGAGTAGGACAAATCC - GCCGGGAGCTGTCCCTCCTGTTCAGCTACTGACGGGGTGGTGCGTAACGGCAAAAGCACCGCCGGACATCAGCGCTAGCGGAGTGTATACTGG 2xtR CTTACTATGTTGGCACTGA NA- RBS2 - api1 b (rest is same as pDul e- MbA bKRS - 2xtR NA-

129 RBS1 - api1b ) pDul GAGAAGAAACCAATTGTCCATATTGCATCAGACATTGCCGTCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTA e- AAAGCATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATTT MbA GCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGTTTCTCCAT bKRS ACCCGTTTTTAAGGAGGTAAAAAATGAAAAATAATGCACCCATATACGTACCACAACCACGCCCGCCGCACCCAAAACTATAATTCGCACATGTG AGCAAAAGGCCAGCAAAAGGCCAGGAACCGCTCGAGCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCCTGAGTAGGACAAATCCGCCGGGAG - CTGTCCCTCCTGTTCAGCTACTGACGGGGTGGTGCGTAACGGCAAAAGCACCGCCGGACATCAGCGCTAGCGGAGTGTATACTGGCTTACTATG 2xtR TTGGCACTGA NA- RBS1 - apiB 5 (rest is same as pDul e- MbA bKRS - 2xtR NA- RBS1 - api1b ) pDul GAGAAGAAACCAATTGTCCATATTGCATCAGACATTGCCGTCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTA e- AAAGCATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATTT MbA GCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGTTTCTCCAT bKRS ACCCGTTTTTAAGGAGGTAAAAAATGGAAAATAATGCACCCATATACGTATCAGGACCACGCCCGCCGCACCCAAGAATATGATTCGCACATGT GAGCAAAAGGCCAGCAAAAGGCCAGGAACCGCTCGAGCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCCTGAGTAGGACAAATCCGCCGGGA - GCTGTCCCTCCTGTTCAGCTACTGACGGGGTGGTGCGTAACGGCAAAAGCACCGCCGGACATCAGCGCTAGCGGAGTGTATACTGGCTTACTAT 2xtR GTTGGCACTGA NA- RBS1 - apiB 8 (rest is same as pDul e- MbA bKRS - 2xtR NA- RBS1 - api1b ) pDul GAGAAGAAACCAATTGTCCATATTGCATCAGACATTGCCGTCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTA e- AAAGCATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATTT GCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGTTTCTCCAT

130 MbA ACCCGTTTTTAAGGAGGTAAAAAATGGCAAATAATACACCCGTATACGTATCACAACCACGCCCGCCGCACCCAAAAATATAATTCGCACATGTG bKRS AGCAAAAGGCCAGCAAAAGGCCAGGAACCGCTCGAGCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCCTGAGTAGGACAAATCCGCCGGGAG - CTGTCCCTCCTGTTCAGCTACTGACGGGGTGGTGCGTAACGGCAAAAGCACCGCCGGACATCAGCGCTAGCGGAGTGTATACTGGCTTACTATG 2xtR TTGGCACTGA

NA- RBS1 - apiB 10 (rest is same as pDul e- MbA bKRS - 2xtR NA- RBS1 - api1b ) pDul GAGAAGAAACCAATTGTCCATATTGCATCAGACATTGCCGTCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTA e- AAAGCATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATTT MbA GCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGTTTCTCCAT bKRS ACCCGTTTTTAAGGAGGTAAAAAATGGCAAATAATGCACCCGTATACGTACCAAAACCACGCCCGCCGCACCCAAGAATATGGTTCGCACATGTG AGCAAAAGGCCAGCAAAAGGCCAGGAACCGCTCGAGCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCCTGAGTAGGACAAATCCGCCGGGAG - CTGTCCCTCCTGTTCAGCTACTGACGGGGTGGTGCGTAACGGCAAAAGCACCGCCGGACATCAGCGCTAGCGGAGTGTATACTGGCTTACTATG 2xtR TTGGCACTGA NA- RBS1 - apiC 3 (rest is same as pDul e- MbA bKRS - 2xtR NA- RBS1 - api1b )

nsAA Incorporation Assays

nsAA incorporation was quantified using at least 3 biological and technical replicates as described previously127 with the following modifications. In general, strains harboring the indicated reporter and aaRS/tRNA plasmids were inoculated from frozen stocks in biological triplicate and grown to confluence overnight in 96-well deep-well plates. Experimental cultures were inoculated at 1:10 dilutions of the overnight cultures in 96-well deep-well plates containing 100 μL of either 2xYT, GMML, or LB (for A. tumefaciens) media supplemented with antibiotics, inducers and nsAAs (or no nsAA). Cells were incubated with shaking at their optimal temperatures. In experiments with apidaecins, the peptide (or water control) was added at an OD600 of around 0.5–0.8, typically after 2–4 h of growth, as adding the peptide earlier was either toxic or did not improve nsAA incorporation. The cells were then further incubated until the cultures reached confluency (16–24 h). Cells were then centrifuged (5,000 g for 3 min), the supernatant was removed by decanting the plates, and the pellets were washed with 1 mL PBS at least once (three times for experiments with Cou) and resuspended in 1 mL PBS. 100 μL of each cell suspension was transferred to a Corning® 96-well clear flat-bottom black polystyrene plate, and absorbance at 600 nm (i.e. OD600) and relative fluorescence units (RFU, with excitation/emission wavelengths of 485 nm/528 nm for sfGFP and 390 nm/450 nm for Cou) were measured using a Biotek spectrophotometric plate reader. The read data were blanked and further processed as reported in the figures, e.g. the reporter fluorescence was normalized by the OD600 reading to obtain RFU/OD600. Typically, these individual intensity values were further normalized to the highest average signal within an experiment, and the data were plotted and analyzed in Prism 8.2.1 for Windows.
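The blanking and normalization steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis actually used (the data were processed and plotted in Prism); the function name `normalize_plate`, the data layout and the example readings are all hypothetical, and subtracting a single blank value from every well is an assumption about how blanking was done.

```python
from statistics import mean


def normalize_plate(wells, blank_rfu, blank_od):
    """Return blanked RFU/OD600 values scaled to the highest condition mean.

    wells: dict mapping a condition name to a list of (rfu, od600) replicates.
    blank_rfu, blank_od: readings of a cell-free blank well (assumed layout).
    """
    per_od = {}
    for condition, readings in wells.items():
        values = []
        for rfu, od in readings:
            net_od = od - blank_od
            if net_od <= 0:
                raise ValueError(f"non-positive blanked OD600 in {condition!r}")
            # Blank both channels, then express fluorescence per unit OD600.
            values.append((rfu - blank_rfu) / net_od)
        per_od[condition] = values

    # Scale every replicate by the highest average signal in the experiment.
    top = max(mean(values) for values in per_od.values())
    return {c: [v / top for v in values] for c, values in per_od.items()}


# Hypothetical readings for one aaRS/tRNA pair with and without nsAA.
example = {
    "+nsAA": [(1100.0, 0.75), (1100.0, 0.75)],
    "-nsAA": [(350.0, 0.75), (350.0, 0.75)],
}
print(normalize_plate(example, blank_rfu=100.0, blank_od=0.25))
```

Scaling to the highest condition mean, rather than to the highest single well, keeps replicate scatter visible around 1.0 for the brightest condition.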

Specifically, in experiments leading to Figure 14, Figure 19c-d, Figure 20a and Figure 21a, we used cells harboring the reporter plasmid pZE21/Ub-UAG-sfGFP_151UAG (ColE1 origin, KanR, under Tet induction) and pEVOL aaRS/tRNACUA plasmids (p15A origin, CmR, tRNA genes constitutively expressed, aaRS genes under arabinose induction) as previously described127. In these experiments, arabinose (to final 0.2%) was present throughout, including the initial inoculation of the frozen stocks, because the constitutive expression of the aaRS genes did not affect growth. The cells were diluted into 2xYT supplemented with arabinose, kanamycin (to final 25 μg mL-1), chloramphenicol (to final 12.5 μg mL-1), and nsAAs. At an OD600 of around 0.5–0.8, typically after 2–4 h of growth, sfGFP expression was induced by the addition of anhydrotetracycline (100 ng mL-1 final) together with the addition of the apidaecin peptides.

For experiments with A. tumefaciens, overnight cultures of A. tumefaciens cells harboring the plasmids pTD114_sfGFP-1ATG (pBBR1 origin, GmR, under NHL induction) and pYW15c_MjBpaRS (pSa origin, AmpR, MjBpaRS and tRNA(Tyr)CUA genes constitutively expressed under PN25 and proK promoters, respectively) were diluted 1:10 in LB supplemented with gentamicin (to final 125 μg mL-1), carbenicillin (to final 25 μg mL-1) and Bpa (or no nsAA). At an OD600 of around 0.5–0.8 (after 3–4 h of growth at 30 °C), sfGFP expression was induced by the addition of NHL (1 μg mL-1 final) together with the addition of the apidaecin peptide.

For experiments leading to Figure 16d, Figure 21b-g, Figure 22a, Figure 23, Figure 25 and Figure 26, the new auto-inducible reporter system was used. In these experiments, E. coli cells harbored two plasmids: an auto-inducible reporter plasmid (ColE1 origin, KanR, under arabinose induction), and a pDule aaRS/tRNACUA plasmid (p15A origin, TcR, aaRS and tRNA genes constitutively expressed under GlnS and proK promoters, respectively). In any strain, the reporter plasmid was one of pBAD-Ub-UAG-sfGFP_151UAG, pBAD-PopZ-(UAG)0-sfGFP, pBAD-PopZ-(UAG)2-sfGFP, pBAD-PopZ-(UAG)6-sfGFP or pBAD-PopZ-(UAG)8-sfGFP. The aaRS/tRNACUA plasmid was one of pDule-MjBpaRS, pDule-MjCouRS, pDule-Sc5OHWRS, pDule-MbAbKRS-2xtRNA, pDule-MbAbKRS-2xtRNA-RBS1-api1b, pDule-MbAbKRS-2xtRNA-RBS2-api1b, pDule-MbAbKRS-2xtRNA-RBS1-apiB5, pDule-MbAbKRS-2xtRNA-RBS1-apiB8, and pDule-MbAbKRS-2xtRNA-RBS1-apiC3 (Table 8). In these experiments, overnight cultures were grown in 2xYT supplemented with 25 μg mL-1 kanamycin and 5 μg mL-1 tetracycline. The cells were then directly diluted 1:10 in GMML supplemented with kanamycin (to final 12.5 μg mL-1), tetracycline (to final 2.5 μg mL-1), glucose (to final 0.05%), arabinose (to final 0.05%), and nsAAs (or no nsAA). In conditions involving exogenously added peptides, apidaecin peptides were added at an OD600 of around 0.5–0.8 (after 3–4 h of growth at 37 °C). Otherwise, the cells were incubated until the cultures reached confluency (16–24 h) and were processed as detailed above.

Protein Sequence Analysis by LC-MS/MS

Excised gel bands were cut into approximately 1 mm3 pieces. Gel pieces were then subjected to a modified in-gel trypsin digestion procedure231. Gel pieces were washed and dehydrated with acetonitrile for 10 min, followed by removal of the acetonitrile. Pieces were then completely dried in a speed-vac. The gel pieces were rehydrated with 50 mM ammonium bicarbonate solution containing 12.5 ng/µl modified sequencing-grade trypsin (Promega, Madison, WI) at 4ºC. After 45 min, the excess trypsin solution was removed and replaced with 50 mM ammonium bicarbonate solution to just cover the gel pieces. Samples were then placed in a 37ºC room overnight. Peptides were later extracted by removing the ammonium bicarbonate solution, followed by one wash with a solution containing 50% acetonitrile and 1% formic acid. The extracts were then dried in a speed-vac (~1 hr). The samples were then stored at 4ºC until analysis.

On the day of analysis, the samples were reconstituted in 5–10 µl of HPLC solvent A (2.5% acetonitrile, 0.1% formic acid). A nano-scale reverse-phase HPLC capillary column was created by packing 2.6 µm C18 spherical silica beads into a fused silica capillary (100 µm inner diameter x ~30 cm length) with a flame-drawn tip232. After equilibrating the column, each sample was loaded via a Famos auto sampler (LC Packings, San Francisco CA) onto the column. A gradient was formed, and peptides were eluted with increasing concentrations of solvent B (97.5% acetonitrile, 0.1% formic acid). As peptides eluted, they were subjected to electrospray ionization and then entered into an LTQ Orbitrap Velos Pro ion-trap mass spectrometer (Thermo Fisher Scientific, Waltham, MA). Peptides were detected, isolated, and fragmented to produce a tandem mass spectrum of specific fragment ions for each peptide. Peptide sequences (and hence protein identity) were determined by matching the acquired fragmentation patterns against protein databases using the software program Sequest (Thermo Fisher Scientific, Waltham, MA)233. All databases include a reversed version of all the sequences, and the data were filtered to between a one and two percent peptide false discovery rate.
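The reversed-sequence (target-decoy) filtering mentioned above can be illustrated with a minimal Python sketch. It is not the facility's Sequest pipeline: the function name `score_cutoff_for_fdr`, the (score, is_decoy) input layout and the example scores are hypothetical, and estimating the false discovery rate as decoy hits divided by target hits is one common convention assumed here.

```python
def score_cutoff_for_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated peptide FDR is <= max_fdr.

    psms: list of (score, is_decoy) pairs, where is_decoy marks matches to the
    reversed-sequence half of the database. Higher scores are assumed better.
    Returns None if no cutoff satisfies the requested FDR.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Assumed estimator: FDR ~ decoy hits / target hits above the cutoff.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Hypothetical peptide-spectrum matches: (score, matched a reversed sequence?)
matches = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
print(score_cutoff_for_fdr(matches, max_fdr=0.10))
```

The sketch ignores tied scores and any protein-level filtering; it only shows why including reversed sequences lets a score threshold be chosen for a stated error rate.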

Image acquisition and quantification

For imaging, bacterial cells were resuspended in a minimal volume of 1x PBS. 1 µL of this cell suspension was spotted onto a coverslip (typically 24 x 50 mm; #1.5), and an 8 x 8 mm wide, 2 mm thick PBS-agarose pad (SeaKem LE Agarose) was laid on top of the cells. Phase and fluorescence images were acquired using a Nikon Ti2 Eclipse inverted microscope equipped with a Plan Apo Lambda DM 60X (1.4 NA, Ph3) oil objective and an Andor Zyla sCMOS camera. NIS-Elements AR software was used for image acquisition. For quantitative comparisons, the samples were imaged in the same session under the same imaging conditions. Image processing was performed in FIJI. Images were scaled, cropped and rotated without interpolation. Linear adjustment was performed to optimize contrast and brightness of the images. Figure construction was performed in Adobe Illustrator. The relative fluorescence units of Cou and sfGFP signal intensities were quantified using a FIJI plugin, MicrobeJ234, where cells were identified in the phase contrast channel with a width limit from 0.3 to 2 µm and length above 1 µm. Fluorescence intensities at the cell poles and in the rest of the cell body were then quantified and averaged within individual cells (N > 100) using the ‘polarity’ mode in MicrobeJ. Violin plots (Figure 3e and Figure 22c) were plotted and analyzed in Prism 8.2.1 for Windows.

Partial recoding by multiplex automated genome engineering (MAGE)

Previously designed MAGE oligos were ordered from IDT with standard desalting and 2 phosphorothioate bonds at each terminus (Table 4)7. A master stock solution containing a mixture of these 13 oligos was prepared at a final concentration of 30 µM for each oligo. As a negative control, MAGE-Neg control was prepared at 400 µM. The pORTMAGE protocol was performed as previously described159. Briefly, an overnight culture of cells harboring the pORTMAGE-3 plasmid (Addgene Plasmid # 72678) was diluted 100-fold into 20–30 mL 2xYT + kanamycin (to final 25 μg mL-1) and grown at 30 °C with aeration until mid-log growth was achieved (OD600 ~0.55–0.65). Lambda Red was induced in a shaking water bath (42 °C, 300 rpm, 15 minutes), then the induced culture tubes were cooled rapidly on ice for at least 5 minutes. Electrocompetent cells were prepared at 4 °C by pelleting 10 mL of culture (16,000 g for 5 min), washing the cell pellet twice with 1 mL ice-cold deionized water (dH2O) and finally resuspending the cells in 250 μL cold dH2O. 55 μL of the cells were mixed with 5 µL of the oligo mixture. Cells were transferred to 0.1 cm cuvettes, electroporated (BioRad GenePulser™, 1800 V, 200 Ω, 25 µF), and then immediately resuspended in 0.5 mL SOC medium. The cells were allowed to recover for 1 h at 30 °C at 250 rpm. To select for Api 137 resistant colonies, 100 µL of these cells were plated on LB + Api 137 (750 µg mL-1). For continued MAGE cycling, 4.5 mL 2xYT + kanamycin (to final 25 μg mL-1) was added and cultures were recovered to mid-log phase before being induced for the next cycle. Colonies on LB + Api 137 plates appeared after as few as 2 MAGE cycles with the mixture of the 13 oligos, but not with the negative control. The resistant isolates were tested for their ability to incorporate nsAAs after the pORTMAGE plasmid was cured. The presence of codon replacements was confirmed by allele-specific colony PCR using primer sets specific for the 13 genes, following the MASC-PCR protocol as described7.
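As a worked check of the oligo amounts in the electroporation step, the short Python sketch below computes the concentration of each oligo once 5 µL of the mixture is combined with 55 µL of cells. The helper name `final_conc_uM` is hypothetical, and the calculation assumes the two volumes are simply additive (60 µL total).

```python
def final_conc_uM(stock_uM, oligo_uL, cells_uL):
    """Concentration after mixing, assuming additive volumes (C1*V1 = C2*V2)."""
    return stock_uM * oligo_uL / (oligo_uL + cells_uL)


N_OLIGOS = 13

# Each recoding oligo is at 30 uM in the master stock.
per_oligo = final_conc_uM(30.0, oligo_uL=5.0, cells_uL=55.0)
total_recoding = N_OLIGOS * per_oligo

# The negative-control oligo was prepared at 400 uM.
negative_control = final_conc_uM(400.0, oligo_uL=5.0, cells_uL=55.0)

print(f"each recoding oligo: {per_oligo:.2f} uM")
print(f"all 13 recoding oligos combined: {total_recoding:.1f} uM")
print(f"negative control oligo: {negative_control:.1f} uM")
```

Under these assumptions, the combined oligo concentration of the 13-oligo pool (13 x 30 µM = 390 µM in the stock) is close to that of the 400 µM negative control, so the two conditions deliver a comparable total amount of DNA.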

Library Generation, Flow Cytometry and Cell Sorting

The apidaecin peptide library was constructed using pDule-MbAbKRS-2xtRNA-RBS1-api1b as the template. First, the plasmid backbone was linearized using primers Pri3 and Pri4. The library insert sequence, Ultra1, was acquired as one 200 bp PAGE-purified Ultramer (IDT). Ultra1 (to final 1 μM) was further amplified with Pri5 and Pri6. The insert (~125 ng) was assembled into the backbone (~400 ng) in a 150 μL Gibson assembly reaction (NEB; 50 °C, 1 h). The product was then cleaned and concentrated by ethanol precipitation, and the entire product was electro-transformed into E. cloni 10G SUPREME (Lucigen) cells that already harbored the pBAD-Ub-UAG-sfGFP_151UAG plasmid. After the cells were recovered in SOC for 1 h, overnight cultures were set up by adding 4 mL 2xYT supplemented with kanamycin (to final 25 μg mL-1) and tetracycline (to final 5 μg mL-1) and incubating at 37 °C with aeration. In parallel, dilutions were plated to estimate the library size. 50 colonies were randomly selected and sequenced (Genewiz) in order to estimate library diversity and quality.

The library was directly diluted 1:10 in GMML supplemented with kanamycin (to final 12.5 μg mL-1), tetracycline (to final 2.5 μg mL-1), glucose (to final 0.05%), arabinose (to final 0.05%), and BocK (to 2 mM). After the cells were incubated overnight at 37 °C, they were washed twice with 1 mL 1x PBS and diluted for fluorescence-activated sorting in a Sony MA900 Cell Sorter. Cells displaying the top ~0.0005% and ~0.02% of fluorescence activation (~2,000 cells) were collected into 2xYT. After 30 min of recovery, dilutions of the recovery were plated on LB + tetracycline (to final 10 μg mL-1). The next day, 30 colonies from each sort were sequenced. Library variants of interest were grown overnight, miniprepped, and retransformed into E. cloni 10G cells for further analysis in nsAA incorporation assays.
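Collecting the brightest fraction of a population amounts to choosing a fluorescence cutoff at a high percentile. The Python sketch below illustrates that idea offline; it is not how gates were drawn on the sorter (that was done in the instrument software), and the function name `top_fraction_cutoff` and the synthetic event values are hypothetical.

```python
import math


def top_fraction_cutoff(values, top_percent):
    """Return the fluorescence value at or above which the top N% of events fall.

    values: per-event fluorescence readings (any positive number of events).
    top_percent: size of the gate as a percentage of all events, e.g. 0.5.
    """
    if not values:
        raise ValueError("no events supplied")
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    # Number of events admitted by the gate; always keep at least one.
    n_keep = max(1, math.ceil(len(values) * top_percent / 100))
    return sorted(values, reverse=True)[n_keep - 1]


# Synthetic example: 1,000 events with fluorescence values 1..1000.
events = list(range(1, 1001))
cutoff = top_fraction_cutoff(events, top_percent=0.5)
gated = [v for v in events if v >= cutoff]
print(cutoff, len(gated))
```

For gates as narrow as those quoted above, the number of events that must be screened grows inversely with the gate size, which is why very stringent gates yield only a small number of collected cells.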

Chapter 4 Appendix

Funding, COI and acknowledgements:

Funding Sources

This work was supported by US Department of Energy Grant DE-FG02-02ER63445. (Add

fellowships)

Conflict of Interest Disclosure

G.M.C. has related financial interests in ReadCoor, EnEvolv, 64-X, and GRO Biosciences. For a

complete list of G.M.C.’s financial interests, please visit arep.med.harvard.edu/gmc/tech.html.

Acknowledgements:

We thank A. Chatterjee, A. Bisson, S. Wilson, M. Holmes, W. Mallard, S. Hurlimann, A. Florez, M. Dion, M. Baas-Thomas, G. Chao, G. Filsinger, J. Marchand, A. Mijalis, K. Narasimhan, A. Nyerges, N. Ostrov, A. Rudolph, M. Shubert, P. Smith, D. Thompson and T. Wannier for helpful discussions and advice. The Taplin Mass Spectrometry Facility and the Analytical Chemistry Core at Harvard Medical School were also essential to this work.

Methods

Reagents:

Antibiotics and nsAAs were purchased from Sigma. nsAA stock solutions were prepared in water with minimal base or acid, e.g. 0.3 M KOH to prepare 0.2 M Bpa stock solution, except for Cou stock, which was prepared in DMSO. All stock concentrations of nsAA were between 100–200 mM.

B. subtilis media:

S750 media was made in 500 mL batches by mixing 50 mL 10x S750 salts (recipe below), 5 mL 100x S750 metals (recipe below), 5 mL 1 M glutamate and 10 mL 50% (w/w) glucose, then making up to 500 mL with ddH2O. The media was filter sterilized (not autoclaved) and distributed into 50 mL aliquots. Glycerol, sorbitol or fumarate was substituted 1:1 for glucose for different carbon sources. Other amino acids can be substituted 1:1 for glutamate, or 20% (w/w) ammonium sulfate can be used (final concentration 0.2% + additional from 10x salts) for media lacking amino acids.
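The 500 mL recipe above scales linearly to other batch sizes. The Python sketch below is a convenience illustration only: the names `S750_PER_500_ML` and `s750_recipe` are hypothetical, and the water volume is computed by assuming component volumes are additive, whereas the protocol itself simply makes up to the final volume with ddH2O.

```python
# Component volumes (mL) for one 500 mL batch, taken from the recipe above.
S750_PER_500_ML = {
    "10x S750 salts": 50.0,
    "100x S750 metals": 5.0,
    "1 M glutamate": 5.0,
    "50% (w/w) glucose": 10.0,
}


def s750_recipe(final_mL):
    """Scale the 500 mL S750 recipe to final_mL and estimate the ddH2O volume."""
    if final_mL <= 0:
        raise ValueError("final volume must be positive")
    scale = final_mL / 500.0
    volumes = {name: mL * scale for name, mL in S750_PER_500_ML.items()}
    # Approximation: treat volumes as additive to estimate the water needed.
    volumes["ddH2O (approx.)"] = final_mL - sum(volumes.values())
    return volumes


for component, mL in s750_recipe(100.0).items():
    print(f"{component}: {mL:.1f} mL")
```

At these proportions the finished media contains 1x salts, 1x metals and 10 mM glutamate (5 mL of 1 M stock in 500 mL), with the glucose stock diluted 50-fold.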

10x S750 salts were made in 1 L batches. 104.7 g MOPS (free acid), 13.2 g ammonium sulfate (NH4)2SO4 and 6.8 g potassium phosphate monobasic KH2PO4 were added and buffered to pH 7 with 50% KOH, then made up to 1 L with ddH2O. The solution was filter sterilized, covered with foil and stored at 4 degrees C. If yellowed, the solution is no longer good.

100x S750 metals have final concentrations of: 0.2 M MgCl2, 70 mM CaCl2, 5 mM MnCl2, 0.1 mM ZnCl2, 100 microgram/mL thiamine-HCl, 2 mM HCl, 0.5 mM FeCl3. Add iron last to prevent precipitation. Filter sterilize and store foil-wrapped at 4 degrees C.

MC media for transformations was made as a 10x stock. The 10x stock has final concentrations of: 1 M potassium phosphate pH 7, 30 mM sodium citrate, 20% (w/w) glucose, 220 mg/mL ferric ammonium citrate, 1% casein hydrolysate, 2% potassium glutamate. Store in aliquots at -20 degrees C, and make 1x media during the competence protocol, supplementing the 1x MC with 3-30 mM MgSO4.

Strain construction:

All B. subtilis strains were derived from the prototrophic strain PY79. All cloning was done via a combination of Gibson isothermal assembly and overlap-extension PCR. Primers were ordered from IDT with Q5 Tms (calculated via the NEB Tm calculator) of 70-72 degrees C. Overlaps between parts were 20-35 bases, attempting to maintain an overlap Tm of 55-65 degrees C as calculated by the Geneious Tm calculator. PCRs were carried out in 25 uL, using 2x Q5 master mix with the maximum allowed temperature for 25 cycles and with 1 min/kb extension time. PCR fragments were gel-verified and either gel-extracted or PCR-purified using appropriate Qiagen kits. Isothermal reactions were done in a 20 uL final volume of homemade Gibson mix, made according to the original 2011 Gibson paper, at 50 degrees C for 30-45 minutes. The entire volume of the Gibson reaction was transformed directly into B. subtilis by the B. subtilis transformation protocol described below. In case of transformation failure, overlap assembly PCR was carried out in a pairwise manner to assemble difficult-to-Gibson regions of DNA. 5-10 uL of each DNA part was added to overlap extension PCRs, which were run for 5 cycles, followed by addition of primers to amplify the full-length piece and 25 more cycles.

Amber codon mutations were introduced into the coding sequence of YukE cloned into integration vector pDR111 as described189. Strains and oligonucleotides used in this study are listed in Table 1 and 2. All constructs were confirmed by sequencing.

B. subtilis transformation protocol:

For transformation of either genomic or linearized DNA, LB plates were streaked with parent strains the day before. Freshly grown (not more than a day old) colonies were inoculated into 1 mL MC supplemented with 3-30 uL of 1 M MgSO4 in a large glass test tube. Strains were grown in MC at 37 degrees C in a roller drum for 4 hours, at which point the culture should be visibly turbid. Then, 200 uL of culture was added to a standard 13 mL culture tube with transformation DNA: either an entire 20 uL Gibson reaction or 2 uL of B. subtilis genomic DNA. Cultures went back in the 37 degree C roller for 2 more hours, then the entire volume was plated on selective LB plates. Single colonies were picked and verified by sequencing of unpurified colony PCR products. For colony PCR with B. subtilis, a colony was suspended in 50 uL TE + 10% CHELEX beads, then vortexed for 10 minutes, boiled for 30 minutes, vortexed for 10 minutes, and spun down. 1 uL was used as PCR template, taking care not to carry any CHELEX beads into the PCR reaction.

B. subtilis genomic DNA preparation:

For Bacillus genomic DNA prep, used as a PCR template or to transform into other strains, 1-2 mL LB was inoculated from a fresh colony. Cells were grown until dense, preferably not overgrown. Cells were pelleted at max speed and the supernatant aspirated. Cells were resuspended in 500 uL lysis buffer (recipe below), and 50 uL of freshly made 20 mg/mL egg white lysozyme in lysis buffer was added. The mixture was incubated at 37 degrees C for 15-45 minutes, using the longer time if cells were overgrown. 60 uL of 10% (w/w) sarkosyl (N-lauroylsarcosine) in ddH2O was added and the mixture vortexed. The entire solution was transferred to a phase lock tube and 600 uL phenol-chloroform added. The tube was vortexed vigorously for 15-20 seconds until frothy, then spun at max speed for 5 minutes. The aqueous phase was removed to a fresh 1.5 mL tube and 1/10 volume (60 uL) of 3 M sodium acetate added, then vortexed. 2 volumes (or to the top of the tube) of 100% ethanol were added and the tube inverted until DNA had visibly precipitated. The tube was then spun at max speed for 1 min. The supernatant was aspirated, 150 uL of 70% ethanol (30% ddH2O) was added, and the tube was vortexed and spun for 1 min at max speed. The supernatant was removed and tubes were left to dry on the bench for 5-15 minutes. DNA was resuspended in 350 uL ddH2O and stored at -20 °C. B. subtilis lysis buffer is made with final concentrations of 20 mM Tris-HCl pH 7.5, 50 mM EDTA, 10 mM NaCl.

Protein purification and mass spec

For protein expression, 25 mL of S750 culture containing 1 mM IPTG and nsAA (1 mM for boc-K and 5OHW, 100 uM for bipA, bpA and pAzF) was seeded at OD 0.002 with the appropriate strain. Cultures were grown in a shaking incubator overnight at 37 °C. Cells were pelleted at 5K RCF for 30 min and frozen at -80 °C. Cells were lysed using the EMD Millipore Lysonase BugBuster kit (Cat. # 71370-3) following the manufacturer’s instructions for gram-positive bacteria. For His-tag purification, Thermo Scientific HisPur cobalt resin (Cat. # 89964) was used. After washing with equilibration buffer, lysate was bound to the resin for 45 minutes at room temperature, followed by one wash step and 3 elution steps, using the following buffers. Equilibration buffer: 20 mM Tris-HCl, pH 8.3, 0.5 M NaCl, 5 mM imidazole. Wash buffer: 20 mM Tris-HCl, pH 8.3, 0.5 M NaCl, 20 mM imidazole. Elution buffer: 20 mM Tris-HCl, pH 8.3, 0.5 M NaCl, 200 mM imidazole.

Elutions were combined and 18 uL was run on an Invitrogen 4-12% Bis-Tris NuPAGE gel (cat # NP0322PK2) following the manufacturer’s instructions. Purified tagged mNeonGreen was seen at the correct weight and excised in approximately 5x20 mm pieces.

Excised gel bands were cut into approximately 1 mm3 pieces. Gel pieces were then subjected to a modified in-gel trypsin digestion procedure. Gel pieces were washed and dehydrated with acetonitrile for 10 min, followed by removal of the acetonitrile. Pieces were then completely dried in a speed-vac. The gel pieces were rehydrated with 50 mM ammonium bicarbonate solution containing 12.5 ng/µl modified sequencing-grade trypsin (Promega, Madison, WI) at 4ºC. After 45 min, the excess trypsin solution was removed and replaced with 50 mM ammonium bicarbonate solution to just cover the gel pieces. Samples were then placed in a 37ºC room overnight. Peptides were later extracted by removing the ammonium bicarbonate solution, followed by one wash with a solution containing 50% acetonitrile and 1% formic acid. The extracts were then dried in a speed-vac (~1 hr). The samples were then stored at 4ºC until analysis.

On the day of analysis, the samples were reconstituted in 5–10 µl of HPLC solvent A (2.5% acetonitrile, 0.1% formic acid). A nano-scale reverse-phase HPLC capillary column was created by packing 2.6 µm C18 spherical silica beads into a fused silica capillary (100 µm inner diameter x ~30 cm length) with a flame-drawn tip. After equilibrating the column, each sample was loaded via a Famos auto sampler (LC Packings, San Francisco CA) onto the column. A gradient was formed, and peptides were eluted with increasing concentrations of solvent B (97.5% acetonitrile, 0.1% formic acid). As peptides eluted, they were subjected to electrospray ionization and then entered into an LTQ Orbitrap Velos Pro ion-trap mass spectrometer (Thermo Fisher Scientific, Waltham, MA). Peptides were detected, isolated, and fragmented to produce a tandem mass spectrum of specific fragment ions for each peptide. Peptide sequences (and hence protein identity) were determined by matching the acquired fragmentation patterns against protein databases using the software program Sequest (Thermo Fisher Scientific, Waltham, MA). All databases include a reversed version of all the sequences, and the data were filtered to between a one and two percent peptide false discovery rate.

nsAA incorporation

For nsAA incorporation, fresh colonies were picked from LB plates to make a starter culture in the same media as the experimental media, most often S750 minimal media. These cultures were grown to exponential phase, preferably between OD 0.1-0.4, though ODs as high as 0.7 were used without issue. Experimental cultures were seeded from starter cultures at an OD of 0.002. Experimental cultures were grown overnight at 37 degrees C, either in a plate reader or in culture tubes in a shaker at 250 rpm or a roller drum. The experimental cultures contained the final concentrations of IPTG and nsAA from the start. For endpoint experiments, cultures were diluted 1:1 with PBS and then read in the plate reader (exact model needed), unless the fluorescent couAA was being incorporated, in which case cells were pelleted at 5K rcf and washed 3x with PBS before being read in the plate reader.
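Seeding at a fixed starting OD means the inoculum volume depends on the density of the starter culture. The Python sketch below shows the dilution arithmetic; the function name `seed_volume_uL` and the example values are hypothetical, and it assumes OD is proportional to cell density over this range.

```python
def seed_volume_uL(starter_od, target_od, final_volume_uL):
    """Volume of starter culture giving target_od in final_volume_uL.

    Uses C1*V1 = C2*V2, with the final volume taken to include the inoculum.
    """
    if starter_od <= 0 or target_od <= 0 or final_volume_uL <= 0:
        raise ValueError("ODs and volume must be positive")
    if starter_od < target_od:
        raise ValueError("starter culture is more dilute than the target OD")
    return target_od * final_volume_uL / starter_od


# Example: starter cultures across the OD range used above, seeding 1 mL
# experimental cultures at OD 0.002.
for starter in (0.1, 0.2, 0.4, 0.7):
    volume = seed_volume_uL(starter, target_od=0.002, final_volume_uL=1000.0)
    print(f"starter OD {starter}: add {volume:.1f} uL per mL of culture")
```

A starter at OD 0.2, for instance, is diluted 100-fold, i.e. about 10 uL per mL of experimental culture.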

LCMS nsAA import quantification:

Protocol adapted from72.

To assay import of nsAAs into cells, serial dilutions of WT Py79 B. subtilis were seeded in 1 mL s750 media tubes and incubated at 37 °C in a tube roller overnight to obtain turbid but not stationary cultures (OD 0.4-0.8). In the morning, 200 µL of this culture was added to 60 mL experimental cultures containing nsAA. After growth in a 37 °C shaker until OD ~0.2, exact ODs were recorded and 50 mL of culture was taken; the rest was left in the 37 °C shaker overnight. The 50 mL of culture was pelleted at 4 °C for 10 min at 5.25k rcf and the supernatant discarded. The pellet was washed 4 times with 1 mL ice-cold s750, with rapid but thorough resuspension and 2.5-minute spins at 14k rcf. Cell pellets were then frozen. After the overnight incubation, the saturated culture was diluted 10-fold for OD measurement, and then 5 mL of saturated culture was pelleted at 4 °C for 10 min at 5.25k rcf, followed by 4 washes with 1 mL ice-cold s750, with rapid but thorough resuspension and 2.5-minute spins at 14k rcf, and brief freezing. To thoroughly lyse cells, pellets were resuspended in 400 µL 40:60 sterile methanol:water and transferred to screw-top tubes. 300 mg of 200 µm acid-washed glass beads was added to each tube, and the tubes were beaten in a bead beater for 10 minutes at 4 °C, in 1-minute increments with a 5-minute gap between each bead-beating step. The tubes were inverted and a small hole was poked in the bottom with a syringe. The screw-top tubes were then placed into 7 mL tubes and the lysate collected by a 5-minute spin at 4000 rcf. An additional 400 µL 40:60 methanol:water was added to the tubes and the spin repeated to ensure all lysate was collected. The collected lysate and wash were spun at 16k rcf at 4 °C for 30 hours and 750 µL was transferred to fresh tubes, which were then centrifuged at 4 °C for 2 hours, and 600 µL of supernatant was extracted to be used in

LCMS experiments.
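Because the exponential-phase and saturated samples are harvested at very different densities, the recorded ODs allow the two to be compared on a per-biomass basis. The sketch below shows one way this bookkeeping could be done. It assumes biomass is proportional to OD × volume ("OD units"), which is an assumption of this example and not a normalization stated in the protocol; the function names and OD readings are hypothetical.

```python
def undiluted_od(measured_od: float, dilution_factor: float = 1.0) -> float:
    """Back-calculate the culture OD from a reading taken on a diluted sample."""
    return measured_od * dilution_factor


def od_units(culture_od: float, volume_ml: float) -> float:
    """Harvested biomass in OD units (OD x mL), assuming OD is linear in cell density."""
    return culture_od * volume_ml


if __name__ == "__main__":
    # Hypothetical readings: the exponential culture is read directly,
    # the saturated culture is read after a 10-fold dilution.
    exponential = od_units(undiluted_od(0.21), 50.0)
    saturated = od_units(undiluted_od(0.23, dilution_factor=10.0), 5.0)
    print(f"exponential: {exponential:.2f} OD units")
    print(f"saturated:   {saturated:.2f} OD units")
```

Under this assumption, 50 mL harvested at OD ~0.2 and 5 mL harvested from a culture roughly ten times denser contain comparable amounts of biomass.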

Incorporation of unnatural amino acids during sporulation

Sporulation was induced by resuspension as described (ref. 235) with the following modifications.

Where appropriate, at the time of resuspension, the inducer isopropyl β-D-1-thiogalactopyranoside (IPTG) and the unnatural amino acid p-azido-phenylalanine (pAzF) were added to the test cultures at 1 mM and 100 µM final concentration, respectively.

Samples for microscopy were collected and prepared as previously described (ref. 236). Images were collected using an Axio M1 Imager (Zeiss). For strains producing mNeonGreen, excitation was for 2 sec (Ex band pass 500±25 and Em 535±30), and for strains producing GFP, excitation was for 30 milliseconds (Ex band pass 470±40 and Em 525±50). Images were analyzed and prepared using Zen 2.0 software (Zeiss) and ImageJ.

Detection of protein-protein interaction by crosslinking in vivo

Samples were prepared as described (ref. 189) with the following modifications. Strains were grown in 30 mL LB medium at 37 °C with shaking at 210 rpm to an OD600 of 1.0-1.5. The cultures were split into separate flasks in 7 mL aliquots. As appropriate, cultures were supplemented with inducer (IPTG, 1 mM final concentration), with unnatural amino acid (pAzF, 100 µM final concentration), with both, or with neither.

Growth at 37 °C with shaking was continued for 13 hours. At this time, the OD of the cultures was measured again (generally OD600 3.4–4.9), and 1 mL aliquots were pelleted by centrifugation and resuspended in 1 mL of sterile 1x PBS buffer. Cell resuspensions were transferred to a 12-well polycarbonate plate (Corning, catalog # 353043) and exposed to UV light at 365 nm in a Spectrolinker XL-1000 UV Crosslinker (Spectronics Corporation). The distance between the bottom of the well containing the cell resuspension and the UV source was ~4 cm. Cell suspensions were exposed to the UV source for either 15 minutes or a total of 60 minutes, divided into two 30-minute intervals. The 12-well plate was kept on ice, and thermal probe readings confirmed that the cell suspension temperature never exceeded 35.1 °C.

After UV treatment, cell suspensions equivalent to 1.5 OD units were pelleted by centrifugation and the resulting cell pellets were resuspended in 100 µL of Tris-HCl pH 6.8 lysis buffer containing 10 mM EDTA, 100 µg/mL chicken egg white lysozyme, 10 µg/mL DNaseI, 0.25 mM phenylmethyl sulfonyl fluoride (PMSF) and 1 µM protease cocktail inhibitor E-64. Lysis was continued for 20 minutes at 37 °C and then 50 µL SDS-PAGE loading buffer (without reductant) was added. The lysed samples were mixed vigorously and then heated for 10 minutes at 65 °C.
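Harvesting a fixed number of OD units means the volume pelleted depends on each culture's density. A minimal sketch of that calculation follows. It assumes one OD unit is 1 mL of suspension at OD600 = 1 and that the PBS resuspension has the same OD600 as the culture it came from (1 mL pelleted, 1 mL resuspended); the function name is hypothetical.

```python
def volume_for_od_units_ml(od600: float, od_units: float = 1.5) -> float:
    """Volume (mL) of cell suspension that contains the requested number of OD units.

    Assumes 1 OD unit = 1 mL of suspension at OD600 = 1.
    """
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return od_units / od600


if __name__ == "__main__":
    # Endpoints of the OD600 range reported for the 13-hour cultures.
    for od in (3.4, 4.9):
        print(f"OD600 {od}: pellet {volume_for_od_units_ml(od) * 1000:.0f} µL")
```

For the reported OD600 range of 3.4 to 4.9, this works out to roughly 0.31 to 0.44 mL of suspension per sample.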

For crosslinking detection, 7.5 µL aliquots of all control and test samples were separated by 17% SDS-PAGE. Proteins were then transferred from the gel to a polyvinylidene fluoride (PVDF) membrane for 52 minutes at 15 V in a Trans Blot Semi Dry Transfer Cell (BioRad). The membrane was cut horizontally along the 25 kDa marker band to allow for probing with two different antibodies. After blocking, the lower part of the membrane was incubated with custom rabbit polyclonal anti‐YukE antibodies at a dilution of 1:5,000 (Huppert et al. 2014). The upper membrane portion (used as loading control) was probed with anti-SigmaA antibodies diluted 1:10,000 (ref. 237). The primary antibodies were subsequently tagged with goat anti‐rabbit antibodies conjugated with horseradish peroxidase (HRP) (abcam, catalog # ab6721). The bound antibodies were detected using the luminol-containing reagent Clarity Western ECL Substrate (BioRad) and visualized using a ChemiDoc imager (BioRad).

REFERENCES

1. Ibba, M. & Söll, D. Aminoacyl-tRNA Synthesis. Annu. Rev. Biochem. 69, 617–650 (2000).

2. Ibba, M. & Söll, D. Aminoacyl-tRNAs: setting the limits of the genetic code. Genes Dev. 18, 731–738 (2004).

3. Wang, L., Brock, A., Herberich, B. & Schultz, P. G. Expanding the Genetic Code of Escherichia coli. Science 292, 498–500 (2001).

4. Johnson, D. B. F. et al. RF1 knockout allows ribosomal incorporation of unnatural amino acids at multiple sites. Nat. Chem. Biol. 7, 779–786 (2011).

5. Wang, K., Neumann, H., Peak-Chew, S. Y. & Chin, J. W. Evolved orthogonal ribosomes enhance the efficiency of synthetic genetic code expansion. Nat. Biotechnol. 25, 770–777 (2007).

6. Hammerling, M. J., Krüger, A. & Jewett, M. C. Strategies for in vitro engineering of the translation machinery. Nucleic Acids Res. 48, 1068–1083 (2020).

7. Lajoie, M. J. et al. Genomically Recoded Organisms Expand Biological Functions. Science 342, 357–360 (2013).

8. Korkmaz, G., Holm, M., Wiens, T. & Sanyal, S. Comprehensive Analysis of Stop Codon Usage in Bacteria and Its Correlation with Release Factor Abundance. J. Biol. Chem. 289, 30334–30342 (2014).

9. Sun, J., Chen, M., Xu, J. & Luo, J. Relationships among stop codon usage bias, its context, isochores, and gene expression level in various eukaryotes. J. Mol. Evol. 61, 437–444 (2005).

10. Schmidt, A. et al. The quantitative and condition-dependent Escherichia coli proteome. Nat. Biotechnol. 34, 104–110 (2016).

11. Zheng, Y. et al. Performance of optimized noncanonical amino acid mutagenesis systems in the absence of release factor 1. Mol. Biosyst. 12, 1746–1749 (2016).

12. Ostrov, N. et al. Design, synthesis, and testing toward a 57-codon genome. Science 353, 819–822 (2016).

13. Fredens, J. et al. Total synthesis of Escherichia coli with a recoded genome. Nature 569, 514–518 (2019).

14. Chen, I. A. & Schindlinger, M. Quadruplet codons: One small step for a ribosome, one giant leap for proteins. BioEssays News Rev. Mol. Cell. Dev. Biol. 32, 650–654 (2010).

15. DeBenedictis, E. A., Chory, E. J., Gretton, D., Wang, B. & Esvelt, K. A high-throughput platform for feedback-controlled directed evolution. bioRxiv 2020.04.01.021022 (2020) doi:10.1101/2020.04.01.021022.

16. Zhang, Y. et al. A semi-synthetic organism that stores and retrieves increased genetic information. Nature 551, 644–647 (2017).

17. Hamashima, K., Kimoto, M. & Hirao, I. Creation of unnatural base pairs for genetic alphabet expansion toward synthetic xenobiology. Curr. Opin. Chem. Biol. 46, 108–114 (2018).

18. Dien, V. T., Morris, S. E., Karadeema, R. J. & Romesberg, F. E. Expansion of the genetic code via expansion of the genetic alphabet. Curr. Opin. Chem. Biol. 46, 196–202 (2018).

19. DeLey Cox, V. E., Cole, M. F. & Gaucher, E. A. Incorporation of Modified Amino Acids by Engineered Elongation Factors with Expanded Substrate Capabilities. ACS Synth. Biol. 8, 287–296 (2019).

20. Haruna, K., Alkazemi, M. H., Liu, Y., Söll, D. & Englert, M. Engineering the elongation factor Tu for efficient selenoprotein synthesis. Nucleic Acids Res. 42, 9976–9983 (2014).

21. O’Donoghue, P., Ling, J., Wang, Y.-S. & Söll, D. Upgrading protein synthesis for synthetic biology. Nat. Chem. Biol. 9, 594–598 (2013).

22. Dedkova, L. M., Fahmi, N. E., Golovine, S. Y. & Hecht, S. M. Construction of Modified Ribosomes for Incorporation of d-Amino Acids into Proteins. Biochemistry 45, 15541–15551 (2006).

23. Hui, A. & de Boer, H. A. Specialized ribosome system: preferential translation of a single mRNA species by a subpopulation of mutated ribosomes in Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 84, 4762–4766 (1987).

24. Dumas, A., Lercher, L., Spicer, C. D. & Davis, B. G. Designing logical codon reassignment – Expanding the chemistry in biology. Chem. Sci. 6, 50–69 (2015).

25. Chin, J. W. Expanding and reprogramming the genetic code. Nature 550, 53–60 (2017).

26. Arranz-Gibert, P., Vanderschuren, K. & Isaacs, F. J. Next-generation genetic code expansion. Curr. Opin. Chem. Biol. 46, 203–211 (2018).

27. Zhao, J., Burke, A. J. & Green, A. P. Enzymes with noncanonical amino acids. Curr. Opin. Chem. Biol. 55, 136–144 (2020).

28. Davis, L. & Chin, J. W. Designer proteins: applications of genetic code expansion in cell biology. Nat. Rev. Mol. Cell Biol. 13, 168–182 (2012).

29. Kim, C. H., Axup, J. Y. & Schultz, P. G. Protein conjugation with genetically encoded unnatural amino acids. Curr. Opin. Chem. Biol. 17, 412–419 (2013).

30. Lee, K. J., Kang, D. & Park, H.-S. Site-Specific Labeling of Proteins Using Unnatural Amino Acids. Mol. Cells 42, 386–396 (2019).

31. Huang, Y. & Liu, T. Therapeutic applications of genetic code expansion. Synth. Syst. Biotechnol. 3, 150–158 (2018).

32. Kang, M., Lu, Y., Chen, S. & Tian, F. Harnessing the power of an expanded genetic code toward next-generation biopharmaceuticals. Curr. Opin. Chem. Biol. 46, 123–129 (2018).

33. Mori, H. & Ito, K. Different modes of SecY–SecA interactions revealed by site-directed in vivo photo-cross-linking. Proc. Natl. Acad. Sci. 103, 16159–16164 (2006).

34. Liu, C. et al. Coupled chaperone action in folding and assembly of hexadecameric Rubisco. Nature 463, 197–202 (2010).

35. Boos, D., Kuffer, C., Lenobel, R., Körner, R. & Stemmann, O. Phosphorylation-dependent Binding of Cyclin B1 to a Cdc6-like Domain of Human Separase. J. Biol. Chem. 283, 816–823 (2008).

36. Liu, J., Hemphill, J., Samanta, S., Tsang, M. & Deiters, A. Genetic Code Expansion in Zebrafish Embryos and Its Application to Optical Control of Cell Signaling. J. Am. Chem. Soc. 139, 9100–9103 (2017).

37. Courtney, T. M. & Deiters, A. Optical control of protein phosphatase function. Nat. Commun. 10, 4384 (2019).

38. Courtney, T. & Deiters, A. Recent advances in the optical control of protein function through genetic code expansion. Curr. Opin. Chem. Biol. 46, 99–107 (2018).

39. Charbon, G. et al. Subcellular Protein Localization by Using a Genetically Encoded Fluorescent Amino Acid. Chembiochem 12, 1818–1821 (2011).

40. Neumann, H., Hazen, J. L., Weinstein, J., Mehl, R. A. & Chin, J. W. Genetically Encoding Protein Oxidative Damage. J. Am. Chem. Soc. 130, 4028–4033 (2008).

41. Neumann, H., Peak-Chew, S. Y. & Chin, J. W. Genetically encoding N(epsilon)-acetyllysine in recombinant proteins. Nat. Chem. Biol. 4, 232–234 (2008).

42. Xie, J. et al. The site-specific incorporation of p-iodo-L-phenylalanine into proteins for structure determination. Nat. Biotechnol. 22, 1297–1301 (2004).

43. Deiters, A., Geierstanger, B. H. & Schultz, P. G. Site-Specific in vivo Labeling of Proteins for NMR Studies. ChemBioChem 6, 55–58 (2005).

44. Blight, S. K. et al. Direct charging of tRNA CUA with pyrrolysine in vitro and in vivo. Nature 431, 333–335 (2004).

45. Wan, W., Tharp, J. M. & Liu, W. R. Pyrrolysyl-tRNA Synthetase: an ordinary enzyme but an outstanding genetic code expansion tool. Biochim. Biophys. Acta 1844, 1059–1070 (2014).

46. Borrel, G. et al. Unique Characteristics of the Pyrrolysine System in the 7th Order of Methanogens: Implications for the Evolution of a Genetic Code Expansion Cassette. Archaea vol. 2014 e374146 https://www.hindawi.com/journals/archaea/2014/374146/ (2014).

47. Guo, L.-T. et al. Polyspecific pyrrolysyl-tRNA synthetases from directed evolution. Proc. Natl. Acad. Sci. 111, 16724–16729 (2014).

48. Hohl, A. et al. Engineering a promiscuous pyrrolysyl-tRNA synthetase by a high throughput FACS screen. bioRxiv 229054 (2017) doi:10.1101/229054.

49. Wang, Y.-S. et al. The de novo engineering of pyrrolysyl-tRNA synthetase for genetic incorporation of L-phenylalanine and its derivatives. Mol. Biosyst. 7, 714–717 (2011).

50. Hancock, S. M., Uprety, R., Deiters, A. & Chin, J. W. Expanding the Genetic Code of Yeast for Incorporation of Diverse Unnatural Amino Acids via a Pyrrolysyl-tRNA Synthetase/tRNA Pair. J. Am. Chem. Soc. 132, 14819–14824 (2010).

51. Greiss, S. & Chin, J. W. Expanding the genetic code of an animal. J. Am. Chem. Soc. 133, 14196–14199 (2011).

52. Han, S. et al. Expanding the genetic code of Mus musculus. Nat. Commun. 8, 14568 (2017).

53. Jiang, R. & Krzycki, J. A. PylSn and the Homologous N-terminal Domain of Pyrrolysyl-tRNA Synthetase Bind the tRNA That Is Essential for the Genetic Encoding of Pyrrolysine. J. Biol. Chem. 287, 32738–32746 (2012).

54. Willis, J. C. W. & Chin, J. W. Mutually orthogonal pyrrolysyl-tRNA synthetase/tRNA pairs. Nat. Chem. 10, 831–837 (2018).

55. Beránek, V., Willis, J. C. W. & Chin, J. W. An Evolved Methanomethylophilus alvus Pyrrolysyl-tRNA Synthetase/tRNA Pair Is Highly Active and Orthogonal in Mammalian Cells. Biochemistry 58, 387–390 (2019).

56. Ding, W. et al. Chimeric design of pyrrolysyl-tRNA synthetase/tRNA pairs and canonical synthetase/tRNA pairs for genetic code expansion. Nat. Commun. 11, 3154 (2020).

57. Yamaguchi, A., Iraha, F., Ohtake, K. & Sakamoto, K. Pyrrolysyl-tRNA Synthetase with a Unique Architecture Enhances the Availability of Lysine Derivatives in Synthetic Genetic Codes. Mol. Basel Switz. 23, (2018).

58. Chin, J. W. et al. An Expanded Eukaryotic Genetic Code. Science 301, 964–967 (2003).

59. Wu, N., Deiters, A., Cropp, T. A., King, D. & Schultz, P. G. A genetically encoded photocaged amino acid. J. Am. Chem. Soc. 126, 14306–14307 (2004).

60. Hughes, R. A. & Ellington, A. D. Rational design of an orthogonal tryptophanyl nonsense suppressor tRNA. Nucleic Acids Res. 38, 6813–6830 (2010).

61. Chatterjee, A., Xiao, H., Yang, P.-Y., Soundararajan, G. & Schultz, P. G. A tryptophanyl-tRNA synthetase/tRNA pair for unnatural amino acid mutagenesis in E. coli. Angew. Chem. Int. Ed Engl. 52, 5106–5109 (2013).

62. Ikeda-Boku, A. et al. A simple system for expression of proteins containing 3-azidotyrosine at a pre-determined site in Escherichia coli. J. Biochem. (Tokyo) 153, 317–326 (2013).

63. Anderson, J. C. et al. An expanded genetic code with a functional quadruplet codon. Proc. Natl. Acad. Sci. U. S. A. 101, 7566–7571 (2004).

64. Park, H.-S. et al. Expanding the Genetic Code of Escherichia coli with Phosphoserine. Science 333, 1151–1154 (2011).

65. Melnikov, S. V. & Söll, D. Aminoacyl-tRNA Synthetases and tRNAs for an Expanded Genetic Code: What Makes them Orthogonal? Int. J. Mol. Sci. 20, (2019).

66. Arranz-Gibert, P., Patel, J. R. & Isaacs, F. J. The Role of Orthogonality in Genetic Code Expansion. Life 9, (2019).

67. Wang, L. & Schultz, P. G. A general approach for the generation of orthogonal tRNAs. Chem. Biol. 8, 883–890 (2001).

68. Tian, H. et al. Screening system for orthogonal suppressor tRNAs based on the species-specific toxicity of suppressor tRNAs. Biochimie 95, 881–888 (2013).

69. Miller, C. et al. A synthetic tRNA for EF-Tu mediated selenocysteine incorporation in vivo and in vitro. FEBS Lett. 589, 2194–2199 (2015).

70. Xie, J. & Schultz, P. G. An expanding genetic code. Methods 36, 227–238 (2005).

71. Wang, Q., Parrish, A. R. & Wang, L. Expanding the Genetic Code for Biological Studies. Chem. Biol. 16, 323–336 (2009).

72. Zhang, M. S. et al. Biosynthesis and genetic encoding of phosphothreonine through parallel selection and deep sequencing. Nat. Methods 14, 729–736 (2017).

73. Amiram, M. et al. Evolution of translation machinery in recoded bacteria enables multi-site incorporation of nonstandard amino acids. Nat. Biotechnol. 33, 1272–1279 (2015).

74. Monk, J. W. et al. Rapid and Inexpensive Evaluation of Nonstandard Amino Acid Incorporation in Escherichia coli. ACS Synth. Biol. 6, 45–54 (2017).

75. Farrell, I. S., Toroney, R., Hazen, J. L., Mehl, R. A. & Chin, J. W. Photo-cross-linking interacting proteins with a genetically encoded benzophenone. Nat. Methods 2, 377–384 (2005).

76. Mandell, D. J. et al. Biocontainment of genetically modified organisms by synthetic protein design. Nature 518, 55–60 (2015).

77. Rovner, A. J. et al. Recoded organisms engineered to depend on synthetic amino acids. Nature 518, 89–93 (2015).

78. Sakamoto, K. et al. Site-specific incorporation of an unnatural amino acid into proteins in mammalian cells. Nucleic Acids Res. 30, 4692–4699 (2002).

79. Parrish, A. R. et al. Expanding the Genetic Code of Caenorhabditis elegans Using Bacterial Aminoacyl-tRNA Synthetase/tRNA Pairs. ACS Chem. Biol. 7, 1292–1302 (2012).

80. Gan, Q., Lehman, B. P., Bobik, T. A. & Fan, C. Expanding the genetic code of Salmonella with non-canonical amino acids. Sci. Rep. 6, (2016).

81. Lin, S. et al. Site-Specific Incorporation of Photo-Cross-Linker and Bioorthogonal Amino Acids into Enteric Bacterial Pathogens. J. Am. Chem. Soc. 133, 20581–20587 (2011).

82. Bartholomae, M. et al. Expanding the Genetic Code of Lactococcus lactis and Escherichia coli to Incorporate Non-canonical Amino Acids for Production of Modified Lantibiotics. Front. Microbiol. 9, (2018).

83. Scheidler, C. M., Vrabel, M. & Schneider, S. Genetic Code Expansion, Protein Expression, and Protein Functionalization in Bacillus subtilis. ACS Synth. Biol. 9, 486–493 (2020).

84. Luo, X. et al. Recombinant thiopeptides containing noncanonical amino acids. Proc. Natl. Acad. Sci. U. S. A. 113, 3615–3620 (2016).

85. Wang, F., Robbins, S., Guo, J., Shen, W. & Schultz, P. G. Genetic Incorporation of Unnatural Amino Acids into Proteins in Mycobacterium tuberculosis. PLOS ONE 5, e9354 (2010).

86. Chemla, Y. et al. Expanding the Genetic Code of a Photoautotrophic Organism. Biochemistry 56, 2161–2165 (2017).

87. Weaver, J. B. & Boxer, S. G. Genetic Code Expansion in Rhodobacter sphaeroides to Incorporate Noncanonical Amino Acids into Photosynthetic Reaction Centers. ACS Synth. Biol. 7, 1618–1628 (2018).

88. Fekner, T. & Chan, M. K. The Pyrrolysine Translational Machinery as a Genetic-Code Expansion Tool. Curr. Opin. Chem. Biol. 15, 387–391 (2011).

89. Chin, J. W. Expanding and Reprogramming the Genetic Code of Cells and Animals. Annu. Rev. Biochem. 83, 379–408 (2014).

90. Ma, N. J. & Isaacs, F. J. Genomic Recoding Broadly Obstructs the Propagation of Horizontally Transferred Genetic Elements. Cell Syst. 3, 199–207 (2016).

91. Oki, K., Sakamoto, K., Kobayashi, T., Sasaki, H. M. & Yokoyama, S. Transplantation of a tyrosine editing domain into a tyrosyl-tRNA synthetase variant enhances its specificity for a tyrosine analog. Proc. Natl. Acad. Sci. U. S. A. 105, 13298–13303 (2008).

92. Young, T. S., Ahmad, I., Yin, J. A. & Schultz, P. G. An Enhanced System for Unnatural Amino Acid Mutagenesis in E. coli. J. Mol. Biol. 395, 361–374 (2010).

93. Antonczak, A. K. et al. Importance of single molecular determinants in the fidelity of expanded genetic codes. Proc. Natl. Acad. Sci. 108, 1320–1325 (2011).

94. Fan, C., Ho, J. M. L., Chirathivat, N., Söll, D. & Wang, Y.-S. Exploring the Substrate Range of Wild-Type Aminoacyl-tRNA Synthetases. ChemBioChem 15, 1805–1809 (2014).

95. Xie, J., Liu, W. & Schultz, P. G. A genetically encoded bidentate, metal-binding amino acid. Angew. Chem. Int. Ed Engl. 46, 9239–9242 (2007).

96. Bachmair, A., Finley, D. & Varshavsky, A. In vivo half-life of a protein is a function of its amino-terminal residue. Science 234, 179–186 (1986).

97. Tasaki, T., Sriram, S. M., Park, K. S. & Kwon, Y. T. The N-End Rule Pathway. Annu. Rev. Biochem. 81, 261–289 (2012).

98. Tobias, J. W., Shrader, T. E., Rocap, G. & Varshavsky, A. The N-end rule in bacteria. Science 254, 1374–1377 (1991).

99. Rosen, C. B. & Francis, M. B. Targeting the N terminus for site-selective protein modification. Nat. Chem. Biol. 13, 697–705 (2017).

100. O’Donoghue, P. et al. Near-cognate suppression of amber, opal and quadruplet codons competes with aminoacyl-tRNAPyl for genetic code expansion. FEBS Lett. 586, 3931–3937 (2012).

101. Wang, K. H., Sauer, R. T. & Baker, T. A. ClpS modulates but is not essential for bacterial N-end rule degradation. Genes Dev. 21, 403–408 (2007).

102. Wang, K. H., Oakes, E. S. C., Sauer, R. T. & Baker, T. A. Tuning the strength of a bacterial N-end rule degradation signal. J. Biol. Chem. 283, 24600–24607 (2008).

103. Million-Weaver, S., Alexander, D. L., Allen, J. M. & Camps, M. Methods for quantifying plasmid copy number to investigate plasmid dosage effects associated with directed protein evolution. Methods Mol. Biol. Clifton NJ 834, 33–48 (2012).

104. Tobias, J. W. & Varshavsky, A. Cloning and functional analysis of the ubiquitin-specific protease gene UBP1 of Saccharomyces cerevisiae. J. Biol. Chem. 266, 12021–12028 (1991).

105. Wojtowicz, A. et al. Expression of yeast deubiquitination enzyme UBP1 analogues in E. coli. Microb. Cell Factories 4, 17 (2005).

106. Román-Hernández, G., Hou, J. Y., Grant, R. A., Sauer, R. T. & Baker, T. A. The ClpS adaptor mediates staged delivery of N-end rule substrates to the AAA+ ClpAP protease. Mol. Cell 43, 217–228 (2011).

107. Dougan, D. A., Reid, B. G., Horwich, A. L. & Bukau, B. ClpS, a Substrate Modulator of the ClpAP Machine. Mol. Cell 9, 673–683 (2002).

108. Wang, K. H., Roman-Hernandez, G., Grant, R. A., Sauer, R. T. & Baker, T. A. The Molecular Basis of N-end Rule Recognition. Mol. Cell 32, 406–414 (2008).

109. LaRiviere, F. J., Wolfson, A. D. & Uhlenbeck, O. C. Uniform Binding of Aminoacyl-tRNAs to Elongation Factor Tu by Thermodynamic Compensation. Science 294, 165–168 (2001).

110. Schrader, J. M., Chapman, S. J. & Uhlenbeck, O. C. Understanding the sequence specificity of tRNA binding to elongation factor Tu using tRNA mutagenesis. J. Mol. Biol. 386, 1255–1264 (2009).

111. Ronchel, M. C. & Ramos, J. L. Dual System To Reinforce Biological Containment of Recombinant Bacteria Designed for Rhizoremediation. Appl. Environ. Microbiol. 67, 2649–2656 (2001).

112. Li, Q. & Wu, Y.-J. A fluorescent, genetically engineered microorganism that degrades organophosphates and commits suicide when required. Appl. Microbiol. Biotechnol. 82, 749–756 (2009).

113. Pasotti, L., Zucca, S., Lupotto, M., Cusella De Angelis, M. G. & Magni, P. Characterization of a synthetic bacterial self-destruction device for programmed cell death and for recombinant proteins release. J. Biol. Eng. 5, 8 (2011).

114. Wright, O., Delmans, M., Stan, G.-B. & Ellis, T. GeneGuard: A modular plasmid system designed for biosafety. ACS Synth. Biol. 4, 307–316 (2015).

115. Chan, C. T. Y., Lee, J. W., Cameron, D. E., Bashor, C. J. & Collins, J. J. ‘Deadman’ and ‘Passcode’ microbial kill switches for bacterial containment. Nat. Chem. Biol. 12, 82–86 (2016).

116. Tanrikulu, I. C., Schmitt, E., Mechulam, Y., Goddard, W. A. & Tirrell, D. A. Discovery of Escherichia coli methionyl-tRNA synthetase mutants for efficient labeling of proteins with azidonorleucine in vivo. Proc. Natl. Acad. Sci. 106, 15285–15290 (2009).

117. Krishnakumar, R. & Ling, J. Experimental challenges of sense codon reassignment: an innovative approach to genetic code expansion. FEBS Lett. 588, 383–388 (2014).

118. Biddle, W., Schmitt, M. A. & Fisk, J. D. Evaluating Sense Codon Reassignment with a Simple Fluorescence Screen. Biochemistry 54, 7355–7364 (2015).

119. Losfeld, M.-E., Soncin, F., Ng, B. G., Singec, I. & Freeze, H. H. A sensitive green fluorescent protein biomarker of N-glycosylation site occupancy. FASEB J. Off. Publ. Fed. Am. Soc. Exp. Biol. 26, 4210–4217 (2012).

120. Zimmerman, E. S. et al. Production of site-specific antibody-drug conjugates using optimized non-natural amino acids in a cell-free expression system. Bioconjug. Chem. 25, 351–361 (2014).

121. Völler, J.-S. & Budisa, N. Coupling genetic code expansion and metabolic engineering for synthetic cells. Curr. Opin. Biotechnol. 48, 1–7 (2017).

122. Chen, S.-J., Wu, X., Wadas, B., Oh, J.-H. & Varshavsky, A. An N-end rule pathway that recognizes proline and destroys gluconeogenic enzymes. Science 355, (2017).

123. Liao, Y.-D., Jeng, J.-C., Wang, C.-F., Wang, S.-C. & Chang, S.-T. Removal of N-terminal methionine from recombinant proteins by engineered E. coli methionine aminopeptidase. Protein Sci. Publ. Protein Soc. 13, 1802–1810 (2004).

124. Graciet, E. et al. Aminoacyl-transferases and the N-end rule pathway of prokaryotic/eukaryotic specificity in a human pathogen. Proc. Natl. Acad. Sci. 103, 3078–3083 (2006).

125. Luo, J. et al. Genetically Encoded Optochemical Probes for Simultaneous Fluorescence Reporting and Light Activation of Protein Function with Two-Photon Excitation. J. Am. Chem. Soc. 136, 15551–15558 (2014).

126. Wang, J., Xie, J. & Schultz, P. G. A Genetically Encoded Fluorescent Amino Acid. J. Am. Chem. Soc. 128, 8738–8739 (2006).

127. Kunjapur, A. M. et al. Engineering posttranslational proofreading to discriminate nonstandard amino acids. Proc. Natl. Acad. Sci. 115, 619–624 (2018).

128. Summerer, D. et al. A genetically encoded fluorescent amino acid. Proc. Natl. Acad. Sci. 103, 9785–9789 (2006).

129. Ai, H., Shen, W., Sagi, A., Chen, P. R. & Schultz, P. G. Probing protein-protein interactions with a genetically encoded photo-crosslinking amino acid. Chembiochem Eur. J. Chem. Biol. 12, 1854–1857 (2011).

130. Chemla, Y., Ozer, E., Algov, I. & Alfonta, L. Context effects of genetic code expansion by stop codon suppression. Curr. Opin. Chem. Biol. 46, 146–155 (2018).

131. Chatterjee, A., Sun, S. B., Furman, J. L., Xiao, H. & Schultz, P. G. A Versatile Platform for Single- and Multiple-Unnatural Amino Acid Mutagenesis in Escherichia coli. Biochemistry 52, 1828–1837 (2013).

132. Cellitti, S. E. et al. In vivo Incorporation of Unnatural Amino Acids to Probe Structure, Dynamics, and Ligand Binding in a Large Protein by Nuclear Magnetic Resonance Spectroscopy. J. Am. Chem. Soc. 130, 9268–9281 (2008).

133. Hallam, T. J. & Smider, V. V. Unnatural amino acids in novel antibody conjugates. Future Med. Chem. 6, 1309–1324 (2014).

134. Oller-Salvia, B. Genetic Encoding of a Non-Canonical Amino Acid for the Generation of Antibody-Drug Conjugates Through a Fast Bioorthogonal Reaction. J. Vis. Exp. JoVE (2018) doi:10.3791/58066.

135. Mukai, T. et al. Codon reassignment in the Escherichia coli genetic code. Nucleic Acids Res. 38, 8188–8195 (2010).

136. Heinemann, I. U. et al. Enhanced phosphoserine insertion during Escherichia coli protein synthesis via partial UAG codon reassignment and release factor 1 deletion. FEBS Lett. 586, 3716–3722 (2012).

137. Mukai, T. et al. Highly reproductive Escherichia coli cells with no specific assignment to the UAG codon. Sci. Rep. 5, 9699 (2015).

138. Casteels, P., Ampe, C., Jacobs, F., Vaeck, M. & Tempst, P. Apidaecins: antibacterial peptides from honeybees. EMBO J. 8, 2387–2391 (1989).

139. Casteels, P. et al. Biodiversity of apidaecin-type peptide antibiotics. Prospects of manipulating the antibacterial spectrum and combating acquired resistance. J. Biol. Chem. 269, 26107–26115 (1994).

140. Berthold, N. et al. Novel Apidaecin 1b Analogs with Superior Serum Stabilities for Treatment of Infections by Gram-Negative Pathogens. Antimicrob. Agents Chemother. 57, 402–409 (2013).

141. Florin, T. et al. An antimicrobial peptide that inhibits translation by trapping release factors on the ribosome. Nat. Struct. Mol. Biol. 24, 752–757 (2017).

142. Graf, M. et al. Visualization of translation termination intermediates trapped by the Apidaecin 137 peptide during RF3-mediated recycling of RF1. Nat. Commun. 9, 3053 (2018).

143. Adio, S. et al. Dynamics of ribosomes and release factors during translation termination in E. coli. eLife 7, e34252 (2018).

144. Li, J. et al. Dissecting limiting factors of the Protein synthesis Using Recombinant Elements (PURE) system. Transl. Austin Tex 5, e1327006 (2017).

145. Shimizu, Y. et al. Cell-free translation reconstituted with purified components. Nat. Biotechnol. 19, 751–755 (2001).

146. Martin, R. W. et al. Cell-free protein synthesis from genomically recoded bacteria enables multisite incorporation of noncanonical amino acids. Nat. Commun. 9, 1203 (2018).

147. Yin, G. et al. RF1 attenuation enables efficient non-natural amino acid incorporation for production of homogeneous antibody drug conjugates. Sci. Rep. 7, 3026 (2017).

148. Chin, J. W., Martin, A. B., King, D. S., Wang, L. & Schultz, P. G. Addition of a photocrosslinking amino acid to the genetic code of Escherichia coli. Proc. Natl. Acad. Sci. 99, 11020–11024 (2002).

149. Kuru, E. et al. Mechanisms of Incorporation for D-Amino Acid Probes That Target Peptidoglycan Biosynthesis. ACS Chem. Biol. 14, 2745–2756 (2019).

150. Meijer, M., Hendriks, H. S., Heusinkveld, H. J., Langeveld, W. T. & Westerink, R. H. S. Comparison of plate reader-based methods with fluorescence microscopy for measurements of intracellular calcium levels for the assessment of in vitro neurotoxicity. Neurotoxicology 45, 31–37 (2014).

151. Wan, W. et al. A Facile System for Genetic Incorporation of Two Different Noncanonical Amino Acids into One Protein in Escherichia coli. Angew. Chem. Int. Ed. 49, 3211–3214 (2010).

152. Ma, N. J., Hemez, C. F., Barber, K. W., Rinehart, J. & Isaacs, F. J. Organisms with alternative genetic codes resolve unassigned codons via mistranslation and ribosomal rescue. eLife 7, e34878 (2018).

153. Lim, H. C. et al. Identification of new components of the RipC-FtsEX cell separation pathway of Corynebacterineae. PLOS Genet. 15, e1008284 (2019).

154. Studier, F. W. Protein production by auto-induction in high density shaking cultures. Protein Expr. Purif. 41, 207–234 (2005).

155. Liu, D. R. & Schultz, P. G. Progress toward the evolution of an organism with an expanded genetic code. Proc. Natl. Acad. Sci. 96, 4780–4785 (1999).

156. Liechti, G. W. et al. A new metabolic cell-wall labelling method reveals peptidoglycan in Chlamydia trachomatis. Nature 506, 507–510 (2014).

157. Ke, N., Landgraf, D., Paulsson, J. & Berkmen, M. Visualization of Periplasmic and Cytoplasmic Proteins with a Self-Labeling Protein Tag. J. Bacteriol. 198, 1035–1043 (2016).

158. Wang, H. H. et al. Programming cells by multiplex genome engineering and accelerated evolution. Nature 460, 894–898 (2009).

159. Nyerges, Á. et al. A highly precise and portable genome engineering method allows comparison of mutational effects across bacterial species. Proc. Natl. Acad. Sci. 113, 2502–2507 (2016).

160. Otvos, L. et al. Designer Antibacterial Peptides Kill Fluoroquinolone-Resistant Clinical Isolates. J. Med. Chem. 48, 5349–5359 (2005).

161. Swulius, M. T. & Jensen, G. J. The Helical MreB Cytoskeleton in Escherichia coli MC1000/pLE7 Is an Artifact of the N-Terminal Yellow Fluorescent Protein Tag. J. Bacteriol. 194, 6382–6386 (2012).

162. Ledbetter, M. P. & Romesberg, F. E. Editorial overview: Expanding the genetic alphabet and code. Curr. Opin. Chem. Biol. 46, A1–A2 (2018).

163. Mahaffee, W. F. & Kloepper, J. W. Temporal changes in the bacterial communities of soil, rhizosphere, and endorhiza associated with field-grown cucumber (Cucumis sativus L.). Microb. Ecol. 34, 210–223 (1997).

164. Mahaffee, W. F. & Kloepper, J. W. Bacterial communities of the rhizosphere and endorhiza associated with field-grown cucumber plants inoculated with a plant growth-promoting rhizobacterium or its genetically modified derivative. Can. J. Microbiol. 43, 344–353 (1997).

165. Earl, A. M., Losick, R. & Kolter, R. Ecology and genomics of Bacillus subtilis. Trends Microbiol. 16, 269–275 (2008).

166. Losick, R., Youngman, P. & Piggot, P. J. Genetics of Endospore Formation in Bacillus Subtilis. Annu. Rev. Genet. 20, 625–669 (1986).

167. McKenney, P. T., Driks, A. & Eichenberger, P. The Bacillus subtilis endospore: Assembly and functions of the multilayered coat. Nat. Rev. Microbiol. 11, 33–44 (2013).

168. Kearns, D. B., Chu, F., Branda, S. S., Kolter, R. & Losick, R. A master regulator for biofilm formation by Bacillus subtilis. Mol. Microbiol. 55, 739–749 (2005).

169. Branda, S. S., González-Pastor, J. E., Ben-Yehuda, S., Losick, R. & Kolter, R. Fruiting body formation by Bacillus subtilis. Proc. Natl. Acad. Sci. U. S. A. 98, 11621–11626 (2001).

170. Bai, Y., D’Aoust, F., Smith, D. L. & Driscoll, B. T. Isolation of plant-growth-promoting Bacillus strains from soybean root nodules. Can. J. Microbiol. 48, 230–238 (2002).

171. Sivasakthi, S., Usharani, G. & Saranraj, P. Biocontrol potentiality of plant growth promoting bacteria (PGPR) - Pseudomonas fluorescens and Bacillus subtilis: a review. Afr. J. Agric. Res. 9, 1265–1277 (2014).

172. Tam, N. K. M. et al. The intestinal life cycle of Bacillus subtilis and close relatives. J. Bacteriol. 188, 2692–2700 (2006).

173. Hong, H. A., Le, H. D. & Cutting, S. M. The use of bacterial spore formers as probiotics. FEMS Microbiol. Rev. 29, 813–835 (2005).

174. Cutting, S. M. Bacillus probiotics. Food Microbiol. 28, 214–220 (2011).

175. Cutting, S. M., Hong, H. A., Baccigalupi, L. & Ricca, E. Oral vaccine delivery by recombinant spore probiotics. Int. Rev. Immunol. 28, 487–505 (2009).

176. Kipper, K. et al. Application of Noncanonical Amino Acids for Protein Labeling in a Genomically Recoded Escherichia coli. ACS Synth. Biol. 6, 233–255 (2017).

177. Westers, L., Westers, H. & Quax, W. J. Bacillus subtilis as cell factory for pharmaceutical proteins: a biotechnological approach to optimize the host organism. Biochim. Biophys. Acta BBA - Mol. Cell Res. 1694, 299–310 (2004).

178. Song, Y., Nikoloff, J. M. & Zhang, D. Improving Protein Production on the Level of Regulation of both Expression and Secretion Pathways in Bacillus subtilis. J. Microbiol. Biotechnol. 25, 963–977 (2015).

179. Kang, M., Lu, Y., Chen, S. & Tian, F. Harnessing the power of an expanded genetic code toward next-generation biopharmaceuticals. Curr. Opin. Chem. Biol. 46, 123–129 (2018).

180. Bisson-Filho, A. W. et al. Treadmilling by FtsZ filaments drives peptidoglycan synthesis and bacterial cell division. Science 355, 739–743 (2017).

181. Subramanian, S., Gao, X., Dann, C. E. & Kearns, D. B. MotI (DgrA) acts as a molecular clutch on the flagellar stator protein MotA in Bacillus subtilis. Proc. Natl. Acad. Sci. 114, 13537–13542 (2017).

182. Hussain, S. et al. MreB filaments align along greatest principal membrane curvature to orient cell wall synthesis. eLife 7, e32471 (2018).

183. England, C. G., Ehlerding, E. B. & Cai, W. NanoLuc: A Small Luciferase is Brightening up the Field of Bioluminescence. Bioconjug. Chem. 27, 1175–1187 (2016).

184. Mohler, K. et al. MS-READ: Quantitative Measurement of Amino Acid Incorporation. Biochim. Biophys. Acta 1861, 3081–3088 (2017).

185. Wolf, C. et al. Elucidation of the presence and location of t-Boc protecting groups in amines and dipeptides using on-column H/D exchange HPLC/ESI/MS. J. Am. Soc. Mass Spectrom. 16, 553–564 (2005).

186. Hankore, E. D. et al. Genetic Incorporation of Noncanonical Amino Acids Using Two Mutually Orthogonal Quadruplet Codons. ACS Synth. Biol. 8, 1168–1174 (2019).

187. Mandell, D. J. et al. Biocontainment of genetically modified organisms by synthetic protein design. Nature 518, 55–60 (2015).

188. Berlatzky, I. A., Rouvinski, A. & Ben-Yehuda, S. Spatial organization of a replicating bacterial chromosome. Proc. Natl. Acad. Sci. U. S. A. 105, 14136–14140 (2008).

189. Sysoeva, T. A., Zepeda-Rivera, M. A., Huppert, L. A. & Burton, B. M. Dimer recognition and secretion by the ESX secretion system in Bacillus subtilis. Proc. Natl. Acad. Sci. 111, 7653–7658 (2014).

190. Wang, T. et al. Incorporation of nonstandard amino acids into proteins: principles and applications. World J. Microbiol. Biotechnol. 36, 60 (2020).

191. Itaya, M. et al. Far rapid synthesis of giant DNA in the Bacillus subtilis genome by a conjugation transfer system. Sci. Rep. 8, 8792 (2018).

192. Dalia, A. B., McDonough, E. & Camilli, A. Multiplex genome editing by natural transformation. Proc. Natl. Acad. Sci. U. S. A. 111, 8937–8942 (2014).

193. Ye, B. et al. Unmarked genetic manipulation in Bacillus subtilis by natural co-transformation. J. Biotechnol. 284, 57–62 (2018).

194. Kuru, E. et al. Release Factor Inhibiting Antimicrobial Peptides Improve Nonstandard Amino Acid Incorporation in Wild-type Bacterial Cells. ACS Chem. Biol. 15, 1852–1861 (2020).

195. Zhu, B. & Stülke, J. SubtiWiki in 2018: from genes and proteins to functional network annotation of the model organism Bacillus subtilis. Nucleic Acids Res. 46, D743–D748 (2018).

196. Ko, W., Kumar, R., Kim, S. & Lee, H. S. Construction of Bacterial Cells with an Active Transport System for Unnatural Amino Acids. ACS Synth. Biol. 8, 1195–1203 (2019).

197. Roy, G. et al. Development of a high yielding expression platform for the introduction of non-natural amino acids in protein sequences. mAbs 12, 1684749 (2020).

198. Völler, J.-S. & Budisa, N. Coupling genetic code expansion and metabolic engineering for synthetic cells. Curr. Opin. Biotechnol. 48, 1–7 (2017).

199. Zhao, J., Burke, A. J. & Green, A. P. Enzymes with noncanonical amino acids. Curr. Opin. Chem. Biol. 55, 136–144 (2020).

200. Calero, P. & Nikel, P. I. Chasing bacterial chassis for metabolic engineering: a perspective review from classical to non-traditional microorganisms. Microb. Biotechnol. 12, 98–124 (2019).

201. Hsu, Y.-P. et al. Fluorogenic D-amino acids enable real-time monitoring of peptidoglycan biosynthesis and high-throughput transpeptidation assays. Nat. Chem. 11, 335–341 (2019).

202. Ravikumar, A., Arzumanyan, G. A., Obadi, M. K. A., Javanpour, A. A. & Liu, C. C. Scalable, Continuous Evolution of Genes at Mutation Rates above Genomic Error Thresholds. Cell 175, 1946-1957.e13 (2018).

203. AlQuraishi, M. End-to-End Differentiable Learning of Protein Structure. Cell Syst. 8, 292-301.e3 (2019).

204. Cervettini, D. et al. Rapid discovery and evolution of orthogonal aminoacyl-tRNA synthetase–tRNA pairs. Nat. Biotechnol. 38, 989–999 (2020).

205. Graille, M. et al. Molecular basis for bacterial class I release factor methylation by PrmC. Mol. Cell 20, 917–927 (2005).

206. Schumacher, D., Hackenberger, C. P. R., Leonhardt, H. & Helma, J. Current Status: Site-Specific Antibody Drug Conjugates. J. Clin. Immunol. 36, 100–107 (2016).

207. Zheng, Y., Lewis, T. L., Igo, P., Polleux, F. & Chatterjee, A. Virus-Enabled Optimization and Delivery of the Genetic Machinery for Efficient Unnatural Amino Acid Mutagenesis in Mammalian Cells and Tissues. ACS Synth. Biol. 6, 13–18 (2017).

208. Yang, M. et al. Engineering Bacillus subtilis as a Versatile and Stable Platform for Production of Nanobodies. Appl. Environ. Microbiol. 86, (2020).

209. Schallmey, M., Singh, A. & Ward, O. P. Developments in the use of Bacillus species for industrial production. Can. J. Microbiol. 50, 1–17 (2004).

210. Koh, M., Yao, A., Gleason, P. R., Mills, J. H. & Schultz, P. G. A General Strategy for Engineering Noncanonical Amino Acid Dependent Bacterial Growth. J. Am. Chem. Soc. 141, 16213–16216 (2019).

211. Albayrak, C. & Swartz, J. R. Direct Polymerization of Proteins. ACS Synth. Biol. 3, 353–362 (2014).

212. Hauf, M. et al. Photoactivatable Mussel-Based Underwater Adhesive Proteins by an Expanded Genetic Code. Chembiochem Eur. J. Chem. Biol. 18, 1819–1823 (2017).

213. Wang, L., Zhang, Z., Brock, A. & Schultz, P. G. Addition of the keto functional group to the genetic code of Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 100, 56–61 (2003).

214. Chin, J. W. et al. Addition of p-Azido-l-phenylalanine to the Genetic Code of Escherichia coli. J. Am. Chem. Soc. 124, 9026–9027 (2002).

215. Wang, L., Brock, A. & Schultz, P. G. Adding l-3-(2-Naphthyl)alanine to the Genetic Code of E. coli. J. Am. Chem. Soc. 124, 1836–1837 (2002).

216. Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).

217. Ellefson, J. W. et al. Directed evolution of genetic parts and circuits by compartmentalized partnered replication. Nat. Biotechnol. 32, 97–101 (2014).

218. Salis, H. M., Mirsky, E. A. & Voigt, C. A. Automated design of synthetic ribosome binding sites to control protein expression. Nat. Biotechnol. 27, 946–950 (2009).

219. Kunjapur, A. M., Tarasova, Y. & Prather, K. L. J. Synthesis and Accumulation of Aromatic Aldehydes in an Engineered Strain of Escherichia coli. J. Am. Chem. Soc. 136, 11644–11654 (2014).

220. Yu, D. et al. An efficient recombination system for chromosome engineering in Escherichia coli. Proc. Natl. Acad. Sci. 97, 5978–5983 (2000).

221. DeVito, J. A. Recombineering with tolC as a Selectable/Counter-selectable Marker: remodeling the rRNA Operons of Escherichia coli. Nucleic Acids Res. 36, e4 (2008).

222. Gregg, C. J. et al. Rational optimization of tolC as a powerful dual selectable marker for genome engineering. Nucleic Acids Res. 42, 4779–4790 (2014).

223. Isaacs, F. J. et al. Precise Manipulation of Chromosomes in Vivo Enables Genome-Wide Codon Replacement. Science 333, 348–353 (2011).

224. St-Pierre, F. et al. One-Step Cloning and Chromosomal Integration of DNA. ACS Synth. Biol. 2, 537–541 (2013).

225. Wannier, T. M. et al. Adaptive evolution of genomically recoded Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 115, 3090–3095 (2018).

226. Kunjapur, A. M., Hyun, J. C. & Prather, K. L. J. Deregulation of S-adenosylmethionine biosynthesis and regeneration improves methylation in the E. coli de novo vanillin biosynthesis pathway. Microb. Cell Factories 15, 61 (2016).

227. Ledoux, S. & Uhlenbeck, O. C. [3′-32P]-labeling tRNA with nucleotidyltransferase for assaying aminoacylation and peptide bond formation. Methods 44, 74–80 (2008).

228. Goto, Y., Katoh, T. & Suga, H. Flexizymes for genetic code reprogramming. Nat. Protoc. 6, 779–790 (2011).

229. Sambrook, J. Molecular Cloning: A Laboratory Manual, Third Edition. (Cold Spring Harbor Laboratory Press, 2001).

230. Morton, E. R. & Fuqua, C. Genetic manipulation of Agrobacterium. Curr. Protoc. Microbiol. Chapter 3, Unit 3D.2. (2012).

231. Shevchenko, A., Wilm, M., Vorm, O. & Mann, M. Mass Spectrometric Sequencing of Proteins from Silver-Stained Polyacrylamide Gels. Anal. Chem. 68, 850–858 (1996).

232. Peng, J. & Gygi, S. P. Proteomics: the move to mixtures. J. Mass Spectrom. 36, 1083–1091 (2001).

233. Eng, J. K., McCormack, A. L. & Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976–989 (1994).

234. Ducret, A., Quardokus, E. M. & Brun, Y. V. MicrobeJ, a tool for high throughput bacterial cell detection and quantitative analysis. Nat. Microbiol. 1, 1–7 (2016).

235. Harwood, C. R. & Cutting, S. M. Molecular biological methods for Bacillus. (Wiley, 1990).

236. Burton, B. M., Marquis, K. A., Sullivan, N. L., Rapoport, T. A. & Rudner, D. Z. The ATPase SpoIIIE Transports DNA across Fused Septal Membranes during Sporulation in Bacillus subtilis. Cell 131, 1301–1312 (2007).

237. Huppert, L. A. et al. The ESX System in Bacillus subtilis Mediates Protein Secretion. PLOS ONE 9, e96267 (2014).
