Transposon-Based Tools for Enhancing Protein Function

By

Dana Caldwell Nadler

A dissertation submitted in partial satisfaction of the

requirements for the degree of

Doctor of Philosophy

in

Chemical Engineering

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Professor David F. Savage, Co-Chair
Professor Danielle T. Tullman-Ercek, Co-Chair
Professor Jay D. Keasling
Professor John E. Dueber

Fall 2014

Copyright © 2014

by Dana Caldwell Nadler

Abstract

Transposon-Based Tools for Enhancing Protein Function

by

Dana Caldwell Nadler

Doctor of Philosophy in Chemical Engineering

University of California, Berkeley

Professor David F. Savage, Co-Chair
Professor Danielle T. Tullman-Ercek, Co-Chair

Multi-domain proteins are a common and essential component of the cellular proteome and contribute complex functions that single domains cannot achieve. The field of protein engineering uses sophisticated techniques to modify proteins for a given purpose, but generally focuses on single domains without consideration of the protein’s overall topology. In this work, we set out to develop tools that enable the high-throughput sampling of different domain connections. A novel method of random transposon-mediated domain insertion was devised, which efficiently created large libraries of domain fusions. The utility of this method is first demonstrated with insertions of circularly permuted GFP into maltose-binding protein, coupled with fluorescence-activated cell sorting (FACS), to isolate biosensor proteins that are allosterically regulated by maltose. We next demonstrate the generality of this method by constructing a pool of functional dCas9 variants with insertions of a PDZ protein-protein interaction domain. Ultimately, these dCas9 variants may prove useful as a scaffold for the recruitment of engineered proteins to specific sites in the genome. These successful protein-engineering efforts illustrate the advanced functions possible with multi-domain constructs and how this transposon-based insertion method can facilitate the creation and study of this new class of synthetic proteins.


Acknowledgements

Graduate school has been a lengthy, unpredictable, and stimulating journey. I am thankful for many people who provided support and guidance along the way, without whom I would not have been able to complete this work.

I am indebted to my advisor, Prof. David Savage, for giving me the opportunity to pursue my academic goals. He showed me the positive side of science and taught me that good things are always right around the corner. His unceasing optimism and work ethic inspired me every day.

Thanks to all of the members of the Savage lab for creating a fun and engaging work environment. Thanks especially to the members of Team Biosensor, Stacy and Katie, for collaborating with me on much of this work and always providing insightful feedback. Thanks to Ben for collaborating with me on the work for Chapter 4, and to Rayka for being my bay-mate and contributing to my organizational endeavors.

I am thankful for the love and support I received from my family during graduate school and beyond. I am also deeply grateful for my partner in life, Anna, for her unconditional encouragement and love. Finally, I dedicate this work to my grandparents, Gerald and Elaine.

Table of Contents

Acknowledgements
Table of Contents
List of Figures
List of Tables
List of Abbreviations

Chapter 1 Introduction
  1.1 Protein Engineering
  1.2 Allostery
  1.3 Transposons used for protein engineering
  1.4 Allosteric Biosensors
  1.5 References

Chapter 2 Transposon-Mediated Domain Insertion
  2.1 Introduction
    2.1.1 In Vitro Transposition
    2.1.2 Modified Transposons
  2.2 Materials and Methods
    2.2.1 Strains and Media
    2.2.2 DNA manipulation
    2.2.3 Creating Modified Transposons by PCR
    2.2.4 Creating Modified Asymmetric Transposons by Cloning
    2.2.5 Excising Transposon DNA from
    2.2.6 Transposition Reactions
    2.2.7 Cloning into Modified Transposons
    2.2.8 Target/Expression Plasmids
  2.3 Results
    2.3.1 Possible Restriction Sites for Modified Transposons
    2.3.2 PCR-Generated Modified Transposons
    2.3.3 Plasmid-Propagated Asymmetric Transposons
    2.3.4 Cloning into Modified Transposons
    2.3.5 Optimizing a BsaI-Modified Transposon
  2.4 Discussion
  2.5 References

Chapter 3 Fluorescent Biosensor Development
  3.1 Introduction
    3.1.1 Single-FP Biosensors
    3.1.2 Producing Functional Single-FP Biosensors
    3.1.3 Screening Libraries of Potential Biosensors
  3.2 Materials and Methods
    3.2.1 Strains and Media
    3.2.2 DNA manipulation
    3.2.3 Cloning Minimal Expression Plasmid
    3.2.4 Maltose-Binding Protein Cloning
    3.2.5 Excising Transposon DNA from Propagation Plasmid
    3.2.6 Transposition Reactions
    3.2.7 Cloning cpEGFP into Transposon Insertion Libraries
    3.2.8 PCR Analysis of Insertion Libraries
    3.2.9 Cloning MBP-170 Library
    3.2.10 Fluorescence-Activated Cell Sorting (FACS)
    3.2.11 Microplate Biosensor Testing
    3.2.12 Single-Cell Measurements on Sony SH800
    3.2.13 In Vitro Biosensor Analysis
  3.3 Results
    3.3.1 Construction of a MBP-cpEGFP Library
    3.3.2 FACS Enrichment of Biosensors from a Library
    3.3.3 Targeted Analysis of Enriched MBP-cpEGFP Constructs
    3.3.4 Linker Optimization of MBP170-cpEGFP
    3.3.5 In Vitro Biochemical Analysis of a Functional Maltose Sensor
  3.4 Discussion
  3.5 References

Chapter 4 Domain Insertions for Protein Recruitment
  4.1 Introduction
    4.1.1 Current Uses of Cas9
    4.1.2 Initial Cas9 Engineering Questions
    4.1.3 Goals
  4.2 Materials and Methods
    4.2.1 Strains and Media
    4.2.2 Electrocompetent E. coli preparation for library construction
    4.2.3 DNA Manipulation
    4.2.4 Plasmids
    4.2.5 Creation of dCas9 Transposon-Insert Library
    4.2.6 PDZ Cloning into Insert Library
    4.2.7 FACS Screening of dCas9 Library
    4.2.8 Library Analysis
    4.2.9 Isolated Construct Analysis
    4.2.10 Next-Generation Sequencing
  4.3 Results
    4.3.1 Screening dCas9 Using Fluorescent Proteins
    4.3.2 Creating and Screening of a PDZ-dCas9 Library
    4.3.3 Determining Screening Enrichment of PDZ-dCas9 Domain Insertions
    4.3.4 Identifying and Testing PDZ-dCas9 Clones from a Screened Library
    4.3.5 Next-Generation Sequencing of PDZ-Insert Libraries
  4.4 Discussion
  4.5 References

Chapter 5 Conclusions
  5.1 Summary
  5.2 Discussion
  5.3 Future Directions
  5.4 References

Appendix A: Published Single-FP Biosensors
Appendix B: Restriction-Site Search Python Script
Appendix C: DNA Sequences
  Sequences from Chapter 2
  Sequences from Chapter 3
  Sequences from Chapter 4

List of Figures

Figure 2-1: Schematic outline of an in vitro, minimal transposition system
Figure 2-2: Schematic of a modified transposon with restriction sites
Figure 2-3: Mutations in the transposon sequence that produce restriction sites
Figure 2-4: Tested mutations within transposon ends
Figure 2-5: Overview of modified transposon insertion and subsequent cloning
Figure 2-6: Agarose-gel analysis of transposon-insert libraries
Figure 2-7: Expression testing of libraries after cloning of a fluorescent domain
Figure 2-8: Overview of new modified transposon designs

Figure 3-1: Overview of single-FP-based biosensors
Figure 3-2: Outline of biosensor development strategy
Figure 3-3: Schematic of biosensor screening strategy
Figure 3-4: Schematic for the production of a cpEGFP-insertion library
Figure 3-5: Sequence details of transposition and cloning strategy
Figure 3-6: PCR analysis of insertion library diversity
Figure 3-7: Fluorescence-activated cell sorting (FACS) of MBP-cpEGFP library
Figure 3-8: Microplate in vivo analysis of MBP-cpEGFP constructs from sorting
Figure 3-9: Microplate testing of MBP170 insertion constructs
Figure 3-10: Flow cytometry measurements of select MBP170 constructs
Figure 3-11: In vitro analysis of MBP170-cpEGFP construct B2
Figure 3-12: Overall workflow timeline of biosensor creation platform

Figure 4-1: Holo Cas9 model and its potential uses
Figure 4-2: Screen for functional dCas9s
Figure 4-3: Cell sorting data from the GFP-RFP screen
Figure 4-4: Checking success of a screen and picking final clones
Figure 4-5: Validating functionality of engineered dCas9
Figure 4-6: Next-generation sequencing of screened PDZ-dCas9 library

List of Tables

Table 2-1: List of DNA primers
Table 3-1: List of DNA primers
Table 3-2: Linker composition of top eight MBP170 constructs
Table 4-1: List of DNA primers

List of Abbreviations

A ...... Adenine Nucleotide
AMP ...... Adenosine Monophosphate
AmpR ...... Ampicillin/Carbenicillin Resistance Gene
ATC ...... Anhydrotetracycline
bp ...... Base Pair(s)
C ...... Cytosine Nucleotide
CamR or CmR ...... Chloramphenicol Resistance Gene
Cas ...... CRISPR-Associated genes
CAT ...... Chloramphenicol Acetyltransferase
CFU ...... Colony-Forming Unit
cpEGFP ...... Circularly-Permuted Enhanced Green Fluorescent Protein
cpGFP ...... Circularly-Permuted Green Fluorescent Protein
cpSFGFP ...... Circularly-Permuted Super Folder Green Fluorescent Protein
CRISPR ...... Clustered Regularly Interspaced Short Palindromic Repeats
CRISPRi ...... CRISPR interference
crRNA ...... CRISPR-RNA
dCas9 ...... Catalytically Dead mutant of the Cas9 enzyme
DOI ...... Domain of Interest
DMSO ...... Dimethyl Sulfoxide
EGFP ...... Enhanced Green Fluorescent Protein
FACS ...... Fluorescence-Activated Cell Sorting
FP ...... Fluorescent Protein
FRET ...... Förster Resonance Energy Transfer
G ...... Guanine Nucleotide
GG ...... Golden Gate (cloning)
GFP ...... Green Fluorescent Protein
GOI ...... Gene of Interest
IPTG ...... Isopropyl β-D-1-Thiogalactopyranoside
IR ...... Inverted Repeat
KanR ...... Kanamycin Resistance Gene
LB ...... Lysogeny Broth
LBD ...... Ligand-Binding Domain
MBP ...... Maltose-Binding Protein
MOPS ...... 3-Morpholinopropane-1-Sulfonic Acid
NGS ...... Next-Generation Sequencing
OD600 ...... Optical Density at 600 nm
ORF ...... Open Reading Frame
PAM ...... Protospacer Adjacent Motif
PBS ...... Phosphate-Buffered Saline
PCR ...... Polymerase Chain Reaction
RBS ...... Ribosomal Binding Site
RFP ...... Red Fluorescent Protein
rpm ...... Revolutions Per Minute
SD ...... Standard Deviation
sgRNA ...... Single Guide RNA
SpCas9 ...... Streptococcus pyogenes Cas9
sfGFP ...... Super Folder Green Fluorescent Protein
T ...... Thymine Nucleotide
TE ...... Transposable Element
tracrRNA ...... Trans-Activating crRNA
v/v ...... Volume to Volume (percent)
w/v ...... Weight to Volume (percent)
WT ...... Wild-Type
X-Gal ...... 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside

Chapter 1 Introduction

1.1 Protein Engineering

Protein engineering is a field of research that attempts to alter, improve, or design proteins. The field emerged in the early 1980s with the development of oligonucleotide-based DNA mutagenesis techniques (Knowles 1987, Ulmer 1983). Early studies used these techniques to investigate the structure-function relationships within proteins. For example, the theories developed for the catalytic triad and oxyanion binding site of serine proteases were confirmed through site-directed mutagenesis of subtilisin (Carter & Wells 1988, Wells et al. 1986). As mutagenesis methods improved rapidly (Smith 1985), researchers began making alterations with the goal of improving proteins. Like subtilisin, tyrosyl-transfer-RNA synthetase was used as a model for protein engineering, and rational mutations were used to increase its enzymatic activity 250-fold (Wilkinson et al. 1984).

The 1990s saw protein engineering move beyond rational design approaches with the advent of random-mutation techniques (Leung et al. 1989, Stemmer 1994). The new evolution-based strategies (“irrational design”) were able to improve properties beyond what could be accomplished with a few point mutations (Arnold 1998). For example, the melting temperature of a para-nitrobenzyl esterase was increased by 14°C (Giver et al. 1998) and a cephalosporinase was engineered to have a >270-fold increase in antibiotic-resistance activity (Crameri et al. 1998). In the 2000s, researchers started to use completely de novo design strategies to engineer new proteins (Kuhlman et al. 2003). However, for most purposes these methods needed to be complemented with traditional protein engineering techniques (Röthlisberger et al. 2008). Most experiments in the first ~20 years of protein engineering concentrated on simple enhancements such as increased expression level, enzyme stability, or catalytic efficiency (Goldsmith & Tawfik 2012). More recent research has benefited from a better understanding of protein function as well as gains in technology, such as computing power and high-throughput screening instrumentation.

The latest advances in protein engineering have gone beyond enhancement of single properties and introduced network or intra-protein communication (Fastrez 2009, Khalil & Collins 2010). These advanced functions generally require multi-domain proteins; domains can be thought of as the structural, functional, or evolutionary units of a protein (Ponting & Russell 2002, Vogel et al. 2004). Once a single domain naturally develops a novel function (diverges), it can be used repeatedly in evolution so as to not “remake the wheel.” To a first approximation, proteins are just a sum of their functional parts, but interactions, communication, and cooperative effects can enhance the final function. Nature has produced ever more intricate functionality in single proteins by evolutionary processes that include domain swapping, shuffling, duplication, and insertion (Aroul-Selvam et al. 2004, Vogel et al. 2004). Researchers have recognized the potential of these natural processes and used protein-engineering techniques to mimic them and bring domains together in new ways. Notable examples are the creation of scaffolded enzyme systems (Dueber et al. 2009), light-activated (Lee et al. 2008) and networks (Airan et al. 2009), and improved enzymes for binding and hydrolysis of biomass (Bommarius et al. 2014). There is great interest in multi-domain engineering approaches to produce valuable synthetic biology components, metabolic biosensors, and controllable genetic editing tools.

1.2 Allostery

Allosteric regulation (or allostery) was first described more than five decades ago and was defined as distinct binding sites in a protein having indirect interactions (Monod et al. 1965). Today, allostery is used to describe, in general, any energetic coupling between physically distinct sites in a protein (or other biological macromolecule), though definitions found in the literature can be contradictory (Fenton 2008). In nature, an event at one protein site, often the binding of an effector molecule, is commonly found to regulate the activity of a distal functional site. This allows cells to have finer control over a given protein activity without relying on competitive binding or concentration changes of the protein. Classic examples of proteins exhibiting allostery are hemoglobin (regulation by oxygen), (regulation by AMP and fructose-2,6-bisphosphate), and LacI repressor (regulation by allolactose) (Changeux 2012). In addition, allosteric proteins will often divide the sites that indirectly communicate between distinct domains (Motlagh et al. 2014). For example, LacI has separate domains for binding of its cognate ligand and for binding of DNA (Friedman et al. 1995). Beyond systems biology, the implications of allostery in cellular physiology, including disease states, are an emerging area of research that could have a wide impact (Nussinov & Tsai 2013).

Allostery has been observed and studied in a myriad of proteins, but the underlying mechanism is still under debate. The classic view is based on macro-level changes in structural conformation, generally visible in solved crystal structures, brought on by effectors or conditions (Changeux & Edelstein 2005, Laskowski et al. 2009). An alternative modern theory views internal protein dynamics and the ensemble of protein microstates (neither of which is visible in crystal structures) as paramount to understanding allosteric mechanisms (Smock & Gierasch 2009, Tzeng & Kalodimos 2011, Wand 2013). For example, the classic mechanical view has led to theories of amino-acid-connected pathways within proteins that link allosteric sites (Lockless & Ranganathan 1999, Süel et al. 2003). However, the “dynamic allostery” model better explains certain phenomena such as inverted responses found with single mutations or in new environments (Motlagh & Hilser 2012, Reichheld et al. 2009).

As mentioned above, allostery is one of the advanced functionalities that are desirable to engineer into multi-domain proteins. There are some successful examples of functionally connecting two domains to create a synthetic allosteric protein, sometimes referred to as a “protein switch” (Cross et al. 2013, Meister & Joshi 2013). However, with limited understanding of the underlying mechanism, it remains difficult to rationally design allostery into a protein. Even the most cutting-edge methods are still only semi-rational and require abundant data on the domains of interest (Reynolds et al. 2011, Schuyler et al. 2009). Approaches that are more consistently successful are at the other end of the protein engineering spectrum: directed evolution. Adapting the natural processes of domain swapping, shuffling, or insertion with sophisticated in vitro approaches, researchers have made DNA libraries and used laboratory evolution to isolate allosteric proteins (Edwards et al. 2008, Guntas et al. 2005, Kanwar et al. 2013). This methodology obviates the need for a mechanistic understanding of allostery or for extensive data on the domains to be engineered.

1.3 Transposons used for protein engineering

Transposons are stretches of DNA that can efficiently move within or between genomes (Montaño & Rice 2011). In molecular biology, researchers have co-opted the natural insertion ability of transposons for in vitro methods of DNA manipulation (Haapa et al. 1999). Successful protein engineering technologies have been developed with the transposon’s unique capacity for random insertion (Reznikoff 2006). Prominent examples include the creation of split-protein systems (Segall-Shapiro et al. 2014), mutations to single codons (Daggett et al. 2009, Liu & Cropp 2012), and circularly permuted protein variants (Mehta et al. 2012).

Transposons are well suited for the construction of domain-insertion libraries for engineering allostery. Alternative methods have been developed using random nuclease cleavage to make insertion libraries, but they suffer from deletions, duplications, and low efficiencies (Biondi et al. 1998, Guntas & Ostermeier 2004, Tullman et al. 2011). Transposon methods, however, have proved to be accurate and efficient in library creation (Edwards et al. 2008, Shah et al. 2012). These published examples have their own drawbacks, such as relying on blunt-end cloning or adding extra amino acids between domains. However, if these shortfalls can be ameliorated, transposon-based library generation could be very powerful for laboratory evolution of allostery. Additionally, transposition is independent of the target DNA sequence or length, so a transposon-based strategy would scale up easily. This is a great advantage over PCR-based cloning strategies, where the library creation methods would need to be redesigned for every new protein target.
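To make the scaling argument concrete, the following is a minimal sketch of what a complete domain-insertion library looks like when enumerated at the protein level. It is an illustration only, not the method developed in this work: the function name `domain_insertion_library` and the toy sequences are hypothetical, and DNA-level details such as reading frame, insert orientation, and linker residues are deliberately ignored.

```python
from typing import Iterator, Tuple


def domain_insertion_library(target: str, insert: str) -> Iterator[Tuple[int, str]]:
    """Yield (position, fusion) for every internal insertion site in `target`.

    Position i means the insert is placed after the first i residues of the
    target. Hypothetical helper for illustration; works on amino-acid strings.
    """
    for i in range(1, len(target)):
        yield i, target[:i] + insert + target[i:]


# Toy sequences (assumptions, not real protein sequences).
toy_target = "MKIEEGKLV"
toy_insert = "GSG"

library = list(domain_insertion_library(toy_target, toy_insert))
print(len(library))   # one variant per internal site: len(toy_target) - 1
print(library[0])     # the first variant, insert placed after residue 1
```

The same function applies unchanged to a target of any sequence or length, and the number of variants grows only linearly with the length of the target, which mirrors why a sequence-independent insertion step needs no redesign for each new protein.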

1.4 Allosteric Biosensors

Genetically encoded biosensors are invaluable to the study of small molecules within a cellular environment (Okumoto 2010). Detecting or measuring metabolites in vivo can lead to a better understanding of physiology and metabolic disease states (Berg et al. 2009, Chen et al. 2013, DeBerardinis & Thompson 2012). All genetically encoded biosensors are examples of allostery, where a construct has a recognition (binding) element that interacts with an output (signaling) element. The mechanism for recognition is generally the binding of a ligand of interest, while the output can be any measurable signal but is often fluorescence. There are a number of RNA- and protein-based biosensors available with different modes of action. Fluorescent sensors built from RNA are an exciting new achievement, but their output signal will need to improve before they find widespread use (Dean & Palmer 2014, Paige et al. 2012). Other sensors, both RNA- and protein-based, rely on an intermediate step to deliver an observable output. These transcription-factor or riboswitch methods alter the transcription or translation, respectively, of a reporter gene in response to a ligand (Groher & Suess 2014, Schallmey et al. 2014). These sensors have excellent amplified signals, but cannot measure a ligand in real time. They have found more utility in screening for end-point levels of molecules (Binder et al. 2012, Mustafi et al. 2012).

Allosteric biosensors engineered with fluorescent proteins (FPs), sometimes referred to as nanosensors, offer advantages that have not been met by other alternatives (Okumoto et al. 2012). FP-based sensors are composed of a recognition element (or domain) with one or two FPs inserted into its sequence. A single-FP construct has an output of fluorescence intensity from a chromophore that is allosterically linked to the ligand-binding domain (LBD) (Akerboom et al. 2009, Nakai et al. 2001). A dual-FP sensor functions by Förster resonance energy transfer between the two FPs, which is modulated by the LBD (Fehr et al. 2002, Miyawaki et al. 1997). Both types of FP-based sensors have real-time and relatively bright signal responses. Both have also been successful in measuring metabolism in vivo, though single-FP sensors generally have a greater dynamic range than dual-FP sensors (Okumoto et al. 2012). Another advantage of the single-FP design is that it requires only a single-wavelength detection setup, making it easier to use and compatible with more orthogonal fluorescent sensors.

Single-FP biosensors have utility in metabolism research, but also provide a desirable template to advance protein-engineering techniques for creating allostery. They are composed of only two domains and have an easily detectable output. As with other allosteric proteins, no design rules have been discovered for creating functional sensors. Because of this lack of understanding, few biosensors of small molecules have been published (<10). However, creating all possible domain connections between an LBD and an FP could produce some constructs that are allosterically linked. Fluorescence in the presence or absence of the cognate ligand is then an efficient way to screen for allostery. Thus, transposon-mediated library construction coupled with an appropriate screening strategy should be an efficient and robust route to single-FP biosensors.
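The screening idea described above, comparing fluorescence with and without the cognate ligand, can be sketched in a few lines. This is a hedged illustration rather than the analysis used in this work: the fractional-change metric, the function names, the 0.5 cutoff, and the readings are all assumptions chosen for the example.

```python
from typing import Dict, List, Tuple


def fractional_change(f_apo: float, f_ligand: float) -> float:
    """Return (F_ligand - F_apo) / F_apo for one construct."""
    if f_apo <= 0:
        raise ValueError("ligand-free fluorescence must be positive")
    return (f_ligand - f_apo) / f_apo


def rank_candidates(
    readings: Dict[str, Tuple[float, float]],
    min_abs_change: float = 0.5,  # hypothetical cutoff, not from the text
) -> List[Tuple[str, float]]:
    """Keep constructs whose response magnitude meets the cutoff.

    `readings` maps a construct name to (fluorescence without ligand,
    fluorescence with ligand). Results are sorted by response magnitude,
    so both turn-on and turn-off candidates are retained.
    """
    scored = [(name, fractional_change(apo, lig)) for name, (apo, lig) in readings.items()]
    hits = [(name, change) for name, change in scored if abs(change) >= min_abs_change]
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)


# Made-up readings in arbitrary units, for illustration only.
example = {"A": (100.0, 250.0), "B": (100.0, 105.0), "C": (200.0, 50.0)}
print(rank_candidates(example))
```

Ranking by absolute change is a deliberate choice in this sketch: a construct that dims on ligand binding is as much evidence of allosteric coupling as one that brightens.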

1.5 References

Airan RD, Thompson KR, Fenno LE, Bernstein H, Deisseroth K. 2009. Temporally precise in vivo control of intracellular signalling. Nature. 458(7241):1025–29
Akerboom J, Rivera JDV, Guilbe MMR, Malavé ECA, Hernandez HH, et al. 2009. Crystal structures of the GCaMP calcium sensor reveal the mechanism of fluorescence signal change and aid rational design. J Biol Chem. 284(10):6455–64
Arnold FH. 1998. Design by directed evolution. Accounts of Chemical Research
Aroul-Selvam R, Hubbard T, Sasidharan R. 2004. Domain insertions in protein structures. J Mol Biol. 338(4):633–41
Berg J, Hung YP, Yellen G. 2009. A genetically encoded fluorescent reporter of ATP:ADP ratio. Nat Methods. 6(2):161–66
Binder S, Schendzielorz G, Stäbler N, Krumbach K, Hoffmann K, et al. 2012. A high-throughput approach to identify genomic variants of bacterial metabolite producers at the single-cell level. Genome Biol. 13(5):R40
Biondi RM, Baehler PJ, Reymond CD, Véron M. 1998. Random insertion of GFP into the cAMP-dependent protein kinase regulatory subunit from Dictyostelium discoideum. Nucleic Acids Res. 26(21):4946–52
Bommarius AS, Sohn M, Kang Y, Lee JH, Realff MJ. 2014. Protein engineering of cellulases. Curr Opin Biotech. 29:139–45
Carter P, Wells JA. 1988. Dissecting the catalytic triad of a serine protease. Nature. 332(6164):564–68
Changeux J-P. 2012. Allostery and the Monod-Wyman-Changeux model after 50 years. Annu Rev Biophys. 41:103–33
Changeux J-P, Edelstein SJ. 2005. Allosteric mechanisms of signal transduction. Science. 308(5727):1424–28
Chen T-W, Wardill TJ, Sun Y, Pulver SR, Renninger SL, et al. 2013. Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature. 499(7458):295–300
Crameri A, Raillard SA, Bermudez E, Stemmer WP. 1998. DNA shuffling of a family of genes from diverse species accelerates directed evolution. Nature. 391(6664):288–91
Cross PJ, Allison TM, Dobson RCJ, Jameson GB, Parker EJ. 2013. Engineering allosteric control to an unregulated enzyme by transfer of a regulatory domain. Proc. Natl. Acad. Sci. U.S.A. 110(6):2111–16
Daggett KA, Layer M, Cropp TA. 2009. A general method for scanning unnatural amino acid mutagenesis. ACS Chem. Biol. 4(2):109–13
Dean KM, Palmer AE. 2014. Advances in fluorescence labeling strategies for dynamic cellular imaging. Nat Chem Biol. 10(7):512–23
DeBerardinis RJ, Thompson CB. 2012. Cellular metabolism and disease: what do metabolic outliers teach us? Cell. 148(6):1132–44
Dueber JE, Wu GC, Malmirchegini GR, Moon TS, Petzold CJ, et al. 2009. Synthetic protein scaffolds provide modular control over metabolic flux. Nat. Biotechnol. 27(8):753–59
Edwards WR, Busse K, Allemann RK, Jones DD. 2008. Linking the functions of unrelated proteins using a novel directed evolution domain insertion method. Nucleic Acids Res. 36(13):e78
Fastrez J. 2009. Engineering allosteric regulation into biological catalysts. Chembiochem. 10(18):2824–35
Fehr M, Frommer WB, Lalonde S. 2002. Visualization of maltose uptake in living yeast cells by fluorescent nanosensors. Proc. Natl. Acad. Sci. U.S.A. 99(15):9846–51
Fenton AW. 2008. Allostery: an illustrated definition for the 'second secret of life'. Trends Biochem. Sci. 33(9):420–25
Friedman AM, Fischmann TO, Steitz TA. 1995. Crystal structure of lac repressor core tetramer and its implications for DNA looping. Science. 268(5218):1721–27
Giver L, Gershenson A, Freskgard PO, Arnold FH. 1998. Directed evolution of a thermostable esterase. Proc. Natl. Acad. Sci. U.S.A. 95(22):12809–13
Goldsmith M, Tawfik DS. 2012. Directed enzyme evolution: beyond the low-hanging fruit. Curr Opin Struct Biol. 22(4):406–12
Groher F, Suess B. 2014. Synthetic riboswitches - A tool comes of age. Biochim Biophys Acta. 1839(10):964–73
Guntas G, Mansell TJ, Kim JR, Ostermeier M. 2005. Directed evolution of protein switches and their application to the creation of ligand-binding proteins. Proc. Natl. Acad. Sci. U.S.A. 102(32):11224–29
Guntas G, Ostermeier M. 2004. Creation of an allosteric enzyme by domain insertion. J Mol Biol. 336(1):263–73
Haapa S, Taira S, Heikkinen E, Savilahti H. 1999. An efficient and accurate integration of mini-Mu transposons in vitro: a general methodology for functional genetic analysis and molecular biology applications. Nucleic Acids Res. 27(13):2777–84
Kanwar M, Wright RC, Date A, Tullman J, Ostermeier M. 2013. Protein switch engineering by domain insertion. Method Enzymol. 523:369–88
Khalil AS, Collins JJ. 2010. Synthetic biology: applications come of age. Nat. Rev. Genet. 11(5):367–79
Knowles JR. 1987. Tinkering with enzymes: what are we learning? Science. 236(4806):1252–58
Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D. 2003. Design of a novel globular protein fold with atomic-level accuracy. Science. 302(5649):1364–68
Laskowski RA, Gerick F, Thornton JM. 2009. The structural basis of allosteric regulation in proteins. FEBS Lett. 583(11):1692–98
Lee J, Natarajan M, Nashine VC, Socolich M, Vo T, et al. 2008. Surface sites for engineering allosteric control in proteins. Science. 322(5900):438–42
Leung DW, Chen E, Goeddel DV. 1989. A method for random mutagenesis of a defined DNA segment using a modified polymerase chain reaction. Technique. 1:11–15
Liu J, Cropp TA. 2012. A method for multi-codon scanning mutagenesis of proteins based on asymmetric transposons. Protein Eng Des Sel. 25(2):67–72
Lockless SW, Ranganathan R. 1999. Evolutionarily conserved pathways of energetic connectivity in protein families. Science. 286(5438):295–99
Mehta MM, Liu S, Silberg JJ. 2012. A transposase strategy for creating libraries of circularly permuted proteins. Nucleic Acids Res. 40(9):e71
Meister GE, Joshi NS. 2013. An engineered calmodulin-based allosteric switch for peptide biosensing. Chembiochem. 14(12):1460–67
Miyawaki A, Llopis J, Heim R, McCaffery JM, Adams JA, et al. 1997. Fluorescent indicators for Ca2+ based on green fluorescent proteins and calmodulin. Nature. 388(6645):882–87
Monod J, Wyman J, Changeux J-P. 1965. On the Nature of Allosteric Transitions: A Plausible Model. J Mol Biol. 12:88–118
Montaño SP, Rice PA. 2011. Moving DNA around: DNA transposition and retroviral integration. Curr Opin Struct Biol. 21(3):370–78
Motlagh HN, Hilser VJ. 2012. Agonism/antagonism switching in allosteric ensembles. Proc. Natl. Acad. Sci. U.S.A. 109(11):4134–39
Motlagh HN, Wrabl JO, Li J, Hilser VJ. 2014. The ensemble nature of allostery. Nature. 508(7496):331–39
Mustafi N, Grünberger A, Kohlheyer D, Bott M, Frunzke J. 2012. The development and application of a single-cell biosensor for the detection of l-methionine and branched-chain amino acids. Metabolic Engineering. 14(4):449–57
Nakai J, Ohkura M, Imoto K. 2001. A high signal-to-noise Ca(2+) probe composed of a single green fluorescent protein. Nat. Biotechnol. 19(2):137–41
Nussinov R, Tsai C-J. 2013. Allostery in disease and in drug discovery. Cell. 153(2):293–305
Okumoto S. 2010. Imaging approach for monitoring cellular metabolites and ions using genetically encoded biosensors. Curr Opin Biotech. 21(1):45–54
Okumoto S, Jones A, Frommer WB. 2012. Quantitative imaging with fluorescent biosensors. Annu Rev Plant Biol. 63:663–706
Paige JS, Nguyen-Duc T, Song W, Jaffrey SR. 2012. Fluorescence imaging of cellular metabolites with RNA. Science. 335(6073):1194
Ponting CP, Russell RR. 2002. The natural history of protein domains. Annu Rev Biophys Biomol Struct. 31:45–71
Reichheld SE, Yu Z, Davidson AR. 2009. The induction of folding by ligand binding drives the allosteric response of tetracycline repressor. Proc. Natl. Acad. Sci. U.S.A. 106(52):22263–68
Reynolds KA, McLaughlin RN, Ranganathan R. 2011. Hot spots for allosteric regulation on protein surfaces. Cell. 147(7):1564–75
Reznikoff WS. 2006. Tn5 transposition: a molecular tool for studying protein structure-function. Biochem Soc Trans. 34(Pt 2):320–23
Röthlisberger D, Khersonsky O, Wollacott AM, Jiang L, DeChancie J, et al. 2008. Kemp elimination catalysts by computational enzyme design. Nature. 453(7192):190–95
Schallmey M, Frunzke J, Eggeling L, Marienhagen J. 2014. Looking for the pick of the bunch: high-throughput screening of producing microorganisms with biosensors. Curr Opin Biotech. 26:148–54
Schuyler AD, Carlson HA, Feldman EL. 2009. Computational methods for predicting sites of functionally important dynamics. J Phys Chem B. 113(19):6613–22
Segall-Shapiro TH, Meyer AJ, Ellington AD, Sontag ED, Voigt CA. 2014. A “resource allocator” for transcription based on a highly fragmented T7 RNA polymerase. Mol. Syst. Biol. 10(7):742
Shah V, Pierre B, Kim JR. 2012. Facile construction of a random protein domain insertion library using an engineered transposon. Anal. Biochem.
Smith M. 1985. In vitro mutagenesis. Annu. Rev. Genet. 19:423–62
Smock RG, Gierasch LM. 2009. Sending signals dynamically. Science. 324(5924):198–203
Stemmer WP. 1994. Rapid evolution of a protein in vitro by DNA shuffling. Nature. 370(6488):389–91
Süel GM, Lockless SW, Wall MA, Ranganathan R. 2003. Evolutionarily conserved networks of residues mediate allosteric communication in proteins. Nat Struct Biol. 10(1):59–69
Tullman J, Guntas G, Dumont M, Ostermeier M. 2011. Protein switches identified from diverse insertion libraries created using S1 nuclease digestion of supercoiled-form plasmid DNA. Biotechnol Bioeng. 108(11):2535–43
Tzeng S-R, Kalodimos CG. 2011. Protein dynamics and allostery: an NMR view. Curr Opin Struct Biol. 21(1):62–67
Ulmer KM. 1983. Protein engineering. Science. 219(4585):666–71
Vogel C, Bashton M, Kerrison ND, Chothia C, Teichmann SA. 2004. Structure, function and evolution of multidomain proteins. Curr Opin Struct Biol. 14(2):208–16
Wand AJ. 2013. The dark energy of proteins comes to light: conformational entropy and its role in protein function revealed by NMR relaxation. Curr Opin Struct Biol. 23(1):75–81
Wells JA, Cunningham BC, Graycar TP, Estell DA. 1986. Importance of hydrogen-bond formation in stabilizing the transition state of subtilisin. Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences. 317(1540):415–23
Wilkinson AJ, Fersht AR, Blow DM, Carter P, Winter G. 1984. A large increase in enzyme-substrate affinity by protein engineering. Nature. 307(5947):187–88

Chapter 2 Transposon-Mediated Domain Insertion

2.1 Introduction

A DNA transposon, or transposable element (TE), is a length of DNA that can move within or between genomes (Haren et al. 1999). The enzymes that move these pieces of DNA are known as transposases. Transposons are found in all three domains of life, and transposase genes are the most abundant and ubiquitous gene class in nature (Aziz et al. 2010). The mechanism of transposon insertion is well studied (Montaño & Rice 2011), and transposons are continually adapted into innovative genetic and molecular biology tools (Liu & Cropp 2012, Mealer et al. 2008, Rad et al. 2010, van Opijnen et al. 2009).

We set out to use the defining property of transposons, DNA insertion, as a tool for protein engineering. The functional units of proteins (sometimes also viewed as evolutionary units) are called domains, and in nature they are often recombined to create new protein functions rather than a whole new protein being built from scratch (Aroul-Selvam et al. 2004, Chothia et al. 2003, Vogel et al. 2004). We see transposition as a way to accelerate domain recombination in the laboratory. Previous work has relied on laborious molecular biology techniques to create domain fusions, with each construct requiring unique cloning steps and reagents (Reynolds et al. 2011). Transposition reactions provide a rapid method to produce all possible domain insertions in one pooled step. A library of such variants is extremely useful for engineering new functions or studying domain interactions, and interesting candidates can be isolated with an appropriate screen.

2.1.1 In Vitro Transposition

While some transposons require multiple protein components in vivo, researchers have distilled the transposition reaction down to a minimal number of components that can be carried out in vitro (Haapa et al. 1999). Using the transposase enzyme MuA, from the bacteriophage Mu, along with a strand of DNA flanked with the Mu R1R2 inverted-repeat (IR) sequences (MuA binding sites) at its ends, random transposition into any target DNA can be carried out (Figure 2-1). The simplicity of this system, along with the fact that insertions are mostly random (Green et al. 2012), makes it an ideal platform for developing random DNA-insertion techniques. Lastly, when used in the minimal in vitro system, Mu-based transposons tolerate some mutations in the R1R2 inverted repeat region (Goldhaber-Gordon et al. 2002). This flexibility makes it possible to modify the transposon sequence for molecular biology purposes.

[Figure 2-1 graphic, panels A to C. Labels: transposon DNA, inverted repeats, transposase enzyme, target DNA, in vitro strand transfer complex, construct for transformation]

Figure 2-1: Schematic outline of an in vitro, minimal transposition system. (A) The three necessary components: transposon DNA with R1R2 inverted repeats at the ends, transposase MuA enzyme, and any target DNA. (B) MuA binds the inverted repeats of the transposon, forms a multimer transpososome assembly, and then binds a random spot in the target DNA sequence. The transpososome bound to the target DNA is called a strand transfer complex. (C) The transpososome catalyzes the transfer of the transposon strand into the target DNA strand, creating an integrated insertion. A transposon with a selectable marker (shown in yellow) can be selected for in subsequent transformation steps.

2.1.2 Modified Transposons

Inserting transposons in random locations throughout genomes or plasmids has proven very useful in many areas of research, such as making knockout libraries for genetic screening. However, as a tool for protein engineering, the amino acid sequence that the transposon DNA encodes is as important as its insertional ability. The necessary IR sequences at both ends have greater than 40 basepairs, which would encode over 14 amino acids (Figure 2-1). While researchers have included the IR sequences in final protein coding sequences (Gregory et al. 2010, Mealer et al. 2008), these extra amino acids prevent transposons from being used to bring domains together in a functional way. The IR region has always encoded a linker between non-interacting domains in previous protein engineering efforts, such as a fluorescent protein tagged onto a protein of interest (Gregory et al. 2010, Jin et al. 2011, Sheridan & Hughes 2004).

One way to harness the utility of the random insertions of transposons, without bringing along the extra amino acids it encodes, is to excise the extra sequence (or the entire transposon) once it has created an insertion. There are a number of molecular biology tools that could accomplish this in theory (e.g. Cre-Lox recombination), but many would be impossible to encode within the constraints of the IR sequences. As mentioned, there is some mutational flexibility but it is only enough leeway to design short stretches of desired sequence. Restriction enzymes offer the ability to cut and paste DNA while only requiring six to eight bases of recognition sequence. Thus, altering a transposon to encode restriction enzyme sites is a feasible means to remove the undesired sequence and paste in any sequence of interest (Figure 2-2). We set out to achieve this goal to enable production of libraries with random domain insertions.

[Figure 2-2 graphic. Labels: transposition, cloning, restriction sites, DOI (domain of interest)]

Figure 2-2: Schematic of a modified transposon with restriction enzyme sites. It is not desirable to encode a protein or domain directly within a transposon (dashed arrow), because the needed IR sequences will encode unwanted amino acids. However, if restriction sites are encoded at the ends of a transposon, a two-step process can put the domain of interest into the random insertion point without unwanted scarring amino acids (solid arrows).

2.2 Materials and Methods

2.2.1 Strains and Media

The bacterial strain for cloning was 10G E. coli, electrocompetent, from Lucigen Corp. Chemically competent DH5α E. coli were also used, made by standard techniques. The expression strain was BL21 (DE3) E. coli, electrocompetent, also from Lucigen. Recovery Medium for electroporation transformations was provided in the electrocompetent kits. All other growth, unless otherwise noted, was done in Lysogeny Broth (LB Broth MILLER, EMD Millipore) liquid medium or on Lysogeny Broth 1.5% w/v agar media plates (Bacto Agar, Becton, Dickinson and Company). Media and all enzymatic reactions were prepared with purified water from a Barnstead Purification System (Thermo Fisher Scientific Inc.).

2.2.2 DNA Manipulation

All standard restriction enzymes used were FastDigest Enzymes from Thermo Fisher Scientific Inc. All Type IIs restriction enzymes used in Golden Gate reactions (BsaI and BsmBI) were obtained from NEB (New England Biolabs). Plasmids were isolated from bacterial cultures with a QIAprep Spin Miniprep Kit or a HiSpeed Plasmid Midi Kit (Qiagen). DNA was isolated from agarose gels using a Zymoclean Gel DNA Recovery Kit (Zymo Research Corp.). DNA concentrations were quantified with a NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific Inc.). Electroporation of competent E. coli was carried out using a BTX ECM 630 Exponential Decay Wave Electroporation System (Harvard Apparatus, Inc.). Settings for all electroporations were: 50 μF, 150 Ω, and 1.5 kV. Gene Pulser cuvettes (Bio-Rad Laboratories, Inc., cat# 165-2089), 0.1 cm gap, were used with the instrument.

2.2.3 Creating Modified Transposons by PCR

The artificial transposon Entranceposon (CamR-3) (Thermo Fisher Scientific Inc., Cat# F-778) was purchased to use as template in PCR reactions to test modified transposon ends. Because each end of the synthetic transposon is the reverse complement of the other, a single primer was used to amplify a modified version, making the same mutations to both ends. Primers used are listed in Table 2-1 (primers 1 through 8) and all include a BglII restriction site. PCR reaction: 10 μL 5X HF Phusion buffer, 1 μL 10 mM dNTPs, 1.5 μL DMSO, 2.5 μL 20 μM primer, 1 μL 1 ng/μL Entranceposon (CamR-3), 0.5 μL Phusion DNA Polymerase, and water to 50 μL. Thermocycler conditions: 98°C 5 min; cycle 35X, 98°C 10 sec, 68°C 20 sec, 72°C 30 sec; 72°C 5 min. After PCR cleanup, BglII and DpnI were used to pre-cut the transposon and to digest the methylated template DNA, respectively. Digested DNA was purified with a PCR cleanup kit.
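The recipe above fixes every volume except the water. As a minimal sketch, the arithmetic can be checked in a few lines of Python; the helper names (`water_volume`, `final_concentration`) are illustrative and not part of any protocol software, and DMSO is treated as a 100% v/v stock, which is an assumption.

```python
# Component volumes (uL) for one 50 uL PCR, copied from the recipe in the text.
components_ul = {
    "5X HF Phusion buffer": 10.0,
    "10 mM dNTPs": 1.0,
    "DMSO": 1.5,
    "20 uM primer": 2.5,
    "1 ng/uL Entranceposon (CamR-3) template": 1.0,
    "Phusion DNA Polymerase": 0.5,
}
TOTAL_UL = 50.0


def water_volume(components, total_ul):
    """Water needed to bring the listed components up to the total volume."""
    used = sum(components.values())
    if used > total_ul:
        raise ValueError("components exceed the total reaction volume")
    return total_ul - used


def final_concentration(stock, volume_ul, total_ul):
    """Dilution formula C1*V1 = C2*V2, solved for C2 (same units as stock)."""
    return stock * volume_ul / total_ul


water_ul = water_volume(components_ul, TOTAL_UL)      # 33.5 uL
primer_um = final_concentration(20.0, 2.5, TOTAL_UL)  # 1.0 uM primer
dmso_pct = final_concentration(100.0, 1.5, TOTAL_UL)  # 3.0 % v/v DMSO (assumes neat stock)
print(water_ul, primer_um, dmso_pct)
```

Under that assumption the DMSO works out to 3% v/v, the same final concentration quoted for the cloning PCRs in section 2.2.4.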

2.2.4 Creating Modified Asymmetric Transposons by Cloning

All PCR reactions for cloning transposon sequences used Phusion High-Fidelity Polymerase (Thermo Fisher Scientific Inc.) with the manufacturer’s recommended conditions, plus the addition of DMSO (3% v/v final concentration). Annealing temperatures were calculated with the Thermo Scientific “Tm calculator” that is specific for Phusion reactions. PCR products were either gel purified or, if an analytic gel showed a single band, treated with DpnI to digest template DNA and purified with a PCR cleanup kit. Golden Gate cloning (Engler et al. 2008) was used to assemble inserts and plasmids to create transposon-propagation vectors. Briefly, 20 fmoles of each DNA component were mixed with 5 units Type IIs enzyme, 400 units T4 DNA Ligase (NEB), and 1X T4 DNA Ligase Reaction Buffer (NEB) in a total volume of 10 μL. The reaction was incubated 2 minutes at 45°C, 5 minutes at 16°C (first two steps cycled 25 times), 20 minutes at 60°C, and 20 minutes at 80°C.
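Because the Golden Gate reactions are specified in femtomoles while DNA is quantified in ng/μL, each component requires a mass conversion. The following is a sketch of that conversion, not the procedure actually used: it assumes an average mass of about 650 g/mol per base pair of double-stranded DNA (a common approximation), and the fragment length and stock concentration in the example are illustrative.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA; an approximation, assumed here


def fmol_to_ng(fmol, length_bp, bp_mass=AVG_BP_MASS):
    """Mass (ng) of `fmol` femtomoles of a double-stranded fragment.

    fmol * (g/mol) gives 1e-15 g, which is 1e-6 ng, hence the scale factor.
    """
    return fmol * length_bp * bp_mass * 1e-6


def volume_for_fmol(fmol, length_bp, conc_ng_per_ul):
    """Volume (uL) of a stock at `conc_ng_per_ul` that contains `fmol` femtomoles."""
    return fmol_to_ng(fmol, length_bp) / conc_ng_per_ul


# Example: 20 fmol of a 1254 bp fragment (the transposon length given in
# section 2.2.5) from a hypothetical 50 ng/uL stock.
mass_ng = fmol_to_ng(20, 1254)
vol_ul = volume_for_fmol(20, 1254, 50.0)
print(round(mass_ng, 2), round(vol_ul, 3))
```

For short fragments the required volume can fall below what is practical to pipette, in which case diluting the stock first is the usual workaround.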

Either 5 or 10 μL of the Golden Gate reaction was used to transform 50 or 100 μL of chemically competent cloning cells, respectively. Correct assembly of asymmetric transposons within a transposon-propagation vector was selected for on LB plates with 25 μg/mL chloramphenicol. In some cases, plates also contained 500 μM IPTG (Amresco, Inc.) and 25 μg/mL X-Gal (Gold Biotechnology, Inc.) for blue/white screening to detect loss of the lacZ-α dropout fragment from the destination vectors. Growth of colonies in liquid LB medium with 25 μg/mL chloramphenicol and 50 μg/mL kanamycin ensured that the selectable markers from both the plasmid backbone and transposon insert were functional.

The artificial transposon Entranceposon (M1-CamR) was purchased from Thermo Fisher Scientific Inc. (cat# F-760). F-760 is very similar to F-778 used above, but slightly shorter because it lacks synthetic priming sites, and it includes mutations that produce NotI restriction sites at the very ends. Both have a selectable CamR gene. F-760 was used because F-778 has internal BamHI (2) and NotI (1) sites, which would prevent the use of these sites in modified ends of the transposon. Using this F-760 transposon as template, Golden Gate cloning was used to remove the two internal BsmBI restriction sites along with the NotI sites at the transposon ends (returning the ends to the native Mu R1R2 sequences). This was accomplished by amplifying the transposon as three separate pieces via PCR and cloning them into a Golden-Gate-compatible pET47GG-BsaI-Dest plasmid using BsaI (primers 9 through 14, Table 2-1). This transposon-propagation plasmid, pET47GG-Entranceposon-M1-CamR, encodes a transposon with wild-type R1R2 ends and was used as a PCR template to create asymmetric transposons with modified ends.

Because the two ends of a transposon are reverse complements of each other, a primer will anneal to either end indiscriminately. However, amplifying the transposon as two halves and reassembling them in a destination vector allowed the creation of asymmetric sequences in a transposon-propagation plasmid. To create the modified transposon BsmBI-M1-CamR, primer sets 15/16 and 17/18 (Table 2-1) were used to amplify two separate halves of wt-entranceposon-M1-CamR from the plasmid pET47GG-Entranceposon-M1-CamR. The two inserts split the CmR gene, which allowed for selection of properly assembled halves when cloning. These inserts were cloned into pET47GG-BsaI-Dest using a Golden Gate reaction, creating the asymmetric transposon-propagation vector pET47GG-BsmBI-M1-CamR. Likewise, the transposon BsaI-M1-CamR was cloned with primer sets 19/20 and 21/22 (Table 2-1) using a Golden Gate reaction with pET47GG-BsmBI-Dest and BsmBI, creating the transposon-propagation vector pET47GG-BsaI-M1-CamR.
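The reasoning behind the two-half strategy can be illustrated with a toy model. In this sketch the 20-base end sequences are read from the annealing regions of primers 1, 8, and 5 in Table 2-1, while the internal payload is a hypothetical placeholder and the pairing of a BsaI end with a BsmBI end is purely illustrative (it is not one of the constructs described). The function names are likewise assumptions made for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


WT_END = "TGAAGCGGCGCACGAAAAAC"     # annealing region of primer 1 (WT)
BSAI_END = "TGAAGCGGAGACCGAAAAAC"   # same region of primer 8 (BsaI)
BSMBI_END = "TGAAGGAGACGACGAAAAAC"  # same region of primer 5 (BsmBI Mut1)
PAYLOAD = "ATGGAGAAAAAAATCACTGGATATACCACC"  # hypothetical internal sequence

# Inverted repeats: the right end is the reverse complement of the left end.
wild_type = WT_END + PAYLOAD + revcomp(WT_END)


def ends(transposon, n=20):
    """5' end of the top strand and 5' end of the bottom strand."""
    return transposon[:n], revcomp(transposon)[:n]


def single_primer_pcr(transposon, new_end):
    """One primer anneals at both ends, so both ends receive the same change."""
    core = transposon[len(new_end):-len(new_end)]
    return new_end + core + revcomp(new_end)


def two_half_assembly(transposon, left_end, right_end):
    """Each half is amplified with its own end primer, so the ends can differ."""
    core = transposon[len(left_end):-len(right_end)]
    return left_end + core + revcomp(right_end)


symmetric = single_primer_pcr(wild_type, BSAI_END)
asymmetric = two_half_assembly(wild_type, BSAI_END, BSMBI_END)
print(ends(wild_type))
print(ends(symmetric))
print(ends(asymmetric))
```

Both strands of the wild-type model begin with the same 20 bases, which is why a single end primer cannot distinguish the two ends; only the two-half assembly yields a product whose ends differ.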

For variations of the BsaI-M1-CmR transposon ends, the internal primers 20 and 21 were used again with new transposon-end primers, 23 through 26 (Table 2-1). Specifically: BsaI-M1-CmR-1 23/20 and 21/24; BsaI-M1-CmR-2 23/20 and 21/26; BsaI-M1-CmR-3 25/20 and 21/24; BsaI-M1-CmR-4 25/20 and 21/26. Primers 23 to 26 are labeled A to D, so the variations can also be thought of as combinations of these primer ends: var. 1: A/B, var. 2: A/D, var. 3: C/B, and var. 4: C/D. These variations were all cloned into pUCGG-KanR-BsmBI-Dest, which allowed for higher DNA yields of the transposon-propagation vector from minipreps than pET47-based plasmids. The final transposon-propagation vectors were named pUCGG-KanR-BsaI-M1-CmR-N, where N is the variation number described above.

2.2.5 Excising Transposon DNA from Plasmids

The modified transposons were encoded in plasmids so that they could be excised with restriction enzymes BglII and HindIII. Approximately 2 μL of each restriction enzyme was used to digest 5 μg of the propagation plasmid over 3 hours at 37°C. Transposon DNA was separated from the plasmid backbone by agarose-gel electrophoresis; the transposon is 1254 basepairs while the backbone is 2086 basepairs (for pUCGG-KanR) or 4990 basepairs (for pET47GG). The desired smaller band was excised and extracted. The isolated transposon was eluted with Buffer EB (Qiagen, 10 mM Tris-Cl, pH 8.5) because the Zymo elution buffer contains EDTA, which can inhibit downstream reactions.

2.2.6 Transposition Reactions

PCR-Generated Modified-Transposon Reactions

MuA Transposase enzyme was purchased from Thermo Fisher Scientific Inc. (Cat. #F-750). Reactions with PCR-amplified modified transposons were based on the manufacturer’s protocol (manual from F-702, Template Generation System II Kit). In brief, the reaction was a total volume of 10 μL: 2 μL 5X MuA reaction buffer, 10 ng modified transposon (pre-cut), target plasmid DNA (~2.5 molar ratio to transposon DNA; 50 ng for pUC19, 2686 basepairs), and 0.5 μL 0.22 μg/μL MuA transposase enzyme. The reaction was incubated for 18 hours at 30°C followed by 10 minutes at 75°C to heat inactivate MuA. The entire 10 μL reaction was added to 100 μL of chemically competent DH5α cells and transformed with standard techniques. Recovered cells were plated on LB-agar plates with appropriate antibiotics (25 μg/mL chloramphenicol to select for transposon insert). Reaction efficiency was calculated by comparing the number of CFUs from a reaction to the CFUs from the reaction using the wild-type transposon (given as percentage of wild-type reaction).
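The two calculations in this paragraph, the target-to-transposon molar ratio and the efficiency relative to wild type, can be sketched as follows. This is an illustration under stated assumptions: the transposon length is taken as 1254 bp, the size quoted in section 2.2.5 for the plasmid-excised transposon (the PCR template is described as slightly longer, so the ratio is approximate), and the colony counts are hypothetical.

```python
def relative_moles(mass_ng, length_bp):
    """Quantity proportional to moles of dsDNA (mass divided by length)."""
    return mass_ng / length_bp


def molar_ratio(target_ng, target_bp, transposon_ng, transposon_bp):
    """Moles of target plasmid per mole of transposon."""
    return relative_moles(target_ng, target_bp) / relative_moles(
        transposon_ng, transposon_bp
    )


def percent_of_wild_type(cfu_sample, cfu_wild_type):
    """Transposition efficiency expressed as a percentage of the WT reaction."""
    return 100.0 * cfu_sample / cfu_wild_type


# 50 ng pUC19 (2686 bp) against 10 ng transposon (assumed 1254 bp).
ratio = molar_ratio(50, 2686, 10, 1254)

# Hypothetical colony counts, for illustration only.
efficiency = percent_of_wild_type(cfu_sample=12, cfu_wild_type=240)
print(round(ratio, 2), efficiency)
```

With these inputs the ratio comes out near 2.3, in line with the approximate 2.5-fold excess of target quoted in the text.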

Modified-Asymmetric-Transposon Reactions

Transposition reactions were a total volume of 20 μL with the following components: 4 μL 5X MuA reaction buffer, 100 ng modified transposon (purified from transposon-propagation plasmid), target plasmid DNA (~0.5 molar ratio to transposon DNA), and 1 μL 0.22 μg/μL MuA transposase enzyme. Reactions were carried out for 18 hours at 30°C followed by 10 minutes at 75°C to heat inactivate MuA. Completed reactions were cleaned up with a DNA Clean & Concentrator-5 Kit (Zymo Research Corp.) using 40 μL of DNA Binding Buffer for sample preparation and 6 μL of water for elution.

A 2 μL aliquot of the cleaned-up transposition reaction was transformed into 25 μL of 10G E. coli using electroporation. The transformation was recovered with 975 μL of Recovery Medium at 37°C, 250 rpm shaking for 1 hour. An aliquot of the recovery culture was spread on an LB-agar plate with appropriate antibiotics to assess reaction efficiency (25 μg/mL chloramphenicol to select for transposon insertion and 50 μg/mL carbenicillin to select for the target-plasmid backbone). Reaction efficiency was calculated as described above for PCR-generated transposons. In addition, by comparing the volume that was plated to the total recovery volume, a total theoretical CFU value was calculated to estimate the total number of cells containing a plasmid with a successful insertion from the whole reaction/transformation process.
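The total theoretical CFU estimate is a simple scaling of the plate count by the fraction of the recovery culture that was plated. A minimal sketch follows; the colony count and plated volume are hypothetical, and the recovery volume is rounded to 1000 μL (25 μL of cells plus 975 μL of Recovery Medium, ignoring the 2 μL of DNA).

```python
def total_theoretical_cfu(colonies, plated_ul, recovery_ul):
    """Scale a plate count to the whole recovery culture.

    Assumes the plated aliquot is representative of the full culture.
    """
    if not 0 < plated_ul <= recovery_ul:
        raise ValueError("plated volume must be positive and no larger than the culture")
    return colonies * recovery_ul / plated_ul


# Hypothetical example: 150 colonies from 10 uL plated out of ~1000 uL.
library_size = total_theoretical_cfu(colonies=150, plated_ul=10, recovery_ul=1000)
print(library_size)
```

Plating a small, known fraction keeps colonies countable while leaving most of the culture available for the overnight outgrowth described next.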

The recovery culture was then centrifuged at 4000 g for 5 minutes to pellet the cells and remove supernatant. Cells were resuspended in 50 mL LB with 25 μg/mL chloramphenicol and 50 μg/mL carbenicillin to select for transposon insertions and the target-plasmid backbone, respectively. It was difficult to obtain pure excised transposon DNA by gel purification, because some full-length transposon-propagation plasmid consistently carried through, so selecting for the target plasmid’s marker is important (the transposon-propagation plasmid and the target plasmid must have different backbone selectable markers). The culture was grown overnight at 37°C, 250 rpm. A midi-prep was carried out on the culture the following day to harvest the plasmid library containing transposon inserts.

2.2.7 Cloning into Modified Transposons

Pools of transposon-inserted plasmids were purified from no-insert plasmid by agarose-gel electrophoresis. Approximately 5 μg of supercoiled plasmid DNA was added to extra wide wells in an agarose gel and run at low voltage (~3 V/cm) until good resolution was achieved. The major transposon-insertion-library band was cut out and the DNA extracted, leaving the smaller no-insert plasmid band.

The circularly permuted super-folder green fluorescent protein (cpsfGFP) gene in initial tests was amplified from the plasmid pCF-cpsfGFP. For cloning into BsmBI-M1-CmR, primers 27 and 28 (Table 2-1) were used, which added the compatible BsmBI sites to either end. Primers 29 and 30 were used to amplify cpsfGFP for cloning into the BsaI-M1-CmR transposon. To test the four variations of BsaI-M1-CmR, cpsfGFP was amplified from pET47GG-cpsfGFP-GTGmut, in which the second codon was mutated GTG->GTT. Four pairwise combinations of primers 31 to 34 (Table 2-1) were used to amplify inserts for the four BsaI-M1-CmR variations, analogous to the cloning of the modified transposon variations: var. 1: 31/32; var. 2: 31/34; var. 3: 33/32; var. 4: 33/34. The PCR reactions followed the Phusion High-Fidelity Polymerase manufacturer’s recommended protocol. Annealing temperatures were calculated with the Thermo Scientific “Tm calculator.” PCR products were treated with DpnI to digest template DNA and purified with a PCR cleanup kit.

A Golden Gate reaction was used to exchange the modified transposon, harbored within each plasmid of the library, with the PCR amplified cpsfGFP. Briefly, 40 fmoles of each DNA component were mixed with 10 units Type IIs enzyme, 800 units T4 DNA Ligase (NEB), and 1X T4 DNA Ligase Reaction Buffer (NEB) in a total volume of 20 μL. The reaction was incubated 2 minutes at 45°C, 5 minutes at 16°C (first two steps were cycled 50 times), 20 minutes at 60°C, and 20 minutes at 80°C. The completed reaction was purified and concentrated with a PCR cleanup kit, eluted with 6 μL water.

A 2 μL aliquot of the cleaned-up reaction was transformed into 25 μL of BL21 (DE3) E. coli using electroporation. The transformation was recovered with 975 μL of Recovery Medium at 37°C, 250 rpm shaking for 1 hour. Aliquots of the recovery culture were spread on different LB-agar plates with appropriate antibiotics to assess efficiency and visualize fluorescence of the GFP insert. A plate with 50 μg/mL carbenicillin and 1% w/v glucose was used to select for the target-plasmid backbone and repress protein expression, respectively, to determine the transformation total. With the observed CFU value, comparing the volume that was plated to the total recovery volume allowed the total theoretical CFU value to be calculated. A second plate with 25 μg/mL chloramphenicol was used to select for CmR-transposon inserts to determine the amount of contaminating un-cut plasmid (that still contains the modified transposon). A large amount of un-cut plasmid after the Golden Gate reaction may indicate degraded or expired restriction enzyme. A final plate with 50 μg/mL carbenicillin and no glucose was used to observe fluorescence of colonies; images of GFP fluorescence were taken with a Fuji LAS 4000.

The recovery culture was then centrifuged at 4000 g for 5 minutes to pellet the cells and remove supernatant. Cells were resuspended in 6 mL LB with 50 μg/mL carbenicillin and 1% w/v glucose. Glucose was added to repress any protein expression in BL21 (DE3) during growth. The culture was grown overnight at 37°C, 250 rpm. A mini-prep was carried out on the culture the following day to harvest the plasmid library, containing cpsfGFP inserts, for storage.

2.2.8 Target/Expression Plasmids

Plasmids that were targeted with transposons were created by cloning a gene of interest into the plasmid pET14GG-CHis-Dest. This is a Golden-Gate-compatible vector, with BsaI restriction sites, derived from pET-14b from Novagen (EMD Millipore). The GG cut sites are GTCC and AGCG (5’->3’ in direction of ORF; these are the standard sites used in all of our GG cloning vectors). The E. coli fadR gene was amplified using primers 35 and 36 (Table 2-1) with a standard Phusion Polymerase protocol. MBP was amplified with primers 37 and 38. MBP was the malE sequence from E. coli without the signal sequence and with a point mutation (MBP Gly19, i.e. malE Gly45, GGT->GGA) to eliminate a BsaI recognition site. The target genes were cloned into pET14GG-CHis-Dest using a Golden Gate reaction. Briefly, 20 fmoles of plasmid and 20 fmoles of gene insert were mixed with 5 units BsaI (NEB), 400 units T4 DNA Ligase (NEB), and 1X T4 DNA Ligase Reaction Buffer (NEB) in a total volume of 10 μL. The reaction was incubated 2 minutes at 45°C, 5 minutes at 16°C (first two steps cycled 25 times), 20 minutes at 60°C, and 20 minutes at 80°C. Either 5 or 10 μL of the Golden Gate reaction was used to transform 50 or 100 μL of chemically competent cloning cells, respectively. Proper cloning was detected by plating on LB agar with 500 μM IPTG and 25 μg/mL X-Gal for blue/white screening. Plates also contained 100 μg/mL carbenicillin to select for the pET14GG backbone.

Table 2-1: List of DNA primers. Lower case letters indicate bases that anneal to the template.

#   Name                          Sequence (5’->3’)
1   WT                            CACACCAAGATCtgaagcggcgcacgaaaaac
2   A-C                           CACACCAAGATCtgaCgcggcgcacgaaaaacgcg
3   AarI                          CACACCAAGATCtgCagGTgcgcacgaaaaacgcgaaag
4   MlyI (Jones)                  CACACCAAGATCtgaCTcggcgcacgaaaaacgcg
5   BsmBI Mut1                    CACACCAAGATCtgaagGAgACGacgaaaaacgcgaaagcgtttcac
6   BsmBI Mut2                    CACACCAAGATCtgaagcggGAGacgaaaaacgcgaaag
7   AarI-2                        CACACCAAGATCtgaagcggcgcaGgTGaaacgcgaaa
8   BsaI                          CACACCAAGATCtgaagcggAgACcgaaaaacgcgaaagcgtttcac
9   bglII-MuGG-F                  CACACCAGGTCTCAGTCCCCAGAGGATTAgatctgAAgcGgcgcacgaaaaacg
10  M1-mut-insert1-R              CACACCAGGTCTCAAtctcattttcgccaaaag
11  M1-mut-insert2-F              CACACCAGGTCTCAagaTgttgatcggcacgtaagag
12  M1-mut-insert2-R              CACACCAGGTCTCAACacgaaaaacatattctcaataaacc
13  M1-mut-insert3-F              CACACCAGGTCTCAgtGtcagccaatccctgg
14  hindIII-MuGG-R                CACACCAGGTCTCACGCTATACATAGCAAAGCTTGAAgcGgcgcacgaaaaacg
15  BsmBI-Mu-GG-F                 CACACCAGGTCTCAGTCCagatctgTGgGAgACGacgaaaaacgcgaaagcgtttcac
16  CmR_MuTrans_internal_R        CACACCAGGTCTCAcgtaacacgccacatcttgcg
17  CmR_MuTrans_internal_F        CACACACGGTCTCAtacggtgaaaacctggcctatttcc
18  BsmBI-Mu-GG-R                 CACACACGGTCTCACGCTaagcttgaagGAgACGacgaaaaacgcgaaagcgtttcac
19  BsaI-Mu-GG-F                  CACACCACGTCTCAGTCCagatctgTGTcggAgACcgaaaaacgcgaaagcgtttcac
20  CmR_MuTrans_internal_R_bsmbi  CACACCACGTCTCAcgtaacacgccacatcttgcg
21  CmR_MuTrans_internal_F_bsmbi  CACACACCGTCTCAtacggtgaaaacctggcctatttcc
22  BsaI-Mu-GG-R                  CACACACCGTCTCACGCTaagcttgaaCcggAgACcgaaaaacgcgaaagcgtttcac
23  BsaI-Mu-GG-F_A                CACACCACGTCTCAGTCCagatctgCaTcggAgACcgaaaaacgcgaaagcgtttcac
24  BsaI-Mu-GG-R_B                CACACACCGTCTCACGCTaagcttgaCgcggAgACcgaaaaacgcgaaagcgtttcac
25  BsaI-Mu-GG-F_C                CACACCACGTCTCAGTCCagatctACTTcggAgACcgaaaaacgcgaaagcgtttcac
26  BsaI-Mu-GG-R_D                CACACACCGTCTCACGCTaagcttgaGgTggAgACcgaaaaacgcgaaagcgtttcac
27  cpsfGFP-MuBsmBI-F             ACCAACCCGTCTCTTGTGAGcaacgtgtatatcacggccgac
28  cpsfGFP-MuBsmBI-R             CACACCACGTCTCATGAACcgttatattcaagcttatggcccagg
29  cpsfGFP-MuBsaI-F              CACACCAGGTCTCATGTCTaacgtgtatatcacggccgacaag
30  cpsfGFP-MuBsaI-R              CACACCAGGTCTCAAACcgttatattcaagcttatggcccaggatg
31  cpsfGFP-MuBsaI-F_A            CACACCAGGTCTCACATCTaacgtttatatcacggccgacaag
32  cpsfGFP-MuBsaI-R_B            CACACCAGGTCTCAACGCgttatattcaagcttatggcccaggatg
33  cpsfGFP-MuBsaI-F_C            CACACCAGGTCTCACTTCTaacgtttatatcacggccgacaag
34  cpsfGFP-MuBsaI-R_D            CACACCAGGTCTCAAGGtgttatattcaagcttatggcccaggatg
35  FadR_GG_F                     CACACCAGGTCTCAGTCCgtcattaaggcgcaaagcccg
36  FadR_GG_R                     CACACCAGGTCTCACGCTtcgcccctgaatggctaaatcac
37  MBP_GG_F                      CACACCAGGTCTCAGTCCaaaatcgaagaaggtaaactggtaatctgg
38  MBP_GG_R                      CACACCAGGTCTCACGCTcttggtgatacgagtctgcgc

2.3 Results

2.3.1 Possible Restriction Sites for Modified Transposons

To design modified transposons that can be excised with restriction enzymes, the important properties were first established as a basis for testing and prioritization. First, it is critical to remove as much of the transposon sequence as possible to minimize the “scar” amino acids left over. This allows proteins or domains to be brought into close contact in the final construct. The second important property for any modified transposon is the ability to transpose. Transposition activity is sufficient for protein engineering purposes when it produces a library large enough for the given goal or experiment (i.e., enough plasmids acquire an insert for one’s purposes). Often this means a library with enough members to cover all possible insertion sites in a given DNA sequence. There are no set rules on which nucleotide bases in the IR sequences can be altered without greatly affecting the transposition efficiency (the one exception being the final base pair of the transposon, which is strongly sensitive to changes (Goldhaber-Gordon et al. 2003a)), so any modifications must be tested for adequate function.
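One rough way to judge whether a library is large enough to cover all insertion sites is a Poisson sampling estimate. The sketch below is an illustration, not an analysis performed in this work: it assumes every site is hit with equal probability (insertions are only mostly random), counts two orientations per base position, and uses a hypothetical 1000 bp target.

```python
import math


def expected_coverage(n_clones, n_sites):
    """Expected fraction of sites hit at least once by n_clones random insertions."""
    return 1.0 - math.exp(-n_clones / n_sites)


def clones_for_coverage(fraction, n_sites):
    """Smallest library size whose expected coverage reaches `fraction`."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return math.ceil(-n_sites * math.log(1.0 - fraction))


# Hypothetical target: 1000 bp, two insertion orientations per position.
sites = 1000 * 2
needed = clones_for_coverage(0.99, sites)
print(needed, round(expected_coverage(needed, sites), 4))
```

Under these assumptions, a library several-fold larger than the number of sites is needed before nearly every site is expected to be represented, which gives a concrete target for the total theoretical CFU value.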

A Python script was used to search for possible restriction enzyme sites that could be added to the transposon sequence with a minimal number of mutations (Figure 2-3) (full script in Appendix B). The script searched for enzyme sites, from a set of common commercially available restriction enzymes, that required up to three mutations relative to the wild-type R1R2 transposon sequence. The resulting list was pared down by selecting variants that removed the most transposon sequence when cut (i.e. cut sites closest to the end of the transposon), without mutating the initial T, which is required in the sequence. There are over a dozen unique cut sites that can be encoded within the last 9 basepairs of the transposon sequence using 3 mutations or fewer (Figure 2-3). This provides a valuable list of possible modified transposons that could be used for protein engineering purposes. In addition, these mutations could be combined to have asymmetric cut sites at the transposon ends.
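The search described above can be sketched in a few lines. This is a simplified reconstruction for illustration and not the script from Appendix B: it scans only the first 20 bases of the wild-type end (read from the annealing region of primer 1 in Table 2-1), uses four Type IIs recognition sequences rather than a full enzyme set, and ranks hits by position as a stand-in for the cut-site criterion. The function and variable names are assumptions.

```python
WT_END = "TGAAGCGGCGCACGAAAAAC"  # first 20 bases of the wild-type Mu end (primer 1)

# Recognition sequences of four Type IIs enzymes (a small illustrative subset).
SITES = {
    "BsaI": "GGTCTC",
    "BsmBI": "CGTCTC",
    "AarI": "CACCTGC",
    "MlyI": "GAGTC",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_candidate_sites(wt, sites, max_mutations=3, protected=(0,)):
    """List placements of each site needing at most `max_mutations` changes.

    Positions are 0-indexed from the transposon end; `protected` positions
    (by default the initial T) may not be mutated. Each hit is a tuple
    (enzyme, strand, start, [(position, wt_base, new_base), ...]).
    """
    hits = []
    for enzyme, site in sites.items():
        # Try both orientations; the set removes duplicates for palindromes.
        for motif in sorted({site, revcomp(site)}):
            strand = "+" if motif == site else "-"
            for start in range(len(wt) - len(motif) + 1):
                changes = [
                    (start + i, wt[start + i], base)
                    for i, base in enumerate(motif)
                    if wt[start + i] != base
                ]
                if len(changes) > max_mutations:
                    continue
                if any(pos in protected for pos, _, _ in changes):
                    continue
                hits.append((enzyme, strand, start, changes))
    # Sites nearest the transposon end first, then fewest mutations.
    return sorted(hits, key=lambda h: (h[2], len(h[3])))


hits = find_candidate_sites(WT_END, SITES)
for enzyme, strand, start, changes in hits:
    print(enzyme, strand, start, changes)
```

On this window the sketch recovers placements matching the mutated ends of primers 3 (AarI), 4 (MlyI), 6 (BsmBI Mut2), and 8 (BsaI) in Table 2-1, whereas the BsmBI Mut1 end of primer 5 needs five changes and is therefore not reported at the three-mutation limit.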


Figure 2-3: Mutations in the transposon sequence that produce restriction sites. The final 27 bases of the Mu R1R2 transposon sequence are shown as inserted into a gene of interest (GOI). Mutations relative to the wild-type sequence are highlighted in blue. Restriction enzyme recognition sequences are double underlined. Restriction enzyme cut sites are illustrated in orange.

2.3.2 PCR-Generated Modified Transposons

Initial testing of modified transposon ends was accomplished with PCR amplification of a transposon to introduce mutations. For efficient transposition reactions in vitro, the transposon DNA must be pre-cut at the 3’ ends with 5’ overhangs (Savilahti et al. 1995). A BglII restriction site was added to the primers so the amplified product could be pre-cut at both ends in this required way. A commercially available minimal Mu transposon was used as a PCR template (Entranceposon, from Thermo Fisher) that encodes only a chloramphenicol acetyltransferase (CAT or CmR) and CAT promoter between Mu R1R2 inverted repeats (see schematic in Figure 2-1A). The two ends are the reverse complement of each other, so a single PCR primer was used to introduce identical mutations at both ends of the transposon sequence. The PCR products were tested for transposition efficiency into a pUC19 plasmid catalyzed by MuA enzyme (Figure 2-4). The efficiency of reaction was determined by transforming the completed reaction into chemically competent E. coli and measuring the CFUs on plates with selection for the transposon-encoded CmR cassette. Cells will only have resistance against chloramphenicol if they take up a plasmid with a successfully inserted transposon.

Figure 2-4: Tested mutations within transposon ends. Introduced mutations are highlighted with orange boxes. Highlighted mutations were added to both ends of the transposon through PCR amplification. Efficiency of the transposition reaction with the listed mutations is reported relative to the efficiency of a transposon with no mutations (wild-type or WT). The top sequence shows a consensus sequence from comparing all inverted repeats (MuA recognition sites) in the Mu genome and highlighted boxes show experimentally determined positions that have contact with MuA (Goldhaber-Gordon et al. 2003b). Below that is a wild-type sequence with black highlighting of bases that deviate from the consensus. Sequences used in previous publications are: MlyI (Jones 2005) and AarI (Hoeller et al. 2008).

A group of seven modified transposons was tested for transposition ability (Figure 2-4). Of the six modified sequences encoding a restriction site, two were previously published: MlyI (Jones 2005) and AarI (Hoeller et al. 2008). All restriction sites were Type IIs sites. This provides the added option of modifying the sequence in the cut site, since it is separate from the enzyme recognition site (see Figure 2-3). The previously published sequences were tested for comparison purposes, but their cut sites extend into the target DNA, so they would be difficult to use for protein engineering. The modified transposons tested also ranged in the number and location of mutations (Figure 2-4). We found that the fourth and fifth bases allowed mutations that resulted in the smallest decrease in transposition efficiency (samples: A-C, MlyI), as was expected from the Mu IR consensus sequence (Goldhaber-Gordon et al. 2003b). However, mutations could also be made in more evolutionarily conserved regions while still retaining transposition ability (samples: AarI, BsmBI_Mut1, BsaI). This is not too surprising considering that the inverted repeat R1 sequence itself has four bases in conflict with the consensus sequence, which suggests that the consensus does not fully define functional permissibility.

From the initial screening of modified transposons, we picked two to further test their usefulness in protein engineering: BsmBI_Mut1 and BsaI. Even though the AarI sequence showed good activity, the AarI restriction enzyme can be difficult to work with; complete cleavage of some sequences is not achieved and the enzyme requires two copies of the recognition sequence, so extra DNA oligonucleotides must be added. BsmBI and BsaI, on the other hand, have been tested robustly in the development of Golden Gate cloning strategies (Sarrion-Perdigones et al. 2011), and are used often in our laboratory. The two modified transposons that showed activity with these sites also have cut sites very close to the end of the transposon – within the last 6 bases (Figure 2-3). The BsmBI_Mut1 sequence showed very low activity, but moving to a more efficient transformation method (electroporation) may produce libraries of sufficient size for protein engineering. It is still desirable to pursue because its cut site removes as much of the transposon as possible (leaving only a 4-bp sticky end from the transposon).

2.3.3 Plasmid-Propagated Asymmetric Transposons

The modified transposons, with either BsmBI or BsaI, were next tested to establish how large a library they could produce when inserted into a plasmid and transformed into electrocompetent cells. To accomplish this, the creation of transposon DNA was shifted from PCR amplification to isolation from plasmid-borne sequences. PCR has limited fidelity, so encoding a transposon within a bacterium-propagated plasmid allows for sequence-verified DNA production. In addition, asymmetric transposons can be encoded in plasmids but would not be possible to make in a single PCR amplification. Plasmids were created with a PCR method that amplified the transposon template in two pieces, which allowed easy Golden Gate cloning (Engler et al. 2008) into a destination vector (see Materials and Methods). A commercially available transposon, which is functionally identical to the one used for the initial testing, was used as a template for the PCR. The final construct can be verified by sequencing before use in a transposition reaction. In addition, the necessary pre-cutting of the transposon can also release the transposon from the backbone of the plasmid; the excised transposon sequence is then gel-purified from the backbone DNA and is ready for transposition.

Two test transposons were created: BsmBI-M1-CmR and BsaI-M1-CmR (Figure 2-5). Both had modifications relative to the wild-type sequence, not only to create the Type IIs recognition site, but also in the cut sites to encode favorable amino acids if a domain were to be cloned into the sites. It is important to note that the two ends of a Mu transposon insert 5 basepairs apart into opposite strands of the target DNA, and these gaps are filled in upon transformation into E. coli or other compatible host strains. This creates a 5-bp duplication in the target DNA, which is a common result of all transposon reactions (with varying lengths of duplication). This duplication must be taken into account when designing cloning strategies, such as those laid out in Figure 2-5, to ensure all coding sequences are in-frame. A transposon can insert in any of the three possible codon frames, so a researcher must pick one of the three to be in-frame after cloning. We picked the frame that would disturb the original open reading frame (ORF) the least: the initial T inserts into the wobble position of a target-sequence codon, while the final three bases of the transposon complete a full codon (Figure 2-5). Either of the two other reading frame insertions would cause more randomization of amino acids by combining target-sequence bases with transposon bases in codons.
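The frame bookkeeping described above can be checked with a short script. This is only an illustrative sketch: the scar bases are taken from the BsaI design in Figure 2-5A, while the target ORF, the domain sequence, and the helper names (`translate`, `fuse`) are hypothetical and chosen for the example.

```python
# Standard genetic code, with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Scar bases left after cloning with the BsaI design (Figure 2-5A).
SCAR_5P = "TGTGTCT"  # T completes a target codon, then GTG TCT (Val Ser)
SCAR_3P = "GGTTCA"   # GGT TCA (Gly Ser)


def translate(dna):
    """Translate frame 0 of a DNA string, ignoring trailing partial codons."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def fuse(target, site, doi):
    """Build the post-cloning product for an insertion at a 5-bp target site.

    `site` is the index of N1. The 5-bp site N1..N5 appears on both sides
    of the insert because transposition duplicates it.
    """
    return target[:site + 5] + SCAR_5P + doi + SCAR_3P + target[site:]


# Hypothetical toy ORF (M A G D L E) and toy domain (G S), for illustration.
target = "ATGGCTGGAGATCTGGAA"
doi = "GGCAGC"

fusion = fuse(target, site=3, doi=doi)
protein = translate(fusion)
print(protein)  # AA1 AA2 Val Ser [DOI] Gly Ser AA1 AA2, flanked by the ORF
```

In this toy case the inserted length (scar bases, domain, and the 5-bp duplication) is a multiple of three, so the downstream codons stay in frame. Note that the third base of the AA2 codon upstream of the insert is supplied by the transposon's T, so for some codons that residue could differ from the original.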

[Figure 2-5 schematic; top-strand sequences only. N1-N5 is the 5-bp target site and NNNNNN flanks the DOI.]

(A) BsaI-M1-CmR
Target site: … N1N2N3N4N5 … (codons AA1 AA2)
After transposition: … N1N2N3N4N5 TGTGTCGGAGACC [CmR] GGTCTCCGGTTCA N1N2N3N4N5 …
DOI amplicon with BsaI sites: CACACCAGGTCTCATGTCTNNNNNN [DOI] NNNNNNGGTTTGAGACCTGGTGTG
After Golden Gate reaction: … N1N2N3N4N5 TGTGTCT NNNNNN [DOI] NNNNNN GGTTCA N1N2N3N4N5N6 …
Encoded residues: AA1 AA2 Val Ser [DOI] Gly Ser AA1 AA2

(B) BsmBI-M1-CmR
Target site: … N1N2N3N4N5 … (codons AA1 AA2)
After transposition: … N1N2N3N4N5 TGTGGGAGACGAC [CmR] GTCGTCTCCTTCA N1N2N3N4N5 …
DOI amplicon with BsmBI sites: ACCAACCCGTCTCTTGTGAGCNNNNNN [DOI] NNNNNNGGTTCATGAGACGTGGTGTG
After Golden Gate reaction: … N1N2N3N4N5 TGTGAGC NNNNNN [DOI] NNNNNN GGTTCA N1N2N3N4N5N6 …
Encoded residues: AA1 AA2 Val Ser [DOI] Gly Ser AA1 AA2

Figure 2-5: Overview of modified transposon insertion and subsequent cloning Transposition creates a 5-bp duplication in the target coding sequence (gray). Accounting for that duplication, a domain of interest (DOI, in blue) can be cloned into the modified transposon such that its reading frame matches that of the target gene. Letters in bold italics indicate sequence of the DOI that is derived from a primer. (A) BsaI-M1-CmR modified transposon and example of subsequent cloning. (B) BsmBI-M1-CmR modified transposon and example of subsequent cloning.

The two new plasmid-derived transposons were tested in transposition reactions with the plasmid pET14GG-CHis-FadR. Each completed reaction was concentrated and cleaned up by column purification, then transformed by electroporation into commercial high-efficiency competent cells. A plasmid-derived wild-type transposon was tested in parallel to compare reaction efficiencies. After recovery growth from electroporation, an aliquot of cells was plated on LB plates with selection for both a transposon insert and the target plasmid backbone, and reaction efficiencies were determined from the number of CFUs produced. BsaI-M1-CmR had a relative efficiency of 10%, with a theoretical total of approximately 150,000 CFUs per transformation. This relative efficiency is the same as was found for BsaI mutations in the earlier PCR-generated transposon testing (Figure 2-4). Furthermore, the number of successful transposon insertions from one reaction and transformation is very high compared to the number of possible insertion sites in common plasmids (~10,000; 5,000 basepairs x 2 insertion directions). BsmBI-M1-CmR showed a relative efficiency of 2.5%, which is much greater than the value found previously for BsmBI mutations (BsmBI_Mut1 in Figure 2-4). The theoretical total CFU value from the reaction and transformation was approximately 40,000. This value is just large enough to create a library that gives 3X coverage over the number of possible insertions into a common plasmid. The discrepancies in efficiencies between these tests and the previous PCR-generated ones are likely due to the additional mutations in the cut site region of the modified transposons. However, there may also be some influence from the use of a different target plasmid (pET14GG-CHis-FadR versus pUC19), as we have seen effects when switching between target plasmids (data not shown; no conclusive trends seen).
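The coverage arithmetic in this paragraph can be reproduced with a few lines. The sketch below uses the rounded CFU totals and the 5,000-bp plasmid size quoted above; the function names are hypothetical, and the estimate of how many insertions are sampled assumes uniform, independent insertion, which is an idealization not claimed in the text.

```python
import math


def possible_insertions(plasmid_bp, orientations=2):
    """Count insertion outcomes: every basepair position, in either direction."""
    return plasmid_bp * orientations


def fold_coverage(total_cfu, n_insertions):
    """Library size divided by the number of distinct insertions."""
    return total_cfu / n_insertions


def expected_fraction_sampled(coverage):
    """Assumption: insertions are uniform and independent (Poisson sampling)."""
    return 1.0 - math.exp(-coverage)


sites = possible_insertions(5000)  # ~10,000 for a 5,000-bp plasmid
libraries = {"BsaI-M1-CmR": 150_000, "BsmBI-M1-CmR": 40_000}

for name, cfu in libraries.items():
    cov = fold_coverage(cfu, sites)
    frac = expected_fraction_sampled(cov)
    print(f"{name}: {cov:.0f}X coverage, ~{frac:.1%} of insertions expected at least once")
```

With these round numbers the BsmBI-M1-CmR library comes out at about 4X, consistent with the statement that it is just large enough for 3X coverage.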

A library of plasmids with a transposon insertion must be free from plasmids with no insertion. Plasmids with an insertion were selected by growth in medium containing chloramphenicol, so that cells without the transposon sequence would be unable to grow. The culture was midi-prepped after growth to isolate the library of plasmids. Selection and purity of plasmids with an insertion were assessed by agarose-gel electrophoresis of linearized plasmid (Figure 2-6). A transposon insertion increases the length of a plasmid by a fixed amount (1259 basepairs), so comparing the fragment sizes present in a sample indicates the purity achieved from a selection growth. Our optimized antibiotic and transformation dilution conditions produced very pure library samples (Figure 2-6), with no-insert plasmid making up only a trace amount of the pool. However, later cloning tests found that even this small amount of no-insert plasmid can overtake an insertion library after a Golden Gate reaction and subsequent transformation, because it is never digested (restriction sites are only gained from the transposon) and there is no counter-selection for removal of the transposon. Thus, the enriched insertion library still had to be gel purified (we separated the supercoiled insertion library from the no-insert plasmid on a gel, but one could also separate a pre-digested sample during the cloning step). It is also important to note that purity after growth selection does not reflect the efficiency of the transposition reaction: even a poor reaction that created just a handful of successful insertions should be enriched under the correct growth selection.

[Figure 2-6 gel image. Lanes: pET14GG-CHis-FadR control (5346 bp), BsmBI-M1-CmR, BsaI-M1-CmR. Arrows mark the successful-insert and no-insert bands. Ladder bands (bp): 10000, 8000, 6000, 5000, 4000, 3500, 3000, 2500, 2000, 1500, 1000, 750, 500, 250.]

Figure 2-6: Agarose-gel analysis of transposon-insert libraries All DNA samples were digested with HindIII enzyme to linearize the plasmid. pET14GG-CHis-FadR was the target plasmid in all transposition reactions. Control lane is pET14GG-CHis-FadR alone (no transposition). Arrows indicate expected size of a plasmid with or without a transposon insert (an insert adds 1259 basepairs). DNA ladder has length of fragments in bands (in # of basepairs) listed to the left. Trace amounts of no-insert band are present in the transposon samples, but do not show up well in this image.

2.3.4 Cloning into Modified Transposons

The modified transposons, now harbored in a plasmid library, were next tested for their ability to act as a cloning site with their modified ends. The circularly permuted super-folder green fluorescent protein (cpsfGFP) (Pédelacq et al. 2006) was used as a test domain to clone into the modified transposons. The cpsfGFP domain is advantageous because, by design, its N- and C-termini are adjacent, which causes the least disturbance to the target gene into which it is inserted. Additionally, cpsfGFP is fluorescent when properly folded, which allows for a screen of in-frame insertions. This domain was amplified by PCR to add compatible restriction sites for cloning into the modified transposon (primer and cloning design shown for both transposons in Figure 2-5). The amplified domain for each modified transposon was cloned into its respective insertion library using Golden Gate (GG) cloning and transformed into BL21 (DE3) expression cells. When plated, the libraries from both modified transposons showed successful cloning of the fluorescent domain (Figure 2-7). Moreover, the calculated total of transformants was always greater than 10^6 per GG reaction and transformation, with the rate of transposon-harboring plasmid always being less than one in 1000. This is favorable for applications requiring the generation of more complex libraries.

[Figure 2-7 plate images. Panels: cpsfGFP control; FadR control; FadR-cpsfGFP BsmBI library; FadR-cpsfGFP BsaI library.]

Figure 2-7: Expression testing of libraries after cloning of a fluorescent domain BL21 (DE3) was transformed with different constructs or libraries, all in a pET14GG-CHis plasmid, and plated on LB media with carbenicillin (no inducer added – expression is from leakiness of the strain). Fluorescence image taken of all plates – darker pixels indicate larger fluorescent signal. cpsfGFP control: pET14GG-CHis-cpsfGFP149. FadR control: pET14GG-CHis-FadR. FadR-cpsfGFP BsmBI library: cpsfGFP cloned into BsmBI-M1-CmR insertion sites within pET14GG-CHis-FadR plasmids. FadR-cpsfGFP BsaI library: cpsfGFP cloned into BsaI-M1-CmR insertion sites within pET14GG-CHis-FadR plasmids.

The libraries of FadR with fluorescent domain inserts showed varying levels of fluorescence when colonies on plates were imaged (Figure 2-7). This is not unexpected because 5 out of every 6 inserts in the ORF should be out of frame or in the reverse orientation. The insert could also be outside of the ORF entirely. Sequencing of selected colonies (both fluorescent and non-fluorescent) showed that the transposition and cloning strategy created insert sequences and duplication events as expected (see Figure 2-5). However, while some fluorescent colonies had cpsfGFP inserts in-frame with the FadR ORF, a good portion had inserts out-of-frame, which was unexpected for constructs that showed fluorescence. Upon further examination of the cloning strategy, we found that a cut site used in both transposons encoded a GTG codon that may have acted as a start codon for a ribosome. Once the fluorescent domain was cloned in, translation initiated at this start codon would produce cpsfGFP on its own, followed by some random amino acids at its C-terminus until a stop codon was reached. This could explain the unexpected fluorescence and indicated a need to redesign the modified transposons to prevent translation of protein fragments.
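The "5 out of every 6" figure follows from counting the combinations of reading frame and orientation. A minimal sketch of that count is given below; the names are hypothetical, and it assumes each frame and orientation is equally likely for insertions within the ORF.

```python
from fractions import Fraction
from itertools import product

FRAMES = (0, 1, 2)                     # insertion offset within a codon
ORIENTATIONS = ("forward", "reverse")  # direction of the inserted domain


def is_in_frame_fusion(frame, orientation, designed_frame=0):
    """Only one frame, in the forward orientation, yields a full fusion."""
    return orientation == "forward" and frame == designed_frame


outcomes = list(product(FRAMES, ORIENTATIONS))
productive = [o for o in outcomes if is_in_frame_fusion(*o)]

fraction_productive = Fraction(len(productive), len(outcomes))
print(f"productive: {fraction_productive}, non-productive: {1 - fraction_productive}")
```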

Both of the modified transposons used for cloning, BsmBI-M1-CmR and BsaI-M1-CmR, were successful in carrying out the strategy. For a redesign, we decided to pursue only the BsaI-M1-CmR modified transposon to allow more complete testing of one system. This transposon was also much more efficient in transposition reactions, so it is generally more useful for library creation. It leaves four amino acids encoded by scar basepairs, while BsmBI-M1-CmR leaves three at a minimum, so switching to BsmBI-M1-CmR would not reduce the number of scar amino acids significantly. Lastly, BsaI-M1-CmR has a cut site that is in a very non-conserved region of the transposon sequence (Figure 2-4), while BsmBI-M1-CmR has a cut site at the very end, where the last two basepairs are highly conserved. This should give BsaI-M1-CmR greater flexibility in changing the composition of the cut site without impairing its transposition ability.

2.3.5 Optimizing a BsaI-Modified Transposon

Four new versions of BsaI-M1-CmR were designed and created that vary the sequences in the cut sites of BsaI while removing the problematic GTG codon (Figure 2-8). Each of the four versions has a combination of two possible 5’ and two possible 3’ ends; this removes the GTG and also tests for the best transposition ability among variants that encode small, innocuous amino acids. The cut site sequence from the wild-type transposon was not used because it encodes a glutamate (Figure 2-8), which is generally charged and may interfere with the normal function of the two domains being brought together. The test domain used for cloning into the modified transposons was also modified because it had a GTG in its second codon (amino acid 150 of GFP based on standard numbering) that may have also acted as a start codon. This codon was mutated to a GTT, still encoding a valine.
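The design check described here (which scar variants still carry an in-frame start codon, and which residues each encodes) can be sketched as follows. The scar sequences are those listed in Figure 2-8; the helper names are hypothetical, the codon lookup is limited to the codons that appear in these scars, and including ATG alongside GTG as a flagged start codon is an assumption of this example.

```python
# Scar bases (5' end, 3' end) after cloning, as listed in Figure 2-8.
SCARS = {
    "wt ends": ("TGAAGCG", "GCTTCA"),
    "ver. 0": ("TGTGTCT", "GGTTCA"),
    "ver. 1": ("TGCATCT", "GCGTCA"),
    "ver. 2": ("TGCATCT", "ACCTCA"),
    "ver. 3": ("TACTTCT", "GCGTCA"),
    "ver. 4": ("TACTTCT", "ACCTCA"),
}

# Partial codon lookup: only the codons present in the scars above.
CODON_NAMES = {
    "GAA": "Glu", "GCG": "Ala", "GCT": "Ala", "GCA": "Ala",
    "GTG": "Val", "TCT": "Ser", "TCA": "Ser",
    "GGT": "Gly", "ACC": "Thr", "ACT": "Thr",
}

# GTG is the codon flagged in the text; ATG is added here as an assumption.
START_CODONS = {"ATG", "GTG"}


def scar_codons(five_prime, three_prime):
    """Split each scar into its two full codons.

    The leading T of the 5' scar completes a target codon, so it is dropped.
    """
    body = five_prime[1:]
    return [body[0:3], body[3:6]], [three_prime[0:3], three_prime[3:6]]


def residues(version):
    """Return the residue names encoded at the 5' and 3' junctions."""
    left, right = scar_codons(*SCARS[version])
    return [CODON_NAMES[c] for c in left], [CODON_NAMES[c] for c in right]


def has_start_codon(version):
    """True if any in-frame scar codon could act as a start codon."""
    left, right = scar_codons(*SCARS[version])
    return any(c in START_CODONS for c in left + right)


flagged = [v for v in SCARS if has_start_codon(v)]
for v in SCARS:
    left, right = residues(v)
    print(f"{v}: {' '.join(left)} [DOI] {' '.join(right)}  start codon: {has_start_codon(v)}")
```

Under these assumptions only version 0 is flagged, and the wild-type ends are the only ones that encode a glutamate.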

[Figure 2-8 table. Top-strand sequence after cloning; N1-N5 is the duplicated target site and NNNNNN flanks the DOI. Asterisks over mutated bases are not shown.]

transposon   sequence after cloning                                              encoded residues                          relative efficiency
wt ends      … N1N2N3N4N5 TGAAGCG NNNNNN [DOI] NNNNNN GCTTCA N1N2N3N4N5N6 …      AA1 AA2 Glu Ala [DOI] Ala Ser AA1 AA2     -
ver. 0       … N1N2N3N4N5 TGTGTCT NNNNNN [DOI] NNNNNN GGTTCA N1N2N3N4N5N6 …      AA1 AA2 Val Ser [DOI] Gly Ser AA1 AA2     4.8%
ver. 1       … N1N2N3N4N5 TGCATCT NNNNNN [DOI] NNNNNN GCGTCA N1N2N3N4N5N6 …      AA1 AA2 Ala Ser [DOI] Ala Ser AA1 AA2     8.3%
ver. 2       … N1N2N3N4N5 TGCATCT NNNNNN [DOI] NNNNNN ACCTCA N1N2N3N4N5N6 …      AA1 AA2 Ala Ser [DOI] Thr Ser AA1 AA2     4.1%
ver. 3       … N1N2N3N4N5 TACTTCT NNNNNN [DOI] NNNNNN GCGTCA N1N2N3N4N5N6 …      AA1 AA2 Thr Ser [DOI] Ala Ser AA1 AA2     2.6%
ver. 4       … N1N2N3N4N5 TACTTCT NNNNNN [DOI] NNNNNN ACCTCA N1N2N3N4N5N6 …      AA1 AA2 Thr Ser [DOI] Thr Ser AA1 AA2     4.1%

Figure 2-8: Overview of new modified transposon designs Four versions were created beyond the original “version 0.” The transposons have the BsaI mutations (not shown in this figure) that were outlined in Figure 2-3 and Figure 2-5A. Each version has mutations in the cut site sequence, and is shown after a domain of interest (DOI) is cloned into the sites. Mutations relative to the wild-type Mu transposon sequence are indicated with an asterisk. Efficiency of the transposition reaction with the listed mutations is reported relative to the efficiency of a transposon with no mutations (wild-type, no BsaI mutations or cut site mutations).

The four new versions of BsaI-M1-CmR were tested in a transposition reaction with a new test plasmid, pET14GG-CHis-MBP (Figure 2-8). Of the modified transposons tested, version 1 had the highest transposition efficiency and was even better than the original BsaI-M1-CmR (version 0) in this test. Version 1, named BsaI-M1-CmR-1, encodes for alanine and serine at both ends, where the domains are linked. When cpsfGFP was cloned into insertion libraries created by each transposon variation, fluorescent colonies had a lower rate of out-of-frame insertions when sequenced (data not shown). This indicates that the elimination of the GTG codons helped reduce expression of protein fragments that contain only the fluorescent domain with none of the target ORF. Thus, libraries created by this transposon method will express full-length domain fusions when insertions are in-frame, while out-of-frame insertions should only produce truncated fragments of the target ORF.

2.4 Discussion

We have shown that minimal Mu transposons, used in in vitro reactions, can be modified to introduce restriction enzyme sites. Based on sequence analysis, numerous restriction sites could be created within a Mu transposon with very few mutations. All sites would cut close to the ends of the transposon, leaving little of the sequence behind in a target plasmid. Testing the transposition efficiency of seven different modified transposons, we found that the sequence was relatively tolerant to mutations, allowing the addition of desired restriction sites.

Modified transposons, encoding Type IIs restriction sites, were shown to create insertion libraries large enough to contain all possible insertion sites in an average plasmid (>10^6). These insertion libraries also proved efficient in subsequent cloning reactions, owing to their Golden-Gate-compatible restriction sites. This provided easy domain replacement of the modified transposons at their random insertion points.

These modified transposons are powerful tools that can create libraries containing domain insertions at every possible insert position (including every codon position of an ORF). The technology to bring domains together in random topologies allows protein engineers to develop functionality that has hitherto been unattainable. One example of these elusive goals in protein engineering is the implementation of designed allostery into a protein of interest (Raman et al. 2014). Not all insertion sites in a protein are created equal when trying to achieve allostery (Lee et al. 2008), so this modified transposon method provides a rapid way to create all possible insertions. Beyond domain insertions for protein engineering goals, these experiments supply a platform for wider tool development in DNA manipulation. For example, in synthetic biology randomly inserted cloning sites could be used to optimize the distance between parts or to test the robustness of plasmid backbones.

2.5 References

Aroul-Selvam R, Hubbard T, Sasidharan R. 2004. Domain insertions in protein structures. J Mol Biol. 338(4):633–41
Aziz RK, Breitbart M, Edwards RA. 2010. Transposases are the most abundant, most ubiquitous genes in nature. Nucleic Acids Res. 38(13):4207–17
Chothia C, Gough J, Vogel C, Teichmann SA. 2003. Evolution of the protein repertoire. Science. 300(5626):1701–3
Engler C, Kandzia R, Marillonnet S. 2008. A one pot, one step, precision cloning method with high throughput capability. PLoS ONE. 3(11):e3647
Goldhaber-Gordon I, Early MH, Baker TA. 2003a. The terminal nucleotide of the Mu genome controls catalysis of DNA strand transfer. Proc Natl Acad Sci USA. 100(13):7509–14
Goldhaber-Gordon I, Early MH, Baker TA. 2003b. MuA transposase separates DNA sequence recognition from catalysis. Biochemistry. 42(49):14633–42
Goldhaber-Gordon I, Early MH, Gray MK, Baker TA. 2002. Sequence and positional requirements for DNA sites in a mu transpososome. J Biol Chem. 277(10):7703–12
Green B, Bouchier C, Fairhead C, Craig NL, Cormack BP. 2012. Insertion site preference of Mu, Tn5, and Tn7 transposons. Mob DNA. 3(1):3
Gregory JA, Becker EC, Jung J, Tuwatananurak I, Pogliano K. 2010. Transposon assisted gene insertion technology (TAGIT): a tool for generating fluorescent fusion proteins. PLoS ONE. 5(1):e8731
Haapa S, Taira S, Heikkinen E, Savilahti H. 1999. An efficient and accurate integration of mini-Mu transposons in vitro: a general methodology for functional genetic analysis and molecular biology applications. Nucleic Acids Res. 27(13):2777–84
Haren L, Ton-Hoang B, Chandler M. 1999. Integrating DNA: transposases and retroviral . Annu Rev Microbiol. 53:245–81
Hoeller BM, Reiter B, Abad S, Graze I, Glieder A. 2008. Random tag insertions by Transposon Integration mediated Mutagenesis (TIM). J Microbiol Methods. 75(2):251–57
Jin L, Baker B, Mealer R, Cohen L, Pieribone V, et al. 2011. Random insertion of split-cans of the fluorescent protein venus into Shaker channels yields voltage sensitive probes with improved membrane localization in mammalian cells. J Neurosci Methods. 199(1):1–9
Jones DD. 2005. Triplet nucleotide removal at random positions in a target gene: the tolerance of TEM-1 beta-lactamase to an amino acid deletion. Nucleic Acids Res. 33(9):e80
Lee J, Natarajan M, Nashine VC, Socolich M, Vo T, et al. 2008. Surface sites for engineering allosteric control in proteins. Science. 322(5900):438–42
Liu J, Cropp TA. 2012. Experimental methods for scanning unnatural amino acid mutagenesis. Methods Mol Biol. 794:187–97
Mealer R, Butler H, Hughes T. 2008. Functional fusion proteins by random transposon-based GFP insertion. Methods Cell Biol. 85:23–44
Montaño SP, Rice PA. 2011. Moving DNA around: DNA transposition and retroviral integration. Curr Opin Struct Biol. 21(3):370–78
Pédelacq J-D, Cabantous S, Tran T, Terwilliger TC, Waldo GS. 2006. Engineering and characterization of a superfolder green fluorescent protein. Nat Biotechnol. 24(1):79–88
Rad R, Rad L, Wang W, Cadinanos J, Vassiliou G, et al. 2010. PiggyBac : a tool for cancer gene discovery in mice. Science. 330(6007):1104–7
Raman S, Taylor N, Genuth N, Fields S, Church GM. 2014. Engineering allostery. Trends Genet.
Reynolds KA, McLaughlin RN, Ranganathan R. 2011. Hot spots for allosteric regulation on protein surfaces. Cell. 147(7):1564–75
Sarrion-Perdigones A, Falconi EE, Zandalinas SI, Juárez P, Fernández-del-Carmen A, et al. 2011. GoldenBraid: an iterative cloning system for standardized assembly of reusable genetic modules. PLoS ONE. 6(7):e21622
Savilahti H, Rice PA, Mizuuchi K. 1995. The phage Mu transpososome core: DNA requirements for assembly and function. EMBO J. 14(19):4893–4903
Sheridan DL, Hughes TE. 2004. A faster way to make GFP-based biosensors: two new transposons for creating multicolored libraries of fluorescent fusion proteins. BMC Biotechnol. 4:17
van Opijnen T, Bodi KL, Camilli A. 2009. Tn-seq: high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms. Nat Methods. 6(10):767–72
Vogel C, Bashton M, Kerrison ND, Chothia C, Teichmann SA. 2004. Structure, function and evolution of multidomain proteins. Curr Opin Struct Biol. 14(2):208–16

Chapter 3 Fluorescent Biosensor Development

3.1 Introduction

Probing the underlying chemical pathways and metabolic states of individual cells is critical to advancing the field of biology. It is especially important to measure these phenomena in their natural context, without altering or disrupting their biochemical environment (Okumoto 2010). However, even in this post-genomic era, measurements of metabolites are still laborious and slow compared to advances in sequencing and molecular biology (Zenobi 2013). Genetically encoded fluorescent biosensors show great promise as a robust technology to make small-molecule measurements that are real-time and non-invasive (Okumoto et al. 2012, Sample et al. 2014). There are a number of promising constructs in the literature (e.g. Förster-resonance-energy-transfer-based (FRET-based) sensors and RNA aptamers), but a less frequently adopted construct, the single-fluorescent-protein (single-FP) biosensor, has several advantages. First, most have greater dynamic range than alternative constructs (Okumoto et al. 2012), which makes them more useful in high-throughput screening techniques that often require large signal differences. Second, single-FP sensors can be imaged with a simpler setup than FRET (dual-FP) sensors, and using a single FP means that more orthogonal sensors are possible.

3.1.1 Single-FP Biosensors

Single-FP biosensors have a common construction that has proven successful for several ligands (Okumoto et al. 2012) (Figure 3-1A). A GFP is circularly permuted (original N- and C-termini joined and new ends created elsewhere), resulting in a structure with new termini directly adjacent to the chromophore. In previous sensors, the new N-terminus is one of the GFP residues 145 through 149, while the new C-terminus is one of the residues 144 through 146 (see sequence alignment in Appendix A). These new ends of the circularly permuted GFP (cpGFP) are inserted into a ligand-binding domain (LBD), connected by short linkers. It is presumed that this connection is such that when the LBD binds its cognate ligand, its conformational change propagates through the linkers to cause a shift in the environment around the GFP chromophore.
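At the sequence level, circular permutation amounts to a rotation of the residue order, optionally with a linker joining the original termini. The sketch below illustrates this on a toy string; the function name, the toy sequence, and the linker are hypothetical and are not taken from any published sensor.

```python
def circularly_permute(seq, new_n_term, linker=""):
    """Rotate a sequence so that residue `new_n_term` (1-based) comes first.

    The original C-terminus is joined to the original N-terminus through
    `linker`, and the residue just before `new_n_term` becomes the new
    C-terminus (e.g. new N-terminus 145 gives new C-terminus 144).
    """
    if not 1 <= new_n_term <= len(seq):
        raise ValueError("new_n_term must be a residue number within seq")
    i = new_n_term - 1
    return seq[i:] + linker + seq[:i]


# Toy 10-residue sequence; lowercase letters mark the hypothetical linker.
toy = "ABCDEFGHIJ"
permuted = circularly_permute(toy, new_n_term=4, linker="gg")
print(permuted)  # residue 4 (D) is first, residue 3 (C) is last
```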

Figure 3-1: Overview of single-FP-based biosensors (A) A schematic of a single-GFP-based biosensor. The original GFP N- and C-termini are connected, with a new N-terminus at 145 and a new C-terminus at 144. This circular permutation opens the beta-barrel structure directly next to the chromophore. The N- and C-termini are connected to a ligand-binding domain (LBD) through short linker sequences. It is thought that when the LBD binds its cognate ligand, conformational changes propagate through the linkers and alter the environment around the chromophore. (B) Excitation and emission spectrum of wild-type GFP (solid lines) overlaid on the excitation spectra of GFP with 100% protonated chromophore and with 100% deprotonated chromophore (dashed line; structure of each chromophore shown above its respective spectrum)(adapted from Tsien 1998).

The large dynamic ranges produced by many published single-FP biosensors result from the sensitivity of the GFP chromophore to its environment. The origins of this phenomenon can be traced to the wild-type GFP (WT GFP) protein from Aequorea victoria, which has two distinct peaks in its absorption spectrum (Figure 3-1B) (Tsien 1998). Through mutations of amino acid residues around the GFP chromophore, researchers found that these two peaks represent two populations of chromophore states: protonated and deprotonated. The deprotonated state has an excitation peak at ~488 nm and emits green fluorescence at ~511 nm. The protonated state of the chromophore has an absorption peak at ~399 nm and has the same green fluorescence due to an excited-state proton transfer (deprotonation) event (Chattoraj et al. 1996, Stoner-Ma et al. 2005), but does not have any excitation at 488 nm (Cubitt et al. 1999). Thus, in the context of 488 nm excitation, the protonated state can be thought of as “dark,” while the deprotonated state is “light.” The bulk spectral properties of the WT GFP are a combination of both the protonated and deprotonated chromophore states (Figure 3-1B). The fact that both states exist naturally in the wild-type protein and can be shifted one way or the other with specific mutations implies the two states are in a delicate balance that can easily be perturbed. It is therefore proposed that single-FP biosensors function by altering the protonation state of the chromophore by changing its exposure to the solvent (Akerboom et al. 2009), though any changes to the proton transfer network, by solvent exposure or side chain movements, could perturb the protonation state (Oltrogge et al. 2014).


3.1.2 Producing Functional Single-FP Biosensors

Even with a model for the underlying mechanism of single-FP biosensors, it has still proven difficult to produce them for targets of interest. The first such sensor was published 15 years ago, but fewer than 10 single-FP sensors against small molecules have been published since then (Belousov et al. 2006, Berg et al. 2009, Honda & Kirimura 2013, Hung et al. 2011, Ji et al. 2013, Marvin et al. 2011, 2013; Nausch et al. 2008, Tewson et al. 2012) (details in Appendix A). The reason identified by many of these studies for the difficulty in producing biosensors is a lack of general principles for how best to fuse a cpGFP into an LBD to create functional sensing (Figure 3-1A). Many of these studies have relied on crystal structure data to determine places of large conformational change in apo versus holo forms of the LBD and inserted the cpGFP at that location. However, the initial site chosen did not always work, and it takes a great deal of time and effort to build new constructs to test a different insertion site. This laborious iterative testing is a major hurdle to quickly creating new biosensors. In this work, we propose to overcome this challenge by creating and testing all insertion sites in one pool to develop biosensors in a high-throughput manner.

To produce a pool of all possible cpGFP insertions into an LBD of interest, a method is needed to insert the cpGFP, at the DNA level, into every codon of the LBD (Figure 3-2). The created DNA library, when expressed, would represent all possible topologies of fusions between the domains. Our lab has developed a modified-transposon method that will produce such a library without the time and cost burden of a traditional PCR-based approach (see Chapter 2). With a pool of potential biosensors constructed, a screen for sensor (or switching) behavior could pull out the best possible fusion construct. With the previous iterative design approach, even if a functional sensor was created, not all insertion sites were tested, so the “final” sensor may have been suboptimal.
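The target library can be pictured as the set of all in-frame insertions at codon boundaries. The following sketch enumerates such an idealized library for a toy ORF; it ignores the scar bases and 5-bp duplication discussed in Chapter 2, uses only internal codon boundaries (an assumption of this example), and all names and sequences are hypothetical.

```python
def insertion_library(orf, insert):
    """Return (codon_index, construct) for an insert at each internal codon boundary.

    Idealized: no linker or scar bases are added, and the ORF and insert
    must each be a whole number of codons so every construct stays in frame.
    """
    if len(orf) % 3 or len(insert) % 3:
        raise ValueError("orf and insert lengths must be multiples of 3")
    n_codons = len(orf) // 3
    return [
        (k, orf[:3 * k] + insert + orf[3 * k:])
        for k in range(1, n_codons)
    ]


# Toy 4-codon ORF and 2-codon insert, for illustration only.
toy_orf = "ATGAAAATCGAA"
toy_insert = "GGCAGC"

library = insertion_library(toy_orf, toy_insert)
for k, construct in library:
    print(k, construct)
```

The library size grows linearly with the length of the LBD, which is what makes a pooled construction and screen attractive compared with building each construct individually.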

[Figure 3-2 schematic. Labels: target gene (ligand-binding domain), circularly permuted GFP (cpGFP), plasmid library, protein library, screening.]

Figure 3-2: Outline of biosensor development strategy Achieving a protein library of all cpGFP and LBD connections, or topologies, requires making a plasmid DNA library with the sequence of cpGFP inserted at every codon of the LBD. This library, when expressed, will sample all possible locations in the LBD that may successfully alter the attached cpGFP’s fluorescence. A suitable screen that detects switching (biosensor) behavior should pull out the successful insertion sites.

3.1.3 Screening Libraries of Potential Biosensors

In protein engineering, the screen used to quantify and discriminate performance is critical to the success of a method (Arnold & Volkov 1999). For a library of potential biosensors, this fact remains true. Fortunately, there are advanced techniques to measure fluorescence, the biosensor output, in a high-throughput manner. Fluorescence-activated cell sorting (FACS) is a powerful tool that was developed over 40 years ago and can serve as a discriminating screen of fluorescence intensity (Herzenberg et al. 2002). Single-FP sensors expressed in E. coli cells have detectable shifts in fluorescence intensity on FACS instruments when exposed to their cognate ligand (Figure 3-3A). The performance of a biosensor is not measured by the absolute fluorescence of a construct, but by the dynamic range between ligand-bound and unbound states. Therefore, a single screen with FACS would not enrich for better sensor performance. Instead, we propose to use iterative rounds of positive and negative screens with FACS to enrich for a sensor’s switch behavior (Figure 3-3B-D). Screening for high fluorescence in the presence of the ligand and screening for low fluorescence with no ligand should apply selective pressure towards constructs with high dynamic ranges. After multiple rounds of sorting, constructs from the enriched pool can be tested individually to identify the best hits (Figure 3-3E).
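The logic of alternating positive and negative sorts can be illustrated with an idealized, noise-free model in which each clone has one fluorescence value with ligand and one without. Everything in this sketch is hypothetical: the clone names, fluorescence values, cutoff, and the choice to express dynamic range as a bound-to-unbound ratio are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    name: str
    f_unbound: float  # fluorescence without ligand (arbitrary units)
    f_bound: float    # fluorescence with ligand (arbitrary units)


def positive_sort(pool, cutoff):
    """Keep clones that are bright when ligand is present."""
    return [c for c in pool if c.f_bound >= cutoff]


def negative_sort(pool, cutoff):
    """Keep clones that are dim when no ligand is added."""
    return [c for c in pool if c.f_unbound <= cutoff]


def dynamic_range(clone):
    """Assumed definition: ratio of ligand-bound to unbound fluorescence."""
    return clone.f_bound / clone.f_unbound


pool = [
    Clone("always_dim", 100.0, 120.0),
    Clone("always_bright", 5000.0, 5200.0),
    Clone("switch", 300.0, 3000.0),
]
CUTOFF = 1000.0

sort1 = positive_sort(pool, CUTOFF)   # + ligand: removes always_dim
sort2 = negative_sort(sort1, CUTOFF)  # - ligand: removes always_bright
sort3 = positive_sort(sort2, CUTOFF)  # + ligand: confirms the switch

for clone in sort3:
    print(clone.name, dynamic_range(clone))
```

A single positive sort keeps the constitutively bright clone along with the switch; only the combination of positive and negative sorts isolates the clone whose fluorescence depends on ligand.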

[Figure 3-3: panels (A) FACS histograms of a maltose biosensor with and without ligand, normalized cell count versus sensor fluorescence; (B) sort 1, + ligand, positive screening; (C) sort 2, - ligand, negative screening; (D) sort 3, + ligand, positive screening; (E) targeted analysis of enriched clones. Sorting cutoffs are marked on cell count versus fluorescence axes in panels B-D.]

Figure 3-3 Schematic of biosensor screening strategy (A) A previously published maltose biosensor (Marvin et al. 2011) measured on a FACS instrument with and without maltose added shows a significant change in fluorescence in the cell population. A method for screening out active biosensors, schematized with ideal sorting efficiency: (B) the initial library is sorted for high fluorescence when ligand is present, (C) the sorted cells are grown up and sorted for low fluorescence when no ligand is added, and finally (D) the sorted cells are grown up, maltose is added again and they are sorted for high fluorescence. The alternating application of positive and negative screens can continue to enrich a library, but we started with three rounds of sorting. After the final round, (E) individual clones are tested in a microwell format for sensing behavior.

To develop our new method of biosensor production, we turned to maltose-binding protein (MBP) as a model LBD. MBP is an ideal candidate for this purpose; it is extremely well studied and, more importantly, has already been used to construct a single-FP biosensor (Marvin et al. 2011, Medintz & Deschamps 2006). The previously published maltose sensor serves as a good positive control and as a benchmark. MBP also undergoes a large conformational change upon ligand binding (Evenäs et al. 2001, Sharff et al. 1992), which is thought to provide many sites that would readily alter the chromophore environment of an attached cpGFP. It is important to consider the ligand of interest, as well, and whether it can efficiently reach the biosensor. Maltose, the cognate ligand of MBP, is quickly taken up by E. coli, but not metabolized significantly if in a rich medium, and thus the intracellular concentration can be easily manipulated. These features make MBP an optimal proof-of-concept target LBD. The biosensor construction and screening methods can be tested and developed around MBP and then applied to more diverse targets to evaluate or improve robustness.

3.2 Materials and Methods

3.2.1 Strains and Media The bacterial strain for cloning was 10G E. coli, electrocompetent, from Lucigen Corp. Chemically competent DH5α E. coli were also used, made by standard calcium chloride techniques. The expression strain was BL21 (DE3) E. coli, electrocompetent, also from Lucigen. Recovery Medium for electroporation transformations was provided in the electrocompetent kits. All other growth, unless otherwise noted, was done in Lysogeny Broth (LB Broth MILLER, EMD Millipore) liquid medium or on Lysogeny Broth 1.5% w/v agar media plates (Bacto Agar, Becton Dickinson and Company). Media and all enzymatic reactions were prepared with purified water from a Barnstead Purification System (Thermo Fisher Scientific Inc.).

3.2.2 DNA manipulation All standard restriction enzymes used were FastDigest Enzymes from Thermo Fisher Scientific Inc. All Type IIs restriction enzymes used in Golden Gate reactions were obtained from NEB (New England Biolabs). Plasmids were isolated from bacterial cultures with a QIAprep Spin Miniprep Kit or a HiSpeed Plasmid Midi Kit (Qiagen). DNA was isolated from agarose gels using a Zymoclean Gel DNA Recovery Kit (Zymo Research Corp.). DNA concentrations were quantified with a NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific Inc.).

Electroporation of competent E. coli was carried out using a BTX ECM 630 Exponential Decay Wave Electroporation System (Harvard Apparatus, Inc.). Settings for all electroporations were: 50 μF, 150 Ω, and 1.5 kV. Gene Pulser cuvettes (Bio-Rad Laboratories, Inc., cat# 165-2089), 0.1 cm gap, were used with the instrument. All PCR reactions for cloning used Phusion High-Fidelity Polymerase (Thermo Fisher Scientific Inc.) with the manufacturer’s recommended conditions. Annealing temperatures were calculated with the Thermo Scientific “Tm calculator” that is specific for Phusion reactions. PCR products were either gel purified or, if an analytic gel showed a single band, treated with DpnI to digest template DNA and purified with a PCR cleanup kit.

Golden Gate cloning (Engler et al. 2008) was used for creating plasmids and assembling genes into destination vectors. Briefly, 20 fmoles of each DNA component was mixed with 5 units Type IIs enzyme, 400 units T4 DNA Ligase (NEB), and 1X T4 DNA Ligase Reaction Buffer (NEB) in a total volume of 10 μL. The reaction was incubated 2 minutes at 45°C, 5 minutes at 16°C (first two steps cycled 25 times), 20 minutes at 60°C, and 20 minutes at 80°C. Either 5 or 10 μL of the Golden Gate reaction was used to transform 50 or 100 μL of chemically competent cloning cells, respectively. Proper cloning was detected by plating on LB agar with an appropriate antibiotic or 500 μM IPTG (Amresco, Inc.) and 25 μg/mL X-Gal (Gold Biotechnology, Inc.) for blue/white screening.
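Because the Golden Gate reactions are specified in femtomoles, each DNA component has to be converted to a mass before pipetting. The Python sketch below shows one way to do that conversion. It assumes an average mass of about 650 g/mol per base pair of double-stranded DNA, which is a common approximation and not a value given in this work; the function name is hypothetical. The 3432 bp length used in the example is the pETM-CHis-MBP plasmid length stated in Section 3.3.1.

```python
# Assumed average molar mass of one base pair of double-stranded DNA (g/mol).
AVG_BP_MASS_G_PER_MOL = 650.0


def fmol_to_ng(fmol: float, length_bp: int) -> float:
    """Approximate mass (ng) of `fmol` femtomoles of dsDNA that is `length_bp` long."""
    grams = fmol * 1e-15 * length_bp * AVG_BP_MASS_G_PER_MOL
    return grams * 1e9  # grams to nanograms


# Example: 20 fmol of a 3432 bp plasmid.
plasmid_ng = fmol_to_ng(20, 3432)
print(round(plasmid_ng, 1))
```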

3.2.3 Cloning Minimal Expression Plasmid The plasmid pET14GG-CHis-Dest served as a template to create the new minimal expression plasmid pETM-CHis-Dest, removing the rop gene and sections of unnecessary sequence. Two sections of pET14GG-CHis-Dest were amplified with primer sets 1/2 and 3/4 (Table 3-1). Section 1 contained the plasmid origin and AmpR resistance gene. Section 2 contained the T7 expression cassette, including the GG BsaI cloning site. The two fragments were assembled together with Golden Gate cloning using the enzyme BsmBI. The BsaI GG cut sites of the new plasmid are still GTCC and AGCG (5’->3’ in direction of ORF).

3.2.4 Maltose-Binding Protein Cloning The Maltose-Binding Protein (MBP) template sequence was derived from the E. coli genome (encoded by malE). The template lacks the periplasmic signal sequence (first 26 amino acids) and has a point mutation (MBP Gly19, i.e. malE Gly45, GGT->GGA) that eliminated a BsaI recognition site; this mutation was introduced with a QuikChange II Site-Directed Mutagenesis Kit (Agilent Technologies, Inc.). The MBP coding sequence was amplified from the template sequence with primers 5 and 6 (Table 3-1) and assembled into pET14GG-CHis-Dest or pETM-CHis-Dest with BsaI Golden Gate cloning.

3.2.5 Excising Transposon DNA from Propagation Plasmid The BsaI-M1-CmR-1 transposon was encoded in a propagation plasmid, pUCGG-KanR-BsaI-M1-CmR-1, so that it could be excised with restriction enzymes BglII and HindIII. Approximately 2 μL of each restriction enzyme was used to digest 5 μg of the propagation plasmid over 3 hours at 37°C. Transposon DNA was separated from the plasmid backbone by agarose-gel electrophoresis; the transposon is 1254 basepairs while the backbone is 2086 basepairs. The desired smaller band was cut out and extracted. The isolated transposon was eluted with Buffer EB (Qiagen, 10 mM Tris-Cl, pH 8.5) because the Zymo elution buffer contains EDTA, which can inhibit downstream reactions.

3.2.6 Transposition Reactions MuA Transposase enzyme was purchased from Thermo Fisher Scientific Inc. (Cat. #F-750). Transposition reactions were a total volume of 20 μL with the following components: 4 μL 5X MuA reaction buffer, 100 ng modified transposon (purified from transposon-propagation plasmid), target plasmid DNA (~0.5 molar ratio to transposon DNA), and 1 μL 0.22 μg/μL MuA transposase enzyme. Reactions were carried out for 18 hours at 30°C followed by 10 minutes at 75°C to heat inactivate MuA. Completed reactions were cleaned up with a DNA Clean & Concentrator-5 Kit (Zymo Research Corp.) using 40 μL of DNA Binding Buffer for sample preparation and 6 μL of water for elution.
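The amount of target plasmid implied by the molar ratio can be worked out with a short Python sketch. It reads "~0.5 molar ratio" as moles of target plasmid per mole of transposon, and it assumes both DNAs have the same average mass per base pair, so that mass scales with length; both are assumptions of this example. The function name is hypothetical, and the 3432 bp target length is the pETM-CHis-MBP length given in Section 3.3.1.

```python
def target_mass_ng(transposon_ng: float, transposon_bp: int,
                   target_bp: int, molar_ratio: float) -> float:
    """Mass (ng) of target plasmid giving `molar_ratio` moles per mole of transposon.

    Assumes equal average mass per base pair for both DNAs, so that
    moles are proportional to mass divided by length.
    """
    return molar_ratio * transposon_ng * target_bp / transposon_bp


# 100 ng of the 1254 bp transposon with a 3432 bp target at a 0.5 molar ratio.
needed_ng = target_mass_ng(100, 1254, 3432, 0.5)
print(round(needed_ng, 1))
```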

2 μL of cleaned-up transposition reaction was transformed into 25 μL of 10G E. coli using electroporation. The transformation was recovered with 975 μL of Recovery Medium at 37°C, 250 rpm shaking for 1 hour. An aliquot of the recovery culture was spread on an LB-agar plate with appropriate antibiotics to assess reaction efficiency (25 μg/mL chloramphenicol to select for transposon insertion and 50 μg/mL carbenicillin to select for the target-plasmid backbone). Comparing the volume that was plated to the total recovery volume, a total theoretical colony-forming unit (CFU) value was calculated to estimate the total number of cells containing a plasmid with a successful insertion.
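The theoretical CFU calculation described above is a simple scaling of the plate count by the fraction of the recovery culture that was plated. A minimal Python sketch follows; the colony count and plated volume are hypothetical, and the total volume is taken as the sum of the DNA, cell, and Recovery Medium volumes listed above.

```python
def total_cfu(colonies: int, plated_ul: float, recovery_ul: float) -> float:
    """Scale a plate count up to the whole recovery culture."""
    if plated_ul <= 0 or plated_ul > recovery_ul:
        raise ValueError("plated volume must be positive and no larger than the total")
    return colonies * recovery_ul / plated_ul


# Total recovery volume: 2 uL DNA + 25 uL cells + 975 uL Recovery Medium.
recovery_volume_ul = 2 + 25 + 975

# Hypothetical plate: 150 colonies from 1 uL of recovery culture.
estimated_cfu = total_cfu(150, 1.0, recovery_volume_ul)
print(estimated_cfu)
```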

The recovery culture was then centrifuged at 4000 g for 5 minutes to pellet the cells and remove supernatant. Cells were resuspended in 50 mL LB with 25 μg/mL chloramphenicol and 50 μg/mL carbenicillin to select for transposon insertions and the target-plasmid backbone, respectively. The culture was grown overnight at 37°C, 250 rpm. A midi-prep was carried out on the culture the following day to harvest the plasmid library containing transposon inserts.

3.2.7 Cloning cpEGFP into Transposon Insertion Libraries Pools of transposon-inserted plasmids were purified from no-insert plasmid by agarose-gel electrophoresis. Approximately 5 μg of supercoiled plasmid DNA was added to extra wide wells in an agarose gel and run at low voltage (~3 V/cm) until good resolution was achieved. The major transposon-insertion-library band was cut out and the DNA extracted, leaving the smaller no-insert plasmid band.

The circularly permuted enhanced green fluorescent protein (cpEGFP) gene was amplified from the plasmid pETM-CHis-cpEGFP. This construct was built from standard EGFP to match the amino acid sequence of cpEGFP used in the previously published maltose sensor (Marvin et al. 2011). The cpEGFP was amplified with primers 7 and 8 by PCR (Table 3-1). Example primers for adding codons between the cpEGFP and cloning sites are shown as primers 9 and 10.

A Golden Gate reaction was used to exchange the modified transposon, harbored within each plasmid of the library, with the PCR amplified cpEGFP. Briefly, 40 fmoles of each DNA component was mixed with 10 units BsaI (NEB), 800 units T4 DNA Ligase (NEB), and 1X T4 DNA Ligase Reaction Buffer (NEB) in a total volume of 20 μL. The reaction was incubated 2 minutes at 45°C, 5 minutes at 16°C (first two steps were cycled 50 times), 20 minutes at 60°C, and 20 minutes at 80°C. The completed reaction was purified and concentrated with a PCR cleanup kit, eluted with 6 μL water.

2 μL of cleaned-up reaction was transformed into 25 μL of BL21 (DE3) E. coli using electroporation. The transformation was recovered with 975 μL of Recovery Medium at 37°C, 250 rpm shaking for 1 hour. An aliquot of the recovery culture was spread on different LB-agar plates with appropriate antibiotics to assess efficiency. A plate with 50 μg/mL carbenicillin and 1% w/v glucose was used to select for the target-plasmid backbone and repress protein expression, respectively, to determine the number of transformed cells. With the observed CFU value, comparing the volume that was plated to the total recovery volume allowed the total theoretical CFU value to be calculated. A second plate with 25 μg/mL chloramphenicol was used to select for CmR-transposon inserts to determine the amount of contaminating un-cut plasmid (that still contains the modified transposon).

The recovery culture was then centrifuged at 4000 g for 5 minutes to pellet the cells and remove supernatant. Cells were resuspended in 6 mL LB with 50 μg/mL carbenicillin and 1% w/v glucose. Glucose is added to repress any protein expression in BL21 (DE3) during growth. The culture was grown overnight at 37°C, 250 rpm. The following day, an aliquot was used to make a frozen 15% v/v glycerol stock (stored at -80°C). If analysis or sorting was to be carried out that day, an aliquot was also used to inoculate a fresh culture. The remaining culture was mini-prepped to harvest the plasmid library.

3.2.8 PCR Analysis of Insertion Libraries PCR was used to detect the insertion position of either the BsaI-M1-CmR-1 transposon or cpEGFP. In combination with insert-specific primers (primer 11 for transposon, primer 12 for cpEGFP, Table 3-1), a forward primer which anneals upstream of the target gene (primer 13) will produce a PCR product if the insert is coding in the same direction. Another reverse primer downstream of the target gene (primer 14) produces a PCR product when the insert is coding in the reverse direction of the target gene. Thus, for example, the primer set 11/13 used with a transposon insertion library will amplify any transposon inserted in the forward direction (relative to the target gene) and produce a different fragment length for each insert position. Using the primer set 11/14 on the same library will detect reverse-oriented insertions.
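The primer logic above can be summarized in a small Python sketch: the orientation of the insert decides which backbone primer gives a product, and the product length grows linearly with the insertion position. The primer numbers follow Table 3-1, but the coordinates in the example are hypothetical, as are the function names and the simple length model.

```python
def detection_primers(insert_primer: int, orientation: str):
    """Return the (insert primer, backbone primer) pair that yields a product."""
    backbone_primer = {"forward": 13, "reverse": 14}[orientation]
    return (insert_primer, backbone_primer)


def forward_product_length(insert_pos: int, upstream_primer_pos: int,
                           primer_offset_in_insert: int) -> int:
    """Length of the product for a forward-oriented insert.

    insert_pos: plasmid coordinate where the insert begins (hypothetical)
    upstream_primer_pos: coordinate of the 5' end of the upstream primer (hypothetical)
    primer_offset_in_insert: distance from the start of the insert to the
        5' end of the insert-specific primer (hypothetical)
    """
    if insert_pos < upstream_primer_pos:
        raise ValueError("insert lies upstream of the backbone primer")
    return (insert_pos - upstream_primer_pos) + primer_offset_in_insert


# Two hypothetical forward insertions one codon apart differ by 3 bp in product size.
first = forward_product_length(1200, 1000, 150)
second = forward_product_length(1203, 1000, 150)
print(detection_primers(11, "forward"), first, second)
```

Because each insertion position gives a distinct fragment length, a library with many insertion sites is expected to appear as a continuous range of band sizes on a gel.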

Taq DNA polymerase (NEB) was used for PCR analysis of libraries because Phusion polymerase was found to preferentially amplify certain bands over others and to promote more mis-priming. Reactions were as follows: 2.5 μL Thermopol reaction buffer (NEB), 0.5 μL 10mM dNTPs (NEB), 0.2 μL 25 μM insertion primer, 0.2 μL 25 μM backbone primer, 0.125 μL Taq DNA polymerase (NEB), 1 μL template DNA, and water to 25 μL. Standard Taq thermocycler conditions were used with an annealing temperature of 54°C for both forward and reverse detection reactions. Completed reactions were analyzed with agarose-gel electrophoresis to visualize the distribution of band lengths.

3.2.9 Cloning MBP-170 Library A Golden Gate destination plasmid was created that allowed cloning directly into the MBP-170 site. Primers 15 and 16 (Table 3-1) were used to amplify all of pETM-CHis-MBP by PCR, adding BsmBI cloning sites at the 170 amino acid location. A lacZ-α fragment was amplified with primers 17 and 18, adding compatible BsmBI sites on the ends. These two pieces were assembled with standard BsmBI GG cloning and correct assembly detected by blue/white screening. This assembled plasmid, pETM-MBP170-Dest, has a BsaI GG-compatible cloning site between amino acid 169 and 170 of MBP. The sequences of the BsaI cut sites (GTTC and AAGT, 5’->3’ in direction of ORF) encode amino acid residues originally in the ORF of MBP and thus produce no scar site after cloning.

The cpEGFP gene was amplified from pETM-CHis-cpEGFP with primers that are compatible for cloning into pETM-MBP170-Dest. A set of forward (19-23) and reverse (24-28) primers were designed that added zero through four random codons between cpEGFP and the cloning cut site (Table 3-1). These random codons create random linkers between MBP and cpEGFP once cloned. A standard Phusion PCR was carried out but with an equal mixture of each primer.
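The random codons in primers 20-23 and 25-28 (Table 3-1) are written with the degenerate bases VST and ASB. The Python sketch below expands these codes using the standard IUPAC nucleotide definitions and the standard genetic code to show which amino acids a linker codon can encode and how many linker combinations the primer mixture can produce. The helper names are hypothetical, and the interpretation that ASB on the reverse primers corresponds to VST on the coding strand is inferred from the primer sequences.

```python
from itertools import product

# IUPAC degenerate base codes used in the primers.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "V": "ACG", "S": "CG", "B": "CGT", "N": "ACGT"}

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def expand(degenerate: str):
    """List every concrete sequence matching a degenerate sequence."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate))]


def reverse_complement(seq: str) -> str:
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


vst_codons = expand("VST")
vst_amino_acids = sorted({CODON_TABLE[c] for c in vst_codons})
# ASB sits on the reverse primer, so read it on the coding strand.
asb_on_coding_strand = sorted(reverse_complement(c) for c in expand("ASB"))

# Zero through four random codons on one side, then both sides combined.
variants_per_side = sum(len(vst_codons) ** n for n in range(5))
linker_combinations = variants_per_side ** 2

print(vst_codons, vst_amino_acids)
print(variants_per_side, linker_combinations)
```

Under these assumptions each VST codon encodes one of six small or flexible residues (Ala, Gly, Pro, Arg, Ser, Thr), with no stop codons.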

A Golden Gate reaction was used to assemble cpEGFP with random linkers into pETM-MBP170-Dest. Briefly, 40 fmoles of amplified insert and 40 fmoles of pETM-MBP-170 were mixed with 10 units BsaI (NEB), 800 units T4 DNA Ligase (NEB), and 1X T4 DNA Ligase Reaction Buffer (NEB) in a total volume of 20 μL. The reaction was incubated 2 minutes at 45°C, 5 minutes at 16°C (first two steps were cycled 50 times), 20 minutes at 60°C, and 20 minutes at 80°C. The completed reaction was purified and concentrated with a PCR cleanup kit, and eluted with 6 μL water.

2 μL of cleaned-up reaction was transformed into 25 μL of BL21 (DE3) E. coli using electroporation. The transformation was recovered with 975 μL of Recovery Medium at 37°C, 250 rpm shaking for 1 hour. An aliquot of the recovery culture was spread on different LB-agar plates. A plate with 50 μg/mL carbenicillin and 1% w/v glucose was used to repress protein expression and provide colonies to test in 96-well plates. A second plate with 50 μg/mL carbenicillin, 500 μM IPTG, and 25 μg/mL X-Gal was used to assess GG reaction efficiency by blue/white screening.

3.2.10 Fluorescence-Activated Cell Sorting (FACS) Samples for FACS analysis and sorting were inoculated from an overnight culture of BL21 (DE3) with a control plasmid or a plasmid library. The overnight culture was 5 mL LB with 50 μg/mL carbenicillin and 1% w/v glucose, inoculated from a plated colony or from a frozen glycerol library stock. 200 μL of overnight culture was added to 5 mL LB with 50 μg/mL carbenicillin and 1% w/v glucose (glucose represses protein expression before induction) in 14 mL polypropylene test tubes (Falcon brand, Corning Inc.). Cultures were grown at 37°C, 250 rpm until the optical density at 600 nm (OD600) reached approximately 0.6 (around 2 hours). At that point, IPTG was added to a final concentration of 0.4 mM. The cultures were then incubated at 30°C, 300 rpm. After 3 hours, maltose was added to 1 mM for positive-sort samples and no substrate was added for negative-sort samples. All cultures were returned to 30°C, 300 rpm for 30 minutes. Cultures were then put on ice until sorting. Cultures were diluted 1:100 in ice-cold PBS with 1 mM maltose for positive screening or ice-cold PBS only for negative screening. Samples were sorted and analyzed with a BD Influx cell sorter (BD Biosciences) at the UC Berkeley Flow Cytometry Core Facility. Cells were measured for GFP signal with a 488 nm laser and 530/40 filter (setting FL 1). Greater than 10^5 cells were collected in each sort. Cells were sorted into 5 mL LB medium with 100 μg/mL carbenicillin and 1% w/v glucose. Cultures with sorted cells were kept on ice until all samples were finished. Cultures were then grown at 37°C, 250 rpm. After 30 minutes, an aliquot was plated on LB agar with 100 μg/mL carbenicillin and 1% w/v glucose to verify cell viability after sorting. The remaining culture was grown overnight. The following day, an aliquot was used to make a frozen 15% v/v glycerol stock (stored at -80°C). If analysis or another round of sorting was to be carried out that day, an aliquot was also used to inoculate a fresh culture. The remaining culture was mini-prepped to harvest the plasmid library.

3.2.11 Microplate Biosensor Testing Starter cultures were inoculated in standard round-bottom polypropylene 96-well plates (Product #3359, Corning Inc.). Single BL21 (DE3) colonies were used to inoculate 150 μL LB with 1% w/v glucose, 10% v/v glycerol, and 100 μg/mL carbenicillin. The plates were sealed with a white rayon breathable seal (Nunc brand, Thermo Fisher Scientific Inc.) and grown overnight at 37°C, 750rpm (3 mm throw). The next day, the overnight culture was used to inoculate a fresh, expression culture plate. The remaining culture was archived; the plate was sealed with an aluminum foil seal (VWR International) and stored at -80°C.

An auto-induction medium was used for biosensor expression and testing in the microwell format. Another standard 96-well plate was filled with 150 μL ZYM-5052 auto-induction medium in each well (Studier 2005). Each well was inoculated with 2 μL of culture from the 96-well starter culture plate. The expression plate was sealed with a breathable seal and incubated at 30°C, 750 rpm (3 mm throw) overnight to induce protein production. The following day, 100 μL from each well was transferred to a standard black polystyrene round-bottom 96-well plate (Product #3792, Corning Inc.). Fluorescence measurements were made with a Tecan M1000 microplate reader: 488/5 nm excitation, 520/20 nm emission. After the initial pre-substrate-addition measurement, maltose was added to each sample for a final concentration of 1 mM. The plate was left shaking for 10 minutes on the plate reader before taking a second, post-substrate-addition, reading.

3.2.12 Single-Cell Measurements on Sony SH800 Fluorescence measurements were made on a single-cell level with a Sony SH800 FACS instrument run in analyzer (flow cytometry) mode. To reduce background fluorescence signal from the LB medium, MOPS EZ Rich Defined Medium (kit from Teknova) was used for cell growth and protein expression. Single BL21 (DE3) colonies were used to inoculate starter cultures of 3 mL MOPS EZ with 0.2% v/v glycerol and 100 μg/mL carbenicillin. The cultures were grown overnight at 37°C, 250 rpm. The following day, a 5 mL culture of MOPS EZ with 0.2% v/v glycerol and 100 μg/mL carbenicillin was inoculated with 100 to 200 μL of starter culture and grown at 37°C, 250 rpm until an OD600 of 0.6 was reached. At that point, IPTG was added to a final concentration of 0.4 mM. The cultures were then incubated at 30°C, 300 rpm for 3.5 hours. Cultures were removed and split into two test tubes, 2 mL in each. One tube was a no-substrate sample. The other was a plus-substrate sample, to which maltose was added to 1 mM. All cultures were returned to 30°C, 300 rpm for 30 minutes. Cultures were then put on ice until analysis. No-substrate samples were diluted 1:100 in ice-cold PBS and plus-substrate samples were diluted 1:100 in ice-cold PBS with 1 mM maltose. Samples were measured with the GFP laser and filter set on the SH800, collecting 50,000 measurements of each.

3.2.13 In Vitro Biosensor Analysis Protein was expressed with plasmid pETM-CHis-MBP170-cpEGFP-Mal2B2 (B2 biosensor construct) or pETM-CHis-cpEGFP (fluorescent control) in BL21 (DE3) cells. The proteins were C-terminally 6xHis-tagged and purified with Ni-NTA columns (Qiagen). Purified protein was buffer exchanged and concentrated in phosphate buffered saline (PBS), pH 7.4. The protein was diluted to ~10 μM in PBS. In a standard 384-well plate, 20 μL of 2x ligand (in PBS) was added to 20 μL of protein. No-ligand measurements were 20 μL PBS added to 20 μL of protein. Excitation/emission 485/515 nm was measured on a plate reader after PBS (with or without ligand) addition. The negative control (labeled PBS in Figure 3-11) had no protein in the sample. ΔF/F was calculated as (F_{ligand} - F_{no-ligand}) / F_{no-ligand}.
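As a worked illustration of the ΔF/F calculation, the short Python sketch below applies the formula to a pair of readings. The function name and the example fluorescence values are hypothetical and are not data from this work.

```python
def delta_f_over_f(f_ligand: float, f_no_ligand: float) -> float:
    """Fractional fluorescence change: (F_ligand - F_no-ligand) / F_no-ligand."""
    if f_no_ligand == 0:
        raise ValueError("no-ligand fluorescence must be non-zero")
    return (f_ligand - f_no_ligand) / f_no_ligand


# Hypothetical readings: fluorescence triples when ligand is added.
example_change = delta_f_over_f(300.0, 100.0)
print(example_change)
```

A value of 0 means no response to ligand, and a negative value means the construct dims when ligand is added.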

Table 3-1: List of DNA primers
Lower case letters indicate bases that anneal to the template.

#   Name                   Sequence (5’->3’)
1   pETM-section-1-F       ACACACCCGTCTCACAAggtttcttagacgtcaggtggcac
2   pETM-section-1-R       ACACACCCGTCTCAGATggcgctcttccgcttcc
3   pETM-section-2-F       CACACCACGTCTCACATcagcaaccgcacctgtgg
4   pETM-section-2-R       CACACCACGTCTCACTTgactacgcgatcatggcgac
5   MBP_GG_F               CACACCAGGTCTCAGTCCaaaatcgaagaaggtaaactggtaatctgg
6   MBP_GG_R               CACACCAGGTCTCACGCTcttggtgatacgagtctgcgc
7   cpEGFP-MuBsaI-F_1      CACACCAGGTCTCACATCTtataacgtctttatcatggccgacaagc
8   cpEGFP-MuBsaI-R_2      CACACCAGGTCTCTACGCgttgaagttatattcaagcttatggcccag
9   cpEGFP-MuBsaI-F_1.x    CACACCAGGTCTCACATCTNNNtataacgtctttatcatggccgacaagc
10  cpEGFP-MuBsaI-R_2.x    CACACCAGGTCTCTACGCNNNgttgaagttatattcaagcttatggcccag
11  Transposon_detect_rev  cagtttgctcaggctctcc
12  cpGFP_detect_neg       cttgatgccgttcttctgc
13  Forward_detect_ori     acctctgacttgagcgtcg
14  Reverse_detect_ampr    cacggaaatgttgaatactcatactc
15  170_insert_site_gg_F   CACACCACGTCTCTaagtatgaaaacggcaagtacgacattaaagac
16  170_insert_site_gg_R   CACACCACGTCTCAgaacgcataacccccgtcag
17  lacZ__DO_170mbp_F      CACACCACGTCTCAGTTCAGAGACcattaatgcagctggcacgac
18  lacZ__DO_170mbp_R      CACACCACGTCTCAACTTAGAGACcagcttgtctgtaagcggatgc
19  cpEGFP-MBP170-F        CACACCAGGTCTCAGTTCtataacgtctttatcatggccgacaagc
20  cpEGFP-MBP170-F_pls1   CACACCAGGTCTCAGTTCVSTtataacgtctttatcatggccgacaagc
21  cpEGFP-MBP170-F_pls2   CACACCAGGTCTCAGTTCVSTVSTtataacgtctttatcatggccgacaagc
22  cpEGFP-MBP170-F_pls3   CACACCAGGTCTCAGTTCVSTVSTVSTtataacgtctttatcatggccgacaagc
23  cpEGFP-MBP170-F_pls4   CACACCAGGTCTCAGTTCVSTVSTVSTVSTtataacgtctttatcatggccgacaagc
24  cpEGFP-MBP170-R        CACACCAGGTCTCTACTTgttgaagttatattcaagcttatggcccag
25  cpEGFP-MBP170-R_pls1   CACACCAGGTCTCTACTTASBgttgaagttatattcaagcttatggcccag
26  cpEGFP-MBP170-R_pls2   CACACCAGGTCTCTACTTASBASBgttgaagttatattcaagcttatggcccag
27  cpEGFP-MBP170-R_pls3   CACACCAGGTCTCTACTTASBASBASBgttgaagttatattcaagcttatggcccag
28  cpEGFP-MBP170-R_pls4   CACACCAGGTCTCTACTTASBASBASBASBgttgaagttatattcaagcttatggcccag

3.3 Results

3.3.1 Construction of a MBP-cpEGFP Library A modified transposon, with built-in restriction sites, was used to create a library of MBP-cpEGFP fusion constructs (Figure 3-4). The transposon, BsaI-M1-CmR-1, has mutations that add BsaI restriction sites at the very end of its sequence that allow for cloning at the random insertion site with minimal scarring (see Chapter 2 for details). A transposition reaction was carried out with BsaI-M1-CmR-1 and the plasmid pET14GG-CHis-MBP, an expression plasmid carrying the MBP gene (Figure 3-4A). Initial analysis of this transposon insertion library found that inserts were heavily favored in the regulatory rop gene that helps control the copy-number of the plasmid (data not shown). We hypothesized that disrupting the rop gene increased the copy-number of the plasmid, providing increased amounts of antibiotic-resistance enzyme and thus improving the host cell’s growth rate over other cells. To combat this selective pressure, we built a new expression plasmid for MBP that removes the rop gene and only keeps necessary sequence elements. The new minimal pET plasmid, pETM-CHis-MBP, was used in a transposition reaction with BsaI-M1-CmR-1 to create a new insertion library. The transformed library produced greater than 10^5 transformants, much greater than the theoretical maximum number of possible insertion positions (plasmid length x two insertion directions: 3432 x 2 = 6864).
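The comparison between the number of transformants and the number of possible insertion positions can be made more concrete with a short Python sketch. It assumes that every position and direction is equally likely and that insertions are independent, which is a simplification that real transposition reactions may not satisfy; under that assumption the chance that a given position is sampled at least once follows a Poisson estimate. The function names are hypothetical.

```python
import math


def insertion_positions(plasmid_bp: int, directions: int = 2) -> int:
    """Number of distinct insertion positions, counting both directions."""
    return plasmid_bp * directions


def expected_coverage(transformants: float, positions: int) -> float:
    """Poisson estimate of the fraction of positions sampled at least once.

    Assumes uniform, independent insertions (a simplifying assumption).
    """
    return 1.0 - math.exp(-transformants / positions)


positions = insertion_positions(3432)
coverage = expected_coverage(1e5, positions)
print(positions, coverage)
```

With roughly 10^5 transformants the average position is sampled more than ten times, so under these assumptions essentially every position is expected to be represented.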

[Figure 3-4: schematic with panels A-D; labels: transposition, insertion library, BsaI restriction sites, cpEGFP cloning, cpEGFP, expression]

Figure 3-4: Schematic for the production of a cpEGFP-insertion library (A) A modified transposon inserts into random positions throughout a plasmid with a gene of interest (MBP in this case). (B) Each plasmid has a cloning site introduced by the transposon at a random location. A cpEGFP is cloned into that random site. (C) Each plasmid has a cpEGFP inserted at a random location. (D) When expressed, plasmids with in-frame cpEGFP insertions produce fusion proteins of MBP attached to cpEGFP, which could be possible maltose sensors.

Circularly permuted EGFP was cloned into the transposon insertion library using Golden Gate cloning (Figure 3-4B). The resulting plasmid library of cpEGFP insertions into MBP represents all possible fusion topologies at the protein level (Figure 3-4C and D). Exchange of the BsaI-M1-CmR-1 transposon with cpEGFP, if in frame with MBP, results in the two proteins being connected by an alanine-serine linker at both connections, along with a duplication of two amino acids from the MBP sequence (Figure 3-5). Transformation of the completed cloning reaction into expression cells resulted in greater than 10^7 successful transformants, again much larger than the theoretical maximum library size.

[Figure 3-5: sequence schematic, top to bottom (top strands shown).
Target gene: ...N1N2N3N4N5..., encoding AA1 AA2.
After transposition: ...N1N2N3N4N5TGCATCGGAGACC [CmR transposon] GGTCTCCGCGTCAN1N2N3N4N5..., with a BsaI site at each transposon end.
PCR-amplified cpEGFP: CACACCAGGTCTCACATCTTATAACGTC [cpEGFP] AACTTCAACGCGTTGAGACCTGGTGTG, with a BsaI site at each end; the cpEGFP codons shown encode Tyr Asn Val (labeled 148 149 150) and Asn Phe Asn (labeled 144 145 156).
After the Golden Gate reaction: ...N1N2N3N4N5TGCATCTTATAACGTC [cpEGFP] AACTTCAACGCGTCAN1N2N3N4N5N6..., encoding AA1 AA2 Ala Ser Tyr Asn Val ... Asn Phe Asn Ala Ser AA1 AA2.]

Figure 3-5: Sequence details of transposition and cloning strategy. The modified transposon (magenta and yellow) inserts into a random location within MBP (blue), causing a duplication of five basepairs. cpEGFP that is PCR amplified with compatible primer ends (green, with bases from primers shown in bold) is Golden Gate cloned into the BsaI sites of the modified transposon. The final construct, if in-frame, has alanine-serine linkers at both ends and two MBP amino acids duplicated.

The randomness of the insertion libraries was assessed by PCR of the purified plasmid pool (Figure 3-6). Primers that anneal to the inserting sequence and to the backbone will produce PCR products of different sizes for each insertion site (Figure 3-6A and B). In addition, by choosing the correct relative directions of the primers, insertions of a specific direction can be detected. For example, we used a set of primers that showed us where insertions occurred that were in the same direction as the MBP gene (the only direction that would result in in-frame domain insertions). For both the transposon insertion library and the subsequent cpEGFP insertion library, insertions were seen randomly throughout the sequence (Figure 3-6C). This indicates the final MBP-cpEGFP library should contain all possible in-frame fusions of the two proteins. It will also include out-of-frame and reverse insertions of cpEGFP, but with proper screening the most functional domain connection can be isolated.

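[Figure 3-6: (A) primer schematic for a transposon insertion in MBP; (B) primer schematic for a cpEGFP insertion in MBP; (C) agarose gel with lanes labeled "transposon insert" and "cpEGFP cloned"; ladder sizes in bp: 20000, 10000, 7000, 5000, 4000, 3000, 2000, 1500, 1000, 700, 500, 400, 300, 200; the MBP range is marked.]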

Figure 3-6: PCR analysis of insertion library diversity Schematic is shown for primer design for (A) a transposon insertion library and (B) a cpEGFP insertion library. In both cases, one primer is directed at the backbone upstream of the gene of interest (shown in blue) and the other primer is directed at the insertion sequence. The primer anneals to the coding strand of the insertion such that the PCR extends towards the upstream primer only when the insertion is in the same direction as the gene of interest. Different insertion locations will produce different size fragments. (C) Agarose-gel electrophoresis analysis of resulting PCR using primer schemes A and B for the transposon insertion library (middle 3 lanes) and the cpEGFP insertion library (rightmost lane), respectively. The range of fragment lengths corresponding to insertions within MBP is marked on the right (same range for both library types). Smearing of DNA in a lane represents random insertions.

3.3.2 FACS Enrichment of Biosensors from a Library The assembled MBP-cpEGFP insertion library, expressed in BL21 (DE3) E. coli cells, was subjected to multiple rounds of FACS to enrich for constructs with sensor behavior (Figure 3-7). Following the strategy outlined in Figure 3-3, three rounds of sorting, alternating between positive and negative screening, were carried out. In the first positive sort, with maltose present, cells with fluorescence above the negative control (MBP only) were sorted, making up approximately 1% of the initial library population (Figure 3-7A). This low percentage is expected because, as discussed previously, most of the insertions should be unproductive (out-of-frame, reverse orientation, or outside of the MBP ORF). It was anticipated that this first sort would enrich for in-frame insertions, since unproductive insertions should not show any fluorescence. Unlike in the theoretical sorting strategy, in reality the sorting is not 100% efficient, letting some “off” cells through, leading to two populations when propagated to the next round: a major non-fluorescent population and a minor fluorescent population (Figure 3-7B). The non-fluorescent population most likely also had contaminating non-expressing cells present, based on those seen in the positive control of cpEGFP only, making it the larger population. These sorted cells were propagated without any maltose in the second round and subjected to a negative sort, enriching for low fluorescence (i.e. the off state of a sensor) (Figure 3-7B). It was presumed that an “off” biosensor would still have low fluorescence, thus the gate used to sort was for fluorescence above the negative control (to still avoid unproductive-insertion constructs) but below the minor fluorescent population. This sorting gate represented approximately 30% of the population. The cells from the negative screen were propagated and sorted a final time with maltose added back (Figure 3-7C). Even with a contaminating non-expressing population, an extended tail was seen in the fluorescent population and was sorted. This gate represented approximately 8% of the population. This final sorted population was propagated and spread on agar plates to isolate individual constructs.

Figure 3-7: Fluorescence-activated cell sorting (FACS) of MBP-cpEGFP library. [Panels A (+ maltose), B (− maltose), and C (+ maltose): histograms of normalized cell count versus fluorescence (AFU, log scale).] Fluorescence histograms are shown for three FACS runs. MBP-cpEGFP insertion libraries are shown with a magenta solid line and the control constructs, MBP or cpEGFP, are shown with gray dashed lines. The fluorescence range that was sorted is shown filled in with gray. (A) Sort 1: with substrate (maltose), cells with fluorescence above the background signal (cells expressing MBP only) were sorted. (B) Sort 2: no substrate, the fluorescence range above background but lower than the main fluorescent peak was sorted. (C) Sort 3: with substrate, the high-fluorescence tail was sorted.

3.3.3 Targeted Analysis of Enriched MBP-cpEGFP Constructs

A microwell format for testing the performance of individual sensor constructs was developed (Figure 3-3E). Random colonies obtained from the final round of sorting were picked and grown in 96-well plates. Auto-induction medium was used to express protein in the small-scale format. After induction, a bulk fluorescence measurement was taken of each well. Maltose was then added to a saturating level (1 mM) in each well and bulk fluorescence was measured again. The amount of change in fluorescence was calculated to evaluate the functionality of each potential sensor (Figure 3-8A). Measuring fluorescence with zero and saturating maltose levels can determine the maximum dynamic range of a construct. Of 24 samples initially tested with maltose, over 20% showed significant sensor function. These hits were sequenced to determine the insertion location of cpEGFP within MBP. We found two unique insertion sites and the measured performance of each site clustered together at distinct levels (Figure 3-8A). The two insertion sites found, MBP amino acid residue 170 and residue 182 (MBP170-cpEGFP and MBP182-cpEGFP, respectively), are also in close proximity in the three-dimensional MBP structure (Figure 3-8B). Interestingly, they are both also in the known “hinge” region of MBP around which the two halves of the structure bend (Sharff et al. 1992). This area of the structure would presumably undergo large conformational changes upon ligand binding; previous efforts to make an MBP-cpEGFP sensor even targeted this area for this reason (Marvin et al. 2011). It is also important to note that both active sensors isolated from our screening strategy had an increase in fluorescence when binding maltose, which is the turn-on behavior for which we sorted (i.e. no sensors were isolated that turned off when binding maltose). This shows our iterative sorting screen enriched for insertions within a high-conformational-change region and for constructs that show turn-on sensor behavior.

Figure 3-8: Microplate in vivo analysis of MBP-cpEGFP constructs from sorting. [Panel A, "in vivo measurements": ΔF/F versus sample number, with the clusters for insert site 170 and insert site 182 marked. Panel B: MBP structure with residues 165, 170, 175, and 182 labeled.] Constructs were grown and expressed in 96-well plates. Fluorescence was measured before (F_initial) and after (F_final) addition of maltose. (A) 24 random colonies were chosen and tested, reported in values of ΔF/F, defined as ΔF/F = (F_final − F_initial) / F_initial. A value above zero indicates an increase in fluorescence upon maltose addition, a negative value indicates a decrease, and a zero value indicates no change in fluorescence. Two samples with ΔF/F of ~0.7 were both insertions at residue 182 of MBP. Three samples with ΔF/F of ~2-2.3 were all insertions at residue 170 of MBP. (B) These two residues are highlighted in orange on the MBP crystal structure, with side chains of these residues shown with stick models (PDB 1ANF). Two sites targeted in a previous effort to generate an MBP-cpEGFP maltose sensor are highlighted in magenta (Marvin et al. 2011).
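
As a minimal sketch of the ΔF/F metric used for the microplate screen, the following Python function computes the fractional change in fluorescence from the readings taken before and after maltose addition. The function name, well labels, and example readings are hypothetical and are not measured data.

```python
def delta_f_over_f(f_initial, f_final):
    """Fractional fluorescence change: (F_final - F_initial) / F_initial."""
    if f_initial <= 0:
        raise ValueError("f_initial must be positive")
    return (f_final - f_initial) / f_initial


# Hypothetical bulk readings (before, after maltose), arbitrary units
readings = {
    "well_A1": (1000.0, 3000.0),  # turn-on response
    "well_A2": (1000.0, 1000.0),  # no response
    "well_A3": (1000.0, 500.0),   # turn-off response
}

for well, (before, after) in readings.items():
    print(well, delta_f_over_f(before, after))
```

A positive value corresponds to turn-on behavior, zero to no response, and a negative value to turn-off behavior, matching the interpretation given in the caption of Figure 3-8.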

3.3.4 Linker Optimization of MBP170-cpEGFP

The MBP-cpEGFP library that was screened by FACS did not have any variations in the linkers that attached the two domains (see Figure 3-5), so we built a new linker-variation library with the best insertion site found, MBP170. Previous work has shown that for a functional connection site, testing linker variations can produce a better performing sensor (Akerboom et al. 2012, Hung et al. 2011, Marvin et al. 2011, 2013; Nakai et al. 2001). Thus, we wanted to test how much improvement could be made in MBP170-cpEGFP through linker optimization. The cpEGFP domain was cloned into the 170 site with linker variation, both in length (0 to 4 amino acids) and composition (equal probability of Ala, Arg, Gly, Pro, Ser, or Thr), to see if sensor function (i.e. dynamic range) could be improved. The new library was not sorted to enrich sensor behavior; instead 180 random members of the initial pool were tested to ascertain the effect of linker variation on performance at this site (Figure 3-9A). We found that at site MBP170, even with completely random linkers, over 75% of the constructs still showed significant turn-on sensor behavior with maltose. In addition, a handful had much higher dynamic range than the rest of the library samples. These top hits showed significant improvement over the parent MBP170-cpEGFP construct, as well as up to an order of magnitude greater dynamic range than a previously published MBP165-cpEGFP sensor (Marvin et al. 2011) (Figure 3-9B). This confirms previously published findings that linker composition is critical to optimizing the performance of a fusion biosensor.

Figure 3-9: Microplate testing of MBP170 insertion constructs. [Panel A, "MBP170 Linker Library": ΔF/F versus sample number. Panel B: ΔF/F of selected constructs, n = 5.] Constructs were again measured in 96-well plates. (A) 180 random samples from the MBP170-cpEGFP library were tested for sensor performance against saturating levels of maltose. (B) The top 8 samples from the initial 180 were tested with biological replicates, comparing their performance to the parent construct (isolated from the FACS sorting), a published maltose sensor (EcMBP165-cpGFP.PPYF.T203V, Marvin et al. 2011), cpEGFP control protein, and MBP control protein. Error bars are ±SD (some are small enough to be obscured by the data point circle).

The top hits found from screening the MBP170-insertion linker-variation library were sequenced to determine the amino acid composition of their linkers (Table 3-2). While the parent MBP170-cpEGFP sensor had linkers of three amino acids on both sides (two coded by the transposon and one from MBP sequence duplication, see Figure 3-5), the optimized linkers varied from two to four amino acids on either side. Linkers longer than four amino acids were not present in the library, so it is unknown how they would function. No linkers of zero or one amino acids were seen in the top hits. Beyond that observation, no clear trends were seen in length or composition in the optimized sensors.

Table 3-2: Linker composition of top eight MBP170 constructs. Amino acid sequences of the linkers from the top 8 hits of the MBP170-insertion linker-variation library. The parent sequence, from the MBP170-cpEGFP construct isolated from the FACS screening, is also listed. As illustrated below the table, N-Linker is the amino acid sequence connecting the N-terminal section of MBP to the N-terminus of cpEGFP. Likewise, C-Linker is the sequence connecting the C-terminal section of MBP to the C-terminus of cpEGFP.

Clone     ΔF/F    N-Linker    C-Linker
B2        4.62    AT          SR
B4        1.76    GGGG        SARA
F1        1.54    GRGR        RTP
B7        1.17    SG          AR
A5        0.9     GT          APGP
F5        0.88    RGA         RRAT
G1        0.8     GA          PARP
A12       0.74    ST          TTTP
Parent    0.67    NAS         ASF

Schematic: [MBP 1-169]-[N-Linker]-[cpGFP]-[C-Linker]-[MBP 170-370]

Further testing of MBP170-cpEGFP optimized variants was done with flow cytometry to provide single cell measurements. Samples were grown in a defined MOPS medium to reduce background fluorescence. Fluorescence measurements were taken either with 1 mM maltose or without maltose (Figure 3-10). Of the top five sensors in Table 3-2, two constructs, B2 and F1, had significantly greater dynamic ranges. The change in rank order of the top hits is likely due to the sensitivity of MBP to glucose (data shown below), which was present in the medium in the previous bulk measurements and could have altered fluorescence in the no-maltose conditions. With the improved medium and measurement technique, both B2 and F1 showed an order of magnitude better performance than the previously published maltose sensor (Figure 3-10). Additionally, the absolute fluorescence levels of the MBP170 sensors can span two orders of magnitude (parent construct with no maltose to B2 construct with maltose). This shows that cpEGFP fluorescence can vary greatly across different biosensor constructs and conditions, though it is still unclear what the maximum dynamic range of a single construct could be. Construct B2, when maltose is present, can achieve fluorescence levels similar to that of EGFP, which is theoretically an upper limit for cpEGFP fluorescence. At the low end, the parent MBP170-cpEGFP construct with no maltose is only ~3.5 times more fluorescent than MBP on its own. A test with a GFP variant that has a chromophore always in a protonated state (Figure 3-1B) would be needed to establish a theoretical lower limit of cpEGFP fluorescence.

Figure 3-10: Flow cytometry measurements of select MBP170 constructs. [Histograms of normalized cell count versus fluorescence (AFU). Panel labels: B2, ΔF/F = 7.7; F1, ΔF/F = 10.1; parent, ΔF/F = 3.6; published sensor, ΔF/F = 0.6; controls EGFP and MBP.] Fluorescence histograms measured on a Sony SH800 instrument are overlaid. Dotted lines represent the approximate mean of each sample. MBP, the negative control, is shown in gray. EGFP, the positive control, is shown in green. All sensor constructs were measured twice: once with no maltose (lighter color) and once with saturating levels of maltose (darker shade of respective color) (control samples showed no significant change with maltose addition). ΔF/F is calculated from the mean of each population in a given condition (with or without maltose). The published sensor is EcMBP165-cpGFP.PPYF.T203V (Marvin et al. 2011). The linker composition of the MBP170-insert sensors (parent, F1, and B2) is shown in Table 3-2.

3.3.5 In Vitro Biochemical Analysis of a Functional Maltose Sensor

The brightest biosensor, MBP170-cpEGFP construct B2, was purified and tested in vitro to determine binding specificity and affinity (Figure 3-11). Using a panel of sugar substrates, tested at saturating concentrations, B2 was found to respond to glucose, maltose, and trehalose (Figure 3-11A). All three of these substrates are composed of glucose monomers, thus MBP has binding pockets that can recognize at least a portion of each. Binding curves were measured for each of these three substrates, allowing for the calculation of binding constants (Figure 3-11B). The dissociation constant of B2 binding to maltose is 2.6 μM, which is close to the constant for wild-type MBP of ~2 μM (Schwartz et al. 1976, Telmer & Shilton 2003). This suggests that the cpEGFP insertion at MBP residue 170 does not affect the natural binding movement of MBP or its contacts with maltose.

Figure 3-11: In vitro analysis of MBP170-cpEGFP construct B2. [Panel A: ΔF/F (ex 485 / em 515) for B2, cpEGFP, and PBS with each substrate: PBS alone, 0.5 M IPTG, 0.5 M glucose, 25 mM lactose, 0.5 M sucrose, 0.5 M maltose, 0.5 M fructose, 250 mM sorbose, 250 mM mannose, 0.5 M galactose, 0.5 M trehalose, 0.5 M arabinose, 50 mM cellobiose, and 50 mM glucose-6-P. Panel B: fluorescence (AFU) versus log[maltose] (M), with fitted Kd = 2.6 μM (R² = 0.975); Kd labels of 16 mM and 34 mM are shown for glucose and trehalose.] Purified B2 (red squares) and cpEGFP (green triangles) protein were tested for changes in fluorescence in PBS with various substrates. PBS only (blue circles) served as a negative control. (A) Dynamic range of fluorescence was measured using saturating concentrations of various substrates. (B) A binding curve was fit to fluorescence measurements from a wide range of substrate concentrations. A plot is shown for the substrate maltose. The same experiment was carried out with glucose and trehalose, as well (calculated Kd values shown).
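
To give a sense of what the measured dissociation constant implies, the following Python sketch computes the fraction of sensor bound at a given maltose concentration. It assumes a simple one-site (1:1) binding isotherm with free ligand approximately equal to total ligand; the model used for the actual curve fit is not specified here, so this is an illustration rather than a reproduction of the fit. The function name is hypothetical.

```python
def fraction_bound(ligand_molar, kd_molar):
    """Fraction of binding sites occupied under a one-site binding model."""
    if ligand_molar < 0 or kd_molar <= 0:
        raise ValueError("ligand must be >= 0 and Kd must be > 0")
    return ligand_molar / (kd_molar + ligand_molar)


KD_MALTOSE = 2.6e-6  # M, dissociation constant of B2 for maltose (from the text)

# At a ligand concentration equal to Kd, half of the sensor is bound
half = fraction_bound(KD_MALTOSE, KD_MALTOSE)

# At the 1 mM maltose used in the in vivo measurements
at_1_mM = fraction_bound(1e-3, KD_MALTOSE)

print(half, round(at_1_mM, 4))
```

Under this assumed model, 1 mM maltose corresponds to more than 99% occupancy, consistent with its description as a saturating level.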

3.4 Discussion

We have shown that a library of potential maltose biosensors can be made by randomly inserting cpEGFP into the MBP gene using a modified transposon. The expression plasmid must not contain elements that would improve cell fitness by disruption. Alternatively, an initial plasmid could be used for transposition that has cloning sites around the MBP gene to move only the MBP ORF, and its transposon inserts, to a separate expression plasmid. This would prevent insertions in any regulatory element and would also reduce the unproductive insertions present within the expression library. During the cloning step, size selection on an agarose gel could be used to only clone MBP fragments containing a transposon insert.

Expressing MBP-cpEGFP constructs in E. coli and iterative positive and negative screening with FACS enriched for active maltose sensors. There was generally a contaminating non-expressing population of cells, even in the positive controls, so optimization of the expression system or cells would be beneficial, especially when screening a larger initial library where maximizing throughput is critical. Even without the possible improvements mentioned, we demonstrated a method that can produce a functional fluorescent biosensor in as little as two weeks (Figure 3-12). We found two insertion sites in MBP that created functional fluorescent sensors with cpEGFP. These two sites were enriched by sorting and made up over 20% of the final pool (Figure 3-8). Notably, the insertion sites isolated were in the “hinge” region of MBP that other researchers have rationally targeted in various binding proteins to make sensors (Marvin et al. 2011, 2013), but no prior information about the MBP structure was needed to find these sites with our method.

Figure 3-12: Overall workflow timeline of biosensor creation platform. [Workflow schematic. Stages shown: transposition, cloning, expression, sorting, and targeted testing; time labels: 4-5 days and 4-7 days.]

We found that optimizing the linkers between MBP and cpEGFP greatly improved the dynamic range of sensor constructs (Figure 3-9), as was shown in previous studies (Akerboom et al. 2012, Hung et al. 2011, Marvin et al. 2011, 2013; Nakai et al. 2001). Sorting was not even used, as eight samples from only 180 random MBP170-insertion linker-varied constructs showed significant dynamic-range improvement over the parent construct. The best construct’s ΔF/F of ~10 is better than all reported single FP biosensors besides GCaMP, which is a Ca²⁺ sensor that has been iteratively improved over the past 13 years (Akerboom et al. 2012, Nakai et al. 2001, Tallini et al. 2006, Tian et al. 2009, Zhao et al. 2011). However, no trend was found for the length or composition of the linkers in the top hits. This gives further support for our starting assumption that the performance of a linker composition, like a specific cpEGFP insertion site in a LBD, is very difficult to predict a priori. Thus, it is best to test all insertion sites and linker compositions in a library and screen out the winners.

With regard to testing linker variations, the cloning step in our biosensor workflow (Figure 3-12) permits linker variations to be encoded in primers such that a more diverse library can be created at the outset. This would eliminate the need for follow-up linker optimization. However, testing will be needed to determine the feasibility of this approach since the library size would increase drastically (from 6864 to >12 million when adding zero, one, or two amino acid residues to either end of the cpEGFP with equal probability of six amino acids). The library size could be reduced greatly by adopting the alternative transposition plasmid approach discussed above (e.g. 2220 insertion sites in MBP alone, versus 6864 in the whole plasmid). The number of rounds of FACS screening could also be increased to provide further enrichment of high-performing sensors. Additionally, we found that for a good sensor insertion site (MBP residue 170 in this case) a majority of linker variants showed good sensor behavior (Figure 3-9A). Thus, full library screening may not be needed to get close to a global maximum of performance. Follow-up linker optimization could be used to see if improvements could still be made.
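
The library-size estimate above can be reproduced with a short calculation. The Python sketch below assumes that the linkers on the two ends of cpEGFP vary independently, each with a length of zero, one, or two residues drawn from six amino acids; this assumption reproduces the ">12 million" figure quoted in the text. The function and variable names are hypothetical.

```python
def linker_variants(max_length, alphabet_size):
    """Number of distinct linkers of length 0..max_length over an alphabet."""
    return sum(alphabet_size ** length for length in range(max_length + 1))


AMINO_ACIDS = 6           # Ala, Arg, Gly, Pro, Ser, Thr
SITES_WHOLE_PLASMID = 6864  # insertion sites in the whole plasmid (from the text)
SITES_MBP_ONLY = 2220       # insertion sites in the MBP ORF alone (from the text)

per_end = linker_variants(2, AMINO_ACIDS)  # 1 + 6 + 36 = 43
combos = per_end ** 2                      # both ends varied independently

whole_plasmid = SITES_WHOLE_PLASMID * combos
mbp_only = SITES_MBP_ONLY * combos

print(per_end, combos, whole_plasmid, mbp_only)
```

With these assumptions the whole-plasmid library contains 12,691,536 members, while restricting transposition to the MBP ORF reduces it to 4,104,780, roughly a threefold reduction.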

This work shows that biosensors could be efficiently made from maltose-binding protein. Because the transposon method used is independent of the DNA sequence, this biosensor development strategy may be generalizable to other ligand binding proteins. Leaving behind the slow and laborious rational-design methods used previously, this insertion-library approach quickly sifted out functional sensors from a pool of domain fusions. Importantly, no prior structural knowledge of the binding protein was needed and this expands the pool of potential future LBDs to any protein that shows binding to a ligand of interest. It may even be possible to expand this approach to proteins that are hypothesized to bind a certain ligand such that a successful sensor would prove protein-ligand interaction. Additionally, transposition can be carried out on multiple gene targets at once, creating a library of LBD-cpEGFPs with multiple LBDs. This expanded library can be screened with a ligand and the best LBD for sensor functionality can be enriched. This could be especially useful if there are multiple theoretical binders of the ligand of interest. Overall, this method is a valuable tool for creating biosensors for a ligand of interest, whether a binding domain against the ligand is known or suspected.

3.5 References

Akerboom J, Chen T-W, Wardill TJ, Tian L, Marvin JS, et al. 2012. Optimization of a GCaMP calcium indicator for neural activity imaging. J. Neurosci. 32(40):13819–40
Akerboom J, Rivera JDV, Guilbe MMR, Malavé ECA, Hernandez HH, et al. 2009. Crystal structures of the GCaMP calcium sensor reveal the mechanism of fluorescence signal change and aid rational design. J Biol Chem. 284(10):6455–64
Arnold FH, Volkov AA. 1999. Directed evolution of biocatalysts. Current opinion in chemical biology. 3(1):54–59
Belousov VV, Fradkov AF, Lukyanov KA, Staroverov DB, Shakhbazov KS, et al. 2006. Genetically encoded fluorescent indicator for intracellular hydrogen peroxide. Nat Methods. 3(4):281–86
Berg J, Hung YP, Yellen G. 2009. A genetically encoded fluorescent reporter of ATP:ADP ratio. Nat Methods. 6(2):161–66
Chattoraj M, King BA, Bublitz GU, Boxer SG. 1996. Ultra-fast excited state dynamics in green fluorescent protein: multiple states and proton transfer. P Natl Acad Sci Usa. 93(16):8362–67
Cubitt AB, Woollenweber LA, Heim R. 1999. Understanding structure-function relationships in the Aequorea victoria green fluorescent protein. Methods Cell Biol. 58:19–30
Engler C, Kandzia R, Marillonnet S. 2008. A one pot, one step, precision cloning method with high throughput capability. PLoS ONE. 3(11):e3647
Evenäs J, Tugarinov V, Skrynnikov NR, Goto NK, Muhandiram R, Kay LE. 2001. Ligand-induced structural changes to maltodextrin-binding protein as studied by solution NMR spectroscopy. J Mol Biol. 309(4):961–74
Herzenberg LA, Parks D, Sahaf B, Perez O, Roederer M, Herzenberg LA. 2002. The history and future of the fluorescence activated cell sorter and flow cytometry: a view from Stanford. Clin. Chem. 48(10):1819–27
Honda Y, Kirimura K. 2013. Generation of circularly permuted fluorescent-protein-based indicators for in vitro and in vivo detection of citrate. PLoS ONE. 8(5):e64597
Hung YP, Albeck JG, Tantama M, Yellen G. 2011. Imaging Cytosolic NADH-NAD(+) Redox State with a Genetically Encoded Fluorescent Biosensor. Cell Metab. 14(4):545–54
Ji Q, Zhao BS, He C. 2013. A highly sensitive and genetically encoded fluorescent reporter for ratiometric monitoring of quinones in living cells. Chem. Commun. (Camb.). 49(73):8027–29
Marvin JS, Borghuis BG, Tian L, Cichon J, Harnett MT, et al. 2013. An optimized fluorescent probe for visualizing glutamate neurotransmission. Nat Methods. 10(2):162–70
Marvin JS, Schreiter ER, Echevarría IM, Looger LL. 2011. A genetically encoded, high-signal-to-noise maltose sensor. Proteins. 79(11):3025–36
Medintz IL, Deschamps JR. 2006. Maltose-binding protein: a versatile platform for prototyping biosensing. Curr Opin Biotech. 17(1):17–27
Nakai J, Ohkura M, Imoto K. 2001. A high signal-to-noise Ca(2+) probe composed of a single green fluorescent protein. Nat. Biotechnol. 19(2):137–41
Nausch LWM, Ledoux J, Bonev AD, Nelson MT, Dostmann WR. 2008. Differential patterning of cGMP in vascular smooth muscle cells revealed by single GFP-linked biosensors. Proc. Natl. Acad. Sci. U.S.A. 105(1):365–70
Okumoto S. 2010. Imaging approach for monitoring cellular metabolites and ions using genetically encoded biosensors. Curr Opin Biotech. 21(1):45–54
Okumoto S, Jones A, Frommer WB. 2012. Quantitative imaging with fluorescent biosensors. Annu Rev Plant Biol. 63:663–706
Oltrogge LM, Wang Q, Boxer SG. 2014. Ground-state proton transfer kinetics in green fluorescent protein. Biochemistry-Us. 53(37):5947–57
Sample V, Mehta S, Zhang J. 2014. Genetically encoded molecular probes to visualize and perturb signaling dynamics in living biological systems. J. Cell. Sci. 127(Pt 6):1151–60
Schwartz M, Kellermann O, Szmelcman S, Hazelbauer GL. 1976. Further studies on the binding of maltose to the maltose-binding protein of Escherichia coli. Eur J Biochem. 71(1):167–70
Sharff AJ, Rodseth LE, Spurlino JC, Quiocho FA. 1992. Crystallographic evidence of a large ligand-induced hinge-twist motion between the two domains of the maltodextrin binding protein involved in active transport and chemotaxis. Biochemistry-Us. 31(44):10657–63
Stoner-Ma D, Jaye AA, Matousek P, Towrie M, Meech SR, Tonge PJ. 2005. Observation of excited-state proton transfer in green fluorescent protein using ultrafast vibrational spectroscopy. J Am Chem Soc. 127(9):2864–65
Studier FW. 2005. Protein production by auto-induction in high density shaking cultures. Protein Expr. Purif. 41(1):207–34
Tallini YN, Ohkura M, Choi B-R, Ji G, Imoto K, et al. 2006. Imaging cellular signals in the heart in vivo: Cardiac expression of the high-signal Ca2+ indicator GCaMP2. P Natl Acad Sci Usa. 103(12):4753–58
Telmer PG, Shilton BH. 2003. Insights into the conformational equilibria of maltose-binding protein by analysis of high affinity mutants. J Biol Chem. 278(36):34555–67
Tewson P, Westenberg M, Zhao Y, Campbell RE, Quinn AM, Hughes TE. 2012. Simultaneous detection of Ca2+ and diacylglycerol signaling in living cells. PLoS ONE. 7(8):e42791
Tian L, Hires SA, Mao T, Huber D, Chiappe ME, et al. 2009. Imaging neural activity in worms, flies and mice with improved GCaMP calcium indicators. Nat Methods. 6(12):875–81
Tsien RY. 1998. The green fluorescent protein. Annu. Rev. Biochem. 67:509–44
Zenobi R. 2013. Single-cell metabolomics: analytical and biological perspectives. Science. 342(6163):1243259
Zhao Y, Araki S, Wu J, Teramoto T, Chang Y-F, et al. 2011. An expanded palette of genetically encoded Ca²⁺ indicators. Science. 333(6051):1888–91

Chapter 4 Domain Insertions for Protein Recruitment

Adapted from Methods in Enzymology, Volume 546, Oakes B. L., Nadler D. C., & Savage D. F., Protein Engineering of Cas9 for Enhanced Function, Pages 491–511, Copyright 2014, with permission from Elsevier.

4.1 Introduction

The manipulation of gene sequence and expression is fundamental to unraveling the complexity of biological systems. However, our inability to make such manipulations across different organisms and cell types has limited the power of recombinant DNA technology to a handful of model systems. Consequently, numerous strategies for genome engineering—the ability to programmably disrupt or replace genomic loci—have emerged in recent years, yet there remains no universal solution to the problem (Carroll 2014).

Investigation of bacterial adaptive immunity, known as the “clustered regularly interspaced short palindromic repeats” (CRISPR) system, led to the discovery of the RNA-guided DNA nuclease Cas9, which has proven a particularly potent tool for genome engineering (Barrangou et al. 2007, Deltcheva et al. 2011, Jinek et al. 2012, Wiedenheft et al. 2012). In its biological context, Cas9 is part of a Type II CRISPR interference system that functions to degrade pathogenic phage or plasmid DNA. Targeting of Cas9 is enabled by host-encoded CRISPR-RNAs (crRNAs), which recognize, through RNA:DNA hybridization, 20 bp of complementary target DNA sequence (referred to as a protospacer) (Figure 4-1A). The Cas9 protein itself also plays a role in target recognition by binding a short DNA sequence adjacent and opposite the protospacer, called the protospacer adjacent motif (PAM). Although there is significant variation in PAM specificity among Cas9 orthologs, the commonly employed Cas9 from Streptococcus pyogenes (SpCas9) recognizes the PAM sequence 5′-NGG-3′. PAM binding is thought to prime Cas9 for target recognition by the crRNA sequence (Sternberg et al. 2014). Upon target recognition, two nuclease domains, termed the RuvC and HNH domains because of their sequence similarity to other endonucleases, engage and cleave the separated strands of DNA between 3 and 4 bp upstream of the PAM site (Jinek et al. 2012). A second trans-activating RNA (tracrRNA), with partial complementarity to crRNA, is also required for crRNA maturation and activity. Doudna and colleagues have shown that the crRNA and tracrRNA can be fused together with a tetraloop insertion to form a single guide RNA (sgRNA or “guide”) (Jinek et al. 2012). Expression of Cas9 and this sgRNA is both necessary and sufficient for targeting DNA. Therefore, the rapid success of Cas9-based engineering has been driven by programmability: Cas9 can be targeted to any DNA locus by simply changing the sgRNA sequence.
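
The targeting rule described above (a 20 bp protospacer immediately followed by a 5′-NGG-3′ PAM) can be sketched as a simple sequence scan. The Python function below is a minimal illustration and not a guide-design tool: the function name and the example sequence are hypothetical, only the given strand is scanned (sites on the reverse complement are ignored), and no off-target or efficiency considerations are included.

```python
PROTOSPACER_LEN = 20  # bp recognized by the guide RNA
PAM_LEN = 3           # SpCas9 PAM: 5'-NGG-3'


def find_spcas9_sites(sequence):
    """Return (start, protospacer, pam) for each 20-mer followed by NGG."""
    sequence = sequence.upper()
    window = PROTOSPACER_LEN + PAM_LEN
    sites = []
    for start in range(len(sequence) - window + 1):
        protospacer = sequence[start:start + PROTOSPACER_LEN]
        pam = sequence[start + PROTOSPACER_LEN:start + window]
        if pam[1:] == "GG":  # first PAM position (N) may be any base
            sites.append((start, protospacer, pam))
    return sites


# Hypothetical example: one 20-mer followed by a TGG PAM
example = "A" * 20 + "TGG" + "CCC"
sites = find_spcas9_sites(example)
print(sites)
```

Retargeting then amounts to choosing a different 20-mer that satisfies this rule and encoding it in the sgRNA, which is the programmability referred to in the text.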

Figure 4-1: Holo Cas9 model and its potential uses. (A) Single guide RNA, target dsDNA, and Cas9 are modeled. Domains of Cas9 are colored accordingly: RuvC-green, BH-pink, RecI-gray, RecII-dark gray, HNH-yellow, PI-red. (B) Common uses of Cas9 as a tool. Red x's in Cas9 represent a nuclease dead variant.

Intense interest in Cas9-based genetic engineering has already led to a number of directed alterations which change or improve Cas9 functionality (Figure 4-1B). Based on sequence conservation of the RuvC and HNH nuclease domains, a number of point mutants were constructed to transform the normal endonuclease activity into either a nickase (for genome editing) or a catalytically dead mutant (dCas9) that can function as a transcription inhibitor (CRISPRi) (Cong et al. 2013, Jinek et al. 2012, Qi et al. 2013). As PAM recognition is critical to functionality but encoded by the protein, a number of efforts have identified Cas9 orthologs with minimal PAM requirements for use in conjunction with, or in place of, SpCas9 (Esvelt et al. 2013). Finally, a number of N- and C-terminal fusions to Cas9 have been used to recruit alternative factors to specific DNA loci, including RNA polymerase subunits to activate transcription and additional nuclease domains for improving the on-target specificity of genome editing (Bikard et al. 2013, Guilinger et al. 2014, Tsai et al. 2014). These advances show that, from an engineering perspective, Cas9 can be thought of as a unifying factor able to recruit any protein, RNA, and DNA together in the cell (Mali et al. 2013a). However, even with these recent improvements, there are a number of additional desirable features that could be engineered into Cas9.

4.1.1 Current Uses of Cas9

Cas9 has rapidly established itself as a promising genome-engineering technology in widely used model organisms (Friedland et al. 2013, Gratz et al. 2013, Guilinger et al. 2014, Hou et al. 2013, Hsu et al. 2013, Hwang et al. 2013, Li et al. 2013, Nishimasu et al. 2014, Niu et al. 2014, Tsai et al. 2014, Wang et al. 2013). In these systems, Cas9 has been used to create both small genomic insertions and deletions (indels) via nonhomologous end joining and to facilitate larger sequence manipulations with homologous recombination. Cas9 also allows for multiplexed genome engineering, and has been used to generate large knock-out libraries in human cells, a feat both surprising in its simplicity and impressive in its efficacy (Shalem et al. 2014, Zhou et al. 2014). Decoupling the DNA-binding activity of Cas9 proteins from cleavage activity has led to a broader set of uses such as repression and activation of transcription (Gilbert et al. 2013). Finally, recent evidence suggests that Cas9 may be used to target and cleave RNA, as well (O'Connell et al. 2014). Although still nascent, the simple programmability and effectiveness of Cas9-based technology promises to democratize access to genome, and potentially transcriptome, manipulation.

4.1.2 Initial Cas9 Engineering Questions

As previously mentioned, there are a number of clear initial questions pertaining to Cas9 that are addressable using existing protein engineering tools. Namely, we believe that designing novel Cas9s with domain insertions and deletions will lead to the creation of a new family of synthetic orthologs whose outputs are manifold. For example, domain insertions could act to recruit additional protein partners with desired activity onto Cas9-associated nucleic acids, while domain deletions will reduce Cas9's size and increase its versatility.

Alternatively, improving N- or C-terminal fusions with engineered linkers, or creating Cas9s with new N- and C-termini altogether, may greatly increase the efficacy of fusions. For example, to address issues of Cas9 targeting specificity, dCas9 has been fused to FokI, an obligate dimeric sequence-independent nuclease (Guilinger et al. 2014, Tsai et al. 2014). This system requires the mutual on-target activity of two different FokI–dCas9 fusions to adjacent sites, a combined 40 bp of targeting, to catalyze a DNA cleavage event. Unfortunately, these FokI–dCas9 fusions are substantially less active than WT Cas9 at inducing indels, rendering them a less attractive tool. Nevertheless, it is known that FokI protein fusions to other DNA-binding domains can achieve cleavage efficiencies similar to that of WT Cas9 (Hwang et al. 2013, Mali et al. 2013b). Therefore, lower activity of the current FokI–dCas9 is likely due to imperfect positioning of the FokI nuclease domain and further engineering the dCas9–FokI interface should yield an increase in activity.

Lastly, split proteins are known to function as switches or response elements in many different systems (Olson & Tabor 2012). Splitting Cas9, as mentioned above, would be a simple method for engineering allosteric control and open the door to a number of uses including optogenetics, small-molecule dependence, or linking function to a cellular signal, such as a phosphorylation-dependent signal transduction cascade. Nevertheless, all of the previous engineered scenarios require that Cas9 is active despite the modifications introduced. Therefore, it is imperative that any engineering attempt start with an assay allowing for the separation of active mutant proteins.

4.1.3 Goals

Enabling complex functions requires more elaborate protein engineering efforts than previously attempted. Fortunately, high-resolution structures of apo and holo Cas9 were recently solved and inform this process greatly (Jinek et al. 2014, Nishimasu et al. 2014). Ultimately, the isolation of an enhanced Cas9 with new function will require assaying the activity of large libraries (> 10⁶) of variants. Fortuitously, the basic function of Cas9, disrupting a gene sequence or a gene’s expression, facilitates the construction of a genetic screen and means that directed evolution methods can be employed in the same conditions in which functionality is ultimately desired. To this end, we present a method of screening for active Cas9 proteins.

We also demonstrate that Cas9 can be manipulated by domain insertion, a common event found in eukaryotic proteomes (Lander et al. 2001), and that a library of such insertions can be successfully screened for functional fusion constructs. Specifically, we created libraries of Cas9 in which the α-syntrophin PDZ domain was randomly inserted throughout the SpCas9 gene once per variant using in vitro transposition-based methods (Edwards et al. 2008). PDZ domains are small (~100 amino acids) proteins with adjacent N- and C-termini that mediate protein–protein interactions by specifically binding the C-terminal peptide of a cognate partner in the cell (Nourry et al. 2003). These domains have been used extensively in synthetic biology applications as a tool for scaffolding proteins (Dueber et al. 2003, 2007). In the context of Cas9, PDZ insertion could recruit additional factors to a DNA-bound Cas9 in the cell, such as fluorescent tags, chromatin remodeling machinery, and nuclease domains.

4.2 Materials and Methods

4.2.1 Strains and Media

The bacterial strain for cloning was 10G E. coli, electrocompetent, from Lucigen Corp. Chemically competent DH5α E. coli were also used, made by standard calcium chloride techniques. Recovery Medium for electroporation transformations was provided in the electrocompetent kits. All other growth, unless otherwise noted, was done in Lysogeny Broth (LB Broth MILLER, EMD Millipore) liquid medium or on Lysogeny Broth 1.5% w/v agar media plates (Bacto Agar, Becton, Dickinson and Company). Media and all enzymatic reactions were prepared with purified water from a Barnstead Purification System (Thermo Fisher Scientific Inc.).

4.2.2 Electrocompetent E. coli preparation for library construction

Construction of libraries containing 10⁶–10⁹ diverse members is a basic step for engineering new functions into Cas9. A simple method for the creation of electrocompetent cells that consistently yields E. coli with transformation efficiencies of ≥ 10⁸ CFU per μg of plasmid was used. If necessary, cells can be pretransformed with additional screening/selection plasmids to maximize library transformation efficiency.

The desired strain was grown as single colonies on appropriate plates. We used a strain already containing plasmid 44251: pgRNA-bacteria (Addgene) with the RFP sgRNA from Qi et al. (2013), grown on carbenicillin plates (100 μg/mL) overnight at 37 °C. A single colony was used to inoculate 5 mL SOC (BD Difco 244310: Super Optimal Broth + 0.4% glucose) plus carbenicillin. This starter culture was grown overnight at 37 °C and the following day all 5 mL was used to inoculate 1 L of SOC with 100 μg/mL carbenicillin. The 1 L culture was grown at 37 °C with shaking at 250 rpm for 2–4 hours, to an optical density at 600 nm (OD600) between 0.55 and 0.65. Then the culture was rapidly cooled by swirling in an ice bath. The cells were kept at 4 °C during all subsequent steps.

The culture was centrifuged at 4000 × g for 10 min. Supernatant was poured off and the cells were washed with 500 mL of sterile ice-cold water by gentle resuspension. Cells were centrifuged again at 4000 × g for 10 min. The water wash step was repeated. Cells were then washed twice with 500 mL of ice-cold 10% v/v glycerol. A final spin was carried out and the supernatant discarded. The final pellet was resuspended in 2.75 mL of 10% v/v glycerol and aliquoted into cold microcentrifuge tubes, 75 μL each. The tubes were flash frozen with liquid nitrogen.

Electroporation of competent E. coli was carried out using a BTX ECM 630 Exponential Decay Wave Electroporation System (Harvard Apparatus, Inc.) and Gene Pulser cuvettes, 0.1 cm gap (Bio-Rad Laboratories, Inc., cat# 165-2089). Aliquots of electrocompetent cells were thawed on ice. The appropriate plasmid was added to 75 μL of cells and vortexed. The cells were transformed with the following instrument settings: 1800 V, 200 Ω, 25 μF. Immediately after electroporation, the aliquots were resuspended in warm SOC and recovered for 1 h at 37 °C before adding antibiotics.

4.2.3 DNA Manipulation

All standard restriction enzymes used were FastDigest Enzymes from Thermo Fisher Scientific Inc. All Type IIs restriction enzymes used in Golden Gate reactions were obtained from NEB (New England Biolabs). MuA Transposase enzyme was purchased from Thermo Fisher Scientific Inc. (Cat. #F-750). All PCR reactions for cloning used Phusion High-Fidelity Polymerase (Thermo Fisher Scientific Inc.) with the manufacturer’s recommended conditions. Annealing temperatures were calculated with the Thermo Scientific “Tm calculator” that is specific for Phusion reactions. Plasmids were isolated from bacterial cultures with a QIAprep Spin Miniprep Kit or a HiSpeed Plasmid Midi Kit (Qiagen). DNA was isolated from agarose gels using a Zymoclean Gel DNA Recovery Kit (Zymo Research Corp.). DNA cleanups for cloning and transposition procedures were performed with a DNA Clean & Concentrator-5 Kit (Zymo Research Corp.). DNA concentrations were quantified with a NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific Inc.).

4.2.4 Plasmids

The expression plasmid for catalytically dead Cas9 was pdCas9 (Addgene plasmid #44249, Qi et al. 2013), referred to as WT dCas9 in this work, containing an ATC-inducible Tet promoter, p15a origin, and a CmR resistance cassette. The negative control, “broken” dCas9, referred to as IT dCas9 in this work, is expressed from the plasmid pITdCas9, which harbors only the C-terminal 250 amino acids of dCas9. The dCas9 gene was moved from an expression plasmid into a minimal pUC-based backbone for use in the transposition reaction. The plasmid pdCas9 was digested with BglII and XhoI to excise the dCas9 gene. The plasmid pUC19_noGGsites was amplified with primers 1 and 2 (Table 4-1), cut with the same enzymes, and ligated with dCas9 to create pUCM-dCas9.

4.2.5 Creation of dCas9 Transposon-Insert Library

The plasmid pUCGG-KanR-BsaI-M1-CmR-1 was digested with BglII and HindIII to release the modified transposon. Transposon DNA (1254 bp) was separated from the plasmid backbone (2086 bp) by agarose-gel electrophoresis; the desired smaller band was excised and extracted. A transposition reaction was carried out with the following components: 4 μL 5X MuA reaction buffer, 100 ng BsaI-M1-CmR-1 transposon (purified from transposon-propagation plasmid), 227 ng pUCM-dCas9 DNA (~0.5 molar ratio to transposon DNA), 1 μL 0.22 μg/μL MuA transposase enzyme, and water to 20 μL. The reaction was incubated for 18 hours at 30°C followed by 10 minutes at 75°C to heat inactivate MuA. The reaction was cleaned up and eluted with 6 μL of water.

Electrocompetent 10G E. coli were transformed with 2 μL of cleaned-up transposition reaction. The cells were recovered with 975 μL of Recovery Medium at 37°C 250 rpm shaking for 1 hour. An aliquot of the recovery culture was spread on an LB-agar plate with appropriate antibiotics to assess reaction efficiency (25 μg/mL chloramphenicol to select for transposon insertion and 50 μg/mL carbenicillin to select for the target-plasmid backbone). Comparing the volume that was plated to the total recovery volume, a total theoretical colony-forming unit (CFU) value was calculated to estimate the total number of cells containing a plasmid with a successful insertion.
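The CFU estimate described above is a proportional scale-up from the plated aliquot to the whole recovery culture. The following is a minimal Python sketch of that arithmetic. The function names, the 10 μL plated volume, the 150-colony count, and the ~1000 μL total recovery volume are illustrative assumptions, not values reported in this work.

```python
def total_cfu(colonies, plated_volume_ul, total_volume_ul, dilution_factor=1.0):
    """Scale a plate count up to the whole recovery culture.

    dilution_factor is how many fold the aliquot was diluted before plating
    (1.0 means the recovery culture was plated undiluted).
    """
    if plated_volume_ul <= 0 or total_volume_ul <= 0:
        raise ValueError("volumes must be positive")
    return colonies * dilution_factor * total_volume_ul / plated_volume_ul


def cfu_per_ug(cfu, dna_ng):
    """Transformation efficiency in CFU per microgram of DNA."""
    if dna_ng <= 0:
        raise ValueError("DNA mass must be positive")
    return cfu * 1000.0 / dna_ng


# Hypothetical example: 150 colonies from 10 uL of an undiluted ~1000 uL recovery.
library_cfu = total_cfu(colonies=150, plated_volume_ul=10, total_volume_ul=1000)
print(f"Estimated insertion-containing transformants: {library_cfu:.0f}")
```

Because both antibiotics are present on the plate, the count approximates only cells carrying a target plasmid with a transposon insertion, which is the quantity of interest for library size.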

The recovery culture was then centrifuged at 4000 × g for 5 minutes to pellet the cells, and the supernatant was removed. Cells were resuspended in 50 mL LB with 25 μg/mL chloramphenicol and 50 μg/mL carbenicillin to select for transposon insertions and the target-plasmid backbone, respectively. The culture was grown overnight at 37°C 250 rpm. A midi-prep was carried out on the culture the following day to harvest the plasmid library containing transposon inserts.

The dCas9 genes with BsaI-M1-CmR-1 insertions were harvested from the library by digestion with BglII and XhoI and purified (5385 basepairs) from dCas9 genes with no insertion (4131 basepairs) by agarose-gel electrophoresis. The extracted insertion library was cloned into the backbone of pdCas9 via its BglII and XhoI sites. Briefly, the gel-purified pdCas9 backbone (2574 basepairs) was used in a 20 μL ligation reaction: 1X T4 ligase buffer (NEB), 400 units T4 DNA ligase (NEB), 68 ng dCas9 library, and 32 ng vector backbone (1:1 insert:vector molar ratio). The reaction was incubated overnight at 16°C and for 10 minutes at 65°C to denature the ligase. The reaction was cleaned up and eluted with 6 μL of water. 2 μL was used to transform, select, and purify correctly assembled plasmid following the steps from above for transposition-reaction transformations, using a medium with 25 μg/mL chloramphenicol. The resulting library was pdCas9 plasmids with a random insertion of BsaI-M1-CmR-1 only between the BglII and XhoI sites.
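The 1:1 insert:vector molar ratio quoted for the ligation can be checked from the fragment lengths and masses given above. The sketch below assumes an average mass of 650 g/mol per base pair of double-stranded DNA, a common approximation that is not stated in this work; the ratio itself does not depend on that constant.

```python
AVG_BP_MASS = 650.0  # g/mol per bp of dsDNA; an assumed, commonly used approximation


def ng_to_fmol(mass_ng, length_bp, bp_mass=AVG_BP_MASS):
    """Convert a mass of linear dsDNA to femtomoles.

    ng -> g is a factor of 1e-9 and mol -> fmol is 1e15, hence the net 1e6.
    """
    return mass_ng * 1e6 / (length_bp * bp_mass)


# Values taken from the ligation described in the text.
insert_fmol = ng_to_fmol(68, 5385)   # dCas9 genes carrying the transposon
vector_fmol = ng_to_fmol(32, 2574)   # gel-purified pdCas9 backbone
ratio = insert_fmol / vector_fmol

print(f"insert: {insert_fmol:.1f} fmol, vector: {vector_fmol:.1f} fmol, ratio: {ratio:.2f}")
```

Both fragments come out near 19 fmol, consistent with the stated 1:1 ratio.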

4.2.6 PDZ Cloning into Insert Library

The PDZ domain of syntrophin-α1 (mouse SNTA1) was obtained from the laboratory of John Dueber in plasmid pKC580. The sequence has mutations relative to the wild-type sequence to eliminate an internal RBS site (GTTGTGTTGGAG -> GTCGTGCTCGAA). Primers were designed to PCR amplify the sequence corresponding to amino acid residues 78 to 163 of syntrophin-α1 (based on Uniprot SNTA1_MOUSE). Three versions of the forward and reverse primers were designed that add zero, one, or two random BCT codons (primers 3 through 8, Table 4-1). Each BCT codon had equal probability of encoding a serine, proline, or alanine. The primers adding zero:one:two codons were mixed in a ratio of 1:3:9.
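The linker diversity produced by the degenerate BCT codon can be enumerated directly. The sketch below expands BCT (B = C, G, or T under IUPAC nomenclature) and translates the resulting codons with the standard genetic code. The helper names are illustrative. One reading of the 1:3:9 primer ratio, offered here as an interpretation rather than a statement from the text, is that it matches the number of distinct sequences at each linker length (1, 3, and 9), so that each individual linker sequence is roughly equally represented on each side of the PDZ domain.

```python
from itertools import product

# IUPAC codes needed here; B means "not A" (C, G, or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "B": "CGT"}
# Only the three codons reachable from BCT are listed (standard genetic code).
CODON_TO_AA = {"CCT": "P", "GCT": "A", "TCT": "S"}


def expand(degenerate_seq):
    """Return every concrete DNA sequence encoded by a degenerate sequence."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in degenerate_seq))]


def translate(dna):
    """Translate a linker made only of CCT/GCT/TCT codons."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def linker_peptides(n_codons):
    """All linker peptides produced by n_codons consecutive BCT codons."""
    return [translate(seq) for seq in expand("BCT" * n_codons)]


for n in (0, 1, 2):
    peptides = linker_peptides(n)
    print(n, len(peptides), peptides)
```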

A Golden Gate reaction was used to exchange the modified transposon, harbored within each plasmid of the library, with the PCR amplified PDZ domain. Briefly, 40 fmoles of amplified PDZ and 40 fmoles of pdCas9 transposon library were mixed with 10 units BsaI (NEB), 800 units T4 DNA Ligase (NEB), and 1X T4 DNA Ligase Reaction Buffer (NEB) in a total volume of 20 μL. The reaction was incubated 2 minutes at 45°C, 5 minutes at 16°C (first two steps were cycled 50 times), 20 minutes at 60°C, and 20 minutes at 80°C. The completed reaction was purified and concentrated with a PCR cleanup kit, eluted with 6 μL water.

4.2.7 FACS Screening of dCas9 Library

Electrocompetent E. coli (expressing GFP and RFP from the genome and harboring an sgRNA plasmid) were transformed with 1 μg of the PDZ-dCas9 plasmid library. The E. coli strain and guide RNA plasmid come from Qi et al. (2013) (plasmid is Addgene #44251: pgRNA-bacteria). To ensure adequate coverage of the library, the number of transformants should be at least 5–10× greater than the theoretical library size. To determine this, we plated 5 μL aliquots of serially diluted transformants, and grew them to colonies (overnight, 37 °C) on double-selection media (50 μg/mL chloramphenicol to maintain the engineered dCas9 plasmids and 100 μg/mL carbenicillin to maintain the guide RNA plasmid). The remaining transformed cells were stored at 4 °C overnight.

Once we had determined the volume of the transformation mixture needed to cover the theoretical library size 5–10 times, it was inoculated into 5 mL of rich induction media: SOC, chloramphenicol, carbenicillin and 2 μM anhydrotetracycline (ATC). Tubes of rich induction media were also inoculated with controls WT dCas9 and IT dCas9 with the same RFP sgRNA. The induction cultures were grown at 37 °C and we found shaking ≥ 250 rpm is helpful for maximum RFP and GFP fluorescence. After 8–12 h of growth, 500 μL of each sample was centrifuged, washed 2 × with 1 mL of PBS, and resuspended 1:20 in PBS for flow cytometry.
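As a rough guide to why 5–10-fold oversampling is chosen, one can assume that transformants sample library members uniformly at random. Under that idealization (an assumption, since real libraries are unevenly distributed), the chance that a given member is missed after N-fold oversampling is approximately e^(-N). The sketch below illustrates this; the library size of 100,000 is hypothetical.

```python
import math


def expected_fraction_covered(fold_coverage):
    """Expected fraction of library members sampled at least once,
    assuming uniform random sampling (Poisson approximation)."""
    return 1.0 - math.exp(-fold_coverage)


def transformants_needed(library_size, fold_coverage):
    """Number of transformants required for a given fold coverage."""
    return math.ceil(library_size * fold_coverage)


hypothetical_library_size = 100_000  # illustrative only, not from this work
for fold in (1, 5, 10):
    needed = transformants_needed(hypothetical_library_size, fold)
    covered = expected_fraction_covered(fold)
    print(f"{fold:>2}x coverage: {needed} transformants, ~{covered:.4%} of members sampled")
```

Under this model, 1-fold coverage misses over a third of the library, while 5-fold and 10-fold coverage miss less than 1% and less than 0.01%, respectively.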

The control samples were run on the FACS instrument (Sony SH800) to establish correct positive and negative gating. The library was then screened on the FACS instrument with the set positive gate. Cells were sorted into a rich (SOC), non-selective medium. At least 10× the library size was screened because the viability post-sorting was often less than 100%. The sorted cells were recovered for 2 hours at 37°C. Approximately 1000–10,000 of the sorted and recovered cells were plated on rich induction plates with antibiotics and inducer. Plates were incubated overnight at 37°C and then allowed to incubate at room temperature for 12 hours so that RFP could fully mature.

The remainder of the recovery culture, which was not plated, was added to 6 mL SOC with chloramphenicol and carbenicillin to propagate the library overnight at 37°C. Glycerol stocks of the sorted cells were made the next day by mixing 800 μL of culture with 400 μL of 50% v/v glycerol in LB liquid medium. The remaining liquid overnight culture was centrifuged and the cells were miniprepped to recover the plasmid DNA library. FACS screening was repeated one more time to further enrich the library with functional clones.

4.2.8 Library Analysis

A PCR was performed to analyze the plasmid DNA from the original and screened libraries. In combination with a PDZ-specific primer (primer 9, Table 4-1), a forward primer which anneals upstream of the dCas9 gene (primer 10) will produce a PCR product if the PDZ insert is coding in the same direction. Thus, the primer set used with a PDZ-dCas9 library will amplify any PDZ inserted in the forward direction (relative to the target gene) and produce a different fragment length for each insert position.
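The relationship between insertion position and fragment length is approximately linear: each additional codon upstream of the insertion adds 3 bp to the product. The sketch below shows that relationship only. The distance from the forward primer to the dCas9 start codon (`upstream_bp`) and the distance from the start of the PDZ insert to the end of the reverse primer (`primer_depth_bp`) are hypothetical parameters that are not given in the text, and exact junction sequences are ignored.

```python
def expected_amplicon_bp(insert_residue, upstream_bp, linker_codons=0, primer_depth_bp=0):
    """Approximate PCR product length for a forward-oriented PDZ insertion.

    insert_residue  : dCas9 amino acid position preceding the insertion
    upstream_bp     : bp from the 5' end of the forward primer to the start codon (hypothetical)
    linker_codons   : number of linker codons on the N-terminal side of PDZ (0, 1, or 2)
    primer_depth_bp : bp from the start of the insert to the end of the reverse primer (hypothetical)
    """
    return upstream_bp + 3 * insert_residue + 3 * linker_codons + primer_depth_bp


# Two insertion sites 988 residues apart differ by 2964 bp on the gel,
# whatever the (unknown) primer offsets are.
site_a = expected_amplicon_bp(200, upstream_bp=100, primer_depth_bp=50)
site_b = expected_amplicon_bp(1188, upstream_bp=100, primer_depth_bp=50)
print(site_b - site_a)
```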

Taq DNA polymerase (NEB) was used for PCR analysis of libraries because Phusion polymerase was found to preferentially amplify certain bands over others and to promote more mis-priming. Reactions were as follows: 2.5 μL Thermopol reaction buffer (NEB), 0.5 μL 10mM dNTPs (NEB), 0.2 μL 25 μM insertion primer, 0.2 μL 25 μM backbone primer, 0.125 μL Taq DNA polymerase (NEB), 1 μL template DNA, and water to 25 μL. Standard Taq thermocycler conditions were used. Completed reactions were analyzed with agarose-gel electrophoresis to visualize the distribution of band lengths.
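The reaction recipe above can be checked with the dilution relation C1·V1 = C2·V2 and scaled into a master mix. In the sketch below the per-reaction volumes are taken from the text, while the 10% overage, the choice to add template per well rather than to the master mix, and the function names are illustrative assumptions.

```python
REACTION_UL = 25.0

# Per-reaction volumes (uL) as listed in the text; water fills the remainder.
COMPONENTS_UL = {
    "ThermoPol buffer": 2.5,
    "10 mM dNTPs": 0.5,
    "25 uM insertion primer": 0.2,
    "25 uM backbone primer": 0.2,
    "Taq DNA polymerase": 0.125,
    "template DNA": 1.0,
}


def final_concentration(stock_conc, stock_volume_ul, total_volume_ul=REACTION_UL):
    """C2 = C1 * V1 / V2, in the same units as stock_conc."""
    return stock_conc * stock_volume_ul / total_volume_ul


def water_volume_ul(components=COMPONENTS_UL, total_volume_ul=REACTION_UL):
    """Water needed to bring one reaction to its final volume."""
    return total_volume_ul - sum(components.values())


def master_mix(n_reactions, overage=0.10, components=COMPONENTS_UL):
    """Scaled volumes for everything except template, which is added per well."""
    scale = n_reactions * (1.0 + overage)
    mix = {name: vol * scale for name, vol in components.items() if name != "template DNA"}
    mix["water"] = water_volume_ul(components) * scale
    return mix


print(f"primer: {final_concentration(25, 0.2):.2f} uM each")
print(f"dNTPs:  {final_concentration(10, 0.5):.2f} mM")
print(f"water:  {water_volume_ul():.3f} uL per reaction")
```

With these volumes each primer is at 0.2 μM and the dNTPs at 0.2 mM in the final reaction, and 20.475 μL of water completes each 25 μL reaction.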

4.2.9 Isolated Construct Analysis

A 96-well PCR plate was set up with 25 μL reactions using the primers and components from the Library Analysis Section (4.2.8). In parallel, a standard 96-well plate was filled with 100 μL of SOC medium per well. From the induction plates, colonies expressing only GFP (RFP expression repressed), as assayed via fluorescence imaging (Bio-Rad Chemidoc MP), were picked. Each colony was spotted into a well of the PCR plate by pipetting up and down and then, using the same tip, used to inoculate the corresponding well of the SOC plate. The inoculated SOC plate was stored at 4°C during the sequence analysis. The PCR plate was run in a thermocycler and the resulting amplicons were sent for sequencing. Sequences were aligned to the original plasmid map using SnapGene software (GSL Biotech LLC). Variants were checked for in-frame insertions and all unique clones were noted.

For each unique in-frame clone, 5 mL of SOC medium was inoculated with 50 μL of the corresponding well from the 96-well SOC plate. The cultures were grown overnight with antibiotics. Each culture was miniprepped the following day, obtaining both the engineered PDZ-dCas9 plasmid and the RFP guide plasmid from each isolate. The construct was then directly sequenced with a primer upstream of the previously identified dCas9 insertion site to verify the insertion site and determine the linker composition on both ends of PDZ.

For samples of interest, 5 μg of plasmid mixture (PDZ-dCas9 plasmid and RFP sgRNA plasmid) were digested with BsaI to selectively cleave the sgRNA plasmid. The remaining PDZ-dCas9 plasmid was cleaned up to remove DNA fragments and enzymes. The purified PDZ-dCas9 construct was transformed with sgRNA plasmids that target new genes. The sgRNA plasmid targeting gfp expresses the sequence: 5′-AUCUAAUUCAACAAGAAUU-3′. The sgRNA plasmid targeting ftsZ (an essential cell division protein) expresses the sequence: 5′-UCGGCGUCGGCGGCGGCGG-3′. The control dCas9 plasmids (expressing WT dCas9 or IT dCas9) were also transformed with the new sgRNA plasmids. Colonies from selective medium plates were picked and inoculated into rich induction medium (SOC, chloramphenicol, carbenicillin and 2 μM ATC). Cultures were either analyzed for fluorescence on a plate reader (Tecan M1000) or for phenotype on an Axio Observer.Z1 inverted microscope (ZEISS).

4.2.10 Next-Generation Sequencing

PDZ-dCas9 library plasmid DNA was sheared with a Covaris S220 focused-ultrasonicator using AFA microTUBEs (product #520052). All DNA cleanups of NGS samples were done with Agencourt AMPure XP SPRI beads (Beckman Coulter, Inc.). Illumina-compatible libraries were prepared from the sheared DNA with a NEBNext DNA Library Prep Reagent Set for Illumina (New England Biolabs). Sheared DNA and prepared samples were analyzed for size distribution on an Agilent 2100 Bioanalyzer using chips from a DNA 1000 kit (Agilent Technologies). Double-stranded DNA concentration of the prepared samples was measured with a dsDNA BR Assay Kit on a Qubit Fluorometer (Life Technologies). Normalized samples were run on a MiSeq instrument with a MiSeq Reagent Kit v3 (150 cycles, Illumina, Inc.).

Table 4-1: List of DNA primers
Lower case letters indicate bases that anneal to the template.

#    Name               Sequence (5’->3’)
1    puc_bb_bglII_F     CACACCAAGATCtggtttcttagacgtcaggtggc
2    puc_bb_xhoI_R      ACACAACCTCGAGtgcattaatgaatcggccaacg
3    PDZ_F_L0           CACACCAGGTCTCACATCTCAACGTCGGCGTgtgacggtgcgcaaggc
4    PDZ_F_L1           CACACCAGGTCTCACATCTBCTCAACGTCGGCGTgtgacggtgcgcaaggc
5    PDZ_F_L2           CACACCAGGTCTCACATCTBCTBCTCAACGTCGGCGTgtgacggtgcgcaaggc
6    PDZ_R_L0           CACACCAGGTCTCTACGCcatgtacttaacttcgagcacgacctcc
7    PDZ_R_L1           CACACCAGGTCTCTACGCAGVcatgtacttaacttcgagcacgacctcc
8    PDZ_R_L2           CACACCAGGTCTCTACGCAGVAGVcatgtacttaacttcgagcacgacctcc
9    PDZ_detect_R       gcggtacaggccctcaagaag
10   Plasmid_detect_F   cctagcttctgggcgagtttacg

4.3 Results

4.3.1 Screening dCas9 Using Fluorescent Proteins

The catalytically dead version of Cas9 has a functional output that can be tied directly to transcription in E. coli; namely, it can repress transcription of a desired gene (Qi et al. 2013). Qi et al. previously demonstrated that dCas9 with a guide sequence of 5′-AACUUUCAGUUUAGCGGUCU-3′ can target and repress a genome-encoded RFP while avoiding repression of a genome-encoded upstream GFP (Qi et al. 2013). In a screening context, this provides a simple output for assaying dCas9 functionality (i.e., RFP knockdown) while correcting for extrinsic noise in the population by monitoring GFP expression (Elowitz et al. 2002). This method of screening is schematized in Figure 4-2A. Cells containing functional dCas9s will repress RFP and express GFP while those with nonfunctional dCas9s will express both fluorescent proteins. We found that these signals were easily distinguished using flow cytometry or fluorescence imaging (Figure 4-2B and C). Thus, the retained in vivo function of dCas9 variants can be measured efficiently.
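The logic of this two-color readout can be written down as a simple per-cell classification. The sketch below is a simplified illustration and not the gating used on the instrument: the function name, the GFP floor, and the RFP/GFP ratio cutoff are hypothetical, and in practice the gate is drawn from the WT dCas9 and IT dCas9 control populations.

```python
def classify_cell(gfp, rfp, gfp_min, max_rfp_to_gfp):
    """Classify one cell from its GFP and RFP intensities.

    gfp_min        : minimum GFP signal for a cell to be considered (hypothetical)
    max_rfp_to_gfp : RFP/GFP ratio at or below which RFP counts as repressed (hypothetical)
    Dividing RFP by GFP uses GFP as an internal reference for cell-to-cell
    variation that affects both reporters.
    """
    if gfp_min <= 0:
        raise ValueError("gfp_min must be positive")
    if gfp < gfp_min:
        return "ungated"        # too little GFP to judge RFP repression
    if rfp / gfp <= max_rfp_to_gfp:
        return "functional"     # GFP on, RFP repressed
    return "nonfunctional"      # both reporters expressed


# Arbitrary intensity values, for illustration only.
cells = [(1000, 50), (1200, 1100), (10, 5)]
for gfp, rfp in cells:
    print(gfp, rfp, classify_cell(gfp, rfp, gfp_min=100, max_rfp_to_gfp=0.2))
```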

Figure 4-2: Screen for functional dCas9s (A) Schematic representation of the screen. (B) Flow cytometry data of the functional, positive (WT dCas9) control in blue and negative “Inactive Truncation” Cas9 (IT dCas9) control in red. IT dCas9 contains only the C-terminal 250 amino acids. Both controls contain the sgRNA plasmid which targets RFP for repression. Samples were grown overnight in rich induction media. (C) Colony fluorescence of the functional (WT dCas9) and “Broken” negative (IT dCas9) controls.

4.3.2 Creating and Screening of a PDZ-dCas9 Library

The utility of dCas9 is recognized as its ability to bind specific sequence targets within a genome. This function has been harnessed to recruit other proteins to genomic targets through N- or C-terminal fusions of dCas9 (Bikard et al. 2013, Guilinger et al. 2014, Tsai et al. 2014). However, it may be necessary to create fusions at sites within dCas9 other than the N- and C-termini, either to retain function or to specifically position the fused protein relative to any bound DNA. Towards that end, we used a transposon-based domain-insertion technique to insert a SNTA1 PDZ domain with variable linkers into random locations within dCas9. PDZ domains can mediate protein–protein interactions and, thus, recruit factors to their insert location. With the library of possible PDZ-dCas9 fusions, screening is needed to identify those PDZ-insert locations that still retain dCas9 function.

We have found fluorescence screening of dCas9 function to be very effective, and the rate of screening can be drastically increased if combined with Fluorescence-Activated Cell Sorting (FACS). Because FACS can efficiently isolate active dCas9 variants from a pool of inactive variants, this technique was applied to the PDZ-dCas9 library (Figure 4-3). The plot of GFP versus RFP expression for the PDZ-dCas9 population shows a small but significant fraction retaining the ability to knock down RFP expression. A gate was created to sort cells that show any reduced RFP expression. This population should contain functional fusions that have retained some ability to target and bind the rfp gene. This sorted pool of cells was propagated and subjected to an additional sort with the same gate to further enrich for active variants.


Figure 4-3: Cell sorting data from the GFP-RFP screen The first panel depicts the RFP versus GFP measurements of WT dCas9 (blue; light gray in the print version) and IT dCas9 (pink; dark gray in the print version) as run on a Sony SH800 cell sorter. The separation of these two controls into distinct populations is readily apparent. The second panel portrays the spread of the PDZ-Cas9 fusion library (green; dark gray in the print version). The last panel shows the overlay of all three FACS plots. It is evident that there are populations of functional and nonfunctional proteins within the single PDZ-dCas9 library and the third panel shows the gate used for isolation of functional PDZ-dCas9 fusions.

4.3.3 Determining Screening Enrichment of PDZ-dCas9 Domain Insertions

A successful round of screening with the PDZ-dCas9 library should enrich for functional PDZ-dCas9 insertion mutants. We developed a simple PCR method to check for enrichment of PDZ-insertion locations. Primers that are specific to the inserted domain and to the plasmid backbone upstream of the engineered Cas9 will produce PCR products of different sizes for each insertion site (Figure 4-4A). An amplified smear indicates a relatively “naïve” library while specific bands indicate enriched library members. We found that each round of FACS for the PDZ-dCas9 library further enriched specific insertion locations (Figure 4-4B). Furthermore, N- and C-terminal insertions were found in this analysis, which is expected based on previous literature showing such fusions did not affect dCas9 function (Guilinger et al. 2014). When the final library (sorted twice with FACS) was plated with inducer, roughly half of all colonies showed RFP repression (Figure 4-4C). This indicates successful enrichment of active PDZ-dCas9 constructs from FACS screening. This also suggests that enriched bands seen by PCR analysis are likely permissible PDZ insertion sites.


Figure 4-4: Checking success of a screen and picking final clones (A) An overview of primer design to check for enrichment of the PDZ-dCas9 library. (B) Gel electrophoresis of PCRs run on the original PDZ-dCas9 library and the first and second rounds of screening. The banding patterns that appear after the first and second sorts are indicative of library enrichment, and represent the insertion sites of a PDZ domain. It is also evident that N- and C-terminal fusions to PDZ are enriched. Since these N- and C-terminal fusions are expected to be functional, this serves as an internal control. (C) Fluorescent image of the on-plate screen. Colonies that express only GFP are expected to have a functional PDZ-dCas9.

4.3.4 Identifying and Testing PDZ-dCas9 Clones from a Screened Library

We next determined which FACS-enriched constructs were in fact successful PDZ-dCas9 fusions. The construct PDZ-1188dCas9 (PDZ inserted at amino acid residue 1188 of dCas9) is shown in Figure 4-5 as an example of our validation process. The final library was screened on plates with inducer, and colonies that showed RFP repression were picked for analysis (Figure 4-4C). The selected constructs were sequenced to determine the insertion site and linker makeup (Figure 4-5A). The function of a fusion was validated by testing it with two new sgRNA plasmids. First, an sgRNA plasmid targeting the gfp gene was used in combination with a 96-well plate fluorescence assay (Figure 4-5B). The knockdown of the genomic gfp gene was quantified and compared to control dCas9 constructs. The function was also tested with a phenotype assay by using an sgRNA targeting the ftsZ gene of E. coli. The FtsZ protein is an essential cell-division component and, when its expression is knocked down, cells take on an elongated phenotype (Figure 4-5C). Using these methods, we identified and validated 23 unique PDZ insertion sites that create a functional fusion (20 at WT dCas9 levels). While this is a good set of constructs with which to start testing protein recruitment, this method of screening has limited throughput and, consequently, identifying all functional insertions is difficult for such a large protein.

Figure 4-5: Validating functionality of engineered dCas9 (A) Sequence validation of the PDZ-1188dCas9. Sequence alignment via SnapGene. (B) Quantitative repression of GFP by the PDZ-1188dCas9 fusion clone. Bulk fluorescence measurements of GFP expression levels over 5 h. Double asterisks represent a p value of < 0.0001 in a one-way ANOVA. Single asterisk represents p values of < 0.0001 in an unpaired Student's t-test. (C) Qualitative repression of the ftsZ gene by PDZ-dCas9 and controls. The scale bar is 5 μm.

4.3.5 Next-Generation Sequencing of PDZ-Insert Libraries

The enriched PDZ-dCas9 library was also analyzed by next-generation sequencing (NGS) to determine the distribution of allowed insertion sites across dCas9. NGS offers a detailed view of the library makeup that is more comprehensive and quantitative than PCR analysis or plate screening. The plasmid DNA from the original and enriched libraries was sheared and Illumina-compatible adaptors were added to the DNA fragments. The prepared samples were sequenced on an Illumina MiSeq instrument and greater than 4 million reads were obtained for each library. All reads that overlapped the junction between dCas9 and PDZ gave data on insertion positions (Figure 4-6). The enriched libraries showed many permissible insertion areas across the dCas9 sequence. Interestingly, the N-terminal region of dCas9 showed very little tolerance to PDZ insertion. The crystal structure of dCas9 bound to DNA shows that the first ~200 amino acids are either buried residues or in contact with sgRNA (Nishimasu et al. 2014), so a domain insertion in those regions may inhibit either protein folding or binding to target DNA. These data provide a map of potential sites for protein recruitment.

[Figure 4-6 plot. Panels: first round of PDZ screening; second round of PDZ screening; overlay. y-axis: ratio of # of reads (post/pre-screen). x-axis: dCas9 amino acid position of PDZ insertion.]

Figure 4-6: Next-generation sequencing of screened PDZ-dCas9 library The number of reads observed at a given insert position was compared between pre-screen and post-screen libraries to give a ratio. First and second rounds of PDZ-dCas9 library screening are those shown in Figure 4-4B. Ratios above 1 indicate an enrichment of active PDZ-dCas9 for that insert site. Ratios below 1 indicate loss of functionality for an insert at the given site. Certain regions are more amenable (e.g. C-terminus) and others less amenable (e.g. N-terminus) to domain insertions.
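The post/pre-screen ratio described in the caption can be sketched as follows. Two details in this sketch are assumptions that the text does not specify: read counts are normalized to the total junction reads in each library so that sequencing depth cancels out, and a pseudocount of 1 is added so that positions with no pre-screen reads do not cause division by zero. The counts in the example are invented for illustration.

```python
def enrichment_ratios(pre_counts, post_counts, pseudocount=1.0):
    """Per-position ratio of post-screen to pre-screen read frequency.

    pre_counts / post_counts map dCas9 amino acid position -> junction read count.
    Normalization by library totals and the pseudocount are assumptions of this sketch.
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())
    if pre_total == 0 or post_total == 0:
        raise ValueError("both libraries need at least one read")
    ratios = {}
    for pos in sorted(set(pre_counts) | set(post_counts)):
        pre_freq = (pre_counts.get(pos, 0) + pseudocount) / pre_total
        post_freq = (post_counts.get(pos, 0) + pseudocount) / post_total
        ratios[pos] = post_freq / pre_freq
    return ratios


# Invented toy counts: position 10 is depleted, 1188 is enriched, 500 is unchanged.
pre = {10: 100, 500: 100, 1188: 100}
post = {10: 10, 500: 100, 1188: 190}
for position, ratio in enrichment_ratios(pre, post).items():
    print(position, round(ratio, 2))
```

As in the figure, ratios above 1 mark positions enriched by the screen and ratios below 1 mark positions that were depleted.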

4.4 Discussion

We found that using dCas9’s ability to repress transcription of a fluorescent protein (FP) was an efficient method to screen for dCas9 function. Coupled with FACS, this method enriched a large library of PDZ-dCas9 fusion proteins. The active PDZ-dCas9 constructs isolated serve as a collection of dCas9s that can recruit proteins to various sites on the dCas9 structure. The collection can be used to study the optimal topology of DNA factors relative to dCas9 when targeted to a gene. The sites found may also map sites that are more generally permissive of domain insertions, not just PDZ. Further follow-up studies with different domains are needed to determine the generalizability of the sites found. Alternative domains could also further expand the function of dCas9 in genomic applications. One example of this may be to develop dCas9 as a chromatin-modifying enzyme. Alternatively, we may find domains that could allosterically regulate the activity of dCas9 or Cas9.

The FP knockdown screening method may also serve as a proxy of retained WT Cas9 function (binding and DNA cleavage), though further testing is needed to assess this possibility. One issue may be that, for example, a domain-insertion library could have constructs with an insertion that does not inhibit binding but abolishes the nuclease activity of Cas9. If too many of these constructs are present in the library, FP knockdown may not be a useful indicator of Cas9 function, only dCas9 function. However, in either case studying the inserts that disrupt only nuclease activity, and not binding, could provide insight into functional segregation within the Cas9 structure.

Overall, Cas9 has fundamentally altered the genome-engineering landscape due to its simple programmability and overall effectiveness. We have demonstrated protein-engineering-based methods for the enhancement of Cas9 proteins. We believe that such techniques will be critical for answering unresolved biochemical questions of protein structure and function. Moreover, directed evolution of Cas9 will allow for more refined improvement of this singular protein and the construction of next-generation tools for both interrogating the genome and biomedical therapies.

4.5 References

Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, et al. 2007. CRISPR provides acquired resistance against viruses in prokaryotes. Science. 315(5819):1709–12
Bikard D, Jiang W, Samai P, Hochschild A, Zhang F, Marraffini LA. 2013. Programmable repression and activation of bacterial gene expression using an engineered CRISPR-Cas system. Nucleic Acids Res. 41(15):7429–37
Carroll D. 2014. Genome engineering with targetable nucleases. Annu. Rev. Biochem. 83:409–39
Cong L, Ran FA, Cox D, Lin S, Barretto R, et al. 2013. Multiplex genome engineering using CRISPR/Cas systems. Science. 339(6121):819–23
Deltcheva E, Chylinski K, Sharma CM, Gonzales K, Chao Y, et al. 2011. CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III. Nature. 471(7340):602–7
Dueber JE, Mirsky EA, Lim WA. 2007. Engineering synthetic signaling proteins with ultrasensitive input/output control. Nat. Biotechnol. 25(6):660–62
Dueber JE, Yeh BJ, Chak K, Lim WA. 2003. Reprogramming control of an allosteric signaling switch through modular recombination. Science. 301(5641):1904–8
Edwards WR, Busse K, Allemann RK, Jones DD. 2008. Linking the functions of unrelated proteins using a novel directed evolution domain insertion method. Nucleic Acids Res. 36(13):e78
Elowitz MB, Levine AJ, Siggia ED, Swain PS. 2002. Stochastic gene expression in a single cell. Science. 297(5584):1183–86
Esvelt KM, Mali P, Braff JL, Moosburner M, Yaung SJ, Church GM. 2013. Orthogonal Cas9 proteins for RNA-guided gene regulation and editing. Nat. Methods. 10(11):1116–21
Friedland AE, Tzur YB, Esvelt KM, Colaiácovo MP, Church GM, Calarco JA. 2013. Heritable genome editing in C. elegans via a CRISPR-Cas9 system. Nat. Methods. 10(8):741–43
Gilbert LA, Larson MH, Morsut L, Liu Z, Brar GA, et al. 2013. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell. 154(2):442–51
Gratz SJ, Cummings AM, Nguyen JN, Hamm DC, Donohue LK, et al. 2013. Genome engineering of Drosophila with the CRISPR RNA-guided Cas9 nuclease. Genetics. 194(4):1029–35
Guilinger JP, Thompson DB, Liu DR. 2014. Fusion of catalytically inactive Cas9 to FokI nuclease improves the specificity of genome modification. Nat. Biotechnol. 32(6):577–82
Hou Z, Zhang Y, Propson NE, Howden SE, Chu L-F, et al. 2013. Efficient genome engineering in human pluripotent stem cells using Cas9 from Neisseria meningitidis. Proc. Natl. Acad. Sci. U.S.A. 110(39):15644–49
Hsu PD, Scott DA, Weinstein JA, Ran FA, Konermann S, et al. 2013. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat. Biotechnol. 31(9):827–32
Hwang WY, Fu Y, Reyon D, Maeder ML, Tsai SQ, et al. 2013. Efficient genome editing in zebrafish using a CRISPR-Cas system. Nat. Biotechnol. 31(3):227–29
Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA, Charpentier E. 2012. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science. 337(6096):816–21
Jinek M, Jiang F, Taylor DW, Sternberg SH, Kaya E, et al. 2014. Structures of Cas9 endonucleases reveal RNA-mediated conformational activation. Science. 343(6176):1247997
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, et al. 2001. Initial sequencing and analysis of the human genome. Nature. 409(6822):860–921
Li J-F, Norville JE, Aach J, McCormack M, Zhang D, et al. 2013. Multiplex and homologous recombination-mediated genome editing in Arabidopsis and Nicotiana benthamiana using guide RNA and Cas9. Nat. Biotechnol. 31(8):688–91
Mali P, Esvelt KM, Church GM. 2013a. Cas9 as a versatile tool for engineering biology. Nat. Methods. 10(10):957–63
Mali P, Yang L, Esvelt KM, Aach J, Guell M, et al. 2013b. RNA-guided human genome engineering via Cas9. Science. 339(6121):823–26
Nishimasu H, Ran FA, Hsu PD, Konermann S, Shehata SI, et al. 2014. Crystal structure of Cas9 in complex with guide RNA and target DNA. Cell. 156(5):935–49
Niu Y, Shen B, Cui Y, Chen Y, Wang J, et al. 2014. Generation of gene-modified cynomolgus monkey via Cas9/RNA-mediated gene targeting in one-cell embryos. Cell. 156(4):836–43
Nourry C, Grant SGN, Borg J-P. 2003. PDZ domain proteins: plug and play! Sci. STKE. 2003(179):RE7
O'Connell MR, Oakes BL, Sternberg SH, East-Seletsky A, Kaplan M, Doudna JA. 2014. Programmable RNA recognition and cleavage by CRISPR/Cas9. Nature
Olson EJ, Tabor JJ. 2012. Post-translational tools expand the scope of synthetic biology. Curr. Opin. Chem. Biol. 16(3-4):300–306
Qi LS, Larson MH, Gilbert LA, Doudna JA, Weissman JS, et al. 2013. Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression. Cell. 152(5):1173–83
Shalem O, Sanjana NE, Hartenian E, Shi X, Scott DA, et al. 2014. Genome-scale CRISPR-Cas9 knockout screening in human cells. Science. 343(6166):84–87
Sternberg SH, Redding S, Jinek M, Greene EC, Doudna JA. 2014. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature. 507(7490):62–67
Tsai SQ, Wyvekens N, Khayter C, Foden JA, Thapar V, et al. 2014. Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing. Nat. Biotechnol. 32(6):569–76
Wang H, Yang H, Shivalila CS, Dawlaty MM, Cheng AW, et al. 2013. One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering. Cell. 153(4):910–18
Wiedenheft B, Sternberg SH, Doudna JA. 2012. RNA-guided genetic silencing systems in bacteria and archaea. Nature. 482(7385):331–38
Zhou Y, Zhu S, Cai C, Yuan P, Li C, et al. 2014. High-throughput screening of a CRISPR/Cas9 library for functional genomics in human cells. Nature. 509(7501):487–91

Chapter 5 Conclusions

5.1 Summary

In this work, we have explored the feasibility and utility of transposon-created insertion libraries for the purposes of protein engineering. In Chapter 2, we investigated modifications of transposons to improve their suitability for creating large insertion libraries. A Python script found over a dozen unique restriction sites that could be encoded in a transposon with fewer than four mutations. Transposons were mutated and evaluated with regard to insertion efficiency and the amount of scar sequence left behind. One modified transposon, BsaI-M1-CmR-1, offered a good balance of insertion efficiency, limited scar sequence, and the ability to modify cut-site bases (a property of Type IIs restriction sites). It proved effective at creating random-insertion plasmid libraries.
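The idea behind the restriction-site search can be illustrated with a minimal sketch. This is not the Appendix B script: it uses only the Python standard library, handles a single unambiguous recognition sequence, and reports which substitutions would be required. The function names are hypothetical, and the BsaI recognition sequence (GGTCTC) and the 49-bp transposon end are used purely as example inputs.

```python
def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq))


def sites_within_mutations(seq, site, max_mut=3):
    """List windows of seq that match site after at most max_mut substitutions.

    Returns tuples (start, strand, changes), where changes is a list of
    (position, original_base, new_base) on the given top-strand sequence.
    """
    patterns = [("+", site)]
    rc_site = reverse_complement(site)
    if rc_site != site:  # palindromic sites read the same on both strands
        patterns.append(("-", rc_site))

    hits = []
    for strand, pattern in patterns:
        for start in range(len(seq) - len(pattern) + 1):
            window = seq[start:start + len(pattern)]
            changes = [
                (start + k, window[k], pattern[k])
                for k in range(len(pattern))
                if window[k] != pattern[k]
            ]
            if len(changes) <= max_mut:
                hits.append((start, strand, changes))
    return hits


# Example input: the transposon end sequence scanned in Appendix B.
transposon_end = "TGAAGCGGCGCACGAAAAACGCGAAAGCGTTTCACGATAAATGCGAAAA"
for start, strand, changes in sites_within_mutations(transposon_end, "GGTCTC"):
    print(start, strand, changes)
```

Comparing each window directly against the recognition sequence avoids enumerating every mutant, but it only covers enzymes with a fixed recognition sequence; degenerate sites would need a pattern-aware comparison.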

In Chapter 3, we applied the transposon domain-insertion method to link fluorescent proteins (FPs) and ligand-binding domains (LBDs) in random topologies. It was demonstrated that active fluorescent biosensors could be enriched in this library with iterative rounds of fluorescence-activated cell sorting (FACS). All functional sensors that were isolated exhibited the regulation behavior for which we screened, namely, turn-on sensor function. These hits identified sites of domain connection that produce allosteric interactions.

Further investigation of the inter-domain linkers in one of the hits found that the allosteric connection could be improved through changes in linker composition and length, although we were unable to elucidate principles for linker design that promote inter-domain allostery. Nevertheless, the optimized hits showed that single-FP sensors can achieve significant dynamic range (>10x) and that the absolute fluorescence of individual constructs can reach intensities as great as that of EGFP. Hits were also found with an off state that is only ~3.5x brighter than a negative control. Furthermore, in vitro experiments with one successful sensor showed that the binding properties of maltose-binding protein (MBP) were unaffected by the coupled cpEGFP domain.

Finally, in Chapter 4, we outlined a method to find accessible insertion sites within Cas9 for insertion of a protein-protein-interaction domain. A high-throughput FACS screen for retained function proved to be an effective method to enrich for successful constructs. Next-generation sequencing of the enriched libraries gave a detailed map of sites that could tolerate a domain insertion. When these sites are mapped onto the crystal structure of Cas9, a site that directs protein-protein interactions in a specific topology can be chosen. This ability will expand the functions that Cas9 can provide in genome manipulation.
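A map of tolerated insertion sites amounts to comparing how often each insertion position is observed before and after sorting. The following is a minimal sketch of one way to compute such a comparison, assuming that read counts per insertion site have already been extracted from the sequencing data. The function name, the pseudocount, and the example counts are hypothetical and are not values or methods from this work.

```python
import math


def log2_enrichment(pre_counts, post_counts, pseudocount=1.0):
    """Return {site: log2 change in read frequency from pre-sort to post-sort}.

    pre_counts and post_counts map an insertion position to a read count.
    The pseudocount (an assumption of this sketch) keeps sites that are
    missing from one sample from causing a division by zero.
    """
    sites = set(pre_counts) | set(post_counts)
    pre_total = sum(pre_counts.values()) + pseudocount * len(sites)
    post_total = sum(post_counts.values()) + pseudocount * len(sites)

    enrichment = {}
    for site in sites:
        pre_freq = (pre_counts.get(site, 0) + pseudocount) / pre_total
        post_freq = (post_counts.get(site, 0) + pseudocount) / post_total
        enrichment[site] = math.log2(post_freq / pre_freq)
    return enrichment


# Hypothetical read counts keyed by amino-acid position of the insertion.
pre_sort = {120: 250, 575: 240, 1010: 260}
post_sort = {120: 30, 575: 900, 1010: 400}
for site, score in sorted(log2_enrichment(pre_sort, post_sort).items()):
    print(site, round(score, 2))
```

Positive scores mark positions that became more frequent after the screen; these are the candidates one would then inspect on the crystal structure.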

Overall, we have found that the modularity of protein domains is useful for combining functions to create proteins with new capabilities, including allosteric regulation. We believe that the concept of combining domains is underutilized in the field of protein engineering and that further investigation will lead to readily achievable novel functions. In addition, more development and study of synthetic multi-domain constructs will advance mechanistic understanding of domain interactions and communication.

5.2 Discussion

The transposon-based domain-insertion method outlined in Chapter 2 was used to create plasmid libraries in Chapters 3 and 4. Two improvements were made in the implementation of the method from Chapter 3 to Chapter 4. First, in Chapter 4, the target gene ORF was transferred to a new plasmid after the transposition reaction. This created a library with insertions only in the target gene region, thereby decreasing the proportion of unproductive domain insertions in the final library. Not only does this step pre-enrich the transposon-insert library, but it also eliminates restrictions on the expression plasmid. In Chapter 3 we designed a minimal expression plasmid because of issues caused by transposons inserting into plasmid regulatory elements. In Chapter 4 we used a minimal plasmid for transposition but then moved the dCas9 ORF to an expression plasmid. The expression plasmid contained an additional gene (tetR) that could have caused problems if we had transposed directly into that vector. This approach could also produce libraries in which the insertion is isolated to one domain within a larger protein, or in which the target is attached to fusion tags or partners.
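The benefit of restricting insertions to the target ORF can be estimated with a back-of-the-envelope calculation. The sketch below assumes that transposition is uniformly random along the DNA and that only forward-oriented, in-frame insertions within the ORF can produce a domain fusion (one of two orientations and one of three frames). The function name and the sequence lengths are hypothetical round numbers, not the constructs used in this work.

```python
def productive_fraction(orf_bp, target_bp):
    """Expected fraction of insertions that could yield a domain fusion.

    orf_bp: length of the target ORF.
    target_bp: length of the DNA that carries insertions in the final
    library (the whole plasmid, or only the ORF after it has been
    transferred to a fresh backbone).
    """
    if not 0 < orf_bp <= target_bp:
        raise ValueError("expected 0 < orf_bp <= target_bp")
    fraction_in_orf = orf_bp / target_bp
    orientation = 1 / 2  # assumption: only one orientation is productive
    frame = 1 / 3        # assumption: only one reading frame is productive
    return fraction_in_orf * orientation * frame


orf_bp = 4100      # hypothetical ORF length
plasmid_bp = 7000  # hypothetical plasmid length

direct = productive_fraction(orf_bp, plasmid_bp)
after_transfer = productive_fraction(orf_bp, orf_bp)
print(f"direct transposition: {direct:.3f}")
print(f"after ORF transfer:   {after_transfer:.3f}")
```

Under these assumptions the transfer step raises the productive fraction to the one-in-six ceiling set by orientation and frame, independent of the size of the expression plasmid.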

A second modification made in Chapter 4 was the use of variable domain linkers in the PDZ-dCas9 library. In Chapter 3, we tested one “good” insertion site with a library of variable linkers and found that while most retained biosensor function, over 15% of the constructs showed no significant function. Thus, in Chapter 4 it was presumed that for a given insertion site not all linkers would retain the dCas9 function of the fusion. For a library with constant linkers, such as the one in Chapter 3, it is possible to miss “good” insertion sites because the linkers caused issues at that site. By screening the PDZ-dCas9 library with diverse linkers we were less likely to screen out functional insertion sites because of linker problems. Until the mechanism of domain communication is better understood, we believe that random linkers are critical to isolating the best multi-domain fusions from a library.

A consideration for any protein engineering effort is the context in which variants are screened or selected (“you get what you screen for”) (Arnold 1998). In this work, we chose to screen the libraries in vivo because both biosensors for metabolites and dCas9 genomic tools would likely be used intracellularly in practice. However, unlike traditional random mutagenesis, domain-fusion libraries may have unpredictable consequences on the expression cells used in screening. A protein with a few mutations is likely to fold and behave similarly to its parent sequence, but different domain fusions may have unique folding kinetics, while others may not fold at all or may form aggregates. These uncertain behaviors could affect the cells in various ways, including viability or growth rate. Thus, bias could be introduced into the expression library and should be carefully considered when choosing a screening or selection scheme. Specifically, we tested various cell types and induction conditions for expression in order to slow the protein-production rate and prevent misfolding.

Another potential issue of engineering allosteric proteins in vivo is getting the regulating ligand to the construct within the cell. In Chapter 3, we were attempting to make a maltose-regulated fluorescent sensor and relied on the natural transport machinery of E. coli to take up maltose from the medium. However, maltose is a known carbon source for E. coli, whereas most small molecules are likely not actively transported into the cell. Therefore, if we want to screen a potential allosteric protein within a cell, the regulating ligand must: be naturally taken up by the host organism, be permeable to lipid bilayers, or have a transporter or biosynthetic pathway that can be heterologously expressed in the host organism. If the ligand is a metabolic intermediate of the host organism, overexpression or knockouts of pathway genes could also provide control of its intracellular levels. If none of these options is available, an alternative screening strategy will be needed. Previous groups have already turned to in vitro testing to have better defined conditions when measuring allosteric biosensors (Berg et al. 2009, Marvin et al. 2013). Another possible approach, which provides much greater throughput, is protein surface-display technologies for yeast (Gai & Wittrup 2007) or bacteria (Daugherty 2007), where a ligand would not need to enter the cell. A potential downside of these methods would be that enriched allosteric proteins might be optimized to function in the environment of a buffer or growth medium and not the inside of a cell. A final alternative is targeting protein variants to the periplasm of E. coli (Lee et al. 2006), which is more accessible to small molecules through passive diffusion.

We see multi-domain proteins as a new area of protein engineering with very large potential. A major issue, which is true in most protein engineering, is that we are limited to known proteins and domains. A domain with a desired activity or property may not be available. Thus, a new domain must be created or discovered. There is precedent for the former case (Guntas et al. 2005, Looger et al. 2003, Siegel et al. 2010), and it may be the only possible route when the activity relates to non-natural molecules. For molecules that are found in nature, it is extremely likely that domains exist which interact with or have activity towards them, even if they have not been discovered yet. Our method of biosensor library construction (Chapter 3) may be well suited to test many potential binding domains in a high-throughput manner. Either suspected binding candidates or a diverse class of binding proteins (e.g. uncharacterized periplasmic binding proteins) could be combined in a library and sorted for function against a ligand of interest. A further limitation of the insertion method that we outlined is that one of the domains to be fused must have adjacent N- and C-termini. Further testing is needed to determine how far apart they can be and still make functional fusions with longer linkers.

5.3 Future Directions

The methods we have described were developed because of the presumption that multi-domain proteins and allostery are difficult to design. However, we believe that further studies using our developed techniques can help to elucidate principles or mechanisms of allostery. Creating manifold libraries with different domains could provide the necessary data to piece together trends of domain coupling and linker composition. If enriched libraries are interrogated with next-generation sequencing, statistical analysis could be carried out across many unique domains and possibly identify principal factors that determine allostery. It will also be interesting to observe if any domains show no ability to be allosterically linked. Researchers have described analysis methods to detect the allosteric potential of proteins (“latent allostery”) (Coyle et al. 2013, Lee et al. 2008), but this indicates some proteins may lack the potential entirely. Such proteins, found by high-throughput screening, could further inform a mechanistic understanding of allostery.

One of the main reasons we pursued a transposon-based strategy for library creation is that transposon activity is independent of DNA sequence or length. This means that transposon-mediated library construction is readily scalable, while PCR or DNA synthesis methods do not scale very well. Furthermore, researchers have used modified transposons in other valuable ways. For example, by encoding a ribosomal binding site (RBS) and stop codon into either end of a transposon, Silberg and colleagues produced libraries of circularly permuted protein variants (Mehta et al. 2012). Such a method could be combined with our domain-insertion strategy to create a library of domains combined at every site in both proteins. This would represent a truly exhaustive search of potential fusion constructs between two domains.
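The size of such an exhaustive library can be made concrete with a toy enumeration at the amino-acid level. This is only a sketch under simplifying assumptions: sequences are plain strings, every internal position of the host is treated as an insertion site, every circular permutation of the insert is allowed, and linkers are optional fixed strings. All names and the example sequences are hypothetical.

```python
def circular_permutations(seq):
    """Return every circular permutation of seq, indexed by new start position."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def insertion_library(host, insert, linker_n="", linker_c=""):
    """Yield (host_site, insert_start, construct) for every combination.

    host_site is the number of host residues preceding the insertion;
    insert_start is the residue of the insert that becomes its new N-terminus.
    """
    permutants = circular_permutations(insert)
    for host_site in range(1, len(host)):
        for insert_start, permutant in enumerate(permutants):
            construct = (
                host[:host_site] + linker_n + permutant + linker_c + host[host_site:]
            )
            yield host_site, insert_start, construct


# Toy example: a 4-residue "host" and a 3-residue "insert".
library = list(insertion_library("ABCD", "xyz"))
print(len(library))
for entry in library[:3]:
    print(entry)
```

The number of constructs is the product of the internal host sites and the insert length, so for two real domains of a few hundred residues each the library grows to tens of thousands of members before any linker variation is considered.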

The other main reason we used modified transposons to create domain-insertion libraries was that a PCR-based cloning approach would be laborious and expensive. In the future, though, if current price trends continue (Carr & Church 2009), we envision complete gene synthesis becoming a commonplace method of library creation for protein engineering. Using gene synthesis, multi-domain protein libraries can be created that go beyond single insertions. Shuffling of more than two domains can be designed in silico, synthesized, and screened. Smaller, more intelligent libraries can be created to arrive at successful constructs faster. These approaches may still be years away, but we believe most libraries will eventually be created on a computer instead of a benchtop.

5.4 References

Arnold FH. 1998. Design by directed evolution. Acc. Chem. Res.
Berg J, Hung YP, Yellen G. 2009. A genetically encoded fluorescent reporter of ATP:ADP ratio. Nat. Methods. 6(2):161–66
Carr PA, Church GM. 2009. Genome engineering. Nat. Biotechnol. 27(12):1151–62
Coyle SM, Flores J, Lim WA. 2013. Exploitation of latent allostery enables the evolution of new modes of MAP kinase regulation. Cell. 154(4):875–87
Daugherty PS. 2007. Protein engineering with bacterial display. Curr. Opin. Struct. Biol. 17(4):474–80
Gai SA, Wittrup KD. 2007. Yeast surface display for protein engineering and characterization. Curr. Opin. Struct. Biol. 17(4):467–73
Guntas G, Mansell TJ, Kim JR, Ostermeier M. 2005. Directed evolution of protein switches and their application to the creation of ligand-binding proteins. Proc. Natl. Acad. Sci. U.S.A. 102(32):11224–29
Lee J, Natarajan M, Nashine VC, Socolich M, Vo T, et al. 2008. Surface sites for engineering allosteric control in proteins. Science. 322(5900):438–42
Lee PA, Tullman-Ercek D, Georgiou G. 2006. The bacterial twin-arginine translocation pathway. Annu. Rev. Microbiol. 60:373–95
Looger LL, Dwyer MA, Smith JJ, Hellinga HW. 2003. Computational design of receptor and sensor proteins with novel functions. Nature. 423(6936):185–90
Marvin JS, Borghuis BG, Tian L, Cichon J, Harnett MT, et al. 2013. An optimized fluorescent probe for visualizing glutamate neurotransmission. Nat. Methods. 10(2):162–70
Mehta MM, Liu S, Silberg JJ. 2012. A transposase strategy for creating libraries of circularly permuted proteins. Nucleic Acids Res. 40(9):e71
Siegel JB, Zanghellini A, Lovick HM, Kiss G, Lambert AR, et al. 2010. Computational design of an enzyme catalyst for a stereoselective bimolecular Diels-Alder reaction. Science. 329(5989):309–13

Appendix A: Published Single-FP Biosensors

Name                  Target Ligand                                  Year  Citation
Camgaroo-1            Ca2+                                           1999  [1]
Camgaroo-2            Ca2+                                           2001  [2]
Flash-pericam         Ca2+                                           2001  [3]
GCaMP                 Ca2+                                           2001  [4]
Sinphos               Insulin receptor activation (phosphorylation)  2004  [5]
GCaMP2                Ca2+                                           2006  [6]
HyPer                 H2O2                                           2006  [7]
FlincG                cGMP                                           2008  [8]
GCaMP3                Ca2+                                           2009  [9]
Perceval              ATP/ADP ratio                                  2009  [10]
Peredox               NADH/NAD+ ratio                                2011  [11]
GECO series           Ca2+                                           2011  [12]
MBP165-cpGFP          Maltose                                        2011  [13]
Upward/Downward Dog   Ca2+ & Diacylglycerol (DAG)                    2012  [14]
GCaMP5                Ca2+                                           2012  [15]
iGluSnFR              Glutamate                                      2013  [16]
RCaMP                 Ca2+                                           2013  [17]
FlincG3               cGMP                                           2013  [18]
QSer                  Quinones                                       2013  [19]
CF98                  Citrate                                        2013  [20]
AmTrac & MepTrac      Ammonium transport                             2013  [21]
GCaMP6                Ca2+                                           2013  [22]
ASAP1                 Voltage                                        2014  [23]
CEPIA                 Ca2+                                           2014  [24]
GAP                   Ca2+                                           2014  [25]

Sequence Alignment of Select cpFPs from Published Single-FP Biosensors

Sequence sources:
cp_avGFP: Uniprot P42212 (GFP_AEQVI), split at 144-145
cpEYFP_Baird: Camgaroo-1 [1]
cpEYFP-Flash-Pericam: Flash-pericam [3]
Perceval_cpmVenus: Perceval [10]
HyPer_cpEYFP: HyPer [7]
Peredox_cpmTSapphire: Peredox [11]
G-CaMP1: GCaMP [4]
G-CaMP2: GCaMP2 [6]
G-CaMP3: GCaMP3 (Addgene #22692) [9]
GltI253: Addgene #41733, same cpFP sequence as iGluSnFR [16]
EcMBP165-cpGFP.PPYF: Addgene #33372 [13]
EcMBP165-cpGFP.PPYF.T203V: Addgene #33373 [13]
G-GECO1: [12]
B-GECO1: [12]
cpsfGFP149_ver1: this work, Chapter 2

References

1. Baird GS, Zacharias DA, Tsien RY: Circular permutation and receptor insertion within green fluorescent proteins. Proc Natl Acad Sci USA 1999 Sep 28;96:11241–11246.
2. Griesbeck O, Baird GS, Campbell RE, Zacharias DA, Tsien RY: Reducing the environmental sensitivity of yellow fluorescent protein. Mechanism and applications. J Biol Chem 2001 Aug 3;276:29188–29194.
3. Nagai T, Sawano A, Park ES, Miyawaki A: Circularly permuted green fluorescent proteins engineered to sense Ca2+. Proc Natl Acad Sci USA 2001 Mar 13;98:3197–3202.
4. Nakai J, Ohkura M, Imoto K: A high signal-to-noise Ca(2+) probe composed of a single green fluorescent protein. Nat Biotechnol 2001 Feb;19:137–141.
5. Kawai Y, Sato M, Umezawa Y: Single color fluorescent indicators of protein phosphorylation for multicolor imaging of intracellular signal flow dynamics. Anal Chem 2004 Oct 15;76:6144–6149.
6. Tallini YN, Ohkura M, Choi B-R, Ji G, Imoto K, Doran R, et al.: Imaging cellular signals in the heart in vivo: Cardiac expression of the high-signal Ca2+ indicator GCaMP2. Proc Natl Acad Sci USA 2006 Mar 21;103:4753–4758.
7. Belousov VV, Fradkov AF, Lukyanov KA, Staroverov DB, Shakhbazov KS, Terskikh AV, et al.: Genetically encoded fluorescent indicator for intracellular hydrogen peroxide. Nat Methods 2006 Apr;3:281–286.
8. Nausch LWM, Ledoux J, Bonev AD, Nelson MT, Dostmann WR: Differential patterning of cGMP in vascular smooth muscle cells revealed by single GFP-linked biosensors. Proc Natl Acad Sci USA 2008 Jan 8;105:365–370.
9. Tian L, Hires SA, Mao T, Huber D, Chiappe ME, Chalasani SH, et al.: Imaging neural activity in worms, flies and mice with improved GCaMP calcium indicators. Nat Methods 2009 Dec;6:875–881.
10. Berg J, Hung YP, Yellen G: A genetically encoded fluorescent reporter of ATP:ADP ratio. Nat Methods 2009 Feb;6:161–166.
11. Hung YP, Albeck JG, Tantama M, Yellen G: Imaging Cytosolic NADH-NAD(+) Redox State with a Genetically Encoded Fluorescent Biosensor. Cell Metab 2011 Oct 5;14:545–554.
12. Zhao Y, Araki S, Wu J, Teramoto T, Chang Y-F, Nakano M, et al.: An expanded palette of genetically encoded Ca²⁺ indicators. Science 2011 Sep 30;333:1888–1891.
13. Marvin JS, Schreiter ER, Echevarría IM, Looger LL: A genetically encoded, high-signal-to-noise maltose sensor. Proteins 2011 Nov;79:3025–3036.
14. Tewson P, Westenberg M, Zhao Y, Campbell RE, Quinn AM, Hughes TE: Simultaneous detection of Ca2+ and diacylglycerol signaling in living cells. PLoS ONE 2012;7:e42791.
15. Akerboom J, Chen T-W, Wardill TJ, Tian L, Marvin JS, Mutlu S, et al.: Optimization of a GCaMP calcium indicator for neural activity imaging. J Neurosci 2012 Oct 3;32:13819–13840.
16. Marvin JS, Borghuis BG, Tian L, Cichon J, Harnett MT, Akerboom J, et al.: An optimized fluorescent probe for visualizing glutamate neurotransmission. Nat Methods 2013 Feb;10:162–170.
17. Akerboom J, Carreras Calderón N, Tian L, Wabnig S, Prigge M, Tolö J, et al.: Genetically encoded calcium indicators for multi-color neural activity imaging and combination with optogenetics. Front Mol Neurosci 2013;6:2.
18. Bhargava Y, Hampden-Smith K, Chachlaki K, Wood KC, Vernon J, Allerston CK, et al.: Improved genetically-encoded, FlincG-type fluorescent biosensors for neural cGMP imaging. Front Mol Neurosci 2013;6:26.
19. Ji Q, Zhao BS, He C: A highly sensitive and genetically encoded fluorescent reporter for ratiometric monitoring of quinones in living cells. Chem Commun (Camb) 2013 Sep 21;49:8027–8029.
20. Honda Y, Kirimura K: Generation of circularly permuted fluorescent-protein-based indicators for in vitro and in vivo detection of citrate. PLoS ONE 2013;8:e64597.
21. De Michele R, Ast C, Loqué D, Ho C-H, Andrade SL, Lanquar V, et al.: Fluorescent sensors reporting the activity of ammonium transceptors in live cells. Elife 2013;2:e00800.
22. Chen T-W, Wardill TJ, Sun Y, Pulver SR, Renninger SL, Baohan A, et al.: Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature 2013 Jul 18;499:295–300.
23. St-Pierre F, Marshall JD, Yang Y, Gong Y, Schnitzer MJ, Lin MZ: High-fidelity optical reporting of neuronal electrical activity with an ultrafast fluorescent voltage sensor. Nat Neurosci 2014 Jun;17:884–889.
24. Suzuki J, Kanemaru K, Ishii K, Ohkura M, Okubo Y, Iino M: Imaging intraorganellar Ca2+ at subcellular resolution using CEPIA. Nat Commun 2014 Jun;5. DOI: 10.1038/ncomms5153
25. Rodriguez-Garcia A, Rojo-Ruiz J, Navas-Navarro P, Aulestia FJ, Gallego-Sandin S, Garcia-Sancho J, et al.: GAP, an aequorin-based fluorescent indicator for imaging Ca2+ in organelles. Proc Natl Acad Sci USA 2014 Feb 5; DOI: 10.1073/pnas.1316539111

Appendix B: Restriction-Site Search Python Script

#!/usr/bin/env python from Bio import Restriction from Bio.Seq import MutableSeq from Bio.Seq import Seq from Bio.Alphabet import IUPAC

#choose enzymes of interest to search for enzymes_to_test = Restriction.RestrictionBatch(['Alw26I', 'AarI', 'ApaI', 'AvrII', 'BamHI', 'BbvI', 'BfuAI', 'BglI', 'BglII', 'BpiI', 'BtgZI', 'BveI','Eco31I', 'EcoRI', 'EcoRV', 'Esp3I', 'FokI', 'HindIII', 'KpnI', 'NcoI', 'NdeI', 'NheI', 'NotI', 'PstI', 'SacI', 'SalI', 'SmaI', 'SpeI', 'SwaI', 'XbaI', 'XhoI'])

#input sequence to scan for enzymes sites test_seq = MutableSeq('TGAAGCGGCGCACGAAAAACGCGAAAGCGTTTCACGATAAATGCGAAAA', IUPAC.IUPACAmbiguousDNA()) outputdict = {}

#check for cutsite in a sequence with a given mutation - add cutsites to a dictionary keyed with the mutation def checksites(seqtotest, pos1, mut1, pos2, mut2, pos3, mut3): #convert MutableSeq to Seq - batch enzyme search does not work on mutableSeq seqinput = seqtotest.toseq() outputdict['%03d'%pos1+mut1+'%03d'%pos2+mut2+'%03d'%pos3+mut3] = enzymes_to_test.search(seqinput, linear = 'true')

#Iterate over each base in sequence, change to other three bases and check for cutsites for xInd, x in enumerate(test_seq): if x != 'A': test_seq[xInd] = 'A' for yInd, y in enumerate(test_seq): if yInd > xInd and yInd <= (xInd + 7): if y != 'A': test_seq[yInd] = 'A' for zInd, z in enumerate(test_seq): if zInd > xInd and zInd > yInd and zInd <= (xInd + 7): if z != 'A': test_seq[zInd] = 'A' checksites(test_seq, xInd, 'A', yInd, 'A', zInd, 'A') if z != 'T': test_seq[zInd] = 'T' checksites(test_seq, xInd, 'A', yInd, 'A', zInd, 'T') if z != 'G': test_seq[zInd] = 'G' checksites(test_seq, xInd, 'A', yInd, 'A', zInd, 'G') if z != 'C': test_seq[zInd] = 'C' checksites(test_seq, xInd, 'A', yInd, 'A', zInd, 'C') test_seq[zInd] = z if y != 'T': test_seq[yInd] = 'T' for zInd, z in enumerate(test_seq): if zInd > xInd and zInd > yInd and zInd <= (xInd + 7): if z != 'A': test_seq[zInd] = 'A' checksites(test_seq, xInd, 'A', yInd, 'T', zInd, 'A') if z != 'T': test_seq[zInd] = 'T' checksites(test_seq, xInd, 'A', yInd, 'T', zInd, 'T') if z != 'G': test_seq[zInd] = 'G' checksites(test_seq, xInd, 'A', yInd, 'T', zInd, 'G') if z != 'C': test_seq[zInd] = 'C' checksites(test_seq, xInd, 'A', yInd, 'T', zInd, 'C') test_seq[zInd] = z 91 if y != 'G': test_seq[yInd] = 'G' for zInd, z in enumerate(test_seq): if zInd > xInd and zInd > yInd and zInd <= (xInd + 7): if z != 'A': test_seq[zInd] = 'A' checksites(test_seq, xInd, 'A', yInd, 'G', zInd, 'A') if z != 'T': test_seq[zInd] = 'T' checksites(test_seq, xInd, 'A', yInd, 'G', zInd, 'T') if z != 'G': test_seq[zInd] = 'G' checksites(test_seq, xInd, 'A', yInd, 'G', zInd, 'G') if z != 'C': test_seq[zInd] = 'C' checksites(test_seq, xInd, 'A', yInd, 'G', zInd, 'C') test_seq[zInd] = z if y != 'C': test_seq[yInd] = 'C' for zInd, z in enumerate(test_seq): if zInd > xInd and zInd > yInd and zInd <= (xInd + 7): if z != 'A': test_seq[zInd] = 'A' checksites(test_seq, xInd, 'A', yInd, 'C', zInd, 'A') if z != 'T': test_seq[zInd] = 'T' checksites(test_seq, xInd, 
'A', yInd, 'C', zInd, 'T') if z != 'G': test_seq[zInd] = 'G' checksites(test_seq, xInd, 'A', yInd, 'C', zInd, 'G') if z != 'C': test_seq[zInd] = 'C' checksites(test_seq, xInd, 'A', yInd, 'C', zInd, 'C') test_seq[zInd] = z test_seq[yInd] = y if x != 'T': test_seq[xInd] = 'T' for yInd, y in enumerate(test_seq): if yInd > xInd and yInd <= (xInd + 7): if y != 'A': test_seq[yInd] = 'A' for zInd, z in enumerate(test_seq): if zInd > xInd and zInd > yInd and zInd <= (xInd + 7): if z != 'A': test_seq[zInd] = 'A' checksites(test_seq, xInd, 'T', yInd, 'A', zInd, 'A') if z != 'T': test_seq[zInd] = 'T' checksites(test_seq, xInd, 'T', yInd, 'A', zInd, 'T') if z != 'G': test_seq[zInd] = 'G' checksites(test_seq, xInd, 'T', yInd, 'A', zInd, 'G') if z != 'C': test_seq[zInd] = 'C' checksites(test_seq, xInd, 'T', yInd, 'A', zInd, 'C') test_seq[zInd] = z if y != 'T': test_seq[yInd] = 'T' for zInd, z in enumerate(test_seq): if zInd > xInd and zInd > yInd and zInd <= (xInd + 7): if z != 'A': test_seq[zInd] = 'A' checksites(test_seq, xInd, 'T', yInd, 'T', zInd, 'A') if z != 'T': test_seq[zInd] = 'T' checksites(test_seq, xInd, 'T', yInd, 'T', zInd, 'T') if z != 'G': test_seq[zInd] = 'G' checksites(test_seq, xInd, 'T', yInd, 'T', zInd, 'G') if z != 'C': test_seq[zInd] = 'C' checksites(test_seq, xInd, 'T', yInd, 'T', zInd, 'C') test_seq[zInd] = z 92 if y != 'G': test_seq[yInd] = 'G' for zInd, z in enumerate(test_seq): if zInd > xInd and zInd > yInd and zInd <= (xInd + 7): if z != 'A': test_seq[zInd] = 'A' checksites(test_seq, xInd, 'T', yInd, 'G', zInd, 'A') if z != 'T': test_seq[zInd] = 'T' checksites(test_seq, xInd, 'T', yInd, 'G', zInd, 'T') if z != 'G': test_seq[zInd] = 'G' checksites(test_seq, xInd, 'T', yInd, 'G', zInd, 'G') if z != 'C': test_seq[zInd] = 'C' checksites(test_seq, xInd, 'T', yInd, 'G', zInd, 'C') test_seq[zInd] = z if y != 'C': test_seq[yInd] = 'C' for zInd, z in enumerate(test_seq): if zInd > xInd and zInd > yInd and zInd <= (xInd + 7): if z != 'A': test_seq[zInd] = 'A' 
checksites(test_seq, xInd, 'T', yInd, 'C', zInd, 'A') if z != 'T': test_seq[zInd] = 'T' checksites(test_seq, xInd, 'T', yInd, 'C', zInd, 'T') if z != 'G': test_seq[zInd] = 'G' checksites(test_seq, xInd, 'T', yInd, 'C', zInd, 'G') if z != 'C': test_seq[zInd] = 'C' checksites(test_seq, xInd, 'T', yInd, 'C', zInd, 'C') test_seq[zInd] = z test_seq[yInd] = y if x != 'G': test_seq[xInd] = 'G' for yInd, y in enumerate(test_seq): if yInd > xInd and yInd <= (xInd + 7): if y != 'A': test_seq[yInd] = 'A' for zInd, z in enumerate(test_seq): if zInd > xInd and zInd > yInd and zInd <= (xInd + 7): if z != 'A': test_seq[zInd] = 'A' checksites(test_seq, xInd, 'G', yInd, 'A', zInd, 'A') if z != 'T': test_seq[zInd] = 'T' checksites(test_seq, xInd, 'G', yInd, 'A', zInd, 'T') if z != 'G': test_seq[zInd] = 'G' checksites(test_seq, xInd, 'G', yInd, 'A', zInd, 'G') if z != 'C': test_seq[zInd] = 'C' checksites(test_seq, xInd, 'G', yInd, 'A', zInd, 'C') test_seq[zInd] = z if y != 'T': test_seq[yInd] = 'T' for zInd, z in enumerate(test_seq): if zInd > xInd and zInd > yInd and zInd <= (xInd + 7): if z != 'A': test_seq[zInd] = 'A' checksites(test_seq, xInd, 'G', yInd, 'T', zInd, 'A') if z != 'T': test_seq[zInd] = 'T' checksites(test_seq, xInd, 'G', yInd, 'T', zInd, 'T') if z != 'G': test_seq[zInd] = 'G' checksites(test_seq, xInd, 'G', yInd, 'T', zInd, 'G') if z != 'C': test_seq[zInd] = 'C' checksites(test_seq, xInd, 'G', yInd, 'T', zInd, 'C') test_seq[zInd] = z 93 if y != 'G': test_seq[yInd] = 'G' for zInd, z in enumerate(test_seq): if zInd > xInd and zInd > yInd and zInd <= (xInd + 7): if z != 'A': test_seq[zInd] = 'A' checksites(test_seq, xInd, 'G', yInd, 'G', zInd, 'A') if z != 'T': test_seq[zInd] = 'T' checksites(test_seq, xInd, 'G', yInd, 'G', zInd, 'T') if z != 'G': test_seq[zInd] = 'G' checksites(test_seq, xInd, 'G', yInd, 'G', zInd, 'G') if z != 'C': test_seq[zInd] = 'C' checksites(test_seq, xInd, 'G', yInd, 'G', zInd, 'C') test_seq[zInd] = z if y != 'C': test_seq[yInd] = 'C' for zInd, 
z in enumerate(test_seq): if zInd > xInd and zInd > yInd and zInd <= (xInd + 7): if z != 'A': test_seq[zInd] = 'A' checksites(test_seq, xInd, 'G', yInd, 'C', zInd, 'A') if z != 'T': test_seq[zInd] = 'T' checksites(test_seq, xInd, 'G', yInd, 'C', zInd, 'T') if z != 'G': test_seq[zInd] = 'G' checksites(test_seq, xInd, 'G', yInd, 'C', zInd, 'G') if z != 'C': test_seq[zInd] = 'C' checksites(test_seq, xInd, 'G', yInd, 'C', zInd, 'C') test_seq[zInd] = z test_seq[yInd] = y if x != 'C': test_seq[xInd] = 'C' for yInd, y in enumerate(test_seq): if yInd > xInd and yInd <= (xInd + 7): if y != 'A': test_seq[yInd] = 'A' for zInd, z in enumerate(test_seq): if zInd > xInd and zInd > yInd and zInd <= (xInd + 7): if z != 'A': test_seq[zInd] = 'A' checksites(test_seq, xInd, 'C', yInd, 'A', zInd, 'A') if z != 'T': test_seq[zInd] = 'T' checksites(test_seq, xInd, 'C', yInd, 'A', zInd, 'T') if z != 'G': test_seq[zInd] = 'G' checksites(test_seq, xInd, 'C', yInd, 'A', zInd, 'G') if z != 'C': test_seq[zInd] = 'C' checksites(test_seq, xInd, 'C', yInd, 'A', zInd, 'C') test_seq[zInd] = z if y != 'T': test_seq[yInd] = 'T' for zInd, z in enumerate(test_seq): if zInd > xInd and zInd > yInd and zInd <= (xInd + 7): if z != 'A': test_seq[zInd] = 'A' checksites(test_seq, xInd, 'C', yInd, 'T', zInd, 'A') if z != 'T': test_seq[zInd] = 'T' checksites(test_seq, xInd, 'C', yInd, 'T', zInd, 'T') if z != 'G': test_seq[zInd] = 'G' checksites(test_seq, xInd, 'C', yInd, 'T', zInd, 'G') if z != 'C': test_seq[zInd] = 'C' checksites(test_seq, xInd, 'C', yInd, 'T', zInd, 'C') test_seq[zInd] = z 94 if y != 'G': test_seq[yInd] = 'G' for zInd, z in enumerate(test_seq): if zInd > xInd and zInd > yInd and zInd <= (xInd + 7): if z != 'A': test_seq[zInd] = 'A' checksites(test_seq, xInd, 'C', yInd, 'G', zInd, 'A') if z != 'T': test_seq[zInd] = 'T' checksites(test_seq, xInd, 'C', yInd, 'G', zInd, 'T') if z != 'G': test_seq[zInd] = 'G' checksites(test_seq, xInd, 'C', yInd, 'G', zInd, 'G') if z != 'C': test_seq[zInd] = 'C' 
checksites(test_seq, xInd, 'C', yInd, 'G', zInd, 'C') test_seq[zInd] = z if y != 'C': test_seq[yInd] = 'C' for zInd, z in enumerate(test_seq): if zInd > xInd and zInd > yInd and zInd <= (xInd + 7): if z != 'A': test_seq[zInd] = 'A' checksites(test_seq, xInd, 'C', yInd, 'C', zInd, 'A') if z != 'T': test_seq[zInd] = 'T' checksites(test_seq, xInd, 'C', yInd, 'C', zInd, 'T') if z != 'G': test_seq[zInd] = 'G' checksites(test_seq, xInd, 'C', yInd, 'C', zInd, 'G') if z != 'C': test_seq[zInd] = 'C' checksites(test_seq, xInd, 'C', yInd, 'C', zInd, 'C') test_seq[zInd] = z test_seq[yInd] = y test_seq[xInd] = x

#cleanup
#remove any enzyme cut lists that are empty (thus do not cut)
for item in outputdict.keys():
    for enzyme in outputdict[item].keys():
        if not outputdict[item][enzyme]:
            del outputdict[item][enzyme]

filename = 'MutCutters.txt'
fh = open(filename, 'w')

#for each mutation tested, print the mutation with its cutsites
for key in sorted(outputdict.iterkeys()):
    if outputdict[key]:
        fh.write("%s: %s\n" % (key,outputdict[key]) )
fh.close()
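
The nested loops in the listing above enumerate every triple substitution in which the second and third mutated positions lie within 7 bases downstream of the first. The same enumeration can be written more compactly with itertools. The following is only a minimal sketch under stated assumptions, not the script used in this work: the names triple_mutants, BASES, and WINDOW are hypothetical, and the generator yields each mutant sequence with its three substitutions instead of calling checksites directly. It visits the same set of mutants as the listing, though not necessarily in the same order.

```python
from itertools import combinations, product

BASES = "ATGC"   # same base order as the listing above
WINDOW = 7       # y and z may lie up to 7 positions past x (hypothetical name)


def triple_mutants(seq, window=WINDOW):
    """Yield (mutant, ((pos, base), (pos, base), (pos, base))) for every
    triple substitution whose positions x < y < z satisfy z <= x + window.
    Each substituted base differs from the original base at that position."""
    seq = list(seq)
    for x in range(len(seq)):
        hi = min(len(seq), x + window + 1)
        # choose two further positions y < z inside the window after x
        for y, z in combinations(range(x + 1, hi), 2):
            # at each position, try every base except the original one
            choices = [[b for b in BASES if b != seq[i]] for i in (x, y, z)]
            for bx, by, bz in product(*choices):
                mutant = list(seq)
                mutant[x], mutant[y], mutant[z] = bx, by, bz
                yield "".join(mutant), ((x, bx), (y, by), (z, bz))


if __name__ == "__main__":
    # a 3-base sequence has one position triple and 3 * 3 * 3 base choices
    for mutant, changes in list(triple_mutants("ACG"))[:3]:
        print(mutant, changes)
```

Each yielded mutant could then be passed to whatever site-checking routine is in use (checksites in the listing above), which keeps the enumeration separate from the restriction-site search.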

Appendix C: DNA Sequences

Sequences from Chapter 2

>Entranceposon (CamR-3) Thermo F-778, 1302 bp TGAAGCGGCGCACGAAAAACGCGAAAGCGTTTCACGATAAATGCGAAAACGATGTCTAACTCCAGCCACCGTTTAAACGGATCCTTTTCGACCGAAT AAATACCTGTGACGGAAGATCACTTCGCAGAATAAATAAATCCTGGTGTCCCTGTTGATACCGGGAAGCCCTGGGCCAACTTTTGGCGAAAATGAGA CGTTGATCGGCACGTAAGAGGTTCCAACTTTCACCATAATGAAATAAGATCACTACCGGGCGTATTTTTTGAGTTGTCGAGATTTTCAGGAGCTAAG GAAGCTAAAATGGAGAAAAAAATCACTGGATATACCACCGTTGATATATCCCAATGGCATCGTAAAGAACATTTTGAGGCATTTCAGTCAGTTGCTC AATGTACCTATAACCAGACCGTTCAGCTGGATATTACGGCCTTTTTAAAGACCGTAAAGAAAAATAAGCACAAGTTTTATCCGGCCTTTATTCACAT TCTTGCCCGCCTGATGAATGCTCATCCGGAATTACGTATGGCAATGAAAGACGGTGAGCTGGTGATATGGGATAGTGTTCACCCTTGTTACACCGTT TTCCATGAGCAAACTGAAACGTTTTCATCGCTCTGGAGTGAATACCACGACGATTTCCGGCAGTTTCTACACATATATTCGCAAGATGTGGCGTGTT ACGGTGAAAACCTGGCCTATTTCCCTAAAGGGTTTATTGAGAATATGTTTTTCGTCTCAGCCAATCCCTGGGTGAGTTTCACCAGTTTTGATTTAAA CGTGGCCAATATGGACAACTTCTTCGCCCCCGTTTTCACCATGGGCAAATATTATACGCAAGGCGACAAGGTGCTGATGCCGCTGGCGATTCAGGTT CATCATGCCGTTTGTGATGGCTTCCATGTCGGCAGAATGCTTAATGAATTACAACAGTACTGCGATGAGTGGCAGGGCGGGGCGTAATTTTTTTAAG GCAGTTATTGGTGCCCTTAAACGCCTGGTTGCTACGCCTGAATAAGTGATAATAAGCGGATGAATGGCAGAAATTCGAAAGCAAATTCGACCCGGTC GTCGGTTCAGGGCAGGGTCGTTAAATAGCCGCTTATGTCTATTGCTGGTTTACCGGTTTATTGACTACCGGAAGCAGTGTGACCGTGTGCTTCTCAA ATGCCTGAGGCCAGTTTGCTCAGGCTCTCCCCGTGGAGGTAATAATTGACGATAGGATCCGCGGCCGCCGACACACTCCAATCTTTCCGTTTTCGCA TTTATCGTGAAACGCTTTCGCGTTTTTCGTGCGCCGCTTCA

>Entranceposon (M1-CamR) Thermo F-760, 1254 bp TGCGGCCGCGCACGAAAAACGCGAAAGCGTTTCACGATAAATGCGAAAACGGATCGATCCTTTTCGACCGAATAAATACCTGTGACGGAAGATCACT TCGCAGAATAAATAAATCCTGGTGTCCCTGTTGATACCGGGAAGCCCTGGGCCAACTTTTGGCGAAAATGAGACGTTGATCGGCACGTAAGAGGTTC CAACTTTCACCATAATGAAATAAGATCACTACCGGGCGTATTTTTTGAGTTGTCGAGATTTTCAGGAGCTAAGGAAGCTAAAATGGAGAAAAAAATC ACTGGATATACCACCGTTGATATATCCCAATGGCATCGTAAAGAACATTTTGAGGCATTTCAGTCAGTTGCTCAATGTACCTATAACCAGACCGTTC AGCTGGATATTACGGCCTTTTTAAAGACCGTAAAGAAAAATAAGCACAAGTTTTATCCGGCCTTTATTCACATTCTTGCCCGCCTGATGAATGCTCA TCCGGAATTACGTATGGCAATGAAAGACGGTGAGCTGGTGATATGGGATAGTGTTCACCCTTGTTACACCGTTTTCCATGAGCAAACTGAAACGTTT TCATCGCTCTGGAGTGAATACCACGACGATTTCCGGCAGTTTCTACACATATATTCGCAAGATGTGGCGTGTTACGGTGAAAACCTGGCCTATTTCC CTAAAGGGTTTATTGAGAATATGTTTTTCGTCTCAGCCAATCCCTGGGTGAGTTTCACCAGTTTTGATTTAAACGTGGCCAATATGGACAACTTCTT CGCCCCCGTTTTCACCATGGGCAAATATTATACGCAAGGCGACAAGGTGCTGATGCCGCTGGCGATTCAGGTTCATCATGCCGTTTGTGATGGCTTC CATGTCGGCAGAATGCTTAATGAATTACAACAGTACTGCGATGAGTGGCAGGGCGGGGCGTAATTTTTTTAAGGCAGTTATTGGTGCCCTTAAACGC CTGGTTGCTACGCCTGAATAAGTGATAATAAGCGGATGAATGGCAGAAATTCGAAAGCAAATTCGACCCGGTCGTCGGTTCAGGGCAGGGTCGTTAA ATAGCCGCTTATGTCTATTGCTGGTTTACCGGTTTATTGACTACCGGAAGCAGTGTGACCGTGTGCTTCTCAAATGCCTGAGGCCAGTTTGCTCAGG CTCTCCCCGTGGAGGTAATAATTGACGATAGGATCGATCCGTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGTGCGCGGCCGCA


>pET47GG-BsaI-Dest, 5573 bp TGGCGAATGGGACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTTACGCGCAGCGTGACCGCTACACTTGCCAGCGCCCTAGCGCCC GCTCCTTTCGCTTTCTTCCCTTCCTTTCTCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTCTAAATCGGGGGCTCCCTTTAGGGTTCCGATTTAGTG CTTTACGGCACCTCGACCCCAAAAAACTTGATTAGGGTGATGGTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTTCGCCCTTTGACGTTGGA GTCCACGTTCTTTAATAGTGGACTCTTGTTCCAAACTGGAACAACACTCAACCCTATCTCGGTCTATTCTTTTGATTTATAAGGGATTTTGCCGATT TCGGCCTATTGGTTAAAAAATGAGCTGATTTAACAAAAATTTAACGCGAATTTTAACAAAATATTAACGCTTACAATTTAGGTGGCACTTTTCGGGG AAATGTGCGCGGAACCCCTATTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTCATGAATTAATTCTTAGAAAAACTCATCGAGCATCAA ATGAAACTGCAATTTATTCATATCAGGATTATCAATACCATATTTTTGAAAAAGCCGTTTCTGTAATGAAGGAGAAAACTCACCGAGGCAGTTCCAT AGGATGGCAAGATCCTGGTATCGGTCTGCGATTCCGACTCGTCCAACATCAATACAACCTATTAATTTCCCCTCGTCAAAAATAAGGTTATCAAGTG AGAAATCACCATGAGTGACGACTGAATCCGGTGAGAATGGCAAAAGTTTATGCATTTCTTTCCAGACTTGTTCAACAGGCCAGCCATTACGCTCGTC ATCAAAATCACTCGCATCAACCAAACCGTTATTCATTCGTGATTGCGCCTGAGCGAGGCGAAATACGCGATCACTGTTAAAAGGACAATTACAAACA GGAATCGAATGCAACCGGCGCAGGAACACTGCCAGCGCATCAACAATATTTTCACCTGAATCAGGATATTCTTCTAATACCTGGAATGCTGTTTTGC CGGGGATCGCAGTGGTGAGTAACCATGCATCATCAGGAGTACGGATAAAATGCTTGATGGTCGGAAGAGGCATAAATTCCGTCAGCCAGTTTAGTCT GACCATCTCATCTGTAACATCATTGGCAACGCTACCTTTGCCATGTTTCAGAAACAACTCTGGCGCATCGGGCTTCCCATACAATCGATAGATTGTC GCACCTGATTGCCCGACATTATCGCGAGCCCATTTATACCCATATAAATCAGCATCCATGTTGGAATTTAATCGCGGCCTAGAGCAAGACGTTTCCC GTTGAATATGGCTCATAACACCCCTTGTATTACTGTTTATGTAAGCAGACAGTTTTATTGTTCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCC ACTGAGCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGCT ACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTCCTTCTA GTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCG ATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTT GGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTA 
AGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTG AGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTT TGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTCGCCGCAGCCGAACGACC GAGCGCAGCGAGTCAGTGAGCGAGGAAGCGGAAGAGCGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGCAATGGTGCAC TCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGTATACACTCCGCTATCGCTACGTGACTGGGTCATGGCTGCGCCCCGACACCCGCCAA CACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGACAAGCTGTGACCGTCTACGGGAGCTGCATGTGTCAGAGGTTTTC ACCGTCATCACCGAAACGCGCGAGGCAGCTGCGGTAAAGCTCATCAGCGTGGTCGTGAAGCGATTCACAGATGTCTGCCTGTTCATCCGCGTCCAGC TCGTTGAGTTTCTCCAGAAGCGTTAATGTCTGGCTTCTGATAAAGCGGGCCATGTTAAGGGCGGTTTTTTCCTGTTTGGTCACTGATGCCTCCGTGT AAGGGGGATTTCTGTTCATGGGGGTAATGATACCGATGAAACGAGAGAGGATGCTCACGATACGGGTTACTGATGATGAACATGCCCGGTTACTGGA ACGTTGTGAGGGTAAACAACTGGCGGTATGGATGCGGCGGGACCAGAGAAAAATCACTCAGGGTCAATGCCAGCGCTTCGTTAATACAGATGTAGGT GTTCCACAGGGTAGCCAGCAGCATCCTGCGATGCAGATCCGGAACATAATGGTGCAGGGCGCTGACTTCCGCGTTTCCAGACTTTACGAAACACGGA AACCGAAGACCATTCATGTTGTTGCTCAGGTCGCAGACGTTTTGCAGCAGCAGTCGCTTCACGTTCGCTCGCGTATCGGTGATTCATTCTGCTAACC AGTAAGGCAACCCCGCCAGCCTAGCCGGGTCCTCAACGACAGGAGCACGATCATGCTAGTCATGCCCCGCGCCCACCGGAAGGAGCTGACTGGGTTG AAGGCTCTCAAGGGCATCGGTCGAGATCCCGGTGCCTAATGAGTGAGCTAACTTACATTAATTGCGTTGCGCTCACTGCCCGCTTTCCAGTCGGGAA ACCTGTCGTGCCAGCTGCATTAATGAATCGGCCAACGCGCGGGGAGAGGCGGTTTGCGTATTGGGCGCCAGGGTGGTTTTTCTTTTCACCAGTGACA CGGGCAACAGCTGATTGCCCTTCACCGCCTGGCCCTGAGAGAGTTGCAGCAAGCGGTCCACGCTGGTTTGCCCCAGCAGGCGAAAATCCTGTTTGAT GGTGGTTAACGGCGGGATATAACATGAGCTGTCTTCGGTATCGTCGTATCCCACTACCGAGATATCCGCACCAACGCGCAGCCCGGACTCGGTAATG 97 GCGCGCATTGCGCCCAGCGCCATCTGATCGTTGGCAACCAGCATCGCAGTGGGAACGATACCCTCATTCAGCATTTGCATGGTTTGTTGAAAACCGG ACATGGCACTCCAGTCGCCTTCCCGTTCCGCTATCGGCTGAATTTGATTGCGAGTGAGATATTTATGCCAGCCAGCCAGACGCAGACGCGCCGAGAC AGAACTTAATGGGCCCGCTAACAGCGCGATTTGCTGGTGACCCAATGCGACCAGATGCTCCACGCCCAGTCGCGTACCGTCTTCATGGGAGAAAATA 
ATACTGTTGATGGGTGTCTGGTCAGAGACATCAAGAAATAACGCCGGAACATTAGTGCAGGCAGCTTCCACAGCAATGGCATCCTGGTCATCCAGCG GATAGTTAATGATCAGCCCACTGACGCGTTGCGCGAGAAGATTGTGCACCGCCGCTTTACAGGCTTCGACGCCGCTTCGTTCTACCATCGACACCAC CACGCTGGCACCCAGTTGATCGGCGCGAGATTTAATCGCCGCGACAATTTGCGACGGCGCGTGCAGGGCCAGACTGGAGGTGGCAACGCCAATCAGC AACGACTGTTTGCCCGCCAGTTGTTGTGCCACGCGGTTGGGAATGTAATTCAGCTCCGCCATCGCCGCTTCCACTTTTTCCCGCGTTTTCGCAGAAA CGTGGCTGGCCTGGTTCACCACGCGGGAAACGGTCTGATAAGAGACACCGGCATACTCTGCGACATCGTATAACGTTACTGGTTTCACATTCACCAC CCTGAATTGACTCTCTTCCGGGCGCTATCATGCCATACCGCGAAAGGTTTTGCGCCATTCGATGGTGTCCGGGATCTCGACGCTCTCCCTTATGCGA CTCCTGCATTAGGAAGCAGCCCAGTAGTAGGTTGAGGCCGTTGAGCACCGCCGCCGCAAGGAATGGTGCATGCAAGGAGATGGCGCCCAACAGTCCC CCGGCCACGGGGCCTGCCACCATACCCACGCCGAAACAAGCGCTCATGAGCCCGAAGTGGCGAGCCCGATCTTCCCCATCGGTGATGTCGGCGATAT AGGCGCCAGCAACCGCACCTGTGGCGCCGGTGATGCCGGCCACGATGCGTCCGGCGTAGAGGATCGAGATCTCGATCCCGCGAAATTAATACGACTC ACTATAGGGGAATTGTGAGCGGATAACAATTCCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACCATGTCCAGAGACCATTAATGC AGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACT TTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCTATGACCATGATTACGCCAAGCTTGCATGCCT GCAGGTCGACTCTAGAGGATCCCCGGGTACCGAGCTCGAATTCACTGGCCGTCGTTTTACAACGTCGTGACTGGGAAAACCCTGGCGTTACCCAACT TAATCGCCTTGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCGATCGCCCTTCCCAACAGTTGCGCAGCCTGAATGGC GAATGGCGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGCATATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCAT AGTTAAGCCAGCCCCGACACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGACAAGCTGGTCTCTAGCG GTTAAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGCTGCCACCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACG GGTCTTGAGGGGTTTTTTGCTGAAAGGAGGAACTATATCCGGAT

>pET47GG-BsmBI-Dest, region - T7 promoter to T7 terminator, 806 bp TAATACGACTCACTATAGGGGAATTGTGAGCGGATAACAATTCCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACCATGTCCAGAG ACGattaatgcagctggcacgacagGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCA GGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCTATGACCATGATTACGCCAAG CTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCGGGTACCGAGCTCGAATTCACTGGCCGTCGTTTTACAACGTCGTGACTGGGAAAACCCTGGC GTTACCCAACTTAATCGCCTTGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCGATCGCCCTTCCCAACAGTTGCGCA GCCTGAATGGCGAATGGCGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGCATATGGTGCACTCTCAGTACAATCTGCTC TGATGCCGCATAGTTAAGCCAGCCCCGACACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCggcatccgcttacagacaagct CGTCTCTAGCGGTTAAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGCTGCCACCGCTGAGCAATAACTAGCATAACCCCTTGGG GCCTCTAAACGGGTCTTGAGGGGTTTTTTG


>pUCGG-KanR-BsmBI-Dest, 2254 bp AAAAACTCATCGAGCATCAAATGAAACTGCAATTTATTCATATCAGGATTATCAATACCATATTTTTGAAAAAGCCGTTTCTGTAATGAAGGAGAAA ACTCACCGAGGCAGTTCCATAGGATGGCAAGATCCTGGTATCGGTCTGCGATTCCGACTCGTCCAACATCAATACAACCTATTAATTTCCCCTCGTC AAAAATAAGGTTATCAAGTGAGAAATCACCATGAGTGACGACTGAATCCGGTGAGAATGGCAAAAGTTTATGCATTTCTTTCCAGACTTGTTCAACA GGCCAGCCATTACGCTCGTCATCAAAATCACTCGCATCAACCAAACCGTTATTCATTCGTGATTGCGCCTGAGCGAGGCGAAATACGCGATCACTGT TAAAAGGACAATTACAAACAGGAATCGAATGCAACCGGCGCAGGAACACTGCCAGCGCATCAACAATATTTTCACCTGAATCAGGATATTCTTCTAA TACCTGGAATGCTGTTTTGCCGGGGATCGCAGTGGTGAGTAACCATGCATCATCAGGAGTACGGATAAAATGCTTGATGGTCGGAAGAGGCATAAAT TCCGTCAGCCAGTTTAGTCTGACCATCTCATCTGTAACATCATTGGCAACGCTACCTTTGCCATGTTTCAGAAACAACTCTGGCGCATCGGGCTTCC CATACAATCGATAGATTGTCGCACCTGATTGCCCGACATTATCGCGAGCCCATTTATACCCATATAAATCAGCATCCATGTTGGAATTTAATCGCGG CCTAGAGCAAGACGTTTCCCGTTGAATATGGCTCATAACACCCCTTGTATTACTGTTTATGTAAGCAGACAGTTTTATTGTTCATGACCATCCCTTA ACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAA ACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATAC CAAATACTGTTCTTCTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCCTGTTACCAGT GGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCG TGCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGG CGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTT TCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTG GCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTC GCCGCAGCCGAACGACCGAGCGCAGCGAGTCAGTGAGCGAGGAAGCGGAAGAGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCA TTAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCT TTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCTATGTCCGGAGACGCACTGGCCGTC 
GTTTTACAACGTCGTGACTGGGAAAACCCTGGCGTTACCCAACTTAATCGCCTTGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGG CCCGCACCGATCGCCCTTCCCAACAGTTGCGCAGCCTGAATGGCGAATGGCGCTAACGTCTCTAGCGGTTAATTAAGCCAGCCCCGACACCCGCCAA CACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGACAAGCTGTGACCGTCACCGGGAGCTGCATGTGTCAGAGGTTTTC ACCGTCATCACCGAAACGCTTAG


>pET14GG-CHis-Dest, TTCTTGAAGACGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTCATGATAATAATGGTTTCTTAGACGTCAGGTGGCACTTTTCGGGGA AATGTGCGCGGAACCCCTATTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTCATGAGACAATAACCCTGATAAATGCTTCAATAATATT GAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCT GGTGAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCC GAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTGTTGACGCCGGGCAAGAGCAACTCGGTCGCCGCA TACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCAT AACCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTA ACTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGCAGCAATGGCAACAACGTTGCGCA AACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGGC CCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAGCGTGGCTCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCC CGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGGT AACTGTCAGACCAAGTTTACTCATATATACTTTAGATTGATTTAAAACTTCATTTTTAATTTAAAAGGATCTAGGTGAAGATCCTTTTTGATAATCT CATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGC GTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCT TCAGCAGAGCGCAGATACCAAATACTGTCCTTCTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCT GCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCG GGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGC TTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCT TTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCG GCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTTTG 
AGTGAGCTGATACCGCTCGCCGCAGCCGAACGACCGAGCGCAGCGAGTCAGTGAGCGAGGAAGCGGAAGAGCGCCTGATGCGGTATTTTCTCCTTAC GCATCTGTGCGGTATTTCACACCGCAATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGTATACACTCCGCTATCGCTACG TGACTGGGTCATGGCTGCGCCCCGACACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGACAAGCTGTG ACCGCCTCCGGGAGCTGCATGTGTCAGAGGTTTTCACCGTCATCACCGAAACGCGCGAGGCAGCTGCGGTAAAGCTCATCAGCGTGGTCGTGAAGCG ATTCACAGATGTCTGCCTGTTCATCCGCGTCCAGCTCGTTGAGTTTCTCCAGAAGCGTTAATGTCTGGCTTCTGATAAAGCGGGCCATGTTAAGGGC GGTTTTTTCCTGTTTGGTCACTGATGCCTCCGTGTAAGGGGGATTTCTGTTCATGGGGGTAATGATACCGATGAAACGAGAGAGGATGCTCACGATA CGGGTTACTGATGATGAACATGCCCGGTTACTGGAACGTTGTGAGGGTAAACAACTGGCGGTATGGATGCGGCGGGACCAGAGAAAAATCACTCAGG GTCAATGCCAGCGCTTCGTTAATACAGATGTAGGTGTTCCACAGGGTAGCCAGCAGCATCCTGCGATGCAGATCCGGAACATAATGGTGCAGGGCGC TGACTTCCGCGTTTCCAGACTTTACGAAACACGGAAACCGAAGACCATTCATGTTGTTGCTCAGGTCGCAGACGTTTTGCAGCAGCAGTCGCTTCAC GTTCGCTCGCGTATCGGTGATTCATTCTGCTAACCAGTAAGGCAACCCCGCCAGCCTAGCCGGGTCCTCAACGACAGGAGCACGATCATGCGCACCC GTGGCCAGGACCCAACGCTGCCCGAGATGCGCCGCGTGCGGCTGCTGGAGATGGCGGACGCGATGGATATGTTCTGCCAAGGGTTGGTTTGCGCATT CACAGTTCTCCGCAAGAATTGATTGGCTCCAATTCTTGGAGTGGTGAATCCGTTAGCGAGGTGCCGCCGGCTTCCATTCAGGTCGAGGTGGCCCGGC TCCATGCACCGCGACGCAACGCGGGGAGGCAGACAAGGTATAGGGCGGCGCCTACAATCCATGCCAACCCGTTCCATGTGCTCGCCGAGGCGGCATA AATCGCCGTGACGATCAGCGGTCCAGTGATCGAAGTTAGGCTGGTAAGAGCCGCGAGCGATCCTTGAAGCTGTCCCTGATGGTCGTCATCTACCTGC CTGGACAGCATGGCCTGCAACGCGGGCATCCCGATGCCGCCGGAAGCGAGAAGAATCATAATGGGGAAGGCCATCCAGCCTCGCGTCGCGAACGCCA GCAAGACGTAGCCCAGCGCGTCGGCCGCCATGCCGGCGATAATGGCCTGCTTCTCGCCGAAACGTTTGGTGGCGGGACCAGTGACGAAGGCTTGAGC GAGGGCGTGCAAGATTCCGAATACCGCAAGCGACAGGCCGATCATCGTCGCGCTCCAGCGAAAGCGGTCCTCGCCGAAAATGACCCAGAGCGCTGCC GGCACCTGTCCTACGAGTTGCATGATAAAGAAGACAGTCATAAGTGCGGCGACGATAGTCATGCCCCGCGCCCACCGGAAGGAGCTGACTGGGTTGA AGGCTCTCAAGGGCATCGGTCGACGCTCTCCCTTATGCGACTCCTGCATTAGGAAGCAGCCCAGTAGTAGGTTGAGGCCGTTGAGCACCGCCGCCGC AAGGAATGGTGCATGCAAGGAGATGGCGCCCAACAGTCCCCCGGCCACGGGGCCTGCCACCATACCCACGCCGAAACAAGCGCTCATGAGCCCGAAG 100 
TGGCGAGCCCGATCTTCCCCATCGGTGATGTCGGCGATATAGGCGCCAGCAACCGCACCTGTGGCGCCGGTGATGCCGGCCACGATGCGTCCGGCGT AGAGGATCGAGATCTCGATCCCGCGAAATTAATACGACTCACTATAGGGACACCACAACGGTTTCCCTCTAGAAATAATTTTGTTTAACTTTAAGAA GGAGATATACCATGTCCAGAGACCATTAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTA GCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGC TATGACCATGATTACGCCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCGGGTACCGAGCTCGAATTCACTGGCCGTCGTTTTACAACGT CGTGACTGGGAAAACCCTGGCGTTACCCAACTTAATCGCCTTGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCGATC GCCCTTCCCAACAGTTGCGCAGCCTGAATGGCGAATGGCGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGCATATGGTG CACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGCCCCGACACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCG GCATCCGCTTACAGACAAGCTGGTCTCTAGCGGTCACCATCACCATCACCATTAAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCT GCTGCCACCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGCTGAAAGGAGGAACTATATCCGGATATC CACAGGACGGGTGTGGTCGCCATGATCGCGTAGTCGATAGTGGCTCCAAGTAGCGAAGCGAGCAGGACTGGGCGGCGGCCAAAGCGGTCGGACAGTG CTCCGAGAACGGGTGCGCATAGAAATTGCATCAACGCATATAGCGCTAGCAGCACGCCATAGTGACTGGCGATGCTGTCGGAATGGACGATATCCCG CAAGAGGCCCGGCAGTACCGGCATAACCAAGCCTATGCCTACAGCATCCAGGGTGACGGTGCCGAGGATGACGATGAGCGCATTGTTAGATTTCATA CACGGTGCCTGACTGCGTTAGCAATTTAACTGTGATAAACTACCGCATTAAAGCTTATCGATGATAAGCTGTCAAACATGAGAA

>pET47GG-Entranceposon-M1-CamR, region - T7 promoter to T7 terminator, 1497 bp TAATACGACTCACTATAGGGGAATTGTGAGCGGATAACAATTCCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACCATGTCCCCAG AGGATTAGATCTGAAGCGGCGCACGAAAAACGCGAAAGCGTTTCACGATAAATGCGAAAACGGATCGATCCTTTTCGACCGAATAAATACCTGTGAC GGAAGATCACTTCGCAGAATAAATAAATCCTGGTGTCCCTGTTGATACCGGGAAGCCCTGGGCCAACTTTTGGCGAAAATGAGATGTTGATCGGCAC GTAAGAGGTTCCAACTTTCACCATAATGAAATAAGATCACTACCGGGCGTATTTTTTGAGTTGTCGAGATTTTCAGGAGCTAAGGAAGCTAAAATGG AGAAAAAAATCACTGGATATACCACCGTTGATATATCCCAATGGCATCGTAAAGAACATTTTGAGGCATTTCAGTCAGTTGCTCAATGTACCTATAA CCAGACCGTTCAGCTGGATATTACGGCCTTTTTAAAGACCGTAAAGAAAAATAAGCACAAGTTTTATCCGGCCTTTATTCACATTCTTGCCCGCCTG ATGAATGCTCATCCGGAATTACGTATGGCAATGAAAGACGGTGAGCTGGTGATATGGGATAGTGTTCACCCTTGTTACACCGTTTTCCATGAGCAAA CTGAAACGTTTTCATCGCTCTGGAGTGAATACCACGACGATTTCCGGCAGTTTCTACACATATATTCGCAAGATGTGGCGTGTTACGGTGAAAACCT GGCCTATTTCCCTAAAGGGTTTATTGAGAATATGTTTTTCGTGTCAGCCAATCCCTGGGTGAGTTTCACCAGTTTTGATTTAAACGTGGCCAATATG GACAACTTCTTCGCCCCCGTTTTCACCATGGGCAAATATTATACGCAAGGCGACAAGGTGCTGATGCCGCTGGCGATTCAGGTTCATCATGCCGTTT GTGATGGCTTCCATGTCGGCAGAATGCTTAATGAATTACAACAGTACTGCGATGAGTGGCAGGGCGGGGCGTAATTTTTTTAAGGCAGTTATTGGTG CCCTTAAACGCCTGGTTGCTACGCCTGAATAAGTGATAATAAGCGGATGAATGGCAGAAATTCGAAAGCAAATTCGACCCGGTCGTCGGTTCAGGGC AGGGTCGTTAAATAGCCGCTTATGTCTATTGCTGGTTTACCGGTTTATTGACTACCGGAAGCAGTGTGACCGTGTGCTTCTCAAATGCCTGAGGCCA GTTTGCTCAGGCTCTCCCCGTGGAGGTAATAATTGACGATAGGATCGATCCGTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGTGCGCCGC TTCAAGCTTTGCTATGTATAGCGGTTAAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGCTGCCACCGCTGAGCAATAACTAGCA TAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTG


>pET47GG-BsmBI-M1-CamR, region - T7 promoter to T7 terminator, 1477 bp TAATACGACTCACTATAGGGGAATTGTGAGCGGATAACAATTCCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACCATGTCCAGAT CTGTGGGAGACGACGAAAAACGCGAAAGCGTTTCACGATAAATGCGAAAACGGATCGATCCTTTTCGACCGAATAAATACCTGTGACGGAAGATCAC TTCGCAGAATAAATAAATCCTGGTGTCCCTGTTGATACCGGGAAGCCCTGGGCCAACTTTTGGCGAAAATGAGATGTTGATCGGCACGTAAGAGGTT CCAACTTTCACCATAATGAAATAAGATCACTACCGGGCGTATTTTTTGAGTTGTCGAGATTTTCAGGAGCTAAGGAAGCTAAAATGGAGAAAAAAAT CACTGGATATACCACCGTTGATATATCCCAATGGCATCGTAAAGAACATTTTGAGGCATTTCAGTCAGTTGCTCAATGTACCTATAACCAGACCGTT CAGCTGGATATTACGGCCTTTTTAAAGACCGTAAAGAAAAATAAGCACAAGTTTTATCCGGCCTTTATTCACATTCTTGCCCGCCTGATGAATGCTC ATCCGGAATTACGTATGGCAATGAAAGACGGTGAGCTGGTGATATGGGATAGTGTTCACCCTTGTTACACCGTTTTCCATGAGCAAACTGAAACGTT TTCATCGCTCTGGAGTGAATACCACGACGATTTCCGGCAGTTTCTACACATATATTCGCAAGATGTGGCGTGTTACGGTGAAAACCTGGCCTATTTC CCTAAAGGGTTTATTGAGAATATGTTTTTCGTGTCAGCCAATCCCTGGGTGAGTTTCACCAGTTTTGATTTAAACGTGGCCAATATGGACAACTTCT TCGCCCCCGTTTTCACCATGGGCAAATATTATACGCAAGGCGACAAGGTGCTGATGCCGCTGGCGATTCAGGTTCATCATGCCGTTTGTGATGGCTT CCATGTCGGCAGAATGCTTAATGAATTACAACAGTACTGCGATGAGTGGCAGGGCGGGGCGTAATTTTTTTAAGGCAGTTATTGGTGCCCTTAAACG CCTGGTTGCTACGCCTGAATAAGTGATAATAAGCGGATGAATGGCAGAAATTCGAAAGCAAATTCGACCCGGTCGTCGGTTCAGGGCAGGGTCGTTA AATAGCCGCTTATGTCTATTGCTGGTTTACCGGTTTATTGACTACCGGAAGCAGTGTGACCGTGTGCTTCTCAAATGCCTGAGGCCAGTTTGCTCAG GCTCTCCCCGTGGAGGTAATAATTGACGATAGGATCGATCCGTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGTCGTCTCCTTCAAGCTTA GCGGTTAAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGCTGCCACCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAA ACGGGTCTTGAGGGGTTTTTTG


>pET47GG-BsaI-M1-CamR, region - T7 promoter to T7 terminator , 1477 bp TAATACGACTCACTATAGGGGAATTGTGAGCGGATAACAATTCCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACCATGTCCAGAT CTGTGTCGGAGACCGAAAAACGCGAAAGCGTTTCACGATAAATGCGAAAACGGATCGATCCTTTTCGACCGAATAAATACCTGTGACGGAAGATCAC TTCGCAGAATAAATAAATCCTGGTGTCCCTGTTGATACCGGGAAGCCCTGGGCCAACTTTTGGCGAAAATGAGATGTTGATCGGCACGTAAGAGGTT CCAACTTTCACCATAATGAAATAAGATCACTACCGGGCGTATTTTTTGAGTTGTCGAGATTTTCAGGAGCTAAGGAAGCTAAAATGGAGAAAAAAAT CACTGGATATACCACCGTTGATATATCCCAATGGCATCGTAAAGAACATTTTGAGGCATTTCAGTCAGTTGCTCAATGTACCTATAACCAGACCGTT CAGCTGGATATTACGGCCTTTTTAAAGACCGTAAAGAAAAATAAGCACAAGTTTTATCCGGCCTTTATTCACATTCTTGCCCGCCTGATGAATGCTC ATCCGGAATTACGTATGGCAATGAAAGACGGTGAGCTGGTGATATGGGATAGTGTTCACCCTTGTTACACCGTTTTCCATGAGCAAACTGAAACGTT TTCATCGCTCTGGAGTGAATACCACGACGATTTCCGGCAGTTTCTACACATATATTCGCAAGATGTGGCGTGTTACGGTGAAAACCTGGCCTATTTC CCTAAAGGGTTTATTGAGAATATGTTTTTCGTGTCAGCCAATCCCTGGGTGAGTTTCACCAGTTTTGATTTAAACGTGGCCAATATGGACAACTTCT TCGCCCCCGTTTTCACCATGGGCAAATATTATACGCAAGGCGACAAGGTGCTGATGCCGCTGGCGATTCAGGTTCATCATGCCGTTTGTGATGGCTT CCATGTCGGCAGAATGCTTAATGAATTACAACAGTACTGCGATGAGTGGCAGGGCGGGGCGTAATTTTTTTAAGGCAGTTATTGGTGCCCTTAAACG CCTGGTTGCTACGCCTGAATAAGTGATAATAAGCGGATGAATGGCAGAAATTCGAAAGCAAATTCGACCCGGTCGTCGGTTCAGGGCAGGGTCGTTA AATAGCCGCTTATGTCTATTGCTGGTTTACCGGTTTATTGACTACCGGAAGCAGTGTGACCGTGTGCTTCTCAAATGCCTGAGGCCAGTTTGCTCAG GCTCTCCCCGTGGAGGTAATAATTGACGATAGGATCGATCCGTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGGTCTCCGGTTCAAGCTTA GCGGTTAAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGCTGCCACCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAA ACGGGTCTTGAGGGGTTTTTTG


>pUCGG-KanR-BsaI-M1-CmR-1, 3340 bp AAAAACTCATCGAGCATCAAATGAAACTGCAATTTATTCATATCAGGATTATCAATACCATATTTTTGAAAAAGCCGTTTCTGTAATGAAGGAGAAA ACTCACCGAGGCAGTTCCATAGGATGGCAAGATCCTGGTATCGGTCTGCGATTCCGACTCGTCCAACATCAATACAACCTATTAATTTCCCCTCGTC AAAAATAAGGTTATCAAGTGAGAAATCACCATGAGTGACGACTGAATCCGGTGAGAATGGCAAAAGTTTATGCATTTCTTTCCAGACTTGTTCAACA GGCCAGCCATTACGCTCGTCATCAAAATCACTCGCATCAACCAAACCGTTATTCATTCGTGATTGCGCCTGAGCGAGGCGAAATACGCGATCACTGT TAAAAGGACAATTACAAACAGGAATCGAATGCAACCGGCGCAGGAACACTGCCAGCGCATCAACAATATTTTCACCTGAATCAGGATATTCTTCTAA TACCTGGAATGCTGTTTTGCCGGGGATCGCAGTGGTGAGTAACCATGCATCATCAGGAGTACGGATAAAATGCTTGATGGTCGGAAGAGGCATAAAT TCCGTCAGCCAGTTTAGTCTGACCATCTCATCTGTAACATCATTGGCAACGCTACCTTTGCCATGTTTCAGAAACAACTCTGGCGCATCGGGCTTCC CATACAATCGATAGATTGTCGCACCTGATTGCCCGACATTATCGCGAGCCCATTTATACCCATATAAATCAGCATCCATGTTGGAATTTAATCGCGG CCTAGAGCAAGACGTTTCCCGTTGAATATGGCTCATAACACCCCTTGTATTACTGTTTATGTAAGCAGACAGTTTTATTGTTCATGACCATCCCTTA ACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAA ACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATAC CAAATACTGTTCTTCTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCCTGTTACCAGT GGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCG TGCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGG CGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTT TCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTG GCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTC GCCGCAGCCGAACGACCGAGCGCAGCGAGTCAGTGAGCGAGGAAGCGGAAGAGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCA TTAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCT TTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCTATGTCCAGATCTGCATCGGAGACC 
GAAAAACGCGAAAGCGTTTCACGATAAATGCGAAAACGGATCGATCCTTTTCGACCGAATAAATACCTGTGACGGAAGATCACTTCGCAGAATAAAT AAATCCTGGTGTCCCTGTTGATACCGGGAAGCCCTGGGCCAACTTTTGGCGAAAATGAGATGTTGATCGGCACGTAAGAGGTTCCAACTTTCACCAT AATGAAATAAGATCACTACCGGGCGTATTTTTTGAGTTGTCGAGATTTTCAGGAGCTAAGGAAGCTAAAATGGAGAAAAAAATCACTGGATATACCA CCGTTGATATATCCCAATGGCATCGTAAAGAACATTTTGAGGCATTTCAGTCAGTTGCTCAATGTACCTATAACCAGACCGTTCAGCTGGATATTAC GGCCTTTTTAAAGACCGTAAAGAAAAATAAGCACAAGTTTTATCCGGCCTTTATTCACATTCTTGCCCGCCTGATGAATGCTCATCCGGAATTACGT ATGGCAATGAAAGACGGTGAGCTGGTGATATGGGATAGTGTTCACCCTTGTTACACCGTTTTCCATGAGCAAACTGAAACGTTTTCATCGCTCTGGA GTGAATACCACGACGATTTCCGGCAGTTTCTACACATATATTCGCAAGATGTGGCGTGTTACGGTGAAAACCTGGCCTATTTCCCTAAAGGGTTTAT TGAGAATATGTTTTTCGTGTCAGCCAATCCCTGGGTGAGTTTCACCAGTTTTGATTTAAACGTGGCCAATATGGACAACTTCTTCGCCCCCGTTTTC ACCATGGGCAAATATTATACGCAAGGCGACAAGGTGCTGATGCCGCTGGCGATTCAGGTTCATCATGCCGTTTGTGATGGCTTCCATGTCGGCAGAA TGCTTAATGAATTACAACAGTACTGCGATGAGTGGCAGGGCGGGGCGTAATTTTTTTAAGGCAGTTATTGGTGCCCTTAAACGCCTGGTTGCTACGC CTGAATAAGTGATAATAAGCGGATGAATGGCAGAAATTCGAAAGCAAATTCGACCCGGTCGTCGGTTCAGGGCAGGGTCGTTAAATAGCCGCTTATG TCTATTGCTGGTTTACCGGTTTATTGACTACCGGAAGCAGTGTGACCGTGTGCTTCTCAAATGCCTGAGGCCAGTTTGCTCAGGCTCTCCCCGTGGA GGTAATAATTGACGATAGGATCGATCCGTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGGTCTCCGCGTCAAGCTTAGCGGTTAATTAAGC CAGCCCCGACACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGACAAGCTGTGACCGTCACCGGGAGCT GCATGTGTCAGAGGTTTTCACCGTCATCACCGAAACGCTTAG


>pUCGG-KanR-BsaI-M1-CmR-2, 3340 bp AAAAACTCATCGAGCATCAAATGAAACTGCAATTTATTCATATCAGGATTATCAATACCATATTTTTGAAAAAGCCGTTTCTGTAATGAAGGAGAAA ACTCACCGAGGCAGTTCCATAGGATGGCAAGATCCTGGTATCGGTCTGCGATTCCGACTCGTCCAACATCAATACAACCTATTAATTTCCCCTCGTC AAAAATAAGGTTATCAAGTGAGAAATCACCATGAGTGACGACTGAATCCGGTGAGAATGGCAAAAGTTTATGCATTTCTTTCCAGACTTGTTCAACA GGCCAGCCATTACGCTCGTCATCAAAATCACTCGCATCAACCAAACCGTTATTCATTCGTGATTGCGCCTGAGCGAGGCGAAATACGCGATCACTGT TAAAAGGACAATTACAAACAGGAATCGAATGCAACCGGCGCAGGAACACTGCCAGCGCATCAACAATATTTTCACCTGAATCAGGATATTCTTCTAA TACCTGGAATGCTGTTTTGCCGGGGATCGCAGTGGTGAGTAACCATGCATCATCAGGAGTACGGATAAAATGCTTGATGGTCGGAAGAGGCATAAAT TCCGTCAGCCAGTTTAGTCTGACCATCTCATCTGTAACATCATTGGCAACGCTACCTTTGCCATGTTTCAGAAACAACTCTGGCGCATCGGGCTTCC CATACAATCGATAGATTGTCGCACCTGATTGCCCGACATTATCGCGAGCCCATTTATACCCATATAAATCAGCATCCATGTTGGAATTTAATCGCGG CCTAGAGCAAGACGTTTCCCGTTGAATATGGCTCATAACACCCCTTGTATTACTGTTTATGTAAGCAGACAGTTTTATTGTTCATGACCATCCCTTA ACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAA ACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATAC CAAATACTGTTCTTCTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCCTGTTACCAGT GGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCG TGCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGG CGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTT TCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTG GCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTC GCCGCAGCCGAACGACCGAGCGCAGCGAGTCAGTGAGCGAGGAAGCGGAAGAGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCA TTAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCT TTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCTATGTCCAGATCTGCATCGGAGACC 
GAAAAACGCGAAAGCGTTTCACGATAAATGCGAAAACGGATCGATCCTTTTCGACCGAATAAATACCTGTGACGGAAGATCACTTCGCAGAATAAAT AAATCCTGGTGTCCCTGTTGATACCGGGAAGCCCTGGGCCAACTTTTGGCGAAAATGAGATGTTGATCGGCACGTAAGAGGTTCCAACTTTCACCAT AATGAAATAAGATCACTACCGGGCGTATTTTTTGAGTTGTCGAGATTTTCAGGAGCTAAGGAAGCTAAAATGGAGAAAAAAATCACTGGATATACCA CCGTTGATATATCCCAATGGCATCGTAAAGAACATTTTGAGGCATTTCAGTCAGTTGCTCAATGTACCTATAACCAGACCGTTCAGCTGGATATTAC GGCCTTTTTAAAGACCGTAAAGAAAAATAAGCACAAGTTTTATCCGGCCTTTATTCACATTCTTGCCCGCCTGATGAATGCTCATCCGGAATTACGT ATGGCAATGAAAGACGGTGAGCTGGTGATATGGGATAGTGTTCACCCTTGTTACACCGTTTTCCATGAGCAAACTGAAACGTTTTCATCGCTCTGGA GTGAATACCACGACGATTTCCGGCAGTTTCTACACATATATTCGCAAGATGTGGCGTGTTACGGTGAAAACCTGGCCTATTTCCCTAAAGGGTTTAT TGAGAATATGTTTTTCGTGTCAGCCAATCCCTGGGTGAGTTTCACCAGTTTTGATTTAAACGTGGCCAATATGGACAACTTCTTCGCCCCCGTTTTC ACCATGGGCAAATATTATACGCAAGGCGACAAGGTGCTGATGCCGCTGGCGATTCAGGTTCATCATGCCGTTTGTGATGGCTTCCATGTCGGCAGAA TGCTTAATGAATTACAACAGTACTGCGATGAGTGGCAGGGCGGGGCGTAATTTTTTTAAGGCAGTTATTGGTGCCCTTAAACGCCTGGTTGCTACGC 105 CTGAATAAGTGATAATAAGCGGATGAATGGCAGAAATTCGAAAGCAAATTCGACCCGGTCGTCGGTTCAGGGCAGGGTCGTTAAATAGCCGCTTATG TCTATTGCTGGTTTACCGGTTTATTGACTACCGGAAGCAGTGTGACCGTGTGCTTCTCAAATGCCTGAGGCCAGTTTGCTCAGGCTCTCCCCGTGGA GGTAATAATTGACGATAGGATCGATCCGTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGGTCTCCACCTCAAGCTTAGCGGTTAATTAAGC CAGCCCCGACACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGACAAGCTGTGACCGTCACCGGGAGCT GCATGTGTCAGAGGTTTTCACCGTCATCACCGAAACGCTTAG

>pUCGG-KanR-BsaI-M1-CmR-3, 3340 bp AAAAACTCATCGAGCATCAAATGAAACTGCAATTTATTCATATCAGGATTATCAATACCATATTTTTGAAAAAGCCGTTTCTGTAATGAAGGAGAAA ACTCACCGAGGCAGTTCCATAGGATGGCAAGATCCTGGTATCGGTCTGCGATTCCGACTCGTCCAACATCAATACAACCTATTAATTTCCCCTCGTC AAAAATAAGGTTATCAAGTGAGAAATCACCATGAGTGACGACTGAATCCGGTGAGAATGGCAAAAGTTTATGCATTTCTTTCCAGACTTGTTCAACA GGCCAGCCATTACGCTCGTCATCAAAATCACTCGCATCAACCAAACCGTTATTCATTCGTGATTGCGCCTGAGCGAGGCGAAATACGCGATCACTGT TAAAAGGACAATTACAAACAGGAATCGAATGCAACCGGCGCAGGAACACTGCCAGCGCATCAACAATATTTTCACCTGAATCAGGATATTCTTCTAA TACCTGGAATGCTGTTTTGCCGGGGATCGCAGTGGTGAGTAACCATGCATCATCAGGAGTACGGATAAAATGCTTGATGGTCGGAAGAGGCATAAAT TCCGTCAGCCAGTTTAGTCTGACCATCTCATCTGTAACATCATTGGCAACGCTACCTTTGCCATGTTTCAGAAACAACTCTGGCGCATCGGGCTTCC CATACAATCGATAGATTGTCGCACCTGATTGCCCGACATTATCGCGAGCCCATTTATACCCATATAAATCAGCATCCATGTTGGAATTTAATCGCGG CCTAGAGCAAGACGTTTCCCGTTGAATATGGCTCATAACACCCCTTGTATTACTGTTTATGTAAGCAGACAGTTTTATTGTTCATGACCATCCCTTA ACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAA ACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATAC CAAATACTGTTCTTCTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCCTGTTACCAGT GGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCG TGCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGG CGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTT TCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTG GCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTC GCCGCAGCCGAACGACCGAGCGCAGCGAGTCAGTGAGCGAGGAAGCGGAAGAGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCA TTAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCT TTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCTATGTCCAGATCTACTTCGGAGACC 
GAAAAACGCGAAAGCGTTTCACGATAAATGCGAAAACGGATCGATCCTTTTCGACCGAATAAATACCTGTGACGGAAGATCACTTCGCAGAATAAAT AAATCCTGGTGTCCCTGTTGATACCGGGAAGCCCTGGGCCAACTTTTGGCGAAAATGAGATGTTGATCGGCACGTAAGAGGTTCCAACTTTCACCAT AATGAAATAAGATCACTACCGGGCGTATTTTTTGAGTTGTCGAGATTTTCAGGAGCTAAGGAAGCTAAAATGGAGAAAAAAATCACTGGATATACCA CCGTTGATATATCCCAATGGCATCGTAAAGAACATTTTGAGGCATTTCAGTCAGTTGCTCAATGTACCTATAACCAGACCGTTCAGCTGGATATTAC GGCCTTTTTAAAGACCGTAAAGAAAAATAAGCACAAGTTTTATCCGGCCTTTATTCACATTCTTGCCCGCCTGATGAATGCTCATCCGGAATTACGT ATGGCAATGAAAGACGGTGAGCTGGTGATATGGGATAGTGTTCACCCTTGTTACACCGTTTTCCATGAGCAAACTGAAACGTTTTCATCGCTCTGGA 106 GTGAATACCACGACGATTTCCGGCAGTTTCTACACATATATTCGCAAGATGTGGCGTGTTACGGTGAAAACCTGGCCTATTTCCCTAAAGGGTTTAT TGAGAATATGTTTTTCGTGTCAGCCAATCCCTGGGTGAGTTTCACCAGTTTTGATTTAAACGTGGCCAATATGGACAACTTCTTCGCCCCCGTTTTC ACCATGGGCAAATATTATACGCAAGGCGACAAGGTGCTGATGCCGCTGGCGATTCAGGTTCATCATGCCGTTTGTGATGGCTTCCATGTCGGCAGAA TGCTTAATGAATTACAACAGTACTGCGATGAGTGGCAGGGCGGGGCGTAATTTTTTTAAGGCAGTTATTGGTGCCCTTAAACGCCTGGTTGCTACGC CTGAATAAGTGATAATAAGCGGATGAATGGCAGAAATTCGAAAGCAAATTCGACCCGGTCGTCGGTTCAGGGCAGGGTCGTTAAATAGCCGCTTATG TCTATTGCTGGTTTACCGGTTTATTGACTACCGGAAGCAGTGTGACCGTGTGCTTCTCAAATGCCTGAGGCCAGTTTGCTCAGGCTCTCCCCGTGGA GGTAATAATTGACGATAGGATCGATCCGTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGGTCTCCGCGTCAAGCTTAGCGGTTAATTAAGC CAGCCCCGACACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGACAAGCTGTGACCGTCACCGGGAGCT GCATGTGTCAGAGGTTTTCACCGTCATCACCGAAACGCTTAG

>pUCGG-KanR-BsaI-M1-CmR-4, 3340 bp AAAAACTCATCGAGCATCAAATGAAACTGCAATTTATTCATATCAGGATTATCAATACCATATTTTTGAAAAAGCCGTTTCTGTAATGAAGGAGAAA ACTCACCGAGGCAGTTCCATAGGATGGCAAGATCCTGGTATCGGTCTGCGATTCCGACTCGTCCAACATCAATACAACCTATTAATTTCCCCTCGTC AAAAATAAGGTTATCAAGTGAGAAATCACCATGAGTGACGACTGAATCCGGTGAGAATGGCAAAAGTTTATGCATTTCTTTCCAGACTTGTTCAACA GGCCAGCCATTACGCTCGTCATCAAAATCACTCGCATCAACCAAACCGTTATTCATTCGTGATTGCGCCTGAGCGAGGCGAAATACGCGATCACTGT TAAAAGGACAATTACAAACAGGAATCGAATGCAACCGGCGCAGGAACACTGCCAGCGCATCAACAATATTTTCACCTGAATCAGGATATTCTTCTAA TACCTGGAATGCTGTTTTGCCGGGGATCGCAGTGGTGAGTAACCATGCATCATCAGGAGTACGGATAAAATGCTTGATGGTCGGAAGAGGCATAAAT TCCGTCAGCCAGTTTAGTCTGACCATCTCATCTGTAACATCATTGGCAACGCTACCTTTGCCATGTTTCAGAAACAACTCTGGCGCATCGGGCTTCC CATACAATCGATAGATTGTCGCACCTGATTGCCCGACATTATCGCGAGCCCATTTATACCCATATAAATCAGCATCCATGTTGGAATTTAATCGCGG CCTAGAGCAAGACGTTTCCCGTTGAATATGGCTCATAACACCCCTTGTATTACTGTTTATGTAAGCAGACAGTTTTATTGTTCATGACCATCCCTTA ACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAA ACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATAC CAAATACTGTTCTTCTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCCTGTTACCAGT GGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCG TGCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGG CGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTT TCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTG GCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTC GCCGCAGCCGAACGACCGAGCGCAGCGAGTCAGTGAGCGAGGAAGCGGAAGAGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCA TTAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCT TTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCTATGTCCAGATCTACTTCGGAGACC 
GAAAAACGCGAAAGCGTTTCACGATAAATGCGAAAACGGATCGATCCTTTTCGACCGAATAAATACCTGTGACGGAAGATCACTTCGCAGAATAAAT AAATCCTGGTGTCCCTGTTGATACCGGGAAGCCCTGGGCCAACTTTTGGCGAAAATGAGATGTTGATCGGCACGTAAGAGGTTCCAACTTTCACCAT

107 AATGAAATAAGATCACTACCGGGCGTATTTTTTGAGTTGTCGAGATTTTCAGGAGCTAAGGAAGCTAAAATGGAGAAAAAAATCACTGGATATACCA CCGTTGATATATCCCAATGGCATCGTAAAGAACATTTTGAGGCATTTCAGTCAGTTGCTCAATGTACCTATAACCAGACCGTTCAGCTGGATATTAC GGCCTTTTTAAAGACCGTAAAGAAAAATAAGCACAAGTTTTATCCGGCCTTTATTCACATTCTTGCCCGCCTGATGAATGCTCATCCGGAATTACGT ATGGCAATGAAAGACGGTGAGCTGGTGATATGGGATAGTGTTCACCCTTGTTACACCGTTTTCCATGAGCAAACTGAAACGTTTTCATCGCTCTGGA GTGAATACCACGACGATTTCCGGCAGTTTCTACACATATATTCGCAAGATGTGGCGTGTTACGGTGAAAACCTGGCCTATTTCCCTAAAGGGTTTAT TGAGAATATGTTTTTCGTGTCAGCCAATCCCTGGGTGAGTTTCACCAGTTTTGATTTAAACGTGGCCAATATGGACAACTTCTTCGCCCCCGTTTTC ACCATGGGCAAATATTATACGCAAGGCGACAAGGTGCTGATGCCGCTGGCGATTCAGGTTCATCATGCCGTTTGTGATGGCTTCCATGTCGGCAGAA TGCTTAATGAATTACAACAGTACTGCGATGAGTGGCAGGGCGGGGCGTAATTTTTTTAAGGCAGTTATTGGTGCCCTTAAACGCCTGGTTGCTACGC CTGAATAAGTGATAATAAGCGGATGAATGGCAGAAATTCGAAAGCAAATTCGACCCGGTCGTCGGTTCAGGGCAGGGTCGTTAAATAGCCGCTTATG TCTATTGCTGGTTTACCGGTTTATTGACTACCGGAAGCAGTGTGACCGTGTGCTTCTCAAATGCCTGAGGCCAGTTTGCTCAGGCTCTCCCCGTGGA GGTAATAATTGACGATAGGATCGATCCGTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGGTCTCCACCTCAAGCTTAGCGGTTAATTAAGC CAGCCCCGACACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGACAAGCTGTGACCGTCACCGGGAGCT GCATGTGTCAGAGGTTTTCACCGTCATCACCGAAACGCTTAG

>pET14GG-CHis-FadR, region - T7 promoter to T7 terminator, 937 bp TAATACGACTCACTATAGGGACACCACAACGGTTTCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACCATGTCCGTCATTAAGGCG CAAAGCCCGGCGGGTTTCGCGGAAGAGTACATTATTGAAAGTATCTGGAATAACCGCTTCCCTCCCGGGACTATTTTGCCCGCAGAACGTGAACTTT CAGAATTAATTGGCGTAACGCGTACTACGTTACGTGAAGTGTTACAGCGTCTGGCACGAGATGGCTGGTTGACCATTCAACATGGCAAGCCGACGAA GGTGAATAATTTCTGGGAAACTTCCGGTTTAAATATCCTTGAAACACTGGCGCGACTGGATCACGAAAGTGTGCCGCAGCTTATTGATAATTTGCTG TCGGTGCGTACCAATATTTCCACTATTTTTATTCGCACCGCGTTTCGTCAGCATCCCGATAAAGCGCAGGAAGTGCTGGCTACCGCTAATGAAGTGG CCGATCACGCCGATGCCTTTGCCGAGCTGGATTACAACATATTCCGCGGCCTGGCGTTTGCTTCCGGCAACCCGATTTACGGTCTGATTCTTAACGG GATGAAAGGGCTGTATACGCGTATTGGTCGTCACTATTTCGCCAATCCGGAAGCGCGCAGTCTGGCGCTGGGCTTCTACCACAAACTGTCGGCGTTG TGCAGTGAAGGCGCGCACGATCAGGTGTACGAAACAGTGCGTCGCTATGGGCATGAGAGTGGCGAGATTTGGCACCGGATGCAGAAAAATCTGCCGG GTGATTTAGCCATTCAGGGGCGAAGCGGTCACCATCACCATCACCATTAAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGCTGC CACCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTG

>pET14GG-CHis-MBP, region - T7 promoter to T7 terminator, 1333 bp TAATACGACTCACTATAGGGACACCACAACGGTTTCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACCATGTCCAAAATCGAAGAA GGTAAACTGGTAATCTGGATTAACGGCGATAAAGGCTATAACGGACTCGCTGAAGTCGGTAAGAAATTCGAGAAAGATACCGGAATTAAAGTCACCG TTGAGCATCCGGATAAACTGGAAGAGAAATTCCCACAGGTTGCGGCAACTGGCGATGGCCCTGACATTATCTTCTGGGCACACGACCGCTTTGGTGG CTACGCTCAATCTGGCCTGTTGGCTGAAATCACCCCGGACAAAGCGTTCCAGGACAAGCTGTATCCGTTTACCTGGGATGCCGTACGTTACAACGGC AAGCTGATTGCTTACCCGATCGCTGTTGAAGCGTTATCGCTGATTTATAACAAAGATCTGCTGCCGAACCCGCCAAAAACCTGGGAAGAGATCCCGG CGCTGGATAAAGAACTGAAAGCGAAAGGTAAGAGCGCGCTGATGTTCAACCTGCAAGAACCGTACTTCACCTGGCCGCTGATTGCTGCTGACGGGGG TTATGCGTTCAAGTATGAAAACGGCAAGTACGACATTAAAGACGTGGGCGTGGATAACGCTGGCGCGAAAGCGGGTCTGACCTTCCTGGTTGACCTG ATTAAAAACAAACACATGAATGCAGACACCGATTACTCCATCGCAGAAGCTGCCTTTAATAAAGGCGAAACAGCGATGACCATCAACGGCCCGTGGG CATGGTCCAACATCGACACCAGCAAAGTGAATTATGGTGTAACGGTACTGCCGACCTTCAAGGGTCAACCATCCAAACCGTTCGTTGGCGTGCTGAG CGCAGGTATTAACGCCGCCAGTCCGAACAAAGAGCTGGCGAAAGAGTTCCTCGAAAACTATCTGCTGACTGATGAAGGTCTGGAAGCGGTTAATAAA GACAAACCGCTGGGTGCCGTAGCGCTGAAGTCTTACGAGGAAGAGTTGGCGAAAGATCCACGTATTGCCGCCACCATGGAAAACGCCCAGAAAGGTG AAATCATGCCGAACATCCCGCAGATGTCCGCTTTCTGGTATGCCGTGCGTACTGCGGTGATCAACGCCGCCAGCGGTCGTCAGACTGTCGATGAAGC CCTGAAAGACGCGCAGACTCGTATCACCAAGAGCGGTCACCATCACCATCACCATTAAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTG GCTGCTGCCACCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTG

>pCF-cpSFGFP, 3997 bp GGCTTCCCGGTATCAACAGGGACACCAGGATTTATTTATTCTGCGAAGTGATCTTCCGTCACACTTGTCTGTAAGCGGATGCCGGGAGCAGACAAGC CCGTCAGGGCGCGTCAGCGGGTGTTGGCGGGTGTCGGGGCTGGCTTAACTATGCGGCATCAGAGCAGATTGTACTGAGAGTGCACCATATATGCGGT GTGAAATACCGCACAGATGCGTAAGGAGAAAATACCGCATCAGGCGCCATTCGCCATTCAGGCTGCGCAACTGTTGGGAAGGGCGATCGGTGCGGGC CTCTTCGCTATTACGCCAGCTGGCGAAAGGGGGATGTGCTGCAAGGCGATTAAGTTGGGTAACGCCAGGGTTTTCCCAGTCACGACGTTGTAAAACG ACGGCCAGTGCCAAGCTTGCATGCAAGGAGATGGCGCCCAACAGTCCCCCGGCCACGGGGCCTGCCACCATACCCACGCCGAAACAAGCGCTCATGA GCCCGAAGTGGCGAGCCCGATCTTCCCCATCGGTGATGTCGGCGATATAGGCGCCAGCAACCGCACCTGTGGCGCCGGTGATGCCGGCCACGATGCG TCCGGCGTAGAGGATCGAGATCTCGATCCCGCGAAATTAATACGACTCACTATAGGGACACCACAACGGTTTCCCTCTAGAAATAATTTTGTTTAAC TTTAAGAAGGAGATATACCATGTCTGGTTCTCATCATCATCATCATCATAGCAGCGGCATCGAAGGCCGCGGCCGCGCATTCAACAGCCACAACGTG TATATCACGGCCGACAAGCAGAAGAACGGCATCAAGGCGAACTTCAAGATCCGCCACAACGTCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACC AGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCG CGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGGGTGGCGGTAGCGTGAGCAAGGGCGAG GAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGCGCGGCGAGGGCGAGGGCGATGCCA CCAACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTG CTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTCCTTCAAGGAC GACGGCACCTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCA ACATCCTGGGCCATAAGCTTGAATATAACGCGTAGATCCGGTAACTAACTAAGATCCGGTAAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGA GTTGGCTGCTGCCACCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGCTGAAAGGAGGAACTATATCC GGATATCCACAGGACGGGTGTGGTCGCCATGATCGCGTAGTCGATAGTGGCTCCAAGTAGCGAAGCGAGCAGGACTGGGCGGCGGCCAAAGCGGTCG GACAGTGCTCCGAGAACGGGTGCGCATAGAAATTGCATCAACGCATATAGCGCTAGCAGCACGCCATAGTGACTGGCGATGCTGTCGGAATGGACGA TATCCCGCAAGAGGCCCGGCAGTACCGGCATAACCAAGCCTATGCCTACAGCATCCAGGGTGACGGTGCCGAGGATGACGATGAGCGCATTGTTAGA 
TTTCATACACGGTGCCTGACTGCGTTAGCAATTTAACTGTGATAAACTACCGCATTAAAGCTTATCGATGATAAGCTGTCAAACATGAGAATTCGTA ATCATGTCATAGCTGTTTCCTGTGTGAAATTGTTATCCGCTCACAATTCCACACAACATACGAGCCGGAAGCATAAAGTGTAAAGCCTGGGGTGCCT AATGAGTGAGCTAACTCACATTAATTGCGTTGCGCTCACTGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCGGCCAACG CGCGGGGAGAGGCGGTTTGCGTATTGGGCGCTCTTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCT CACTCAAAGGCGGTAATACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAA AGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACT ATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCG GGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTC AGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGAT TAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGGACAGTATTTGGTATCTGCGCTCTG CTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGA TTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGT CATGAGATTATCAAAAAGGATCTGCCATTCATCCGCTTATTATCACTTATTCAGGCGTAGCACCAGGCGTTTAAGGGCACCAATAACTGCCTTAAAA AAATTACGCCCCGCCCTGCCACTCATCGCAGTACTGTTGTAATTCATTAAGCATTCTGCCGACATGGAAGCCATCACAGACGGCATGATGAACCTGA ATCGCCAGCGGCATCAGCACCTTGTCGCCTTGCGTATAATATTTGCCCATAGTGAAAACGGGGGCGAAGAAGTTGTCCATATTGGCCACGTTTAAAT CAAAACTGGTGAAACTCACCCAGGGATTGGCTGAGACGAAAAACATATTCTCAATAAACCCTTTAGGGAAATAGGCCAGGTTTTCACCGTAACACGC CACATCTTGCGAATATATGTGTAGAAACTGCCGGAAATCGTCGTGGTATTCACTCCAGAGCGATGAAAACGTTTCAGTTTGCTCATGGAAAACGGTG 110 TAACAAGGGTGAACACTATCCCATATCACCAGCTCACCGTCTTTCATTGCCATACGGAACTCCGGATGAGCATTCATCAGGCGGGCAAGAATGTGAA TAAAGGCCGGATAAAACTTGTGCTTATTTTTCTTTACGGTCTTTAAAAAGGCCGTAATATCCAGCTGAACGGTCTGGTTATAGGTACATTGAGCAAC TGACTGAAATGCCTCAAAATGTTCTTTACGATGCCATTGGGATATATCAACGGTGGTATATCCAGTGATTTTTTTCTCCATTTTAGCTTCCTTAGCT 
CCTGAAAATCTCGATAACTCAAAAAATACGCCCGGTAGTGATCTTATTTCATTATGGTGAAAGTTGGAACCTCTTACGTGCCGATCAACGTCTCATT TTCGCCAAAAGTTGGCCCAG

>pET14GG-CHis-cpsfGFP149, region - T7 promoter to T7 terminator, 937 bp TAATACGACTCACTATAGGGACACCACAACGGTTTCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACCATGTCCAACGTGTATATC ACGGCCGACAAGCAGAAGAACGGCATCAAGGCGAACTTCAAGATCCGCCACAACGTCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGA ACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCA CATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGGGTGGCGGTAGCGTGAGCAAGGGCGAGGAGCTG TTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGCGCGGCGAGGGCGAGGGCGATGCCACCAACG GCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAG CCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTCCTTCAAGGACGACGGC ACCTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCC TGGGCCATAAGCTTGAATATAACAGCGGTCACCATCACCATCACCATTAAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGCTGC CACCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTG

>pET47GG-cpsfGFP-GTGmut, region - T7 promoter to T7 terminator, 927 bp TAATACGACTCACTATAGGGGAATTGTGAGCGGATAACAATTCCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACCATGTCCAACG TTTATATCACGGCCGACAAGCAGAAGAACGGCATCAAGGCGAACTTCAAGATCCGCCACAACGTCGAGGACGGCAGCGTGCAGCTCGCCGACCACTA CCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAG CGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGGGTGGCGGTAGCGTGAGCAAGGGCG AGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGCGCGGCGAGGGCGAGGGCGATGC CACCAACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAG TGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTCCTTCAAGG ACGACGGCACCTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGG CAACATCCTGGGCCATAAGCTTGAATATAACAGCGGTTAAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGCTGCCACCGCTGAG CAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTG

Sequences from Chapter 3

>pETM-CHis-Dest, 2915 bp CAAGGTTTCTTAGACGTCAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTC ATGAGACAATAACCCTGATAAATGCTTCAATAATATTGAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGG CATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGA TCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCC CGTGTTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGG ATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAACCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATCGGAGGACCGAAGGA GCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAACTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGAC ACCACGATGCCTGCAGCAATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGATGG AGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAGCGTGGCTCTCGCGG TATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAG ATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTTTACTCATATATACTTTAGATTGATTTAAAACTTCATTTTTAAT TTAAAAGGATCTAGGTGAAGATCCTTTTTGATAATCTCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAA GATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGAT CAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTCCTTCTAGTGTAGCCGTAGTTAGGCCACCACT TCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGA CTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTG AGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGC GCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTC AGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTA TCCCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTCGCCGCAGCCGAACGACCGAGCGCAGCGAGTCAGTGAGCGAGG 
AAGCGGAAGAGCGCCATCAGCAACCGCACCTGTGGCGCCGGTGATGCCGGCCACGATGCGTCCGGCGTAGAGGATCGAGATCTCGATCCCGCGAAAT TAATACGACTCACTATAGGGACACCACAACGGTTTCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACCATGTCCAGAGACCATTAA TGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTAC ACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCTATGACCATGATTACGCCAAGCTTGCATG CCTGCAGGTCGACTCTAGAGGATCCCCGGGTACCGAGCTCGAATTCACTGGCCGTCGTTTTACAACGTCGTGACTGGGAAAACCCTGGCGTTACCCA ACTTAATCGCCTTGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCGATCGCCCTTCCCAACAGTTGCGCAGCCTGAAT GGCGAATGGCGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGCATATGGTGCACTCTCAGTACAATCTGCTCTGATGCCG CATAGTTAAGCCAGCCCCGACACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGACAAGCTGGTCTCTA GCGGTCACCATCACCATCACCATTAAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGCTGCCACCGCTGAGCAATAACTAGCATA ACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGCTGAAAGGAGGAACTATATCCGGATATCCACAGGACGGGTGTGGTCGCCATGATCGC GTAGT

>pETM-CHis-MBP, region - T7 promoter to T7 terminator, 1333 bp TAATACGACTCACTATAGGGACACCACAACGGTTTCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACCATGTCCAAAATCGAAGAA GGTAAACTGGTAATCTGGATTAACGGCGATAAAGGCTATAACGGACTCGCTGAAGTCGGTAAGAAATTCGAGAAAGATACCGGAATTAAAGTCACCG TTGAGCATCCGGATAAACTGGAAGAGAAATTCCCACAGGTTGCGGCAACTGGCGATGGCCCTGACATTATCTTCTGGGCACACGACCGCTTTGGTGG CTACGCTCAATCTGGCCTGTTGGCTGAAATCACCCCGGACAAAGCGTTCCAGGACAAGCTGTATCCGTTTACCTGGGATGCCGTACGTTACAACGGC AAGCTGATTGCTTACCCGATCGCTGTTGAAGCGTTATCGCTGATTTATAACAAAGATCTGCTGCCGAACCCGCCAAAAACCTGGGAAGAGATCCCGG CGCTGGATAAAGAACTGAAAGCGAAAGGTAAGAGCGCGCTGATGTTCAACCTGCAAGAACCGTACTTCACCTGGCCGCTGATTGCTGCTGACGGGGG TTATGCGTTCAAGTATGAAAACGGCAAGTACGACATTAAAGACGTGGGCGTGGATAACGCTGGCGCGAAAGCGGGTCTGACCTTCCTGGTTGACCTG ATTAAAAACAAACACATGAATGCAGACACCGATTACTCCATCGCAGAAGCTGCCTTTAATAAAGGCGAAACAGCGATGACCATCAACGGCCCGTGGG CATGGTCCAACATCGACACCAGCAAAGTGAATTATGGTGTAACGGTACTGCCGACCTTCAAGGGTCAACCATCCAAACCGTTCGTTGGCGTGCTGAG CGCAGGTATTAACGCCGCCAGTCCGAACAAAGAGCTGGCGAAAGAGTTCCTCGAAAACTATCTGCTGACTGATGAAGGTCTGGAAGCGGTTAATAAA GACAAACCGCTGGGTGCCGTAGCGCTGAAGTCTTACGAGGAAGAGTTGGCGAAAGATCCACGTATTGCCGCCACCATGGAAAACGCCCAGAAAGGTG AAATCATGCCGAACATCCCGCAGATGTCCGCTTTCTGGTATGCCGTGCGTACTGCGGTGATCAACGCCGCCAGCGGTCGTCAGACTGTCGATGAAGC CCTGAAAGACGCGCAGACTCGTATCACCAAGAGCGGTCACCATCACCATCACCATTAAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTG GCTGCTGCCACCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTG

>pETM-CHis-cpEGFP, region - T7 promoter to T7 terminator, 955 bp TAATACGACTCACTATAGGGACACCACAACGGTTTCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACCATGTCCTATAACGTCTTT ATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCGAACTTCAAGATCCGCCACAACATCGAGGACGGCGGCGTGCAGCTCGCCTATCACTACCAGC AGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCGTGCAGTCCAAACTGAGCAAAGACCCCAACGAGAAGCGCGA TCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGGGCGGTACCGGAGGGAGCATGGTGAGCAAG GGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCG ATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGT GCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACATTCAGGAGCGCACCATCTTCTTC AAGGACGACGGCAACTATAAGACACGCGCTGAGGTTAAGTTCGAGGGCGACACTCTGGTTAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGG ACGGCAACATCCTGGGCCATAAGCTTGAATATAACTTCAACAGCGGTCACCATCACCATCACCATTAAGATCCGGCTGCTAACAAAGCCCGAAAGGA AGCTGAGTTGGCTGCTGCCACCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTG

>pETM-CHis-MBP170-cpEGFP, region - T7 promoter to T7 terminator, 2083 bp

115 TAATACGACTCACTATAGGGACACCACAACGGTTTCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACCATGTCCAAAATCGAAGAA GGTAAACTGGTAATCTGGATTAACGGCGATAAAGGCTATAACGGACTCGCTGAAGTCGGTAAGAAATTCGAGAAAGATACCGGAATTAAAGTCACCG TTGAGCATCCGGATAAACTGGAAGAGAAATTCCCACAGGTTGCGGCAACTGGCGATGGCCCTGACATTATCTTCTGGGCACACGACCGCTTTGGTGG CTACGCTCAATCTGGCCTGTTGGCTGAAATCACCCCGGACAAAGCGTTCCAGGACAAGCTGTATCCGTTTACCTGGGATGCCGTACGTTACAACGGC AAGCTGATTGCTTACCCGATCGCTGTTGAAGCGTTATCGCTGATTTATAACAAAGATCTGCTGCCGAACCCGCCAAAAACCTGGGAAGAGATCCCGG CGCTGGATAAAGAACTGAAAGCGAAAGGTAAGAGCGCGCTGATGTTCAACCTGCAAGAACCGTACTTCACCTGGCCGCTGATTGCTGCTGACGGGGG TTATGCGTTCAATGCATCTTATAACGTCTTTATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCGAACTTCAAGATCCGCCACAACATCGAGGAC GGCGGCGTGCAGCTCGCCTATCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCGTGCAGTCCA AACTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAA GGGCGGTACCGGAGGGAGCATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCAC AAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGC CCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGA AGGCTACATTCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTATAAGACACGCGCTGAGGTTAAGTTCGAGGGCGACACTCTGGTTAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCATAAGCTTGAATATAACTTCAACGCGTCATTCAAGTATGAAAACGGCA AGTACGACATTAAAGACGTGGGCGTGGATAACGCTGGCGCGAAAGCGGGTCTGACCTTCCTGGTTGACCTGATTAAAAACAAACACATGAATGCAGA CACCGATTACTCCATCGCAGAAGCTGCCTTTAATAAAGGCGAAACAGCGATGACCATCAACGGCCCGTGGGCATGGTCCAACATCGACACCAGCAAA GTGAATTATGGTGTAACGGTACTGCCGACCTTCAAGGGTCAACCATCCAAACCGTTCGTTGGCGTGCTGAGCGCAGGTATTAACGCCGCCAGTCCGA ACAAAGAGCTGGCGAAAGAGTTCCTCGAAAACTATCTGCTGACTGATGAAGGTCTGGAAGCGGTTAATAAAGACAAACCGCTGGGTGCCGTAGCGCT GAAGTCTTACGAGGAAGAGTTGGCGAAAGATCCACGTATTGCCGCCACCATGGAAAACGCCCAGAAAGGTGAAATCATGCCGAACATCCCGCAGATG TCCGCTTTCTGGTATGCCGTGCGTACTGCGGTGATCAACGCCGCCAGCGGTCGTCAGACTGTCGATGAAGCCCTGAAAGACGCGCAGACTCGTATCA 
CCAAGAGCGGTCACCATCACCATCACCATTAAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGCTGCCACCGCTGAGCAATAACT AGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTG

>pETM-CHis-MBP182-cpEGFP, region - T7 promoter to T7 terminator, 2083 bp TAATACGACTCACTATAGGGACACCACAACGGTTTCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACCATGTCCAAAATCGAAGAA GGTAAACTGGTAATCTGGATTAACGGCGATAAAGGCTATAACGGACTCGCTGAAGTCGGTAAGAAATTCGAGAAAGATACCGGAATTAAAGTCACCG TTGAGCATCCGGATAAACTGGAAGAGAAATTCCCACAGGTTGCGGCAACTGGCGATGGCCCTGACATTATCTTCTGGGCACACGACCGCTTTGGTGG CTACGCTCAATCTGGCCTGTTGGCTGAAATCACCCCGGACAAAGCGTTCCAGGACAAGCTGTATCCGTTTACCTGGGATGCCGTACGTTACAACGGC AAGCTGATTGCTTACCCGATCGCTGTTGAAGCGTTATCGCTGATTTATAACAAAGATCTGCTGCCGAACCCGCCAAAAACCTGGGAAGAGATCCCGG CGCTGGATAAAGAACTGAAAGCGAAAGGTAAGAGCGCGCTGATGTTCAACCTGCAAGAACCGTACTTCACCTGGCCGCTGATTGCTGCTGACGGGGG TTATGCGTTCAAGTATGAAAACGGCAAGTACGACATTAAAGACGTGGGTGCATCTTATAACGTCTTTATCATGGCCGACAAGCAGAAGAACGGCATC AAGGCGAACTTCAAGATCCGCCACAACATCGAGGACGGCGGCGTGCAGCTCGCCTATCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGC TGCTGCCCGACAACCACTACCTGAGCGTGCAGTCCAAACTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGC CGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGGGCGGTACCGGAGGGAGCATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCC ATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGT TCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACAT GAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACATTCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTATAAGACACGCGCT GAGGTTAAGTTCGAGGGCGACACTCTGGTTAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCATAAGCTTGAAT 116 ATAACTTCAACGCGTCAGTGGGCGTGGATAACGCTGGCGCGAAAGCGGGTCTGACCTTCCTGGTTGACCTGATTAAAAACAAACACATGAATGCAGA CACCGATTACTCCATCGCAGAAGCTGCCTTTAATAAAGGCGAAACAGCGATGACCATCAACGGCCCGTGGGCATGGTCCAACATCGACACCAGCAAA GTGAATTATGGTGTAACGGTACTGCCGACCTTCAAGGGTCAACCATCCAAACCGTTCGTTGGCGTGCTGAGCGCAGGTATTAACGCCGCCAGTCCGA ACAAAGAGCTGGCGAAAGAGTTCCTCGAAAACTATCTGCTGACTGATGAAGGTCTGGAAGCGGTTAATAAAGACAAACCGCTGGGTGCCGTAGCGCT GAAGTCTTACGAGGAAGAGTTGGCGAAAGATCCACGTATTGCCGCCACCATGGAAAACGCCCAGAAAGGTGAAATCATGCCGAACATCCCGCAGATG 
TCCGCTTTCTGGTATGCCGTGCGTACTGCGGTGATCAACGCCGCCAGCGGTCGTCAGACTGTCGATGAAGCCCTGAAAGACGCGCAGACTCGTATCA CCAAGAGCGGTCACCATCACCATCACCATTAAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGCTGCCACCGCTGAGCAATAACT AGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTG

>pETM-MBP170-Dest, region - T7 promoter to T7 terminator, 1926 bp TAATACGACTCACTATAGGGACACCACAACGGTTTCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACCATGTCCAAAATCGAAGAA GGTAAACTGGTAATCTGGATTAACGGCGATAAAGGCTATAACGGACTCGCTGAAGTCGGTAAGAAATTCGAGAAAGATACCGGAATTAAAGTCACCG TTGAGCATCCGGATAAACTGGAAGAGAAATTCCCACAGGTTGCGGCAACTGGCGATGGCCCTGACATTATCTTCTGGGCACACGACCGCTTTGGTGG CTACGCTCAATCTGGCCTGTTGGCTGAAATCACCCCGGACAAAGCGTTCCAGGACAAGCTGTATCCGTTTACCTGGGATGCCGTACGTTACAACGGC AAGCTGATTGCTTACCCGATCGCTGTTGAAGCGTTATCGCTGATTTATAACAAAGATCTGCTGCCGAACCCGCCAAAAACCTGGGAAGAGATCCCGG CGCTGGATAAAGAACTGAAAGCGAAAGGTAAGAGCGCGCTGATGTTCAACCTGCAAGAACCGTACTTCACCTGGCCGCTGATTGCTGCTGACGGGGG TTATGCGTTCAGAGACCATTAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACT CATTAGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCTATGACC ATGATTACGCCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCGGGTACCGAGCTCGAATTCACTGGCCGTCGTTTTACAACGTCGTGACT GGGAAAACCCTGGCGTTACCCAACTTAATCGCCTTGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCGATCGCCCTTC CCAACAGTTGCGCAGCCTGAATGGCGAATGGCGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGCATATGGTGCACTCTC AGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGCCCCGACACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCG CTTACAGACAAGCTGGTCTCTAAGTATGAAAACGGCAAGTACGACATTAAAGACGTGGGCGTGGATAACGCTGGCGCGAAAGCGGGTCTGACCTTCC TGGTTGACCTGATTAAAAACAAACACATGAATGCAGACACCGATTACTCCATCGCAGAAGCTGCCTTTAATAAAGGCGAAACAGCGATGACCATCAA CGGCCCGTGGGCATGGTCCAACATCGACACCAGCAAAGTGAATTATGGTGTAACGGTACTGCCGACCTTCAAGGGTCAACCATCCAAACCGTTCGTT GGCGTGCTGAGCGCAGGTATTAACGCCGCCAGTCCGAACAAAGAGCTGGCGAAAGAGTTCCTCGAAAACTATCTGCTGACTGATGAAGGTCTGGAAG CGGTTAATAAAGACAAACCGCTGGGTGCCGTAGCGCTGAAGTCTTACGAGGAAGAGTTGGCGAAAGATCCACGTATTGCCGCCACCATGGAAAACGC CCAGAAAGGTGAAATCATGCCGAACATCCCGCAGATGTCCGCTTTCTGGTATGCCGTGCGTACTGCGGTGATCAACGCCGCCAGCGGTCGTCAGACT GTCGATGAAGCCCTGAAAGACGCGCAGACTCGTATCACCAAGAGCGGTCACCATCACCATCACCATTAAGATCCGGCTGCTAACAAAGCCCGAAAGG 
AAGCTGAGTTGGCTGCTGCCACCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTG

>pETM-CHis-MBP170-cpEGFP-Mal2B2, region - T7 promoter to T7 terminator, 2077 bp TAATACGACTCACTATAGGGACACCACAACGGTTTCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACCATGTCCAAAATCGAAGAA GGTAAACTGGTAATCTGGATTAACGGCGATAAAGGCTATAACGGACTCGCTGAAGTCGGTAAGAAATTCGAGAAAGATACCGGAATTAAAGTCACCG TTGAGCATCCGGATAAACTGGAAGAGAAATTCCCACAGGTTGCGGCAACTGGCGATGGCCCTGACATTATCTTCTGGGCACACGACCGCTTTGGTGG CTACGCTCAATCTGGCCTGTTGGCTGAAATCACCCCGGACAAAGCGTTCCAGGACAAGCTGTATCCGTTTACCTGGGATGCCGTACGTTACAACGGC AAGCTGATTGCTTACCCGATCGCTGTTGAAGCGTTATCGCTGATTTATAACAAAGATCTGCTGCCGAACCCGCCAAAAACCTGGGAAGAGATCCCGG CGCTGGATAAAGAACTGAAAGCGAAAGGTAAGAGCGCGCTGATGTTCAACCTGCAAGAACCGTACTTCACCTGGCCGCTGATTGCTGCTGACGGGGG TTATGCGTTCGCTACTTATAACGTCTTTATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCGAACTTCAAGATCCGCCACAACATCGAGGACGGC GGCGTGCAGCTCGCCTATCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCGTGCAGTCCAAAC TGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGGG CGGTACCGGAGGGAGCATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAG TTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCA CCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGG CTACATTCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTATAAGACACGCGCTGAGGTTAAGTTCGAGGGCGACACTCTGGTTAACCGCATC GAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCATAAGCTTGAATATAACTTCAACAGTCGTAAGTATGAAAACGGCAAGTACG ACATTAAAGACGTGGGCGTGGATAACGCTGGCGCGAAAGCGGGTCTGACCTTCCTGGTTGACCTGATTAAAAACAAACACATGAATGCAGACACCGA TTACTCCATCGCAGAAGCTGCCTTTAATAAAGGCGAAACAGCGATGACCATCAACGGCCCGTGGGCATGGTCCAACATCGACACCAGCAAAGTGAAT TATGGTGTAACGGTACTGCCGACCTTCAAGGGTCAACCATCCAAACCGTTCGTTGGCGTGCTGAGCGCAGGTATTAACGCCGCCAGTCCGAACAAAG AGCTGGCGAAAGAGTTCCTCGAAAACTATCTGCTGACTGATGAAGGTCTGGAAGCGGTTAATAAAGACAAACCGCTGGGTGCCGTAGCGCTGAAGTC TTACGAGGAAGAGTTGGCGAAAGATCCACGTATTGCCGCCACCATGGAAAACGCCCAGAAAGGTGAAATCATGCCGAACATCCCGCAGATGTCCGCT 
TTCTGGTATGCCGTGCGTACTGCGGTGATCAACGCCGCCAGCGGTCGTCAGACTGTCGATGAAGCCCTGAAAGACGCGCAGACTCGTATCACCAAGA GCGGTCACCATCACCATCACCATTAAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGCTGCCACCGCTGAGCAATAACTAGCATA ACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTG

>pETM-CHis-MBP170-cpEGFP-Mal3F1, region - T7 promoter to T7 terminator, 2086 bp TAATACGACTCACTATAGGGACACCACAACGGTTTCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACCATGTCCAAAATCGAAGAA GGTAAACTGGTAATCTGGATTAACGGCGATAAAGGCTATAACGGACTCGCTGAAGTCGGTAAGAAATTCGAGAAAGATACCGGAATTAAAGTCACCG TTGAGCATCCGGATAAACTGGAAGAGAAATTCCCACAGGTTGCGGCAACTGGCGATGGCCCTGACATTATCTTCTGGGCACACGACCGCTTTGGTGG CTACGCTCAATCTGGCCTGTTGGCTGAAATCACCCCGGACAAAGCGTTCCAGGACAAGCTGTATCCGTTTACCTGGGATGCCGTACGTTACAACGGC AAGCTGATTGCTTACCCGATCGCTGTTGAAGCGTTATCGCTGATTTATAACAAAGATCTGCTGCCGAACCCGCCAAAAACCTGGGAAGAGATCCCGG CGCTGGATAAAGAACTGAAAGCGAAAGGTAAGAGCGCGCTGATGTTCAACCTGCAAGAACCGTACTTCACCTGGCCGCTGATTGCTGCTGACGGGGG TTATGCGTTCGGTCGTGGTCGTTATAACGTCTTTATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCGAACTTCAAGATCCGCCACAACATCGAG GACGGCGGCGTGCAGCTCGCCTATCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCGTGCAGT CCAAACTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTA CAAGGGCGGTACCGGAGGGAGCATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGC CACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCT GGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCC CGAAGGCTACATTCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTATAAGACACGCGCTGAGGTTAAGTTCGAGGGCGACACTCTGGTTAAC CGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCATAAGCTTGAATATAACTTCAACCGTACTCCTAAGTATGAAAACG GCAAGTACGACATTAAAGACGTGGGCGTGGATAACGCTGGCGCGAAAGCGGGTCTGACCTTCCTGGTTGACCTGATTAAAAACAAACACATGAATGC AGACACCGATTACTCCATCGCAGAAGCTGCCTTTAATAAAGGCGAAACAGCGATGACCATCAACGGCCCGTGGGCATGGTCCAACATCGACACCAGC AAAGTGAATTATGGTGTAACGGTACTGCCGACCTTCAAGGGTCAACCATCCAAACCGTTCGTTGGCGTGCTGAGCGCAGGTATTAACGCCGCCAGTC CGAACAAAGAGCTGGCGAAAGAGTTCCTCGAAAACTATCTGCTGACTGATGAAGGTCTGGAAGCGGTTAATAAAGACAAACCGCTGGGTGCCGTAGC GCTGAAGTCTTACGAGGAAGAGTTGGCGAAAGATCCACGTATTGCCGCCACCATGGAAAACGCCCAGAAAGGTGAAATCATGCCGAACATCCCGCAG 
ATGTCCGCTTTCTGGTATGCCGTGCGTACTGCGGTGATCAACGCCGCCAGCGGTCGTCAGACTGTCGATGAAGCCCTGAAAGACGCGCAGACTCGTA TCACCAAGAGCGGTCACCATCACCATCACCATTAAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGCTGCCACCGCTGAGCAATA ACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTG

Sequences from Chapter 4

> pdCas9-bacteria, addgene plasmid #44249, 6705 bp GACGTCTTAAGACCCACTTTCACATTTAAGTTGTTTTTCTAATCCGCATATGATCAATTCAAGGCCGAATAAGAAGGCTGGCTCTGCACCTTGGTGA TCAAATAATTCGATAGCTTGTCGTAATAATGGCGGCATACTATCAGTAGTAGGTGTTTCCCTTTCTTCTTTAGCGACTTGATGCTCTTGATCTTCCA ATACGCAACCTAAAGTAAAATGCCCCACAGCGCTGAGTGCATATAATGCATTCTCTAGTGAAAAACCTTGTTGGCATAAAAAGGCTAATTGATTTTC GAGAGTTTCATACTGTTTTTCTGTAGGCCGTGTACCTAAATGTACTTTTGCTCCATCGCGATGACTTAGTAAAGCACATCTAAAACTTTTAGCGTTA TTACGTAAAAAATCTTGCCAGCTTTCCCCTTCTAAAGGGCAAAAGTGAGTATGGTGCCTATCTAACATCTCAATGGCTAAGGCGTCGAGCAAAGCCC GCTTATTTTTTACATGCCAATACAATGTAGGCTGCTCTACACCTAGCTTCTGGGCGAGTTTACGGGTTGTTAAACCTTCGATTCCGACCTCATTAAG CAGCTCTAATGCGCTGTTAATCACTTTACTTTTATCTAATCTAGACATCATTAATTCCTAATTTTTGTTGACACTCTATCGTTGATAGAGTTATTTT ACCACTCCCTATCAGTGATAGAGAAAAGAATTCAAAAGATCTAAAGAGGAGAAAGGATCTATGGATAAGAAATACTCAATAGGCTTAGCTATCGGCA CAAATAGCGTCGGATGGGCGGTGATCACTGATGAATATAAGGTTCCGTCTAAAAAGTTCAAGGTTCTGGGAAATACAGACCGCCACAGTATCAAAAA AAATCTTATAGGGGCTCTTTTATTTGACAGTGGAGAGACAGCGGAAGCGACTCGTCTCAAACGGACAGCTCGTAGAAGGTATACACGTCGGAAGAAT CGTATTTGTTATCTACAGGAGATTTTTTCAAATGAGATGGCGAAAGTAGATGATAGTTTCTTTCATCGACTTGAAGAGTCTTTTTTGGTGGAAGAAG ACAAGAAGCATGAACGTCATCCTATTTTTGGAAATATAGTAGATGAAGTTGCTTATCATGAGAAATATCCAACTATCTATCATCTGCGAAAAAAATT GGTAGATTCTACTGATAAAGCGGATTTGCGCTTAATCTATTTGGCCTTAGCGCATATGATTAAGTTTCGTGGTCATTTTTTGATTGAGGGAGATTTA AATCCTGATAATAGTGATGTGGACAAACTATTTATCCAGTTGGTACAAACCTACAATCAATTATTTGAAGAAAACCCTATTAACGCAAGTGGAGTAG ATGCTAAAGCGATTCTTTCTGCACGATTGAGTAAATCAAGACGATTAGAAAATCTCATTGCTCAGCTCCCCGGTGAGAAGAAAAATGGCTTATTTGG GAATCTCATTGCTTTGTCATTGGGTTTGACCCCTAATTTTAAATCAAATTTTGATTTGGCAGAAGATGCTAAATTACAGCTTTCAAAAGATACTTAC GATGATGATTTAGATAATTTATTGGCGCAAATTGGAGATCAATATGCTGATTTGTTTTTGGCAGCTAAGAATTTATCAGATGCTATTTTACTTTCAG ATATCCTAAGAGTAAATACTGAAATAACTAAGGCTCCCCTATCAGCTTCAATGATTAAACGCTACGATGAACATCATCAAGACTTGACTCTTTTAAA AGCTTTAGTTCGACAACAACTTCCAGAAAAGTATAAAGAAATCTTTTTTGATCAATCAAAAAACGGATATGCAGGTTATATTGATGGGGGAGCTAGC 
CAAGAAGAATTTTATAAATTTATCAAACCAATTTTAGAAAAAATGGATGGTACTGAGGAATTATTGGTGAAACTAAATCGTGAAGATTTGCTGCGCA AGCAACGGACCTTTGACAACGGCTCTATTCCCCATCAAATTCACTTGGGTGAGCTGCATGCTATTTTGAGAAGACAAGAAGACTTTTATCCATTTTT AAAAGACAATCGTGAGAAGATTGAAAAAATCTTGACTTTTCGAATTCCTTATTATGTTGGTCCATTGGCGCGTGGCAATAGTCGTTTTGCATGGATG ACTCGGAAGTCTGAAGAAACAATTACCCCATGGAATTTTGAAGAAGTTGTCGATAAAGGTGCTTCAGCTCAATCATTTATTGAACGCATGACAAACT TTGATAAAAATCTTCCAAATGAAAAAGTACTACCAAAACATAGTTTGCTTTATGAGTATTTTACGGTTTATAACGAATTGACAAAGGTCAAATATGT TACTGAAGGAATGCGAAAACCAGCATTTCTTTCAGGTGAACAGAAGAAAGCCATTGTTGATTTACTCTTCAAAACAAATCGAAAAGTAACCGTTAAG CAATTAAAAGAAGATTATTTCAAAAAAATAGAATGTTTTGATAGTGTTGAAATTTCAGGAGTTGAAGATAGATTTAATGCTTCATTAGGTACCTACC ATGATTTGCTAAAAATTATTAAAGATAAAGATTTTTTGGATAATGAAGAAAATGAAGATATCTTAGAGGATATTGTTTTAACATTGACCTTATTTGA AGATAGGGAGATGATTGAGGAAAGACTTAAAACATATGCTCACCTCTTTGATGATAAGGTGATGAAACAGCTTAAACGTCGCCGTTATACTGGTTGG GGACGTTTGTCTCGAAAATTGATTAATGGTATTAGGGATAAGCAATCTGGCAAAACAATATTAGATTTTTTGAAATCAGATGGTTTTGCCAATCGCA ATTTTATGCAGCTGATCCATGATGATAGTTTGACATTTAAAGAAGACATTCAAAAAGCACAAGTGTCTGGACAAGGCGATAGTTTACATGAACATAT TGCAAATTTAGCTGGTAGCCCTGCTATTAAAAAAGGTATTTTACAGACTGTAAAAGTTGTTGATGAATTGGTCAAAGTAATGGGGCGGCATAAGCCA GAAAATATCGTTATTGAAATGGCACGTGAAAATCAGACAACTCAAAAGGGCCAGAAAAATTCGCGAGAGCGTATGAAACGAATCGAAGAAGGTATCA AAGAATTAGGAAGTCAGATTCTTAAAGAGCATCCTGTTGAAAATACTCAATTGCAAAATGAAAAGCTCTATCTCTATTATCTCCAAAATGGAAGAGA CATGTATGTGGACCAAGAATTAGATATTAATCGTTTAAGTGATTATGATGTCGATGCCATTGTTCCACAAAGTTTCCTTAAAGACGATTCAATAGAC AATAAGGTCTTAACGCGTTCTGATAAAAATCGTGGTAAATCGGATAACGTTCCAAGTGAAGAAGTAGTCAAAAAGATGAAAAACTATTGGAGACAAC TTCTAAACGCCAAGTTAATCACTCAACGTAAGTTTGATAATTTAACGAAAGCTGAACGTGGAGGTTTGAGTGAACTTGATAAAGCTGGTTTTATCAA ACGCCAATTGGTTGAAACTCGCCAAATCACTAAGCATGTGGCACAAATTTTGGATAGTCGCATGAATACTAAATACGATGAAAATGATAAACTTATT 120 CGAGAGGTTAAAGTGATTACCTTAAAATCTAAATTAGTTTCTGACTTCCGAAAAGATTTCCAATTCTATAAAGTACGTGAGATTAACAATTACCATC ATGCCCATGATGCGTATCTAAATGCCGTCGTTGGAACTGCTTTGATTAAGAAATATCCAAAACTTGAATCGGAGTTTGTCTATGGTGATTATAAAGT 
TTATGATGTTCGTAAAATGATTGCTAAGTCTGAGCAAGAAATAGGCAAAGCAACCGCAAAATATTTCTTTTACTCTAATATCATGAACTTCTTCAAA ACAGAAATTACACTTGCAAATGGAGAGATTCGCAAACGCCCTCTAATCGAAACTAATGGGGAAACTGGAGAAATTGTCTGGGATAAAGGGCGAGATT TTGCCACAGTGCGCAAAGTATTGTCCATGCCCCAAGTCAATATTGTCAAGAAAACAGAAGTACAGACAGGCGGATTCTCCAAGGAGTCAATTTTACC AAAAAGAAATTCGGACAAGCTTATTGCTCGTAAAAAAGACTGGGATCCAAAAAAATATGGTGGTTTTGATAGTCCAACGGTAGCTTATTCAGTCCTA GTGGTTGCTAAGGTGGAAAAAGGGAAATCGAAGAAGTTAAAATCCGTTAAAGAGTTACTAGGGATCACAATTATGGAAAGAAGTTCCTTTGAAAAAA ATCCGATTGACTTTTTAGAAGCTAAAGGATATAAGGAAGTTAAAAAAGACTTAATCATTAAACTACCTAAATATAGTCTTTTTGAGTTAGAAAACGG TCGTAAACGGATGCTGGCTAGTGCCGGAGAATTACAAAAAGGAAATGAGCTGGCTCTGCCAAGCAAATATGTGAATTTTTTATATTTAGCTAGTCAT TATGAAAAGTTGAAGGGTAGTCCAGAAGATAACGAACAAAAACAATTGTTTGTGGAGCAGCATAAGCATTATTTAGATGAGATTATTGAGCAAATCA GTGAATTTTCTAAGCGTGTTATTTTAGCAGATGCCAATTTAGATAAAGTTCTTAGTGCATATAACAAACATAGAGACAAACCAATACGTGAACAAGC AGAAAATATTATTCATTTATTTACGTTGACGAATCTTGGAGCTCCCGCTGCTTTTAAATATTTTGATACAACAATTGATCGTAAACGATATACGTCT ACAAAAGAAGTTTTAGATGCCACTCTTATCCATCAATCCATCACTGGTCTTTATGAAACACGCATTGATTTGAGTCAGCTAGGAGGTGACTAACTCG AGTAAGGATCTCCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAG AGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATACCTAGGGATATATTCCGCTTCCTCGCTCACTGACTCGCTACGCTCGGTCGTTCG ACTGCGGCGAGCGGAAATGGCTTACGAACGGGGCGGAGATTTCCTGGAAGATGCCAGGAAGATACTTAACAGGGAAGTGAGAGGGCCGCGGCAAAGC CGTTTTTCCATAGGCTCCGCCCCCCTGACAAGCATCACGAAATCTGACGCTCAAATCAGTGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGC GTTTCCCCCTGGCGGCTCCCTCGTGCGCTCTCCTGTTCCTGCCTTTCGGTTTACCGGTGTCATTCCGCTGTTATGGCCGCGTTTGTCTCATTCCACG CCTGACACTCAGTTCCGGGTAGGCAGTTCGCTCCAAGCTGGACTGTATGCACGAACCCCCCGTTCAGTCCGACCGCTGCGCCTTATCCGGTAACTAT CGTCTTGAGTCCAACCCGGAAAGACATGCAAAAGCACCACTGGCAGCAGCCACTGGTAATTGATTTAGAGGAGTTAGTCTTGAAGTCATGCGCCGGT TAAGGCTAAACTGAAAGGACAAGTTTTGGTGACTGCGCTCCTCCAAGCCAGTTACCTCGGTTCAAAGAGTTGGTAGCTCAGAGAACCTTCGAAAAAC CGCCCTGCAAGGCGGTTTTTTCGTTTTCAGAGCAAGAGATTACGCGCAGACCAAAACGATCTCAAGAAGATCATCTTATTAATCAGATAAAATATTT 
CTAGATTTCAGTGCAATTTATCTCTTCAAATGTAGCACCTGAAGTCAGCCCCATACGATATAAGTTGTTACTAGTGCTTGGATTCTCACCAATAAAA AACGCCCGGCGGCAACCGAGCGTTCTGAACAAATCCAGATGGAGTTCTGAGGTCATTACTGGATCTATCAACAGGAGTCCAAGCGAGCTCGATATCA AATTACGCCCCGCCCTGCCACTCATCGCAGTACTGTTGTAATTCATTAAGCATTCTGCCGACATGGAAGCCATCACAAACGGCATGATGAACCTGAA TCGCCAGCGGCATCAGCACCTTGTCGCCTTGCGTATAATATTTGCCCATGGTGAAAACGGGGGCGAAGAAGTTGTCCATATTGGCCACGTTTAAATC AAAACTGGTGAAACTCACCCAGGGATTGGCTGAGACGAAAAACATATTCTCAATAAACCCTTTAGGGAAATAGGCCAGGTTTTCACCGTAACACGCC ACATCTTGCGAATATATGTGTAGAAACTGCCGGAAATCGTCGTGGTATTCACTCCAGAGCGATGAAAACGTTTCAGTTTGCTCATGGAAAACGGTGT AACAAGGGTGAACACTATCCCATATCACCAGCTCACCGTCTTTCATTGCCATACGAAATTCCGGATGAGCATTCATCAGGCGGGCAAGAATGTGAAT AAAGGCCGGATAAAACTTGTGCTTATTTTTCTTTACGGTCTTTAAAAAGGCCGTAATATCCAGCTGAACGGTCTGGTTATAGGTACATTGAGCAACT GACTGAAATGCCTCAAAATGTTCTTTACGATGCCATTGGGATATATCAACGGTGGTATATCCAGTGATTTTTTTCTCCATTTTAGCTTCCTTAGCTC CTGAAAATCTCGATAACTCAAAAAATACGCCCGGTAGTGATCTTATTTCATTATGGTGAAAGTTGGAACCTCTTACGTGCCGATCAACGTCTCATTT TCGCCAGATATC

>pITdCas9, 3451 bp GACGTCTTAAGACCCACTTTCACATTTAAGTTGTTTTTCTAATCCGCATATGATCAATTCAAGGCCGAATAAGAAGGCTGGCTCTGCACCTTGGTGA TCAAATAATTCGATAGCTTGTCGTAATAATGGCGGCATACTATCAGTAGTAGGTGTTTCCCTTTCTTCTTTAGCGACTTGATGCTCTTGATCTTCCA ATACGCAACCTAAAGTAAAATGCCCCACAGCGCTGAGTGCATATAATGCATTCTCTAGTGAAAAACCTTGTTGGCATAAAAAGGCTAATTGATTTTC GAGAGTTTCATACTGTTTTTCTGTAGGCCGTGTACCTAAATGTACTTTTGCTCCATCGCGATGACTTAGTAAAGCACATCTAAAACTTTTAGCGTTA 121 TTACGTAAAAAATCTTGCCAGCTTTCCCCTTCTAAAGGGCAAAAGTGAGTATGGTGCCTATCTAACATCTCAATGGCTAAGGCGTCGAGCAAAGCCC GCTTATTTTTTACATGCCAATACAATGTAGGCTGCTCTACACCTAGCTTCTGGGCGAGTTTACGGGTTGTTAAACCTTCGATTCCGACCTCATTAAG CAGCTCTAATGCGCTGTTAATCACTTTACTTTTATCTAATCTAGACATCATTAATTCCTAATTTTTGTTGACACTCTATCGTTGATAGAGTTATTTT ACCACTCCCTATCAGTGATAGAGAAAAGAATTCAAAAGATCTAGGATCCGTTTTCGCATTTATCGTGAAACGCTTTCGCATTTTGAGTGAATTATAT GAAGCTTATTGCTCGTAAAAAAGACTGGGATCCAAAAAAATATGGTGGTTTTGATAGTCCAACGGTAGCTTATTCAGTCCTAGTGGTTGCTAAGGTG GAAAAAGGGAAATCGAAGAAGTTAAAATCCGTTAAAGAGTTACTAGGGATCACAATTATGGAAAGAAGTTCCTTTGAAAAAAATCCGATTGACTTTT TAGAAGCTAAAGGATATAAGGAAGTTAAAAAAGACTTAATCATTAAACTACCTAAATATAGTCTTTTTGAGTTAGAAAACGGTCGTAAACGGATGCT GGCTAGTGCCGGAGAATTACAAAAAGGAAATGAGCTGGCTCTGCCAAGCAAATATGTGAATTTTTTATATTTAGCTAGTCATTATGAAAAGTTGAAG GGTAGTCCAGAAGATAACGAACAAAAACAATTGTTTGTGGAGCAGCATAAGCATTATTTAGATGAGATTATTGAGCAAATCAGTGAATTTTCTAAGC GTGTTATTTTAGCAGATGCCAATTTAGATAAAGTTCTTAGTGCATATAACAAACATAGAGACAAACCAATACGTGAACAAGCAGAAAATATTATTCA TTTATTTACGTTGACGAATCTTGGAGCTCCCGCTGCTTTTAAATATTTTGATACAACAATTGATCGTAAACGATATACGTCTACAAAAGAAGTTTTA GATGCCACTCTTATCCATCAATCCATCACTGGTCTTTATGAAACACGCATTGATTTGAGTCAGCTAGGAGGTGACAAGCTTGATTGATTGAACGAAA AACGCGAAAGCGTTTCACGATAAATGCGAAAACGGATCCTCTCGAGTAAGGATCTCCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGG CCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATACCTAGGGATA TATTCCGCTTCCTCGCTCACTGACTCGCTACGCTCGGTCGTTCGACTGCGGCGAGCGGAAATGGCTTACGAACGGGGCGGAGATTTCCTGGAAGATG CCAGGAAGATACTTAACAGGGAAGTGAGAGGGCCGCGGCAAAGCCGTTTTTCCATAGGCTCCGCCCCCCTGACAAGCATCACGAAATCTGACGCTCA 
AATCAGTGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGCGGCTCCCTCGTGCGCTCTCCTGTTCCTGCCTTTCGGTTTA CCGGTGTCATTCCGCTGTTATGGCCGCGTTTGTCTCATTCCACGCCTGACACTCAGTTCCGGGTAGGCAGTTCGCTCCAAGCTGGACTGTATGCACG AACCCCCCGTTCAGTCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGAAAGACATGCAAAAGCACCACTGGCAGCAGCCAC TGGTAATTGATTTAGAGGAGTTAGTCTTGAAGTCATGCGCCGGTTAAGGCTAAACTGAAAGGACAAGTTTTGGTGACTGCGCTCCTCCAAGCCAGTT ACCTCGGTTCAAAGAGTTGGTAGCTCAGAGAACCTTCGAAAAACCGCCCTGCAAGGCGGTTTTTTCGTTTTCAGAGCAAGAGATTACGCGCAGACCA AAACGATCTCAAGAAGATCATCTTATTAATCAGATAAAATATTTCTAGATTTCAGTGCAATTTATCTCTTCAAATGTAGCACCTGAAGTCAGCCCCA TACGATATAAGTTGTTACTAGTGCTTGGATTCTCACCAATAAAAAACGCCCGGCGGCAACCGAGCGTTCTGAACAAATCCAGATGGAGTTCTGAGGT CATTACTGGATCTATCAACAGGAGTCCAAGCGAGCTCGATATCAAATTACGCCCCGCCCTGCCACTCATCGCAGTACTGTTGTAATTCATTAAGCAT TCTGCCGACATGGAAGCCATCACAAACGGCATGATGAACCTGAATCGCCAGCGGCATCAGCACCTTGTCGCCTTGCGTATAATATTTGCCCATGGTG AAAACGGGGGCGAAGAAGTTGTCCATATTGGCCACGTTTAAATCAAAACTGGTGAAACTCACCCAGGGATTGGCTGAGACGAAAAACATATTCTCAA TAAACCCTTTAGGGAAATAGGCCAGGTTTTCACCGTAACACGCCACATCTTGCGAATATATGTGTAGAAACTGCCGGAAATCGTCGTGGTATTCACT CCAGAGCGATGAAAACGTTTCAGTTTGCTCATGGAAAACGGTGTAACAAGGGTGAACACTATCCCATATCACCAGCTCACCGTCTTTCATTGCCATA CGAAATTCCGGATGAGCATTCATCAGGCGGGCAAGAATGTGAATAAAGGCCGGATAAAACTTGTGCTTATTTTTCTTTACGGTCTTTAAAAAGGCCG TAATATCCAGCTGAACGGTCTGGTTATAGGTACATTGAGCAACTGACTGAAATGCCTCAAAATGTTCTTTACGATGCCATTGGGATATATCAACGGT GGTATATCCAGTGATTTTTTTCTCCATTTTAGCTTCCTTAGCTCCTGAAAATCTCGATAACTCAAAAAATACGCCCGGTAGTGATCTTATTTCATTA TGGTGAAAGTTGGAACCTCTTACGTGCCGATCAACGTCTCATTTTCGCCAGATATC

>pUC19_noGGsites, 2686 bp TCGCGCGTTTCGGTGATGACGGTGAAAACCTCTGACACATGCAGCTCCCGGTGACGGTCACAGCTTGTCTGTAAGCGGATGCCGGGAGCAGACAAGC CCGTCAGGGCGCGTCAGCGGGTGTTGGCGGGTGTCGGGGCTGGCTTAACTATGCGGCATCAGAGCAGATTGTACTGAGAGTGCACCATATGCGGTGT GAAATACCGCACAGATGCGTAAGGAGAAAATACCGCATCAGGCGCCATTCGCCATTCAGGCTGCGCAACTGTTGGGAAGGGCGATCGGTGCGGGCCT CTTCGCTATTACGCCAGCTGGCGAAAGGGGGATGTGCTGCAAGGCGATTAAGTTGGGTAACGCCAGGGTTTTCCCAGTCACGACGTTGTAAAACGAC GGCCAGTGAATTCGAGCTCGGTACCCGGGGATCCTCTAGAGTCGACCTGCAGGCATGCAAGCTTGGCGTAATCATGGTCATAGCTGTTTCCTGTGTG AAATTGTTATCCGCTCACAATTCCACACAACATACGAGCCGGAAGCATAAAGTGTAAAGCCTGGGGTGCCTAATGAGTGAGCTAACTCACATTAATT GCGTTGCGCTCACTGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCGGCCAACGCGCGGGGAGAGGCGGTTTGCGTATTG 122 GGCGCTCTTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGCGGTAATACGGTTATC CACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCAT AGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTG GAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTC ACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGT AACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTG CTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAG AGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAA GAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCA CCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGC ACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCC AGTGCTGCAATGATACCGCGAGAGCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTG CAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGC 
TACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGC AAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTC TTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTG CCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATC TTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAA CAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCA GGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGAC GTCTAAGAAACCATTATTATCATGACATTAACCTATAAAAATAGGCGTATCACGAGGCCCTTTCATC

>pUCM-dCas9, 6137 GATCTAAAGAGGAGAAAGGATCTATGGATAAGAAATACTCAATAGGCTTAGCTATCGGCACAAATAGCGTCGGATGGGCGGTGATCACTGATGAATA TAAGGTTCCGTCTAAAAAGTTCAAGGTTCTGGGAAATACAGACCGCCACAGTATCAAAAAAAATCTTATAGGGGCTCTTTTATTTGACAGTGGAGAG ACAGCGGAAGCGACTCGTCTCAAACGGACAGCTCGTAGAAGGTATACACGTCGGAAGAATCGTATTTGTTATCTACAGGAGATTTTTTCAAATGAGA TGGCGAAAGTAGATGATAGTTTCTTTCATCGACTTGAAGAGTCTTTTTTGGTGGAAGAAGACAAGAAGCATGAACGTCATCCTATTTTTGGAAATAT AGTAGATGAAGTTGCTTATCATGAGAAATATCCAACTATCTATCATCTGCGAAAAAAATTGGTAGATTCTACTGATAAAGCGGATTTGCGCTTAATC TATTTGGCCTTAGCGCATATGATTAAGTTTCGTGGTCATTTTTTGATTGAGGGAGATTTAAATCCTGATAATAGTGATGTGGACAAACTATTTATCC AGTTGGTACAAACCTACAATCAATTATTTGAAGAAAACCCTATTAACGCAAGTGGAGTAGATGCTAAAGCGATTCTTTCTGCACGATTGAGTAAATC AAGACGATTAGAAAATCTCATTGCTCAGCTCCCCGGTGAGAAGAAAAATGGCTTATTTGGGAATCTCATTGCTTTGTCATTGGGTTTGACCCCTAAT TTTAAATCAAATTTTGATTTGGCAGAAGATGCTAAATTACAGCTTTCAAAAGATACTTACGATGATGATTTAGATAATTTATTGGCGCAAATTGGAG ATCAATATGCTGATTTGTTTTTGGCAGCTAAGAATTTATCAGATGCTATTTTACTTTCAGATATCCTAAGAGTAAATACTGAAATAACTAAGGCTCC CCTATCAGCTTCAATGATTAAACGCTACGATGAACATCATCAAGACTTGACTCTTTTAAAAGCTTTAGTTCGACAACAACTTCCAGAAAAGTATAAA GAAATCTTTTTTGATCAATCAAAAAACGGATATGCAGGTTATATTGATGGGGGAGCTAGCCAAGAAGAATTTTATAAATTTATCAAACCAATTTTAG AAAAAATGGATGGTACTGAGGAATTATTGGTGAAACTAAATCGTGAAGATTTGCTGCGCAAGCAACGGACCTTTGACAACGGCTCTATTCCCCATCA AATTCACTTGGGTGAGCTGCATGCTATTTTGAGAAGACAAGAAGACTTTTATCCATTTTTAAAAGACAATCGTGAGAAGATTGAAAAAATCTTGACT TTTCGAATTCCTTATTATGTTGGTCCATTGGCGCGTGGCAATAGTCGTTTTGCATGGATGACTCGGAAGTCTGAAGAAACAATTACCCCATGGAATT

123 TTGAAGAAGTTGTCGATAAAGGTGCTTCAGCTCAATCATTTATTGAACGCATGACAAACTTTGATAAAAATCTTCCAAATGAAAAAGTACTACCAAA ACATAGTTTGCTTTATGAGTATTTTACGGTTTATAACGAATTGACAAAGGTCAAATATGTTACTGAAGGAATGCGAAAACCAGCATTTCTTTCAGGT GAACAGAAGAAAGCCATTGTTGATTTACTCTTCAAAACAAATCGAAAAGTAACCGTTAAGCAATTAAAAGAAGATTATTTCAAAAAAATAGAATGTT TTGATAGTGTTGAAATTTCAGGAGTTGAAGATAGATTTAATGCTTCATTAGGTACCTACCATGATTTGCTAAAAATTATTAAAGATAAAGATTTTTT GGATAATGAAGAAAATGAAGATATCTTAGAGGATATTGTTTTAACATTGACCTTATTTGAAGATAGGGAGATGATTGAGGAAAGACTTAAAACATAT GCTCACCTCTTTGATGATAAGGTGATGAAACAGCTTAAACGTCGCCGTTATACTGGTTGGGGACGTTTGTCTCGAAAATTGATTAATGGTATTAGGG ATAAGCAATCTGGCAAAACAATATTAGATTTTTTGAAATCAGATGGTTTTGCCAATCGCAATTTTATGCAGCTGATCCATGATGATAGTTTGACATT TAAAGAAGACATTCAAAAAGCACAAGTGTCTGGACAAGGCGATAGTTTACATGAACATATTGCAAATTTAGCTGGTAGCCCTGCTATTAAAAAAGGT ATTTTACAGACTGTAAAAGTTGTTGATGAATTGGTCAAAGTAATGGGGCGGCATAAGCCAGAAAATATCGTTATTGAAATGGCACGTGAAAATCAGA CAACTCAAAAGGGCCAGAAAAATTCGCGAGAGCGTATGAAACGAATCGAAGAAGGTATCAAAGAATTAGGAAGTCAGATTCTTAAAGAGCATCCTGT TGAAAATACTCAATTGCAAAATGAAAAGCTCTATCTCTATTATCTCCAAAATGGAAGAGACATGTATGTGGACCAAGAATTAGATATTAATCGTTTA AGTGATTATGATGTCGATGCCATTGTTCCACAAAGTTTCCTTAAAGACGATTCAATAGACAATAAGGTCTTAACGCGTTCTGATAAAAATCGTGGTA AATCGGATAACGTTCCAAGTGAAGAAGTAGTCAAAAAGATGAAAAACTATTGGAGACAACTTCTAAACGCCAAGTTAATCACTCAACGTAAGTTTGA TAATTTAACGAAAGCTGAACGTGGAGGTTTGAGTGAACTTGATAAAGCTGGTTTTATCAAACGCCAATTGGTTGAAACTCGCCAAATCACTAAGCAT GTGGCACAAATTTTGGATAGTCGCATGAATACTAAATACGATGAAAATGATAAACTTATTCGAGAGGTTAAAGTGATTACCTTAAAATCTAAATTAG TTTCTGACTTCCGAAAAGATTTCCAATTCTATAAAGTACGTGAGATTAACAATTACCATCATGCCCATGATGCGTATCTAAATGCCGTCGTTGGAAC TGCTTTGATTAAGAAATATCCAAAACTTGAATCGGAGTTTGTCTATGGTGATTATAAAGTTTATGATGTTCGTAAAATGATTGCTAAGTCTGAGCAA GAAATAGGCAAAGCAACCGCAAAATATTTCTTTTACTCTAATATCATGAACTTCTTCAAAACAGAAATTACACTTGCAAATGGAGAGATTCGCAAAC GCCCTCTAATCGAAACTAATGGGGAAACTGGAGAAATTGTCTGGGATAAAGGGCGAGATTTTGCCACAGTGCGCAAAGTATTGTCCATGCCCCAAGT CAATATTGTCAAGAAAACAGAAGTACAGACAGGCGGATTCTCCAAGGAGTCAATTTTACCAAAAAGAAATTCGGACAAGCTTATTGCTCGTAAAAAA 
GACTGGGATCCAAAAAAATATGGTGGTTTTGATAGTCCAACGGTAGCTTATTCAGTCCTAGTGGTTGCTAAGGTGGAAAAAGGGAAATCGAAGAAGT TAAAATCCGTTAAAGAGTTACTAGGGATCACAATTATGGAAAGAAGTTCCTTTGAAAAAAATCCGATTGACTTTTTAGAAGCTAAAGGATATAAGGA AGTTAAAAAAGACTTAATCATTAAACTACCTAAATATAGTCTTTTTGAGTTAGAAAACGGTCGTAAACGGATGCTGGCTAGTGCCGGAGAATTACAA AAAGGAAATGAGCTGGCTCTGCCAAGCAAATATGTGAATTTTTTATATTTAGCTAGTCATTATGAAAAGTTGAAGGGTAGTCCAGAAGATAACGAAC AAAAACAATTGTTTGTGGAGCAGCATAAGCATTATTTAGATGAGATTATTGAGCAAATCAGTGAATTTTCTAAGCGTGTTATTTTAGCAGATGCCAA TTTAGATAAAGTTCTTAGTGCATATAACAAACATAGAGACAAACCAATACGTGAACAAGCAGAAAATATTATTCATTTATTTACGTTGACGAATCTT GGAGCTCCCGCTGCTTTTAAATATTTTGATACAACAATTGATCGTAAACGATATACGTCTACAAAAGAAGTTTTAGATGCCACTCTTATCCATCAAT CCATCACTGGTCTTTATGAAACACGCATTGATTTGAGTCAGCTAGGAGGTGACTAACTCGAGTGCATTAATGAATCGGCCAACGCGCGGGGAGAGGC GGTTTGCGTATTGGGCGCTCTTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGCGG TAATACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCT GGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAG GCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGC TTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTG CGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGG TATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTA CCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAA AAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCA AAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCT TAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTT ACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGAGCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGC 
AGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACG TTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATC CCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCA CTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGAC CGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAA ACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCT GGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATT GAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAA AGTGCCACCTGACGTCTAAGAAACCA

> pKC580, 1975 bp TCGGTCTCCGTAGCGGGTCAGGGAGCGGAGGTAGCCTCCAGCGGCGCCGCGTGACGGTGCGCAAGGCCGACGCCGGCGGGCTGGGCATCAGCATCAA GGGGGGCCGGGAAAACAAGATGCCTATTCTCATTTCCAAAATCTTCAAGGGACTGGCAGCAGACCAGACGGAGGCCCTTTTTGTTGGGGATGCCATC CTGTCTGTGAATGGTGAAGATTTGTCCTCTGCCACCCACGATGAAGCGGTACAGGCCCTCAAGAAGACAGGCAAGGAGGTCGTGCTCGAAGTTAAGT ACATGAAGGAGGACTCACCCTATTTCAAGGGATCCTGAGACCAGACCAATAAAAAACGCCCGGCGGCAACCGAGCGTTCTGAACAAATCCAGATGGA GTTCTGAGGTCATTACTGGATCTATCAACAGGAGTCCAAGCGAGCTCGATATCAAATTACGCCCCGCCCTGCCACTCATCGCAGTACTGTTGTAATT CATTAAGCATTCTGCCGACATGGAAGCCATCACAAACGGCATGATGAACCTGAATCGCCAGCGGCATCAGCACCTTGTCGCCTTGCGTATAATATTT GCCCATGGTGAAAACGGGGGCGAAGAAGTTGTCCATATTGGCCACGTTTAAATCAAAACTGGTGAAACTCACCCAGGGATTGGCTGAAACGAAAAAC ATATTCTCAATAAACCCTTTAGGGAAATAGGCCAGGTTTTCACCGTAACACGCCACATCTTGCGAATATATGTGTAGAAACTGCCGGAAATCGTCGT GGTATTCACTCCAGAGCGATGAAAACGTTTCAGTTTGCTCATGGAAAACGGTGTAACAAGGGTGAACACTATCCCATATCACCAGCTCACCGTCTTT CATTGCCATACGAAATTCCGGATGAGCATTCATCAGGCGGGCAAGAATGTGAATAAAGGCCGGATAAAACTTGTGCTTATTTTTCTTTACGGTCTTT AAAAAGGCCGTAATATCCAGCTGAACGGTCTGGTTATAGGTACATTGAGCAACTGACTGAAATGCCTCAAAATGTTCTTTACGATGCCATTGGGATA TATCAACGGTGGTATATCCAGTGATTTTTTTCTCCATTTTAGCTTCCTTAGCTCCTGAAAATCTCGATAACTCAAAAAATACGCCCGGTAGTGATCT TATTTCATTATGGTGAAAGTTGGAACCTCTTACGTGCCCGATCAATCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCC CGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGT TTGCCGGATCAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTTCTTCTAGTGTAGCCGTAGTTAG GCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTAC CGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCTAC ACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAA CAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTG ATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGTTCTTT 
CCTGCGTTATCCCCTGATTCTGTGGATAACCGTAG
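
Because each listing above states a plasmid length next to its sequence, the two can be cross-checked after copying the text out of this appendix. The following is a minimal sketch of such a check, not a tool used in this work. It assumes that each record starts with ">" at the beginning of a line, that the header reads "name, optional notes, length" with an optional "bp", and that everything after the length is sequence. The names HEADER and parse_records are hypothetical. Any character other than A, C, G, or T (spaces, line breaks, stray page numbers) is discarded before counting.

```python
import re

# Assumed header layout (an assumption, not a defined file format):
#   "> name, [optional notes,] LENGTH [bp] SEQUENCE..."
HEADER = re.compile(
    r"^\s*(?P<name>[^,]+),(?P<notes>.*?)(?P<length>\d+)(?:\s*bp)?\s+(?P<seq>[ACGT].*)$",
    re.DOTALL,
)


def parse_records(text):
    """Return a list of (name, declared_length, cleaned_sequence) tuples."""
    records = []
    # Split on ">" at the start of a line; the ">" itself is consumed.
    for chunk in re.split(r"(?m)^\s*>", text):
        if not chunk.strip():
            continue
        match = HEADER.match(chunk)
        if match is None:
            continue  # chunk does not follow the assumed header layout
        # Keep only nucleotide letters; whitespace and digits are dropped.
        seq = re.sub(r"[^ACGT]", "", match.group("seq"))
        records.append(
            (match.group("name").strip(), int(match.group("length")), seq)
        )
    return records


if __name__ == "__main__":
    # Synthetic example records, not taken from the listings above.
    sample = "> demoA, note #12, 8 bp ACGT 120 ACGT\n\n>demoB, 5 ACGTA\n"
    for name, declared, seq in parse_records(sample):
        status = "ok" if declared == len(seq) else "MISMATCH"
        print(f"{name}: declared {declared}, found {len(seq)} ({status})")
```

A mismatch reported by this check does not by itself indicate an error in the plasmid; it may only mean that text was lost or added when the listing was copied.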
