The Pennsylvania State University

The Graduate School

College of Engineering

ENGINEERING A SYNTHETIC REPLICATION SYSTEM TO SECURELY STORE

RECOMBINANT DNA

A Thesis in

Agricultural and Biological Engineering

by

Long Chen

© 2012 Long Chen

Submitted in Partial Fulfillment of the Requirements for the Degree of

Master of Science

December 2012

The thesis of Long Chen was reviewed and approved* by the following:

Howard Salis
Assistant Professor of Agricultural and Biological Engineering
Thesis Advisor

Jeff Catchmark
Professor of Agricultural and Biological Engineering

Kenneth Keiler
Associate Professor of Biochemistry and Molecular Biology

Virendra Puri
Distinguished Professor of Agricultural and Biological Engineering
Graduate Program Coordinator

*Signatures are on file in the Graduate School


ABSTRACT

The biotech industry spends millions of dollars to engineer genetic systems and manufacture products, yet the DNA can be stolen and easily reverse-engineered for nefarious purposes. We are developing a synthetic DNA replication system to securely store high-value recombinant DNA and to prevent it from being manipulated by unauthorized third parties. Our “LOCK and KEY” system allows a high-value plasmid to replicate only inside an authorized bacterial host, controlling a key route to reverse engineering a genetic system. Importantly, the LOCK and KEY system requires no special action by legitimate researchers, and multiple orthogonal variants are available.


TABLE OF CONTENTS

List of Figures
List of Tables
Acknowledgements

Chapter 1 Introduction

Chapter 2 Methodology
    Strains, media
    Primary construct clone
    LOCK and KEY variants clone
    Variants characterization and transformants antibiotic susceptibility assay

Chapter 3 Results
    Construction of a dual-origin system
    Separation of RNAII primer and replication origin
    Elimination of RNAII self-sufficiency
    Engineering synthetic “LOCK and KEY” v2.0 Variants
    Evaluating “LOCK and KEY” variants
    Translational optimization for LOCK and KEY v2.0 variants

Chapter 4 Discussion

Chapter 5 Conclusion

Appendix
Reference


LIST OF FIGURES

Figure 1-1. Illustration of LOCK and KEY authorized replication system
Figure 1-2. Predicted structure of wild-type RNAII
Figure 1-3. Negative feedback loop formed by RNA I and RNA II
Figure 1-4. Schematic of synthetic ColE1 replication system
Figure 3-1. Schematic map of P0
Figure 3-2. qPCR data of P0 in DH10B and pir-116
Figure 3-3. Schematic map of construct “P1”
Figure 3-4. Transformation results in DH10B and pir-116 of P1 under 50 µg/mL chloramphenicol selection
Figure 3-5. Schematic map of construct “P2”
Figure 3-6. Transformation results in DH10B and pir-116 of P2 under 50 µg/mL chloramphenicol selection
Figure 3-7. Schematic map of “LK1.0” variants
Figure 3-8. Schematic map of construct “C1”
Figure 3-9. Transformation results in DH10B of “C1” and “C2”
Figure 3-10. Schematic map of construct “C2”
Figure 3-11. Schematic of minimizing cis-acting replication of RNAII
Figure 3-12. Antibiotic susceptibility assay outcome of poly “A” insertion
Figure 3-13. Schematic of LOCK and KEY v2.0 variants
Figure 3-14. Alignment between biophysical parameter and length of 37 rescue sequences
Figure 3-15. Two types of constructs for characterizing LKv2.0 mutants
Figure 3-16. Antibiotic susceptibility data of 37 LKv2.0 variants
Figure 3-17. Antibiotic susceptibility data of orthogonal test for 13 outstanding LOCK and KEY variants
Figure 3-18. Comparison between previous CmR vector and optimized CmR vector


LIST OF TABLES

Table 3-1. Mutations of LOCK and KEY variants v1.0 at 3’ of both RNAII and origin of replication
Table 3-2. Rescue sequence profile
Table 3-3. Synthetic RBS designed by RBS calculator


ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. Salis, for guiding me through my project and helping me work through every problem I encountered.

I would like to thank my committee members, Dr. Catchmark and Dr. Keiler. I deeply appreciate their interest in my work and their constructive suggestions.

I would also like to thank all of my labmates; we cooperated so well with each other, and I have truly enjoyed working with them.


Chapter 1

Introduction

In the late 1970s, researchers engineered the first E. coli strain carrying a synthetic gene to produce human insulin (Goeddel et al., 1979). At that time, most people did not believe that microbes could be engineered to work for us. Today, thanks to the rapid rise of metabolic engineering and synthetic biology, we have access to inexpensive antibiotics and abundant vaccines. Biofuels may be the next major benefit of making good use of recombinant DNA.

However, the security of storing and using recombinant DNA is becoming a serious issue. Technologies are needed to secure the entire manufacturing process.

The “Gene Guard” project was created by Dr. Salis in 2011. It has three layers: a “universal self-protection system”, the “LOCK and KEY” authorized replication system, and an “obscurity system”. The goal of developing multiple genetic security systems is to create a comprehensive way to protect recombinant DNA information and prevent unauthorized use.

This thesis focuses on the “LOCK and KEY” authorized replication system. We want to restrict plasmid replication because a plasmid can easily be transformed into a host, where it replicates independently to many copies. If a plasmid carries valuable information, anyone can transform it into another host to maintain it and study it further. In metabolic engineering, recombinant DNA is often unique and of high value, so it is a serious risk if someone steals the plasmid and reverse engineers it.


Figure 1-1. Illustration of LOCK and KEY authorized replication system

The goal of the “LOCK and KEY” system is to engineer an authorized host that carries a valuable plasmid and allows its replication. If this plasmid is transformed into an unauthorized host, it cannot replicate at all. In other words, only authorized hosts can maintain the plasmid DNA. With this method, all legitimate research can be conducted normally using authorized hosts, and there is little cause for concern even if the plasmid is stolen.

Our approach to developing the “LOCK and KEY” system is based on the ColE1 origin of replication in E. coli. As the origin of the most popular vectors in commercial use, ColE1 and its mechanism have been well studied since the 1980s. An RNA is transcribed starting 555 bp upstream of the origin of replication (Itoh and Tomizawa, 1980) from a priming promoter (Panayotatos, 1984). This RNA, termed RNAII (Tomizawa, 1984), pre-primes replication by forming an RNA/DNA hybrid. The hybridized RNA is cleaved by RNase H at the origin of replication and then serves as a primer for DNA synthesis by DNA polymerase I (Dasgupta et al., 1987). The connection between the predicted upstream structures of the RNAII primer and its ability to initiate replication has been studied. As shown in figure 1-2 (Masukata and Tomizawa, 1986), several sections are critical to RNAII activity (Masukata and Tomizawa, 1984). For instance, sections VII and X are crucial structures for the binding of RNAII to the DNA template.

Figure 1-2. Predicted structure of wild-type RNAII

Another RNA, RNAI, is transcribed in the opposite orientation from the region encoding the 5’ end of RNAII (Som and Tomizawa, 1983). Because the two RNAs are complementary, RNAI binds to sections I and II of RNAII (Dooley et al., 1985), forming a “kissing loop interaction” mediated by the specific YUNR motifs in their loops (Franch et al., 1999). RNAI acts as an antisense RNA that inhibits the activity of RNAII, and together they form a negative feedback loop that regulates plasmid copy number (figure 1-3). The binding of RNA I to RNA II is modulated by the Rom protein (Tomizawa and Som, 1984).

Figure 1-3. Negative feedback loop formed by RNA I and RNA II.

Inspired by the natural regulatory mechanism of the ColE1 origin, we set out to engineer a synthetic replication system with two separate parts: a synthetic RNA II primer in the genome of E. coli and a synthetic origin of replication in the plasmid. The synthetic origin cannot transcribe the RNA II primer, so the plasmid loses the ability to replicate independently. As a result, the plasmid is replicated only when it is taken up by an authorized host that transcribes an RNA II to initiate replication.

Figure 1-4. Schematic of synthetic ColE1 replication system


Chapter 2

Methodology

Strains, media

For most cloning work, we used the E. coli pir-116 strain from EPICENTRE Biotechnologies; for variant characterization, we used E. coli DH10B from the Salis lab.

Luria-Bertani (LB) medium, containing 10 g/L tryptone, 5 g/L yeast extract and 10 g/L NaCl, was from BD Biosciences. Chloramphenicol powder was from Alfa Aesar. Super optimal broth (SOC) was from HIMEDIA, glucose was from Sigma, and agar was also from BD Biosciences. The TB buffer used for making chemically competent cells contained 250 mM potassium chloride from VWR, 15 mM calcium chloride from EMD Serono, and 55 mM manganese chloride from J.T. Baker.

Primary construct clone

To clone the dual-origin construct, we first used the chew back, anneal and repair DNA assembly method to merge the R6K origin into a ColE1-type backbone by annealing two overlapping ends (Gibson et al., 2009).

The primers used to amplify these two parts are listed in the Appendix (R6Kamf-for, R6Kamf-rev, FTVamf-for, FTVamf-rev). Fragment amplification reactions were performed following NEB protocol M0530 with Phusion High-Fidelity DNA polymerase. The annealing temperature was 63˚C for the R6K fragment and 58˚C for the backbone. PCR products were purified by agarose gel electrophoresis, and the CBAR (chew back, anneal and repair) reaction was then performed at 50˚C for 1 hour. Before transformation into E. coli pir116 chemically competent cells, the product was digested with DpnI and purified by a PCR clean-up procedure. The correct construct was verified by sequencing at the Penn State Genomics Core Facility.

LOCK and KEY variants clone

After cloning the primary LOCK and KEY construct, we used two unique restriction sites, XhoI and PstI, to insert wild-type RNAII. The primers used for amplifying RNAII were wtRP-amf-for and wtRP-amf-rev (sequences in the Appendix). We ligated the two parts following the NEB T4 DNA ligase protocol at room temperature for 8 minutes and then performed a PCR clean-up to purify and concentrate the ligation products. We eluted the ligation products in 4 µL of double-distilled water and transformed the mixture into E. coli pir116 chemically competent cells. The correct construct was verified by sequencing at the Penn State Genomics Core Facility.

We used the same strategy to clone a modified ColE1 origin. We ordered a pair of oligonucleotides from Integrated DNA Technologies (IDT) that anneal to form a truncated ColE1 origin and an upstream constitutive promoter flanked by SphI and XmaI overhangs. We amplified a backbone carrying the relocated RNAII primer with SphI and XmaI restriction sites at its ends and ligated the truncated ColE1 origin and promoter into it. After a PCR clean-up procedure, we transformed the ligation product into E. coli pir116 chemically competent cells. The correct construct was verified by sequencing at the Penn State Genomics Core Facility.

At the 3’ end of RNAII, we used the restriction sites XhoI and SacI to insert the poly “A” sequence, and we varied the rescue sequence between XhoI and AatII. For the compensatory origin, we used the AflII and SpeI restriction sites to insert the compensatory sequence at the 3’ end of the origin, corresponding to the cognate RNAII sequence.

All segments, such as the poly “A”, the rescue sequences and the compensatory origins, were synthesized by Integrated DNA Technologies. Because of the complex secondary structures at the 3’ end of both RNA II and the origin (Masukata and Tomizawa, 1984), digestion and ligation were preferred for this cloning work.
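As a rough illustration of how the cloning sites named above can be located in a construct, the following Python sketch scans a DNA string for enzyme recognition sequences. The helper name find_sites is hypothetical, and the test fragment is copied from the beginning of the variant map in the Appendix. Only the forward strand is searched, which is sufficient here because all of the listed recognition sequences are palindromic.

```python
# Recognition sequences (5'->3') of the restriction enzymes named in this chapter.
RECOGNITION_SITES = {
    "XhoI": "CTCGAG",
    "PstI": "CTGCAG",
    "SacI": "GAGCTC",
    "AatII": "GACGTC",
    "SpeI": "ACTAGT",
    "AflII": "CTTAAG",
    "SphI": "GCATGC",
    "XmaI": "CCCGGG",
}


def find_sites(dna, sites=RECOGNITION_SITES):
    """Return {enzyme: [0-based start positions]} for every site that occurs."""
    dna = dna.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = dna.find(motif)
        while start != -1:
            positions.append(start)
            start = dna.find(motif, start + 1)  # step by one to allow overlaps
        if positions:
            hits[enzyme] = positions
    return hits


# Fragment copied from the start of the Appendix map (illustrative choice).
fragment = "GATATCCTCGAGCCGAATGCTATCGAAGACGTC"
print(find_sites(fragment))
```

In this fragment the sketch reports one XhoI site and one AatII site, which flank the rescue sequence as described in the text.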

However, to optimize the RBS of the CAT gene, we used the chew back, anneal and repair (CBAR) method to insert the new RBS. The primers for CBAR are listed in the Appendix. We designed an RBS library spanning translation initiation rates from 2,000 to 70,000 artificial units with the RBS Library Calculator developed by the Salis lab (Salis et al., 2009).
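The degenerate bases in the CBAR primers determine which RBS sequences the library can contain. As an illustrative sketch (not the RBS Library Calculator itself), the code below expands the IUPAC codes R (A or G) and Y (C or T) in the degenerate stretch of primer CAT-ins-rev from the Appendix. The function name expand_degenerate and the chosen boundaries of the stretch are assumptions made for this example.

```python
from itertools import product

# IUPAC codes needed here; R and Y are the only degenerate symbols
# that appear in the CBAR primers listed in the Appendix.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}


def expand_degenerate(template):
    """List every concrete sequence encoded by a degenerate template."""
    choices = [IUPAC[base] for base in template.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Degenerate stretch of CAT-ins-rev directly upstream of the ATG start codon
# (boundaries chosen for illustration only).
template = "TRTAGAGACGCAAGCAACTAARGARCACAGA"
library = expand_degenerate(template)

print(len(library))  # three R positions give 2**3 = 8 sequences
# The synthetic RBS reported in Table 3-3 is one member of this set.
print("TGTAGAGACGCAAGCAACTAAGGAACACAGA" in library)
```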

Variants characterization and transformants antibiotic susceptibility assay

We developed a reliable experimental procedure to test our synthetic replication system. We first cloned new mutants in the E. coli pir116 strain to obtain the correct construct and then characterized it in DH10B. This was the general workflow of our experiments.

Because the vector used to clone the LOCK and KEY constructs carries a chloramphenicol resistance marker, we used resistance to chloramphenicol as a measure of the replication ability of each mutant. The characterization procedure started with sequence-verified constructs, which we transformed into E. coli DH10B cells following a chemical transformation protocol. To normalize the data, we diluted each plasmid to a concentration below 20 ng/µL, measured with a NanoDrop 2000c (Thermo Scientific), and then added 10 ng of each plasmid to E. coli DH10B chemically competent cells. All tubes were incubated on ice for at least 30 minutes before heat shock. Heat shock was carried out in a 42˚C water bath for 75 seconds, after which the tubes were incubated on ice again for 2 minutes. Next, 400 µL of super optimal broth (SOC) was added to each tube, and the cells were grown at 37˚C with orbital shaking at 200-300 r.p.m. for 1 hour. We then spread 45 µL of the mixture from each tube onto each of a series of Luria-Bertani agar plates with graded chloramphenicol concentrations, so that every plate received cells corresponding to approximately 1 ng of plasmid DNA. All plates were incubated at 37˚C overnight, and the last step was to count and record the colonies on each plate.

When counting, we counted only homogeneous colonies.
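The plating arithmetic in this protocol can be checked with a short calculation. The sketch below assumes 50 µL of competent cells per transformation, a volume the text does not state, and ignores the small volume of plasmid added. Under that assumption, 45 µL is one tenth of the 450 µL recovery culture and corresponds to about 1 ng of the 10 ng of input DNA. All function and variable names are hypothetical.

```python
def plasmid_volume_ul(mass_ng, conc_ng_per_ul):
    """Volume of plasmid stock needed to deliver a given mass of DNA."""
    return mass_ng / conc_ng_per_ul


def dna_per_plate_ng(input_ng, plated_ul, soc_ul, cells_ul):
    """Input DNA represented by the plated fraction of the recovery culture."""
    total_ul = soc_ul + cells_ul
    return input_ng * plated_ul / total_ul


# Values from the protocol, except cells_ul, which is an ASSUMPTION.
stock_conc = 20.0  # ng/uL, upper bound of the diluted plasmid stock
print(plasmid_volume_ul(10.0, stock_conc))  # uL of stock that contains 10 ng
print(dna_per_plate_ng(10.0, 45.0, soc_ul=400.0, cells_ul=50.0))  # ng per plate
```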

Chapter 3

Results

Construction of a dual-origin system

The first objective was to build a plasmid with two replication origins, ColE1 and R6K. We employed the R6K origin to guarantee maintenance of the plasmid even when the ColE1 origin is not functioning. E. coli pir116 expresses the trans-acting pir protein, which is essential for replication from the R6K origin.

Figure 3-1. Schematic map of P0

As described in the Methodology chapter, we used a chew back, anneal and repair reaction to generate this two-origin vector, shown in figure 3-1, and named it P0. To verify that both origins work, we transformed equal amounts of P0 into E. coli DH10B and E. coli pir116 and inoculated single colonies of each strain under the same conditions. After overnight growth, we extracted plasmid from 5 mL of culture of each host and compared the concentrations. The plasmid DNA concentration in E. coli pir116 cells was almost 1.5-fold higher than in DH10B cells, which suggested that only the ColE1 origin replicates DNA in DH10B, while both ColE1 and R6K function well in pir116.

Figure 3-2. qPCR data of P0 in DH10B and Pir-116.

Separation of RNAII primer and replication origin

After constructing the primary dual-origin vector, the first question was whether the RNAII primer could be separated from the ColE1 origin.

We first transplanted wild-type RNAII with its internal promoter into the P0 vector; the schematic map of this new construct, “P1”, is shown in figure 3-3 below.

Figure 3-3. Schematic map of construct “P1”


Because “P1” carries two wild-type RNAII primers, we used it as a preliminary test for separating RNAII from the ColE1 origin of replication: we transformed equal amounts into E. coli pir-116 and DH10B and obtained many colonies in both hosts (figure 3-4).

Figure 3-4. Transformation results in DH10B and pir-116 of P1 under 50 µg/mL chloramphenicol selection

After this test, we truncated the native ColE1 origin. We deleted the internal promoter that transcribes RNAII and all of the sequence from -35 to -555. A short segment upstream of the origin was kept because it contains a “CCCCCC” box that is critical for RNAII binding (Masukata and Tomizawa, 1990). In addition, we inserted a constitutive promoter directly upstream of this “truncated origin” to ensure that a transcription bubble is available for RNAII to enter. Figure 3-5 shows the schematic map of this construct, in which the RNAII primer and the ColE1 origin are physically separated.


Figure 3-5. Schematic map of construct “P2”

We transformed “P2” into both DH10B and pir116 and obtained large numbers of colonies in both strains (figure 3-6). This indicates that RNAII and the origin of replication can still support replication when they are located at separate positions in a plasmid.

Figure 3-6. Transformation results in DH10B and pir-116 of P2 under 50 µg/mL chloramphenicol selection.

Following these experiments, we created three pairs of LOCK and KEY variants (figure 3-7) by engineering a synthetic RNAII and a synthetic ColE1 origin, and tested their performance in DH10B. Table 3-1 lists the mutations of LK1.1, LK1.2 and LK1.3; all are located 10 bp upstream of the origin. The three mutants behaved the same way in DH10B. When we mutated only RNAII, the system could not replicate, possibly because a mismatched RNAII and origin cannot form an RNA/DNA hybrid strong and stable enough to initiate replication. When we mutated both RNAII and the origin of replication, the system could replicate in DH10B, which suggested that a complementary origin and RNAII primer bind with enough affinity for the RNA/DNA hybrid to start replication. However, when we mutated only the origin of replication, the system could, contrary to our expectation, still maintain replication in DH10B.

Figure 3-7. Schematic map of “LK1.0” variants

Table 3-1. Mutations of LOCK and KEY variants v1.0 at 3’ of both RNAII and origin of replication

Mutant    Original sequence    Mutated sequence
LK 1.1    GCCTATGGAAAAACC      TAACGCTTAAGAAC
LK 1.2    GCCTATGGAAAAACC      GCCTACTTAAGAAC
LK 1.3    GCCTATGGAAAAACC      TTAAGGAATTCAAC

This unexpected outcome suggested that mutating the origin alone may not be sufficient to stop replication; in other words, some other mechanism sustains replication. To test this hypothesis, we constructed two controls: a vector lacking the RNAII primer (“C1” in figure 3-8) and a vector lacking the origin of replication (“C2” in figure 3-10).

Figure 3-8. Schematic map of construct “C1”

We characterized “C1” and “C2” in DH10B (figure 3-9). Without RNA II, the system lost its ability to initiate replication, which demonstrated that the RNA II primer is essential for replication. Without the origin of replication, however, “C2” could still replicate in DH10B, which indicated that the RNAII primer alone was sufficient to maintain replication. This may explain why mutating only the origin could not stop replication: RNAII was self-sufficient to initiate it.

Figure 3-9. Transformation results in DH10B of “C1” and “C2”


Figure 3-10. Schematic map of construct “C2”

Elimination of RNAII self-sufficiency

The RNAII primer can activate replication using its own 3’ end as the origin of replication. To minimize this self-sufficiency, we created a new version of synthetic RNA II with minimal RNA/DNA hybridization. Figure 3-11 shows our approach: we removed the original sequence at the 3’ end of RNAII and replaced it with poly “A”. The goal was to weaken RNA/DNA hybridization at the 3’ end. We inserted up to 42 A and characterized the resulting mutants by antibiotic susceptibility assay.

Figure 3-11. Schematic of minimizing cis-acting replication of RNAII

As the data in figure 3-12 indicate, the number of transformants decreased markedly as we increased the number of “A” residues inserted at the 3’ end. We used empty DH10B cells as the negative control. The 42 “A” replacement behaved most like the negative control, which suggested that 42 “A” largely eliminates the cis-acting replication driven by RNAII itself.

Figure 3-12. Antibiotic susceptibility assay outcome of poly “A” insertion

Engineering synthetic “LOCK and KEY” v2.0 Variants

We then constructed a new synthetic RNAII primer with 42 A replacing the original sequence at its 3’ end and named this construct “P42A”. This RNAII cannot replicate the plasmid by itself. However, it raises another issue: its 3’ end hybridizes too weakly with the DNA template, even when the origin of replication is added back to the system.


Figure 3-13. Schematic of LOCK and KEY v2.0 variants

To solve this issue, we needed to maximize trans-acting replication between the synthetic RNAII and the origin of replication. Starting from “P42A”, we added a short sequence directly after the 42 “A” to enhance RNA/DNA hybridization. We named this short sequence the “rescue sequence” because it rescues replication. Correspondingly, to achieve strong RNA/DNA hybridization, we needed to engineer a compensatory origin of replication for each synthetic RNAII; the most direct way is to merge the rescue sequence into the origin. We named the new LOCK and KEY variants “LKv2.0”. So far, we have designed 37 rescue sequences, as shown in table 3-2.

Table 3-2. Rescue sequence profile

No.    Sequence                   Length (nt)
#1     CCGAATGCT                  9
#2     TGAACCTGA                  9
#3     ATGCGGTTT                  9
#4     TCATAGCCG                  9
#5     TTCTAAAAG                  9
#6     CTGTTAGAA                  9
#7     AGCTTTGCT                  9
#8     GAATTTAAA                  9
#9     CTTAATTAC                  9
#10    AGACGGCAT                  9
#11    AGCCCTGCT                  9
#12    AGGGCGGAG                  9
#13    GAGGGCCGC                  9
#14    GGGGCGGGT                  9
#15    CCTATCGGAC                 10
#16    CCTACCGGAC                 10
#17    GCCCATTACGG                11
#18    GCCCCGGATTA                11
#19    GAGTCGGTGAGA               12
#20    TTCCATAGGCTT               12
#21    GGGGAAAAGGGG               12
#22    AGTAAAAGGCTT               12
#23    AGCCCCGCTGCT               12
#24    GGAAATTTGAAT               12
#25    CCATGGGCCCCG               12
#26    GCCGGGCCCTCG               12
#27    CCGAATGCTTTC               12
#28    CTACGAATGCTAT              13
#29    CTACGGATGCTAT              13
#30    CCTAATGCTATCGA             14
#31    CCTTATCGAAATGC             14
#32    GGGGCCGGGGGCCCC            15
#33    CCGAATGCTATCGAA            15
#34    GGGGCCCCGGGGCCCC           16
#35    CCGAATGCTATCGAATTC         18
#36    CCGAATGCTATCGAATTCCTG      21
#37    TTCCATAGGCTCCGGGGGGCCCC    23
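Because an RNA primer pairs with its template in antiparallel orientation, the DNA stretch that base-pairs with a given rescue sequence is its reverse complement. The following sketch computes it for two entries of Table 3-2. It illustrates the base-pairing rule only and is not a description of how the compensatory origins were laid out in the actual constructs; the function name is hypothetical.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


# Rescue sequences #1 and #33 from Table 3-2, written as DNA.
for name, seq in [("#1", "CCGAATGCT"), ("#33", "CCGAATGCTATCGAA")]:
    print(name, seq, reverse_complement(seq))
```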

Two major constraints guided the design of the rescue sequences. One is length: as shown in the table above, we varied the length from 9 nt to 23 nt. The other is hybridization strength: we used biophysical tools to predict the Gibbs free energy of binding between each rescue RNA sequence and its complementary DNA sequence (Hofacker, 2003). Figure 3-14 plots ∆GHybridization against length for all 37 rescue sequences.


Figure 3-14. Alignment between biophysical parameter and length of 37 rescue sequences

The plot above contains a densely populated region. This is because we happened to find a rescue sequence with better performance in that region and then designed more sequences with similar length or thermodynamic properties. That is why many of the sequences have a length of 9 nt or 12 nt and a ∆GHybridization between -15 kcal/mol and -20 kcal/mol.
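The two design axes can be tabulated with a few lines of code. The sketch below computes length and GC fraction for a handful of rescue sequences from Table 3-2. GC fraction is used here only as a crude stand-in for hybridization strength; it is an assumption of this example and is not the ∆GHybridization predicted with the biophysical tools cited above. The function name is hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# A few rescue sequences copied from Table 3-2.
rescue = {
    "#5": "TTCTAAAAG",
    "#14": "GGGGCGGGT",
    "#20": "TTCCATAGGCTT",
    "#34": "GGGGCCCCGGGGCCCC",
}

for name, seq in rescue.items():
    print(f"{name:>4}  length={len(seq):2d} nt  GC={gc_fraction(seq):.2f}")
```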

Evaluating “LOCK and KEY” variants

Another important series of experiments was the characterization of the 37 LKv2.0 mutants. For these experiments, we cloned two versions of each variant: “LOCK ONLY” and “LOCK & KEY”. The schematic maps are shown in figure 3-15. We hypothesized that the best LOCK and KEY set should replicate DNA only when both are present.

Figure 3-15. Two types of constructs for characterizing LKv2.0 mutants

Each round of the antibiotic susceptibility assay reveals the difference between “LOCK ONLY” and “LOCK & KEY”. Figure 3-16 shows the antibiotic susceptibility data for all 37 variants. Some variants retained weak replication with the LOCK only; however, most variants replicated consistently better when both KEY and LOCK were present. After analyzing the data, we selected 13 variants with better performance for further study.


Figure 3-16. Antibiotic susceptibility data of 37 LKv2.0 variants

We subjected these 13 candidates to orthogonality tests, whose goal is to assess the specificity of each LOCK and KEY combination. We characterized a random combination of “KEY” and “LOCK” and compared its antibiotic susceptibility data with those of the specific combination. As figure 3-17 shows, most of the 13 variants yield a clearly higher number of transformants when “LOCK” and “KEY” are specifically paired.
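One simple way to summarize such a test is the ratio of transformants obtained with the specific pair to those obtained with a mismatched pair. The sketch below shows the calculation. The colony counts are made-up placeholders for illustration only, not data from figure 3-17, and the function name and pseudocount are assumptions of this example.

```python
def specificity_ratio(cognate_colonies, noncognate_colonies, pseudocount=1):
    """Ratio of cognate to non-cognate transformant counts.

    A pseudocount is added to both counts so that plates with zero
    colonies do not cause a division by zero.
    """
    return (cognate_colonies + pseudocount) / (noncognate_colonies + pseudocount)


# Placeholder counts (NOT measured values): variant -> (cognate, non-cognate)
example_counts = {"variant A": (199, 9), "variant B": (49, 49)}

for variant, (cognate, noncognate) in example_counts.items():
    print(variant, specificity_ratio(cognate, noncognate))
```

A ratio well above 1 would indicate a pair that replicates preferentially with its own partner, while a ratio near 1 would indicate little specificity.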

Figure 3-17. Antibiotic susceptibility data of orthogonal test for 13 outstanding LOCK and KEY variants

Translational optimization for LOCK and KEY v2.0 variants

Although the 13 variants selected from the 37 perform well, they can only be sustained at chloramphenicol concentrations below 10 µg/mL, which made it difficult to inoculate colonies and isolate plasmids. We therefore sought a way to improve their antibiotic resistance. Besides continuing to optimize the “LOCK and KEY” system, we used the RBS Calculator (Salis et al., 2009) to design a synthetic RBS for chloramphenicol acetyltransferase in order to increase its translation initiation rate. Table 3-3 compares the original RBS with the synthetic one.

Table 3-3. Synthetic RBS designed by RBS calculator

RBS of chloramphenicol acetyltransferase    Sequence                              TIR (au)
Original                                    GTTATCGAGATTTTCAGGAGCTAAGGAAGCTAAA    2357.59
Synthetic                                   TGTAGAGACGCAAGCAACTAAGGAACACAGA       23403.09

The synthetic RBS has a predicted translation initiation rate of 23,403 artificial units, about 10-fold higher than that of the original RBS. We cloned this new RBS into a “LOCK and KEY” variant, characterized the antibiotic susceptibility of the new vector, and compared it with the previous results. The new vector could be maintained beyond 10 µg/mL chloramphenicol selection, as shown in figure 3-18, which suggested that the optimized antibiotic marker was expressed at a higher level.
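The fold change follows directly from the two predicted rates in table 3-3, as the short check below shows (variable names are illustrative).

```python
# Predicted translation initiation rates from table 3-3 (artificial units).
tir_original = 2357.59
tir_synthetic = 23403.09

# Ratio of the synthetic RBS rate to the original RBS rate.
fold_change = tir_synthetic / tir_original
print(f"{fold_change:.2f}-fold")  # slightly below 10-fold
```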


Figure 3-18. Comparison between previous CmR vector and optimized CmR vector

Based on these data, the increased translation initiation rate of chloramphenicol acetyltransferase appears to help the cell accumulate more of the protein and thereby improves its antibiotic resistance. With this method, we can optimize the other 12 good variants and obtain a translationally optimized version of LOCK and KEY v2.0. The biggest benefit is that the cells can be inoculated more easily because of their higher antibiotic resistance. This may also give us some clues about the relationship between copy number and antibiotic resistance.

Chapter 4

Discussion

Genetic security is an important issue in metabolic engineering because microorganisms engineered to produce commercial chemicals are typically new and unique. The genetic information contained in such a microorganism needs to be protected from theft and any other kind of illegal use. DNA is very easy to steal and amplify, and a single plasmid can be sufficient to maintain the recombinant DNA. A company may spend three to five years and millions of dollars to engineer a working microorganism, yet lose its competitive advantage if just one microliter of plasmid is stolen.

The LOCK and KEY replication system blocks replication after unauthorized transformation by means of a synthetic ColE1 origin of replication. Although the machinery of the ColE1 origin is well known, most studies have focused on the kinetics of RNA interactions (Keasling and Palsson, 1989), the repression by RNAI (Chiang and Bremer, 1991), or the modulation of copy number (Camps, 2010); to our knowledge, no one has tried to manipulate the origin and put it to use in this way.

So far, we have successfully engineered multiple pairwise LOCK and KEY sets and tested their performance in a single plasmid by antibiotic susceptibility assay. Our next step is to engineer several “KEY” sequences into the E. coli chromosome to create the authorized hosts.

The LOCK and KEY system has several features that are essential for industrial application. First, the “LOCK and KEY” strategy is based on DNA/RNA hybridization and needs no chemical inducer, which makes the transformation process as simple as a regular one. Second, the number of potential variants is huge given our design strategies. Third, the orthogonality tests show that our variants are specific and durable, so that only a cognate “KEY” and “LOCK” trigger replication.

Beyond its practical significance, this work gave us a deeper understanding of the replication machinery of the ColE1 origin. We not only separated RNA II from the ColE1 origin but also truncated the origin to a short version that still functions as the origin of replication, and we showed that a physically separated RNAII primer can still initiate replication.

Another important finding is the self-sufficiency of RNA II. RNA II can hybridize with its own DNA sequence and use it as the origin of replication; the key is therefore to eliminate the interaction at its 3’ end while ensuring that it still binds sufficiently to the real origin of replication. Furthermore, we obtained many mutants of the RNA II primer and created a compensatory origin for each engineered RNA II, and they function well. This shows that engineering synthetic replication elements is feasible. Our next goal is to create optimal sets and stabilize the copy number in our authorized hosts.


Chapter 5

Conclusion

The LOCK and KEY authorized replication system has the potential to be a significant innovation for the field of metabolic engineering. Because biosecurity has become a leading topic, we believe it is important to prepare in advance. We are developing a robust genetic method, requiring no chemical inducer, that permits plasmid replication only in authorized hosts. It is safe and durable, and it prevents unauthorized replication of high-value recombinant DNA.

So far, we have successfully separated RNAII from the origin of replication and shown that they function when separated; we have designed and characterized multiple LOCK and KEY variants and selected the best pairs for further research and application; and we are now engineering the authorized host as we work toward our eventual goal.

The “LOCK and KEY” authorized replication system will allow companies to have their own lock and key sets to protect DNA information. We believe this protective replication system will be useful and widely adopted in the field of metabolic engineering. As with the development of the USB flash drive, we believe that one day every plasmid will be carried only by an authorized host.


Appendix

Primers for PCR amplification and CBAR

Name              Description                     Sequence
R6Kamf-for        Amplify R6K origin              ttcatatactaactttttattcatcgtacttttaattaatttaCCTATACACTATCTAgcttgggcccg
R6Kamf-rev        Amplify R6K origin              ataaaatttaaatagaatagaataataaaataataaaataaaaaaacgagttcttctgaattggagaccgc
FTVamf-for        Amplify ColE1 vector            ttttttattttattattttattattctattctatttaaattttatATTACGCCCCGCCCTGCC
FTVamf-rev        Amplify ColE1 vector            taaattaattaaaagtacgatgaataaaaagttagtatatgaaCGCTTGGACTCCTGTTGATAGATCC
wtRP-amf-for      Amplify wt RNA II               TATA GAA TTC GTT TTT CCA TAG GCT CCG C
wtRP-amf-rev      Amplify wt RNA II               TAT ACT GCA GGA CCA AAA TCC CTT AAC GTG AG
mtRP-amf-for      Adding extra restriction sites  GTCGTTCGCTCGAGCTCAAGCTACTAGTGGGCTGTGTGCACGAACCCCCCGTT
mtRP-amf-rev      Adding extra restriction sites  GTGCACACAGCCCACTAGTAGCTTGAGCTCGAGCGAACGACCTACACCGAACTGAGATACCTAC
CAT-ins-for       RBS library insertion           ATATACTAGTCCGAATGCTATCGAAGACGTC
CAT-ins-rev       RBS library insertion           ATTTTTTGAGTRTAGAGACGCAAGCAACTAARGARCACAGAATGGAGAAAAAAATCACTGGATATACCACCG
CAT-backbone-for  RBS library insertion           TCTGTGYTCYTTAGTTGCTTGCGTCTCTAYACTCAAAAAATACGCCCGGTAGTGATCTTATTTCATTATGGTG
CAT-backbone-rev  RBS library insertion           GCCGTTTGTGATGGCTTCCATGTCGGCA

Map of Typical LOCK and KEY Variant (Important restriction sites marked in red)

TACGTGCCGATCAACGTCTCATTTTCGCCAGATATCCTCGAGCCGAATGCTATCGAAGACGTCAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAGCTCTCGACGCTCAAGTCAGAGGTGGCGAAA CCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCC TGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGT AGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCG ACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGC AGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGT GGCCTAACTACGGCTACACTAGAAGGACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGG AAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAG CAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTC AGTGGAACGAACTGCAGaaaaGCGGCCGCGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGG CCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCCTGAGTAGGACAAATCCGCCGCCCTAGAGCATG CGCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGCGGTAATACGGTTATCCACAGAATCAGGG GATAACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTT GCTGGCGTTCTTAAGATATGTATTCTGCTAGTAGAATTCATGGAACTAGTAGCTAGCACTGTACCTAGGA CTGAGCTAGCCGTCAACCATATGTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCAC


GTTAAGGGATTTTGGTCATGCCCGGGTGCTTGGATTCTCACCAATAAAAAACGCCCGGCGGCAACCGAG CGTTCTGAACAAATCCAGATGGAGTTCTGAGGTCATTACTGGATCTATCAACAGGAGTCCAAGCGttcatat actaactttttattcatcgtacttttaattaatttaCCTATACACTATCTAgcttgggcccgaacaaaaactcatctcagaagaggatct gaatagcgccgtcgaccatcatcatcatcatcattgagtttaaacggtctccagcttggctgttttggcggatgagagaagattttcagcct gatacagattaaatcagaacgcagaagcggtctgataaaacagaatttgcctggcggcagtagcgcggtggtcccacctgaccccatgcc gaactcagaagtgaaacgccgtagcgccgatggtagtgtggggtctccccatgcgagagtagggaactgccaggcatcaaataaaacga aaggctcagtcgaaagactgggcctttcgttttatctgttgtttgtcggtgaactaattctgattcgcacgggcccatggctaattcccatgtc agccgttaagtgttcctgtgtcactcaaaattgctttgagaggctctaagggcttctcagtgcgttacatccctggcttgttgtccacaaccgt taaaccttaaaagctttaaaagccttatatattcttttttttcttataaaacttaaaaccttagaggctatttaagttgctgatttatattaattt tattgttcaaacatgagagcttagtacgtgaaacatgagagcttagtacgttagccatgagagcttagtacgttagccatgagggtttagtt cgttaaacatgagagcttagtacgttaaacatgagagcttagtacgtgaaacatgagagcttagtacgtactatcaacaggttgaactgct gatcttcagatcctctacgccggacgcatcgtggccggatcttgcgcaagcggtctccaattcagaagaactcgtttttttattttattatttt attattctattctatttaaattttatATTACGCCCCGCCCTGCCACTCATCGCAGTACTGTTGTAATTCATTAAGCATT CTGCCGACATGGAAGCCATCACAAACGGCATGATGAACCTGAATCGCCAGCGGCATCAGCACCTTGTCG CCTTGCGTATAATATTTGCCCATGGTGAAAACGGGGGCGAAGAAGTTGTCCATATTGGCCACGTTTAAAT CAAAACTGGTGAAACTCACCCAGGGATTGGCTGAGACGAAAAACATATTCTCAATAAACCCTTTAGGGA AATAGGCCAGGTTTTCACCGTAACACGCCACATCTTGCGAATATATGTGTAGAAACTGCCGGAAATCGTC GTGGTATTCACTCCAGAGCGATGAAAACGTTTCAGTTTGCTCATGGAAAACGGTGTAACAAGGGTGAAC ACTATCCCATATCACCAGCTCACCGTCTTTCATTGCCATACGAAATTCCGGATGAGCATTCATCAGGCGGG CAAGAATGTGAATAAAGGCCGGATAAAACTTGTGCTTATTTTTCTTTACGGTCTTTAAAAAGGCCGTAATA TCCAGCTGAACGGTCTGGTTATAGGTACATTGAGCAACTGACTGAAATGCCTCAAAATGTTCTTTACGAT GCCATTGGGATATATCAACGGTGGTATATCCAGTGATTTTTTTCTCCATTCTGTGTTCCTTAGTTGCTTGCG TCTCTATACTCAAAAAATACGCCCGGTAGTGATCTTATTTCATTATGGTGAAAGTTGGAACCTCT

29 Reference

Goeddel, D., D. G. Kleid, F. Bolivar, H. L. Heyneker, D. G. Yansura, R.Crea, T. Hirose, A. Kraszewski, K. Itakura, and A. D. Riggs. 1979. Expression in Escherichia coli of chemically synthesized genes for human insulin. Proc.Natl. Acad. Sci. USA 76 (1): 106-110.

Itoh, T., J. Tomizawa. 1980. Formation of an RNA primer for initiation of replication of ColE1 DNA by . Proc.Natl. Acad. Sci. USA 77 (5): 2450-2454.

Panayotatos, N. 1984. DNA replication regulated by the priming promoter. Nucleic Acids Research 12 (6): 2641-2648.

Dasgupta, S., H. Masukata, J. Tomizawa. 1987. Mutiple Mechanisms for initiation of ColE1 DNA replication: DNA synthesis in the presence and absence of ribonucleases H. Cell 51: 1113- 1122.

Masukata, H., J. Tomizawa. 1984. Effects of point mutations on formation and structure of the RNA primer for ColE1 DNA replication. Cell 36: 513-522.

Masukata, H., J. Tomizawa. 1986. Control of Primer formation for ColE1 plasmid replication: conformational change of the primer transcript. Cell 44: 125-136.

Masukata, H., J. Tomizawa. 1990. A mechanism of formation of a persistent hybrid between elongating RNA and template DNA. Cell 62: 331-338.

Tomizawa, J. 1984. Control of ColE1 Plasmid replication: the process of binding of RNA I to the primer transcript. Cell 38: 861-870.

Dooley, T. P., J. Tamm, and B. Polisky. 1985. Isolation and characterization of mutants affecting functional domains of ColE1 RNA I. J. Mol. Biol. 186: 87-96.

Som, T., J. Tomizawa. 1983. Regulatory regions of ColE1 that are involved in determination of . Proc.Natl. Acad. Sci. USA 80: 32232-3236.

30

Tomizawa, J., and T. Som. 1984. Control of ColE 1 Plasmid Replication: Enhancement of Binding of RNA I to the Primer Transcript by the Rom Protein. Cell 38: 871-878.

Tomizawa, J. 1990. Control of ColE1 Plasmidreplication: Interaction of rom protein with an unstable complex formed by RNA I and RNA II. J. theor. Biol 212(4):695-708.

Chiang, C. S., H. Bremer. 1991. Maintenance of pBAR322-Derived plasmids without functional RNA I. Plasmid 26: 186-200.

Keasling, J. D., and B. O. Palsson.1989. ColEI plasmid replication: a simple kinetic description from a structured model. J. theor. Biol 141: 447-461.

Franch, T., M. Petersen, E. Gerhart, H. Wagner, J. P. Jacobsen and K. Gerdes. 1999. Antisense RNA regulation in prokaryotes: rapid RNA/RNA interaction facilitated by a general U-turn loop structure. J. Mol. Biol. 294: 1115-1125.

Camps, M. 2010. Modulation of ColE1-like plasmid replication for recombinant gene expression. Recent Pat DNA Gene Seq 4(1): 58-73.

Gibson, D., L.Young, R. Chuang, J. C. Venter, C. A. Hutchison III, H. O. Smith. 2009. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nature Methods 6, 343 - 345.

Salis, H.M., E. A. Mirsky, C. A. Voigt. 2009. Automated design of synthetic ribosome binding sites to control protein expression. Nature Biotechnology 27, 946 – 950.

Hofacker, I. L. 2003. Vienna RNA secondary structure server. Nucleic Acids Research. 31(13): 3429-3431.