Split Inteins As Versatile Tools in Applications of Downstream Purification and Bioconjugation

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

in the Graduate School of The Ohio State University

By

Yamin Fan

Graduate Program in Chemical Engineering

The Ohio State University

2019

Dissertation Committee

David W. Wood, Advisor

Shang-Tian Yang

Andre Palmer

Copyrighted by

Yamin Fan

2019

Abstract

Over the past two decades, inteins have been extensively used in a wide variety of applications in . Split inteins are a more recently identified subset of inteins that are naturally expressed as two separate segments. They catalyze the splicing reaction in trans upon association of the two halves. Owing to their unique features, split inteins offer improved controllability and flexibility in trans-splicing and trans-cleaving over earlier tools based on contiguous inteins. Engineered split inteins allow the development of efficient self-cleaving affinity tags for purification applications and new methods for conjugation.

In this work, an engineered split intein derived from Nostoc punctiforme (Npu) was applied in a column-free purification strategy, in combination with the aggregating tag elastin-like polypeptide (ELP), as an initial capture step for recombinant proteins expressed in E. coli. In addition, an on-column purification strategy using the same engineered split intein was employed for the production of a value-added biosimilar target, granulocyte-colony stimulating factor (G-CSF). To adapt the split intein-based purification platform for the production of protein therapeutics expressed in mammalian cells, multiple leader sequences were designed and screened for optimal expression and secretion of intein-tagged precursor proteins. Moreover, the extein dependency of this engineered Npu split intein was thoroughly characterized using an in-solution cleaving kinetics study and a Förster resonance energy transfer (FRET)-based high-throughput method. The information gathered guides the design of fast and consistent cleavage reactions among various target proteins and provides insight into the cleavage mechanism.

In this work, the trans-splicing properties of split inteins were also exploited to develop novel methods for bioconjugation onto fluorescent nanodiamonds. Two split inteins, Gos-TerL and GP41.1, were used to develop N-terminal and C-terminal oriented bioconjugation schemes, respectively. The new methods allow rapid and spontaneous immobilization of proteins onto fluorescent nanodiamond surfaces for applications such as biomedical imaging or drug delivery.


Dedication

This document is dedicated to my beloved family.


Acknowledgments

This work would not have been possible without the support of many people.

I would like to thank my family, especially my parents, whose love and support are with me in whatever I decide to do. I am grateful that they have taken good care of themselves while I have been away from home pursuing my degree and my dream. I would also like to extend my deepest gratitude to my beloved husband, Weijie Mai. He is my role model and the person who keeps me motivated and curious in research and in life. His enormous love and constant support enabled me to get through all the difficulties of the past five years of my Ph.D.

I would like to express my deepest appreciation to my advisor, Professor David W. Wood, for his continuous support and guidance. He has always trusted and encouraged me in my research. He also provided me with valuable opportunities to attend conferences and connected me with people from industry and academia. Without his support, I would not have been able to take summer internships and eventually pursue the career in the pharmaceutical industry that I have looked forward to for many years.

Besides my advisor, I would like to thank the rest of my thesis committee members, Professor Shang-Tian Yang and Professor Andre Palmer, who also served on the committees for my qualifying and candidacy exams. I would also like to thank Professor Jeffrey Chalmers for serving on my candidacy committee. Their insightful comments and questions have motivated me to improve my research from various perspectives.

I thank our collaborators at Columbus Nanoworks Inc., Dr. Arfaan Rampersaud, Issac Rampersaud, and David Albertson, for working with us on the project in Chapter 5.

My sincere thanks also go to my previous and current labmates. In particular, I would like to thank Jackelyn Miozzi, Joseph Taris, and Brian Marshall for stimulating discussions throughout the projects and for their constant help. The experience would not have been as fun without their company in the lab and at our weekly lunches at Chick-Fil-A.

I am also extremely grateful to Dr. Tzu-Chiang Han, Dr. Ashwin Lahiry, Dr. Merideth Cooper, and Dr. Samuel Stimple for mentoring me and training me in the skills and techniques I needed during my Ph.D. They are not only great mentors but also important friends in my life. I would also like to thank the rest of the current lab members, Kevin McGarry, Hongyu Yuan, Tarek Mazeed, Maria DeBastiani, Natalya Lavrenchuk, Maria Zindarsic, Issac Delev, Christy Caporale, Farah Deeba, Lex Tallan, and Joel Silleck, for all of their help and company.

Finally, I would like to thank my friends in Columbus for the fun and support. They are more than friends to me and have kept me company like family. I would also like to thank my friends elsewhere in the world, especially Shannon Zhang. You keep my life meaningful and colorful.


Vita

June 2014 ...... B.S. Bioengineering, Zhejiang University, China

September 2014 to present ...... Graduate Research Associate, Department of Chemical and Biomolecular Engineering, The Ohio State University, U.S.

Publications

1. Fan Y., Miozzi M. M., Stimple S. D., Han T. C. and Wood D. W. (2018), Column-free purification methods for recombinant proteins using self-cleaving aggregating tags. Journal of Polymers. DOI: 10.3390/polym10050468

2. Lahiry A., Fan Y., Stimple S. D., Raith M. and Wood D. W. (2017), Inteins as tools for tagless and traceless protein purification. J. Chem. Technol. Biotechnol. DOI: 10.1002/jctb.5415

3. Yang D. T., Lu X., Fan Y., et al. (2014), Evaluation of nanoparticle tracking for characterization of fibrillar protein aggregates. AIChE Journal. DOI: 10.1002/aic.1434

Fields of Study

Major Field: Chemical Engineering

Table of Contents

Abstract

Dedication

Acknowledgments

Vita

List of Tables

List of Figures

Chapter 1 Introduction

1.1 Inteins

1.2 Inteins as self-cleaving tags for downstream purification of recombinant proteins

1.3 Inteins as self-splicing tools for protein labeling onto nanoparticles

1.4 Dissertation

Chapter 2 Development of purification strategies using self-cleaving tag

2.1 Introduction

2.2 Materials and Methods

2.2.1 Chemicals and Reagents

2.2.2 Plasmids construction

2.2.3 Protein expression

2.2.4 Lysis and recovery

2.2.5 ELP-mediated protein purification

2.2.6 NpuN* resin production

2.2.7 On-column purification using intein resin on AKTA Pure 25

2.2.8 Intein cleavage analysis

2.2.9 Protein quantification

2.2.10 Activity assays

2.3 Results

2.3.1 Column-free purification for recombinant proteins using ELP-fused self-cleaving tag

2.3.2 On-column purification for proteins with disulfide bonds using the engineered self-cleaving tag

2.4 Discussions

Chapter 3 Expression and purification of protein therapeutics from mammalian cell expression with self-cleaving tag

3.1 Introduction

3.2 Materials and Methods

3.2.1 Plasmid construction

3.2.2 Recombinant protein expression in mammalian cell system

3.2.3 Western blot analysis

3.2.4 Recombinant protein purification using intein column

3.3.5 N-Glycosylation determination

3.3 Results

3.3.1 Expression vector construction

3.3.2 Leader sequence screening in 12-well plates

3.3.3 Scale-up expression and purification

3.4 Discussions

Chapter 4 Characterization of Extein Dependency for the engineered Npu Split Intein

4.1 Introduction

4.2 Materials and Methods

4.2.1 Plasmid construction

4.2.2 Shake flask expression

4.2.3 24-well plate expression

4.2.4 In solution cleaving kinetics study

4.2.5 In vitro FRET assay

4.3 Results

4.3.1 N-extein characterization by in solution cleaving kinetics

4.3.2 N-extein characterization using high-throughput method based on Förster resonance energy transfer (FRET)

4.4 Discussions

Chapter 5 Site-specific and oriented protein labeling on nanodiamonds using self-splicing split inteins

5.1 Introduction

5.2 Materials and Methods

5.2.1

5.2.2 Plasmids construction

5.2.3 Protein expression

5.2.4 IMAC purification and dialysis

5.2.5 In solution splicing with purified precursors

5.2.6 Design of experiment (DOE) studies for effect screening

5.2.7 Intein peptide coupling with nanodiamonds

5.2.8 Intein-mediated onto fluorescent nanodiamonds

5.3 Results

5.3.1 In solution protein trans-splicing studies using purified precursors

5.3.2 Protein trans-splicing onto nanoparticles using purified precursors and clarified lysate

5.4 Discussions

Chapter 6 Conclusions and Future Work

6.1 Conclusions

6.2 Future work

6.2.1 Development of the Npu split intein-mediated purification system for commercialization

6.2.2 Development of the split intein-mediated bioconjugation strategies for commercialization

Bibliography

Appendix A Primer list

List of Tables

Table 1 Selected split inteins that are used for various applications

Table 2 Currently approved biosimilar in the United States by FDA (data from FDA website)

Table 3 Summary of purification results by using ELP self-cleaving tag

Table 4 Summary of the leader sequences used in this work

Table 5 Expected half-life of the cleavage reactions based on on-column cleavage studies

Table 6 Signal to noise ratio and Z’ value of the FRET assay

Table 7 Factors and levels in the design of experiments for splicing effect screening

Table 8 JMP statistical analysis of DOE effect screening study for Gos-TerL

Table 9 JMP statistical analysis of DOE effect screening study for GP41.1

Table 10 Primers used for the plasmid construction in Chapter 2

Table 11 Primers used for the plasmid construction in Chapter 3

Table 12 Primers used for the plasmid construction in Chapter 4

Table 13 Primers used for the plasmid construction in Chapter 5

List of Figures

Figure 1.1 The split intein-mediated protein trans-splicing mechanism.

Figure 2.1 The overall structure of the Npu split intein complex shown in a ribbon representation.

Figure 2.2 Cleavage kinetics of different mutants of NpuN and NpuC with streptokinase (SK) as the target protein.

Figure 2.3 Schematic of column-free purification method using ELP-tagged split intein.

Figure 2.4 Schematic of on-column purification method using the Npu split intein self-cleaving tag.

Figure 2.5 The cleavage kinetics of the ELP-tagged split intein system for super-folder green fluorescent protein (sfGFP) at pH 8.5 and pH 6.2 with different temperature conditions respectively.

Figure 2.6 Purification results using ELP self-cleaving tag for different target proteins by Coomassie staining: (a) super-folder green fluorescent protein (sfGFP); (b) β-lactamase (β-lac); (c) Streptokinase (SK); (d) Maltose-binding protein (MBP).

Figure 2.7 Chromatograms for monitoring the purification process on the intein column using AKTA pure 25.

Figure 2.8 High sensitivity silver staining SDS-PAGE gel for G-CSF purification using self-cleaving purification system.

Figure 2.9 SEC chromatograms overlay of purified sample and formulation blank.

Figure 2.10 In vitro study of G-CSF protein activity using the NFS-60 cell proliferation assay with comparison to standard purchased from PeproTech.

Figure 3.1 Distribution of expression systems for recombinant proteins and antibodies currently approved in the US or EU.

Figure 3.2 Mammalian cell secretion pathway.

Figure 3.3 Insertion of IRES-GFP fragment enables qualitative and quantitative determination of transfection efficiency.

Figure 3.4 Test of baseline expression of the current leader sequence from IgG kappa light chain and the effect of inserting IRES-GFP.

Figure 3.5 (a) Overlay of phase contrast channel and GFP channel for 12 constructs with different leader sequences and different lengths of spacing after 72 hours expression in 12-well plate; (b) GFP assay results for three replicates of 12-well expression.

Figure 3.6 Western blot results and JMP analysis result of twelve constructs expression level.

Figure 3.7 Western blot of samples from HEK293F cells expression media for different intein-tagged target proteins.

Figure 3.8 Silver staining gel and western blot results for 20 mL expression of NpuC*-SEAP construct with SA leader sequence in HEK293EBNA cells and purification using split-intein mediated system.

Figure 3.9 Western blot result for 20 mL expression and purification of NpuC*-IFN with SA leader sequence

Figure 4.1 The role of +1 extein residue of the engineered Npu split intein system. (A) The N-intein (NpuN*) is covalently immobilized on an agarose resin (Thermo Scientific SulfoLink® coupling resin) and the target protein (eGFP) is fused to the C-terminus of the C-intein (NpuC*). (B) Graph shows t1/2 (h) calculated for C-terminal cleavage of eGFP with different +1 residues.

Figure 4.2 FRET-based screening strategy.

Figure 4.3 The effect of key residues in the sensitivity enhancing domain. (a) SDS-PAGE results of cleavage kinetics of GDGHG-NpuN or GDGAG-NpuN or GAGHG-NpuN with NpuC*-SK at pH 8.5 and pH 6.2 over the time period of 20 hours. (b) Calculated half-time of the reactions by assuming first-order reaction in all cases.

Figure 4.4 Rate constant k of the cleavage reactions under both pH 6.2 and 8.5 with different -1 extein residues.

Figure 4.5 (a) Overlay plot of cleavage rate at pH 6.2 and hydrophobic index of -1 residue; Note: proline’s hydrophobicity index data is not available. (b) Overlay plot of cleavage rate at pH 6.2 and amino acid pI of -1 residue.

Figure 4.6 (a) Crystal structure of the Ssp DnaE intein (PDB entry 1ZDE) showing the close packing of His125 and Phe+2[99]. (b) Modified structure using UCSF Chimera based on the Npu DnaE intein (PDB entry 4QFQ) showing the relative position of Arg50 and -1 residue.

Figure 4.7 Plot of FRET ratio versus time for intein variants reactions with different C-exteins

Figure 4.8 Representative FRET screening results.

Figure 4.9 (a) Examples of slow, medium and fast cleaving mutant screened from the library using FRET assay; (b) non-linear fitting of FRET decay curves using JMP software.

Figure 4.10 Rate constants of the sequenced mutants screened from the N-extein library by using the FRET assay.

Figure 5.1 Fluorescent nanodiamond and its negatively charged nitrogen-vacancy defect center (NV- center) structure.[109]

Figure 5.2 General scheme of protein trans-splicing with the possible , , and residues at the 1 and +1 positions of the intein or the C-terminal extein, respectively.

Figure 5.3 Split intein-mediated bioconjugation scheme for N/C-terminal oriented attachment of protein of interests.

Figure 5.4 (a) N-terminal oriented in solution splicing scheme; (b) C-terminal oriented in solution splicing scheme.

Figure 5.5 Splicing results in buffers with or without the addition of 2 mM DTT at room temperature: (a) N-terminal oriented splicing scheme using Gos-TerL split intein; (b) C-terminal oriented splicing scheme using GP41.1 split intein.

Figure 5.6 Precipitation of GosC-sfGFP was observed in buffers with low pH or low salt concentration during dialysis. A: 0 mM NaCl; B: 100 mM NaCl; C: 200 mM NaCl; D: 300 mM NaCl; the numbers represent the pH value.

Figure 5.7 Gos-TerL split intein splicing efficiency versus splicing time plots under different splicing conditions. (Larger dots represent a ratio of 5:1 and smaller dots represent a ratio of 1:1)

Figure 5.8 Scaled estimates and prediction profilers from the DOE effect screening study of Gos-TerL by JMP.

Figure 5.9 GP41.1 split intein splicing efficiency versus splicing time plots under different splicing conditions. (a) with fragments ratio of 5:1; (b) with fragments ratio of 1:1.

Figure 5.10 Scaled estimates and prediction profilers from the DOE effect screening study of GP41.1 by JMP.

Figure 5.11 (a) Scheme of in solution splicing with chemically synthesized GosN peptide and GosC-sfGFP and their theoretical splicing product; (b) SDS-PAGE gel result for in solution splicing test with different ratio of GosC-sfGFP to GosN peptide; (c) Protein sequence coverage results from Bottom-up LC MS/MS.

Figure 5.12 (a) Scheme of in solution splicing with chemically synthesized GP41.1C peptide and sfGFP-GP41.1N and their theoretical splicing product; (b) SDS-PAGE gel result for in solution splicing test with different ratio of sfGFP-GP41.1N to GP41.1C peptide.

Figure 5.13 Creation of intein-tagged micro/nanodiamond through maleimide-mediated coupling with sulfhydryl group on the peptide.

Figure 5.14 Co-localization of Texas Red and FITC channels for fluorescent diamonds post trans-splicing reaction.

Figure 5.15 Fluorescence detection of nanodiamonds with and without GosN peptide conjugated after protein trans-splicing with GosC-sfGFP using Maestro imaging system.

Figure 5.16 Fluorescence detection of nanodiamonds with and without GP41.1C peptide conjugated after protein trans-splicing with sfGFP-GP41.1N using Maestro imaging system.

Chapter 1 Introduction

This chapter is partially based on my work, “Inteins as tools for tagless and traceless protein purification”, which was published in the Journal of Chemical Technology and Biotechnology in 2017.

1.1 Inteins

Inteins are intervening protein elements that ligate their flanking peptides (the N- and C-exteins) while excising themselves in a process called protein splicing[1]. In 1990, the first intein was found embedded in the VMA1 gene of Saccharomyces cerevisiae[2,3]. Since then, more than 1500 inteins have been identified from all three domains of life by using bioinformatics to align several highly conserved sequence motifs[4]. While most of the inteins identified so far are contiguous, a small number have more recently been found to be naturally split or have been split artificially. These split intein domains are individually transcribed, translated as two separate polypeptides, and spontaneously assemble to reconstitute a functional intein. The subsequent ligation of the fused N- and C-extein sequences is referred to as protein trans-splicing (PTS)[5,6]. As with contiguous inteins, the canonical split intein splicing pathway involves an N-S or N-O acyl shift at the splice site after fragment association, the formation of a branched intermediate, and cyclization of a conserved residue at the C-terminus of the intein to form a succinimide ring. The result is excision of the intein and ligation of the flanking exteins[7]. The detailed mechanism is shown in Figure 1.1.

The amino acids strictly required for this single-turnover process are a cysteine or serine at the first position of the intein, referred to as the 1 position, and a cysteine, serine or threonine immediately downstream of the intein, referred to as the +1 position[9]. Some inteins have slightly different sequences in the conserved motifs, requiring alternative splicing mechanisms.
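The residue requirement described above can be expressed as a simple sequence check. The following is a minimal sketch: the function name and the example sequences are hypothetical and are not taken from this work. A negative result only means that the pair is non-canonical, since some inteins splice through alternative mechanisms.

```python
# Hypothetical helper; names and example sequences are illustrative only.
CANONICAL_POS1 = {"C", "S"}        # first residue of the intein (position 1)
CANONICAL_PLUS1 = {"C", "S", "T"}  # first residue of the C-extein (position +1)


def has_canonical_splice_residues(intein: str, c_extein: str) -> bool:
    """Return True if positions 1 and +1 carry the canonical nucleophiles.

    `intein` is the intein sequence (for a split intein, the N-fragment),
    and `c_extein` is the sequence immediately downstream of the intein,
    both in one-letter amino acid code.
    """
    if not intein or not c_extein:
        raise ValueError("intein and C-extein sequences must be non-empty")
    pos1 = intein[0].upper()
    plus1 = c_extein[0].upper()
    return pos1 in CANONICAL_POS1 and plus1 in CANONICAL_PLUS1


if __name__ == "__main__":
    # Made-up sequences: Cys1/Cys+1 and Ser1/Cys+1 both satisfy the rule.
    print(has_canonical_splice_residues("CLSYETEILTV", "CFNGT"))
    print(has_canonical_splice_residues("SLDKTAL", "CEHLE"))
    # Ala at position 1 does not satisfy it.
    print(has_canonical_splice_residues("ALSYETEILTV", "CFNGT"))
```

A check of this kind could serve as a first filter when choosing extein junction sequences, before any kinetic characterization.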


Figure 1.1 The split intein-mediated protein trans-splicing mechanism. (1) Split intein fragment association; (2) N-S or N-O acyl shift; (3) Transesterification; (4) Asparagine cyclization and acyl rearrangement[8].


Normally, artificial split inteins are generated by splitting the corresponding contiguous intein at the insertion site of the endonuclease domain, which is not required for splicing activity and is thus a natural choice for this purpose. However, splitting at other sites can also result in functional split inteins[10,11]. The first naturally occurring split intein to be identified is from the DnaE protein of the cyanobacterium Synechocystis sp. strain PCC6803 (Ssp)[5]. Since then, similar split inteins have also been found in DnaE from other , among which the one from Nostoc punctiforme (Npu) has been well characterized[6]. Its N fragment consists of 102 amino acids and its C fragment of 36 amino acids, and it splices with a half-life on the order of tens of seconds at 30 °C[12]. The fast reaction rate and the tight association of the N and C fragments make the Npu split intein a powerful tool in protein engineering and biological applications. Recently, more split inteins have been identified by bioinformatic searches of metagenomic databases, which has proven to be a useful way of finding new split inteins with interesting properties[13-15]. For example, the GP41.1 intein is the fastest-splicing split intein described so far, and the Gos-TerL intein is the first intein identified with the unique combination of Ser1 and Cys+1. Table 1 summarizes selected split inteins with their features and current applications.
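The half-life quoted above for the Npu intein can be related to a rate constant if the reaction is treated as first-order in the assembled intein complex. This is an illustrative assumption rather than a result taken from the cited work, and the value of 30 s used below is a hypothetical number chosen only because it lies within "tens of seconds".

```latex
% Assumption: single-exponential (first-order) decay of the unspliced precursor P.
\begin{align}
  [P](t) &= [P]_0 \, e^{-kt}, \\
  t_{1/2} &= \frac{\ln 2}{k}
  \quad\Longrightarrow\quad
  k = \frac{\ln 2}{t_{1/2}} .
\end{align}
% Hypothetical example: t_{1/2} = 30 s
\begin{equation}
  k = \frac{0.693}{30\ \mathrm{s}} \approx 0.023\ \mathrm{s^{-1}},
  \qquad
  1 - \frac{[P](5\,t_{1/2})}{[P]_0} = 1 - 2^{-5} \approx 0.97 .
\end{equation}
```

Under this assumption, roughly 97% of the precursor would be consumed after five half-lives, which is 150 s in the hypothetical example.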


Table 1 Selected split inteins that are used for various applications

Type                  Inteins      Size of IN/IC   Applications
Artificial            Ssp DnaB     106/48          Purification[16]
Artificial            Mtu RecA     110/58          Purification[17]
Artificial            Mxe GyrA     119/79          Labeling[18]
Naturally-Occurring   Ssp DnaE     123/36          Cyclization[19]; labeling[20]
Naturally-Occurring   Npu DnaE     108/36          Labeling[21]; purification[22]
Naturally-Occurring   AceL-TerL    25/104          Labeling[23]
Naturally-Occurring   GP41-1       96/36           Fastest splicing[15]
Naturally-Occurring   Gos-TerL     37/115          Labeling[24]

The following sections discuss in depth the existing inteins that have been developed as tools for protein purification and bioconjugation.

1.2 Inteins as self-cleaving tags for downstream purification of recombinant proteins

Since the introduction of recombinant insulin in 1982, biopharmaceuticals have been a major driving force for growth in the pharmaceutical industry. Over 200 recombinant protein therapeutics have been approved, with many more currently in late-stage clinical trials[25]. Advancements in upstream production technologies for recombinant proteins have allowed titers of up to ten grams per liter to be achieved[26-28]. Hence, the bottleneck for biomanufacturing has become protein purification. There is an increasing demand for economical, simple, and efficient protein purification strategies and platform technologies. For instance, the historical dominance of monoclonal antibodies (mAbs) as a biopharmaceutical product class was made possible partly by the development of the Protein A purification platform, which requires relatively minor adaptations from product to product. Protein A provided a means for relatively straightforward mAb purification, and mAb-based technologies have matured rapidly as a result. However, recent trends toward more diverse product portfolios necessitate flexible, more generally applicable processes[29]. Purification processes based on conventional chromatographic techniques often require multiple chromatography steps and lengthy development of product-specific schemes because of the unique physical and chemical properties of each new molecule. By contrast, affinity tags can provide a potentially generalizable platform for purifying a broad range of proteins, thus reducing development costs and time to market. Because affinity purification often results in high yield and purity, usually over 90%, the inclusion of an affinity tag is regarded as an economically favorable and time-saving alternative to conventional approaches[30,31]. Additionally, many tags have beneficial effects on the target proteins, such as facilitating protein folding or improving solubility[30].

Despite the advantages of tag-based purification technology, the presence of the tag may interfere with the biochemical properties of the target protein and potentially cause immunogenic responses to proteins intended for human use. Therefore, in many cases, tag removal is required before the target protein can be characterized or administered as a therapeutic. Traditionally, serine proteases such as factor Xa, enterokinase, and α-thrombin were the reagents of choice for endoproteolytic removal of affinity tags; however, non-specific cleavage at secondary sites greatly limited their applications[32]. In recent years, certain viral proteases with more stringent sequence specificity have gained popularity, such as tobacco etch virus (TEV) protease and human rhinovirus 3C protease. SUMO protease (Ulp1) has also been introduced; it recognizes the tertiary structure of the SUMO tag and specifically cleaves the target protein[33-35]. Alternatively, exoproteases, like aminopeptidase and carboxypeptidase, can be used for C-terminal tag removal. In practice, requirements for prolonged, high-temperature incubations with these proteases have prevented their use at the industrial manufacturing scale. More importantly, some of these proteases are extremely difficult to overproduce in heterologous systems, leading to high process costs[32], and in most cases, one or more non-native residues remain at the cleaved terminus of the processed protein.

Although proteases are still used for tag removal at the laboratory scale, many newly developed self-cleaving enzymes have the potential to be more efficient alternatives. Together with an appropriate tag and affinity ligand, a self-cleaving enzyme can combine primary capture of the fusion protein, purification, and tag removal into a single step. The majority of these auto-catalytic proteins are engineered inteins, but other self-cleaving enzymes are also used, including FrpC from Neisseria meningitidis[36], sortase A (SrtA) from Staphylococcus aureus[37], subtilisin and its pre-domain from Bacillus amyloliquefaciens[38], N-terminal protease (Npro)[39], and cysteine protease domain (CPD)[40]. Over the past decades, inteins have been engineered for use as self-cleaving tags for the purification of recombinant proteins. By mutating critical residues, these inteins can be converted into N-terminal or C-terminal cleaving variants[7,22,41-43]. These cleaving variants can then be combined with an affinity tag (e.g. poly-histidine, chitin binding domain (CBD), phasin (PhaR), maltose binding protein (MBP), etc.) or a non-chromatographic tag, such as elastin-like polypeptide (ELP), to purify target proteins. The affinity tag or non-chromatographic tag is used to capture the target protein from a crude cell lysate, and self-cleavage is induced by the addition of a reducing agent such as DTT, or by a change in environmental conditions such as pH or temperature. pH-induced inteins are the most economical because they require only a simple pH shift in the buffer system to induce cleavage. The most thoroughly studied pH-induced intein is the Ssp DnaB intein, which has been fused with various affinity and non-chromatographic purification tags[44]. Another pH-inducible intein is the ΔI-CM mini-intein, which was engineered from the Mtu RecA intein[43]. This intein-based purification system has been successfully applied to purify different proteins in fusion with the ELP tag and the PHB-binding tag, as well as chromatographic tags such as CBD and MBP[45]. For these inteins, the most significant limitation is uncontrolled cleavage during expression of the tagged protein, which leads to product loss before the purification process takes place.


Compared to pH-inducible inteins, thiol-induced inteins exhibit minimal premature cleavage in vivo. The most widely published of these is the Sce VMA intein from Saccharomyces cerevisiae, which is commercially available as part of the IMPACT Kit in fusion with a CBD tag and the target protein[41]. Its N-terminal cleavage can be induced by thiol compounds such as 1,4-dithiothreitol (DTT), 2-mercaptoethane sulfonate sodium (MESNA), hydroxylamine, thiophenol, 2-mercaptoethanol or free cysteine, with DTT being the most commonly used reducing agent. However, many thiol compounds can disrupt disulfide bonds, which makes them unsuitable for purifying a significant number of therapeutic proteins. Moreover, thiol compounds are toxic and expensive, and thus significantly increase the cost of the overall process at large scale.

More recently, issues with premature cleaving have been addressed by the development of natural and artificial split inteins, which are inactive when expressed as separate segments but can be activated by fragment reassembly during the purification process. Lu and coworkers developed an affinity purification scheme using an artificially split Ssp DnaB mini-intein with controllable tag removal by inducible auto-cleavage at either the C-terminus or the N-terminus of the intein[16]. In our lab, an artificially split ΔI-CM purification system was also developed with ELP tags fused to both segments, and it successfully purified four target proteins of different sizes[17]. Artificially split inteins are often less active than their counterparts, partly because of a lower affinity between the split fragments, and they tend to form aggregates when expressed separately. In 2013, Guan and coworkers developed the Split Intein Mediated Ultra-Rapid Purification of Tagless protein (SIRP) system, in which the naturally fast-splicing Nostoc punctiforme DnaE (Npu) intein was engineered into a rapidly cleaving intein[22]. This system provides almost complete tag removal in less than 30 min at room temperature in the presence of 50 mM DTT.

1.3 Inteins as self-splicing tools for protein labeling onto nanoparticles

Nanoparticles engineered to carry drugs, proteins, or nucleic acids have enabled the development of a host of tools for the diagnosis and treatment of disease[46-51]. These include combination therapeutics, in which nanoparticles carry multiple drug payloads and targeting moieties[46,51,52], theranostic nanoparticles that can both identify and treat disease[46,52], and smart nanoparticles that respond to local environments to release their payload[46]. Nanoparticles under development include nanodiamonds, carbon nanotubes, mesoporous silica, iron oxide, quantum dots, plasmonic nanoparticles, gold nanoparticles, and upconverting nanoparticles[46,53,54].

For most biomedical applications, target proteins or drug payloads are conjugated to the nanoparticle surface by chemical reactions[55]. Conventional chemical reactions that target the lysine or cysteine residues of proteins are inadequate for many precision applications, or they require the user to possess the expertise needed for alternative bioconjugation reactions, such as Click reactions. These reactions often result in random coupling and place additional stress on proteins through harsh reaction conditions. Currently, there is a need for new conjugation approaches that are fast and efficient and that provide one or more of the following: site-specific linkages, homogeneously oriented target molecules on the nanoparticle surface, and attachment of more than one target to the nanoparticle surface. An important additional requirement is simplicity, allowing novices in protein conjugation chemistry to easily explore new ideas in nanoparticle design and application.

One of the promising approaches to achieve site-specific and oriented bioconjugation is split intein-mediated protein trans-splicing (PTS). The kinetics of protein splicing are relatively fast compared to other methods, with a number of split inteins having reaction times within several minutes. Another significant advantage of PTS-based protein ligation is that it can be readily applied in vivo. In previous studies, split inteins have been successfully applied for site-specific labeling of living cell surfaces with functional groups. In such methods, one fragment of the split intein is fused with the desired label, such as a fluorophore or functional peptide, while the other fragment is fused with a transmembrane protein exposed at the cell surface. Upon incubation of the two components, the desired label is attached to the living cells. For example, human transferrin receptor was labeled with a fluorophore at its C-terminus exposed at the cell surface in a living cell through trans-splicing using an artificial split intein[56]. In a similar study, the natural split intein Npu DnaE was used for ligation of an exogenous functional polypeptide to membrane proteins on living cells[57]. In another in vivo labeling example, split intein-based trans-splicing was used to site-specifically conjugate target proteins to quantum dots (QDs)[58,59]. In particular, the use of orthogonal split inteins, which do not cross-react with each other, might allow simultaneous multicolor or multifunction labeling of proteins within living cells or on nanoparticle surfaces.

1.4 Dissertation

This dissertation is divided into two major sections according to the applications of the split inteins. The first section (Chapters 2, 3 and 4) focuses on the development and optimization of the self-cleaving intein purification platform using the engineered Npu split intein. The second section (Chapter 5) focuses on the development of novel bioconjugation platforms for site-specific and oriented attachment of proteins onto nanodiamonds using two atypical split inteins.

Chapter 2 will describe the successful demonstration of the Npu split intein system for column-free purification of proteins in combination with the aggregating tag, elastin-like polypeptide (ELP), as well as on-column purification for high value-added biosimilars. The pH sensitivity of the established column-free system was explored, and four different proteins were purified using the system with detailed characterization. In parallel, a popular biosimilar target, Granulocyte-Colony Stimulating Factor (G-CSF), was expressed in fusion with the intein tag and purified on the intein column. The final purified materials were characterized for purity, aggregation level and potency.

Chapter 3 will describe efforts to adapt the Npu split intein purification system for the production of protein therapeutics expressed in mammalian cells. A screening platform for optimizing the secretion level of mammalian cell-expressed proteins was developed, and different leader sequences were screened for the model protein, secreted embryonic alkaline phosphatase (SEAP). The optimal leader sequences were utilized for the scale-up production and purification of complex proteins with proper glycosylation.

Chapter 4 will describe the in-depth characterization of the extein dependency of the Npu split intein system. In-solution cleaving kinetics studies were carried out to test the effect of the -1 residue on the cleavage rate and pH sensitivity of the engineered split intein. Furthermore, a new assay based on Förster Resonance Energy Transfer (FRET) was developed to characterize the N-extein mutant libraries in a high-throughput format.

Chapter 5 will describe an ongoing project to develop new bioconjugation platforms for attaching proteins onto fluorescent nanodiamonds (FND) using two atypical split inteins. The N-terminal oriented conjugation platform was developed using the Gos-TerL intein, while the C-terminal oriented conjugation platform was developed using the GP41.1 intein. A full design of experiments (DOE) study was designed and executed to examine the ligation efficiency of each split intein under different reaction conditions, including pH, salt concentration, temperature and component ratio. As proof of concept, super-folder green fluorescent protein (sfGFP) was used as the model protein and was successfully conjugated onto the fluorescent nanodiamonds with both strategies.

Finally, in Chapter 6, the work in this dissertation will be summarized and the main results will be highlighted in a broader context. Future work for each section will also be discussed.

Chapter 2 Development of purification strategies using self-cleaving tag

This chapter is partially based on my work, “Column-free purification methods for recombinant proteins using self-cleaving aggregating tags”, which was published in Journal of Polymers in 2018.

2.1 Introduction

Previously, many proteins have been purified on a laboratory scale from E. coli expression systems using contiguous intein-mediated self-cleaving tags[60-62]; however, premature cleavage during protein expression heavily limits the application of this approach. More recently, the discovery of naturally split inteins has enabled new purification strategies that eliminate the premature cleaving issue while retaining fast cleaving properties[22,42]. The Nostoc punctiforme (Npu) DnaE split intein has been thoroughly characterized and shows promise, with a fast reaction rate and tight association of the intein fragments. Figure 2.1 shows the crystal structure of the Npu split intein with key residues and secondary structures annotated. However, the wild-type intein does not exhibit controlled cleaving and thus is not practical for the development of a purification platform (Figure 2.2a). For these reasons, based on the accumulated understanding of intein biology, previous lab members sought to combine known rational design strategies to engineer an optimized intein for industrial applications in protein purification.

Figure 2.1 The overall structure of the Npu split intein complex shown in a ribbon representation. Chains A (green) and B (magenta) refer to NpuN and NpuC respectively. Secondary structures are numbered from NpuN to NpuC (PDB: 4LX3).[63]

It was determined that a Cys1Ala mutation effectively suppresses splicing and unwanted N-terminal cleaving. In previous work, we identified an aspartic acid to glycine mutation in the Mtu RecA intein that accelerates C-terminal cleaving in the absence of splicing[64]. This aspartic acid residue is conserved and present in the Npu split intein. Chen et al. used this mutation to accelerate cleaving in the Npu split intein, but the cleavage still required high concentrations of thiol reagent[22,42]. The need to add a reducing reagent (DTT) to trigger the cleavage reaction significantly limited its application for the production of target proteins containing disulfide bonds, including many protein therapeutics. Based on the proposed pH sensitivity mechanism of our previously reported ΔI-CM intein, we elected to mutate the penultimate serine residue of the intein (Ser35 of the C-fragment) to histidine. The resulting C-fragment mutant is named NpuC*. The rationale was that the pH dependence of the intein cleaving reaction closely mimics the pKa curve for the histidine side chain, and this residue is within the active site and has been hypothesized to be directly involved in the cleaving reaction[44,65-67]. Although somewhat more pH sensitive than the initial intein, this mutant showed only moderate activity and an unacceptable level of pH control (Figure 2.2b). A sensitivity-enhancing domain, which was initially designed computationally to enhance zinc sensitivity, was added in front of the NpuN fragment. The resulting N-fragment mutant is named NpuN*. The combination of the re-engineered intein fragments resulted in rapid and highly pH-sensitive cleaving for the model protein, streptokinase (SK) (Figure 2.2d).
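The pKa argument above can be made concrete with the Henderson-Hasselbalch relation. The following is a minimal sketch rather than part of the original work: it assumes a side-chain pKa of 6.0 (a typical textbook value for free histidine; the effective pKa inside the intein active site may differ), and the function name is hypothetical.

```python
def fraction_protonated(ph: float, pka: float = 6.0) -> float:
    """Fraction of a titratable group in its protonated form.

    Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    The default pKa of 6.0 is an assumed textbook value for histidine.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Compare the binding/wash pH (8.5) with the cleaving pH (6.2) used in this chapter.
for ph in (8.5, 6.2):
    print(f"pH {ph}: {fraction_protonated(ph):.3f} of side chains protonated")
```

Under the assumed pKa, well under 1% of the histidine side chains are protonated at pH 8.5, compared with roughly 39% at pH 6.2, which is consistent with a cleavage reaction that is largely switched off at the higher pH.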

Figure 2.2 Cleavage kinetics of different mutants of NpuN and NpuC with streptokinase (SK) as the target protein. (a) wild type NpuN and NpuC; (b) wild type NpuN and mutant NpuC*; (c) mutant NpuN* and wild type NpuC; (d) mutant NpuN* and mutant NpuC*.

At laboratory scale, affinity domains such as Protein A, poly-Histidine, Chitin Binding Domain (CBD) and Maltose Binding Protein (MBP) have been utilized extensively due to their high specificity and simple purification methods[30]. Although the corresponding affinity resins are commercially accessible and can achieve a high level of purity, these methods are inherently limited by high resin cost and slow volumetric throughput when processing large volumes. Thus, an opportunity exists in the use of non-chromatographic methods, which have significant potential for expanding bioprocess throughput and reducing purification costs at laboratory scale and in commercial manufacturing, especially for non-therapeutic bioproducts.

Over the years, biopolymers have been employed to create inexpensive non-chromatographic purification methods with good selectivity and yields. In most cases, engineered biopolymers serve as purification tags, which can induce a tagged fusion protein to form highly selective aggregates under specific chemical or physical conditions. Reported tags include the elastin-like polypeptide (ELP), repeat-in-toxin (RTX) domain, and ELK16[68-71]. In particular, the ELP tag induces aggregation of fusion proteins under elevated temperature or high salt concentrations[68,69].

To demonstrate the potential impact of combining split inteins with polymeric aggregation tags, we have developed non-chromatographic purification strategies using the ELP tag in combination with an engineered Npu split intein for tag removal. Importantly, this intein has been engineered to exhibit highly pH-sensitive cleavage as mentioned above, which minimizes product loss during precipitation and wash steps while providing rapid cleavage of the tag to release the purified target protein. The schematic of this column-free purification method is shown in Figure 2.3. As a proof of concept, four target proteins were purified successfully in this work: β-lactamase (β-lac), super-folder green fluorescent protein (sfGFP), streptokinase (SK) and maltose binding protein (MBP).

Figure 2.3 Schematic of column-free purification method using ELP-tagged split intein.

Meanwhile, we have also developed an optimized on-column purification strategy using the same engineered Npu split intein. For high value-added proteins and unstable proteins, this strategy can provide higher purity as the primary capture step under very mild purification conditions. To achieve a practical and economical purification platform using the self-cleaving tag, the N-fragment of the split intein was engineered to be covalently immobilized onto agarose beads. In this way, the resin can potentially be regenerated under chaotropic or caustic conditions for multiple cycles of use, reducing the overall cost of the system. To achieve oriented and uniform immobilization, a single cysteine was fused to the C-terminus of the intein N-fragment, with the internal cysteine residues (C29 and C60) mutated to serine. This was shown to enable site-specific conjugation onto a solid surface without compromising cleaving activity. With these modifications, the intein resin was made through the coupling reaction between the sulfhydryl group of the unique cysteine at the C-terminus of NpuN* and the iodoacetyl group on the commercial SulfoLink resin. The NpuC* fusion protein can be expressed in a suitable host and then purified through the assembly of the two complementary intein segments, which bind with nanomolar affinity. Owing to the pH sensitivity of the engineered split intein, the on-column cleavage reaction can be greatly accelerated by a pH shift from 8.5 to 6.2. The schematic of the on-column purification system using the Npu split intein self-cleaving tag is shown in Figure 2.4.

Figure 2.4 Schematic of on-column purification method using the Npu split intein self-cleaving tag.

In previous work, various simple target proteins have been successfully purified using this on-column purification method with high purity and yield. In this work, we seek to purify a more complex category of proteins that contain disulfide bonds. This effort is part of a project that originated from a challenge initiated by the Defense Advanced Research Projects Agency (DARPA). The ultimate goal is to achieve Biologically-derived Medicine on demand (Bio-Mod) in a miniaturized device within short timeframes on the battlefield. The final device would have the flexibility to produce multiple types of therapeutics through modular reaction design. The work is in collaboration with the University of Maryland, Baltimore County (UMBC) and Thermo Fisher, which are responsible for hardware development and upstream cell-free expression, respectively. The split intein-mediated purification strategy was chosen as the first capture step for manufacturing single-dose levels of FDA-approved biologics in the miniaturized device, as it provides a general platform and eliminates the effort of product development for each individual target.

One of the target proteins that we focus on is a very popular biosimilar target, Granulocyte-colony stimulating factor (G-CSF). G-CSF, also called filgrastim, is used as a bone marrow stimulant that helps the body make white blood cells after cancer treatment. In 2008, the European Medicines Agency (EMA) approved the first biosimilar of G-CSF. Currently, there are seven biosimilar G-CSF products approved by the EMA[72]. All of the approved products, including the reference product Neupogen (Amgen), are produced in E. coli cells and consist of 175 amino acids, containing two disulfide bonds. The protein sequence is identical to that of natural human G-CSF, except for an added N-terminal methionine necessary for expression in E. coli, and the protein is not glycosylated. Glycosylation extends its serum half-life in the human body but does not affect the activity of the molecule[73].

The cost of biological drugs is extremely high compared to chemically synthesized drugs, which could be due to several reasons, including the prolonged development stage and the requirement for special handling. The patents of several top-selling biologics have either expired or are about to expire, which has motivated many biopharmaceutical companies to become involved in biosimilar production. Biosimilars could lower health care costs through competition, which might give patients broader access to biopharmaceuticals and further stimulate research and development. Table 2 summarizes the biosimilars that are currently approved for marketing in the United States by the Food and Drug Administration (FDA).

Potentially, the Bio-Mod device could provide an alternative route for the production of biosimilars that obviates the need for individual drug stockpiling, cold chain handling, and long development for each product. In this work, we will demonstrate the successful purification of G-CSF using the split intein-mediated purification method as the first capture step. A detailed characterization will also be performed to evaluate the purity, stability, aggregation level and activity after recovery.

Table 2 Biosimilars currently approved in the United States by the FDA (data from FDA website)

Originator Biologic  Active Substance              Manufacturer          Approval Date
Neupogen             filgrastim-sndz/Zarxio        Sandoz                3/2015
Remicade             infliximab-dyyb/Inflectra     Celltrion/Pfizer      4/2016
Enbrel               etanercept-szzs/Erelzi        Sandoz                8/2016
Humira               adalimumab-atto/Amjevita      Amgen                 9/2016
Remicade             infliximab-abda/Renflexis     Samsung/Merck         4/2017
Humira               adalimumab-adbm/Cyltezo       Boehringer Ingelheim  8/2017
Avastin              bevacizumab-awwb/Mvasi        Amgen                 9/2017
Herceptin            trastuzumab-dkst/Ogivri       Biocon/Mylan          12/2017
Remicade             infliximab-qbtx/Ixifi         Pfizer                12/2017
Procrit              epoetin alfa-epbx/Retacrit    Pfizer                05/2018
Neulasta             pegfilgrastim-jmbd/Fulphila   Biocon/Mylan          06/2018
Neupogen             filgrastim-aafi/Nivestym      Pfizer                07/2018
Humira               adalimumab-adaz/Hyrimoz       Sandoz                10/2018
Neulasta             pegfilgrastim-cbqv/Udenyca    Coherus Bioscience    11/2018
Rituxan              rituximab-abbs/Truxima        Celltrion             11/2018
Herceptin            trastuzumab-pkrb/Herzuma      Celltrion/Teva        12/2018
Herceptin            trastuzumab-dttb/Ontruzant    Samsung               01/2019

2.2 Materials and Methods

2.2.1 Chemicals and Reagents

All chemicals were purchased from either Sigma Aldrich (St. Louis, MO) or Thermo Fisher Scientific (Waltham, MA), unless otherwise stated. All cloning enzymes were purchased from New England Biolabs (Ipswich, MA). All oligonucleotides were synthesized by Sigma Aldrich (St. Louis, MO).

2.2.2 Plasmids construction

Primer sequences used for plasmid construction in this study are available in Appendix A Table 10. Plasmids encoding the target proteins β-lactamase (β-lac), streptokinase (SK), super-folder green fluorescent protein (sfGFP), maltose binding protein (MBP) and Granulocyte-colony stimulating factor (G-CSF), each tagged with the C-fragment of the split intein (NpuC*), were constructed by overlap PCR, where unique restriction sites for NdeI and XhoI were designed at the 5’- and 3’-ends of the PCR products, respectively. Each PCR-amplified fusion protein gene was digested with NdeI and XhoI and ligated into pET21a(+). To create the ELP-tagged NpuN* construct, the NpuN* fragment was amplified using the EcoRI-NpuN*-F and XbaI-NpuN*-R primers. The PCR product was digested with EcoRI and XbaI and then ligated into the previously reported pET-ELP backbone plasmid[68]. The NpuN* gene was also amplified and cloned into a pET vector using the NdeI and XhoI sites.

2.2.3 Protein expression

All protein expression experiments were performed in the Escherichia coli strain BLR(DE3), except for intein-tagged G-CSF. The NpuC*-GCSF precursor protein was expressed in the engineered E. coli K12 strain SHuffle T7. Transformed cells containing the expression plasmids were cultured in 5 mL Luria Broth (LB) media supplemented with 100 µg/mL ampicillin at 37 °C for 16 to 18 hrs. The cultures were diluted 1:100 (v/v) into Terrific Broth (TB) or 2X LB media supplemented with 100 µg/mL ampicillin in Thomson’s UltraYield™ Flasks. The BLR(DE3) cells were then grown at 37 °C for 3-4 hrs until OD600 reached 0.6-0.8, and the SHuffle T7 cells were grown at 30 °C until OD600 reached 0.6-0.8. Protein expression was then induced by addition of 0.5 mM (final concentration) isopropyl β-D-1-thiogalactopyranoside (IPTG) and allowed to continue at 16 °C for 24 hours.

2.2.4 Lysis and recovery

Cells were harvested by centrifugation at 6,000 × g for 10 min at 4 °C. The cell pellets were either stored at -20 °C or immediately resuspended in either low salt buffer (20 mM AMPD, 20 mM PIPES and 1 mM EDTA at pH 8.5) for ELP purification or wash buffer (20 mM AMPD, 20 mM PIPES and 200 mM NaCl at pH 8.5) for BRT17 purification. In each case, the cell pellet was resuspended in one tenth of its culture volume before lysis. The resuspended cells were then sonicated for 10 cycles of 30 sec sonication at a setting of 4-5 W, with 30 sec on ice between cycles. The lysate was then clarified by centrifugation at 15,000 rpm for 15 min at 4 °C and the supernatant was recovered for purification of the targets.

2.2.5 ELP-mediated protein purification

For each ELP purification procedure, 500 µL of ELP-NpuN* clarified lysate was mixed with a variable volume of NpuC*-POI clarified lysate according to the expression titer of the target protein. Ammonium sulfate was then added to the mixture to a final concentration of 0.4 M (using a 1:5 dilution of a 2 M ammonium sulfate stock solution) and the solution was incubated at 37 °C for 10 min. The sample was then centrifuged at 15,000 rpm for 6 min and the recovered pellet was re-dissolved in low salt buffer (20 mM AMPD, 20 mM PIPES and 1 mM EDTA at pH 8.5). Another round of precipitation was done to increase the purity by adding ammonium sulfate as before to a final concentration of 0.4 M, and the solution was again incubated at 37 °C for 10 min. The sample was then centrifuged again at 15,000 rpm for 6 min and the pellet was dissolved in cleaving buffer (20 mM AMPD, 20 mM PIPES, 1 mM EDTA at pH 6.2). The sample was incubated at 37 °C for 5 hrs. After 5 hrs of cleavage, ammonium sulfate was added as before to a final concentration of 0.4 M to precipitate the ELP tag, and the sample was incubated at 37 °C for 10 min and centrifuged at 15,000 rpm for 6 min. The purified product was recovered in the supernatant.
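The "1:5 dilution of a 2 M stock" in the procedure above follows from a simple mass balance, sketched below for convenience. This is an illustrative helper, not part of the original protocol: the function name and the 1000 µL example volume are hypothetical, and the calculation assumes that the sample contains no ammonium sulfate beforehand and that volumes are additive.

```python
def stock_volume_to_add(sample_volume_ul: float,
                        stock_molar: float = 2.0,
                        final_molar: float = 0.4) -> float:
    """Volume of salt stock (µL) needed to bring a salt-free sample to final_molar.

    Mass balance: stock_molar * v = final_molar * (sample_volume + v),
    so v = sample_volume * final_molar / (stock_molar - final_molar).
    """
    if not 0 < final_molar < stock_molar:
        raise ValueError("final concentration must lie between 0 and the stock concentration")
    return sample_volume_ul * final_molar / (stock_molar - final_molar)


sample_ul = 1000.0  # hypothetical volume of the mixed clarified lysates
salt_ul = stock_volume_to_add(sample_ul)
print(f"Add {salt_ul:.0f} µL of 2 M ammonium sulfate to {sample_ul:.0f} µL of sample")
```

With the default values, one part stock is added to four parts sample, which is the 1:5 dilution of the stock quoted in the text.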

2.2.6 NpuN* resin production

The SulfoLink coupling resin was purchased from Thermo Fisher. It is a porous, crosslinked, 6% beaded agarose that has been activated with iodoacetyl groups for covalent immobilization of peptides or molecules containing sulfhydryl (-SH) groups.

BLR cells expressing NpuN* were resuspended after harvest in lysis buffer (20 mM Tris, 2 mM MgCl2, pH 8.5) at 1 mL per 0.5 g of biomass. Benzonase Nuclease (Millipore) was added to the resuspended mixture at 10 units/mL to degrade all forms of DNA and RNA, which reduces the viscosity of protein extracts and prevents cell clumping. The mixture was then passed through a homogenizer at 20 kpsi to lyse the cells. The protein extract was purified on an AKTA Pure 25 using 5 tandem HisTrap FF columns and buffer exchanged into coupling buffer (50 mM Tris, 5 mM EDTA, pH 8.5) to a final concentration of 5 mg/mL. An equal volume of the empty resin was transferred to a gravity column and equilibrated with 10 column volumes (CV) of coupling buffer. TCEP (Thermo Fisher) was added to the protein solution to a final concentration of 25 mM, and the solution was mixed with the equilibrated empty resin overnight at 4 °C with gentle mixing to allow the coupling reaction to complete. Then, the resin was washed with 10 CV of coupling buffer to remove unbound materials, followed by incubation with 50 mM L-Cysteine-HCl (Sigma Aldrich) in coupling buffer for one hour to block the unoccupied active sites on the agarose beads. Lastly, at least 6 CV of 1 M NaCl was used to wash out the remaining impurities, and the resin was then washed with 2 CV of 20% ethanol, with one additional CV of 20% ethanol added for storage.

2.2.7 On-column purification using intein resin on AKTA Pure 25

Intein resin was packed in-house in a 0.707 mL Omnifit glass column. The column was equilibrated with at least 10 CV of column buffer (20 mM AMPD, 20 mM PIPES, 200 mM NaCl, pH 8.5) until all traces stabilized. Sample was injected onto the column using a 10 mL SuperLoop at a flow rate of 0.7 mL/min, followed by 10 CV of column buffer at 0.7 mL/min to wash out the unbound proteins. Then 10 CV of cleavage buffer (20 mM AMPD, 20 mM PIPES, 200 mM NaCl, pH 6.2) was used to initiate the intein cleavage, after which the flow was stopped. The intein cleavage was carried out at room temperature for 20 hours. 5 CV of elution was collected using a Frac-920 fraction collector.
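For planning a run, the column-volume (CV) steps above convert to buffer volumes and pump times as sketched below. This is only a convenience calculation under stated assumptions: the helper names are hypothetical, and the 0.7 mL/min flow rate is given in the text for loading and washing only, so applying it to other steps is an assumption.

```python
COLUMN_VOLUME_ML = 0.707  # packed bed volume stated in the text
FLOW_RATE_ML_MIN = 0.7    # flow rate stated for sample loading and washing


def step_volume_ml(n_cv: float, cv_ml: float = COLUMN_VOLUME_ML) -> float:
    """Buffer volume (mL) corresponding to a step specified in column volumes."""
    return n_cv * cv_ml


def step_time_min(n_cv: float,
                  cv_ml: float = COLUMN_VOLUME_ML,
                  flow_ml_min: float = FLOW_RATE_ML_MIN) -> float:
    """Pump time (min) needed to deliver n_cv column volumes at a constant flow rate."""
    return step_volume_ml(n_cv, cv_ml) / flow_ml_min


# Steps as listed in the method; the flow rate for the last two is assumed.
for name, n_cv in (("equilibration", 10), ("wash", 10), ("cleavage buffer", 10), ("elution", 5)):
    print(f"{name}: {step_volume_ml(n_cv):.2f} mL, {step_time_min(n_cv):.1f} min")
```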

2.2.8 Intein cleavage analysis

For cleavage analysis, 20 μL samples were taken at different time points during the incubation process. The cleavage reaction in each sample was stopped by adding 2X sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) loading dye and heating at 98 °C for 10 min. The percent cleaved at each time point was estimated by scanning densitometry of the precursor and product bands on SDS-PAGE gels using ImageJ (NIH). The percent cleaved was calculated as the intensity of the cleaved product band divided by the sum of the intensities of the precursor and product bands.
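The percent-cleaved definition above can be written as a short function. This is a minimal sketch: the function name and the example intensities are hypothetical placeholders rather than measured values, and, following the definition in the text, no correction for the molecular weights of the two bands is applied.

```python
def percent_cleaved(product_intensity: float, precursor_intensity: float) -> float:
    """Percent cleaved = product / (precursor + product) * 100, from band densitometry."""
    total = product_intensity + precursor_intensity
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return 100.0 * product_intensity / total


# Placeholder densitometry values for a single lane (arbitrary units, not real data).
print(f"{percent_cleaved(product_intensity=7500.0, precursor_intensity=2500.0):.1f}% cleaved")
```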

2.2.9 Protein quantification

Protein concentration was determined using a Bradford assay with bovine serum albumin (BSA) as the standard. The standard curve was generated using 1, 2, 4, 6, 8 and 10 µg BSA/mL in water. Clarified lysate samples were diluted 1:10,000 and purified product samples were diluted 1:100 in water. The Bio-Rad Quick Start™ Bradford 1x Dye Reagent (Bio-Rad Catalog #5000205) was used, and OD595 was measured after 5 min of incubation using a Biotek Synergy 2 plate reader. All assays were performed in triplicate.
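The back-calculation from a Bradford reading to the concentration of the undiluted sample can be sketched as follows. This is an illustrative example only: it assumes a linear standard curve over the stated range, the function names are hypothetical, and the OD595 readings are synthetic placeholders generated from an ideal line, not measured data.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def concentration_ug_per_ml(od595, slope, intercept, dilution_factor):
    """Concentration of the undiluted sample from the reading of its dilution."""
    return (od595 - intercept) / slope * dilution_factor


standards = [1, 2, 4, 6, 8, 10]  # µg BSA/mL, as listed in the text
# Placeholder readings (NOT measured data): an ideal linear response.
readings = [0.05 * c + 0.30 for c in standards]

slope, intercept = fit_line(standards, readings)
stock = concentration_ug_per_ml(0.55, slope, intercept, dilution_factor=100)
print(f"Estimated concentration of the undiluted sample: {stock:.0f} µg/mL")
```

The dilution factor of 100 corresponds to the 1:100 dilution used for purified product samples; clarified lysates would use 10,000.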

2.2.10 Activity assays

Green fluorescent protein

Diluted samples of 100 µL each were read in a Biotek Synergy 2 plate reader with an excitation wavelength of 485 nm and an emission wavelength of 528 nm. All assays were performed in triplicate.

β-lactamase activity assay

β-lac activity was determined using a nitrocefin dye-based colorimetric assay, where nitrocefin changes color from yellow to red when hydrolyzed by β-lac. A 0.5 mM stock solution of nitrocefin was made by dissolving the appropriate amount of nitrocefin in dimethyl sulfoxide (DMSO). Protein samples were diluted with PBS and added to a 96-well plate with the nitrocefin solution to give a final nitrocefin concentration of 100 µM. The mixture was incubated for 2 minutes at room temperature. After the color development, OD492 was read using a Biotek Synergy 2 plate reader. An OD492 of 0.53 corresponds to 100 nmol of hydrolyzed nitrocefin[74]. One unit of β-lac is defined as the amount of β-lac needed to hydrolyze 1 µmol of nitrocefin in one minute. All assays were performed in triplicate.
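The unit definition above translates into a short conversion from an OD492 reading to enzyme units. The sketch below is illustrative only: the function name is hypothetical, and it assumes that OD492 scales linearly with the amount of hydrolyzed nitrocefin and that the reading is already blank-corrected.

```python
OD492_PER_100_NMOL = 0.53  # calibration quoted in the text [74]


def beta_lac_units(od492: float, minutes: float = 2.0) -> float:
    """Units of β-lactamase in the assayed well (µmol nitrocefin hydrolyzed per minute).

    Assumes a linear OD492 response and a blank-corrected reading.
    The default of 2 minutes is the incubation time given in the method.
    """
    nmol_hydrolyzed = od492 / OD492_PER_100_NMOL * 100.0
    return nmol_hydrolyzed / 1000.0 / minutes  # nmol -> µmol, then per minute


print(f"{beta_lac_units(0.53):.3f} U in the well")
```

Multiplying by the sample dilution factor and dividing by the protein mass in the well would then give a specific activity.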

Streptokinase Activity Assay

The activity assay for SK is an endpoint chromogenic method based on its potency in converting plasminogen to plasmin. In this method, plasmin accelerates the hydrolysis of S-2251™ (Chromogenix Catalog #820332), resulting in the formation of a yellow end-product that can be detected by absorbance at 405 nm. Thus, plasminogen activation by SK can be quantitatively assayed using the synthetic chromogenic substrate.

To perform the assay, the samples were diluted in 1X sample buffer (10 mM Tris-HCl pH 7.4, 0.1 mM NaCl, 1 mg/mL BSA) to a desired concentration. Substrate solution was prepared by adding 1 mL of 0.5 M Tris-HCl pH 7.4 to 1 mL of 3 mM S-2251 and 5 μL of 10% Tween 20 and was prewarmed to 37 °C. Immediately before use, 45 μL of glu-plasminogen (1 mg/mL) was added to the substrate solution. To assay in a 96-well plate format, 60 μL of diluted SK sample was mixed thoroughly with 40 μL of substrate solution. A standard curve was generated using recombinant SK at 10.0 IU/mL, 5 IU/mL, 2.5 IU/mL, 1.25 IU/mL and 0.675 IU/mL. The plate was then incubated at 37 °C for 20 min and the absorbance was measured at 405 nm[75]. All assays were performed in triplicate.

G-CSF activity assay

Murine myeloblastic NFS-60 cells (M-NFS-60) were purchased from ATCC (ATCC CRL-1838). The base medium for this cell line is ATCC-formulated RPMI-1640 medium (catalog No. 30-2001). The complete medium was supplemented with 10% fetal bovine serum (FBS). The G-CSF activity of the fusion protein was measured by an NFS-60 proliferation assay. NFS-60 cells were washed with RPMI-1640 medium/10% FBS and aliquoted into 96-well microtiter plates at a density of 1 × 10^5 cells per mL. Subsequently, 10 µL of 10-fold serial dilutions of the purified G-CSF and G-CSF standards (PeproTech) were added in triplicate, with concentrations ranging from 0.1 pM to 10,000 pM. The plates were incubated at 37 °C in a 5% CO2 incubator for 48 hours. An MTS cell proliferation assay (BioVision kit) was then performed according to the manufacturer’s instructions. The method is based on the reduction of the MTS tetrazolium compound by viable cells to generate a colored formazan product that is soluble in cell culture media. MTS reagent (20 µL/well) was added to each well and incubated for 0.5 to 4 hours at 37 °C under standard culture conditions. Then, the plates were briefly shaken and the absorbance of treated and untreated cells was measured at 490 nm using the plate reader.
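The 10-fold dilution series described above can be enumerated as follows. This is a small illustrative helper with a hypothetical name; whether the quoted range refers to the concentration of the added solution or the final concentration in the well is not stated in the text, so the values are simply the nominal points of the series.

```python
import math


def tenfold_series(top_pm: float = 10_000.0, bottom_pm: float = 0.1) -> list:
    """Nominal concentrations (pM) of a 10-fold serial dilution from top_pm down to bottom_pm."""
    n_steps = round(math.log10(top_pm / bottom_pm))  # number of 10-fold dilution steps
    return [top_pm / 10 ** i for i in range(n_steps + 1)]


series = tenfold_series()
print(f"{len(series)} concentrations (pM): {series}")
```

The stated range therefore spans five dilution steps, giving six concentrations per sample, each plated in triplicate.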

2.3 Results

2.3.1 Column-free purification for recombinant proteins using ELP-fused self-cleaving tag

pH controllability of ELP-tagged split intein system

Most commonly, the cleavage activity of inteins can be induced or accelerated by thiol addition, a small shift in pH or an increase in temperature[76]. While thiol-induced inteins are more tightly controllable, pH-induced inteins are more economical at large scale. Thiol compounds may also disrupt the disulfide bonds in the target protein, which makes thiol-induced inteins unsuitable for a significant number of target proteins. In this method, we used the engineered Npu split intein, which itself exhibited pH sensitivity in previous in-solution cleaving kinetics studies, as shown in the introduction section. To confirm that the pH sensitivity is retained in this method, we carried out cleavage kinetics studies to evaluate the pH controllability of this intein in fusion to the ELP aggregating tag.

Time point samples were collected during the cleavage reactions for both self-cleaving aggregating tags at pH 8.5 and 6.2, respectively. At each time point, the reactions were stopped by addition of 2X SDS-PAGE loading dye and heating for 10 min at 98 °C. With both tags and super-folder green fluorescent protein (sfGFP) as a model target protein, the cleavage reactions were almost complete after two hours of incubation at 37 °C in pH 6.2 buffer. The results are shown in Figure 2.5. Further, the final cleavage efficiency was approximately 94% for the ELP system. Importantly, the cleavage rate at pH 8.5 was approximately 4-fold lower than at pH 6.2, indicating that the pH sensitivity is retained with the ELP tag. Specifically, at pH 8.5 and 37 °C, the cleavage efficiency was only 20% for the ELP system. Additionally, at pH 8.5 and 25 °C, the cleavage efficiency was only 4% after the first hour of the reaction. This controllability can be used to achieve minimum product loss during the initial precipitation and wash steps, while allowing relatively fast cleavage at lower pH during final protein recovery. The initial precipitation and wash steps during the purification process for both systems take less than an hour and can be performed at room temperature, which would further slow the intein cleavage, leading to a significant reduction of product losses.
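One way to put rough numbers on the product-loss argument above is to assume that cleavage follows simple first-order kinetics. The text does not state a kinetic model, so this is an assumption made purely for illustration, and the function names are hypothetical. Under that assumption, a single time point gives an apparent rate constant, which can then be used to estimate premature cleavage over a shorter handling time.

```python
import math


def apparent_rate_constant(fraction_cleaved: float, hours: float) -> float:
    """Apparent first-order rate constant (1/h) from one time point.

    Assumes f(t) = 1 - exp(-k * t); this model is an assumption, not a result from the text.
    """
    if not 0 <= fraction_cleaved < 1:
        raise ValueError("fraction_cleaved must be in [0, 1)")
    return -math.log(1.0 - fraction_cleaved) / hours


def fraction_cleaved_at(k_per_hour: float, hours: float) -> float:
    """Fraction cleaved after a given time under the same first-order assumption."""
    return 1.0 - math.exp(-k_per_hour * hours)


# 4% cleaved after the first hour at pH 8.5 and 25 °C (value from the text).
k_wash = apparent_rate_constant(0.04, 1.0)
print(f"apparent k at pH 8.5, 25 °C: {k_wash:.3f} per hour")
print(f"estimated premature cleavage after 30 min: {100 * fraction_cleaved_at(k_wash, 0.5):.1f}%")
```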

Figure 2.5 The cleavage kinetics of the ELP-tagged split intein system for super-folder green fluorescent protein (sfGFP) at pH 8.5 and pH 6.2 at different temperatures. (a) Normalized cleavage percentage of the reactions over time at pH 8.5 and pH 6.2 at 37 °C and 25 °C, determined by ImageJ; (b) Samples at different time points (0 hr, 1 hr, 2 hr and 5 hr) during the cleavage reaction at pH 8.5 and pH 6.2 at 37 °C and 25 °C on SDS-PAGE gels.

Purification examples using self-cleaving Elastin-like polypeptide (ELP) tag

The ELP tag used in this work is composed of 110 repeats of the pentapeptide block VPGXG, where X is Val, Ala, or Gly in a 5:2:3 ratio. In a previously developed ELP-based purification method from our lab, we used the ΔI-CM mini-intein derived from the Mycobacterium tuberculosis RecA intein as the self-cleaving element, and artificially split it into a 110-aa N-fragment and a 58-aa C-fragment[17]. In that work, both intein fragments were fused to ELP tags, while the C-fragment was also fused to the N-terminus of the target protein. The two fragments were expressed separately and could be either purified separately and mixed, or combined before cell lysis and co-purified in a single mixture, in both cases eliminating premature cleavage during expression. Although that method was successfully demonstrated, the affinity of the two intein fragments for each other and the cleavage activity of the artificial split intein were much lower than those of the contiguous ΔI-CM parent intein. More importantly, fusions to the intein fragments tended to form aggregates during expression, which is often observed with artificially split intein systems[77-79]. It has also been hypothesized that the ELP tag places a significant metabolic burden on the expressing cells, and therefore may decrease overall expression of the target protein in ELP fusions.

The Npu intein of the current strategy, however, is naturally split. It tends to fold much better in fusion to various target proteins and has exhibited rapid splicing and cleaving in some previously reported systems[22,80]. Because the Npu intein fragments fold better, the fusion between the intein C-fragment and the precursor protein no longer requires a solubility tag, which was the main function of the ELP tag in the ΔI-CM artificially split intein system. Instead, the 36-aa Npu C-fragment alone acts as a small affinity tag for the target protein, where it is captured by the aggregating ELP-intein N-fragment through quick and tight association. Consequently, the current design is likely to increase the final yields and extend the capability of the system to purify a wider range of target proteins.
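The ELP composition stated above (110 repeats of VPGXG with X as Val, Ala, or Gly in a 5:2:3 ratio) implies a fixed number of each guest residue. The short sketch below works through that arithmetic; the function name is hypothetical, and it assumes the ratio applies exactly across all 110 repeats, which the text does not state explicitly.

```python
def elp_guest_counts(n_repeats=110, ratio=(("Val", 5), ("Ala", 2), ("Gly", 3))):
    """Split the guest position X of VPGXG across residues by a given ratio.

    Assumes the ratio holds exactly over all repeats (an illustration only).
    """
    total_parts = sum(parts for _, parts in ratio)
    counts = {}
    for residue, parts in ratio:
        count, remainder = divmod(n_repeats * parts, total_parts)
        if remainder:
            raise ValueError("ratio does not divide the repeat number evenly")
        counts[residue] = count
    return counts


guest_counts = elp_guest_counts()   # guest residues over 110 repeats
elp_length_aa = 110 * len("VPGXG")  # total residues in the ELP repeat region
print(guest_counts, elp_length_aa)
```

Under these assumptions the tag contains 55 Val, 22 Ala and 33 Gly guest residues in a 550-residue repeat region.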

A schematic of the purification method based on the self-cleaving ELP tag is shown in Figure 2.3. The self-cleaving aggregating tag (ELP-NpuN*) and the tagged target protein of interest (NpuC*-POI) were initially expressed separately and their clarified lysates were then mixed together to capture the target protein onto the ELP tag segment. Due to the strong affinity between the N- and C-fragments of the Npu DnaE intein[81], both the aggregating tag and the precursor protein are precipitated upon addition of 0.4 M ammonium sulfate and thus separated from the host cell proteins and impurities. The assembly of the two intein fragments and purification of the precursor proteins were performed in buffers at pH 8.5, after which the complex was solubilized in a low salt buffer at pH 6.2 to accelerate the cleavage rate. After 5 hours of incubation at low pH, the cleaved ELP tag was separated from the product protein through another round of precipitation, allowing the eluted product to be collected in the supernatant. To demonstrate the general applicability of the ELP-tagged split intein method for purification, four target proteins of various sizes (super-folder green fluorescent protein, β-lactamase, streptokinase and maltose-binding protein) were purified (Figure 2.6).


Figure 2.6 Purification results using the ELP self-cleaving tag for different target proteins by Coomassie staining. (a) Super-folder green fluorescent protein (sfGFP); (b) β-lactamase (β-lac); (c) Streptokinase (SK); (d) Maltose-binding protein (MBP). Lanes: L: protein ladder; WL: whole lysate; CL: clarified lysate; W1: supernatant of first precipitation; W2: supernatant of second precipitation; 0 hr: start of cleavage reaction at pH 6.2, 37 °C; 1 hr: after 1 hour of cleavage; 2 hr: after 2 hours of cleavage; 5 hr: after 5 hours of cleavage; E: elution sample (recovered protein in the supernatant).


All four target proteins could be purified successfully using the self-cleaving ELP tag, and each showed no premature cleavage in vivo but fast cleavage upon dissolution in low pH buffer. The yield, specific activity and recovery of each target are summarized in Table 3. All of them showed a 4 to 8-fold increase in yield compared to the previous dual ELP-tagged split ΔI-CM purification system, with yields of approximately 220 to 390 µg per mL of shake flask culture (Table 3).

Table 3 Summary of purification results by using ELP self-cleaving tag

Aggregating tag   Product protein   Yield 1 (µg/mL)   Specific activity (Unit/mg)           Recovery 2
ELP               sfGFP             338.5 ± 45.5      Fluorescent under 485 nm excitation   28.6 ± 0.1%
ELP               β-lactamase       248.1 ± 43.2      167.2 ± 13.3                          79.0 ± 4.4%
ELP               Streptokinase     389.0 ± 65.3      22600.5 ± 1851.9                      30.3 ± 0.5%
ELP               MBP               223.0 ± 62.2      Binds maltose resin                   NR 3

1 Yield is defined as µg recovered protein of interest per mL shake flask culture by using 1 mL of N-part clarified lysate; 2 Recovery is based on activity for elution and total cell lysate; 3 Not reportable.

2.3.2 On-column purification for proteins with disulfide bonds using the engineered self-cleaving tag

As mentioned in earlier sections, pH-induced inteins eliminate one of the most significant disadvantages limiting the application scope of intein-based purification methods: the required addition of reducing reagents. These reagents, such as DTT at high concentration (greater than 10 mM), are frequently used to reduce the disulfide bonds of proteins and, more generally, to prevent intramolecular and intermolecular disulfide bonds from forming between cysteine residues. Although they are effective in inducing the intein cleavage reaction, they also make intein-based methods unsuitable for the purification of complex proteins with disulfide bonds.

As a demonstration that the split intein-mediated purification system is capable of purifying disulfide-bonded proteins, we present a case study in which the popular biosimilar target, Granulocyte-Colony Stimulating Factor (G-CSF), was chosen as a purification target. Since G-CSF contains two disulfide bonds, the established manufacturing process for the commercial product requires the expression of G-CSF in inclusion bodies in E. coli followed by refolding into the active state. To eliminate the need for a prolonged and complex refolding process, the NpuC*-tagged G-CSF was expressed in SHuffle T7 cells, an engineered E. coli K12 strain that promotes proper disulfide bond formation in the cytoplasm. The clarified lysate of the precursor protein was loaded onto the in-house packed split intein column using an AKTA pure 25. This was followed by a wash step at pH 8.5 to remove unbound materials and impurities, and the buffer environment was then switched to pH 6.2 by running 10 CV of the cleavage buffer through the column. The flow was then stopped and the intein column was incubated for 20 hours at room temperature to allow the intein cleavage reaction to complete. The elution fractions were collected and further analyzed. Figure 2.7 shows the AKTA chromatograms for the purification process. Samples were also collected during each step of the purification for SDS-PAGE analysis. Figure 2.8 shows the high sensitivity silver-stained gel for the collected reducing and non-reducing samples.

The strong band in the elution fractions matched the predicted size of G-CSF (18.8 kDa). The purity after this single-step purification was over 95% based on the silver-stained gel. In the non-reducing samples, prepared without the addition of β-mercaptoethanol in the loading dye, additional faint bands appeared around the predicted size of G-CSF dimers and potentially higher aggregates. The higher molecular weight products or the host cell impurities could be further removed by incorporating additional polishing steps, such as anion exchange chromatography.

Figure 2.7 Chromatograms for monitoring the purification process on the intein column using AKTA pure 25. Phases labeled in the original figure: binding phase, pH 8.5 buffer wash, pH 6.2 buffer wash, 20 hours, and elution.


Figure 2.8 High sensitivity silver staining SDS-PAGE gel for G-CSF purification using self-cleaving purification system, with reducing and non-reducing samples. CL: clarified lysate; FT: flow through; W8.5: wash with pH 8.5 buffer; W6.2: wash with pH 6.2 buffer; E: Elution fraction; FB: pooled and dialyzed into formulation buffer. Bands marked in the original figure: purified G-CSF and G-CSF dimers; ladder positions shown: 35, 25, 18 and 10 kDa.

Meanwhile, we also ran analytical size exclusion chromatography (SEC) to confirm the aggregation level of the final purified product. The elution fractions were pooled and dialyzed into the reported formulation buffer for G-CSF (10 mM sodium acetate, 5% sorbitol, 0.004% Tween 20) for storage. Figure 2.9 shows the overlay of SEC chromatograms for the purified sample in formulation buffer and the formulation blank, run on the Acclaim SEC-1000 column. The result showed that more than 99% of the purified material was in its monomeric state, indicating that the product recovered from the split intein-mediated purification was highly stable and pure. The purification conditions and the additional 20 hours of incubation on the intein column did not trigger aggregation of the G-CSF molecule during the process.

Figure 2.9 SEC chromatograms overlay of purified sample and formulation blank.

To further test the activity of the G-CSF recovered from the split intein-mediated purification, the standard NFS-60 cell proliferation assay was performed. Active G-CSF stimulates the growth of NFS-60 cells, and the resulting proliferation can be quantified with the MTS assay by following the manufacturer's protocol and measuring the absorbance at 490 nm. The G-CSF standard was purchased from PeproTech for comparison, with a reported activity of > 1 × 10^7 units/mg. Figure 2.10 shows the in vitro study of G-CSF activity using the NFS-60 cell proliferation assay. The assay was performed in triplicate with a series of dilutions of G-CSF, ranging from 0.1 pM to 10,000 pM, and untreated cells as control. The purified material recovered from the split intein-mediated purification showed at least 10 times higher activity compared to the purchased standard, indicating an activity of > 1 × 10^7 units/mg. The lower activity of the standard sample could be attributed to potential activity loss during the storage and reconstitution process.
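The concentration range used in the assay can be laid out as a simple dilution series. The sketch below assumes 10-fold steps, which the text does not state; only the 0.1 pM and 10,000 pM endpoints come from the assay description, and the function name is hypothetical.

```python
def dilution_series(start_pm, stop_pm, fold=10):
    """List concentrations (pM) from start_pm up to stop_pm in `fold`-fold steps.

    fold=10 is an assumed step size; the text only gives the two endpoints.
    """
    if start_pm <= 0 or stop_pm < start_pm or fold <= 1:
        raise ValueError("need 0 < start_pm <= stop_pm and fold > 1")
    series = []
    step = 0
    while True:
        conc = start_pm * fold ** step
        if conc > stop_pm * (1 + 1e-9):  # small tolerance for float rounding
            break
        series.append(round(conc, 6))
        step += 1
    return series


gcsf_series_pm = dilution_series(0.1, 10_000)
print(gcsf_series_pm)
```

With these assumptions the series has six points spanning five orders of magnitude; a finer step (for example 3-fold) would give more points across the same range.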

Figure 2.10 In vitro study of G-CSF protein activity using the NFS-60 cell proliferation assay with comparison to standard purchased from PeproTech.


In summary, based on all of the characterization presented in this work, the recovered material from the split intein-mediated purification was stable and potent with a very high purity.

2.4 Discussions

In the non-chromatographic purification method that we developed, we utilize the ELP aggregating tag with the engineered Npu split intein. The self-cleaving aggregating tag brings the target protein into an insoluble phase, allowing it to be purified by simple centrifugation and washing. Subsequent intein cleavage delivers a purified native target in the soluble phase for simple recovery. Other reported systems using aggregating tags include ELK16, L6KD, FK and FR, which have also been fused to the contiguous ΔI-CM intein[82]. When expressed with target proteins, these peptides led to the formation of active aggregates, and the intein cleavage could be used to recover the proteins in solution. Although these systems are very effective for purification of unstable peptides, they resulted in low yields due to premature cleavage by the ΔI-CM intein.

Strategies similar to those described in this work could be adopted with the engineered pH-sensitive split intein to improve the purification results. This column-free purification method could also find useful applications in the production of commodity products, where lower costs are required and relatively lower purities are acceptable compared to protein therapeutics.


In the chromatographic purification method that we developed, we utilize covalent immobilization of the intein N-fragment onto agarose beads, and a small intein C-fragment tag attached to the protein of interest. This self-cleaving tag mediated purification system allows highly specific binding of the target protein on-column, followed by the cleavage reaction at pH 6.2 to release the purified product. Because reducing agents are no longer needed to induce the cleavage reaction, this is the first reported system that utilizes a split intein for purification of proteins with disulfide bonds.

Moreover, compared to previously reported systems, the covalent immobilization of the intein N-fragment allows the intein resin to be regenerated under harsh conditions without the ligand leaching into solution and being lost from the resin over multiple uses. It has been demonstrated that the resin can be regenerated using 6 M guanidine chloride solution at least 20 times. Thus, the overall resin cost per batch of recombinant protein production can be reduced. This on-column purification method could be applied to the production of protein therapeutics at manufacturing scale in the future, which generally requires higher purity and a better controlled purification process.

Additionally, since each fragment of the split intein alone is incapable of self-cleaving, these methods completely eliminate premature cleavage in vivo, but still provide rapid and pH-sensitive cleavage in vitro during the purification process. Further, due to the small size of the C-fragment of the intein (~3.6 kDa), the methods generally provide higher yields compared to systems where tags were expressed in fusion to target proteins[17,71].

When comparing the cleavage rates of the purified proteins, SK clearly showed the fastest cleavage and β-lac the slowest. The observed differences in the cleavage rates of different target proteins can be attributed to the different C-exteins at the start of the target protein sequences, an effect that has been observed in several different systems[22,83]. This will be further discussed in Chapter 4. Therefore, for specific targets, these methods could be optimized by varying the incubation time and/or temperature, possibly resulting in a shorter processing time or an improved yield. In some cases, making small changes to the initial amino acids of the target protein may also help to increase cleaving rate and controllability.

These methods could potentially be applied to mammalian and other eukaryotic expression systems that require higher expression temperatures or extended expression times for more complex proteins. The elimination of premature cleavage and the use of mild cleavage conditions enable the production and purification of important therapeutics with complex glycosylation patterns. This potential application will be further explored in Chapter 3.

In conclusion, the work presented and discussed in this Chapter provides a column-free and an on-column purification strategy, respectively, as platform technologies for the production of recombinant proteins. They are simple and cost-effective alternatives to the current downstream purification strategies.


Chapter 3 Expression and purification of protein therapeutics from mammalian cell expression with self-cleaving tag

3.1 Introduction

Recombinant DNA technologies, developed in the early 1970s, revolutionized the way that medicines are developed and gave rise to the modern biopharmaceutical industry. Since the introduction of the first recombinant protein therapeutic, insulin, in 1982, more than 130 recombinant protein therapeutics have been approved and become available as extremely valuable medical treatments[84,85]. The development of monoclonal antibodies has been a dominant force in this growth and represents more than one-third of all approvals by the Food and Drug Administration (FDA), but there is also a strong emergence of non-antibody therapeutics. Generally, these proteins are synthesized by large-scale cultivation of genetically engineered host cells, which include microbial cells, insect cells or mammalian cells. For protein therapeutics to be effective, they need to be synthesized in biologically active forms with proper folding and sometimes post-translational modifications. As shown in Figure 3.1, mammalian cells are the most preferred platform in the biopharmaceutical industry for the currently approved biological drugs[86]. There are two main reasons for this preference. Most importantly, they are more advanced than prokaryotes and have the appropriate organelles to perform post-translational modifications, yielding fully functional and therapeutically safe proteins very similar to those naturally produced in the human body. Additionally, many bacterial cells have a high endotoxin level that may be pathogenic when used for therapeutic purposes in humans. This requires an additional endotoxin removal step, as well as a demonstration that the endotoxin level in the final product is below the pathogenic threshold.

Figure 3.1 Distribution of expression systems for recombinant proteins and antibodies currently approved in the US or EU.


When using mammalian cell lines, most proteins are secreted into the media, so cell lysis is not required to extract the protein of interest as it is with prokaryotes. The most common mammalian cell lines used for protein therapeutics production include Chinese hamster ovary (CHO) cells, human embryonic kidney 293 (HEK293) cells and murine cells (NS0 and Sp2/0)[87]. Generally, translated target proteins first enter the endoplasmic reticulum (ER) before transport into the Golgi complex, a process directed by a signal peptide at the N-terminus (Figure 3.2). These signal peptides are normally between 15 and 30 amino acids in length and are cleaved off the target by signal peptidase once they have served their purpose[88]. Signal peptide sequences are very diverse in length and composition and are found in both prokaryotes and eukaryotes[89]. However, it was later found that they share a conserved overall structure. Signal peptides usually begin with a short positively charged amino acid sequence termed the "n-region". The core of the signal peptide is a longer sequence containing about 5 to 16 hydrophobic amino acids, which tends to form a single alpha helix and is termed the "h-region". They usually end with several amino acids that are recognized and cleaved by signal peptidase[88]. In addition to these common structures, proline residues are virtually absent at the end of the leader sequence and the beginning of the mature peptide[90]. The highly diverse primary sequence, however, suggests that the signal peptide plays an important role in regulating the secretion level of a particular protein from the cells. Moreover, the expression level of the protein is very likely to be adjustable by replacing the signal peptide.
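The n-region and h-region layout described above can be illustrated on the three base leader sequences listed in Table 4 (IK, AZ and SA). The sketch below is a crude heuristic rather than a signal peptide predictor: the hydrophobic residue set (A, V, L, I, F, W, M), the choice of the longest uninterrupted hydrophobic run as the h-region core, and all function names are assumptions made for illustration.

```python
# Assumed hydrophobic residue set; not taken from the text.
HYDROPHOBIC = set("AVLIFWM")

# Base leader sequences as listed in Table 4 of this work.
LEADERS = {
    "IK": "METDTLLLWVLLLWVPGSTGD",
    "AZ": "MTRLTVLALLAGLLASSRA",
    "SA": "MKWVTFISLLFLFSSAYS",
}


def longest_hydrophobic_run(seq):
    """Return (start index, substring) of the longest hydrophobic stretch."""
    best_start, best_len = 0, 0
    run_start = None
    for i, aa in enumerate(seq + "-"):  # sentinel closes a trailing run
        if aa in HYDROPHOBIC:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    return best_start, seq[best_start:best_start + best_len]


def summarize_leader(seq):
    """Split a leader into the part before the hydrophobic core and the core."""
    start, core = longest_hydrophobic_run(seq)
    n_region = seq[:start]
    positives = sum(n_region.count(aa) for aa in "KR")
    return {"n_region": n_region, "h_core": core, "n_region_positive": positives}


for name, seq in LEADERS.items():
    print(name, summarize_leader(seq))
```

With this residue set, the hydrophobic cores picked out are LLLWVLLLWV for IK, VLALLA for AZ and LLFLF for SA, all within the 5 to 16 residue range quoted above. A different hydrophobicity definition would shift these boundaries.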


Figure 3.2 Mammalian cell secretion pathway.

We have validated the purification strategies for the production of recombinant protein using the engineered self-cleaving tag from E. coli expression in Chapter 2. Therefore, it is evident that the work needs to be extended to demonstrate the compatibility of the split-intein mediated purification system with mammalian cell expressed protein therapeutics. This will greatly expand the application scope of the technology.

No previous intein-mediated purification system has reported successful purification of proteins expressed from mammalian cells. Premature cleavage in vivo and the requirement for a high concentration of reducing reagent are the biggest obstacles for this particular application. We have successfully addressed these issues by developing the chromatographic method using the Npu split intein described in this work, which provides a controlled and pH-responsive platform for purification of recombinant proteins. However, another challenge arose regarding secretion and post-translational modification of target proteins tagged with NpuC* in mammalian cells. It has been hypothesized that one of the limiting steps in the secretory pathway is the translocation of secretory proteins into the lumen of the ER[91]. When the intein C-fragment (NpuC*) was inserted between target proteins and their native signal peptides, previous results indicated that the precursor proteins showed no secretion or a very low level of expression from mammalian cells. This could possibly be attributed to electrostatic interference between the signal peptide and the NpuC* fragment due to the positively charged residues on the surface of NpuC*. Studies have also shown that sequences beyond the cleavage site of the signal peptide can influence its function[92]. Moreover, it is common practice to improve the secretion level of a target protein by adopting a foreign signal peptide[92-94]. It has been demonstrated that alternative signal peptides (leader sequences) can lead to increased protein secretion and that many eukaryotic signal peptides are functionally interchangeable between different species. Based on these observations, it is very likely that we can identify potent signal peptides that can be fused to intein-tagged precursor proteins for improved secretion levels. Therefore, the goal of this work is to design and screen for optimal leader sequences that lead to high secretion levels of recombinant proteins expressed in mammalian cells while remaining compatible with our split intein purification system. To investigate this potential, we used the model protein secreted alkaline phosphatase (SEAP), a protein that is itself normally secreted and glycosylated, tagged with NpuC*. Through the screening of twelve leader sequences, we identified several optimal leader sequences for enhancing the secretion of SEAP. The optimal leader sequence was further tested in scale-up expressions of five different target proteins. The glycosylation profile and the compatibility with the split intein-mediated purification were also demonstrated.

3.2 Materials and Methods

3.2.1 Plasmid construction

The expression vector is the pTT vector, an oriP-based vector with an improved cytomegalovirus expression cassette[95]. The IRES-GFP fragment was amplified and inserted into the pTT vector using two unique restriction enzymes, AfeI and PacI. The signal peptides were added into the plasmid pTT-IK-NpuC*-SEAP by inverse PCR. Expression plasmids for NpuC*-tagged interferon alpha 2b (IFNα2b), erythropoietin (EPO), tissue plasminogen activator (tPA), and granulocyte-colony stimulating factor (G-CSF) were constructed by Gibson assembly into the pTT vector backbone. The list of primers used in this work is shown in Appendix A, Table 11.

3.2.2 Recombinant protein expression in mammalian cell system

The mammalian cell line HEK293EBNA was purchased from ATCC (ATCC CRL-10852™). The frozen cell vial was taken from the liquid nitrogen tank and recovered by rapid thawing in a 37 °C water bath. The cells were added to and cultured in Dulbecco's modified Eagle's medium (DMEM) supplemented with 8 mM L-glutamine and 10% fetal bovine serum in T-25/75 cell culture flasks. The cells were cultured at 37 °C with 5% CO2 in a humid incubator. For subculturing, old media were removed and 2-5 mL of trypsin-EDTA solution was added. The flask was allowed to sit at 37 °C for about 5 min or until the cells detached. The cell mixture was transferred to a 15 mL Falcon tube and centrifuged at 100 × g for 5 min at room temperature, and the trypsin-EDTA solution was removed completely by aspiration. Fresh complete media was added, and the mixture was dispensed into new culture flasks at a subcultivation ratio of 1:4 to 1:10. Subculturing was performed twice per week to maintain the cell line until the cells reached the 30th generation. For transient transfection, the cells were passaged the day before, and the confluency on the day of transfection was in the range of 50%-70%.

Generally, 1 µg of sterile plasmid DNA was transfected per 1 mL of expression media. Plasmid DNA was added into serum-free media at 1/20 of the expression volume. The transfection reagent, polyethylenimine (PEI), was added into a separate tube with the same amount of serum-free media. The tubes containing DNA and PEI were combined at a final ratio of 1:3 (w/w). The DNA-PEI mixture was incubated at room temperature for 10 min and then added into the culture media. The expressed protein was then harvested 3 to 7 days after transfection.
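The transfection recipe above scales linearly with expression volume, so it can be written as a small calculation. The sketch below is an illustration only: the function and key names are hypothetical, and the 1:3 (w/w) ratio is assumed to mean DNA:PEI.

```python
def pei_transfection_mix(expression_volume_ml, dna_ug_per_ml=1.0,
                         pei_per_dna=3.0, sfm_divisor=20):
    """Scale the DNA/PEI recipe to a given expression volume.

    Defaults follow the text: 1 ug DNA per mL of expression media,
    serum-free media (SFM) at 1/20 of the expression volume in each tube,
    and a 1:3 (w/w) ratio, assumed here to be DNA:PEI.
    """
    if expression_volume_ml <= 0:
        raise ValueError("expression volume must be positive")
    dna_ug = expression_volume_ml * dna_ug_per_ml
    sfm_ml = expression_volume_ml / sfm_divisor
    return {
        "dna_ug": dna_ug,
        "pei_ug": dna_ug * pei_per_dna,
        "sfm_for_dna_ml": sfm_ml,   # tube 1: DNA in serum-free media
        "sfm_for_pei_ml": sfm_ml,   # tube 2: PEI in serum-free media
    }


mix_20ml = pei_transfection_mix(20)
print(mix_20ml)
```

For a 20 mL expression this gives 20 µg DNA and 60 µg PEI, each diluted in 1 mL of serum-free media before the two tubes are combined.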

The suspension-adapted HEK293 cell line (Expi293F) was purchased from Thermo Fisher. Cells growing in 12-well plates or 125 mL flasks were transfected at a cell density of 2.5 × 10^6 cells/mL using ExpiFectamine 293 according to the manufacturer's instructions. Following transfection, the cells were grown at 37 °C with shaking at 200 rpm in a 5% CO2 humid incubator for five days.

3.2.3 Western blot analysis

Protein samples were incubated with 2X SDS-PAGE loading dye at 98 °C for 5 min before loading onto an SDS-PAGE gel, alongside a prestained protein molecular weight marker (Thermo Fisher). The gels were run at 200 V for 45 min and the proteins were subsequently transferred onto PVDF membranes using the wet transfer method at 300 mA for 90 min. The membranes were blocked in 5% nonfat milk-PBS blocking buffer at room temperature for at least one hour or at 4 °C overnight. This was followed by at least one hour of incubation with mouse anti-His primary antibody (Thermo Fisher) at 1:10,000 dilution and then at least 30 min with horseradish peroxidase (HRP)-labeled goat anti-mouse secondary antibody at 1:10,000 dilution. Between antibody incubations, the membranes were washed three times with PBST buffer (1X PBS, 0.1% Tween 20). The membranes were incubated with SuperSignal West Femto Maximum Sensitivity Substrate (Thermo Fisher) for 5 min and then scanned on a LI-COR C-DiGit blot scanner to visualize the protein bands.

3.2.4 Recombinant protein purification using intein column

After harvesting from mammalian cell culture, the media was clarified at 1,000 × g for 5 min to remove dead floating cells. The clarified media was either buffer exchanged into intein column buffer using Amicon Ultra centrifugal filters with a molecular weight cut-off (MWCO) of 30 kDa or diluted 5 times into intein column buffer (20 mM AMPD, 20 mM PIPES, 200 mM NaCl, pH 8.5). It was then loaded onto the intein column to capture the precursor proteins. After binding, the column was washed with 15 column volumes (CV) of column buffer to remove any host cell impurities. After washing, 15 CV of cleavage buffer (20 mM AMPD, 20 mM PIPES, 200 mM NaCl, pH 6.2) was passed through to change the buffer environment of the resin. Then, 1 CV of the cleavage buffer was added on top of the resin with the bottom of the column capped to induce the intein C-terminal cleavage reaction. The column was incubated at room temperature for 5 hours. The cleaved product was then collected from the column, together with two additional elution fractions.

3.2.5 N-Glycosylation determination

Peptide-N-Glycosidase F, also known as PNGase F, is an amidase that cleaves N-glycans from glycoproteins. 9 µl of purified protein was incubated with 1 µl of glycoprotein denaturing buffer (10X) at 100 °C for 10 min. The denatured sample was chilled on ice and centrifuged for 10 seconds. Then, 2 µl GlycoBuffer 2 (10X), 2 µl 10% NP-40, 6 µl H2O and 1 µl PNGase F were added into the reaction and mixed gently. The reaction mixture was incubated at 37 °C for 1 hour. The extent of deglycosylation was assessed by checking for mobility shifts on SDS-PAGE gels.
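As a quick check on the deglycosylation reaction above, the component volumes can be summed to give the final reaction volume and the working concentration of each stock. The sketch below is only bookkeeping on the volumes listed in the text; the variable and function names are hypothetical.

```python
# Component volumes (in microliters) as listed in the protocol text.
PNGASE_F_REACTION_UL = [
    ("purified protein", 9.0),
    ("glycoprotein denaturing buffer (10X)", 1.0),
    ("GlycoBuffer 2 (10X)", 2.0),
    ("10% NP-40", 2.0),
    ("H2O", 6.0),
    ("PNGase F", 1.0),
]


def total_volume_ul(components):
    """Sum the volumes of all reaction components."""
    return sum(volume for _, volume in components)


def final_concentration(stock, volume_ul, total_ul):
    """Dilute a stock (any unit, e.g. 'X' or '%') into the final volume."""
    return stock * volume_ul / total_ul


total_ul = total_volume_ul(PNGASE_F_REACTION_UL)
np40_percent = final_concentration(10.0, 2.0, total_ul)    # from the 10% stock
glycobuffer_x = final_concentration(10.0, 2.0, total_ul)   # from the 10X stock
print(total_ul, round(np40_percent, 2), round(glycobuffer_x, 2))
```

The listed volumes add up to 21 µl, so NP-40 and GlycoBuffer 2 end up slightly below 1% and 1X respectively.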


3.3 Results

3.3.1 Expression vector construction

The cell line used in this study is a human embryonic kidney (293) cell line stably transformed with the EBNA-1 gene (HEK293E), whose expressed protein drives episomal replication of oriP-containing plasmids. Moreover, the cell line carries an integrated Ad5 E1a/E1b fragment that enhances transcription of CMV promoter driven transgenes[95]. The expression vector was derived from the oriP-containing pTT vector, which is specifically designed for HEK293E cell transfection. NpuC*-tagged SEAP (secreted embryonic alkaline phosphatase) was used as the reporter to monitor productivity through determination of the SEAP secretion level and its glycosylation status. To account for well-to-well and batch-to-batch variability in transfection efficiency, an additional internal ribosome entry site (IRES) taken from encephalomyocarditis virus (EMCV) was inserted to allow the expression of two genes from a single vector. The insertion of the green fluorescent protein (GFP) gene downstream of the IRES thus enables us to normalize for transfection efficiency through a GFP fluorescence assay during the screening process. As shown in Figure 3.3, with the integration of the IRES-GFP region, successfully transfected cells produced green fluorescence under excitation in the GFP channel of the fluorescence microscope. This can be used as a quick and qualitative method to check the level of transfection for a particular expression. By conducting a quantitative green fluorescence assay on the cell lysate after expression, different transfection efficiencies and different mRNA levels can be compensated for by normalizing the secretion levels of the protein to the GFP signals. This is especially important when comparing results obtained from small sample volumes in well plates.
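The normalization described above amounts to dividing each secretion signal by the GFP signal from the same well. The sketch below shows one way to do this and to express the result relative to a reference construct; the function name, the choice of IK as the reference, and the example numbers are all hypothetical and are not data from this work.

```python
def normalize_secretion(secretion, gfp, reference="IK"):
    """Normalize secretion signals to GFP and scale to a reference construct.

    secretion, gfp: dicts mapping construct name -> measured signal
    (arbitrary units). Returns secretion/GFP relative to the reference.
    """
    ratios = {}
    for name, signal in secretion.items():
        if gfp[name] <= 0:
            raise ValueError(f"non-positive GFP signal for {name}")
        ratios[name] = signal / gfp[name]
    ref_ratio = ratios[reference]
    return {name: ratio / ref_ratio for name, ratio in ratios.items()}


# Made-up signals for illustration only (not measurements from this work).
example_secretion = {"IK": 100.0, "SA": 300.0}
example_gfp = {"IK": 50.0, "SA": 75.0}
relative = normalize_secretion(example_secretion, example_gfp)
print(relative)
```

In this made-up example the raw SA signal is three times the IK signal, but after correcting for its higher GFP signal the relative secretion is twofold, which is the kind of adjustment the internal GFP control is meant to provide.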

Figure 3.3 Insertion of IRES-GFP fragment enables qualitative and quantitative determination of transfection efficiency.

Before screening for optimal leader sequences, the baseline of expression and purification, as well as the effect of the insertion of the IRES-GFP region, were first tested. The signal peptide from IgG kappa light chain (IK) was used previously in our lab with the contiguous intein, ΔI-CM, and was therefore used in this preliminary study. Expressions were carried out with the pTT-IK-NpuC*-SEAP constructs with and without the insertion of the IRES-GFP region. Based on the results shown in Figure 3.4, the insertion of the IRES-GFP fragment did not have a significant effect on the expression level of the intein-tagged precursor protein.

Figure 3.4 Test of baseline expression of the current leader sequence from IgG kappa light chain and the effect of inserting IRES-GFP. 20 mL of each expression material was purified using 200 µl NpuN resin, and resin samples were taken to monitor the cleavage at 0 hr, 5 hr and overnight at room temperature.


However, the secretion levels of both constructs were very low, with the majority of the protein trapped in the cell pellets. This indicates that the secretory pathway of the proteins was severely compromised. After loading the expression media onto the intein column for purification, we observed a small size shift of the proteins after 5 hours or overnight incubation at pH 6.2 and room temperature. This shift represented the intein cleavage reaction from precursor to untagged target protein. No elution bands were detected after purification by western blot imaging, most likely due to the extremely low level of protein captured on the intein column. Therefore, there is a considerable need to improve the expression and secretion levels in order to apply this Npu split intein-mediated purification method to the production of more therapeutically relevant proteins.

3.3.2 Leader sequence screening in 12-well plates

Based on a literature search, three leader sequences reported by Kober and coworkers to give the highest secretion levels were selected for this study: the Ig kappa light chain leader sequence from Mus musculus (IK), the serum albumin leader sequence from Homo sapiens (SA), and the azurocidin leader sequence from Homo sapiens (AZ)[93]. Furthermore, we incorporated different lengths of their native amino acids following the leader sequences. These constructs were tested to see whether the addition of a spacer between the signal peptide and the NpuC* tag would help eliminate the potential electrostatic disruption to the secretion pathway. The spacers inserted in the constructs were 10, 20 or 30 amino acids long and were taken from the original sequences downstream of the signal peptides. A summary of the leader sequences used in this work is provided in Table 4. Since the IgG kappa light chain signal peptide (IK) was previously used for expressing contiguous intein fusion constructs in mammalian cells, it was set as the control of the study. A total of twelve different leader sequences were cloned into the expression vector. The HEK293E cells were transiently transfected with these expression constructs in the 12-well plate format, with each well containing 1 mL DMEM media supplemented with 10% fetal bovine serum and L-glutamine.

Table 4 Summary of the leader sequences used in this work

Leader sequence   Amino acid sequence                                  Protein source
IK                METDTLLLWVLLLWVPGSTGD                                Mouse Ig kappa light chain
IK+10             METDTLLLWVLLLWVPGSTGDIVLTQSPAS                       Mouse Ig kappa light chain
IK+20             METDTLLLWVLLLWVPGSTGDIVLTQSPASLAVSLGQRAT             Mouse Ig kappa light chain
IK+30             METDTLLLWVLLLWVPGSTGDIVLTQSPASLAVSLGQRATISCRASQSVS   Mouse Ig kappa light chain
AZ                MTRLTVLALLAGLLASSRA                                  Human Azurocidin preproprotein
AZ+10             MTRLTVLALLAGLLASSRAGSSPLLDIVG                        Human Azurocidin preproprotein
AZ+20             MTRLTVLALLAGLLASSRAGSSPLLDIVGGRKARPRQFP              Human Azurocidin preproprotein
AZ+30             MTRLTVLALLAGLLASSRAGSSPLLDIVGGRKARPRQFPFLASIQNQGR    Human Azurocidin preproprotein
SA                MKWVTFISLLFLFSSAYS                                   Human serum albumin preproprotein
SA+10             MKWVTFISLLFLFSSAYSRGVFRRDAHK                         Human serum albumin preproprotein
SA+20             MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDL               Human serum albumin preproprotein
SA+30             MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVL     Human serum albumin preproprotein


After transient transfection of these twelve constructs in triplicate, phase contrast and GFP channel images were taken and overlaid using a fluorescence microscope coupled with a Nikon camera (Figure 3.5a). All samples were harvested after 72 hours of expression. The HEK293E cells were then detached from the plate surface and washed with PBS buffer. The cell pellets were lysed by sonication and a GFP assay was performed on the clarified lysates. The results for the triplicate runs are shown in Figure 3.5b.

Figure 3.5 (a) Overlay of phase contrast channel and GFP channel for 12 constructs with different leader sequences and different lengths of spacing after 72 hours expression in 12-well plate; (b) GFP assay results for three replicates of 12-well expression.


According to the results, the transfection efficiency varied widely among the different constructs. The variability could come from many sources, including well position, differences in the quality of the DNA samples, and batch-to-batch differences. Therefore, an internal control is necessary for a more accurate comparison.

Meanwhile, western blots were performed on samples taken from the culture media of each well. As depicted in Figure 3.6, the secretion level was strongly dependent on the leader sequence used. Statistical analysis of the results in JMP using Dunnett’s test against the control showed that six out of twelve leader sequences gave a significant improvement compared to the control leader sequence IK. We observed about a 4- to 10-fold increase in normalized secretion level with these alternative leader sequences. Interestingly, the effect of the spacers varied among the different signal peptides. Spacers of 20 amino acids or more completely abolished secretion in the case of IK and SA, whereas the spacers improved secretion in the case of AZ. This indicates that some spacers might introduce additional disruption in the secretion pathway, potentially due to secondary structure formation, while others might help reduce the charge interaction between the signal peptide and the intein tag.
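The comparison of leader sequences rests on normalizing each well's secretion signal to its transfection level. The Python sketch below shows one way this could be computed, assuming that the normalized secretion level is the western blot band intensity divided by the GFP signal of the same well and that fold change is taken relative to the IK control. The text does not spell out the exact formula, and the function names and all numbers are hypothetical placeholders rather than measured data.

```python
from statistics import mean

def normalized_secretion(band_intensity, gfp_signal):
    """Secretion signal per unit of GFP internal-control signal (assumed definition)."""
    if gfp_signal <= 0:
        raise ValueError("GFP signal must be positive")
    return band_intensity / gfp_signal

def fold_change_vs_control(samples, control="IK"):
    """samples maps construct name -> list of (band_intensity, gfp_signal) replicates."""
    means = {
        name: mean(normalized_secretion(band, gfp) for band, gfp in replicates)
        for name, replicates in samples.items()
    }
    return {name: value / means[control] for name, value in means.items()}

# Hypothetical placeholder values in arbitrary units, for illustration only.
example = {
    "IK": [(100.0, 50.0), (120.0, 60.0), (90.0, 45.0)],
    "SA": [(900.0, 55.0), (1000.0, 60.0), (800.0, 50.0)],
}
folds = fold_change_vs_control(example)
for name, fold in folds.items():
    print(f"{name}: {fold:.2f}-fold relative to IK")
```

Testing whether a fold change is statistically significant is a separate step, handled in this work by Dunnett's test in JMP.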


Figure 3.6 Western blot results and JMP analysis result of twelve constructs expression level.

One of the optimal leader sequences, SA, was used for mammalian cell expression of four additional target proteins to determine whether replacing the native signal peptide with a “universal” leader sequence could overcome expression issues in other cases. The additional target proteins were erythropoietin (EPO), tissue plasminogen activator (tPA), interferon (IFN) and granulocyte-colony stimulating factor (GCSF), which are all therapeutically relevant. As depicted in Figure 3.7, all precursor proteins with the intein tag showed strong bands in the expression media. Based on the amino acid sequences, the unglycosylated SEAP precursor is 62 kDa, the EPO precursor is 21 kDa, the tPA precursor is 63 kDa, and the IFN and GCSF precursors are both around 23 kDa. The apparent molecular weight of each target protein was higher than the mass calculated from its sequence, indicating glycosylation of all of them. For example, EPO has 3 N-glycosylation sites and 1 O-glycosylation site, even though it has a relatively small primary structure. Because of its glycosylation, EPO is reported to have an apparent molecular weight between 30 and 40 kDa.
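A sequence-based mass of the kind quoted for the precursors can be estimated by summing average residue masses and adding one water for the free termini. The Python sketch below illustrates that calculation. The function name is hypothetical, the residue masses are standard average values, and the example input is the SA signal peptide from Table 4 rather than any of the full precursor sequences, which are not reproduced in this chapter.

```python
# Average residue masses in Da (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # Da, added once for the N- and C-terminal groups

def sequence_mass_kda(sequence):
    """Average mass in kDa of an unmodified, linear polypeptide."""
    total = WATER + sum(RESIDUE_MASS[residue] for residue in sequence.upper())
    return total / 1000.0

sa_leader = "MKWVTFISLLFLFSSAYS"  # SA signal peptide as listed in Table 4
print(f"SA leader: {len(sa_leader)} residues, {sequence_mass_kda(sa_leader):.2f} kDa")
```

Glycans are not included in this estimate, which is why a glycosylated protein runs above its sequence-based mass on a gel.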

[Figure 3.7 image: lanes SEAP, EPO, tPA, IFN, GCSF; molecular weight marker labels at 80 kDa and 20 kDa]

Figure 3.7 Western blot of samples from HEK293F cells expression media for different intein-tagged target proteins.

3.3.3 Scale-up expression and purification

Scale-up expressions and purifications were done for the constructs with optimal leader sequences. Transiently transfected cells were harvested after 72 hours of expression and their supernatants were loaded onto the intein column for purification as described in the previous chapter. Samples collected during the processes were loaded onto SDS-PAGE gels and imaged by silver staining and western blot, respectively. The construct with leader sequence SA+0 yielded the better results, which are shown in Figure 3.8.

[Figure 3.8 image: two panels, Silver Staining Gel and Western Blot, each with lanes P, S, 2X, FT, W, 0hr, 0.5hr, 1hr, 2hr, 5hr, E1, E2, PNGaseF, PostE; molecular weight marker labels at 180, 130, 100, 70, 55, 40 and 35]

Figure 3.8 Silver staining gel and western blot results for 20 mL expression of NpuC*-SEAP construct with SA leader sequence in HEK293EBNA cells and purification using split-intein mediated system. P: Pellet; S: Supernatant; 2X: 2 times concentrated after buffer exchange; FT: flow through; W: wash pH 8.5 and pH 6.2 combined; 0hr: 0 hour resin sample; 1hr: 1 hour resin sample; 2hr: 2 hour resin sample; 5hr: 5 hour resin sample; E1: elution fraction 1 after 5 hours; E2: elution fraction 2 after 5 hours; PNGaseF: PNGaseF treatment of elution fraction 1 sample; PostE: resin sample after elution.


Due to the low expression titers from transient transfection of the adherent cell line and the abundance of bovine serum albumin in the culture media, the purity of the final product was low, as seen in the silver staining result. Moreover, as the molecular weight of SEAP is very close to that of bovine serum albumin, the identity of the bands needed to be verified on a western blot that specifically detects the His-tagged target protein SEAP. The western blot result indicated successful expression and intein-mediated purification, as we observed strong bands in the elution samples. Moreover, the size shift of the elution sample after treatment with PNGaseF, which digests N-glycans, indicated proper glycosylation of SEAP. Since the molecular weight of glycosylated SEAP is around 75 kDa, the separation between the precursor and the cleaved product, which differ by 3.6 kDa, could not be clearly detected even on 8% SDS-PAGE gels. However, the elution fractions recovered from the split intein-mediated purification indicated successful cleavage of the precursor to the tagless target protein.

The same leader sequence from human serum albumin protein (SA) was also tested for the expression of intein-tagged interferon alpha 2b protein (IFN a2b). Interferons are cytokines with major therapeutic applications for treating viral and other microbial infections. The interferon alpha 2b is commercialized under the tradename A and produced in an E. coli system, which requires refolding into its native conformation. The refolding process often results in low recovery and loss of activity. In addition, the E. coli-produced interferon alpha 2b lacks the post-translational O-glycosylation, which leads to a shorter serum half-life when used as a therapeutic. Mammalian cells are preferable alternative hosts for the production of fully glycosylated IFN a2b. Hence, we can use it as another value-added target for validating the compatibility of the intein-mediated purification system with mammalian expression. In Figure 3.9, the western blot showed two strong expression bands in the supernatant sample, both of which were captured on the intein column and detected by the anti-His antibodies during western blotting. This suggested that both were IFN a2b-related species, but with heterogeneous post-translational modifications that account for the differences in electrophoretic mobility.

After cleavage on the intein column for 5 hours at room temperature, purified proteins were recovered in the elution fractions and digested with PNGaseF. Strong elution bands were detected on the western blot, indicating successful purification of the target protein using the intein-mediated purification method. Since interferons are reported to be O-glycosylated and sialylated, no size change was observed after PNGaseF treatment, as expected. Nevertheless, treatment with neuraminidase and O-glycosidase is needed for further verification of the glycosylation patterns.


Figure 3.9 Western blot result for 20 mL expression and purification of NpuC*-IFN with SA leader sequence. S: Supernatant; 0hr: 0 hour resin sample; 5hr: 5 hour resin sample; E1: elution fraction 1 after 5 hours; E2: elution fraction 2 after 5 hours; PNGaseF: PNGaseF treatment of elution fraction 1 sample.

3.4 Discussions

The development of purification platforms for the production of recombinant proteins also depends on an efficient expression system. Previous work has demonstrated efficient expression of intein-tagged precursor proteins in E. coli systems and in vitro translation systems. In this work, we demonstrate significantly enhanced expression and secretion of intein-tagged target proteins in mammalian cells by using alternative leader sequences. The identified heterologous leader sequences could potentially have universal applicability in improving the secretion of all intein-tagged precursor proteins.

Based on literature searches, twelve leader sequences were designed and used to secrete the model precursor protein, NpuC*-SEAP. Compared to the control leader sequence IK, the secretion level of the model precursor protein was increased 4- to 10-fold. Because green fluorescent protein was co-expressed as an internal control, the observed effects were not confounded by differences in transfection efficiency. These findings confirm that the secretion level of a protein can be significantly influenced by replacing the leader sequence with that of a heterologous protein or with a designed leader sequence.

However, the effect of the spacers was inconclusive because it varied from case to case with different signal peptides. The work reported by Gulin and co-workers also suggested that the secretion of a protein could be significantly improved or adversely compromised by only one or two amino acids between the signal peptide and the mature protein sequence, in a case-dependent manner[92].

Nevertheless, this approach has identified 6 leader sequences that significantly improved the secretion level of the model protein. It is anticipated that they could also be used to secrete other precursor proteins of significant biological and clinical value in mammalian cell systems. One of the optimal leader sequences, SA, was used for the expression and secretion of four other target proteins, and all of them showed strong expression bands from HEK293F cells. This suggests that the leader sequences identified in this study could serve as universal leader sequences for the expression of potentially all proteins of interest with the NpuC* tag in mammalian cell expression systems.

Scale-up expressions were performed to test purification using the Npu split intein-mediated system. Two target proteins, SEAP and IFN a2b, were successfully purified from mammalian cell expressions using the intein column. PNGaseF treatment indicated proper glycosylation of SEAP, and the apparent molecular weight of IFN a2b observed on the western blot suggested that it undergoes different post-translational modifications when produced in HEK293E cells. Although further study is needed to determine whether HEK293-produced IFN a2b is O-glycosylated and sialylated as previously reported[96], the strategy adopted in this study could be an alternative way to produce IFN a2b in an active and glycosylated form.

In conclusion, these results together suggest that the intein tag is compatible with the mammalian expression system by incorporating alternative leader sequences and does not interfere with the post-translational modifications. This is the first intein-mediated purification strategy that has been demonstrated for the successful purification of protein therapeutics from mammalian cell expressions. This enables the application of Npu intein-mediated purification platform for the production of value-added protein therapeutics.


Chapter 4 Characterization of Extein Dependency for the engineered Npu Split Intein

4.1 Introduction

During the development of the Npu split intein purification system, we noticed that the cleavage rate of the intein reaction varies among different target proteins, as shown and discussed in Chapter 2. This is in accordance with published data from various inteins used in either self-cleaving or self-splicing applications. It is reported that all characterized inteins exhibit a sequence preference at extein residues adjacent to the splice/cleave site[6,22,97-100]. For the splicing reaction, a catalytic Cys, Ser or Thr residue is mandatory at the +1 position (the first residue of the C-extein). In addition, an analysis of an extein sequence database for more than 200 inteins revealed a bias for certain residues in the N- and C-extein sequences[98]. Deviations from these preferred sequences lead to a significant reduction in splicing activity, limiting the applicability of protein splicing-based methods[99,101]. Similarly, in cleaving mutants, the extein residues at the cleavage junction can either accelerate premature cleaving or completely abolish cleavage[22,97,102]. For example, the IMPACT system developed by New England Biolabs Inc. (NEB), which uses the Sce VMA1 intein for N-terminal or C-terminal cleaving, showed a strong preference for the identity of the target protein residues at the cleavage junctions[97]. Residues such as methionine and glutamine were the best tolerated extein residues for C-terminal cleavage, whereas the presence of a +1 valine, isoleucine, asparagine, glutamate, lysine, arginine or histidine resulted in a relatively lower cleavage efficiency. Therefore, it is critical for the engineered Npu split intein to be thoroughly characterized for its extein dependency. This will not only provide information about the intein cleavage activity for specific target proteins, but also potentially provide a means of achieving a consistent cleavage rate.

The C-extein dependency of this engineered Npu split intein was systematically characterized by a former graduate student, Ashwin Lahiry. Using green fluorescent protein (GFP) as a model protein, the +1 position was varied among the 20 different amino acids and characterized by the half-time of the cleavage reaction at pH 6.2. The +2 and +3 positions were fixed as the native extein residues, phenylalanine (F) and asparagine (N). The results indicated that cleavage was greatly accelerated by large, hydrophobic amino acid residues (Y, F, H, W, L, I, M), whereas cleavage with small (S, N, A, G) and charged (R, K) amino acid residues was significantly slower, especially with glycine (Figure 4.1).

Additionally, the cleavage reaction is completely abolished when proline is at the +1 position. This was likely due to the structural alteration of the protein backbone by the proline residue. Using the information from the +1 study, a fractional-factorial library with a total of 36 constructs was built to test different combinations of extein residues at +1 and +2. A similar trend was found in the +1/+2 study, and the half-time of the cleavage reaction ranged from less than 15 minutes to more than 24 hours. The effect of the +3 position was also tested and found to have minimal impact on intein cleavage kinetics, except when it is a proline residue.

Figure 4.1 The role of +1 extein residue of the engineered Npu split intein system. (A) The N-intein (NpuN*) is covalently immobilized on an agarose resin (Thermo Scientific SulfoLink® coupling resin) and the target protein (eGFP) is fused to the C-terminus of the C-intein (NpuC*). (B) Graph shows t1/2 (h) calculated for C-terminal cleavage of eGFP with different +1 amino acid residues. Note: +1 Proline did not cleave and therefore t1/2 (h) could not be calculated. Data shown here are the average of two independent biological replicates and error bars represent standard error of mean. (From Ashwin Lahiry’s unpublished paper)


Now that we have a better understanding of the C-extein dependency of the engineered intein, the N-extein also needs to be characterized to understand the extein effect on the cleaving activity thoroughly. Moreover, to achieve more consistent performance as a platform technology, strategies are needed to rescue slow-cleaving target proteins that start with small or charged amino acids, as well as to slow down the cleaving reaction for targets that start with bulky amino acids in order to minimize product loss during purification. By exploring and understanding the N-extein dependency of the intein, N-exteins could potentially be used to modulate the cleavage activity. The study would also provide insight into the mechanism of the intein cleavage reaction with the current sensitivity enhancing domain (SED). The information gathered would be beneficial for the future design and optimization of the split intein.

In this work, we first tested targeted mutations in the sensitivity enhancing domain to determine the critical residue that contributes to the pH sensitivity and the cleavage rate of the current engineered split intein. Meanwhile, the -1 extein residue was characterized individually by a gel-based approach to investigate its effect on the split intein cleavage activity. Moreover, studies were carried out to further understand the extein influence beyond a single position by randomizing the three residues nearest to the N-fragment of the intein, giving a mutant library of 8000 amino acid combinations. A Förster resonance energy transfer (FRET)-based high-throughput approach was developed, which offers considerable advantages over the typical gel-based approach for individual characterization. Cyan and yellow fluorescent proteins (CFP and YFP) were used as the FRET pair in the strategy (Figure 4.2). CFP was joined to the front of NpuN* and YFP was joined to the end of NpuC*, each with a few extein residues in between. N-extein mutations that enhance or suppress C-terminal cleavage can be rapidly identified by monitoring the in vitro FRET signal decay. This allows for continuous, real-time monitoring of the cleaving kinetics in cell lysates. Using this FRET-based approach, the relationship between N-exteins and cleaving kinetics was investigated for the Npu split intein.

[Figure 4.2 schematic: CFP-InN carrying different N-exteins (-1, -2, -3) assembles with InC-YFP; excitation at 400 nm, emission at 540 nm. With C-terminal cleavage the assembled intein gives no FRET; with no C-terminal cleavage the assembled intein gives FRET.]

Figure 4.2 FRET-based screening strategy.


4.2 Materials and Methods

4.2.1 Plasmid construction

To generate different mutants of the NpuN fragment, codons for different amino acids were added directly onto the forward primers used to amplify the NpuN gene, resulting in different NpuN intein mutants. The PCR products were inserted into a pET vector via two unique restriction sites, NdeI and XhoI, for protein expression in E. coli.

Plasmids used as templates to amplify cyan and yellow fluorescent proteins (CFP and YFP) were purchased from Addgene (plasmid #14030 and #14031). The CFP was added in front of NpuN* using NdeI and EcoRI restriction sites. YFP was added after NpuC* using Acc65I and XhoI restriction sites. The C-exteins immediately following NpuC* were FFN, MFN, MTP, ATP and PFN, respectively.

The mutant library of -1/-2/-3 exteins was generated by amplifying the NpuN and N-extein residues together using a degenerate primer that contained three NNS (N = A, T, C or G; S = C or G) codons in place of the -1, -2 and -3 residue codons, followed by ligating the EcoRI- and XhoI-digested PCR product and pET-CFP-NpuN plasmid.
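The size of the design space produced by three NNS codons can be checked by enumeration. The Python sketch below builds the standard genetic code table, lists the NNS codons, and counts the DNA and amino acid combinations for the three randomized positions. Variable names are illustrative, and the calculation covers only theoretical diversity, not the number of clones actually screened.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# NNS: any base at positions 1 and 2, C or G at position 3.
nns_codons = [a + b + c for a, b, c in product("ACGT", "ACGT", "CG")]
encoded = {CODON_TABLE[codon] for codon in nns_codons}
amino_acids = encoded - {"*"}
stop_codons = [codon for codon in nns_codons if CODON_TABLE[codon] == "*"]

dna_variants = len(nns_codons) ** 3        # three randomized codons
protein_variants = len(amino_acids) ** 3   # amino acid combinations
stop_free_fraction = (1 - len(stop_codons) / len(nns_codons)) ** 3

print(len(nns_codons), "NNS codons;", len(amino_acids), "amino acids; stops:", stop_codons)
print(dna_variants, "DNA variants;", protein_variants, "amino acid combinations")
print(f"fraction of DNA variants without a stop codon: {stop_free_fraction:.3f}")
```

NNS keeps all 20 amino acids while excluding two of the three stop codons, which is the usual reason for choosing it over fully random NNN codons.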

The primers used in this work are listed in Appendix A, Table 12.

4.2.2 Shake flask expression

Single colonies were used to inoculate 5 mL LB cultures with 100 µg/mL ampicillin. The cultures were grown for 18 to 20 hours in a 37 °C water bath shaker at 220 RPM. The overnight cultures were diluted 1:100 into 50 to 200 mL of 2X LB media with 100 µg/mL ampicillin in a baffled flask. The expression cultures were grown at 37 °C for 3 to 4 hours until the OD600 reached 0.6 to 0.8. The cultures were then induced by addition of IPTG to a 0.5 mM final concentration. Expression was then carried out at 16 °C, 250 RPM for 20 to 24 hours post induction. Cultures were harvested by centrifugation at 6000 × g for 10 min. The pellets were stored at -20 °C until use.

4.2.3 24-well plate expression

Single colonies were used to inoculate 500 µL of LB media with 100 µg/mL ampicillin in a 96-well plate. The 96-well plate was shaken in a Minitron (INFORS HT) at 200 RPM overnight at 37 °C. The overnight culture was diluted 1:100 into four sterile 24-well plates with 2 mL 2X LB media per well supplemented with 100 µg/mL ampicillin. The 24-well plates were sealed with AeraSeal breathable seals (Excel Scientific) and agitated in the Minitron shaker at 200 RPM at 37 °C for 3 to 4 hours until visibly turbid (OD600 of 0.6 to 0.8). IPTG was then added into each well by multichannel pipette to a 1 mM final concentration. Expression was carried out for another 4 to 5 hours at 37 °C. After expression, cultures were harvested by centrifugation at 3500 × g for 10 min. The pellets were stored at -20 °C until use. When ready to use, the cell pellets were resuspended in 200 µL per well of B-PER bacterial cell lysis reagent. After 10 min incubation in the lysis reagent, clarification was carried out by centrifugation at 3500 × g for 15 min at 4 °C. The supernatant was then transferred to a 96-well clear flat-bottom assay plate.

4.2.4 In solution cleaving kinetics study

The pellets of intein constructs were resuspended in 10% of the original culture volume of His-tag column buffer (1X PBS, 500 mM NaCl, 10 mM imidazole). Cell lysis was done by 30 s sonication at 5 to 7 W, followed by 30 s chilling on ice, for a total of 10 cycles. The solution was then clarified by centrifugation at 10,000 × g for 15 min at 4 °C. The clarified lysate was diluted 5-fold into His-tag column buffer and loaded onto a gravity column with Ni-NTA resin (Thermo Fisher). After binding, the column was washed with 10 column volumes of the His-tag column buffer, followed by 10 column volumes of the wash buffer (1X PBS, 500 mM NaCl, 50 mM imidazole). Then 10 column volumes of another wash buffer (20 mM AMPD, 20 mM PIPES, 200 mM NaCl, pH 8.5 or 6.2) were used to wash the imidazole off the column. The proteins were then eluted from the column using elution buffer (20 mM AMPD, 20 mM PIPES, 200 mM NaCl, 50 mM EDTA, pH 8.5 or 6.2). The concentrations of the eluted fragments were determined by Bradford assay. The proteins of the NpuN and NpuC constructs were then mixed at the corresponding pH at a ratio of 3:1. 20 µL samples were taken during the cleaving reactions at the 1 hour, 2 hour, 5 hour and 18 hour time points. The samples were immediately mixed with 20 µL of 2X SDS-PAGE loading dye and boiled at 95 °C for 5 min. The cleavage percentages were analyzed on SDS-PAGE gels and quantified by ImageJ (NIH). The half-times or the rate constants of the reactions were calculated by assuming first-order reactions:


$$-\frac{d[\text{precursor protein}]}{dt} = k\,[\text{precursor protein}]$$

$$t_{1/2} = \frac{\ln(2)}{k}$$
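Integrating the first-order rate law gives a cleaved fraction f(t) = 1 - exp(-kt), so -ln(1 - f) plotted against t is a straight line through the origin with slope k. The Python sketch below estimates k and the half-time from a gel densitometry time course on that basis. The function names are hypothetical, the least-squares line through the origin is one reasonable fitting choice rather than the procedure documented here, and the example values are synthetic.

```python
import math

def first_order_rate_constant(times_h, fraction_cleaved):
    """Least-squares slope through the origin of -ln(1 - f) versus t, in 1/h."""
    numerator = 0.0
    denominator = 0.0
    for t, f in zip(times_h, fraction_cleaved):
        if not 0.0 <= f < 1.0:
            raise ValueError("fraction cleaved must be in [0, 1)")
        y = -math.log(1.0 - f)  # equals k * t for a first-order reaction
        numerator += t * y
        denominator += t * t
    if denominator == 0.0:
        raise ValueError("need at least one time point with t > 0")
    return numerator / denominator

def half_time(k):
    """Half-time of a first-order reaction, t1/2 = ln(2) / k."""
    return math.log(2.0) / k

# Synthetic cleaved fractions at the sampling times used in this section,
# generated with k = 0.2 per hour purely for illustration.
times = [1.0, 2.0, 5.0, 18.0]
cleaved = [1.0 - math.exp(-0.2 * t) for t in times]
k = first_order_rate_constant(times, cleaved)
t_half = half_time(k)
print(f"k = {k:.3f} 1/h, t1/2 = {t_half:.2f} h")
```

Time points where cleavage is essentially complete carry little information and amplify densitometry noise through the logarithm, so they are best excluded or down-weighted in practice.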

4.2.5 In vitro FRET assay

45 µL of the clarified lysate from each mutant was mixed with 55 µL of the corresponding assay buffer (20 mM AMPD, 20 mM PIPES, 200 mM NaCl, pH 8.5 or 6.2). The NpuC*-YFP fragments in the corresponding buffers were added at a volume of 100 µL into each well. FRET measurements were carried out using a Biotek Synergy 2 plate reader. Sample wells were excited at 395/25 nm, with emission measured at 485/20 nm from CFP and at 528/20 nm from FRET. To calculate the FRET ratio, the fluorescence at 528 nm was divided by the fluorescence at 485 nm. The rate constant (k) values were derived by fitting the FRET curves to the non-linear equation:

$$\mathrm{FRET} = (b_1 - b_2)\,e^{-kt} + b_2$$
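In this model b1 is the FRET ratio at t = 0 and b2 is the plateau reached at long times. The Python sketch below shows one way to fit k, b1 and b2 using only the standard library: for each trial k the model is linear in the amplitude and the plateau, so those are solved by ordinary least squares, and k is found by a log-spaced scan that is refined around the best value. This is a stand-in for a general non-linear least-squares routine, since the fitting software used in this work is not named. The function names, the search bounds and the test curve are assumptions.

```python
import math

def _linear_fit_for_k(times, values, k):
    """For a fixed k, fit y = A*exp(-k*t) + b2 by ordinary least squares."""
    x = [math.exp(-k * t) for t in times]
    n = len(times)
    mean_x = sum(x) / n
    mean_y = sum(values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        return None  # k is not identifiable from these time points
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, values))
    amplitude = sxy / sxx
    b2 = mean_y - amplitude * mean_x
    sse = sum((amplitude * xi + b2 - yi) ** 2 for xi, yi in zip(x, values))
    return sse, amplitude + b2, b2  # b1 = amplitude + b2

def fit_fret_decay(times, values, k_min=1e-3, k_max=1e2, rounds=6, points=60):
    """Return (k, b1, b2) minimizing the squared error of the decay model."""
    lo, hi = k_min, k_max
    best = None
    for _ in range(rounds):
        step = (math.log(hi) - math.log(lo)) / (points - 1)
        grid = [math.exp(math.log(lo) + i * step) for i in range(points)]
        scored = []
        for k in grid:
            result = _linear_fit_for_k(times, values, k)
            if result is not None:
                scored.append((result[0], k, result[1], result[2]))
        best = min(scored)
        index = grid.index(best[1])
        lo = grid[max(index - 1, 0)]            # narrow the bracket around
        hi = grid[min(index + 1, points - 1)]   # the best grid point
    _, k, b1, b2 = best
    return k, b1, b2

# Synthetic, noise-free curve (k = 0.5 per hour, b1 = 1.8, b2 = 0.6) for illustration.
times = [0.25 * i for i in range(41)]
ratios = [(1.8 - 0.6) * math.exp(-0.5 * t) + 0.6 for t in times]
k_fit, b1_fit, b2_fit = fit_fret_decay(times, ratios)
print(f"k = {k_fit:.4f} 1/h, b1 = {b1_fit:.3f}, b2 = {b2_fit:.3f}")
```

The bounds k_min and k_max should bracket the expected rate constants in the time unit of the data, and a curve that has not reached its plateau within the measurement window will leave b2, and therefore k, poorly determined.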

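The signal to noise ratio (S/N) is calculated using the following equation:

$$S/N = \frac{\text{Mean}_{\text{pos control}} - \text{Mean}_{\text{neg control}}}{\sqrt{(\text{STD}_{\text{pos control}})^2 + (\text{STD}_{\text{neg control}})^2}}$$

The Z’ value is calculated using the following equation:

$$Z' = 1 - \frac{3\,(\text{STD}_{\text{pos control}} + \text{STD}_{\text{neg control}})}{\text{Mean}_{\text{pos control}} - \text{Mean}_{\text{neg control}}}$$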
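Both assay quality metrics follow directly from the means and standard deviations of the control wells. The Python sketch below evaluates the two equations as written. The function names are hypothetical, the use of the sample standard deviation is an assumption because the text does not say which estimator was used, and the well values are placeholders.

```python
from math import sqrt
from statistics import mean, stdev

def signal_to_noise(pos, neg):
    """S/N: difference of control means over the root sum of squared STDs."""
    return (mean(pos) - mean(neg)) / sqrt(stdev(pos) ** 2 + stdev(neg) ** 2)

def z_prime(pos, neg):
    """Z' = 1 - 3*(STD_pos + STD_neg) / (Mean_pos - Mean_neg)."""
    return 1.0 - 3.0 * (stdev(pos) + stdev(neg)) / (mean(pos) - mean(neg))

# Placeholder FRET ratios for positive and negative control wells.
pos_wells = [1.9, 2.0, 2.1]
neg_wells = [0.9, 1.0, 1.1]
sn = signal_to_noise(pos_wells, neg_wells)
zp = z_prime(pos_wells, neg_wells)
print(f"S/N = {sn:.2f}, Z' = {zp:.2f}")
```

A Z' close to 1 indicates well separated controls, and values above about 0.5 are commonly taken to indicate an assay suitable for high-throughput screening.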


4.3 Results

4.3.1 N-extein characterization by in solution cleaving kinetics

Targeted mutation in sensitivity enhancing domain

As discussed in Chapter 2, previous studies have shown that the Npu split intein only exhibits strong pH sensitivity with the addition of the sensitivity enhancing domain (SED) at the N-terminus of NpuN. The sensitivity enhancing domain consists of five amino acids, GDGHG. To better understand its role and function, we mutated the key residues, the aspartic acid (D) residue and the histidine (H) residue, one at a time to study their effect on the pH sensitivity and cleavage activity of the modified split intein. Figure 4.3 shows that both the histidine residue and the aspartic acid residue contribute to the pH sensitivity of the intein, as mutation of either of them did not completely eliminate the cleavage rate differences between pH 8.5 and pH 6.2. Based on the calculated half-times of the cleavage reactions at different pH, the aspartic acid residue had a stronger effect on the pH sensitivity of the intein than histidine. This suggests that a more systematic study is needed to understand the N-extein effect more thoroughly.


Figure 4.3 The effect of key residues in the sensitivity enhancing domain. (a) SDS-PAGE results of cleavage kinetics of GDGHG-NpuN or GDGAG-NpuN or GAGHG-NpuN with NpuC*-SK at pH 8.5 and pH 6.2 over the time period of 20 hours. (b) Calculated half-time of the reactions by assuming first-order reaction in all cases.

Effect of -1 residue on the cleavage rate and pH sensitivity of the intein

Consequently, the -1 position was first systematically characterized using a gel-based approach similar to that used for characterization of the C-exteins, where green fluorescent protein was used as the model target protein. The -2 position was fixed as glycine (G) and the -3 position was methionine (M), which was translated from the start codon. The NpuN mutants with different -1 residues were purified and added to the NpuC*-GFP protein solution in three-fold excess at pH 8.5 and 6.2. Figure 4.4 shows the rate constants calculated by the gel-based approach for the cleavage reactions at both pH 8.5 and 6.2 with different amino acids at the -1 position.

Figure 4.4 Rate constant k of the cleavage reactions under both pH 6.2 and 8.5 with different -1 extein residues.

Based on the results, the amino acid at the -1 position had a significant effect on the cleavage rate, as well as on the pH sensitivity of the intein. Most of the amino acids at the -1 position confer pH sensitivity on the intein, although to different degrees. Among the 20 amino acids, aspartic acid offers the fastest cleavage rate and greatly enhances the pH sensitivity of the intein, which is consistent with the previous result that it contributed the most within the sensitivity enhancing domain to the activity and pH sensitivity of the engineered Npu split intein. Interestingly, isoleucine reversed the pH sensitivity of the split intein, with a faster cleavage rate at pH 8.5 than at pH 6.2. However, this needs to be verified with a faster cleaving target protein before drawing a conclusion, because the method is inaccurate when determining the half-time of slow cleavage reactions.

Additionally, the half-time of the cleavage reactions was found to be highly correlated with the hydrophobicity index of the amino acid at the -1 position (Figure 4.5a), but not with the residue charge status (Figure 4.5b). With a hydrophilic residue at the -1 position, such as aspartic acid (D) or asparagine (N), the split intein cleaves much faster at pH 6.2 and shows pronounced pH sensitivity.

[Figure 4.5 image: (a) half-time at pH 6.2 (h) overlaid with hydrophobicity index at pH 7.0; (b) half-time at pH 6.2 (h) overlaid with amino acid pI. x-axis in both panels: amino acid residue at -1 position, in the order D N Q K T E H G R S V A L C M P W F Y I.]

Figure 4.5 (a) Overlay plot of cleavage rate at pH 6.2 and hydrophobic index of -1 residue; Note: proline’s hydrophobicity index data is not available. (b) Overlay plot of cleavage rate at pH 6.2 and amino acid pI of -1 residue.


Hypothesis for the mechanism of the exteins’ role during intein cleavage reactions

The results collected so far provide insight into the structural role and mechanism of the N- and C-exteins in the intein cleavage reaction. Together with the published data, our hypothesis for the C-extein effect is that a bulky hydrophobic residue helps histidine 125 in the split intein to position appropriately and reduces the distance between His125 and the terminal asparagine, facilitating the asparagine cyclization step. His125 is completely conserved in the DnaE split intein family and has been reported to act as a general acid or base in the branched intermediate formation that stimulates the cleavage[103]. Figure 4.6a shows the positioning of Phe+2 and His125 in the crystal structure of the Ssp DnaE split intein, which is the only high-resolution intein structure bearing the native C-extein residues. Shah and coworkers also performed molecular dynamics simulations on the Npu split intein, indicating that the presence of a sterically bulky amino acid at the +2 position in the C-extein constrains the motions of the histidine side chain and thus leads to a more compact arrangement inside the catalytic core[99]. For N-exteins, it is likely that the residues help arginine 50, which also plays an essential role in the charge relay of the final asparagine cyclization step, to position appropriately through hydrogen bond formation. Figure 4.6b shows the relative positioning of the -1 residue and arginine 50 when the -1 residue is aspartic acid (D) or phenylalanine (F), modeled with UCSF Chimera software. Aspartic acid at the -1 position can form hydrogen bonds with Arg50 but phenylalanine cannot, which is consistent with the cleavage rate differences observed in the in solution studies. Additionally, because of the conserved histidine and the penultimate histidine that we engineered into the cleaving mutant of the Npu split intein, the charge relay for the asparagine cyclization might be more favorable at lower pH, thus leading to the observed pH sensitivity of the engineered split intein. However, a crystal structure of the modified split intein and quantum mechanical studies are needed before we can get definitive answers about the intein C-terminal cleavage mechanism.

[Figure 4.6 image: (a) H125 and C-extein (+2) interaction; (b) -1D forms hydrogen bonding with 50R, -1F cannot form hydrogen bonding with 50R. Source credit shown in the figure: Shah NH, Eryilmaz E, Cowburn D, Muir TW. Extein Residues Play an Intimate Role in the Rate-Limiting Step of Protein Trans-Splicing. Journal of the American Chemical Society. 2013;135(15):5839-5847.]

Figure 4.6 (a) Crystal structure of the Ssp DnaE intein (PDB entry 1ZDE) showing the close packing of His125 and Phe+2[99]. (b) Modified structure using UCSF Chimera based on the Npu DnaE intein (PDB entry 4QFQ) showing the relative position of Arg50 and -1 residue.


4.3.2 N-extein characterization using high-throughput method based on Förster resonance energy transfer (FRET)

To extend the effect screening from the -1 residue to the -1/-2/-3 residues, a high-throughput screening method was established based on Förster resonance energy transfer (FRET). It allows us to rapidly examine a much more complete design space than the previous gel-based approach with individual characterization. To identify possible extein effects on the intein cleavage activity, we constructed a system that uses the FRET-active fluorophore pair cyan and yellow fluorescent protein (CFP and YFP). As described in Figure 4.2 in the introduction, we attached CFP upstream of the intein N fragment and YFP downstream of the intein C fragment. These two parts were expressed separately in E. coli, and their lysates were combined for FRET measurement.

Theoretically, the quantum yield of the energy transfer, which is called the FRET efficiency (E), depends on the donor-to-acceptor separation distance r through an inverse sixth-power law. The equation is

$$E = \frac{1}{1 + \left( r / R_0 \right)^6}$$

where R0 is the donor-acceptor distance at which the energy transfer efficiency is 50%. Typically, the distance is in the range of 1 to 10 nm, and the chromophore-to-chromophore distance of CFP and YFP is about 2.4 nm, assuming the chromophores are centrally buried in the β-barrel structure[104]. Therefore, when the intein undergoes C-terminal cleavage, YFP is released from the complex and ends up relatively far from CFP, resulting in no FRET. When there is no cleavage, association and folding of the intein bring the two fluorophores together, resulting in energy transfer. By monitoring the FRET signal decay over time, we can track the cleaving kinetics of the intein in real time.
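The distance dependence described above can be illustrated with a short numerical sketch. The function name and the Förster radius used here (4.9 nm) are assumptions made for illustration only; they are not values measured or reported in this work.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Return E = 1 / (1 + (r / R0)^6) for a donor-acceptor distance r."""
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("r must be non-negative and R0 must be positive")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


# Assumed, illustrative Forster radius for a CFP/YFP-like pair (not from this work).
R0_ASSUMED_NM = 4.9

# Efficiency at a short distance, at R0 itself, and at a long distance.
for r in (2.4, 4.9, 10.0):
    print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r, R0_ASSUMED_NM):.3f}")
```

Because of the sixth-power dependence, the efficiency falls steeply once r exceeds R0, which is why release of YFP after cleavage is expected to remove most of the FRET signal.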

Method validation

To establish and validate the relationship between FRET decay and intein cleavage activity, we tested several intein variants with known cleavage behavior based on previous on-column cleavage studies. CFP-NpuN*, containing the SED, was reacted with five different counterpart constructs with different C-exteins inserted between NpuC* and YFP. The expected half-lives of the reactions at pH 8.5 and pH 6.2 are listed in Table 5. The PFN C-exteins generate an inactive construct in which intein cleavage is abolished, while the FFN C-exteins generate the fastest-cleaving intein variant.

Table 5 Expected half-life of the cleavage reactions based on on-column cleavage studies

C-exteins    t1/2 (h) at pH 8.5    t1/2 (h) at pH 6.2
FFN          1.0                   <0.25
MFN          7.0                   0.5
MTP          N.R.                  6.5
ATP          N.R.                  >20
PFN          N.R.                  N.R.

Note: N.R.: not reportable.


After the expression of these constructs in E. coli, the reactions were carried out in a 96-well format. The reaction mixture was excited at 395 ± 25 nm, and the fluorescence emission was monitored at 528 ± 20 nm (YFP) and 485 ± 20 nm (CFP). The emission at the longer wavelength comes from the acceptor YFP and is mostly due to FRET, whereas the emission at the shorter wavelength comes from the donor CFP. The ratio of these two emissions was defined and used as the FRET ratio in this study. In this way, it partially accounts for the well-to-well variability and brings the signals to a similar level for comparison. The loss of the FRET ratio was tracked over a time course of six hours, and Figure 4.7 shows the plot of FRET ratio versus time in minutes for reaction mixtures with different C-exteins.

The observed trends matched the expected results. When the C-exteins were FFN, the fastest decay of the FRET ratio among the reactions at pH 8.5 was observed. Meanwhile, at pH 6.2, the cleavage reaction with the FFN C-exteins went to completion within the assay preparation time. This resulted in a flat line with the lowest FRET ratio, which came from the background emission of donor and acceptor. With PFN as the C-exteins, the construct was non-cleaving, leading to the maximum level of FRET. The assay's signal-to-noise ratio and Z' value were calculated and are summarized in Table 6, taking the reaction with PFN at pH 6.2 as the positive control and the reaction with FFN at pH 6.2 as the negative control. With a Z' value above 0.5, this FRET-based high-throughput screen qualifies as an excellent assay.


Figure 4.7 Plot of FRET ratio versus time for intein variant reactions with different C-exteins

Table 6 Signal to noise ratio and Z’ value of the FRET assay

Parameter    Value
S/N          33.79
Z'           0.882
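For readers unfamiliar with the Z' statistic, the sketch below shows how it is commonly computed from replicate control wells, Z' = 1 - 3(σp + σn)/|μp - μn|. The well readings are hypothetical and are not the data behind Table 6; the signal-to-noise ratio is left out because its exact definition is not stated in the text.

```python
from statistics import mean, stdev


def z_prime(positive, negative):
    """Z' factor from replicate readings of the positive and negative controls."""
    mu_p, mu_n = mean(positive), mean(negative)
    sd_p, sd_n = stdev(positive), stdev(negative)  # sample standard deviations
    return 1.0 - 3.0 * (sd_p + sd_n) / abs(mu_p - mu_n)


# Hypothetical FRET ratios for illustration only.
pfn_wells = [0.80, 0.81, 0.79]  # non-cleaving control (maximum FRET)
ffn_wells = [0.60, 0.61, 0.59]  # fully cleaved control (background FRET)

print(f"Z' = {z_prime(pfn_wells, ffn_wells):.3f}")
```

Values between 0.5 and 1 are conventionally read as a wide separation between the two controls relative to their scatter.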


However, we saw a slight increasing trend when the C-exteins were PFN, MTP, or ATP at pH 8.5, which were non-cleaving or slow-cleaving constructs. Studies were carried out to determine the reason for these increasing trends at pH 8.5. They could be due to photobleaching of the CFP fluorophore during repeated excitation or to slow association of the two intein fragments. Therefore, a control construct was designed and tested in which the split intein was genetically fused as a contiguous intein, with CFP and YFP added upstream and downstream of it. PFN was used as the C-exteins, resulting in a construct with no C-terminal cleavage. Although the FRET ratio of this control was higher because of the shorter distance between the fluorophores, similar signal decay was observed under similar conditions. Hence, the increasing trend was most likely due to a photobleaching rate of the fluorophore at pH 8.5 that was higher than the cleavage rate. The apparent cleavage rate at pH 8.5 is therefore confounded with the photobleaching rate. Apart from this limitation, the FRET-based method is still an excellent high-throughput screening tool, capable of determining the relative cleavage rates for a large mutant library. It helps to understand the effect of the N-exteins thoroughly and thus to rapidly identify mutants with the desired cleavage activity.

Mutants screening

After validating this FRET-based method, we built the N-extein libraries by replacing the SED and simultaneously mutating all combinations of the -1, -2 and -3 residues (8000 combinations) using degenerate primers. After expression of the selected mutants in 24-well plates, the high-throughput FRET-based assay was applied. Each mutant was mixed with an equal amount of NpuC*-YFP lysate, with MFN as the C-exteins, at both pH 8.5 and 6.2 in 96-well plates. A total of six rounds with roughly 600 mutants were screened using the FRET-based assay. Figure 4.8 shows two representative screening results.
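The size of the design space quoted above follows from simple counting, which the sketch below reproduces. It enumerates tripeptides at the amino acid level only and assumes the 20 standard amino acids; it does not model the codon degeneracy of the primers, which is not specified here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

# Every possible combination at the -3, -2 and -1 positions.
library = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
n_variants = len(library)

n_screened = 600  # approximate number of mutants screened, as stated in the text
fraction_screened = n_screened / n_variants

print(f"tripeptide variants: {n_variants}")
print(f"fraction screened:   {fraction_screened:.1%}")
print(f"five-residue library: {len(AMINO_ACIDS) ** 5} variants")
```

Roughly 600 clones therefore sample under a tenth of the tripeptide space, and a five-residue library would be several hundred times larger again.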

Figure 4.8 Representative FRET screening results.

By plotting the FRET ratio versus time, we can quickly identify mutants with slower or faster cleaving rates than the control, and by looking at an individual mutant, we can determine its pH sensitivity. After each round of screening, we selected on average about four constructs in each of the slow, medium, and fast cleaving categories for further characterization. As an example, Figure 4.9a shows the FRET decay curves for three selected mutants, representing a slow, a medium, and a fast cleaving mutant.

[Figure 4.9 image. Panel (a): FRET ratio vs. time (min) for the N-extein mutants EDE, LLF and TPP, with MFN as the C-extein construct. Panel (b): JMP non-linear fit plots of FRET ratio vs. time (min) at pH 8.5 and pH 6.2, each with a table of parameter estimates. Estimates in order of appearance: b1 = 0.6923279023, b2 = 0.6264972164, b3 = -0.034255488; and b1 = 0.7875546611, b2 = 0.5595422467, b3 = -0.001582703.]

Figure 4.9 (a) Examples of slow, medium and fast cleaving mutants screened from the library using the FRET assay; (b) non-linear fitting of FRET decay curves using JMP software.

To link phenotype with genotype, we sequenced the selected mutants to determine their N-extein sequences. Moreover, using JMP software, we fitted the FRET decay curves with a non-linear equation to determine the rate constants of the cleavage reactions:

$$\mathrm{FRET} = (b_1 - b_2)\, e^{-kt} + b_2$$

A total of 50 mutants were sequenced, and their rate constants of the cleavage reactions at pH 6.2 were determined. Figure 4.10 summarizes the screening results by plotting the rate constants by their N-extein sequences.
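As an illustration of how a rate constant can be extracted from a FRET decay curve with the equation above, the following sketch fits synthetic data. It is not the JMP procedure used in this work: it scans k on a logarithmic grid and, for each trial k, solves b1 and b2 by ordinary least squares (the model is linear in b1 - b2 and b2 once k is fixed). The data, grid limits and function name are assumptions for illustration.

```python
import math


def fit_fret_decay(times, ratios, k_min=1e-4, k_max=1.0, n_grid=2000):
    """Fit FRET = (b1 - b2) * exp(-k * t) + b2 and return (b1, b2, k)."""
    n = len(times)
    best = None
    for i in range(n_grid):
        # Trial rate constant on a log-spaced grid between k_min and k_max.
        k = k_min * (k_max / k_min) ** (i / (n_grid - 1))
        x = [math.exp(-k * t) for t in times]
        sx, sy = sum(x), sum(ratios)
        sxx = sum(v * v for v in x)
        sxy = sum(v * y for v, y in zip(x, ratios))
        denom = n * sxx - sx * sx
        if abs(denom) < 1e-15:
            continue
        amplitude = (n * sxy - sx * sy) / denom  # b1 - b2
        b2 = (sy - amplitude * sx) / n           # plateau at long times
        sse = sum((amplitude * v + b2 - y) ** 2 for v, y in zip(x, ratios))
        if best is None or sse < best[0]:
            best = (sse, amplitude + b2, b2, k)
    _, b1, b2, k = best
    return b1, b2, k


# Synthetic, noise-free example (time in minutes); not experimental data.
t_min = list(range(0, 360, 10))
true_b1, true_b2, true_k = 0.80, 0.60, 0.02
signal = [(true_b1 - true_b2) * math.exp(-true_k * t) + true_b2 for t in t_min]

b1, b2, k = fit_fret_decay(t_min, signal)
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}, k = {k:.4f} 1/min, t1/2 = {math.log(2) / k:.1f} min")
```

The half-life printed at the end is ln(2)/k, which allows a fitted rate constant to be compared with half-lives such as those in Table 5. With real, noisy curves a dedicated non-linear least-squares routine would normally be preferred over a grid scan.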

Figure 4.10 Rate constants of the sequenced mutants screened from the N-extein library by using the FRET assay.


Based on the results, we found that the trend correlated in general with the results of the -1 residue study obtained with the gel-based approach. The mutants with higher rate constants contain multiple hydrophilic residues (D, N, H or Q), while the mutants with extremely small rate constants contain at least one bulky hydrophobic residue (L, F, Y or W). Interestingly, the two best mutants contain a proline residue at either the -2 or the -3 position. The proline-induced change in protein backbone conformation at those positions might be beneficial for the cleavage reaction. The gel-based approach needs to be employed for a more accurate quantitation and comparison of the cleavage rates among the selected mutants. Molecular dynamics studies could also be carried out to provide more insight into the structural roles of the N-exteins and their potential interactions with key residues in the catalytic core of the intein.

So far, we have identified mutants with different N-exteins that can greatly slow down the cleavage reactions. They can potentially be used as alternative ligands for the on-column purification of target proteins whose mature sequences begin with multiple hydrophobic residues. On the other hand, we have not obtained a mutant that exhibits a significantly faster cleavage rate than the control construct with the sensitivity enhancing domain as the N-exteins. This could be due to the already optimal composition of the residues in the sensitivity enhancing domain, which contains two hydrophilic residues (D and H). Since the results suggest that the N-extein residue effect is cumulative, the mutant library could be expanded from three-amino-acid combinations to five-amino-acid combinations as the N-exteins in order to find a mutant with an extremely fast cleavage rate at pH 6.2 that still retains the pH sensitivity.

4.4 Discussions

In this work, we explored the effect of the N-exteins on the intein activity by performing individual gel-based characterization and by developing a high-throughput FRET-based assay. We saw a cumulative effect of the aspartic acid and histidine residues in the current sensitivity enhancing domain on the pH sensitivity and cleavage activity. By systematically studying the dependency at the -1 position, we found that different residues contributed to the pH sensitivity and cleavage activity to different extents, with hydrophilic residues offering the fastest cleavage and hydrophobic residues offering the slowest cleavage. Meanwhile, a high-throughput assay was developed based on the FRET-active pair, CFP and YFP, to expand the exploration to a larger design space. With this system, the intein cleavage activity can be tracked in vitro by real-time monitoring of the FRET ratio loss in the cell lysates. The increased throughput of the FRET screen allowed us to search for extein effects in a much greater portion of the mutant library than had previously been possible with gel-based approaches.

The results of this study indicate that the exteins proximal to the intein active sites can have pronounced effects on C-terminal cleavage activities. These effects could lead to significantly improved or compromised cleaving efficiencies. The intein cleavage activity can thus be modulated by the N-extein sequences to provide a desirable processing time during purification.

The information collected also provides us with insight regarding the mechanism of the activity and pH sensitivity of the split intein. In the future, molecular dynamics simulations could be done to help us better understand the mechanism of the engineered split intein system and provide information for the rational design of the current split intein for further improvement of its cleavage properties.

The FRET assay can be adapted to study the cleavage properties of any intein in a real-time and high-throughput manner. It enables easy and fast screening of intein variants that possess unique properties. One limitation of the FRET system is that it requires external illumination to initiate the fluorescence energy transfer, which leads to background noise from direct excitation of the acceptor and to photobleaching of the two fluorophores. This limits its application for long-time tracking, and the measured cleavage activities might be confounded with the photobleaching rate. A potential optimization is to use a similar strategy but with bioluminescence resonance energy transfer (BRET)[101,102]. The donor in a BRET assay is a bioluminescent luciferase, typically from Renilla reniformis (36 kDa), rather than CFP, and it produces an initial photon emission compatible with YFP. BRET has also been developed using a better luciferase enzyme, commercialized as NanoLuc by Promega. This luciferase, engineered from the deep-sea shrimp Oplophorus gracilirostris, is smaller (19 kDa) and 150-fold brighter than the more commonly used luciferase from Renilla reniformis[105]. The strong signal could potentially reduce the overlap of the donor and acceptor emission spectra, thus providing a better signal-to-noise ratio for the assay.

In conclusion, although more work needs to be done before drawing definitive conclusions, the information collected in this work is very valuable and applicable for guiding the optimization of cleavage reactions of specific target proteins. The findings have significant implications for intein applications, especially for downstream purification and the intein mechanism study.


Chapter 5 Site-specific and oriented protein labeling on nanodiamonds using self-splicing split inteins

5.1 Introduction

Over the past years, considerable effort has been devoted to the development of fluorescent labeling for analyzing biological processes and for tracking and localizing individual drugs, proteins and other molecules. So far, the most widely used fluorescent markers are fluorescent proteins and organic dyes, such as rhodamine, coumarin and cyanine dyes and their derivatives[106]. While most of them exhibit acceptable quantum yields, many undergo photobleaching during continuous excitation. Additionally, the fluorescence intensity depends strongly on the local environment and can be influenced by pH, buffer and temperature. The only fluorophore that is approved by the FDA for clinical use is the organic dye indocyanine green (ICG). However, ICG suffers from very low fluorescence quantum yield, low stability and non-specific binding, which restrict its application for long-term tracking in live cells.

Fluorescent nanoparticles have gained popularity in recent years and can overcome some of the limitations of organic dyes. An ideal fluorescent nanoparticle for cell labeling or imaging applications should meet most, if not all, of the following criteria: (1) biocompatible; (2) low toxicity; (3) highly sensitive; (4) high fluorescence quantum yield; (5) absence of blinking or photobleaching; (6) high surface-to-volume ratio; (7) modifiable surface chemistry; (8) excitable and detectable in the biological transparency window. The nanoparticles currently being developed can be categorized into a few distinct groups, including quantum dots, dye-doped nanospheres, gold nanoparticles, carbon dots and upconverting nanoparticles. Quantum dots (QDs) can be excited in the ultraviolet region and then emit photons at longer wavelengths.

Compared to organic dyes, QDs can be made in an ultrasmall size (<2 nm) with non-quenching fluorescence and high fluorescence quantum yields. However, QDs have limited long-term in vivo applications in the biological sciences, mostly because of the known toxicity of the Cd that forms the core of most QDs. While it is possible to coat QDs with a protective shell, their biocompatibility is still questionable because of concerns about potential leaking of toxic Cd2+[107,108]. On the other hand, upconverting nanoparticles and gold nanoparticles have good biocompatibility and photostability but are limited by low fluorescence quantum efficiency. The emergence of fluorescent nanodiamonds (FNDs) has started a new era in cell labeling, imaging and tracking with nanoparticles. They are proposed to be the best alternatives that can meet the criteria for an ideal fluorescent marker, owing to their unique optical and chemical properties. They are biocompatible, extremely photostable, and fluoresce at a variety of wavelengths, including the near-infrared (NIR) region[109-114]. When imaging or tracking in cells using organic dyes or fluorescent proteins, the process is often affected by the high autofluorescence background from endogenous cell components in the range of 300 to 600 nm[115]. The negatively charged nitrogen-vacancy defect center (NV- center) in diamond is an ideal optical chromophore for biological imaging applications. These NV centers are formed during the irradiation and thermal annealing of diamond powders containing nitrogen atoms as impurities[116]. They absorb strongly at around 560 nm and emit light efficiently at around 690 nm, which is well separated from cell autofluorescence. Additionally, the NV centers are extremely photostable, with no photobleaching or blinking reported over extended periods of exposure[117].

Figure 5.1 fluorescent nanodiamond and its negatively charged nitrogen-vacancy defect center (NV- center) structure.[109]


Although fluorescent nanodiamonds have proven to be a promising tool in cell labeling and imaging applications, a critical remaining step is to conjugate biologically active agents to the particle surfaces. The surfaces of FNDs are tunable and can be modified to contain carboxyl, alcohol, and sulfhydryl groups, as well as bio-orthogonal conjugation groups. Existing methods for creating FND-protein conjugates usually rely on random chemical coupling to reactive groups (-COOH, -NH2, -SH) on the protein surface. However, the bioactivity of the conjugated molecules is usually compromised by such random coupling strategies. Moreover, it is generally not possible to control the orientation of the targeting protein on the nanoparticle, which results in a heterogeneous population. Therefore, it would be beneficial to develop a platform technology that allows site-specific conjugation of proteins onto nanoparticles, with uniform orientations for proper functionality.

The aim of this part is to take advantage of split intein-mediated ligation to develop simple and robust bioconjugation technologies for creating protein-labeled fluorescent nanodiamonds. This work is in collaboration with Columbus Nanoworks, which has extensive experience and expertise in the production and application of fluorescent nanodiamonds. Fluorescent nanodiamonds have recently played a central role in new discoveries owing to their unique features. Columbus Nanoworks is developing them for real-time, photostable fluorescence monitoring of proteins and cells, and can produce highly functionalized fluorescent nanodiamonds (20 to 100 nm) on a commercial scale.


Split intein-mediated protein trans-splicing (PTS) has proven useful for various applications, including protein semi-synthesis[118,119], protein cyclization[19], segmental isotopic labeling[120,121], and conjugation of QDs to biomolecular targets[58]. As mentioned earlier in the introduction chapter, the amino acids strictly required for the PTS process are a cysteine or serine at the first position of the intein, referred to as the 1 position, and a cysteine, serine or threonine immediately downstream of the intein, referred to as the +1 position (Figure 5.2)[9].

Figure 5.2 General scheme of protein trans-splicing with the possible cysteine, serine, and threonine residues at the 1 and +1 positions of the intein or the C-terminal extein, respectively.


The smaller one of the components often can be prepared synthetically to be attached to the probes of interest. The other component can be expressed as a fusion to a target protein using recombinant DNA technology. Split inteins allow platform approaches to be developed for protein modification or conjugation without complex and target-specific optimizations.

Figure 5.3 shows the proposed strategies for achieving N-terminal or C-terminal oriented bioconjugation onto the nanodiamond surfaces. For N-terminal oriented conjugation, the novel GOS-TerL split intein is chosen as the self-splicing tool. It was identified from metagenomic databases and reported as the first intein harboring the atypical combination of Ser1 and Cys+1[13]. Inteins with an atypical split site close to the N-terminal end of their sequence are very rare in nature. A short N-fragment can be produced chemically by solid-phase peptide synthesis. The fact that the GosN fragment has only 37 amino acids and is free of cysteine residues makes this intein an ideal candidate for simple and specific conjugation of its N-fragment with a synthetic fluorophore[122] or onto modified solid surfaces by incorporating a single cysteine residue at its N-terminus. The unique cysteine at the protein terminus can be specifically modified using classic bioconjugation schemes for thiol moieties, for example maleimide or iodoacetamide chemistries[123]. Also, the GOS-TerL split intein can undergo fast protein trans-splicing at ambient temperature without requiring a denaturation step[13], which makes it a very simple and efficient tool for N-terminal oriented ligation of biomolecules to intein-tagged solid surfaces such as nanodiamonds. For C-terminal oriented bioconjugation, the GP41.1 split intein is used as the splicing tool. It was identified from the same metagenomic database as the GOS-TerL intein and reported to be the fastest splicing intein discovered so far[15]. The GP41.1C fragment consists of 36 amino acids and is free of cysteine residues. Similarly, a single cysteine residue can be incorporated at its C-terminus, which makes it a good candidate for building a C-terminal oriented conjugation platform. It was shown that the splicing reaction can be completed within 20-30 seconds at elevated temperature and that it is very tolerant of a broad range of reaction conditions.

Figure 5.3 Split intein-mediated bioconjugation scheme for N/C-terminal oriented attachment of proteins of interest.

Split intein-mediated protein trans-splicing would allow precise attachment of oriented proteins to the nanodiamond surface, where the size of the fluorescent nanodiamonds and the protein label can be chosen according to the specific purpose. We hope that these approaches will significantly reduce the time and effort needed to create nanoparticle-protein conjugates and provide more controllability when applied to various applications in biomedical research.

5.2 Materials and Methods

5.2.1 Peptide synthesis

The GosN peptide was chemically synthesized by Ohio Peptide LLC and includes an N-terminal cysteine, a G4S linker and its five natural N-extein residues (KVEFE). The synthesized peptide sequence is as follows:

CGGGGSKVEFESISQESYINIEVNGKVETIKIGDLYKKLSFNERKFNE.

The GP41.1C peptide was chemically synthesized by New England Peptide and includes its native C-extein residues (SSSDV), a G4S linker and a C-terminal cysteine. The synthesized peptide sequence is as follows:

MMLKKILKIEELDERELIDIEVSGNHLFYANDILTHNSSSDVGGGGSC.
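The layout of the two synthetic peptides can be checked programmatically. The sketch below splits each sequence into the cysteine handle, the G4S linker, the native extein residues and the remaining intein portion, and confirms that each peptide carries exactly one cysteine available for thiol chemistry. The helper names are hypothetical, and the residue counts are simply those of the strings as written above.

```python
GOSN_PEPTIDE = "CGGGGSKVEFESISQESYINIEVNGKVETIKIGDLYKKLSFNERKFNE"
GP41_1C_PEPTIDE = "MMLKKILKIEELDERELIDIEVSGNHLFYANDILTHNSSSDVGGGGSC"

LINKER = "GGGGS"  # G4S linker


def strip_prefix(peptide: str, prefix: str) -> str:
    """Return the peptide without the expected N-terminal prefix."""
    if not peptide.startswith(prefix):
        raise ValueError(f"peptide does not start with {prefix}")
    return peptide[len(prefix):]


def strip_suffix(peptide: str, suffix: str) -> str:
    """Return the peptide without the expected C-terminal suffix."""
    if not peptide.endswith(suffix):
        raise ValueError(f"peptide does not end with {suffix}")
    return peptide[: len(peptide) - len(suffix)]


# GosN layout: Cys + G4S + N-extein (KVEFE) + intein N-fragment
gosn_core = strip_prefix(GOSN_PEPTIDE, "C" + LINKER + "KVEFE")
# GP41.1C layout: intein C-fragment + C-extein (SSSDV) + G4S + Cys
gp41_core = strip_suffix(GP41_1C_PEPTIDE, "SSSDV" + LINKER + "C")

for name, full, core in (
    ("GosN", GOSN_PEPTIDE, gosn_core),
    ("GP41.1C", GP41_1C_PEPTIDE, gp41_core),
):
    print(f"{name}: total {len(full)} aa, intein portion {len(core)} aa, "
          f"cysteines in peptide {full.count('C')}, in intein portion {core.count('C')}")
```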

5.2.2 Plasmids construction

The GosC intein segment and its five natural C-extein residues (CEFLG) were codon-optimized for high-level expression in Escherichia coli. The GosC-CEFLG gene sequence was synthesized by Integrated DNA Technologies, Inc. (IDT), amplified by primers, and inserted into an expression vector derived from the pET21 plasmid using the NdeI and EcoRI restriction sites. The sequence of the protein of interest, super-folder green fluorescent protein (sfGFP), was inserted using EcoRI and XhoI to create the plasmids for expression of intein fusion proteins. The encoded sequence is (GosC sequence underlined):

(NdeI)MHHHHHHKLPESVVKNNINLKIETPYGFENFYGVNKIKKDKYIHLEFTNGEKLKCSLDHPLSTIDGIVKAKDLDKYTEVYTKFGGCFLKKSKVINESIELYDIVNSGLKHLYYSNNIISHN(MfeI)CEF(EcoRI)LG-[target protein]-(XhoI).

The pCIRC-GP41.1 plasmid was purchased from Addgene (Plasmid #74227). GP41.1N and its five natural N-extein residues (TRSGY) were amplified by primers and inserted into an expression vector derived from the pET21 plasmid by means of Gibson assembly. NdeI and EcoRI restriction sites allowed sfGFP to be cloned in fusion to the intein. The encoded sequence is (GP41.1N sequence underlined):

(NdeI)-[target protein]-(EcoRI)TRSGYCLDLKTQVQTPQGMKEISNIQVGDLVLSNTGYNEVLNVFPKSKKKSYKITLEDGKEIICSEEHLFPTQTGEMNISGGLKEGMCLYVKE-(XhoI)HHHHHHS.

The primers used in this work are summarized in Appendix A, Table 13.

5.2.3 Protein expression

Protein expression was performed in the Escherichia coli strain BLR (DE3). Transformed cells containing the expression plasmids were cultured in 5 mL Luria Broth (LB) media supplemented with 100 µg/mL ampicillin at 37 °C for 16 to 18 h. The cultures were diluted 1:100 (v/v) into 2X LB media supplemented with 100 µg/mL ampicillin in Thomson's (Oceanside, CA, USA) UltraYield™ Flasks. The cells were then grown at 37 °C for 2 to 3 h until the OD600 reached 0.6–0.8. Protein expression was then induced by addition of 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG), and the cultures were allowed to grow at 16 °C for 24 h.

5.2.4 IMAC purification and dialysis

The precursor proteins were purified using immobilized metal affinity chromatography (IMAC). The IMAC column was first equilibrated in binding buffer (1X PBS, 500 mM NaCl, 10 mM imidazole), and then the clarified lysate containing the precursor protein was applied to the column. The column was washed with 10 column volumes of the binding buffer, followed by 10 column volumes of washing buffer (1X PBS, 500 mM NaCl, 50 mM imidazole). Finally, 4 column volumes of elution buffer (1X PBS, 500 mM NaCl, 50 mM imidazole) were used to elute the His-tagged protein, with each column volume collected separately. The second elution fraction or all pooled fractions were dialyzed into splicing buffers for later use in the bioconjugation process. Usually, the dialysis was carried out using Thermo Scientific Pierce 96-well 10K MWCO microdialysis devices for processing volumes of 100 µL. For larger volumes, Slide-A-Lyzer cassettes (0.1-0.5 mL or 0.5-3 mL) from Thermo Fisher with a 10K MWCO were used. The dialysis was done at 4 °C overnight, with the dialysis buffer changed after the first hour. If needed, the purified protein could also be concentrated and buffer-exchanged using Amicon Ultra centrifugal filters with a MWCO of 10 kDa.

5.2.5 In solution splicing with purified precursors

The splicing experiment was designed using a 5:1 molar ratio of peptide to intein-POI conjugate. 2 mM dithiothreitol (DTT) was added to initiate the splicing reaction. 20 µL samples were taken at different time points (0 min, 30 min, 1 h, 2 h, and 4 h) during the ligation process for analysis. The splicing reaction at each time point was stopped by adding 20 µL of 2X sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) loading dye and heating at 98 °C for 10 min. The percent ligated at each time point was estimated by scanning densitometry of the precursor and ligated product bands on SDS-PAGE gels using ImageJ (NIH).
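One simple way to turn band intensities into a percent-ligated value is sketched below. The formula (product divided by product plus precursor) is an assumption, since the exact calculation is not spelled out here, and it ignores differences in staining between bands of different molecular weight. The intensities are hypothetical.

```python
def percent_ligated(precursor_intensity: float, product_intensity: float) -> float:
    """Percent ligated = product / (precursor + product) * 100 (assumed formula)."""
    total = precursor_intensity + product_intensity
    if precursor_intensity < 0 or product_intensity < 0 or total <= 0:
        raise ValueError("band intensities must be non-negative and not both zero")
    return 100.0 * product_intensity / total


# Hypothetical densitometry readings (arbitrary units): time point -> (precursor, product)
time_course = {
    "0 min": (1000.0, 0.0),
    "30 min": (700.0, 300.0),
    "1 h": (550.0, 450.0),
    "2 h": (300.0, 700.0),
    "4 h": (250.0, 750.0),
}

for label, (precursor, product) in time_course.items():
    print(f"{label:>6}: {percent_ligated(precursor, product):5.1f}% ligated")
```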

5.2.6 Design of experiment (DOE) studies for effect screening

All DOE studies were generated using JMP software (SAS Institute). For the effect screening experiments, the screening module was used to design a completely randomized study with an unblocked fractional factorial design. The goal was to maximize the splicing efficiency with discrete numeric factors of pH, salt concentration, temperature and ratio. All statistical analyses were performed using JMP software with a significance level of 0.05.
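To illustrate what an unblocked, completely randomized fractional factorial design looks like for four factors, the sketch below builds a two-level half fraction (8 runs instead of 16) with the generator D = ABC and shuffles the run order. It is not the design produced by JMP for this study; the factor levels are hypothetical, and the levels actually tested are those given in Table 7.

```python
import random
from itertools import product

# Hypothetical low/high levels, for illustration only.
FACTORS = {
    "pH": (6.0, 9.0),
    "NaCl_mM": (0, 500),
    "temperature_C": (4, 37),
    "peptide_to_protein_ratio": (1, 5),
}


def half_fraction_design(factors):
    """Build a 2^(4-1) design: full factorial in A, B, C with D = A*B*C."""
    names = list(factors)
    if len(names) != 4:
        raise ValueError("this sketch handles exactly four factors")
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        coded = (a, b, c, a * b * c)  # coded levels: -1 = low, +1 = high
        runs.append({n: factors[n][(lvl + 1) // 2] for n, lvl in zip(names, coded)})
    return runs


def randomize(runs, seed=0):
    """Return the runs in a reproducible random order."""
    order = list(runs)
    random.Random(seed).shuffle(order)
    return order


design = randomize(half_fraction_design(FACTORS))
for i, run in enumerate(design, start=1):
    print(i, run)
```

In such a half fraction each main effect is aliased with a three-factor interaction, which is usually acceptable for an initial effect screen.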


5.2.7 Intein peptide coupling with nanodiamonds

1 mg of glycidol-coated nanodiamond was resuspended in 1 mL of solution containing 100 µL dimethylacetamide (DMAC) and 900 µL dimethylformamide (DMF). 50 mg of N,N'-disuccinimidyl carbonate (DSC) was added to the solution. Samples were vortexed at room temperature for 4 hours. After incubation, the DSC-activated diamonds were rinsed three times with a mixture of DMAC and tetrahydrofuran (THF) in equal volumes and stored at -20 °C in DMAC at 10 mg/mL.

200 µg of DSC-activated glycidol fluorescent nanodiamond (gFND-DSC) was reacted with 5 µL of 20 mg/mL NH2-PEG2-MAL in DMAC for 4 hours with continuous agitation. The maleimide-activated diamonds were then rinsed three times with a mixture of DMAC and THF in equal volumes and stored for later use at -20 °C in DMAC at 10 mg/mL.

Cysteine-containing peptides were reduced with an equimolar amount of TCEP prior to addition to the nanodiamonds. 100 µg of gFND-MAL was spun down and the DMAC solvent was replaced with 100 µL of 50 mM HEPES buffer (pH 7.4) with 0.05% Tween 20 (HEPES-t). 10 µg of TCEP-reduced peptide was added and incubated for 2 hours with mixing. After incubation, 10 µL of 1:10000 diluted β-mercaptoethanol (βME) was added and incubated for an additional 30 min to quench any remaining active maleimide groups. The nanodiamonds were then rinsed 3 times with 1 mL HEPES-t buffer and resuspended in 100 µL of the splicing buffer before reaction.


5.2.8 Intein-mediated protein splicing onto fluorescent nanodiamonds

100 µg of peptide-coupled nanodiamonds were incubated with 5 µg of the corresponding intein construct in the splicing buffer. The reaction mixture was incubated at room temperature for two to four hours on an orbital shaker. Samples were then rinsed 3 times with HEPES-t buffer and suspended at a concentration of 1 mg/mL in the same buffer.

5.3 Results

5.3.1 In solution protein trans-splicing studies using purified precursors

Before testing the feasibility of intein splicing onto solid surfaces, we carried out in-solution studies using recombinantly expressed fragments to explore and optimize the protein trans-splicing conditions. For the N-terminal oriented splicing scheme shown in Figure 5.4 (a), we attached maltose binding protein (MBP) in front of the GosN fragment to mimic the presence of the nanoparticle with a similar steric hindrance effect. Between MBP and GosN, the five native N-extein residues (KVEFE) of the GOS-TerL intein were retained to ensure consistent behavior between different target proteins. The GosC fragment is attached in front of the model protein, super-folder green fluorescent protein (sfGFP), which enables us to easily quantify the process by reading fluorescence. The five native C-extein residues (CEFLG) were also retained between the GosC fragment and sfGFP. His tags were added at the end of the MBP-GosN protein and at the front of GosC-sfGFP for ease of purification. Similarly, for the C-terminal oriented splicing scheme in Figure 5.4 (b), MBP was added at the end of GP41.1C and sfGFP was attached in front of GP41.1N to demonstrate the splicing reaction. The five native N-extein and C-extein residues (TRSGY and SSSDV, respectively) were also retained.

Figure 5.4 (a) N-terminal oriented in solution splicing scheme; (b) C-terminal oriented in solution splicing scheme.

These precursor proteins were expressed in E. coli BLR cells and purified on Ni-NTA resin by His-tag mediated purification under native conditions. Although purification of Gos-TerL intein constructs under denaturing conditions was reported to result in higher splicing activity, the renaturing conditions could vary among target proteins and would need optimization for each target, which would greatly complicate the process. Following dialysis into the splicing buffers, the reactions were set up by mixing GosC-sfGFP with a 3-fold excess of MBP-GosN and sfGFP-GP41.1N with a 3-fold excess of GP41.1C-MBP in the reported splicing buffers with and without DTT for initial testing. The results are shown in Figure 5.5.
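As an illustration of how such a reaction could be set up, the following minimal sketch computes the stock volumes needed for a chosen fold excess. It assumes that the 3-fold excess is a molar excess; the function name, the amount of target protein and the stock concentrations are hypothetical placeholders and are not values from this work.

```python
def mixing_volumes(target_pmol, excess_fold, target_stock_uM, partner_stock_uM):
    """Return (target_uL, partner_uL) for a given molar excess of the partner.

    Uses 1 uM == 1 pmol/uL, so volume (uL) = amount (pmol) / concentration (uM).
    All inputs are illustrative assumptions, not measured values.
    """
    if min(target_pmol, excess_fold, target_stock_uM, partner_stock_uM) <= 0:
        raise ValueError("all inputs must be positive")
    target_uL = target_pmol / target_stock_uM
    partner_uL = target_pmol * excess_fold / partner_stock_uM
    return target_uL, partner_uL


# Hypothetical example: 100 pmol of GosC-sfGFP from a 20 uM stock mixed with
# a 3-fold molar excess of MBP-GosN from a 50 uM stock.
target_uL, partner_uL = mixing_volumes(100, 3, 20, 50)
print(f"{target_uL:.1f} uL GosC-sfGFP + {partner_uL:.1f} uL MBP-GosN")
```

The same helper applies to the sfGFP-GP41.1N and GP41.1C-MBP pair by swapping in the corresponding stock concentrations.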

Figure 5.5 Splicing results in buffers with or without the addition of 2 mM DTT at room temperature: (a) N-terminal oriented splicing scheme using Gos-TerL split intein; (b) C-terminal oriented splicing scheme using GP41.1 split intein.

The results shown above indicate that, for both split inteins, the splicing reaction occurs only when 2 mM DTT is added to the splicing solution. Fortunately, 2 mM DTT is not sufficient to disrupt the disulfide bonds of disulfide-containing proteins. In the absence of denaturant, a high concentration of reducing agent (e.g. at least 10 mM DTT) is needed to break the native disulfides in protein samples. For example, 100 mM DTT with more than 2 hours of incubation is required to completely break the disulfide bonds in hirudin[124]. At room temperature, the trans-splicing reaction of both inteins reached about 70% completion after 2 hours in the reported splicing buffer (50 mM Tris/HCl, 300 mM NaCl, 1 mM EDTA, 2 mM DTT, pH 7.0). No side products from C-terminal or N-terminal cleavage reactions were observed.
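To put the "about 70% completion after 2 hours" figure on a rate scale, the sketch below converts a single time point into an apparent rate constant. It assumes simple pseudo-first-order kinetics proceeding to completion, which is an assumption made here for illustration and is not a model fitted in this work; a reaction that plateaus below 100% would need a different model. The function names are hypothetical.

```python
import math


def apparent_rate_constant(fraction_spliced, time_min):
    """Apparent first-order rate constant (1/min) from one time point.

    Assumes fraction(t) = 1 - exp(-k * t), i.e. pseudo-first-order kinetics
    going to completion (an illustrative assumption).
    """
    if not 0 < fraction_spliced < 1:
        raise ValueError("fraction_spliced must be between 0 and 1")
    if time_min <= 0:
        raise ValueError("time_min must be positive")
    return -math.log(1 - fraction_spliced) / time_min


def time_to_reach(fraction_spliced, k_per_min):
    """Time (min) needed to reach a target fraction under the same model."""
    return -math.log(1 - fraction_spliced) / k_per_min


k = apparent_rate_constant(0.70, 120)   # 70% after 120 min
half_time = math.log(2) / k             # apparent half-time in minutes
print(f"k = {k:.4f} per min, apparent half-time = {half_time:.0f} min")
```

Under the same assumption, `time_to_reach` gives a rough estimate of the incubation time needed for a chosen target efficiency.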

To thoroughly characterize and optimize the splicing conditions for both split inteins, we performed design of experiment (DOE) studies to examine the effect of different factors on splicing efficiency, including pH, salt concentration, temperature, fragment ratio and ligation time. Table 7 summarizes the factors and levels tested in the studies. We used JMP software to generate a completely randomized study of 60 runs for effect screening, where four time points (15 min, 30 min, 60 min and 120 min) were taken for each run. The remaining buffer components were kept the same between runs: 1 mM EDTA and 2 mM DTT, with 50 mM buffering salt to maintain pH (Tris for pH 7 to 9, Bis-Tris for pH 6 or sodium acetate for pH 5).

Table 7 Factors and levels in the design of experiments for splicing effect screening.

Factors                                  Levels
NaCl salt concentration (mM)             0, 100, 200, 300
pH                                       5, 6, 7, 8, 9
Ratio (Target protein vs Counterpart)    5:1, 1:1
Temperature (°C)                         4, R.T., 37
Ligation time (min)                      15, 30, 60, 120
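The factor space in Table 7 can be enumerated programmatically. The sketch below builds the full factorial of the four buffer and reaction factors (4 x 5 x 2 x 3 = 120 combinations, with ligation time sampled as time points within each run) and draws a random 60-run subset. The random draw is only a stand-in used for illustration: JMP generates its designs with its own algorithms, which are not reproduced here, and all names in the code are hypothetical.

```python
import itertools
import random

# Factor levels taken from Table 7; ligation time is sampled within each run.
FACTORS = {
    "nacl_mM": [0, 100, 200, 300],
    "pH": [5, 6, 7, 8, 9],
    "ratio": ["5:1", "1:1"],
    "temperature": ["4 C", "R.T.", "37 C"],
}
TIME_POINTS_MIN = [15, 30, 60, 120]


def buffering_salt(pH):
    """Buffering salt used at each pH, as described in the text."""
    if pH == 5:
        return "Sodium Acetate"
    if pH == 6:
        return "Bis-Tris"
    return "Tris"  # pH 7 to 9


def full_factorial(factors):
    """Every combination of factor levels, one dict per run."""
    names = list(factors)
    return [dict(zip(names, combo))
            for combo in itertools.product(*factors.values())]


def random_subset(runs, n_runs, seed=0):
    """Randomly pick n_runs runs (a stand-in for a designed subset)."""
    return random.Random(seed).sample(runs, n_runs)


all_runs = full_factorial(FACTORS)
design = random_subset(all_runs, 60)
for run in design:
    run["buffer"] = buffering_salt(run["pH"])

print(len(all_runs), "possible combinations;", len(design), "selected runs")
```

A randomly drawn subset does not guarantee balance between factor levels, which is one reason a dedicated DOE tool is normally preferred for the actual design.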

The pre-purified materials were dialyzed into the corresponding buffer for each run. During dialysis, the GosC-sfGFP protein showed significant precipitation in buffers with low pH and low salt concentration, as shown in Figure 5.6. This indicates that the Gos-TerL intein fragments are unstable under low pH and low salt conditions, which also prevents the splicing reaction from occurring. Therefore, the pH 5 and pH 6 levels were eliminated from the Gos-TerL splicing DOE study, which was modified to 12 runs instead.

Figure 5.6 Precipitation of GosC-sfGFP was observed in buffers with low pH or low salt concentration during dialysis. A: 0 mM NaCl; B: 100 mM NaCl; C: 200 mM NaCl; D: 300 mM NaCl; the numbers represent the pH value.

Figure 5.7 shows plots of splicing efficiency versus splicing time under different conditions from the effect screening DOE study for the Gos-TerL split intein. By statistical analysis, all factors had a significant effect on the splicing efficiency (Table 8). The overall splicing efficiency was lower than in the initial studies, which results from the loss of stability of GosC-sfGFP over the extended period of E. coli pellet storage. According to the scaled estimates and prediction profilers in Figure 5.8, the optimal splicing condition was determined to be a buffer containing 300 mM NaCl at pH 7.0 and room temperature, with a GosC-sfGFP to MBP-GosN ratio of 5:1.

Figure 5.7 Gos-TerL split intein splicing efficiency versus splicing time plots under different splicing conditions. (Larger dots represent a ratio of 5:1 and smaller dots represent a ratio of 1:1)

Table 8 JMP statistical analysis of DOE effect screening study for Gos-TerL.

Factor                Prob > F
pH                    <.0001
Ratio                 <.0001
Temperature           <.0001
Salt concentration    0.0015
Ligation time         <.0001

Scaled Estimates (continuous factors centered by mean, scaled by range/2)

Term                   Scaled Estimate   Std Error   t Ratio   Prob>|t|
Intercept              0.0412006         0.014133    2.92      0.0059*
pH[8-7]                -0.005128         0.012885    -0.40     0.6929
pH[9-8]                -0.054746         0.012885    -4.25     0.0001*
Ratio[5-1]             0.0944523         0.010804    8.74      <.0001*
Temp[20-4]             0.0776204         0.012885    6.02      <.0001*
Temp[37-20]            -0.028934         0.012885    -2.25     0.0306*
Salt conc.[100-0]      -0.057272         0.014405    -3.98     0.0003*
Salt conc.[200-100]    0.0220964         0.014849    1.49      0.1450
Salt conc.[300-200]    0.0237487         0.014405    1.65      0.1075
Ligation time (min)    0.0374509         0.006649    5.63      <.0001*

[Prediction profiler plots: ligation efficiency (0.247323, [0.21189, 0.28276]) and desirability (0.660283) versus pH, ratio, temperature, salt concentration and ligation time (min).]

Figure 5.8 Scaled estimates and prediction profilers from the DOE effect screening study of Gos-TerL by JMP.

For the GP41.1 split intein, no precipitation was observed under any buffer condition, so the DOE study design was kept as originally designed with 60 runs. Figure 5.9 shows plots of splicing efficiency versus splicing time from the DOE study for the GP41.1 intein. By statistical analysis, all factors also had a significant effect on the splicing efficiency (Table 9). According to the scaled estimates and prediction profilers in Figure 5.10, optimal splicing can be achieved using a buffer containing more than 200 mM NaCl, with pH ranging from 6.0 to 9.0, at 37 °C, and with a sfGFP-GP41.1N to GP41.1C-MBP ratio of 5:1. The GP41.1 DOE study showed that GP41.1 splices much faster than the Gos-TerL intein, which is consistent with the literature. With a ratio of 5:1, the GP41.1 splicing reaction can achieve more than 90% completion in less than 15 min under most of the conditions tested. Therefore, in solution, the GP41.1 split intein is more robust and flexible for efficient splicing than the Gos-TerL split intein.

Together, these results suggest that both split inteins can potentially be used in the proposed schemes for protein bioconjugation onto solid surfaces. Guided by the information acquired in these studies, we can predict the splicing rate and efficiency when the inteins are used for bioconjugation onto nanoparticles. When this technology moves into applications or commercialization in the future, this information can also be used to determine the incubation time required to achieve a targeted splicing efficiency for different target proteins, each of which may require different buffer conditions or temperatures for stability reasons.

Figure 5.9 GP41.1 split intein splicing efficiency versus splicing time plots under different splicing conditions. (a) with fragment ratio of 5:1; (b) with fragment ratio of 1:1.

Table 9 JMP statistical analysis of DOE effect screening study for GP41.1.

Factor                Prob > F
pH                    <.0001
Ratio                 <.0001
Temperature           <.0001
Salt concentration    <.0001
Ligation time         <.0001

Scaled Estimates (continuous factors centered by mean, scaled by range/2)

Term                           Scaled Estimate   Std Error   t Ratio   Prob>|t|
Intercept                      -0.224002         0.026612    -8.42     <.0001*
Salt concentration[100-0]      0.1364918         0.023065    5.92      <.0001*
Salt concentration[200-100]    0.0829434         0.02309     3.59      0.0004*
Salt concentration[300-200]    -0.015251         0.022738    -0.67     0.5031
pH[6-5]                        0.3398137         0.02533     13.42     <.0001*
pH[7-6]                        -0.032073         0.025787    -1.24     0.2148
pH[8-7]                        0.0179488         0.025787    0.70      0.4871
pH[9-8]                        0.0160158         0.025787    0.62      0.5352
Ratio[5-1]                     0.422296          0.016214    26.05     <.0001*
Temperature[20-4]              0.1536152         0.019763    7.77      <.0001*
Temperature[37-20]             0.0608905         0.019763    3.08      0.0023*
Ligation time (min)            0.0822042         0.010558    7.79      <.0001*

Figure 5.10 Scaled estimates and prediction profilers from the DOE effect screening study of GP41.1 by JMP.

5.3.2 Protein trans-splicing onto nanoparticles using purified precursors and clarified lysate

The GosN peptide was synthesized externally by Ohio Peptide LLC, starting with an N-terminal cysteine for site-specific reaction with the maleimide groups on the nanodiamond surfaces. The splicing reaction was first tested in solution with the chemically synthesized peptide and GosC-sfGFP to confirm the splicing reaction (Figure 5.11 (a)). Splicing was carried out at room temperature for four hours in the optimal buffer determined in the previous section (50 mM Tris, 300 mM NaCl, 1 mM EDTA, 2 mM DTT, pH 7.0) (Figure 5.11 (b)). The splicing product was then excised from the SDS-PAGE gel and sent to The Ohio State University’s mass spectrometry and proteomics facility for sequence confirmation. The gel slices were digested with trypsin, which mainly cleaves peptide chains at the carboxyl side of lysine and arginine. Liquid chromatography with tandem mass spectrometry (LC MS/MS) was performed on the digested samples to determine the peptide sequences. From the results shown in Figure 5.11 (c), the protein sequence coverage is 87%, confirming that the splicing reaction occurred and produced the correct product. Although the first 7 amino acids were not detected due to the detection limit of the method, the amino acid sequence that follows would only exist after a successful splicing reaction.
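The tryptic cleavage rule and the notion of sequence coverage can be illustrated with a short sketch. It uses a simplified rule (cleave after every K or R, with no missed cleavages and no proline exception) and only the 16-residue junction segment CGGGGSKVEFECEFLG of the spliced product; the sfGFP sequence is not reproduced here, so in the real digest the second peptide would extend into sfGFP up to its first lysine or arginine. The function names are hypothetical, and the facility's actual search software is not represented.

```python
def trypsin_digest(sequence):
    """Split a sequence after every K or R (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def sequence_coverage(sequence, matched_peptides):
    """Fraction of residues covered by at least one matched peptide."""
    covered = [False] * len(sequence)
    for peptide in matched_peptides:
        if not peptide:
            continue
        start = sequence.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = sequence.find(peptide, start + 1)
    return sum(covered) / len(sequence)


junction = "CGGGGSKVEFECEFLG"       # linker + N-extein + C-extein residues
peptides = trypsin_digest(junction)  # N-terminal peptide and junction peptide
coverage = sequence_coverage(junction, peptides[1:])  # first peptide unmatched
print(peptides, f"coverage = {coverage:.0%}")
```

In this simplified digest the N-terminal peptide is seven residues long, and the peptide that follows it spans the N-extein/C-extein junction, which is why its detection is diagnostic of the spliced product.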

[Figure 5.11 panels. (a) Scheme: GosN peptide from solid phase synthesis (C-GGGGS-KVEFE-Gos-TerL N) + GosC-sfGFP (His6-Gos-TerL C-CEFLG-sfGFP) = spliced product (CGGGGSKVEFECEFLG-sfGFP). (b) SDS-PAGE lanes by GosC-sfGFP : GosN peptide ratio: GosC-sfGFP only, 1:0.5, 1:1, 1:5, 1:10, GosN peptide only; labeled bands: GosC-sfGFP (41.3 kDa), spliced product, peptide (GosN). (c) Bottom-up LC MS/MS, protein sequence coverage: 87%; matched peptides shown in bold red.]

Figure 5.11 (a) Scheme of in solution splicing with chemically synthesized GosN peptide and GosC-sfGFP and their theoretical splicing product; (b) SDS-PAGE gel result for in solution splicing test with different ratio of GosC-sfGFP to GosN peptide; (c) Protein sequence coverage results from Bottom-up LC MS/MS.
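The composition of the theoretical splicing products in the schemes can be written out explicitly: everything N-terminal to the N-intein fragment is joined to everything C-terminal to the C-intein fragment, and both intein fragments (together with anything attached on their far side) are excised. The sketch below encodes this with labeled segments; the intein and sfGFP sequences are not given in this section, so they appear as bracketed placeholders, and the segment labels and function name are hypothetical.

```python
def trans_splice(n_precursor, c_precursor, int_n="IntN", int_c="IntC"):
    """Return the spliced product as a list of (label, sequence) segments.

    n_precursor and c_precursor are ordered lists of (label, sequence).
    Segments before the N-intein fragment and after the C-intein fragment
    are kept; the intein fragments themselves are excised.
    """
    n_index = [label for label, _ in n_precursor].index(int_n)
    c_index = [label for label, _ in c_precursor].index(int_c)
    return n_precursor[:n_index] + c_precursor[c_index + 1:]


def as_string(segments):
    return "".join(sequence for _, sequence in segments)


# N-terminal oriented scheme (Gos-TerL): the peptide carries the N-intein.
gos_peptide = [("linker", "CGGGGS"), ("N-extein", "KVEFE"), ("IntN", "<GosN>")]
gosc_sfgfp = [("His6", "HHHHHH"), ("IntC", "<GosC>"),
              ("C-extein", "CEFLG"), ("cargo", "<sfGFP>")]

# C-terminal oriented scheme (GP41.1): the peptide carries the C-intein.
sfgfp_gp41n = [("His6", "HHHHHH"), ("cargo", "<sfGFP>"),
               ("N-extein", "TRSGY"), ("IntN", "<GP41.1N>")]
gp41c_peptide = [("IntC", "<GP41.1C>"), ("C-extein", "SSSDV"),
                 ("linker", "GGGGS"), ("Cys", "C")]

print(as_string(trans_splice(gos_peptide, gosc_sfgfp)))
print(as_string(trans_splice(sfgfp_gp41n, gp41c_peptide)))
```

Note that in the N-terminal oriented scheme the His tag sits in front of GosC and is therefore removed with the intein fragment, whereas in the C-terminal oriented scheme it remains on the spliced product.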

Similarly, the GP41.1C peptide was chemically synthesized with a cysteine at the end for C-terminal coupling. However, a couple of initial synthesis attempts by two different companies failed before a high-quality product was delivered. The splicing reaction was also tested in solution with the chemically synthesized peptide and sfGFP-GP41.1N to confirm the splicing reaction before moving to the next steps (Figure 5.12 (a)). Splicing was carried out in one of the optimal buffers determined in the previous section (50 mM Tris, 300 mM NaCl, 1 mM EDTA, 2 mM DTT, pH 7.0) at room temperature for four hours (Figure 5.12 (b)).

The splicing efficiency was significantly lower than in the study performed in the previous section. This is potentially due to the low solubility and stability of the synthesized peptide. The peptide does not dissolve in buffers containing a high salt concentration, so it is very likely that it also precipitates out of solution during the splicing reaction, since the splicing buffer contains 300 mM NaCl. Further troubleshooting and optimization are needed to increase the splicing efficiency of GP41.1 with the synthesized peptide.

[Figure 5.12 panels. (a) Scheme: sfGFP-GP41.1N (His6-sfGFP-TRSGY-GP41.1 N) + GP41.1C peptide from solid phase synthesis (GP41.1 C-SSSDVGGGGSC) = spliced product (His6-sfGFP-TRSGYSSSDVGGGGSC). (b) SDS-PAGE lanes by sfGFP-GP41.1N : GP41.1C peptide ratio: 1:1, 1:3, 1:5.]

Figure 5.12 (a) Scheme of in solution splicing with chemically synthesized GP41.1C peptide and sfGFP-GP41.1N and their theoretical splicing product; (b) SDS-PAGE gel result for in solution splicing test with different ratio of sfGFP-GP41.1N to GP41.1C peptide.

After confirming that Gos-TerL mediated protein trans-splicing occurred with the synthesized peptide, the splicing reactions were further tested on solid surfaces at Columbus Nanoworks. Initially, 15 micron fluorescent microdiamonds were used for ease of verifying bioconjugation under a fluorescence microscope. The GosN peptide was coupled onto the modified microdiamond surface through maleimide-sulfhydryl reactions (Figure 5.13).

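[Figure 5.13 scheme: maleimide groups on the fluorescent nanodiamond react with the terminal cysteine of the CGGGGSKVEFE-Gos-TerL N peptide or the GP41.1 C-SSSDVGGGGSC peptide, giving diamonds linked through sulfur (S-GGGGS-Gos-TerL N and GP41.1 C-GGGGS-S).]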

Figure 5.13 Creation of intein-tagged micro/nanodiamond through maleimide-mediated coupling with sulfhydryl group on the peptide.

Then, the GosN-tagged microdiamonds were reacted with the pre-purified GosC-sfGFP fragment in the splicing buffer, with agitation to prevent diamond aggregation. This was followed by multiple washes in 50 mM HEPES buffer with 0.05% Tween 20 to remove non-specifically bound protein. Figure 5.14 shows fluorescence microscope images of the microdiamonds after the protein trans-splicing reaction.

[Figure 5.14 panels: fluorescence microscope images in the Texas Red (Ex: 540-580; Em: 600-660), FITC (Ex: 465-495; Em: 515-555) and merged channels; emission versus wavelength (400 to 800) plots of the NV- center fluorescence spectra, the GFP spectra and the overlaid spectra.]

Figure 5.14 Co-localization of Texas Red and FITC channels for fluorescent diamonds post trans-splicing reaction.

The Texas Red channel (excitation 540 nm to 580 nm, emission 600 nm to 660 nm) was used for the intrinsic fluorescence of the diamonds. The FITC channel (excitation 465 nm to 495 nm, emission 515 nm to 555 nm) was used to detect the fluorescence of sfGFP. Both channels and spectra were then overlaid to demonstrate successful bioconjugation mediated by Gos-TerL protein trans-splicing.

Later on, similar experiments were performed on 100 nm fluorescent nanodiamonds with and without coupled GosN peptide. After protein trans-splicing with the GosC-sfGFP fragment and multiple washes, the FNDs were drawn up into capillary tubes and imaged using the Maestro imaging system from 650 to 750 nm with a 665 nm longpass filter for the NV- center induced fluorescence of the nanodiamonds, and from 500 to 550 nm with a 495 nm longpass filter for GFP emission (Figure 5.15). After reaction with GosC-sfGFP, under 532 nm excitation, we saw similar fluorescence intensity from the FNDs in both cases. Under 430 nm excitation, however, we observed strong green fluorescence only from the FNDs coupled with GosN peptide, indicating effective bioconjugation of the model protein, sfGFP, onto these fluorescent nanodiamonds. For FNDs without the GosN peptide, a very low level of green fluorescence was observed, which represents the degree of non-specific binding between the diamonds and the protein to be conjugated. Therefore, the N-terminal oriented bioconjugation scheme was successfully demonstrated with minimal background signal.

Figure 5.15 Fluorescence detection of nanodiamonds with and without GosN peptide conjugated after protein trans-splicing with GosC-sfGFP using Maestro imaging system.

The same experiments were conducted to demonstrate proof-of-principle for the C-terminal oriented bioconjugation strategy. The 100 nm fluorescent nanodiamonds were used in the study, and those with and without coupled GP41.1C peptide were compared in parallel. Because of the low splicing efficiency with the peptide, the splicing reaction was carried out by mixing the corresponding FNDs with sfGFP-GP41.1N for four hours in the splicing buffer with agitation. After multiple washes, the FNDs were imaged using the Maestro imaging system with the same settings as described earlier. The results (Figure 5.16) were similar to those of the N-terminal oriented bioconjugation strategy: FNDs with GP41.1C peptide showed bright green fluorescence and FNDs without the peptide showed minimal fluorescence after the splicing reaction. Therefore, the C-terminal oriented bioconjugation strategy was also demonstrated to be successful.

Figure 5.16 Fluorescence detection of nanodiamonds with and without GP41.1C peptide conjugated after protein trans-splicing with sfGFP-GP41.1N using Maestro imaging system.

Additionally, because of the high affinity and specificity between the intein segments, we anticipate that the actual bioconjugation processes will not require a pre-purified target protein. In this case, the FNDs can be added directly to cell lysate to initiate intein-mediated protein trans-splicing, and the resulting functionalized FNDs can then be washed and recovered through simple centrifugation. In initial demonstrations, we saw results very similar to those shown above (Figures 5.15 and 5.16), which indicates the feasibility of using clarified lysate directly for the bioconjugation processes. This can greatly simplify the overall process when implemented in future applications, requiring minimal hands-on work and little knowledge of chemical reactions.

Other target proteins, including β-lactamase and interferon alpha 2b, have also been tested in parallel in the in solution splicing studies. Similar splicing rates and efficiencies were observed for both strategies with the different target proteins (results not shown in this thesis). This further supports that split intein-mediated protein trans-splicing is efficient and general, and therefore that the bioconjugation strategies can potentially be used as platform technologies.

5.4 Discussions

In this work, we have successfully demonstrated proof-of-principle for two split intein-mediated bioconjugation schemes for site-specific attachment of proteins onto fluorescent nanodiamonds. One is an N-terminal oriented bioconjugation scheme utilizing the Gos-TerL split intein, and the other is a C-terminal oriented bioconjugation scheme utilizing the GP41.1 split intein. Both split inteins were recently identified from metagenomic databases and possess unique features and efficient protein trans-splicing activity that are critical to the proposed schemes.

The feasibility of both strategies was first tested in solution using recombinantly expressed fragments from E. coli cells. The optimal splicing conditions were determined in both cases by setting up DOE studies for effect screening. Although the splicing efficiency of the Gos-TerL intein still needs improvement, it is the only split intein that possesses the unique Ser1/Cys+1 combination, which allows successful implementation of the N-terminal oriented conjugation strategy. Others have shown that introducing the Ser1/Cys+1 combination into other identified split inteins by mutation leaves them almost inactive, with only trace amounts of splicing product observable. The rarity of this combination and its relatively low efficiency could also be due to the thermodynamically unfavorable shift from an oxoester as the first intermediate to a thioester as the second intermediate during the splicing reaction. On the other hand, the GP41.1 split intein showed very fast splicing under flexible conditions during the in solution studies, as reported.

The splicing reactions were then tested in solution using synthetic peptides before coupling onto solid surfaces. While the Gos-TerL split intein showed splicing efficiency similar to that in the previous in solution experiments, the GP41.1 split intein showed significantly lower splicing efficiency in the optimal buffer, potentially due to the low solubility of the peptide in the high-salt buffer. Further troubleshooting experiments, including tests in low-salt buffers, are being carried out to achieve higher splicing efficiency.

Moreover, the feasibility of the strategies was further tested on the fluorescent nanodiamonds. The peptides were coupled onto the modified surfaces of the fluorescent nanodiamonds through the terminal cysteine of the peptides and the maleimide groups on the diamond surfaces. In both strategies, strong green fluorescence was observed when imaging functionalized FNDs after incubating intein-tagged FNDs with the corresponding precursor proteins, while little fluorescence was detected for the negative controls. The intensity of the green fluorescence is lower in the C-terminal oriented scheme than in the N-terminal oriented scheme, which is possibly due to the lower splicing efficiency of the GP41.1C peptide with the sfGFP-GP41.1N fragment.

Overall, the feasibility of the strategies has been validated successfully. Additionally, β-lactamase, which has an easy colorimetric activity assay, is also being used as another model protein for quantitative analysis of these bioconjugation strategies. It can also provide additional information about whether different orientations of attachment of β-lactamase onto FNDs result in different enzymatic activities. More experiments are needed to further optimize the processes for higher efficiencies. Moreover, it is also necessary to carefully quantify the bioconjugation processes, such as the amount of peptide coupled per mg of fluorescent nanodiamonds and the resulting amount of target protein conjugated in each strategy. This would allow better controllability when implementing the platform technologies in future applications.

Ultimately, the commercial goal of the project is to develop a toolbox for external customers to easily attach their specific targets onto fluorescent nanodiamonds in a site-specific and exclusively oriented manner. These new bioconjugation methods are fast, efficient, and more importantly, would fall within the expertise and capabilities of a modern molecular biology/cell biology laboratory, without the need for chemical conjugation expertise. Further, the simplicity of these platforms will facilitate the development of novel nanodiamond applications, and thus stimulate the development of broad in vivo and in vitro applications. Impacts are likely in areas of cancer imaging and diagnosis, cardiovascular and neurological disease, and pure biological research.

Depending on the specific target, either the N-terminal or the C-terminal conjugation strategy can be recommended. For example, for the attachment of antibodies, the C-terminal oriented conjugation scheme would generally be preferred, as it leaves the complementarity-determining regions (CDRs) undisturbed for antigen binding. The information that we gather from these studies would provide valuable guidance for potential users, including the optimal splicing conditions to use and the potential pitfalls they might encounter.

Chapter 6 Conclusions and Future Work

6.1 Conclusions

In this dissertation, several advances in the use of the split intein as a self-cleaving tag for downstream purification were presented and discussed, as well as utilizing other split inteins as self-splicing tools for bioconjugation purposes. In comparison with the contiguous intein, the association of the split intein fragments offers an additional layer of control to the intein-mediated cleavage or splicing. This not only improves the controllability and flexibility of the methods, but also offers new possibilities that are hard to achieve with contiguous inteins. The work described in this dissertation was an effort to extend the usage of split inteins into existing and new areas.

In Chapter 2, the self-cleaving tag was successfully incorporated into the column-free purification strategy in combination with the aggregating tag, elastin-like polypeptide (ELP). By using the pH-inducible split intein, the product recovery and yield of the purification process are greatly improved compared to contiguous intein-based systems. In parallel, the on-column purification method was developed using the same self-cleaving tag, where the N-fragment of the split intein was covalently immobilized on agarose beads. It is the first intein-mediated purification method capable of purifying proteins with disulfide bonds. A case study with a biosimilar target, granulocyte-colony stimulating factor (G-CSF), was presented, demonstrating successful purification and recovery of the highly potent and stable protein. These methods should find use in many large-scale purification applications for commodity enzymes or recombinant protein therapeutics.

In Chapter 3, one of the challenges during the development of the split intein-mediated purification platform, which was the expression of intein-tagged precursor protein in mammalian cells, was addressed. The expression and secretion of the precursor proteins from mammalian cells were greatly enhanced by using heterogeneous leader sequences screened from the study. The optimal leader sequences can be used to successfully generate proteins from mammalian cells in a product-independent manner at different scales. This encourages the development and application of this split intein-mediated purification technology for an industrial manufacturing process.

In Chapter 4, the N-extein dependency of the engineered split intein in the purification platform was thoroughly characterized. Residues immediately preceding the intein greatly affect the pH sensitivity and cleavage activity of the split intein. We identified that hydrophilic residues at the N-extein positions accelerate the cleavage reaction, while hydrophobic residues decrease the cleavage rate of the intein. The N-extein effects reported in this work can provide guidance for improving the cleavage activity of the split intein when purifying specific targets. We also expect that the FRET-based assay developed during the study will find use in tracking the in vivo or in vitro activities of other contiguous or split inteins for high-throughput screening.

In Chapter 5, different split inteins were utilized to develop novel bioconjugation methods for site-specific and oriented attachment of proteins onto fluorescent nanodiamonds. An N-terminal oriented conjugation scheme was developed using the Gos-TerL split intein, while a C-terminal oriented conjugation scheme was accomplished using the GP41.1 split intein. Our present results indicate efficient, site-specific conjugation of FNDs to either the N- or C-terminus of a target protein with purified materials or crude lysates. These platform technologies are notable for their potential biomedical imaging applications, such as in vivo labelling or drug uptake pathway tracking.

Overall, the work described in this dissertation is a major step forward for developing split intein-based technologies.

6.2 Future work

6.2.1 Development of the Npu split intein-mediated purification system for commercialization

Unlike for full-length antibodies or Fc-fusion proteins, there is currently no platform purification method available for all other proteins. At laboratory scale, tag-based approaches, such as His-tag or Strep tag, are usually used for purifying the protein of interest. However, the remaining tag portion may interfere with downstream characterization or applications, especially in crystallization or protein-protein interaction studies. The engineered Npu split intein presented in this work is a very useful tool for purification of tagless and traceless proteins without complex optimization. It has been shown to have several attractive features, such as pH sensitivity, tight controllability, well-understood cleavage behavior and compatibility with mammalian cell expression systems. To commercialize this intein technology for purification applications at both laboratory and industrial scale, several additional improvements need to be made.

The first is to optimize and scale up the production of the intein resin so that it becomes more economically viable. The expression of the ligand needs to be adapted from the original shake flasks and optimized in a bioreactor with a larger production capacity. The backbone of the resin, SulfoLink, is currently purchased from ThermoFisher at a price of about $12 per mL, and has a limited binding capacity and restricted flow rate and pressure. An alternative resin backbone with lower cost and higher binding capacity, such as an epoxide resin, could be purchased from an external vendor and tested. One of the advantages of SulfoLink resin is its mild coupling conditions; however, previous tests suggest that the ligand might be able to withstand much harsher conditions. Although more work needs to be done to switch the resin backbone and the coupling chemistry, the economic benefit, along with a much higher binding capacity, is worth the effort.

The second is to optimize the regeneration conditions of the intein resin. Currently, 6 M guanidine chloride is used for regeneration and has been demonstrated to be effective for at least 20 cycles. This might be acceptable at laboratory scale, but waste disposal of the guanidine chloride buffer becomes cost-prohibitive at manufacturing scale. More screening studies need to be carried out to find an alternative buffer, preferably one with a low concentration of sodium hydroxide. If needed, the NpuN* ligand can be adapted to be caustic stable through directed evolution or rational design at sites susceptible to cleavage under caustic conditions.

The third is to incorporate more target proteins during the development of the split intein-based purification technology. Because of limited resources in academia, the methods reported in this dissertation have only been validated with a handful of target proteins. Among the tested target proteins, we have not experienced any failed purifications with soluble and well-folded precursor proteins using the self-cleaving tag. The cleavage rate of each target protein can also be predicted based on the leading amino acids of its primary sequence. As the target protein library expands, counterexamples might show up, which could tell us whether other factors need to be considered for a successful purification. Systematic exploration of the effect of protein structure (globular or extended, monomer or dimer, etc.) or isoelectric point on the cleavage rate might be needed. This could be achieved through external collaborations with companies or universities that are interested in trying the technology. Corresponding strategies can then be developed for unexpected results as early as possible in the development stage.

Additionally, molecular dynamics studies can also be incorporated during the development of the technology. They can provide mechanistic explanations for the observed cleavage activities, thus aiding further optimization of the ligand by rational design.

More development work needs to be done before commercializing this technology; however, significant progress has been made in the past few years. The engineered split intein is a very promising tool for purification applications in the biotechnology industry. As a generalized platform, it can be easily integrated into high-throughput screening paradigms and readily applied to multiple applications. Target validation during the early drug discovery phase is one example, where a large number of potential biological modulators need to be tested against a chosen set of targets. The generality and simplicity of the system could greatly speed up the drug discovery phase, as it eliminates the individually tailored optimization period and allows a much faster move to the next step of target validation.

6.2.2 Development of the split intein-mediated bioconjugation strategies for commercialization

The ultimate goal of this work is to create a “kit” for customers, with intein-coupled fluorescent nanodiamonds and specialized plasmid vectors for expressing tagged versions of the desired target proteins. The resulting “kit” would allow for precise attachment of oriented proteins to the fluorescent nanodiamond surface, where the fluorescent nanodiamond and the protein label can be fully customized by the end user. This approach will significantly reduce the time and effort needed to create nanoparticle-protein conjugates and will be accessible to researchers who lack significant protein chemistry or bioconjugation experience. Since the project is newly started and still in the phase I proof-of-concept stage, there are multiple major developments that need to be completed before moving towards commercialization.

Firstly, accurate quantitation of the number of accessible conjugation sites on the fluorescent nanodiamonds and of the bioconjugation efficiencies is needed. This is relatively hard to achieve in phase I of the project due to the limited funding. Mass spectrometry needs to be performed on digested samples of the conjugated FNDs to determine whether there is an identifying peptide that can be used for quantitation. If so, a 13C-labeled version of that peptide needs to be synthesized and used in the quantitative mass spectrometry. Secondly, additional model proteins need to be incorporated to demonstrate successful bioconjugation using the split inteins and their applications as labeled FNDs. Currently, two target proteins are under development as illustrative examples for phase II: IFN α2b and the ML39 single-chain variable fragment (scFv). A cell proliferation assay is available for determining the activity of IFN α2b after conjugation onto FNDs, and the conjugate could be used, through collaborations, to track the potential drug uptake pathway in vitro and in vivo. The splicing has been successfully demonstrated in solution but needs to be verified on the surface of the fluorescent nanodiamonds. ML39 specifically binds to the extracellular domain of the tumor surface protein ErbB2 on ErbB2-positive breast cancer cells [125]. The scFv-conjugated FND could potentially be used to track and monitor breast cancer cells bearing the ErbB2 marker in vivo.
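The quantitation step described above, in which a C13-labeled peptide is used in quantitative mass spectrometry, is commonly carried out as a stable-isotope dilution calculation: a known amount of the labeled (heavy) peptide is spiked into the digest, and the ratio of the unlabeled (light) to heavy peak areas gives the amount of peptide released from the conjugated protein. The following is only a minimal sketch of that arithmetic. It assumes equal detector response for the light and heavy forms and one identifying peptide per conjugated protein molecule; the function names and all numbers are hypothetical and are not taken from this work.

```python
def native_peptide_pmol(area_light: float, area_heavy: float,
                        heavy_spiked_pmol: float) -> float:
    """Estimate the amount of unlabeled peptide by isotope dilution.

    Assumes the light and heavy peptides ionize identically, so the
    peak-area ratio equals the molar ratio (an assumption of this sketch).
    """
    if area_heavy <= 0:
        raise ValueError("heavy-standard peak area must be positive")
    return area_light / area_heavy * heavy_spiked_pmol


def conjugation_efficiency(conjugated_pmol: float,
                           input_protein_pmol: float) -> float:
    """Fraction of the input protein that ended up on the particles."""
    if input_protein_pmol <= 0:
        raise ValueError("input protein amount must be positive")
    return conjugated_pmol / input_protein_pmol


if __name__ == "__main__":
    # Hypothetical peak areas and amounts, for illustration only.
    conjugated = native_peptide_pmol(area_light=2.0e6, area_heavy=4.0e6,
                                     heavy_spiked_pmol=10.0)
    print(f"conjugated protein: {conjugated:.2f} pmol")
    print(f"efficiency: {conjugation_efficiency(conjugated, 20.0):.0%}")
```

Dividing the estimated amount of conjugated protein by the molar amount of particles in the sample would then give an average number of occupied sites per particle, provided the particle concentration is known independently.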

Additionally, orthogonal intein pairs can be used to build a dual-labeling system, such as the two split inteins Gos-TerL and GP41.1 described in this work. The absence of cross reaction between the intein pairs needs to be confirmed so that they can be used in one system. Specific labeling of multiple proteins could be quite beneficial for protein research, especially in a complex background such as live cells. It may facilitate in vivo protein functional studies by monitoring multiple proteins simultaneously.

With the availability of highly efficient and robust split inteins, trans-splicing and trans-cleavage will become more versatile and indispensable tools for a wide variety of applications, including protein production, conjugation, and modification.

Bibliography

1. Perler, F.B.; Davis, E.O.; Dean, G.E.; Gimble, F.S.; Jack, W.E.; Neff, N.; Noren,

C.J.; Thorner, J.; Belfort, M. Protein splicing elements: Inteins and exteins--a

definition of terms and recommended nomenclature. Nucleic Acids Res 1994, 22,

1125-1127.

2. Hirata, R.; Ohsumi, Y.; Nakano, A.; Kawasaki, H.; Suzuki, K.; Anraku, Y.

Molecular structure of a gene, vma1, encoding the catalytic subunit of h(+)-

translocating adenosine triphosphatase from vacuolar membranes of

saccharomyces cerevisiae. J Biol Chem 1990, 265, 6726-6733.

3. Kane, P.M.; Yamashiro, C.T.; Wolczyk, D.F.; Neff, N.; Goebl, M.; Stevens, T.H.

Protein splicing converts the tfp1 gene product to the 69-kd subunit of the

vacuolar h(+)-adenosine triphosphatase. Science 1990, 250, 651-657.

4. Perler, F.B. Inbase: The intein database. Nucleic Acids Res 2002, 30, 383-384.

5. Wu, H.; Hu, Z.; Liu, X.Q. Protein trans-splicing by a split intein encoded in a split

dnae gene of synechocystis sp. PCC6803. Proc Natl Acad Sci U S A 1998, 95,

9226-9231.

6. Iwai, H.; Zuger, S.; Jin, J.; Tam, P.H. Highly efficient protein trans-splicing by a

naturally split dnae intein from nostoc punctiforme. Febs Lett 2006, 580, 1853-

1858, 10.1016/j.febslet.2006.02.045.

7. Xu, M.Q.; Perler, F.B. The mechanism of protein splicing and its modulation by

mutation. EMBO J 1996, 15, 5146-5153.

8. Mills, K.V.; Johnson, M.A.; Perler, F.B. Protein splicing: How inteins escape

from precursor proteins. J Biol Chem 2014, 289, 14498-14505,

10.1074/jbc.R113.540310.

9. Noren, C.J.; Wang, J.; Perler, F.B. Dissecting the chemistry of protein splicing

and its applications. Angew Chem Int Ed Engl 2000, 39, 450-466.

10. Sun, W.; Yang, J.; Liu, X.Q. Synthetic two-piece and three-piece split inteins for

protein trans-splicing. J Biol Chem 2004, 279, 35281-35286,

10.1074/jbc.M405491200.

11. Appleby, J.H.; Zhou, K.; Volkmann, G.; Liu, X.Q. Novel split intein for trans-

splicing synthetic peptide onto c terminus of protein. J Biol Chem 2009, 284,

6194-6199, 10.1074/jbc.M805474200.

12. Shah, N.H.; Dann, G.P.; Vila-Perello, M.; Liu, Z.; Muir, T.W. Ultrafast protein

splicing is common among cyanobacterial split inteins: Implications for protein

engineering. J Am Chem Soc 2012, 134, 11338-11341, 10.1021/ja303226x.

13. Bachmann, A.L.; Mootz, H.D. An unprecedented combination of serine and

cysteine nucleophiles in a split intein with an atypical split site. J Biol Chem 2015,

290, 28792-28804, 10.1074/jbc.M115.677237.

14. Dassa, B.; London, N.; Stoddard, B.L.; Schueler-Furman, O.; Pietrokovski, S.

Fractured genes: A novel genomic arrangement involving new split inteins and a

new homing endonuclease family. Nucleic Acids Res 2009, 37, 2560-2573,

10.1093/nar/gkp095.

15. Carvajal-Vallejos, P.; Pallisse, R.; Mootz, H.D.; Schmidt, S.R. Unprecedented

rates and efficiencies revealed for new natural split inteins from metagenomic

sources. J Biol Chem 2012, 287, 28686-28696, 10.1074/jbc.M112.372680.

16. Lu, W.; Sun, Z.; Tang, Y.; Chen, J.; Tang, F.; Zhang, J.; Liu, J.N. Split intein

facilitated tag affinity purification for recombinant proteins with controllable tag

removal by inducible auto-cleavage. J Chromatogr A 2011, 1218, 2553-2560,

10.1016/j.chroma.2011.02.053.

17. Shi, C.; Meng, Q.; Wood, D.W. A dual elp-tagged split intein system for non-

chromatographic recombinant protein purification. Appl Microbiol Biotechnol

2013, 97, 829-835, 10.1007/s00253-012-4601-3.

18. Kurpiers, T.; Mootz, H.D. Site-specific chemical modification of proteins with a

prelabelled cysteine tag using the artificially split mxe gyra intein. Chembiochem

2008, 9, 2317-2325, 10.1002/cbic.200800319.

19. Scott, C.P.; Abel-Santos, E.; Wall, M.; Wahnon, D.C.; Benkovic, S.J. Production

of cyclic peptides and proteins in vivo. Proc Natl Acad Sci U S A 1999, 96,

13638-13643.

20. Borra, R.; Dong, D.; Elnagar, A.Y.; Woldemariam, G.A.; Camarero, J.A. In-cell

fluorescence activation and labeling of proteins mediated by fret-quenched split

inteins. J Am Chem Soc 2012, 134, 6344-6353, 10.1021/ja300209u.

21. Yang, J.Y.; Yang, W.Y. Site-specific two-color protein labeling for fret studies

using split inteins. J Am Chem Soc 2009, 131, 11644-11645, 10.1021/ja9030215.

22. Guan, D.; Ramirez, M.; Chen, Z. Split intein mediated ultra-rapid purification of

tagless protein (sirp). Biotechnol Bioeng 2013, 110, 2471-2481,

10.1002/bit.24913.

23. Thiel, I.V.; Volkmann, G.; Pietrokovski, S.; Mootz, H.D. An atypical naturally

split intein engineered for highly efficient protein labeling. Angew Chem Int Ed

Engl 2014, 53, 1306-1310, 10.1002/anie.201307969.

24. Bachmann, A.L.; Mootz, H.D. N-terminal chemical protein labeling using the

naturally split gos-terl intein. J Pept Sci 2017, 23, 624-630, 10.1002/psc.2996.

25. Walsh, G. Biopharmaceutical benchmarks 2014. Nat Biotechnol 2014, 32, 992-

1000, 10.1038/nbt.3040.

26. Kelley, B. Very large scale monoclonal antibody purification: The case for

conventional unit operations. Biotechnol Prog 2007, 23, 995-1008,

10.1021/bp070117s.

27. Kelley, B. Industrialization of mab production technology: The bioprocessing

industry at a crossroads. MAbs 2009, 1, 443-452.

28. Li, F.; Vijayasankaran, N.; Shen, A.Y.; Kiss, R.; Amanullah, A. Cell culture

processes for monoclonal antibody production. MAbs 2010, 2, 466-479.

29. Hanke, A.T.; Ottens, M. Purifying biopharmaceuticals: Knowledge-based

chromatographic process development. Trends Biotechnol 2014, 32, 210-220,

10.1016/j.tibtech.2014.02.001.

30. Terpe, K. Overview of tag protein fusions: From molecular and biochemical

fundamentals to commercial systems. Appl Microbiol Biotechnol 2003, 60, 523-

533, 10.1007/s00253-002-1158-6.

31. Wood, D.W. New trends and affinity tag designs for recombinant protein

purification. Curr Opin Struct Biol 2014, 26, 54-61, 10.1016/j.sbi.2014.04.006.

32. Waugh, D.S. An overview of enzymatic reagents for the removal of affinity tags.

Protein Expr Purif 2011, 80, 283-293, 10.1016/j.pep.2011.08.005.

33. Butt, T.R.; Edavettal, S.C.; Hall, J.P.; Mattern, M.R. Sumo fusion technology for

difficult-to-express proteins. Protein Expr Purif 2005, 43, 1-9,

10.1016/j.pep.2005.03.016.

34. Marblestone, J.G.; Edavettal, S.C.; Lim, Y.; Lim, P.; Zuo, X.; Butt, T.R.

Comparison of sumo fusion technology with traditional gene fusion systems:

Enhanced expression and solubility with sumo. Protein Sci 2006, 15, 182-189,

10.1110/ps.051812706.

35. Peroutka III, R.J.; Orcutt, S.J.; Strickler, J.E.; Butt, T.R. Sumo fusion technology

for enhanced protein expression and purification in prokaryotes and eukaryotes.

Methods Mol Biol 2011, 705, 15-30, 10.1007/978-1-61737-967-3_2.

36. Sadilkova, L.; Osicka, R.; Sulc, M.; Linhartova, I.; Novak, P.; Sebo, P. Single-

step affinity purification of recombinant proteins using a self-excising module

from neisseria meningitidis frpc. Protein Sci 2008, 17, 1834-1843,

10.1110/ps.035733.108.

37. Mao, H. A self-cleavable sortase fusion for one-step purification of free

recombinant proteins. Protein Expr Purif 2004, 37, 253-263,

10.1016/j.pep.2004.06.013.

38. Ruan, B.; Fisher, K.E.; Alexander, P.A.; Doroshko, V.; Bryan, P.N. Engineering

subtilisin into a fluoride-triggered processing protease useful for one-step protein

purification. Biochemistry 2004, 43, 14539-14546, 10.1021/bi048177j.

39. Achmuller, C.; Kaar, W.; Ahrer, K.; Wechner, P.; Hahn, R.; Werther, F.;

Schmidinger, H.; Cserjan-Puschmann, M.; Clementschitsch, F.; Striedner, G., et

al. N(pro) fusion technology to produce proteins with authentic n termini in E.

coli. Nat Methods 2007, 4, 1037-1043, 10.1038/nmeth1116.

40. Shen, A.; Lupardus, P.J.; Morell, M.; Ponder, E.L.; Sadaghiani, A.M.; Garcia,

K.C.; Bogyo, M. Simplified, enhanced protein purification using an inducible,

autoprocessing enzyme tag. PLoS One 2009, 4, e8119,

10.1371/journal.pone.0008119.

41. Chong, S.; Shao, Y.; Paulus, H.; Benner, J.; Perler, F.B.; Xu, M.Q. Protein

splicing involving the saccharomyces cerevisiae vma intein. The steps in the

splicing pathway, side reactions leading to protein cleavage, and establishment of

an in vitro splicing system. J Biol Chem 1996, 271, 22159-22168.

42. Ramirez, M.; Valdes, N.; Guan, D.; Chen, Z. Engineering split intein dnae from

nostoc punctiforme for rapid protein purification. Protein Eng Des Sel 2013, 26,

215-223, 10.1093/protein/gzs097.

43. Wood, D.W.; Wu, W.; Belfort, G.; Derbyshire, V.; Belfort, M. A genetic system

yields self-cleaving inteins for bioseparations. Nat Biotechnol 1999, 17, 889-892,

10.1038/12879.

44. Mathys, S.; Evans, T.C.; Chute, I.C.; Wu, H.; Chong, S.; Benner, J.; Liu, X.Q.;

Xu, M.Q. Characterization of a self-splicing mini-intein and its conversion into

autocatalytic n- and c-terminal cleavage elements: Facile production of protein

building blocks for protein ligation. Gene 1999, 231, 1-13.

45. Fong, B.A.; Wood, D.W. Expression and purification of elp-intein-tagged target

proteins in high cell density E. coli fermentation. Microb Cell Fact 2010, 9, 77,

10.1186/1475-2859-9-77.

46. Pelaz, B.; Alexiou, C.; Alvarez-Puebla, R.A.; Alves, F.; Andrews, A.M.; Ashraf,

S.; Balogh, L.P.; Ballerini, L.; Bestetti, A.; Brendel, C., et al. Diverse applications

of nanomedicine. ACS Nano 2017, 10.1021/acsnano.6b06040.

47. Krishnamachari, Y.; Geary, S.M.; Lemke, C.D.; Salem, A.K. Nanoparticle

delivery systems in cancer vaccines. Pharmaceutical research 2011, 28, 215-236,

10.1007/s11095-010-0241-4.

48. Park, Y.M.; Lee, S.J.; Kim, Y.S.; Lee, M.H.; Cha, G.S.; Jung, I.D.; Kang, T.H.;

Han, H.D. Nanoparticle-based vaccine delivery for cancer immunotherapy.

Immune network 2013, 13, 177-183, 10.4110/in.2013.13.5.177.

49. Xing, Y.; Dai, L. Nanodiamonds for nanomedicine. Nanomedicine 2009, 4, 207-

218, 10.2217/17435889.4.2.207.

50. Fang, C.; Bhattarai, N.; Sun, C.; Zhang, M. Functionalized nanoparticles with

long-term stability in biological media. Small 2009, 5, 1637-1641,

10.1002/smll.200801647.

51. Fang, R.H.; Zhang, L. Nanoparticle-based modulation of the immune system.

Annu Rev Chem Biomol Eng 2016, 7, 305-326, 10.1146/annurev-chembioeng-

080615-034446.

52. Anchordoquy, T.J.; Barenholz, Y.; Boraschi, D.; Chorny, M.; Decuzzi, P.;

Dobrovolskaia, M.A.; Farhangrazi, Z.S.; Farrell, D.; Gabizon, A.; Ghandehari, H.,

et al. Mechanisms and barriers in cancer nanomedicine: Addressing challenges,

looking for solutions. ACS Nano 2017, 11, 12-18, 10.1021/acsnano.6b08244.

53. Colombo, E.; Feyen, P.; Antognazza, M.R.; Lanzani, G.; Benfenati, F.

Nanoparticles: A challenging vehicle for neural stimulation. Frontiers in

neuroscience 2016, 10, 105, 10.3389/fnins.2016.00105.

54. Shah, S. The nanomaterial toolkit for neuroengineering. Nano convergence 2016,

3, 25, 10.1186/s40580-016-0086-6.

55. Hermanson, G.T. Bioconjugate techniques. Academic Press: San Diego, 1996; p

xxv, 785 p.

56. Volkmann, G.; Liu, X.Q. Protein c-terminal labeling and biotinylation using

synthetic peptide and split-intein. PLoS One 2009, 4, e8381,

10.1371/journal.pone.0008381.

57. Dhar, T.; Mootz, H.D. Modification of transmembrane and gpi-anchored proteins

on living cells by efficient protein trans-splicing using the npu dnae intein. Chem

Commun (Camb) 2011, 47, 3063-3065, 10.1039/c0cc04172f.

58. Charalambous, A.; Andreou, M.; Skourides, P.A. Intein-mediated site-specific

conjugation of quantum dots to proteins in vivo. J Nanobiotechnology 2009, 7, 9,

10.1186/1477-3155-7-9.

59. Charalambous, A.; Antoniades, I.; Christodoulou, N.; Skourides, P.A. Split-

inteins for simultaneous, site-specific conjugation of quantum dots to multiple

protein targets in vivo. J Nanobiotechnology 2011, 9, 37, 10.1186/1477-3155-9-

37.

60. Fong, B.A.; Wu, W.-Y.; Wood, D.W. The potential role of self-cleaving

purification tags in commercial-scale processes. Trends Biotechnol 2010, 28, 272-

279.

61. Shi, C.; Miskioglu, E.E.; Meng, Q.; Wood, D.W. Intein-based purification tags in

recombinant protein production and new methods for controlling self-cleavage.

Pharmaceutical Bioprocessing 2013, 1, 441-454.

62. Guan, D.; Chen, Z. Challenges and recent advances in affinity purification of tag-

free proteins. Biotechnol Lett 2014, 36, 1391-1406, 10.1007/s10529-014-1509-2.

63. Wu, Q.; Gao, Z.; Wei, Y.; Ma, G.; Zheng, Y.; Dong, Y.; Liu, Y. Conserved

residues that modulate protein trans-splicing of npu dnae split intein. Biochem J

2014, 461, 247-255, 10.1042/BJ20140287.

64. Shemella, P.; Pereira, B.; Zhang, Y.; Van Roey, P.; Belfort, G.; Garde, S.; Nayak,

S.K. Mechanism for intein c-terminal cleavage: A proposal from quantum

mechanical calculations. Biophys J 2007, 92, 847-853,

10.1529/biophysj.106.092049.

65. Du, Z.M.; Shemella, P.T.; Liu, Y.Z.; McCallum, S.A.; Pereira, B.; Nayak, S.K.;

Belfort, G.; Belfort, M.; Wang, C.Y. Highly conserved histidine plays a dual

catalytic role in protein splicing: A pk(a) shift mechanism. Journal of the

American Chemical Society 2009, 131, 11581-11589, 10.1021/ja904318w.

66. Wood, D.W.; Wu, W.; Belfort, G.; Derbyshire, V.; Belfort, M. A genetic system

yields self-cleaving inteins for bioseparations. Nat Biotechnol 1999, 17, 889-892.

67. Wood, D.W.; Derbyshire, V.; Wu, W.; Chartrain, M.; Belfort, M.; Belfort, G.

Optimized single-step affinity purification with a self-cleaving intein applied to

human acidic fibroblast growth factor. Biotechnol Progr 2000, 16, 1055-1063.

68. Banki, M.R.; Feng, L.A.; Wood, D.W. Simple bioseparations using self-cleaving

elastin-like polypeptide tags. Nat Methods 2005, 2, 659-661,

10.1038/nmeth787.

69. Fong, B.A.; Wu, W.Y.; Wood, D.W. Optimization of elp-intein mediated protein

purification by salt substitution. Protein Expres Purif 2009, 66, 198-202,

10.1016/j.pep.2009.03.009.

70. Wu, W.; Xing, L.; Zhou, B.; Lin, Z. Active protein aggregates induced by

terminally attached self-assembling peptide elk16 in escherichia coli. Microb Cell

Fact 2011, 10, 9, 10.1186/1475-2859-10-9.

71. Shur, O.; Dooley, K.; Blenner, M.; Baltimore, M.; Banta, S. A designed, phase

changing rtx-based peptide for efficient bioseparations. Biotechniques 2013, 54,

197-198, 200, 202, 204, 206, 10.2144/000114010.

72. Pahnke, S.; Egeland, T.; Halter, J.; Hagglund, H.; Shaw, B.E.; Woolfrey, A.E.;

Szer, J.; Working Group Medical of the World Marrow Donor, A. Current use of

biosimilar g-csf for haematopoietic stem cell mobilisation. Bone Marrow

Transplant 2018, 10.1038/s41409-018-0350-y.

73. Bonig, H.; Silbermann, S.; Weller, S.; Kirschke, R.; Korholz, D.; Janssen, G.;

Gobel, U.; Nurnberger, W. Glycosylated vs non-glycosylated granulocyte colony-

stimulating factor (g-csf)--results of a prospective randomised monocentre study.

Bone Marrow Transplant 2001, 28, 259-264, 10.1038/sj.bmt.1703136.

74. Jun, Y.; Wickner, W. Assays of fusion resolve the stages of docking, lipid

mixing, and content mixing. Proc Natl Acad Sci U S A 2007, 104, 13010-13015,

10.1073/pnas.0700970104.

75. Sands, D.; Whitton, C.M.; Longstaff, C. International collaborative study to

establish the 3rd international standard for streptokinase. J Thromb Haemost

2004, 2, 1411-1415, 10.1111/j.1538-7836.2004.00814.x.

76. Wood, D.W.; Camarero, J.A. Intein applications: From protein purification and

labeling to metabolic control methods. J Biol Chem 2014, 289, 14512-14519,

10.1074/jbc.R114.552653.

77. Aranko, A.S.; Wlodawer, A.; Iwai, H. Nature's recipe for splitting inteins. Protein

engineering, design & selection : PEDS 2014, 27, 263-271,

10.1093/protein/gzu028.

78. Southworth, M.W.; Adam, E.; Panne, D.; Byer, R.; Kautz, R.; Perler, F.B. Control

of protein splicing by intein fragment reassembly. EMBO J 1998, 17, 918-926,

10.1093/emboj/17.4.918.

79. Otomo, T.; Ito, N.; Kyogoku, Y.; Yamazaki, T. Nmr observation of selected

segments in a larger protein: Central-segment isotope labeling through intein-

mediated ligation. Biochemistry 1999, 38, 16040-16044,

10.1021/bi991902j.

80. Zettler, J.; Schutz, V.; Mootz, H.D. The naturally split npu dnae intein exhibits an

extraordinarily high rate in the protein trans-splicing reaction. Febs Lett 2009,

583, 909-914, 10.1016/j.febslet.2009.02.003.

81. Iwai, H.; Zuger, S.; Jin, J.; Tam, P.H. Highly efficient protein trans-splicing by a

naturally split dnae intein from nostoc punctiforme. Febs Lett 2006, 580, 1853-

1858, 10.1016/j.febslet.2006.02.045.

82. Zhao, Q.; Zhou, B.; Gao, X.; Xing, L.; Wang, X.; Lin, Z. A cleavable self-

assembling tag strategy for preparing proteins and peptides with an authentic n-

terminus. Biotechnol J 2017, 12, 10.1002/biot.201600656.

83. Cooper, M.A.; Taris, J.E.; Shi, C.; Wood, D.W. A convenient split-intein tag

method for the purification of tagless target proteins. In Current protocols in

protein science, John Wiley & Sons, Inc.: 2018; Vol. 91, pp 5.29.21-25.29.23.

84. Kinch, M.S. An overview of fda-approved biologics medicines. Drug Discov

Today 2015, 20, 393-398, 10.1016/j.drudis.2014.09.003.

85. Lagasse, H.A.; Alexaki, A.; Simhadri, V.L.; Katagiri, N.H.; Jankowski, W.;

Sauna, Z.E.; Kimchi-Sarfaty, C. Recent advances in (therapeutic protein) drug

development. F1000Res 2017, 6, 113, 10.12688/f1000research.9970.1.

86. Dumont, J.; Euwart, D.; Mei, B.; Estes, S.; Kshirsagar, R. Human cell lines for

biopharmaceutical manufacturing: History, status, and future perspectives. Crit

Rev Biotechnol 2016, 36, 1110-1122, 10.3109/07388551.2015.1084266.

87. Estes, S.; Melville, M. Mammalian cell line developments in speed and

efficiency. Adv Biochem Eng Biotechnol 2014, 139, 11-33,

10.1007/10_2013_260.

88. Hegde, R.S.; Bernstein, H.D. The surprising complexity of signal sequences.

Trends Biochem Sci 2006, 31, 563-571, 10.1016/j.tibs.2006.08.004.

89. Gierasch, L.M. Signal sequences. Biochemistry 1989, 28, 923-930.

90. Martoglio, B.; Dobberstein, B. Signal sequences: More than just greasy peptides.

Trends Cell Biol 1998, 8, 410-415.

91. Knappskog, S.; Ravneberg, H.; Gjerdrum, C.; Trosse, C.; Stern, B.; Pryme, I.F.

The level of synthesis and secretion of gaussia princeps luciferase in transfected

cho cells is heavily dependent on the choice of signal peptide. J Biotechnol 2007,

128, 705-715, 10.1016/j.jbiotec.2006.11.026.

92. Guler-Gane, G.; Kidd, S.; Sridharan, S.; Vaughan, T.J.; Wilkinson, T.C.; Tigue,

N.J. Overcoming the refractory expression of secreted recombinant proteins in

mammalian cells through modification of the signal peptide and adjacent amino

acids. PLoS One 2016, 11, e0155340, 10.1371/journal.pone.0155340.

93. Kober, L.; Zehe, C.; Bode, J. Optimized signal peptides for the development of

high expressing cho cell lines. Biotechnol Bioeng 2013, 110, 1164-1173,

10.1002/bit.24776.

94. Chen, W.; Zhao, X.; Zhang, M.; Yuan, Y.; Ge, L.; Tang, B.; Xu, X.; Cao, L.;

Guo, H. High-efficiency secretory expression of human neutrophil gelatinase-

associated lipocalin from mammalian cell lines with human serum albumin signal

peptide. Protein Expr Purif 2016, 118, 105-112, 10.1016/j.pep.2015.10.012.

95. Durocher, Y.; Perret, S.; Kamen, A. High-level and high-throughput recombinant

protein production by transient transfection of suspension-growing human 293-

ebna1 cells. Nucleic Acids Res 2002, 30, E9.

96. Loignon, M.; Perret, S.; Kelly, J.; Boulais, D.; Cass, B.; Bisson, L.;

Afkhamizarreh, F.; Durocher, Y. Stable high volumetric production of

glycosylated human recombinant ifnalpha2b in hek293 cells. BMC Biotechnol

2008, 8, 65, 10.1186/1472-6750-8-65.

97. Chong, S.; Montello, G.E.; Zhang, A.; Cantor, E.J.; Liao, W.; Xu, M.Q.; Benner,

J. Utilizing the c-terminal cleavage activity of a protein splicing element to purify

recombinant proteins in a single chromatographic step. Nucleic Acids Res 1998,

26, 5109-5115.

98. Amitai, G.; Callahan, B.P.; Stanger, M.J.; Belfort, G.; Belfort, M. Modulation of

intein activity by its neighboring extein substrates. Proc Natl Acad Sci U S A

2009, 106, 11005-11010, 10.1073/pnas.0904366106.

99. Shah, N.H.; Eryilmaz, E.; Cowburn, D.; Muir, T.W. Extein residues play an

intimate role in the rate-limiting step of protein trans-splicing. J Am Chem Soc

2013, 135, 5839-5847, 10.1021/ja401015p.

100. Ellila, S.; Jurvansuu, J.M.; Iwai, H. Evaluation and comparison of protein splicing

by exogenous inteins with foreign exteins in escherichia coli. FEBS Lett 2011,

585, 3471-3477, 10.1016/j.febslet.2011.10.005.

101. Cheriyan, M.; Pedamallu, C.S.; Tori, K.; Perler, F. Faster protein splicing with the

nostoc punctiforme dnae intein using non-native extein residues. J Biol Chem

2013, 288, 6202-6211, 10.1074/jbc.M112.433094.

102. Southworth, M.W.; Amaya, K.; Evans, T.C.; Xu, M.Q.; Perler, F.B. Purification

of proteins fused to either the amino or carboxy terminus of the mycobacterium

xenopi gyrase a intein. Biotechniques 1999, 27, 110-114, 116, 118-120.

103. Frutos, S.; Goger, M.; Giovani, B.; Cowburn, D.; Muir, T.W. Branched

intermediate formation stimulates peptide bond cleavage in protein splicing. Nat

Chem Biol 2010, 6, 527-533, 10.1038/nchembio.371.

104. Bajar, B.T.; Wang, E.S.; Zhang, S.; Lin, M.Z.; Chu, J. A guide to fluorescent

protein fret pairs. Sensors (Basel) 2016, 16, 10.3390/s16091488.

105. Machleidt, T.; Woodroofe, C.C.; Schwinn, M.K.; Mendez, J.; Robers, M.B.;

Zimmerman, K.; Otto, P.; Daniels, D.L.; Kirkland, T.A.; Wood, K.V. Nanobret--a

novel bret platform for the analysis of protein-protein interactions. ACS Chem

Biol 2015, 10, 1797-1804, 10.1021/acschembio.5b00143.

106. Michalet, X.; Kapanidis, A.N.; Laurence, T.; Pinaud, F.; Doose, S.; Pflughoefft,

M.; Weiss, S. The power and prospects of fluorescence microscopies and

spectroscopies. Annu Rev Biophys Biomol Struct 2003, 32, 161-182,

10.1146/annurev.biophys.32.110601.142525.

107. Derfus, A.M.; Chan, W.C.W.; Bhatia, S.N. Probing the cytotoxicity of

semiconductor quantum dots. Nano Lett 2004, 4, 11-18, 10.1021/nl0347334.

108. Cho, S.J.; Maysinger, D.; Jain, M.; Roder, B.; Hackbarth, S.; Winnik, F.M. Long-

term exposure to cdte quantum dots causes functional impairments in live cells.

Langmuir 2007, 23, 1974-1980, 10.1021/la060093j.

109. Hsiao, W.W.; Hui, Y.Y.; Tsai, P.C.; Chang, H.C. Fluorescent nanodiamond: A

versatile tool for long-term cell tracking, super-resolution imaging, and nanoscale

temperature sensing. Acc Chem Res 2016, 49, 400-407,

10.1021/acs.accounts.5b00484.

110. Ho, D.; Wang, C.-H.K.; Chow, E.K.-H. Nanodiamonds: The intersection of

nanotechnology, drug development, and personalized medicine. Science Advances

2015, 1, e1500439, 10.1126/sciadv.1500439.

111. Rosenholm, J.M.; Vlasov, II; Burikov, S.A.; Dolenko, T.A.; Shenderova, O.A.

Nanodiamond-based composite structures for biomedical imaging and drug

delivery. Journal of Nanoscience and Nanotechnology 2015, 15, 959-971,

10.1166/jnn.2015.9742.

112. Schirhagl, R.; Chang, K.; Loretz, M.; Degen, C.L. Nitrogen-vacancy centers in

diamond: Nanoscale sensors for physics and biology. Annual Review of Physical

Chemistry 2014, 65, 83-105, 10.1146/annurev-physchem-040513-103659.

113. Balasubramanian, G.; Lazariev, A.; Arumugam, S.R.; Duan, D.w. Nitrogen-

vacancy color center in diamond-emerging nanoscale applications in bioimaging

and biosensing. Current Opinion in Chemical Biology 2014, 20, 69-77,

10.1016/j.cbpa.2014.04.014.

114. Beha, K.; Fedder, H.; Wolfer, M.; Becker, M.C.; Siyushev, P.; Jamali, M.;

Batalov, A.; Hinz, C.; Hees, J.; Kirste, L., et al. Diamond nanophotonics.

Beilstein Journal of Nanotechnology 2012, 3, 895-908, 10.3762/bjnano.3.100.

115. Mansfield, J.R.; Gossage, K.W.; Hoyt, C.C.; Levenson, R.M. Autofluorescence

removal, multiplexing, and automated analysis methods for in-vivo fluorescence

imaging. J Biomed Opt 2005, 10, 41207, 10.1117/1.2032458.

116. Davies, G.; Hamer, M.F. Optical studies of 1.945 ev vibronic band in diamond.

Proc R Soc Lon Ser-A 1976, 348, 285-298, 10.1098/rspa.1976.0039.

117. Fu, C.C.; Lee, H.Y.; Chen, K.; Lim, T.S.; Wu, H.Y.; Lin, P.K.; Wei, P.K.; Tsao,

P.H.; Chang, H.C.; Fann, W. Characterization and application of single

fluorescent nanodiamonds as cellular biomarkers. Proc Natl Acad Sci U S A 2007,

104, 727-732, 10.1073/pnas.0605409104.

118. Ludwig, C.; Schwarzer, D.; Zettler, J.; Garbe, D.; Janning, P.; Czeslik, C.; Mootz,

H.D. Semisynthesis of proteins using split inteins. Methods Enzymol 2009, 462,

77-96, 10.1016/S0076-6879(09)62004-8.

119. Giriat, I.; Muir, T.W. Protein semi-synthesis in living cells. J Am Chem Soc 2003,

125, 7180-7181, 10.1021/ja034736i.

120. Romanelli, A.; Shekhtman, A.; Cowburn, D.; Muir, T.W. Semisynthesis of a

segmental isotopically labeled protein splicing precursor: Nmr evidence for an

unusual peptide bond at the n-extein-intein junction. Proc Natl Acad Sci U S A

2004, 101, 6397-6402, 10.1073/pnas.0306616101.

121. Zuger, S.; Iwai, H. Intein-based biosynthetic incorporation of unlabeled protein

tags into isotopically labeled proteins for nmr studies. Nat Biotechnol 2005, 23,

736-740, 10.1038/nbt1097.

122. Bachmann, A.L.; Mootz, H.D. N-terminal chemical protein labeling using the

naturally split gos-terl intein. J Pept Sci 2017, 10.1002/psc.2996.

123. Hermanson, G.T. Bioconjugate techniques, 2nd edition. Bioconjugate Techniques,

2nd Edition 2008, 1-1202, 10.1016/B978-0-12-370501-3.00001-1.

124. Chang, J.Y. A two-stage mechanism for the reductive unfolding of disulfide-

containing proteins. J Biol Chem 1997, 272, 69-75.

125. Li, X.; Stuckert, P.; Bosch, I.; Marks, J.D.; Marasco, W.A. Single-chain antibody-

mediated gene delivery into erbb2-positive human breast cancer cells. Cancer

Gene Ther 2001, 8, 555-565, 10.1038/sj.cgt.7700337.

Appendix A Primer list

Table 10 Primers used for the plasmid construction in Chapter 2

Primer Name               Sequence 5’ – 3’
BglII-Forward             ATCGAGATCTGACGTCCGATCCC
NpuC*-Blac-1 (overlap)    CTCAAAAATGGCTTCATAGCTCATAATATGCACCCAGAAACGCTGGTG
NpuC*-Blac-2 (overlap)    CACCAGCGTTTCTGGGTGCATATTATGAGCTATGAAGCCATTTTTGAG
Blac-2 (His-Stop-XhoI)    GCGCTCGAGTCAGTGATGATGATGATGATGCCAATGCTTAATCAGTGAGGC
NpuC*-sfGFP-1 (overlap)   CAAAAATGGCTTCATAGCTCATAATATGGTGAGCAAGGGCGAGGAGCTG
NpuC*-sfGFP-2 (overlap)   CAGCTCCTCGCCCTTGCTCACCATATTATGAGCTATGAAGCCATTTTTG
T7 terminator             GCCCCAAGGGGTTATGCTAG
NpuC*-GCSF-1 (overlap)    ATGGCTTCATAGCTCATAATATGACACCACTGGGTCCCGC
NpuC*-GCSF-2 (overlap)    GCGGGACCCAGTGGTGTCATATTATGAGCTATGAAGCCAT
GCSF-2 (XhoI)             ATCTGGCACAGCCCCTCGAG
NpuC*-MBP-1 (overlap)     CTCAAAAATGGCTTCATAGCTCATAATATGAAAATCGAAGAAGGTAAA
NpuC*-MBP-2 (overlap)     TTTACCTTCTTCGATTTTCATATTATGAGCTATGAAGCCATTTTTGAG
MBP-2 (His-Stop-XhoI)     GCGCTCGAGTCATTAATGATGATGATGATGATGCGAGCTCGAATTAGTCTGCGC
EcoRI-NpuN*-F             GCGGAATTCGGTGACGGTCACGGTGCCTTAAGCTATGAAACGGA
XbaI-NpuN*-R              GGCTCTAGATTAATTCGGCAAATTATCAACCCG
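The "(overlap)" primers in Table 10 come in "-1"/"-2" pairs, and each pair appears to consist of two fully complementary oligonucleotides spanning an NpuC*-target junction. As a quick consistency check on the transcribed sequences, the sketch below tests whether each "-2" primer is the exact reverse complement of its "-1" partner. The sequences are copied from Table 10; the dictionary layout and function names are illustrative only.

```python
# Overlap primer pairs copied from Table 10: (primer "-1", primer "-2").
OVERLAP_PAIRS = {
    "NpuC*-Blac": (
        "CTCAAAAATGGCTTCATAGCTCATAATATGCACCCAGAAACGCTGGTG",
        "CACCAGCGTTTCTGGGTGCATATTATGAGCTATGAAGCCATTTTTGAG",
    ),
    "NpuC*-sfGFP": (
        "CAAAAATGGCTTCATAGCTCATAATATGGTGAGCAAGGGCGAGGAGCTG",
        "CAGCTCCTCGCCCTTGCTCACCATATTATGAGCTATGAAGCCATTTTTG",
    ),
    "NpuC*-GCSF": (
        "ATGGCTTCATAGCTCATAATATGACACCACTGGGTCCCGC",
        "GCGGGACCCAGTGGTGTCATATTATGAGCTATGAAGCCAT",
    ),
    "NpuC*-MBP": (
        "CTCAAAAATGGCTTCATAGCTCATAATATGAAAATCGAAGAAGGTAAA",
        "TTTACCTTCTTCGATTTTCATATTATGAGCTATGAAGCCATTTTTGAG",
    ),
}

# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def check_pairs(pairs: dict) -> dict:
    """Map each pair name to True if primer 2 is the reverse complement of primer 1."""
    return {name: reverse_complement(fwd) == rev
            for name, (fwd, rev) in pairs.items()}


if __name__ == "__main__":
    for name, ok in check_pairs(OVERLAP_PAIRS).items():
        print(f"{name}: {'complementary' if ok else 'MISMATCH'}")
```

The same check can be applied to any newly designed overlap pair before ordering, since a single transcription error in either oligonucleotide shows up as a mismatch.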

Table 11 Primers used for the plasmid construction in Chapter 3

Primer Name                 Sequence 5’ – 3’
IK-kozac-Npuc-1 (HindIII)   TTGAAGCTTGACCATGGAGACAGACACACTCCTGCTATGGGTACTGCTGCTCTGGGTTCCAGGTTCCACTGGTGACATCAAAATAGCCACACGTAAA
AfeI-IRES-F                 ATGCAAGCGCTGAATTCTCCGGAGGTTATTTTCCACCATATTGCCGTCTTTTGGCAATGTGAG
mEGFP-stop-PacI-XhoI-R      ATGCACTCGAGTTAATTAATCACGTGTCTAGACTACTTGTACAGCTCGTCCATGCCGAGAGTGATCCC
Inverse AZ-1                AAGCGCAAGCACTGTGAGTCGGGTCATGGCGGTACCAAGCTTTATCGCTAGCCGGAG
Inverse AZ-2                CTGGCTGGCCTTCTGGCATCATCAAGAGCCATCAAAATAGCCACACGTAAATATTTAGGCAAAC
Inverse AZ+10AA-1           AAGCGGGGAGCTGCCGGCTCTTGATGATGCCAGAAGGCCAGCCAGAAGC
Inverse AZ+10AA-2           TTGGATATTGTTGGTACTTAAGATCAAAATAGCCACACGTAAATATTTAGGCAAAC
Inverse AZ+20AA-1           ACCAACAATATCCAAAAGCGGGGAGCTGCCGGCTCTTGATGATGCCAGAAGGCCAGCCAGAAGC
Inverse AZ+20AA-2           GGTAGGAAAGCACGCCCCCGCCAGTTTCCCCTTAAGATCAAAATAGCCACACGTAAATATTTAGGCAAAC
Inverse AZ+30AA-1           GGGGCGTGCTTTCCTACCACCAACAATATCCAAAAGCGGGGAGCTGCCGGCTCTTGATGATGCCAGAAGGCCAGCCAGAAGC
Inverse AZ+30AA-2           GATTTGGGCGAAGAGAACTTCAAGGCCCTGGTTCTTATTGCCCTTAAGATCAAAATAGCCACACGTAAATATTTAGGCAAAC
Inverse IK+10AA-1           TGTCAGGACAATGTCACCAGTGGAACCTGGAACCCAGAGCAGCAG
Inverse IK+10AA-2           CAATCACCGGCGTCCCTTAAGATCAAAATAGCCACACGTAAATATTTAGGCAAAC
Inverse IK+20AA-1           GGACGCCGGTGATTGTGTCAGGACAATGTCACCAGTGGAACCTGGAACCCAGAGCAGCAG
Inverse IK+20AA-2           CTTGCCGTTAGCCTGGGTCAAAGAGCTACCCTTAAGATCAAAATAGCCACACGTAAATATTTAGGCAAAC
Inverse IK+30AA-1           TTGACCCAGGCTAACGGCAAGGGACGCCGGTGATTGTGTCAGGACAATGTCACCAGTGGAACCTGGAACCCAGAGCAGCAG
Inverse IK+30AA-2           AGAGCTACCATTAGCTGTCGGGCAAGCCAGTCCGTATCTCTTAAGATCAAAATAGCCACACGTAAATATTTAGGCAAAC
Inverse SA+10AA-1           CCTAAAGACACCCCGACTATACGCGCTGGAGAAGAGAAAAAGC
Inverse SA+10AA-2           AGAGACGCTCACAAGCTTAAGATCAAAATAGCCACACGTAAATATTTAGGCAAAC
Inverse SA+20AA-1           CTTGTGAGCGTCTCTCCTAAAGACACCCCGACTATACGCGCTGGAGAAGAGAAAAAGC
Inverse SA+20AA-2           TCTGAAGTCGCTCATCGGTTTAAGGATTTGCTTAAGATCAAAATAGCCACACGTAAATATTTAGGCAAAC
Inverse SA+30AA-1           CTTAAACCGATGAGCGACTTCAGACTTGTGAGCGTCTCTCCTAAAGACACCCCGACTATACGCGCTGGAGAAGAGAAAAAGC
Inverse SA+30AA-2           GATTTGGGCGAAGAGAACTTCAAGGCCCTGGTTCTTATTGCCCTTAAGATCAAAATAGCCACACGTAAATATTTAGGCAAAC
Inverse SA-1                CAGGCTAATAAAGGTTACCCACTTCATGGCGGTACCAAGCTTTATCGCTAGCCGGAG
Inverse SA-2                CTTTTTCTCTTCTCCAGCGCGTATAGTATCAAAATAGCCACACGTAAATATTTAGGCAAAC


Table 12 Primers used for the plasmid construction in Chapter 4

Primer Name                         Sequence 5’ – 3’
BamHI-RBS-CFP-F                     ATGCGGATCCAATAATTTTGTTTAACTTTAAGAAGGAGATATACATATGTCTAAAGGTGAAGAATTATTCGGCGG
BsrGI-CFP-R                         AGACCCGCCTCCACCGAATTCTTTGTACAATTCATCCATACCATGGG
EcoRI-ZnNpuN-F                      GAATTCGGTGGAGGCGGGTCTGGTGACGGTCACGGTGCCTTAAG
Acc65I-YFP-F                        ATCGGGTACCATGTCTAAAGGTGAAGAATTATTCACTGG
XhoI-His6-YFP-R                     AATTCTCGAGTTAGTGGTGGTGGTGGTGGTGTTTGTAC
Acc65I-ATP-NpuC(HN)                 ACATGGTACCCGGGGTAGCATTATGAGCTATGAAGCCATTTTTGAGTGC
Acc65I-FFN-NpuC(HN)                 ACATGGTACCGTTGAAAAAATTATGAGCTATGAAGCCATTTTTGAGTGC
Acc65I-MTP-NpuC(HN)                 ACATGGTACCCGGGGTCATATTATGAGCTATGAAGCCATTTTTGAGTGC
-1-2-3 library for Npu(EcoRI)       ATGCGAATTCGGTGGAGGCGGGTCTNNSNNSNNSGCCTTAAGCTATGAAACGGAAATATTGACAG


Table 13 Primers used for the plasmid construction in Chapter 5

Primer Name                         Sequence 5’ – 3’
Inverse GOSN-G4S-Cys-FactorXa-1     CCGTTCACCTCGATGTTAATGTAGGACTCTTGGGATATAGATTCAAACTCCACCTTGCTTCCTCCTCCTCCGCAGAATTCTGAAATCCTTCCCTCGATCCC
Inverse GosN-His6-2                 AAAAGTCGAAACCATAAAGATAGGAGACTTATATAAAAAACTTAGTTTTAACGAACGCAAGTTCAACGAACTCGAGCACCACCACCACCAC
GOSc Fragment 1 FOR                 CAATAATTTTGTTTAACTTTAAGAAGGAGATATACATATGCATCATCATCATCATCATAAGTTGCCAG
GOSc sfGFP FRAGMENT 1 REV           CAGCTCCTCGCCCTTGCTCACGCCCAGGAATTCGCAATTGTG
sfGFP Fragment 2 FOR                GTGAGCAAGGGCGAGGAGCTG
sfGFP Fragment 2 REV                CAGCTTCCTTTCGGGCTTTGTAAGCTTCTCGAGTCACTTGTACAGCTCGTCCATGCC
pET Vector FOR                      ACAAAGCCCGAAAGGAAGCTG
Inverse GP41.1-RBS-1                CTCAATATCGATCAGTTCACGTTCATCCAGCTCTTCGATTTTCAGGATTTTTTTCAGCATCATATGTATATCTCCTTCTTAAAGTTAAACAAAATTATTGGATCC
Inverse GP41.1C-MBP-2               GTGTCCGGTAACCACCTGTTTTACGCTAACGATATTCTGACCCACAACAGCAGCAGCGATGTGGGAGGCGGGGGTAGCTGCATGAAAATCGAAGAAGGTAAACTGG
RBS-sfGFP-fragment1-FOR             CCAATAATTTTGTTTAACTTTAAGAAGGAGATATACATATGGTGAGCAAGGGCGAGGAG
TRSGY-EcoRI-sfGFP-fragment1-REV     ATAACCCGACCGGGTGAATTCCTTATACAGCTCGTCCATGCCG
EcoRI-TRSGY-GP41N-fragment2-FOR     GAATTCACCCGGTCGGGTTATTGCTTGGATCTGAAAACCCAGG
STOP-His6-XhoI-GP41N-fragment2-REV  CAGCTTCCTTTCGGGCTTTGTTCAGTGGTGGTGGTGGTGGTGCTCGAGTTCTTTAACATACAGACACATACCTTCTTTCAG
