
Protein Engineering Hydrophobic Core Residues of Computationally Designed G and Single-Chain Rop: Investigating the Relationship between Protein Primary structure and Protein Stability through High-Throughput Approaches

Thesis

Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University

By

Weiyi Li, B.S.

Graduate Program in Chemistry

The Ohio State University

2014

Thesis Committee:

Professor Thomas J. Magliery, Advisor

Professor Karin Musier-Forsyth

Copyright by

Weiyi Li

2014

Abstract

The sequence-structure-stability relationship is a key problem in protein science. Although a large body of research has addressed it with various methods and from various perspectives, it is still not completely understood. Recently, the combination of rational design and combinatorial library methods has brought new insight into the protein hydrophobic core. In this study, we investigated the influence of hydrophobic core residue packing on protein stability using a computationally designed Protein G homolog and a single-chain four-helix bundle protein, Rop. Based on a previously computationally designed protein G, we established two parallel hydrophobic core libraries, multi-site and single-site, in which six residues in the hydrophobic core were randomized simultaneously or individually. High-Throughput Thermal Scanning (HTTS) and colony-based DNA sequencing were used to investigate protein thermal stability. Comparison of the HTTS results from the two libraries indicated that none of the expected mutations resulted in a protein more thermally stable than the computationally designed protein G, and that single-site mutations had an effect similar to that of multi-site mutations on the computationally designed protein G. The original computational design was more stable than the designs from both libraries. The core library of single-chain Rop was constructed by randomizing the eight residues in the central two layers to all 20 amino acids. The large library was screened for Rop function with a cell-based screen using a GFP reporter plasmid. The library was subjected to selection to eliminate the background of inactive Rop and enrich for active Rop. Unfortunately, a significant fraction of the colony-based DNA sequencing results showed non-authentic Rop sequences. Cloning contamination and the arabinose concentration used in screening are two potential factors that could have caused the failure of screening and selection. The library construction and cloning procedures also need to be revisited.


Dedication

To my family

Acknowledgements

The work presented in this thesis would not have been possible without the help and support of many people. In my years at The Ohio State University, it has been my pleasure to meet and work with a number of bright and talented graduate students, postdocs, staff, and faculty. I would like to express my sincere gratitude to my advisor, Professor Thomas J. Magliery, for his dedication, admirable guidance, and strong support. I thank all my coworkers, who answered my questions patiently, pointed out my mistakes during experiments, and provided a harmonious working atmosphere. I thank my committee member, Dr. Musier-Forsyth, for taking time out of a busy schedule. I thank every member of the Magliery lab for their help and support; they made my years in graduate school valuable and enjoyable.

Vita

2011 ...... B.S. Biochemistry, University of Massachusetts Boston, Boston, Massachusetts

2011 to present ...... Graduate Teaching Associate, Chemistry, The Ohio State University

Fields of Study

Major Field: Chemistry

Specialization: Biological Chemistry

Table of Contents

Abstract
Dedication
Acknowledgements
Vita
List of Tables
List of Figures

Chapter 1: Introduction
1.1 Protein structure and stability
1.2 Computational design method
1.3 Combinatorial design
1.4 Hydrophobic core design
1.5 High-throughput thermal scanning
1.6 Conclusion
1.7 Computationally designed protein G
1.8 Single-chain Rop

Chapter 2: Method and material
2.1 Reagent
2.2 Plasmids and strains
2.3 Instruments
2.4 General protocol
2.5 Computationally designed Protein G library
2.6 Single-chain (sc) Rop NNK-core library
2.7 Screening and selection
2.8 High-Throughput protein purification
2.9 High-Throughput thermal scanning
2.10 Colony-based sequencing
2.11 Protein purification

Chapter 3: Results and discussion
3.1 Construction of computationally designed protein G library
3.2 HTTS and sequencing
3.3 Discussion of protein G library
3.4 Construction of single-chain Rop library
3.5 Screening and selection
3.6 Discussion of single-chain Rop library

References

List of Tables

Table 1: 6 mutation sites of computationally designed GB1
Table 2: Oligo for construction of computationally designed protein G
Table 3: Oligo for construction of single-chain Rop
Table 4: Tm value for each confirmed GB1 variant

List of Figures

Figure 1: Fundamental Rules
Figure 2: Heptad repeat diagram of two layers of Rop
Figure 3: Ribbon diagram of wild-type Rop and schematic representations
Figure 4: Topology redesign strategy
Figure 5: Schematic representation of the four possible topologies for sc-Rop variants
Figure 6: Plasmid pMRH6 with Kanamycin resistance gene
Figure 7: The structure of computationally designed protein GB1
Figure 8: HTTS of multi-site mutation on designed protein G
Figure 9: HTTS of single-site mutation on designed protein G
Figure 10: Residual distributions at the 6 mutation sites
Figure 11: Structure of single-chain Rop
Figure 12: Screening of NNK library
Figure 13: Cloning strategies for single-chain Rop
Figure 14: XbaI and XmaI site in pMRH6 plasmid and pUCBADGFPuv plasmid

Chapter 1

Introduction

1.1 Protein structure and stability

Epidemic diseases have raged throughout human history. Fortunately, advances in medicine have contributed significantly to the prolongation of human life. However, the increase in lifespan has been accompanied by a rise in degenerative diseases, including Alzheimer’s disease, cancer, and others. Unlike epidemic diseases, which are caused by bacteria, fungi, and viruses, many degenerative diseases are associated with the misfolding of proteins in the human body. Unexpected mutations, along with some external factors, result in the incorrect folding and instability of proteins. How can we explain protein folding at the molecular level? What factors affect protein stability? How can we improve protein stability?

In going from the unfolded to the folded state, a protein follows a pathway toward lower conformational entropy and lower free energy. A folded protein therefore adopts essentially a single conformation, whereas an unfolded protein samples many. Because the unfolded state is highly disordered, a protein can begin from different starting points and follow different pathways through different intermediates. According to Levinthal’s paradox, it would take an enormously long time for a protein to fold by randomly searching all possible configurations, yet real proteins reach their folded state on a very short time scale, so folding cannot be an exhaustive random search. A better explanation suggests that a bias against locally unfavorable conformations directs the folding of a peptide. In other words, locally stable interactions determine the further folding of the peptide in their vicinity.1 In reality, most wild-type proteins are only marginally stable, which means that native proteins still have the potential to evolve toward a more stable state. The instability of a protein is closely related to the weak or unfavorable forces that destabilize its three-dimensional structure. The hydrophobic effect is considered a major driving force for protein folding: it drives the protein toward a more compact structure by minimizing the unfavorable contact between hydrophobic residues and water molecules.2 Thermodynamically, the spontaneous folding of a protein requires a negative ΔG, where ΔG = ΔH − TΔS, so a negative enthalpy change or a positive entropy change is required.3 The negative enthalpy change can come from the heat released during the formation of hydrogen bonds and other noncovalent interactions. However, the decrease in the entropy of the protein chain upon folding is not a favorable process. To rationalize this, the surroundings must be considered. As the protein folds, the hydrophobic residues are gradually buried in the interior of the protein, and the water molecules that initially surrounded the exposed hydrophobic residues go from an ordered state to a less ordered state, so their entropy increases. Since the entropy gain of the water molecules outweighs the entropy loss of the protein, the overall entropy increases as the protein folds. Hydrogen bonds and salt bridges are also considered important factors in protein stability. An increased number of hydrogen bonds and salt bridges generally increases protein stability.4,5
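The thermodynamic argument above can be made concrete with a small numerical sketch. The enthalpy and entropy values below are hypothetical and chosen only to illustrate the sign conventions (they are not measurements from this work); the function name `folding_free_energy` is likewise introduced here for illustration.

```python
# Illustrative sketch of dG = dH - T*dS for folding.
# All numerical values are hypothetical, chosen only to show the bookkeeping.

def folding_free_energy(delta_h, delta_s_chain, delta_s_water, temperature):
    """Return dG of folding in kcal/mol.

    delta_h        enthalpy change of folding, kcal/mol
    delta_s_chain  entropy change of the polypeptide chain, kcal/(mol*K)
    delta_s_water  entropy change of the released water, kcal/(mol*K)
    temperature    absolute temperature, K
    """
    delta_s_total = delta_s_chain + delta_s_water
    return delta_h - temperature * delta_s_total


# Hypothetical values: the chain loses entropy on folding, while the water
# released from hydrophobic surfaces gains slightly more.
dg = folding_free_energy(
    delta_h=-10.0,        # favorable noncovalent interactions
    delta_s_chain=-0.20,  # chain becomes ordered
    delta_s_water=0.21,   # water becomes less ordered
    temperature=298.15,   # 25 degrees C
)
print(f"dG of folding: {dg:.2f} kcal/mol")
```

In this toy case the two entropy terms nearly cancel, so the net stability is small compared with either term, which is consistent with the marginal stability of native proteins described above.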

Anfinsen6 postulated that the native structure of a globular protein is determined only by the protein’s amino acid sequence. The complexity of determining protein structure includes knowledge of the characteristics of each individual amino acid, since hydrophobicity, surface area, charge distribution, main-chain and side-chain hydrogen bonds, salt links, and even oligomeric state are all closely related to the features of the primary sequence. Here, we investigate the relationship between amino acid sequence and protein stability. The conformational stability of a protein is directly related to its melting temperature (Tm). An early study7 by Zhang and coworkers on T4 lysozyme showed that the effects of seven substitutions were additive and increased the Tm by 8.3 °C. Malakauskas and Mayo8 (1998) made seven mutations in protein Gβ1 that increased the stability by 4.3 kcal/mol, corresponding to an increase in Tm to above 100 °C. Ruvinov et al.9 increased the stability of the subtilisin prodomain by 2.5 kcal/mol with three mutations. Pace10 (1990) showed that single amino acid substitutions can result in a large increase in stability, but that such increases could not be accurately predicted in advance. He also concluded that a 1 kcal/mol increase in conformational stability at 25 °C corresponds to an increase of 7 °C in Tm. Although previous studies showed that stability can be altered by manually changing the amino acid sequence, it is still difficult to predict the impact of a single amino acid change on protein stability.
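As a back-of-the-envelope illustration of the rule of thumb quoted above (1 kcal/mol of added conformational stability at 25 °C corresponding to roughly 7 °C in Tm), the sketch below converts a stability gain into an estimated Tm shift. Treating the relationship as linear over several kcal/mol is an assumption made here for illustration; the true dependence varies from protein to protein, and the function name is hypothetical.

```python
# Rule-of-thumb conversion between a stability gain and a Tm shift.
# Assumption: the 7 degrees C per kcal/mol figure is applied linearly,
# which is only an approximation.

TM_SHIFT_PER_KCAL = 7.0  # degrees C per kcal/mol


def estimate_tm_shift(delta_delta_g):
    """Estimate the change in Tm (degrees C) for a stability gain in kcal/mol."""
    return TM_SHIFT_PER_KCAL * delta_delta_g


# Stability gains quoted in the text above
examples = [
    ("subtilisin prodomain, three mutations", 2.5),
    ("protein G beta-1, seven mutations", 4.3),
]
for label, ddg in examples:
    shift = estimate_tm_shift(ddg)
    print(f"{label}: +{ddg} kcal/mol -> about +{shift:.1f} C in Tm")
```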

In order to improve protein stability, it is important to study the structures of natural proteins with extreme thermostability. Proteins from thermophiles have recently been compared with those from mesophiles. Unlike other microorganisms, thermophiles are found in various geothermally heated regions and survive at temperatures as high as 122 °C.11 In the work of Szilágyi and Závodszky12, differences in various characteristics (ion pairs, cavities, hydrogen bonds, secondary structure, and polarity of surfaces) were seen among proteins from extreme thermophiles, moderate thermophiles, and mesophiles. The difference in stability between thermophilic and mesophilic proteins correlated strongly with the number and networking of ion pairs. The number of weaker ion pairs increases significantly in both extremely and moderately thermophilic proteins, whereas the extremely thermophilic ones show additional strong ion pairs. A major reason that the stability contribution of ion pairs is relatively low at room temperature is that the favorable interaction within the salt bridge is insufficient to compensate for the desolvation penalty. The other differences between extremely thermophilic and moderately thermophilic proteins are less significant. An increase in the polarity of the exposed surface and in α-helix propensity is seen in moderately thermophilic proteins, and a decrease in the number of cavities and β propensity is observed in extremely thermophilic proteins. Although a wide variety of stabilizing factors was found, the general trend still provides information about how thermophilic proteins evolved from mesophilic ones to adapt to higher temperature. These results suggest that proteins with enhanced thermostability can be engineered by focusing on the electrostatic interactions on the protein surface. However, Gromiha and coworkers13 proposed a unified, systematic approach to directly compare the thermostability of thermophilic and mesophilic proteins by computing different parameters. In their work, a set of 373 thermophilic proteins and their mesophilic counterparts were analyzed for surrounding hydrophobicity, ion pairs, hydrogen bonds, inter-residue interaction energy, long-range order, and multiple contact index. In 80% of the analyzed pairs, the thermophilic protein had a higher hydrophobicity than the mesophilic one, which accounts for a major contribution to the higher thermostability of thermophilic proteins. Ion pairs, hydrogen bonds, and interaction energy are favored in 68%, 50%, and 62% of thermophilic proteins, respectively. In addition, the surrounding hydrophobicity of residues in the interior of both mesophilic and thermophilic proteins was computed. It was observed that, in thermophilic proteins, core residues are tightly packed while surface residues are more loosely packed. Although the stability of thermophilic proteins has been viewed from different perspectives, there is still controversy about the most important factor for designing thermostable proteins.

Although protein stability can be engineered, increased stability may come at the expense of other properties of the protein. Since enzymes are thought to use their ordered structure to perform catalysis, the residues involved in catalysis may not be optimized for stability. Shoichet and coworkers14 first tested the relationship between protein stability and protein function by mutating functionally important residues in the active site of T4 lysozyme. Six mutations at two catalytic residues reduced the activity but increased the thermal stability, and nine mutations at two substrate-binding residues increased the stability at the cost of activity. Given the extremely high thermostability of thermophilic proteins, investigating their activity was a natural starting point for studying the stability-activity relationship. The stability-activity relationship among thermophilic, mesophilic, and psychrophilic α-amylases was studied by D’Amico and coworkers15. The inverse relationship between stability and conformational flexibility was confirmed for the three proteins by fluorescence quenching experiments that probed the accessibility of tryptophan residues. The activity-stability relationship was explained by comparing the stability of the active site with that of the overall protein structure for each type of protein. For the thermophilic and mesophilic α-amylases, structural unfolding is the major reason for the loss of activity at high temperature, whereas the active site of the psychrophilic enzyme is more heat-labile than its overall structure. Therefore, the low stability of the active site is a major determinant of activity at low temperature. They also proposed a new view of the stability-activity relationship by introducing folding funnels for psychrophilic and thermophilic enzymes. The bottom of the funnel for the psychrophilic enzyme depicts a large population of conformers with low energy barriers and low enthalpy changes for interconversion between them, whereas the bottom of the funnel for the thermophilic enzyme depicts a single global minimum with high energy barriers. Upon substrate binding to any subpopulation of the psychrophilic protein, shifting the equilibrium from the other conformers to that subpopulation requires only a small free energy change. In contrast, when substrate binds to the single conformer of the thermophilic protein, a large free energy change is required to shift the other, sparsely populated conformers toward that single conformer. Their studies provided a rational basis for protein adaptation to high temperature with the least compromise of function.

My study focuses on the effect of hydrophobic residues on protein stability by investigating two proteins that are easy to prepare and have been well studied previously. Through a comprehensive mutagenesis study of the hydrophobic core of the protein Rop, we are interested in the effect of residue propensity in the hydrophobic core on the thermostability of the overall structure, the topology of the secondary structure, oligomerization of the modified structure, and the stability-function relationship. For protein G, site-directed mutagenesis is introduced into a computationally designed protein G mimetic. For the designed protein G, we focus on evaluating how well the computational design method agrees with experimental construction. Overall, by introducing combinatorial library approaches, we attempt to bring a new view of directed evolution into protein stability engineering.

1.2 Computational design method

Rational protein design is the creation of novel proteins that fold into a predictable structure. The first fully designed de novo protein was reported by Stephen Mayo in 1997.16 In 2003, Kuhlman and coworkers17 designed the first protein with a fold not yet found in nature. In 2008, Baker and coworkers18,19 computationally designed enzymes for two different reactions. However, the challenges of de novo protein design include the robustness of protein folding, enzymatic activity, and the design of binding pockets, which slow the progress of its practical application.20 A more practical application of protein engineering is the redesign of an existing protein to improve its functional and structural characteristics. Computational methods are the foundation of protein engineering. In principle, a target structure is first chosen from existing protein structures. The degrees of freedom of the amino acids are then defined based on the chemical variability of the sequence. Structural flexibility is modeled by using rotamer libraries to infer the side-chain conformations. Finally, an energy function is introduced to quantify sequence-structure compatibility by scoring and ranking the sequences.21

Computational protein design tries to mimic the course of evolution by choosing highly functional sequences and improving protein properties through fine-tuning of the backbone and side-chain conformations. However, experimental validation is a grand challenge for computational design. Although several groups have expressed proteins based on their computational studies, there is still no reliable way to accurately calculate protein stability from the point mutations introduced into the gene sequence. One experimental approach to protein stability is systematic study, in which a small number of proteins are rationally designed and their physical properties are studied intensively by X-ray crystallography, NMR spectroscopy, CD spectroscopy, and other methods. Another approach is based on combinatorial libraries. A combinatorial gene library can sample a vast number of gene sequences with randomized mutation sites. A subsequent sorting process can then select the desired variants according to their stability or function. In the Magliery laboratory, the combinatorial approach is a major method for addressing problems of protein stability.

1.3 Combinatorial methods

Depending on the purpose of the protein design, two combinatorial strategies can be used. Random design explores an extensive region of sequence space; in other words, it investigates all possible sequences within the library size. As a result, a large variety of protein stabilities and functions can be discovered. However, proteins with the desired function and stability will occur at a low frequency. Rational design is more purposeful than random design. In this strategy, the primary goal is a desired protein structure, which is pursued by designing a biased library with specific codes to achieve a certain property, such as placing hydrophobic residues in the inner core or charged residues on the solvent-exposed surface.22,23 Rational design to improve preexisting scaffolds is widely and thoroughly studied in the Magliery lab.
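To give a sense of the scale of sequence space that such libraries sample, the sketch below counts the theoretical number of variants for the library sizes mentioned in this thesis (six randomized core sites in the designed protein G, eight in single-chain Rop). It assumes full randomization to all 20 amino acids at each site and, for the DNA-level count, NNK degenerate codons (N = A/C/G/T, K = G/T, giving 32 codons per site); applying NNK to both libraries is an assumption made here for illustration, and the function names are hypothetical.

```python
# Theoretical library sizes for site-saturation libraries.
# Assumptions: every randomized site may take any of the 20 amino acids,
# and DNA diversity is generated with NNK codons (32 codons per site).

AMINO_ACIDS = 20
NNK_CODONS = 4 * 4 * 2  # N, N, K -> 32 codons encoding all 20 amino acids


def protein_variants(sites):
    """Distinct protein sequences when all sites are randomized together."""
    return AMINO_ACIDS ** sites


def nnk_dna_variants(sites):
    """Distinct DNA sequences when every site carries an NNK codon."""
    return NNK_CODONS ** sites


def single_site_variants(sites):
    """Distinct proteins when sites are randomized one at a time.

    Each site contributes 19 substitutions; the parent sequence is counted once.
    """
    return sites * (AMINO_ACIDS - 1) + 1


for name, sites in [("6-site core library", 6), ("8-site core library", 8)]:
    print(f"{name}: {protein_variants(sites):,} proteins, "
          f"{nnk_dna_variants(sites):,} NNK DNA sequences")
print(f"6 sites, one at a time: {single_site_variants(6)} proteins")
```

The contrast between the simultaneous and one-at-a-time counts shows why a single-site library can be covered almost exhaustively while a multi-site library can only be sampled.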

In order to design a robust protein, it is important to understand how nature arranges residues in the peptide chain. Upon folding, residues with polar and nonpolar side chains differ in their accessibility to solvent, and the hydrophobicity of each residue was first quantitated by Michael Zehfus24. In addition, the prediction of secondary structure has been thoroughly studied over the last several decades. The Dictionary of Protein Secondary Structure (DSSP) was proposed by Wolfgang Kabsch and Christian Sander25 and was the first method to build a relationship between amino acid sequence and protein structure. DSSP uses a pattern-recognition process based on hydrogen-bonded and geometrical features extracted from X-ray coordinates to assign eight types of secondary structure. DSSP was extended in 2002 by Burkhard Rost26, who introduced multiple hydrogen-bond thresholds. Helix and beta propensity scales have also been widely studied. Chou and Fasman developed the first helix propensity scale based on sequence.27 Helix propensities were also studied in amino acid polymers and in peptides of defined length. In 1998, Martin Scholtz28 derived a helix propensity scale based on previous measurements of helix propensity in 11 systems, including both proteins and peptides. In 2012, helix propensities were studied by molecular simulation.29 Experimental quantitation of beta-sheet preference was first addressed in a zinc-finger peptide.30 It was also studied in the IgG-binding domain of protein G by site-directed mutagenesis to alanine.31

Based on these studies, several rules of residue preference are well established. Glu, Ala, Leu, Met, Gln, Lys, and Arg favor helix formation, whereas Gly, Pro, Ser, Asn, Val, Thr, Tyr, and Cys do not. In practice, our current ability to predict structure still falls far short of reproducing native proteins. Nevertheless, great progress has been made in designing relatively stable or well-functioning proteins. In 2012, Koga and coworkers32 proposed rules for designing ideal protein structures with consistent local and non-local interactions, and the NMR structures of the designed proteins were confirmed to be remarkably consistent with the computational design models.
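The residue preferences listed above can be turned into a very crude sequence tally. The sketch below simply counts helix-favoring and helix-disfavoring residues using the two lists from the text; it is not a secondary structure predictor, residues absent from both lists are left unclassified, and the function name and example sequence are hypothetical.

```python
# Crude tally of helix preference using the two residue lists from the text.
# This is a counting illustration only, not a structure prediction method.

HELIX_FAVORING = set("EALMQKR")      # Glu, Ala, Leu, Met, Gln, Lys, Arg
HELIX_DISFAVORING = set("GPSNVTYC")  # Gly, Pro, Ser, Asn, Val, Thr, Tyr, Cys


def helix_preference_counts(sequence):
    """Return (favoring, disfavoring, unclassified) counts for a one-letter sequence."""
    favoring = disfavoring = unclassified = 0
    for residue in sequence.upper():
        if residue in HELIX_FAVORING:
            favoring += 1
        elif residue in HELIX_DISFAVORING:
            disfavoring += 1
        else:
            unclassified += 1
    return favoring, disfavoring, unclassified


# Hypothetical peptide used only to exercise the function
example = "AEELLKKGPS"
fav, dis, other = helix_preference_counts(example)
print(f"{example}: {fav} favoring, {dis} disfavoring, {other} unclassified")
```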

The quality of the constructed combinatorial library is an obstacle that must be addressed before proceeding to protein expression. A selection step that extracts the structurally or functionally valuable variants is necessary to improve the quality of the library. Selection for structure is used in the Magliery lab and aims to separate the well-folded variants from misfolded or unfolded ones. For Rop, a reporter plasmid was constructed by introducing the green fluorescent protein gene sequence, and the screening plasmid is co-transformed with the reporter. The fluorescence level is an indication of the structure of the protein.33

1.4 Hydrophobic core design

In the characterization of designed proteins, a major discrepancy between designed and native proteins is that designed proteins lack the degree of structural order observed in crystal structures of native proteins. Effective hydrophobic core packing is therefore considered the first hurdle for computational design. Hydrophobic core packing was thought, in theory, to affect four aspects of a protein: fold, stability, conformational uniqueness, and function. Previous core design studies indicate that hydrophobic core packing plays an important role in protein stability and some role in determining conformational uniqueness, but it has not been found to play a role in global protein folding or protein function.33

A rational explanation of the increase in thermal stability caused by core design was given by Ventura and Serrano34. They used an automatic design algorithm to generate several hundred core sequences of SH3 and chose 13 variants whose calculated stability was within a certain range of that of wild-type SH3 and that were as divergent as possible from natural SH3 sequences. Three of the 13 variants, with stability similar to that of the WT protein, were selected for kinetic and structural analysis, which indicated that the three core mutants bury a larger hydrophobic volume than the WT protein. Ile to Val mutations at specific positions in the three sequences increased their stability by up to 2.0 kcal/mol and slowed unfolding. The increase in stability was attributed to the release of conformational strain caused by the Ile. Therefore, there is no simple rule that higher compactness results in increased stability.

A more extensive study of hydrophobic core mutagenesis was reported by Murphy and Kuhlman35. They completely redesigned a protein hydrophobic core while allowing backbone flexibility. To explore the importance of sequence space in protein design, they used four computational procedures, differing in backbone flexibility, to generate designed sequences of a 105-amino-acid four-helix bundle protein. The flexible-backbone design with the native amino acid disallowed at each design position (DRNN) showed exceptional thermal stability. The extreme thermal stability of DRNN might be partially due to the burial of an additional 27 hydrophobic atoms compared with the wild-type structure. The fixed-backbone design with all amino acid types allowed at each design position (FBAA) showed nearly the same stability as DRNN, but with one fewer buried hydrophobic atom than the wild-type protein. Although no single characteristic explains why FBAA and DRNN are more stable than the wild-type protein, the increased sequence diversity could bring a new perspective on designing stable proteins by perturbing the predetermined structure.

The relationship between the protein hydrophobic core and protein stability will remain an important topic, and it could serve as a guideline for investigating the impact of functionally important residues in the hydrophobic core. In addition, the pattern of core residue networking could be a readily observed phenotype that helps explain protein core packing.

1.5 High-throughput thermal scanning

A library of a billion or a trillion variants can be constructed simultaneously. Within such a library, many of the variants will be structurally or functionally unacceptable, and characterizing every single variant would require an enormous amount of labor. A way to rapidly assess protein structure and stability is therefore necessary. Circular dichroism and NMR spectroscopy are widely used to study the structure of single variants. However, screening a whole library with these traditional methods is time-consuming and wasteful of material. In practice, it is impossible to cover all of the variants in the library.

Edgell first used fluorescence as a method to characterize protein stability. He and his colleagues36 introduced a two-channel semiautomated titrating fluorometer to carry out automated chemical denaturation of eglin c in 20 min each. Todd and his colleagues37 then used ThermoFluor (a miniaturized high-throughput protein stability assay) to analyze protein stability in the presence of a fluorescent dye. In contrast to the intrinsic fluorescence used in Edgell’s work, the fluorescence in this case arises from the interaction between the protein and the fluorescent dye. To explain the behavior of the reporter dye, Ellestad and colleagues38 examined the thermal unfolding of BACE1 with the fluorescent dye SYPRO Orange. The increase in fluorescence intensity is due to the exposure of the hydrophobic core, which creates a low-dielectric environment. Recently, Lavinder and coworkers39 demonstrated a high-throughput thermal scanning (HTTS) method to determine the approximate thermal stabilities of proteins with high throughput and low cost. Differential scanning fluorimetry (DSF) is appropriate for HTTS because it fulfills the criteria for high-throughput screening, including low sample volume, high sample throughput, ease of setup, and rapid analysis. By using a sensitive dye that fluoresces in a hydrophobic environment, molten globule proteins can be readily distinguished from native-like proteins. Because a molten globule is partially folded with a loosely packed hydrophobic core, the dye binds to the core and gives a large fluorescence signal at room temperature and no increase in fluorescence upon heating. In contrast, a native-like protein with a tightly packed hydrophobic core shows little fluorescence at room temperature but a sharp increase in fluorescence upon heating, due to binding of the dye to the exposed hydrophobic core of the denatured protein.40 HTTS is well suited to screening libraries with large numbers of variants, whose stabilities can be analyzed simultaneously in a real-time PCR machine.
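One common way to extract an apparent Tm from a dye-based melting curve is to take the temperature at which the fluorescence rises most steeply, that is, the maximum of the first derivative dF/dT. The sketch below applies this to a synthetic sigmoidal curve. The curve shape, the parameter values, and the function names are assumptions made for illustration; this is not necessarily the analysis procedure used in this work or in the HTTS reference.

```python
import math

# Estimate an apparent Tm as the temperature of maximum dF/dT.
# The melting curve here is synthetic (an idealized two-state sigmoid).


def simulate_melt_curve(tm, width=2.0, start=25.0, stop=95.0, step=0.5):
    """Return (temperatures, normalized fluorescence) for an ideal sigmoid."""
    n_points = int(round((stop - start) / step)) + 1
    temps = [start + i * step for i in range(n_points)]
    signal = [1.0 / (1.0 + math.exp((tm - t) / width)) for t in temps]
    return temps, signal


def estimate_tm(temps, signal):
    """Return the temperature where the central-difference slope is largest."""
    best_temp = None
    best_slope = float("-inf")
    for i in range(1, len(temps) - 1):
        slope = (signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


# Hypothetical variant with a true midpoint of 62 degrees C
temps, signal = simulate_melt_curve(tm=62.0)
print(f"Apparent Tm: {estimate_tm(temps, signal):.1f} C")
```

A molten-globule-like variant, which shows high fluorescence at room temperature and no sharp rise, would give a flat derivative, so in practice a minimum slope threshold would be needed before reporting a Tm.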

1.6 Conclusion

According to Darwinian evolution, populations of organisms are subject to changes in their environments. Beneficial mutations gradually become established, and non-beneficial mutations gradually disappear over time. In the laboratory, this evolutionary principle can be mimicked to generate structurally or functionally interesting variants. A typical directed-evolution experiment consists of cycles of mutation, selection, and amplification. A large amount of the work in gene library design is considered directed evolution. In practice, the enormous number of possible polypeptide sequences is an obstacle to identifying and isolating the variants of interest. It is therefore important to design the library wisely, with appropriate techniques. A combination of computational and experimental methods is commonly used in directed evolution. The experimental methods commonly include PCR-based mutagenesis and DNA shuffling. The advantage of PCR-based mutagenesis is that it does not require prior knowledge of structure or mechanism. DNA shuffling combines beneficial mutations and streamlines the exploration of sequence space. The variants of interest are identified by screening, and the efficiency of screening is closely related to how deeply sequence space is sampled. Libraries of 70- to 90-residue polypeptides composed primarily of random combinations of glutamine, leucine, and arginine have been constructed and screened for folded molecules. For larger and more complex random libraries, powerful selection methods, such as mRNA display, are needed to isolate structured or functional proteins by sampling sequence space and subsequently optimizing promising leads. During screening, the selected variants are amplified. Directed evolution is now widely used and well established for tailoring stability and specificity. Although the successful design of efficient enzymes has been reported, solving more complicated problems in medical and industrial applications remains a challenge.41

1.7 Computationally designed protein G

Protein G is an immunoglobulin-binding protein with a 56-residue B1 domain (GB1). Its crystal structure was first solved by Gilliland and his colleagues.42 GB1 is formed by one alpha helix and four beta strands, and an interface is formed between the beta sheet and the alpha helix. GB1 contains no disulfide bonds or cofactors that could influence its stability, which makes it an ideal model for studying the importance of specific design for its structure.43,44 The folding pathway of GB1 and the roles of the alpha helix and beta sheet have been well studied over the last two decades. Serrano and his colleague discovered that the motifs responsible for folding and for stability of GB1 are different.45 Later, Takada and coworkers46 found that the C-terminal beta hairpin forms earlier and is more rigid, whereas the alpha helix alternately forms and unfolds during the folding of the beta hairpin. In addition, McCallister and coworkers47 concluded that the C-terminal beta turn is largely formed in the transition state, while the N-terminal beta turn is disrupted. Through ab initio simulation, Kolinski and Kmiecik48 demonstrated that nucleation of the beta sheet residues between the beta-1 and beta-4 strands occurs after formation of the C-terminal hairpin. In addition, Derreumaux49 concluded that the C-terminal hairpin can stabilize itself independently of the rest of protein G, but the N-terminal hairpin cannot. Since the folding of GB1 has been thoroughly studied, is it possible to computationally design a GB1 with a different sequence? In other words, how can we design a protein with a similar hydrophilic and hydrophobic pattern that folds into a 3D structure similar to that of wild-type GB1? And could this computationally designed protein achieve a higher thermal stability than wild-type GB1?

In the Magliery Lab, Rosetta, a software suite for modeling protein structure, offers a variety of effective sampling algorithms for manipulating the protein backbone and side chains, and the changes it proposes can be introduced by site-directed mutagenesis of the DNA sequence. Features of the wild-type protein G structure can be examined for positive or negative contributions to protein stability. Wild-type protein G has a hydrophobic interface between the alpha helix and the beta sheet, and a hydrophilic outer surface on the beta sheet. However, the inner core of GB1 is not fully enclosed by the edges of the alpha helix and beta sheet, because the aromatic residues might not be sufficient to minimize the interactions between water and the inner core. This might cause unfavorable interactions between the polar solvent and the hydrophobic core, and further destabilize the protein structure. In addition, on the water-accessible beta sheet surface, the optimization of charge-charge interactions is a major strategy for improving protein stability.50 Some computational design work on protein G has been done. Clore and his coworkers51 redesigned 5 residues on the beta sheet that are involved in the formation of the hydrophobic core. They concluded that the stability of the mutants is not strictly correlated with the number of changes or the residue volume. Mayo and his coworkers52 studied the relationship between the helical surface and protein stability by redesigning six helix surface positions on GB1 to increase helix propensity, and found variants that are more thermally stable than wild-type protein G. Since no hydrogen bond or salt bridge interactions were present among their designed residues, the increased stability of the variants was attributed to increased helix propensity and more favorable helix dipole interactions. Koga and coworkers32 proposed a feasible and credible approach to designing ideal protein structures based on Rosetta, which served as a guideline for the protein G project performed in the Magliery Lab.

This approach was designed to generate a funnel-shaped energy landscape by focusing on the local interactions between residues close along the linear sequence and filtering out the non-local interactions that vary strongly with small alterations of the tertiary structure. The fundamental rules for the design of secondary structure first correlate the loop lengths of αβ-, ββ-, and βα-units with their chirality or orientation, and the different arrangements of secondary structure are specified by a definition of chirality (L- or R-handed) or orientation (parallel or antiparallel) (Figure 1). The dependence of chirality on loop length is similar for simulations and native proteins. Then, the two-element units are combined to favor a desired conformation of three secondary structure elements, including ββα, ααβ, or βαβ, which differ in their chirality and orientation. The funnel-shaped energy landscape can then be defined by the secondary structures that favor the tertiary motif with the desired topology, and experimental characterization can be performed to determine the designed structure. Apart from the stabilization gained by narrowing down the possible topologies, the residue sequence can be designed with favorable non-local interactions. This design principle and methodology can be used to design stable protein building blocks. However, Rosetta failed to design a protein G homolog, and a mutagenesis study could focus on the areas where Rosetta is weakest: on the surface or at the interface.

Figure 1. Fundamental rules. a, ββ-rule. L (left-handed) and R (right-handed) ββ-units. b, βα-rule. P (parallel) and A (antiparallel) βα-units are illustrated. c, αβ-rule. d, Chirality (L versus R) of a ββ-unit.32

1.8 Single-chain Rop

A four-helix bundle protein is a globular protein that is capable of adopting a stable, folded structure in aqueous solution, and such proteins have been successfully designed by rational and combinatorial methods. The first de novo design of a four-helix bundle protein was an incremental synthesis of two amphiphilic 16-residue peptides, which form alpha helices that cooperatively tetramerize in solution.53 A subsequent loop design achieved the goal of producing a helical dimer in solution.54 Furthermore, de novo design using a coiled coil as the starting point for a stable globular protein made it possible to explore the relationship between amino acid sequence and structure.55 Numerous rational design experiments have been based on folding from coiled coil to helix bundle. Rop is one of the four-helix bundle proteins that have been widely used in protein design work. Rop is a 63-residue protein that facilitates the binding of the inhibitory RNA I to the ColE1 origin RNA, thereby modulating the copy number of the ColE1 plasmid. Banner and his colleagues56,57 found that Rop is mostly dimeric under physiological conditions. Three years later, they determined the high-resolution X-ray crystal structure of Rop, which is a dimeric four-helix bundle with antiparallel topology. Then, the structure of ColE1 Rop was determined from NMR data by a combined use of distance geometry and restrained molecular dynamics calculations, and it showed no major difference from the previous X-ray structure.58 Cesareni and his coworkers59 mutated solvent-exposed residues to determine the effect on Rop function. Rop tolerates mutagenesis at surface-exposed positions.

Hydrophobic core packing is crucial for the stability of a four-helix bundle protein. Site-directed mutagenesis in the hydrophobic core of Rop has been used to investigate the effect of replacing essential residues on protein folding and protein structure. Munson and coworkers60 first repacked the core by mutating each layer of the core to alternating alanine and leucine. Compared with wild-type Rop, these new variants are thermally more stable but less resistant to chemical denaturants. Shortly after, Munson and coworkers61 carried out a complete kinetic study of Rop. They redesigned the core by changing different numbers of residues to alanine and leucine in different numbers of layers. Both the folding and unfolding rates were dramatically increased by hydrophobic core mutagenesis. The molecular basis of the rate enhancement is the replacement of buried salt bridges and hydrogen-bonding interactions. In addition, the cysteine residues in the hydrophobic core complicate the folding and unfolding of Rop and can result in misfolding. Hari and coworkers62 recently studied cysteine-free Rop. Their mutagenesis work showed that removing the cysteines increased the folding and unfolding rates of Rop.

In 1996, Munson and coworkers63 carried out a complete study of the effect of the hydrophobic core pattern on Rop stability. Wild-type Rop has 8 layers of alternating “a” and “d” residues perpendicular to the long axis of the bundle (Figure 2), with the “a” and “d” residues reversed only at the 2nd and 7th layers. This causes a slight curvature at the ends of the bundle and could be a factor that destabilizes wild-type Rop. Munson and coworkers redesigned the “a” and “d” residues in the central two layers, central four layers, central six layers, and all eight layers, respectively, as alanine (“a”) and leucine (“d”) (Figure 3). Every repacking pattern resulted in an increase in thermal stability compared with wild type, and Ala2-Leu2-8 and Ala2-Leu2-8-reverse had the highest stability. Other designs with smaller or larger residues destabilized the structure.

Figure 2. Heptad repeat diagram of two layers of Rop showing the positions of the “a” and “d” residues, which pack between the four helices to form the hydrophobic core. Helices 1 and 2 are from one monomer; helices 1’ and 2’ are from the antiparallel second monomer. Arrows indicate the direction of the peptide chain, in the N-terminal to C-terminal direction.63


Figure 3. Ribbon diagram of wild-type Rop and schematic representations showing the hydrophobic core side chains in one monomer of wild-type and the repacked mutants. Large circles represent leucine; small circles, alanine; large squares represent large, non-leucine side chains; and small squares represent small, non-alanine side chains. Ellipses represent methionine residues and triangles represent valine residues. Colored side chains indicate the repacked layers. Because Rop is an antiparallel dimer, residues in the upper layers of one monomer interact with residues in the lower layers of the other monomer and vice versa.63

Topological studies of Rop have also drawn interest in the difference between monomeric and dimeric Rop. The original motivation for a single-chain Rop was the simplicity of mutagenesis in the hydrophobic core. In dimeric Rop, replacing one amino acid is in effect a double change, because the same substitution appears in both antiparallel monomers, and this complicates the interpretation of a target mutation site. Monomeric Rop reduces the double change to a single residue change and eliminates the effect of the symmetry-related change. Predki and Regan64 demonstrated that rearranging and reconnecting the dimeric four-helix bundle Rop into a monomeric Rop (1-1’-2’-2) can give a structure similar to that of dimeric wild-type Rop (Figure 4). It was also confirmed that loop length is a key determinant of the oligomeric state of Rop. Kresse and his coworkers65 compared different arrangements of loops in monomeric Rop. The secondary structure elements and loops were rearranged as left-handed or right-handed structures, with newly constructed loops or the same loops as in wild type. In their studies, the left-handed bundle was thermally more stable than the right-handed bundle. In the Magliery Lab, the monomeric four-helix bundle protein (single-chain Rop) was further studied by mutagenesis in the hydrophobic core based on the A1-A2+B2+B1 scaffold in Figure 5. Through random mutagenesis of the hydrophobic core, the amino acid propensities of single-chain Rop could be compared with those of dimeric Rop, and the possibility of more complex interactions could also be detected in single-chain Rop.

Figure 4. Topology redesign strategy. A schematic illustration of the relative topology of wild-type Rop and the monomeric variants is depicted in the upper panels. In the lower panels, ribbon drawings of wild-type Rop and the model for the monomeric ROPS~ variant are shown.64

Figure 5. Schematic representation of the four possible topologies for monomeric Rop variants. Wild-type Rop is shown in the middle for reference. The top two drawings are schematic representations of LM-Rop and RM-Rop. The two left-handed monomers are shown at the left and the right-handed monomers at the right. The helices are shown as cylinders and the loop regions as solid lines. The termini are indicated by the letters N and C, respectively. The resulting order of the helices is indicated under the drawings; + signs indicate newly constructed loops and – signs indicate loops that are the same as in wild-type Rop.65

Hydrophobic core studies of single-chain Rop were not found in previous research. Because mutagenesis is simpler in single-chain Rop, the effect of a single substitution in the hydrophobic core can be examined independently and compared with dimeric Rop. My study focuses on establishing a combinatorial library at the central two layers of single-chain Rop. It will be interesting to compare the core packing pattern and overall thermal stability of monomeric Rop and dimeric Rop.

Chapter 2

Materials and Methods

2.1 Reagents

General chemical reagents and buffers were mainly purchased from American Bioanalytical Inc., Sigma-Aldrich, and Fisher Scientific. Enzymes (restriction enzymes, Phusion HF DNA polymerase, and T4 DNA ligase) and DNA ladders (100 bp ladder, 1 kb ladder, and Lambda/BstEII ladder) were mainly purchased from New England Biolabs. RNase A (protease-free) and protein ladder (10-225 kDa) were purchased from USB. DNase I (recombinant, RNase-free) was purchased from Roche. Antibiotics (ampicillin, kanamycin, and streptomycin), IPTG, the reducing agent DTT, and the protease inhibitor PMSF were mainly purchased from Gold Bio. The 1000× stock solutions (ampicillin at 100 mg/ml, kanamycin at 35 mg/ml, IPTG at 100 mM) were prepared and sterile filtered using 0.2 um syringe filters from Millipore. Individual deoxyribonucleotide triphosphates (100 mM) were purchased from American Bioanalytical Inc. Oligonucleotides were purchased from Sigma-Aldrich and resuspended to 100 uM stocks in 1× TE buffer (10 mM Tris, 1 mM EDTA, pH 8.0). Cell culture media components (Bacto Agar, Bacto Tryptone, Bacto Yeast Extract) were purchased from BD (Becton, Dickinson and Company). Ni-NTA Agarose, Ni-NTA Magnetic Agarose Beads (6×1 ml), and DNA mini-preparation kits were purchased from Qiagen. TEV protease was produced in-house. Water for molecular biology was purified by a Barnstead NANOpure Diamond Water Purification System to 18 MΩ·cm.

2.2 Plasmids and cell strains

The pUCBADGFPuv plasmid was constructed by Magliery et al. The pMRH6 plasmid was constructed in the Magliery Lab at The Ohio State University. E. coli strains DH10B and BL21(DE3) were purchased from Stratagene. E. coli strain DH10B(DE3) was lysogenized with the DE3 lambdoid phage by a previous member of the Magliery Lab.

2.3 Instruments

PCR reactions were performed in a CFX96 Real-Time PCR Detection System (Bio-Rad). DNA solutions were concentrated in a DNA1200P-15 SpeedVac (Thermo Scientific). DNA electrophoresis was performed in a 10 cm horizontal gel apparatus (1-2% agarose gel). Protein electrophoresis was performed in a 6.5 cm vertical gel apparatus (12.5% or 18% SDS-PAGE gels). A Bio-Rad PowerPac Basic was used as the power supply for electrophoresis. Electroporation was performed in a Bio-Rad MicroPulser Electroporator. Sonication was carried out using a Misonix Sonicator 3000 Ultrasonic Cell Disruptor with temperature control. Centrifugation of small-volume samples (0-1.5 ml) was performed in a Legend Micro 21R (Fisher Scientific) or an Eppendorf 5451D. Centrifugation of medium-volume samples (2-50 ml) was performed in an Eppendorf 5810R. High-speed centrifugation of large-volume samples was performed in a Sorvall RC 6 (Thermo Scientific). Protein concentration and cell density were determined using an Agilent 8453 UV-visible spectrophotometer. High-throughput thermal scanning was performed with a Bio-Rad C1000 Thermal Cycler CFX96. Liquid cell cultures were incubated in an INNOVA 4335 (New Brunswick Scientific), a Thermo Scientific SHKE480, or a Thermo Scientific SHKA 4000. Plate cultures were incubated in a Thermo Scientific Heraeus B12.

2.4 General protocol

PCR reactions were performed according to the PCR Protocol for Phusion High-Fidelity DNA Polymerase (M0530) provided by New England Biolabs. In general, volumes of 25 ul or 50 ul were used for reassembly and amplification reactions. A typical 25 ul reassembly reaction included final concentrations of 1× Phusion HF buffer (5 ul of 5× Phusion HF buffer), 200 uM dNTPs (0.5 ul of 10 mM dNTPs), 2 uM each primer (0.5 ul of 100 uM each primer), 1 unit of Phusion HF polymerase (0.5 ul of 2000 units/ml), and 18 ul of nuclease-free water. DMSO was optionally added to a 3% final concentration. The reassembly reaction was carried out with an initial denaturation step of 1 minute at 95 ℃, followed by 5 thermal cycles of 95 ℃ for 30 seconds, an annealing temperature between 55 and 70 ℃ for 30 seconds, and 72 ℃ for 20 seconds, and then a final extension step at 72 ℃ for 5 minutes. Finally, the reaction was held at 4 ℃. Amplification reactions were set up under conditions similar to the reassembly reaction, but with an additional 25-50 ng of DNA template per 25 ul reaction and 25 thermal cycles instead of 5. PCR products were cleaned up with the Qiagen PCR purification kit following its protocol. Restriction digestion was performed according to Optimizing Restriction Endonuclease Reactions from New England Biolabs. Restriction digests were typically done in 50 ul and included a final concentration of 1× NEBuffer (5 ul of 10× NEBuffer), 10-20 units of restriction enzyme, up to 1 ug of DNA, and nuclease-free water. The reaction was incubated at the temperature for optimal activity of the enzyme, with an incubation time chosen according to whether each enzyme can be heat inactivated and is suitable for extended digestion. Digested vector was purified on a 1-2% agarose gel with the Qiagen gel purification kit following its protocol. Ligation was performed according to the Ligation Protocol with T4 DNA Ligase (M0202) provided by New England Biolabs. Ligations were typically done in a volume of 10 ul or less and included 1× T4 ligase buffer (1 ul of 10× T4 ligase buffer), DNA at a final concentration of 5-20 ng/ul (molar ratio 1:3 vector:insert), and 1000 units of T4 ligase, and were incubated at 16 ℃ overnight. The ligations were cleaned up by adding an equal volume of 25:24:1 Tris-buffered phenol:chloroform:isoamyl alcohol, followed by 2 minutes of vortexing, 2 minutes of centrifugation at 13200 rpm (16100 g), and removal of the bottom organic layer. Next, an equal volume of isoamyl alcohol was added to remove any residual phenol, followed by 2 minutes of vortexing, 2 minutes of centrifugation at 13200 rpm (16100 g), and removal of the bottom organic layer. An ethanol precipitation was done to concentrate the sample and to remove salt and unwanted buffer components. First, 1/10 volume of 3 M sodium acetate at pH 5.5 and 2.5 volumes of absolute ethanol were added, followed by freezing at -80 ℃. Then, the sample was centrifuged at 4 ℃ for 30-60 minutes, and the ethanol was removed. After that, the pellet was washed by adding 2.5 volumes of 70% ethanol and centrifuged at 4 ℃ for 30-60 minutes. Last, the 70% ethanol was removed, and the sample was dried in the SpeedVac.
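As an illustrative check of the 25 ul reassembly recipe given above (this is not part of the original protocol), the final concentrations can be recomputed from the stock concentrations and pipetted volumes using C1V1 = C2V2. The function and variable names in this sketch are hypothetical; the numbers are the ones quoted in the text, assuming two primers per reaction.

```python
# Minimal sketch: verify the 25 ul reassembly PCR recipe with C1*V1 = C2*V2.
# All names are illustrative; stocks and volumes are the values quoted in the text.

def final_conc(stock_conc, stock_vol_ul, total_vol_ul):
    """Final concentration after diluting stock_vol_ul of stock into total_vol_ul."""
    return stock_conc * stock_vol_ul / total_vol_ul

TOTAL_UL = 25.0
WATER_UL = 18.0

# component: (stock concentration, volume added in ul)
recipe = {
    "Phusion HF buffer (x)": (5.0, 5.0),            # 5x stock
    "dNTPs (uM)": (10_000.0, 0.5),                  # 10 mM stock
    "primer 1 (uM)": (100.0, 0.5),                  # assumes two primers
    "primer 2 (uM)": (100.0, 0.5),
    "Phusion polymerase (units/ml)": (2000.0, 0.5),
}

finals = {name: final_conc(c, v, TOTAL_UL) for name, (c, v) in recipe.items()}
total_pipetted = WATER_UL + sum(v for _, v in recipe.values())
polymerase_units = 2000.0 * 0.5 / 1000.0  # units/ml times ml added

for name, value in finals.items():
    print(f"{name}: {value:g}")
print(f"total volume: {total_pipetted:g} ul, polymerase: {polymerase_units:g} unit")
```

With these inputs the component volumes add up to the stated 25 ul, and the buffer, dNTP, primer, and polymerase amounts agree with the final concentrations listed in the recipe. The same helper could be reused for the 50 ul reactions by changing the total volume and scaling the pipetted volumes.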

Electrocompetent cells were prepared from 1 L 2YT cultures. A seed culture (25 ml per 1 L culture) was inoculated from a single colony on a streaked agar plate and grown overnight at 37 ℃. The 1 L 2YT culture was grown at 37 ℃ with vigorous shaking to an OD600 of about 0.6 and plunged into an ice bath to stop cell growth. The cells were then centrifuged at 6500 rpm for 5 minutes at 4 ℃ and then washed twice with 10% glycerol. Then, the centrifugation and wash steps were repeated. After a third centrifugation, the cells were resuspended in the residual 10% glycerol to afford a thick cell slurry. The slurry was quickly aliquoted (40-80 ul per aliquot) into cold 1.5 ml Eppendorf tubes and snap-frozen on dry ice. Electrocompetent cells were stored at -80 ℃. Electroporation was carried out by mixing 1 ul of DNA into one aliquot of competent cells, pulsing at 2.5 kV in a 0.2 cm cuvette, and immediately adding 1 ml of 2YT medium. The culture was typically recovered with an additional 1 ml of 2YT medium for 1 hour at 37 ℃ and 250 rpm. After the 1 hour incubation, the culture was typically spread on agar plates with the appropriate antibiotic selection.

2.5 Protein G library

The B1 domain library of computationally designed Protein G was constructed by randomizing positions 5, 7, 30, 34, 54, and 56 individually (single point mutations rather than mutation of all 6 positions simultaneously) at the interface of the alpha helix and beta strands into the corresponding residues shown in Table 1. The site-directed mutagenesis was designed using the Rosetta@home project developed by the David Baker laboratory. The oligos for Protein G are shown in Table 2. Each of the six libraries was constructed sequentially by reassembly PCR and amplification PCR. The reassembly product was synthesized from the overlapping fragments G-lib-1 and G-lib-2, which share 18 complementary base pairs. The amplification product was synthesized by extending the reassembly product using G-lib-F and G-lib-R. Both PCR reactions were catalyzed by Phusion HF polymerase. The final PCR product was double digested at the AflIII and BamHI restriction sites and cloned into the pMRH6 vector digested at the same restriction sites (Figure 6). The 6 libraries were each transformed into DH10B and grown with 1× kanamycin to saturation at 37 ℃. The saturated culture was mixed with 50% glycerol at a volume ratio of 2:1 and stored at -80 ℃ for further use. The 6 libraries from the glycerol stocks were miniprepped and analytically digested at the AsiSI and AflIII sites and at the AsiSI and BamHI sites, respectively. The restriction-confirmed libraries were then sequenced. The 6 libraries from the glycerol stocks were each inoculated into 5 ml of 2YT and grown with 1× kanamycin to OD600 = 1. To combine the 6 sub-libraries into one library, 500 ul of each library was collected and mixed, an additional 2 ml of 2YT was added, and the mixture was grown at 37 ℃ with 1× kanamycin for 3 hours. The final Protein GB1 library was mixed with 50% glycerol at a volume ratio of 2:1 and stored at -80 ℃ for further use.

Figure 6. Plasmid pMRH6 with Kanamycin resistance gene

Position (Identity)    Randomized into
5 (Met)                Leu, Met, Val
7 (Ile)                Phe, Leu, Ile, Val
30 (Ala)               Ala, Val
34 (Ala)               Ala, Val
54 (Leu)               Phe, Leu, Ile, Val
56 (Val)               Phe, Leu, Ile, Val

Table 1. The 6 mutation sites of computationally designed GB1 and their corresponding mutants.

G-lib-F    ATATATATA ACACGT GGC GAA AAT TTA TAT TTC CAG GGT AGC AGT GGC GGC CGT TAT GAG
G-lib-1    GGT AGC AGT GGC GGT CGT TAT GAG VTG CGT NTY GAT GAT GGT AAC AAT ACA GAT ACA CAG ACC TTT AAC GTG ACC AGC CCG GAA GAA TTT CTG AGC AAT
G-lib-2    ACG CTG GCC GCT CTT CCG AAA AGT CAC CTG TTT GCC GTT CTC CTT CGC TTT TTT GTC NRC GCT GCT ACG NRC ATT GCT CAG AAA TTC TTC CGG
G-lib-R    ATT ATA ATT GGA TCC TTA GCC ATC AAT ACG RAN ATC RAN ACG CTG GCC GCT CTT CCA AAA AG

Table 2. Oligos for construction of the computationally designed GB1 library. G-lib-1 and G-lib-2 are the reassembly oligos. G-lib-F and G-lib-R are the amplification oligos. The mutation sites are bold.
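The degenerate codons in Table 2 can be checked against the residue sets in Table 1 by expanding each IUPAC ambiguity code and translating the resulting codons with the standard genetic code. The sketch below is illustrative only. It assumes that G-lib-2 and G-lib-R are antisense oligos, so their degenerate codons (NRC and RAN) are read as reverse complements, and that the degenerate codons map to positions 5, 7, 30, 34, 54, and 56 as listed in the dictionary. All function and variable names are hypothetical.

```python
# Minimal sketch: expand the degenerate codons of Table 2 and translate them.
# Assumption: G-lib-2 and G-lib-R are antisense, so NRC and RAN are reverse-complemented.
from itertools import product
from math import prod

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "K": "GT", "M": "AC", "S": "CG", "W": "AT", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "R": "Y", "Y": "R",
              "K": "M", "M": "K", "S": "S", "W": "W", "B": "V", "V": "B",
              "D": "H", "H": "D", "N": "N"}

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def reverse_complement(codon):
    """Reverse complement of a (possibly degenerate) codon."""
    return "".join(COMPLEMENT[b] for b in reversed(codon))

def encoded_residues(degenerate_codon):
    """Sorted one-letter codes of all residues a degenerate sense codon encodes."""
    codons = ("".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon)))
    return sorted({CODON_TABLE[c] for c in codons})

# position: (codon as written in the oligo, True if the oligo is antisense)
library = {5: ("VTG", False), 7: ("NTY", False),
           30: ("NRC", True), 34: ("NRC", True),
           54: ("RAN", True), 56: ("RAN", True)}

residues_by_position = {}
for position, (codon, is_antisense) in library.items():
    sense = reverse_complement(codon) if is_antisense else codon
    residues_by_position[position] = encoded_residues(sense)
    print(position, sense, residues_by_position[position])

# Sum of the six sub-library sizes (the designed residue is counted in each one),
# and, for comparison, the number of combinations if all sites varied at once.
single_site_total = sum(len(r) for r in residues_by_position.values())
multi_site_total = prod(len(r) for r in residues_by_position.values())
print(single_site_total, multi_site_total)
```

Under these assumptions, each degenerate codon encodes exactly the residue set listed for its position in Table 1, and none of them introduces a stop codon.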

2.6 Single-chain (sc) Rop NNK-core library

The sc-Rop NNK core library was designed to link the two monomers of the dimeric four-helix bundle Rop into a single-chain, monomeric four-helix bundle Rop. The 8 positions of the two central layers (4 residues in each layer) were randomized using the NNK codons in the oligos shown in Table 3. The library was constructed based on an engineered Cys-free Rop sequence, with an additional BsaI restriction site between the two monomers. Oligos are shown in Table 3. Library construction started with the construction of each monomer (sc-1 and sc-2) by one reassembly reaction and two amplification reactions, followed by the linkage of sc-1 and sc-2 at the BsaI site. The sc-1 reassembly product was synthesized from sc-1-1 and sc-1-2, which overlap by 18 complementary base pairs. The first sc-1 amplification product was synthesized by extending the reassembly product using sc-1-F (including the AflIII restriction site) and sc-1-R (including the BamHI restriction site). The second sc-1 amplification product was synthesized by extending the first amplification product using sc-1-F’ (70-mer) and sc-1-R’ (30-mer). The synthesis of sc-2 followed the same protocol as sc-1. The sc-2 reassembly product was synthesized from sc-2-1 and sc-2-2. The first sc-2 amplification product was synthesized using sc-2-F and sc-2-R. The second sc-2 amplification product was synthesized using sc-2-F’ and sc-2-R’. All PCR reactions were catalyzed by Phusion HF polymerase. Both sc-1 and sc-2 were digested at their BsaI sites and then ligated at the BsaI site to form a complete sc-Rop. sc-Rop was double digested at the AflIII and BamHI sites and cloned into the pMRH6 vector digested at the same restriction sites.


SC-rop-F1    ATTATTACACGTGGCGGTGAAAACCTGTATTTTCAGACTAAGCAAGAGAAGACA
SC-rop-1     ACTAAGCAAGAGAAGACAGCACTTAATATGGCTCGTTTTNNKCGTTCTCAANNK
             CTTACTCTTCTTGAAAAACTTAATGAACTTGATGCT
SC-rop-2     GAAAGAAGCAAGAACAGAACGATAAAGTTCATCMNNATGATCATGMNNAGAT
             TCAGCAATATCAGCTTGTTCGTCAGCATCAAGTTCATTAAG
SC-rop-R1    ATTATTGGATCCGGTCTCGTTTTTTTTGAAAGAAGCAAGAAC
SC-rop-F2    ATTATTACACGTGGTCTCNAAAACGGTCAAATTGATGAACAGGCTGACATC
SC-rop-3     GATGAACAGGCTGACATCGCAGAAAGCNNKCATGACCATNNKGATGAACTGT
             ATCGTAGTGT
SC-rop-4     AAGCTCATTTAATTTTTCCAGAAGGGTCAGMNNTTGACTGCGMNNAAAACGT
             GCCATGTTCAGTGCTGTTTTTTCTTGTTTAGAACCACCAAA
SC-rop-R2    ATTATTGGATCCTCAACCCTTAGCAAGCTCATTTAATTTTTC
Sc-FF1       AAAAAAAAAAATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
             AAAAAATTATTACACGTGGCGGT
Sc-RR1       AAAAAAAAAAAAAAAAAAAAAAAAATTATTGGATCCTCAGGT
Sc-FF2       AAAAAAAAAAAAAAAAAAAAAAAAAAAATAAAAAAAAAAAAAAAAAAAA
             AAAAAATTATTACACGTGGTCTC
Sc-RR2       AAAAAAAAAAAAAAAAAAAAAAAAATTATTGGATCCTCAACC

Table 3. Oligos for construction of single-chain Rop. The mutation sites are bold.

2.7 Screening and selection

The sc-Rop NNK core library was transformed into DH10B cells already carrying pUCBADGFPuv and grown on an LB-kan-amp-0.000% arabinose (KAA) agar plate at 42 ℃ overnight. According to the cell-based screen for Rop function developed by Magliery and Regan,33 the in vivo activity of Rop can be detected by comparing the fluorescence phenotypes of colonies. Active Rop variants display a high fluorescence phenotype, less active variants display a low fluorescence phenotype, and inactive variants do not display any fluorescence. Active and inactive Rop variants can be distinguished and sorted by this method. In addition, the percentage of active Rop in the library can be raised by several rounds of selection growth. The naïve library was grown in 1 L of 2YT at 42 ℃ for 16 hours. 1 ml of the saturated culture was collected, transferred into a fresh 1 L of 2YT medium, and grown under the same conditions. After each round of selection, the library was screened on a kanamycin plate to observe the fluorescence phenotype. The theoretical library size of the sc-Rop NNK core library is 20^8 (about 2.6 × 10^10 protein sequences), and a 65% “hit rate” (1% of total colonies showed strong fluorescence) was obtained after one round of selection growth.
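The theoretical size of the NNK core library follows from simple counting: 8 randomized positions with 20 possible amino acids each give 20^8 protein sequences, while at the DNA level each NNK codon (N = A/C/G/T, K = G/T) has 32 possibilities. The sketch below is illustrative, with hypothetical variable names; it enumerates the NNK codons, confirms which amino acids and stop codons they encode under the standard genetic code, and computes both library sizes. The MNN codons in the antisense oligos of Table 3 are the reverse complements of NNK.

```python
# Minimal sketch: what an NNK codon encodes and how large an 8-site NNK library is.
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# N = any base at positions 1 and 2, K = G or T at position 3.
NNK = ["".join(p) for p in product("ACGT", "ACGT", "GT")]
amino_acids = sorted({CODON_TABLE[c] for c in NNK} - {"*"})
stop_codons = [c for c in NNK if CODON_TABLE[c] == "*"]

N_POSITIONS = 8  # two central layers, 4 residues each
protein_library_size = 20 ** N_POSITIONS
dna_library_size = len(NNK) ** N_POSITIONS
# Fraction of DNA sequences with no stop codon at any randomized position.
fraction_without_stop = ((len(NNK) - len(stop_codons)) / len(NNK)) ** N_POSITIONS

print(len(NNK), len(amino_acids), stop_codons)
print(f"{protein_library_size:.3g} protein sequences")
print(f"{dna_library_size:.3g} DNA sequences")
print(f"{fraction_without_stop:.2f} of DNA sequences contain no stop codon")
```

The enumeration shows that NNK covers all 20 amino acids with a single stop codon (TAG) among its 32 codons. The DNA-level diversity (32^8, about 1.1 × 10^12) is larger than the protein-level diversity (20^8), because several amino acids are encoded by more than one NNK codon.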

2.8 High-throughput protein purification

High-throughput protein purification was based on the protocol described in Jason Lavinder’s dissertation67 and Lavinder68. The Protein G library from the glycerol stock was grown and plated onto an LB-kanamycin plate. Individual colonies were grown in 1.5 ml of 2YT containing 1× kanamycin in each well of a 96-well, 2 ml deep-well plate (TiterBlock plate) covered with a porous membrane at 37 ℃ for 18 hours. The seed cultures were diluted with 2YT to OD600 = 0.85, induced with 1-10 ul of IPTG, and expressed at 30 ℃ for 10-14 hours. The plate was spun down at 3700 rpm for 30 minutes, and the supernatants were discarded. The plate was covered with Titer-Tops film and stored at -80 ℃ for at least 1 hour. The cell pellets were thawed for 10 minutes on ice and then resuspended in 200 ul of lysis buffer containing 100 ug ml-1 lysozyme, 0.5 ug DNase I, 40 ng RNase A, 5 mM MgCl2, and 0.5 mM CaCl2. After incubation at 4 ℃ for 1 hour, centrifugation was performed at 3700 rpm to separate the pellets from the soluble fractions. The soluble fraction was incubated at 4 ℃ for 1 hour with 20 ul of Ni-NTA Magnetic Agarose Beads (Qiagen). The magnetic beads were immobilized at the bottom of the wells using a 96-well magnetic plate (Qiagen), and the supernatants were removed by pipette. The magnetic beads were washed with wash buffer (lysis buffer with 20 mM imidazole) and resuspended in 30 ul of lysis buffer. The target proteins were cleaved off the resin by two additions of 10 ug rTEV protease (in 0.5 ul) and 5 mM DTT, with incubation at room temperature overnight followed by incubation at 30 ℃ for 3 hours. Then, the magnetic beads were immobilized using the 96-well magnetic plate, and the cleaved proteins were transferred into a new 96-well PCR plate.

2.9 High-throughput thermal scanning

HTTS data were obtained with a CFX96 Real-Time PCR Detection System (Bio-Rad). Samples were prepared by mixing 1 ul of 300X SYPRO Orange dye (Sigma; stock solution of 5000X) with 19 ul of protein. Samples were loaded into iCycler 96-well 0.2 ml thin-wall PCR plates and sealed with iCycler optical-quality sealing tape (Bio-Rad). Thermal denaturation was run from 25 ℃ to 95 ℃ with an increment of 0.2 ℃ per second. Fluorescence intensities were detected using the FRET channel of the RT-PCR instrument, which uses a 490 ± 10 nm excitation filter and a 575 ± 10 nm emission filter. The HTTS data were analyzed in Microsoft Excel. Acceptable HTTS data displayed a bell-shaped curve; variants that showed no expression, molten globule behavior, or no binding to the dye were rejected. In addition, data with less than a 25% increase from room temperature to the fluorescence maximum and data with a decrease of more than 2-fold from the fluorescence maximum to the final temperature were rejected. A variation of the Clarke & Fersht equation, which accounts for non-flat pretransitional baselines, was then fit to the data from 25 ℃ to the temperature at the fluorescence maximum. αF and βF are the intercept and slope of the baseline for the folded state, respectively. m is an exponential factor describing the slope of the transition at the melting temperature (Tm). The values of αF, βF, Tm, and m were fit to the normalized fluorescence signals at temperatures T by least squares using the Solver plug-in of Microsoft Excel. On the thermal denaturation curve, Tm was the inflection point at which the slope was maximal within the range from room temperature to the temperature of the fluorescence maximum.
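The two numerical rejection rules and the definition of Tm as the point of maximal slope can be sketched in a few lines. This is an illustration under stated assumptions, not the Excel Solver fit used in this work: the fitted Clarke & Fersht variant is not reproduced here, the first reading (25 ℃) is taken as the room-temperature signal, Tm is estimated by a simple central-difference derivative, and the melt curve is synthetic data made up for the example. All function names are hypothetical.

```python
# Minimal sketch: apply the HTTS rejection rules and estimate Tm as the point of
# maximal slope. Synthetic data; not the least-squares fit described in the text.
import math

def index_of_maximum(fluor):
    """Index of the fluorescence maximum."""
    return max(range(len(fluor)), key=fluor.__getitem__)

def passes_filters(fluor):
    """Apply the two numerical rejection rules quoted in the text."""
    peak = fluor[index_of_maximum(fluor)]
    rises_enough = peak >= 1.25 * fluor[0]   # at least a 25% increase to the maximum
    decays_ok = fluor[-1] >= peak / 2.0      # no more than a 2-fold drop afterwards
    return rises_enough and decays_ok

def tm_by_max_slope(temps, fluor):
    """Temperature of steepest rise between the first reading and the maximum."""
    i_max = index_of_maximum(fluor)
    best_t, best_slope = None, float("-inf")
    for i in range(1, i_max):  # central differences, pre-maximum region only
        slope = (fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t

# Synthetic bell-shaped trace: sigmoidal rise centered at 60 C, slow decay after 70 C.
temps = [25.0 + 0.5 * i for i in range(141)]  # 25 to 95 C
fluor = [100.0 + 900.0 / (1.0 + math.exp(-(t - 60.0) / 2.0))
         * math.exp(-max(0.0, t - 70.0) / 60.0) for t in temps]
flat_trace = [100.0] * len(temps)  # e.g. a well with no dye binding

print(passes_filters(fluor), tm_by_max_slope(temps, fluor))
print(passes_filters(flat_trace))
```

For the synthetic trace, the estimate recovers the midpoint of the simulated transition, and the flat trace is rejected by the 25% rule. A derivative-based estimate like this is more sensitive to noise than a fitted model, which is one reason a baseline-corrected fit is preferable for real data.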

2.10 Colony-based sequencing

The active variants from the protein G library confirmed by HTTS were selected for sequencing. 12-15 variants were spotted directly from glycerol stocks onto one LB-Kan agar plate and incubated at 37 ℃ in a Thermo Scientific Heraeus B12 overnight. 96 of the active variants that showed strong fluorescence in the sc-core library were selected and grown in a 96 × 2 ml square deep-well TiterBlock plate at 37 ℃ overnight. The saturated cultures were then spotted onto LB-Kan plates in 1-2 ul volumes to create 24 clones on a single agar plate, which was then incubated at 37 ℃ overnight. The colonies were sequenced by Genewiz Inc.

2.11 Protein Purification

Protein GB1 variants were expressed in BL21 (DE3) with 1X kanamycin in 1 L of 2YT media and grown at 37 ℃ to OD600 = 0.75. The culture was induced with IPTG to a final concentration of 0.1 mM and was immediately cold shocked in a mixture of ice and water for 10 minutes. Manual shaking was applied periodically during the cold shock. After the cold shock, the culture was incubated at 16 ℃ for 12-18 hours for expression. Cells were collected by centrifugation in GS3 tubes at 6000 rpm for 10 minutes using a Sorvall RC 6 centrifuge (Thermo Scientific). The cells were resuspended in 25 ml of lysis buffer (50 mM Tris∙HCl, 300 mM NaCl, 10 mM imidazole, 2 mM β-ME, pH 8) using a 5 ml serological pipette. The lysis buffer was premixed to contain 2 ug mL-1 DNase I, 200 ng mL-1 RNase A, 5 mM MgCl2, 0.5 mM CaCl2, 0.1% Triton X-100, and 1.2 mg mL-1 lysozyme. The cells were incubated at 4 ℃ for 30 minutes and then sonicated 3 times at 50% power for 30 seconds on ice, with 2 minutes allowed between pulses. The lysate was transferred into SS34 tubes, and the soluble fraction containing protein was separated from the pellet by centrifugation in an Eppendorf 5810R at 20000 rpm and 4 ℃ for 1 hour. 1.5 ml of Qiagen Ni-NTA agarose (50% slurry) was added to the supernatant in a 50 ml conical tube, and the tube was mixed at 4 ℃ for 1 hour to allow the protein to bind to the Ni-NTA. The flow-through was collected by pouring the supernatant into a large, pre-fritted column. The bound resin was washed twice with 7 ml of wash buffer (lysis buffer with 20 mM imidazole), and the protein was eluted three times with 1 ml of elution buffer (lysis buffer with 250 mM imidazole).

After an additional 1 ml of lysis buffer was added to the 3 ml of pooled eluent, the 6xHis-tag was cleaved in the eluent containing 5 mM DTT and 50 ul of TEV protease, first at room temperature overnight and then at 30 ℃ for another 3 hours with an additional 50 ul of TEV protease and an additional 5 mM DTT. The cleaved eluent was divided equally between two PD10 columns, and the eluents were collected by applying 3.5 ml of lysis buffer to each PD10 column for buffer exchange. Another round of Ni-NTA binding was carried out under the same conditions for 1 hour. The flow-through was collected by loading the eluents into a small, pre-fritted column. The bound resin was washed twice with 3 ml of lysis buffer. The cleaved protein was collected by eluting the resin twice with 1 ml of elution buffer.

Chapter 3

Results and discussion

3.1 Construction of computational protein GB1 library

The computationally designed GB1 library was constructed from two overlapping oligos and two amplification oligos, with 6 positions randomized at the interface of the α-helix and β-sheets. Among the 6 randomized positions, positions 5 and 7 are located on the β-1 strand, positions 54 and 56 are located on the β-2 strand, and positions 30 and 34 are located on the α-helix (Figure 7). In GB1, β-1 and β-2 are arranged in parallel, constituting a sheared topology. It has been suggested that protein mechanical stability depends on interactions across the surfaces in a sheared topology that are sheared upon forced unfolding.69 Molecular dynamics simulations of GB1 suggested that the mechanical unfolding of GB1 involves the shearing of two structural motifs against each other: the second β-hairpin against the rest of GB1, that is, the first β-hairpin and the α-helix.70 Therefore, it could be proposed that the hydrophobic interaction between the α-helix and the two parallel β-strands plays an important role in resisting mechanical shear and preventing GB1 from unfolding. The hydrophobic interactions from the 6 positions serve as glue that holds the α-helix and the β-sheet together. In the original scaffold of the computationally designed GB1, M5, I7, A30, A34, L54, and V56 face inward to form a tightly packed hydrophobic core. Although they are all nonpolar residues, potentially with strong hydrophobic interactions between them, RosettaDesign failed to design a GB1 homolog that is thermally more stable than wild-type GB1. Statistical analysis of a library constructed from newly suggested mutations in the hydrophobic core could provide a better understanding of the sequence-stability relationship. The protein GB1 homolog library was originally designed as a combinatorial library with 6 positions randomized simultaneously, which had a theoretical diversity of 4,224 different sequences including the original scaffold. In this library, position 5 was randomized to Leu, Met, and Val, positions 30 and 34 were randomized to Ala and Val, and positions 7, 54, and 56 were randomized to Phe, Leu, Ile, and Val.

Figure 7. The structure of the computationally designed protein GB1. Residues Met-5, Ile-7, Ala-30, Ala-34, Leu-54, and Val-56 are shown as sticks.

A parallel combinatorial GB1 library of single-site mutations was also constructed. Alexander et al. reported that a single amino acid substitution can completely change the fold of a protein. A single-site mutation at position 45 of GB1 can switch GB1 into a completely different three-dimensional structure, in which the important hydrophobic interactions in the core of the GB1 conformation give way to unstructured C-terminal and N-terminal regions.71 Since GB1 is a small protein, the fraction of residues that change their structure could be large. Unfortunately, there is no way to predict this phenomenon. Nevertheless, it is reasonable to expect a stability change from a single-site mutation in the hydrophobic core. In this parallel combinatorial library, the randomized sites are the same as those of the multi-site mutation library described above, but one mutation instead of 6 mutations was made at a time. Therefore, the theoretical library size of this single-site mutation library is 19 variants. The construction method of the single-site mutation library is the same as that of the multi-site mutation library.
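As a rough check on the library sizes, the residue choices listed for each position can be enumerated directly. The sketch below is an illustration at the amino acid level only, and the variable and function names are hypothetical. Under that assumption, the 19 quoted for the single-site library equals the sum of the residue choices over the six positions, while the number of distinct single substitutions of the scaffold is smaller. The amino-acid-level product for the multi-site library also comes out smaller than the 4,224 quoted earlier in this section, so that figure presumably rests on a counting basis that is not spelled out here.

```python
from itertools import product

# Randomized positions and the residues allowed at each, as listed in the text.
POSITIONS = [5, 7, 30, 34, 54, 56]
SCAFFOLD = "MIAALV"  # original scaffold: M5, I7, A30, A34, L54, V56
ALLOWED = {5: "LMV", 7: "FLIV", 30: "AV", 34: "AV", 54: "FLIV", 56: "FLIV"}


def multi_site_library():
    """All combinations when the six positions are randomized together."""
    return ["".join(combo) for combo in product(*(ALLOWED[p] for p in POSITIONS))]


def single_site_library():
    """All variants that differ from the scaffold at exactly one position."""
    variants = []
    for i, pos in enumerate(POSITIONS):
        for residue in ALLOWED[pos]:
            if residue != SCAFFOLD[i]:
                variants.append(SCAFFOLD[:i] + residue + SCAFFOLD[i + 1:])
    return variants


multi = multi_site_library()
single = single_site_library()
summed_choices = sum(len(ALLOWED[p]) for p in POSITIONS)
print(len(multi), len(single), summed_choices)
```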

The constructed libraries were cloned into a pMRH6 vector under the control of a T7 promoter. The vector incorporates an N-terminal 6-His-tag to facilitate affinity binding to magnetic beads during protein purification. The constructs were transformed into BL21 for high-throughput expression in 96-well plates, generating 96 variants at one time.

3.2 HTTS and sequencing

HTTS was employed to screen the protein GB1 library based on protein stability. HTTS is a ThermoFluor method that detects the thermal stability of individual protein variants based on the fluorescence intensity of a hydrophobic dye over the course of a temperature change. When SYPRO Orange, a hydrophobic dye, is added to the protein sample, it can interact with the protein hydrophobic core as the protein unfolds at increasing temperature. A previous study of HTTS showed that molten globule variants have a high fluorescence intensity at room temperature and show no unfolding transition curve. In contrast, native-like variants show low fluorescence intensity at room temperature and a sharp increase during unfolding, which gives a bell-shaped curve. The data from room temperature to the temperature at the fluorescence maximum can be fitted to the Clarke & Fersht equation72, which accounts for non-flat pretransition baselines.

For both the multi-site mutation protein GB1 library and the single-site mutation protein GB1 library, 96 variants were randomly selected. 30 variants from the multi-site mutation library and 36 variants from the single-site mutation library passed the quality control and were plotted. Other variants, which showed no expression, molten globule behavior, or no binding to the dye, were rejected from the analysis. In the HTTS of the multi-site mutation protein GB1 library, all of the remaining variants displayed a native-like curve similar to that of the wild type: a sharp increase in fluorescence intensity after a baseline region (Figure 8). The variants spanned a small range of thermal stability. Compared with the Tm of the wild type (59.9 ℃), all of the variants showed a lower Tm. Therefore, the predicted mutations did not generate a higher stability than the original scaffold designed by computer.


Figure 8. HTTS of multi-site mutation on designed protein G. The wild-type is shown as black dot.

In comparison with the multi-site mutation GB1 library, the single-site mutation GB1 library shows the impact of each individual mutation on the overall GB1 stability. Each individual mutation at the six positions might increase or decrease protein stability without interference from other mutations. Accordingly, if 6 positions are randomized simultaneously, mutations with a positive impact could be offset by mutations with a negative impact. Therefore, the single-site mutation library might be expected to show increased variability of Tm.

From the HTTS of the single-site mutation GB1 library, the 35 variants with native-like unfolding behavior are shown in Figure 9. They fall in a Tm range between 54.4 ℃ and 60.0 ℃. The sequencing results of the 35 variants are shown in Table 4, and 10 different sequences were confirmed. Several HTTS-confirmed variants can share the same sequence, which gives the corresponding Tm value a small deviation. The original scaffold (MIAALV) showed a Tm (56.4-60.0 ℃) higher than that of the other variants. As with the multi-site mutation library, the single-site mutation library did not yield any design that is better than the original computational design. From the residue distribution (Figure 10) of all sequenced variants, it can be seen that the residue present in the original scaffold is dominant over the other predicted mutations at each position, especially at positions 5 and 30, which were occupied only by the original residue. Considering the results of HTTS and residue distribution as a whole, a positive correlation between quantity and quality was shown in this library. Among the designed sequences, the residues with higher propensity in this library also appeared in the variants with higher thermal stability. However, this applies only to the residues in the original scaffold. Although the other designed mutants showed different Tm values from each other, they have approximately equal residue propensities that are all significantly lower than that of the original scaffold. In this sense, Rosetta mimicked the course of evolution: compared with other protein variants, the most stable protein variant is more competitive and adaptive in the microenvironment, and its sequence is more populated in the library.
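The observation that positions 5 and 30 carry only the original residue can be checked against the ten distinct sequences reported in Table 4. The following is a minimal sketch with hypothetical names. It counts each distinct sequence once, because the number of clones per sequence is not listed in the text, so it does not reproduce the frequencies plotted in Figure 10.

```python
from collections import Counter

# Six-letter codes follow the position order 5, 7, 30, 34, 54, 56.
POSITIONS = [5, 7, 30, 34, 54, 56]
SEQUENCES = [
    "MIAALI", "MVAALV", "MIAAFV", "MIAAVV", "MIAVLV",
    "MIAAIV", "MFAALV", "MLAALV", "MIAALF", "MIAALV",
]


def residues_by_position(sequences):
    """Count which residues occur at each randomized position."""
    return {
        pos: Counter(seq[i] for seq in sequences)
        for i, pos in enumerate(POSITIONS)
    }


counts = residues_by_position(SEQUENCES)
# Positions where every distinct sequence carries the same residue.
invariant = [pos for pos, tally in counts.items() if len(tally) == 1]
print(invariant)
for pos in POSITIONS:
    print(pos, dict(counts[pos]))
```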

Figure 9. HTTS of single-site mutation on designed protein G, plotted as normalized fluorescence versus temperature. The wild-type is shown as black dot.

Mutants      Tm (℃)
MIAALI       54.4
MVAALV       54.4-54.7
MIAAFV       54.9-55.2
MIAAVV       55.0-55.3
MIAVLV       55.4-55.7
MIAAIV       55.8-56.1
MFAALV       57.0
MLAALV       56.7-57.0
MIAALF       56.9-57.2
MIAALV       56.4-60.0
Wild type    59.9

Table 4. Tm values for the confirmed GB1 variants

Figure 10. Amino acid distribution (Ala, Ile, Phe, Val, Met, Leu) at the 6 mutation sites (positions 5, 7, 30, 34, 54, and 56) among native-like variants

The thermal denaturation curves for the multi-site mutation library and the single-site mutation library showed high similarity in overall pattern and Tm range. However, an obvious difference between the two libraries is the relative thermal stability difference between the original scaffold and the other variants. The HTTS of the multi-site mutation library suggested that the stability difference between the original scaffold and the other variants is slightly larger than the stability differences among the other variants. By comparison, the HTTS of the single-site mutation library suggested that the stability difference is similar among all the variants, including the original scaffold. In other words, the stability of the mutants in the single-site mutation library is closer to the stability of the original scaffold than that of the mutants in the multi-site mutation library. It could be assumed that multi-site substitution contributes more than single-site substitution to destabilizing the designed GB1 protein.

Table 4 also suggests a small stability difference between the original scaffold and the other variants. The second most stable variant (MIAALF) has a Tm range overlapping with that of the original scaffold. In other words, at least one sample of the MIAALF variant was found to be thermally more stable than one sample of the original scaffold (MIAALV). It is not reliable to conclude that the MIAALF variant is thermally more stable than MIAALV, because this could be due to random error in the DSF measurement.
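The overlap argument can be made explicit by comparing the Tm ranges in Table 4 as closed intervals. This is a minimal sketch with hypothetical names. Single Tm values are treated as zero-width ranges, which is an assumption, and an overlap in ranges says nothing about statistical significance.

```python
# Tm ranges in degrees C, transcribed from Table 4 in table order.
TM_RANGES = {
    "MIAALI": (54.4, 54.4),
    "MVAALV": (54.4, 54.7),
    "MIAAFV": (54.9, 55.2),
    "MIAAVV": (55.0, 55.3),
    "MIAVLV": (55.4, 55.7),
    "MIAAIV": (55.8, 56.1),
    "MFAALV": (57.0, 57.0),
    "MLAALV": (56.7, 57.0),
    "MIAALF": (56.9, 57.2),
    "MIAALV": (56.4, 60.0),
}
SCAFFOLD = "MIAALV"


def overlaps(a, b):
    """True if closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


overlapping = [
    name
    for name, tm_range in TM_RANGES.items()
    if name != SCAFFOLD and overlaps(tm_range, TM_RANGES[SCAFFOLD])
]
print(overlapping)
```

Any variant returned here cannot be ranked against the scaffold from these ranges alone, which is the same caution stated above for MIAALF.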

3.3 Discussion of computationally designed protein G library

The site-directed mutagenesis at the 6 positions in the hydrophobic core was suggested by Dr. Baker. The original Rosetta-designed GB1 core is predicted to be defective, because it is not as stable as wild-type GB1. By examining secondary and tertiary structure stabilization, some positive and negative effects can be expected from substituting the 6 core residues in the original scaffold with the new residues. Positions 30 and 34 are occupied by alanine in the original scaffold and were designed to mutate to valine. A significant number of alpha helix tolerance studies have shown that it is highly favorable for alanine to appear in an alpha helix. Compared with other hydrophobic residues, alanine incorporated in an alpha helix usually buries more apolar area, which results in a smaller apolar surface.73 An enthalpy study showed that the desolvation penalty for alanine is relatively low during protein folding, due to a lower backbone polar area.74 These factors could all explain the presence of alanine in the original scaffold. However, the requirement of tertiary interaction for stabilization might contradict the requirement of secondary interaction. For tertiary interaction, favorable hydrophobic interaction should be considered a main factor in core packing. Sauer and Gregoret compared the tolerance for alanine and valine in the alpha hydrophobic core. They found that alanine had a higher preference for less buried areas, and valine had a higher preference for largely buried areas.75 The reason could be that the burial of valine results in a larger buried volume and more favorable van der Waals interactions compared with alanine. Therefore, the randomization to both valine and alanine could be an investigation of the competition between secondary interaction and tertiary interaction. My result showed that the tertiary interaction outweighs the secondary interaction as a factor in determining the residue propensity at positions 30 and 34.

Another potentially controversial issue that should be addressed based on my result is the residue propensity at position 5. In the original scaffold, methionine is present at position 5, and randomization to Leu and Val was predicted to lead to a more stable structure. Several previous studies of methionine mutagenesis provided evidence in favor of the new residues. The study by Matthews and coworkers suggested that all of the single substitutions of 10 core residues to methionine result in a decreased thermal stability of T4 lysozyme.76 Another study from Sternberg & Chickos showed that each methionine-to-leucine substitution at a restricted internal site is predicted to have an entropy cost of about 0.8 kcal/mol. This could be attributed to the higher degrees of freedom of the methionine side chain compared with other hydrophobic residues. However, my result showed that protein GB1 has an overwhelming preference for methionine at position 5. This result directly opposes the prediction that replacing methionine will improve protein stability. Another study from Creighton showed that the van der Waals volume occupied by leucine is the same as that of methionine. One possible explanation to resolve this controversy is that the presence of leucine causes unfavorable conformational strain, which results in an increase in free energy. It is also possible that methionine adapts more easily to a “jigsaw puzzle” packing than the other randomized residues. Nevertheless, this is still not a reliable explanation, because the side chain of methionine is similar in size to the side chain of leucine.

At positions 7, 54, and 56, the same mutations were made. Phenylalanine was the only newly designed aromatic residue in this library. The simultaneous randomization of the residues at the three positions to phenylalanine was intended to achieve pi-pi stacking interactions, which are a major factor in protein stability. Since pi-pi stacking requires at least two aromatic residues stacked against each other in a parallel configuration, a single phenylalanine replacement should not be considered a factor in protein stabilization, due to the absence of additional aromatic partners.

According to the Tm values shown in Table 4, the designed mutants failed to achieve a higher thermal stability than the original scaffold. In contrast with the designed mutants, the original scaffold introduced more favorable hydrophobic interactions and the least amount of conformational strain into the overall protein structure. Any single substitution could change the core-packing pattern and destabilize the protein structure. The earlier assumption about the masking effect of multi-site mutations can be reconsidered here. If compensation did occur between the effects of multi-site mutations, the single-site mutations should result in more variable Tm values. However, in comparing the two sets of thermal denaturation curves, the mutants in both were densely distributed, and no difference in variability was found. Consequently, single-site mutation did not indicate a more noticeable influence on protein stability than multi-site mutation.

Besides the hydrophobic core, other areas of the computationally designed protein G could also be considered for stability studies. On the solvent-exposed surface, hydrophilic residues could be rearranged to maximize the number of ion pairs formed between their side chains. At the boundary between the solvent-exposed surface and the buried core, the residues could be redesigned to ensure that the core residues are largely shielded from the accessible hydrophilic environment.

3.4 Construction of Single-Chain Rop library

With the two alpha helices in Rop named alpha helix A and B, respectively, the constructed left-handed single-chain Rop was named A1-B1-B2-A2, with the two alpha helices in the second Rop reversed. Compared with the right-handed arrangement A1-A2-B2-B1, the left-handed Rop was shown to be thermally more stable65 and was used in my mutagenesis study. The 8 residues in the two layers close to the center were randomized with the codon NNK, which encodes all 20 amino acids (Figure 11). The benefit of designing a single-chain Rop instead of a dimeric Rop is the simplicity of studying single amino acid changes. Since the dimeric Rop is arranged antiparallel, the actual number of mutations is double the expected number. Therefore, 16 instead of 8 mutations would be obtained if the Rop were dimeric, which would complicate the study of the packing pattern in the desired layers.
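The coverage of the NNK codon can be verified by enumeration. The sketch below assumes the standard genetic code and uses hypothetical variable names: N is any of the four bases and K is G or T. It lists the NNK codons, translates them, and computes the theoretical diversity for 8 randomized positions at the protein and DNA levels.

```python
from itertools import product

# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# NNK: any base at the first two positions, G or T at the third.
nnk_codons = [a + b + c for a, b, c in product("ACGT", "ACGT", "GT")]
encoded = {CODON_TABLE[codon] for codon in nnk_codons}
amino_acids = sorted(encoded - {"*"})
stop_codons = [codon for codon in nnk_codons if CODON_TABLE[codon] == "*"]

n_sites = 8  # randomized core positions in single-chain Rop
protein_diversity = len(amino_acids) ** n_sites
dna_diversity = len(nnk_codons) ** n_sites

print(len(nnk_codons), len(amino_acids), stop_codons)
print(protein_diversity, dna_diversity)
```

Restricting the third base to G or T keeps all 20 amino acids while leaving a single stop codon in the mix, which is the usual reason for choosing NNK over fully random NNN codons.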

Apart from this benefit, one drawback of the single-chain Rop design is the complication of cloning. Connecting the two Rop monomers doubles the length of the DNA sequence, which contains two 60-residue-long sequences. Therefore, it is difficult for a PCR reaction to distinguish between the repeating sequences.

Two parallel PCR reactions were designed to solve this problem. For each sequence, three restriction sites (AflIII, BsaI, and BamHI) were included. The BsaI site lies after the AflIII site in one sequence and before the BamHI site in the other. The two sequences can be digested and ligated at the BsaI site, and the resulting insert is then ligated into the vector at AflIII and BamHI (Figure 12).

Figure 11. Structure of single-chain Rop. 8 residues in the central two layers were shown as cartoon


Figure 12. Cloning strategy for single-chain Rop

3.5 Screening and selection

Screening and selection were achieved by co-expressing the single-chain Rop screening plasmid with the reporter plasmid. The fluorescent phenotype was checked after each round of enrichment, and the fraction of active Rop (% hits) increased correspondingly. Highly fluorescent colonies were considered active. In the original library, fewer than 10 active colonies were found. After the first round of enrichment, ~65% hits were achieved. 95% and 98% hits were achieved for the second and third rounds of enrichment, respectively (Figure 13). A round with a low % hits ensures high coverage of the original library in the pool, and a round with a high % hits ensures high efficiency of the pool. Since the theoretical library size of single-chain Rop is 20^8, the round with 65% hits was chosen, which is convenient for identifying active colonies without losing too much of the population in the library. To confirm that the phenotype correctly indicates the presence of the single-chain Rop screening plasmid, a DNA gel was run to check for the presence of correct single-chain Rop constructs. The gel showed two bands corresponding to single-chain Rop and the reporter plasmid, respectively, with the correct cutout fragment size. Unfortunately, according to the sequencing results of 40 randomly picked active variants, a significant number of samples showed sequences different from the authentic single-chain Rop sequence. It was confirmed that the incorrect sequence was not the sequence of the reporter plasmid.


Figure 13. Screening of NNK library. a. less than 10 active colonies. b. 65% hits. c. 95% hits. d. 98% hits.

3.6 Discussion of single-chain Rop library

The possible reason could be a combination of the screening/selection conditions and cloning contamination. The fluorescent phenotype is directly related to the concentration of arabinose, the presence of which triggers AraC-mediated activation of GFP expression. Only within a narrow range of arabinose concentration around 0.0005% does the fluorescent phenotype reflect Rop function. If the arabinose concentration deviated moderately from 0.0005%, screening by visual inspection of the phenotype would be biased. Hence, the observed “highly fluorescent” colonies may not indicate high activity, and there is no guarantee that the selected variants are functionally active. In addition, DNA from the biological source organism instead of the target insert could be ligated into the cloning vector and take part in the enrichment process.

DNA gel electrophoresis can be employed to resolve this problem. Theoretically, active Rop reduces the amount of ColE1 plasmid, and inactive Rop increases the amount of ColE1 plasmid. With each round of enrichment, the amount of ColE1 plasmid should decrease, and the amount of pMRH6 plasmid should increase. Following this trend, the relative amounts of ColE1 plasmid and pMRH6 plasmid can be observed on a DNA gel. XbaI and XmaI restriction sites can be chosen to differentiate between the two plasmids. The ColE1 plasmid can be double digested at two sites approximately adjacent to each other, and the pMRH6 plasmid can be double digested at two sites about 1600 base pairs apart (Figure 14). The intensities of the Rop band and the ColE1 band in each round of enrichment can be used to detect the activity of Rop. The sc-Rop library should be re-cloned if no evidence of active Rop shows on the gel.

Figure 14. Restriction site for XbaI and XmaI in pMRH6 plasmid and pUCBADGFPuv plasmid

Because of its simplicity for studying the hydrophobic core, single-chain Rop will be a major direction for studying core packing. The residue propensity and packing pattern could be compared between the single-chain Rop and dimeric Rop to bring new insight into the role of the loop. The oligomeric state and overall topology are also valuable to investigate.

References

1. Zwanzig, R.; Szabo, A.; Bagchi, B., Levinthal's paradox. Proc. Natl. Acad. Sci. USA. 1992, 89, (1): 20-2. 2. Lins, L.; Brasseur, R.; The hydrophobic effect in protein folding. FASEB J. 1995, 9, 535- 540. 3. Privalov, P; Thermodynamics of protein folding. J. Chem. Thermodyn. 1997, 29, (4), 447-474. 4. Kumar1, S.; Tsai, C.; Nussinov, R.; Factors enhancing protein thermostability. Protein Eng. 2000, 13, (3), 179-91. 5. Vogt, G.; Woell, W.; Argos, P.; Protein Thermal Stability, Hydrogen Bonds, and Ion Pairs. J Mol Biol. 1997, 269, (4): 631-43. 6. Anfinsen, CB.; Principles that govern the folding of protein chains. Science. 1973, 181, (4096): 223-30. 7. Zhang, XJ.; Baase, WA.; Shoichet, BK.; Wilson, KP.; Mattthews, BW.; Enhancement of protein stability by the combination of point mutations in T4 lysozyme is additive. Protein Eng. 1995, 8, (10): 1017-22. 8. Malakauskas, SM.; Mayo, SL.; Design, structure and stability of a hyperthermophilic protein variant. Nat Struct Biol. 1998, 5, (6): 470-5. 9. Ruvinov S, Wang L, Ruan B, Almog 0, Gilliland G, Eisenstein E, Bryan P. Engineering the independent folding of the subtilisin BPN' prodomain: Analysis of two-state folding vs. protein stability. Biochemistry 1997, 36, (34):10414-10421. 10. Pace, CN. 1990. Measuring and increasing protein stability. TIBTECH 1990, 8:93–98. 11. Takai, K.; Nakamura, K.; Toki, T.; Tsunogai, U.; Miyazaki, M.; Miyazaki, J.; Hirayama, H.; Nakagawa, S.; Nunoura, T.; Horikoshi, K.; Cell proliferation at 122°C and

isotopically heavy CH4 production by a hyperthermophilic methanogen under high- pressure cultivation. Proc. Natl. Acad. Sci. USA. 2008, 105, (31):10949-54. 56 12. Szilágyi, A.; Závodszky, P.; Structural differences between mesophilic, moderately thermophilic and extremely thermophilic protein subunits: results of a comprehensive survey. Structure. 2000, 8, (5):493-504. 13. Gromiha, MM1.; Pathak, MC.; Saraboji, K.; Ortlund, EA.; Gaucher EA. Hydrophobic environment is a key factor for the stability of thermophilic proteins. Proteins. 2013, 81, (4):715-21. 14. Shoichet, BK.; Baase, WA.; Kuroki, R.; Matthews, BW.; A relationship between protein stability and protein function. Proc Natl Acad Sci U S A. 1995, 92, (2): 452–456. 15. D'Amico, S.; Marx, J.; Gerday, C.; Feller, G.; Activity-Stability Relationships in Extremophilic Enzymes. J Bio Chem. 2003 278, (10): 7891-6. 16. Dahiyat, B.; Mayo, S.; De Novo Protein Design: Fully Automated Sequence Selection. Science. 1997, 278, (5335): 82-7. 17. Kuhlman, B.; Dantas, G.; Ireton, G.; Varani, G.; Stoddard, B.; Baker, B. Design of a Novel Globular Protein Fold with Atomic-Level Accuracy. Science. 2003, 302, (5649): 1364-8. 18. Jiang, L.; Althoff, EA.; Clemente, FR.; Doyle, L.; Rothlisberger, D.; Zanghellini, A.; Baker, D. De novo computational design of retro-aldol enzymes. Science. 2008, 319, (5868):1387-91. 19. Rothlisberger, D.; Khersonsky, O.; Wollacott, AM.; Jiang, L.; Dechancie, J.; Betker, J.; Baker, D. Kemp elimination catalysts by computational enzyme design. Nature. 2008, 8, (453):190-195. 20. Lippow, SM.; Tidor, B. Progress in computational protein design. Curr Opin Biotechnol. 2007, 18, (4): 305-11. 21. Samish, I.; MacDermaid, C.; Perez-Aguilar, J.; Saven, J. Theoretical and Computational Protein Design. Annu Rev Phys Chem. 2011, 62: 129-49. 22. Magliery, T.; Regan, L. Library approaches to biophysical problems. Eur J Biochem. 2004, 271, (9): 1593-4. 23. Moffet, D.; Hecht, M. 
De Novo Proteins from Combinatorial Libraries. Proc Natl Acad Sci USA. 2003, 100, (23): 13270-3.

57 24. Rose, GD.; Geselowitz, AR.; Lesser, GJ.; Lee, RH.; Zehfus, MH. Hydrophobicity of Amino Acid Residues in Globular Proteins. 1985, 229, (4716): 834-8. 25. Kabsch, W.; Sander, C. Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers. 1983, 22, (12): 2577-637. 26. Andersen, CA.; Palmer, AG.; Brunak, S.; Burkhard, R. Continuum Secondary Structure Captures Protein Flexibility. Structure. 2002, 10, (2): 175-84. 27. Chou, PY.; Fasman, GD. Prediction of protein conformation. Biochemistry 1974, 13,(2): 222–245. 28. Pace, CN.; Scholtz, JM. A Helix Propensity Scale Based on Experimental Studies of Peptides and Proteins. Biophysics J. 1998, 75, (1), 422-7. 29. Best, RB.; de Sancho, D., Mittal, J. Residue-Specific a-Helix Propensities from Molecular Simulation. Biophys J. 2012, 102, (6): 1462-7. 30. Kim, CA.; Berg, JM.; Thermoynamic beta-sheet propensities mearsured using a zinc- finger host peptide. Nature. 1993, 362, (6417): 267-70. 31. Minor, DL Jr.; Kim, PS. Measurement of the β-sheet-forming propensities of amino acids. Nature. 1994, 367, (6464): 660-3. 32. Koga1, N.; Tatsumi-Koga, R.; Liu, G.; Xiao, R.; Acton, TB.; Montelione, GT.; Baker, D. Principles for designing ideal protein structures. Nature. 2012, 491, (7423): 222-7. 33. Magliery, T.J.; Regan, L. A cell-based screen for function of the four-helix bundle protein Rop: A new tool for combinatorial experiments in biophysics. Protein Eng. Des. Select. 2004, 17: 77-83 34. Lazar, GA., Handel, TM. Hydrophobic core packing and protein design. 1998, 2, (6): 675–9. 35. Ventura, S.; Vega, MC.; Lacroix, E.; Angrand, I.; Spagnolo, L.; Serrano, L. Conformational strain in the hydrophobic core and its implications for protein folding and design. Nat Struct Biol. 2002, 9, (6): 485-93. 36. Murphy1, GS.; Mills, JL.; Miley, MJ.; Machius, M.; Szyperski, T.; Kuhlman, B. 
Increasing Sequence Diversity with Flexible Backbone Protein Design: The Complete Redesign of a Protein Hydrophobic Core. Structure. 2012, 20, (5): 1086-96.
37. Edgell, MH.; Sims, DA.; Pielak, GJ.; Yi, F. High-precision, high-throughput stability determinations facilitated by robotics and a semiautomated titrating fluorometer. Biochemistry. 2003, 42, (24): 7587-93.
38. Matulis, D.; Kranz, JK.; Salemme, FR.; Todd, MJ. Thermodynamic stability of carbonic anhydrase: measurements of binding affinity and stoichiometry using ThermoFluor. Biochemistry. 2005, 44, (13): 5258-66.
39. Lo, MC.; Aulabaugh, A.; Jin, G.; Cowling, R.; Bard, J.; Malamas, M.; Ellestad, G. Evaluation of fluorescence-based thermal shift assays for hit identification in drug discovery. Anal Biochem. 2004, 332, (1): 153-9.
40. Lavinder, J.J.; Hari, S.B.; Sullivan, B.J.; Magliery, T.J. High-throughput thermal scanning: a general, rapid dye-binding thermal shift screen for protein engineering. J. Am. Chem. Soc. 2009, 131: 3794-3795.
41. Seabrook, SA.; Newman, J. High-Throughput Thermal Scanning for Protein Stability: Making a Good Technique More Robust. ACS Comb. Sci. 2013, 15, (8): 387-392.
42. Jäckel, C.; Kast, P.; Hilvert, D. Protein Design by Directed Evolution. Annu. Rev. Biophys. 2008, 37: 153-173.
43. Gallagher, T.; Alexander, P.; Bryan, P.; Gilliland, GL. Two Crystal Structures of the B1 Immunoglobulin-Binding Domain of Streptococcal Protein G and Comparison with NMR. Biochemistry. 1994, 33, (15): 4721-9.
44. Strop, P.; Marinescu, AM.; Mayo, SL. Structure of a protein G helix variant suggests the importance of helix propensity and helix dipole interactions in protein design. Protein Sci. 2000, 9, (7): 1391-4.
45. Gronenborn, AM.; Frank, MK.; Clore, GM. Core mutants of the immunoglobulin binding domain of streptococcal protein G: stability and structural integrity. FEBS Lett. 1996, 398, (2-3): 312-6.
46. Blanco, FJ.; Ortiz, AR.; Serrano, L. Role of a nonnative interaction in the folding of the protein GB1 domain as inferred from the conformational analysis of the alpha-helix fragment. Fold Des. 1997, 2, (2): 123-33.
47. Lee, SY.; Fujitsuka, Y.; Kim, DH.; Takada, S. Roles of Physical Interactions in Determining Protein Folding Mechanisms: Molecular Simulation of Protein G and Spectrin SH3. Proteins. 2004, 55, (1): 128-38.

48. McCallister, EL.; Alm, E.; Baker, D. Critical role of β-hairpin formation in protein G folding. Nat. Struct. Biol. 2000, 7, (8): 669-73.
49. Kmiecik, S.; Kolinski, A. Folding Pathway of the B1 Domain of Protein G Explored by Multiscale Modeling. Biophys J. 2008, 94, (3): 726-36.
50. Derreumaux, P. Role of supersecondary structural elements in protein G folding. J. Chem. Phys. 2003, 119, (4940).
51. Loladze, VV.; Ibarra-Molero, B.; Sanchez-Ruiz, JM.; Makhatadze, GI. Engineering a thermostable protein via optimization of charge-charge interactions on the protein surface. Biochemistry. 1999, 38, (50): 16419-23.
52. Gronenborn, AM.; Frank, MK.; Clore, GM. Core mutants of the immunoglobulin binding domain of streptococcal protein G: stability and structural integrity. FEBS Lett. 1996, 398, (2-3): 312-6.
53. Strop, P.; Marinescu, AM.; Mayo, SL. Structure of a protein G helix variant suggests the importance of helix propensity and helix dipole interactions in protein design. Protein Sci. 2000, 9, (7): 1391-4.
54. Ho, SP.; DeGrado, WF. Design of a 4-Helix Bundle Protein: Synthesis of Peptides Which Self-Associate into a Helical Protein. J. Am. Chem. Soc. 1987, 109, (22): 6751-8.
55. DeGrado, W. F.; Summa, C. M.; Pavone, V.; Nastri, F.; Lombardi, A. De novo design and structural characterization of proteins and metalloproteins. Annual Review of Biochemistry. 1999, 68, 779-819.
56. Cesareni, G.; Muesing, MA.; Polisky, B. Control of ColE1 DNA replication: The rop gene product negatively affects transcription from the replication primer promoter. Proc. Natl. Acad. Sci. USA. 1982, 79, (20): 6313-17.
57. Banner, DW.; Cesareni, G.; Tsernoglou, D. Crystallization of the ColE1 Rop protein. J Mol Biol. 1983, 170, (4): 1059-60.
58. Banner, DW.; Kokkinidis, M.; Tsernoglou, D. Structure of the ColE1 Rop protein at 1.7 Å resolution. J Mol Biol. 1987, 196, (3): 657-75.
59. Eberle, W., et al. The structure of ColE1 rop in solution. J Biomol NMR. 1991, 1, (1): 71-82.
60. Castagnoli, L., et al. Genetic and structural analysis of the ColE1 Rop (Rom) protein. Embo J. 1989, 8, (2): 621-9.
61. Munson, M.; O’Brien, R.; Sturtevant, JM.; Regan, L. Redesigning the hydrophobic core of a four-helix-bundle protein. Protein Sci. 1994, 3, (11): 2015-22.
62. Munson, M.; Anderson, KS.; Regan, L. Speeding up protein folding: mutations that increase the rate at which Rop folds and unfolds by over four orders of magnitude. Fold Des. 1997, 2, (1): 77-87.
63. Hari, S.B.; Byeon, C.; Lavinder, J.J.; Magliery, T.J. Cysteine-free rop: A four-helix bundle core mutant has wild-type stability and structure but dramatically different unfolding kinetics. Protein Sci. 2010, 19: 670-679.
64. Munson, M.; Balasubramanian, S.; Fleming, G.; Nagi, AD.; O'Brien, R.; Sturtevant, JM.; Regan, L. What makes a protein? Hydrophobic core designs that specify stability and structural properties. Protein Sci. 1996, 5, (8): 1584-1593.
65. Predki, F.; Regan, L. Redesigning the Topology of a Four-Helix-Bundle Protein: Monomeric Rop. Biochemistry. 1995, 34, (31): 9834-9.
66. Kresse, HP.; Czubayko, M.; Nyakatura, G.; Vriend, G.; Sander, C.; Bloecker, H. Four-helix bundle topology re-engineered: monomeric Rop protein variants with different loop arrangements. Protein Eng. 2001, 14, (11): 897-901.
67. Lavinder, J. J. Analyzing the Sequence-Stability Landscape of the Four-helix Bundle Protein Rop: Developing High-Throughput Approaches for Combinatorial Biophysics and Protein Engineering. The Ohio State University, Columbus, 2009.
68. Lavinder, J. J.; Hari, S. B.; Sullivan, B. J.; Magliery, T. J. High-throughput thermal scanning: a general, rapid dye-binding thermal shift screen for protein engineering. J Am Chem Soc. 2009, 131, (11): 3794-5.
69. Sadler, DP.; Petrik, E.; Taniguchi, Y.; Pullen, JR.; Kawakami, M.; Radford, SE.; Brockwell, DJ. Identification of a mechanical rheostat in the hydrophobic core of protein L. J Mol Biol. 2009, 393, (1): 237-48.
70. Glyakina, AV.; Balabaev, NK.; Galzitskaya, OV. Mechanical unfolding of proteins L and G with constant force: similarities and differences. J Chem Phys. 2009, 131, (4): 045102.
71. Alexander, PA.; He, Y.; Chen, Y.; Orban, J.; Bryan, PN. A minimal sequence code for switching protein structure and function. Proc Natl Acad Sci U S A. 2009, 106, (50): 21149-54.
72. Clarke, J.; Fersht, A. R. Engineered disulfide bonds as probes of the folding pathway of barnase: increasing the stability of proteins against the rate of denaturation. Biochemistry. 1993, 32, 4322-9.
73. López-Llano, J.; Campos, LA.; Sancho, J. Alpha-helix stabilization by alanine relative to glycine: roles of polar and apolar solvent exposures and of backbone entropy. Proteins. 2006, 64, (3): 769-78.
74. Ermolenko, D.; Richardson, JM.; Makhatadze, GI. Noncharged amino acid residues at the solvent-exposed positions in the middle and at the C terminus of the alpha-helix have the same helical propensity. Protein Science. 2003, 12, (6): 1169-76.
75. Gregoret, LM.; Sauer, RT. Tolerance of a protein helix to multiple alanine and valine substitutions. Fold Des. 1998, 3, (2): 119-26.
76. Gassner, NC.; Baase, WA.; Matthews, BW. A test of the "jigsaw puzzle" model for protein folding by multiple methionine substitutions within the core of T4 lysozyme. Proc. Natl. Acad. Sci. USA. 1996, 93, (22): 12155-8.
77. Creighton, TE. Protein Folding. W.H. Freeman and Co.: New York, 1992.
