
The Pennsylvania State University

The Graduate School

PROTEIN OF STRUCTURALLY HOMOLOGOUS

A Thesis in

Integrative Biosciences

by

Hui Li

© 2005 Hui Li

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

December 2005

The thesis of Hui Li was reviewed and approved* by the following:

Stephen J. Benkovic
Evan Pugh Professor and Eberly Chair in Chemistry
Thesis Advisor
Chair of Committee

Ming Tien
Professor of Biochemistry

Squire Booker
Associate Professor of Biochemistry and Molecular Biology

Costas D. Maranas
Professor of

Richard J. Frisque
Professor of Molecular Virology
Co-Director, Graduate Education
Integrative Biosciences Graduate Program
The Huck Institutes of the Sciences

*Signatures are on file in the Graduate School

ABSTRACT

One of the ultimate goals of protein engineering is the de novo design of novel proteins with desired activities and properties. However, our current knowledge of protein structure and function is far from sufficient to achieve this goal. On the other hand, nature has successfully evolved an enormous number of proteins with novel functions that allow their hosts to adapt to ever-changing environments, and naturally occurring proteins present the most diverse and complicated information about structure-function relationships. Studying evolutionarily related proteins with structural and functional similarities will provide not only detailed information about protein structure-function relationships, but also insight into the strategies that nature has adopted for protein evolution.

Here, we studied two pairs of enzymes with significant homology in their structures, reaction mechanisms, and architectures. In our attempt to interconvert the enzymatic activities between the two members of each pair, rational methods, such as site-directed mutagenesis and rational domain swapping, were applied first on the basis of our current understanding of each protein. Additional sequence space was then explored using combinatorial methods, such as ITCHY, random mutagenesis, and DNA shuffling, to identify its potential roles in protein structure and function.

The first pair of enzymes we chose are Escherichia coli purT-encoded glycinamide ribonucleotide (GAR) transformylase (PurT) and Escherichia coli N5-carboxylaminoimidazole ribonucleotide (N5-CAIR) synthetase (PurK). While both enzymes are involved in de novo purine biosynthesis, PurT catalyzes the third reaction of the purine biosynthetic pathway, the conversion of GAR, ATP and formate to formyl GAR, ADP and inorganic phosphate (Pi), and PurK catalyzes the sixth reaction of the pathway, the conversion of 5-aminoimidazole ribonucleotide (AIR), ATP and bicarbonate to N5-CAIR, ADP and Pi. The effort to interconvert the enzymatic activities between PurT and PurK suggested that these two enzymes might have evolved through domain swapping. Several structural elements crucial for catalysis were also identified in each protein, providing valuable information about protein structure-function relationships.

The second pair of enzymes are Escherichia coli N-acetylneuraminate lyase (NAL) and Escherichia coli dihydrodipicolinate synthase (DHDPS), two (β/α)8 barrel proteins. NAL catalyzes the degradation of N-acetylneuraminate (NANA) to N-acetylmannosamine (ManNAc) and pyruvate. DHDPS catalyzes the branch-point reaction of the lysine biosynthetic pathway in plants and microbes: the condensation of L-aspartate-β-semialdehyde and pyruvate to dihydrodipicolinate (DHDP). Each enzyme was observed to catalyze the other’s reaction, and this functional promiscuity between NAL and DHDPS is considered strong evidence that they are evolutionarily related. A possible evolutionary scheme from NAL to DHDPS through a divergent path was further supported by the attempt to interconvert the enzymatic activities between NAL and DHDPS. A conserved Arg residue in DHDPS was identified as crucial for DHDPS activity. Several DHDPS mutants with enhanced NAL activity were also identified using combinatorial methods.

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

LIST OF ABBREVIATIONS

ACKNOWLEDGEMENTS

Chapter 1 Introduction

  1.1 Protein Engineering
  1.2 Protein Evolution
  1.3 Methodologies of Protein Engineering
    1.3.1 Rational Protein Design
    1.3.2 Combinatorial Approaches
      1.3.2.1 Generation of Molecular Diversity for Directed Evolution
      1.3.2.2 Screening and Selection

Chapter 2 Rational Domain Swapping between purT-encoded GAR transformylase (PurT) and N5-CAIR synthetase (PurK)

  2.1 Introduction
    2.1.1 GAR transformylase (PurT) and N5-CAIR synthetase (PurK)
    2.1.2 Protein evolution by domain swapping
  2.2 Experimental:
    2.2.1 Materials:
    2.2.2 Bacterial Strains:
    2.2.3 Methods:
  2.3 Results and Discussion
  2.4 Conclusion

Chapter 3 Identification of functional subdomains in purT-encoded GAR transformylase (PurT) and N5-CAIR synthetase (PurK) by combinatorial and rational methods

  3.1 Introduction
  3.2 Experimental:
    3.2.1 Materials:
    3.2.2 Bacterial Strains:
    3.2.3 Methods:
  3.3 Results and Discussion
  3.4 Conclusions:

Chapter 4 Interconversion of enzymatic activities between N-acetylneuraminate lyase (NAL) and dihydrodipicolinate synthase (DHDPS), two (β/α)8 barrel proteins

  4.1 Introduction:
    4.1.1 The (β/α)8 barrel proteins
    4.1.2 The evolution of the (β/α)8 barrel proteins
    4.1.3 Protein engineering of the (β/α)8 barrel proteins
    4.1.4 N-acetylneuraminate lyase (NAL) and dihydrodipicolinate synthase (DHDPS)
      4.1.4.1 Introduction of NAL and DHDPS
      4.1.4.2 The active sites of NAL and DHDPS
  4.2 Experimental:
    4.2.1 Materials:
    4.2.2 Bacterial Strains:
    4.2.3 Methods:
  4.3 Results and Discussion
  4.4 Conclusion

Bibliography

LIST OF FIGURES

Figure 1: Schematic diagram of mechanisms for major structural changes in protein evolution.

Figure 2: Schematic overview of general procedures of rational design and combinatorial mutagenesis.

Figure 3: Proposed catalytic mechanisms of crotonase and 4-chlorobenzoyl-CoA dehalogenase.

Figure 4: Schematic overview of methods for the creation of DNA sequence diversity.

Figure 5: Schematic overview of DNA shuffling. A family of homologous genes is randomly fragmented (e.g., by treatment with DNase I).

Figure 6: Schematic overview of ITCHY (Incremental Truncation for the Creation of Hybrid Enzymes).

Figure 7: The de novo purine biosynthetic pathway in microorganisms.

Figure 8: Reactions catalyzed by PurT transformylase and PurK.

Figure 9: Three-dimensional structures of PurT, PurK, and superposition of the α-carbons for PurT (in blue) and PurK (in red).

Figure 10: Active sites of PurT, PurK, and superposition of conserved residues in PurT and PurK active sites.

Figure 11: Comparison of amino acid sequences of E. coli PurT and PurK, B loop and J loop from a structure-based alignment.

Figure 12: Similarity in the chemical reactions catalyzed by PurT and PurK.

Figure 13: Reactions and structure of carbamoyl phosphate synthetase (CPS).

Figure 14: Reactions catalyzed by PurN and PurU enzymes and ribbon representation of three-dimensional structure of PurN.

Figure 15: Construction of hybrid enzymes.

Figure 16: Schematic overview of the construction using THIO-ITCHY.

Figure 17: The schematic representation of pDIM-KT vector and pDIM-TK vector for ITCHY libraries, and DNA agarose gel analysis of TK ITCHY libraries incorporated with different concentrations of αS-dNTP after mung bean treatment.

Figure 18: Distribution of crossovers in the naïve KT and TK ITCHY libraries.

Figure 19: Distribution of rationally chosen crossovers in each individual domain.

Figure 20: Design of ITCHY libraries for the A- and C-domains, and their respective control constructs.

Figure 21: Distribution of crossovers in the naïve KA, TA, KC, and TC ITCHY libraries.

Figure 22: Functional subdomains in PurK A-domain and PurT C-domain.

Figure 23: Functional hybrid proteins identified by the rational methods and combinatorial methods.

Figure 24: The three-dimensional structure of triosephosphate isomerase (5TIM.pdb) viewed from the C-terminal end of the barrel and from the side.

Figure 25: Reactions catalyzed by phosphoribosylanthranilate (PRA) isomerase (TrpF), indole-3-glycerol phosphate (IGP) synthase (TrpC), phosphoribosylformimino-5-amino-1-phosphoribosyl-4-imidazole carboxamide (ProFAR) isomerase (HisA), and imidazole glycerol phosphate (ImGP) synthase (HisF).

Figure 26: Reactions catalyzed by the L-Ala-D/L-Glu epimerase from E. coli (AEE), the muconate lactonizing enzymes II from Pseudomonas sp. P51 (MLE II), and the o-succinylbenzoate synthase (OSBS).

Figure 27: Schematic overview of the generation of Split-Trp 1p proteins.

Figure 28: The reaction and proposed mechanism of NAL.

Figure 29: The reaction and proposed mechanism of DHDPS.

Figure 30: Ribbon representation of the E. coli NAL homotetramer bound with sialic acid alditol and the structure of sialic acid alditol.

Figure 31: Superposition of three-dimensional structures of E. coli NAL and DHDPS viewed from the C-terminal end of the barrel and from the side, and the superposition of active site residues in E. coli NAL and DHDPS.

Figure 32: Structure-based alignment of amino acid sequences of Escherichia coli DHDPS and NAL, the β7 loop, and the β8 loop.

Figure 33: The identification of DHDP generated by the wild type DHDPS and NAL using HPLC.

Figure 34: The schematic overview of NAL and DHDPS secondary structures and their active site residues, and the schematic overview of four hybrid proteins between NAL and DHDPS, and their in vivo complementation results in NAL(-) on M9+NANA medium and on M9+glucose medium.

Figure 35: Schematic overview of SCRATCHY.

Figure 36: The active site of DH-EP13.

LIST OF TABLES

Table 1: Summary of homology-dependent in vitro recombination methods for the generation of DNA sequence diversity.

Table 2: Summary of homology-independent in vitro recombination methods for the generation of DNA sequence diversity.

Table 3: Summary of in vivo and in vitro selection methods and their applications.

Table 4: The primers used for construction of the hybrid enzymes.

Table 5: Kinetic parameters of ATPase activity of the two wild type enzymes (PurT and PurK), and KABT hybrid.

Table 6: The ATPase activities of the wild type PurT and PurK, and the hybrid enzymes under different conditions.

Table 7: Activities of the production of acetyl phosphate, formyl phosphate and carboxyl phosphate by the wild type PurT and PurK, and the hybrid enzymes.

Table 8: The primers used for construction of the hybrid enzymes with crossovers in individual domains.

Table 9: In vivo activities of the hybrid protein with crossovers in the B-domain.

Table 10: Kinetic parameters of the PurK activity of PurK and KTBLK.

Table 11: In vivo activities of the hybrid protein with crossovers in the A- and C-domains.

Table 12: Primers used in Chapter 4.

Table 13: Kinetic parameters of the NAL activities of the wild type NAL and DHDPS, and the NAL mutants.

Table 14: Kinetic parameters of the DHDPS activities of the DHDPS mutants with mutations in the active site.

Table 15: Kinetic parameters of the NAL and DHDPS activities of the DHDPS mutants.

LIST OF ABBREVIATIONS

ADP    adenosine diphosphate

AEE    L-Ala-D/L-Glu epimerase from E. coli

AICAR    5-aminoimidazole-4-carboxamide ribotide

AIR    aminoimidazole ribonucleotide

AMP    adenosine monophosphate

ASA    aspartate-β-semialdehyde

ATP    adenosine triphosphate

BC    biotin carboxylases

bp    base pair

BSA    bovine serum albumin

N5-CAIR    N5-carboxylaminoimidazole ribonucleotide

cAMP    cyclic AMP

CATH    the Class, Architecture, Topology and Homologous superfamily database

CBPHA    trans-2’-carboxybenzal-pyruvate hydratase-aldolase

CdRP    1-(2-carboxyphenylamino)-1-desoxyribulose-5-phosphate

DAP    diaminopimelic acid

DDL    D-alanine:D-alanine ligase

DHDP    dihydrodipicolinate

DHDPR    dihydrodipicolinate reductase

DHDPS    dihydrodipicolinate synthase

dNTPs    deoxynucleotide triphosphates

DPDR    dihydrodipicolinate reductase

EDTA    ethylene diamine tetraacetic acid

epPCR    error-prone PCR

FAD    flavin adenine dinucleotide

fGAR    formyl GAR

GAR    glycinamide ribonucleotide

GTS

HBPHA    trans-o-hydroxylbenzylidenepyruvate hydratase-aldolase

HisA    phosphoribosylformimino-5-amino-1-phosphoribosyl-4-imidazole carboxamide isomerase

HisF    imidazole glycerol phosphate synthase

KDGA    2-keto-3-deoxygluconate aldolase

KDGDH    D-5-keto-4-deoxyglucarate dehydratase

IGP    indole-3-glycerol phosphate

ImGP    imidazole glycerol phosphate

IMP    inosine monophosphate

IPTG    isopropyl β-D-thiogalactoside

ITCHY    incremental truncation for the creation of hybrid enzymes

LB    Luria-Bertani broth

LDH    lactate dehydrogenase

ManNAc    N-acetyl mannosamine

MLE II    muconate lactonizing enzymes II from Pseudomonas sp. P51

NADH    nicotinamide adenine dinucleotide (reduced form)

NADPH    nicotinamide adenine dinucleotide phosphate (reduced form)

NAL    N-acetylneuraminate lyase

NANA    N-acetylneuraminate

OSBS    o-succinylbenzoate synthase

PAGE    polyacrylamide gel electrophoresis

PCR    polymerase chain reaction

PEP    phosphoenolpyruvate

PK    pyruvate kinase

Pi    inorganic phosphate

PRA    phosphoribosylanthranilate

PRFAR    N’-[(5’-phosphoribulosyl)formimino]-5-aminoimidazole-4-carboxamide ribonucleotide

ProFAR    N’-[(5’-phosphoribosyl)formimino]-5-aminoimidazole-4-carboxamide ribonucleotide

SCOP    the Structural Classification of Proteins database

SDS    sodium dodecyl sulfate

TCA    trichloroacetate

THDP    2,3,4,5-tetrahydrodipicolinate

TIM    triosephosphate isomerase

TrpF    phosphoribosylanthranilate isomerase

TrpC    indole-3-glycerol phosphate synthase

Trp1p    phosphoribosyl anthranilate isomerase

ACKNOWLEDGEMENTS

I would like to thank Dr. Stephen J. Benkovic for his support, patience and advice. I also thank all members of the Benkovic group, past and present, for their support during my stay at Penn State. In particular, I would like to thank Dr. Walter Fast for training me when I first joined the group and Dr. Seunggoo Lee for his collaboration and inspiration in the (β/α)8-barrel project. Finally, and most importantly, I would like to acknowledge my family, especially my wife, for all their support and encouragement over these years.

Chapter 1

Introduction

1.1 Protein Engineering

The ability to create proteins de novo with desired activities and properties is one of the ultimate goals of biotechnology. Naturally occurring enzymes are able to catalyze a broad range of chemical transformations that range from the fixation of nitrogen to the synthesis of large and intricately structured molecules. They are highly efficient catalysts, with typical apparent second-order rate constants (kcat/Km) from 10⁶ to 10⁸ M⁻¹ sec⁻¹ and rate accelerations over background rates in water that can reach 10¹⁷ (Radzicka, 1995). Their energy-efficient operation under mild conditions, combined with few by-products from their reactions, makes enzymes environmentally friendly (Arnold, 2001). In addition, the enantioselectivity and regioselectivity of enzymes make them appealing for the production of chiral pharmaceuticals (Buckland, 2000). Proteins have also been targeted as a new resource for developing novel therapeutics, as they are involved in every aspect of living organisms (Vasserot, 2003).

There are some successful examples of industrial applications, from laundry detergents to drug synthesis (Schmid, 2001), although to date the impact has been modest for several reasons. Natural enzymes have been designed and evolved for the survival and reproduction of the organisms that make them. For certain industrially important transformations, no known natural enzymes are suitable. Industrial processes often favor high temperatures, nonaqueous media and extreme pH values in order to increase solubility and to decrease the viscosity of the media and the risk of microbial contamination; under these conditions, enzymes can easily be denatured or lose most of their activity. Moreover, some enzymes are inhibited by low concentrations of their products. Although this feature is useful in regulating metabolism inside a cell, it limits the implementation of a biocatalytic process to make large quantities of product. Some enzymatic reactions also require expensive cofactors. Therefore, there is great interest in the design of enzymes with favorable properties that avoid these problems.

Ideally, the ultimate goal of protein engineering is to create an enzyme for any given chemical reaction (Nixon, 1998). However, our current knowledge of proteins is far from ideal for de novo design of protein function. To overcome this obstacle, two alternative solutions have been extensively pursued: (1) discovery of new enzymes by screening natural resources, and (2) modification of existing protein scaffolds (Ostermeier, 2000). As expected, a combination of these two strategies would be the most successful, because the characterization of new proteins will improve our current understanding of protein structure-function relationships and provide further guidance for protein engineering.

Currently, there are four major areas in which enzymes have been modified for greater utility as industrial biocatalysts: (1) modifying the activity/temperature profile of enzymes, in most cases to obtain greater thermostability; (2) improving the stability or activity of enzymes in non-natural environments, such as organic solvents or a non-natural host; (3) changing the substrate specificity of enzymes to favor novel substrates; and (4) altering the enantioselectivity of enzymes (Kuchner, 1997; Arnold, 1999). In addition to generating better biocatalysts, protein engineering broadens and deepens our understanding of protein evolution and protein structure-function relationships, which can facilitate the development of new technologies for protein engineering.

Beyond individual proteins, whole metabolic pathways, even whole microorganisms, have been targeted and engineered to facilitate the production of industrially valuable compounds (Szczebara, 2003; Umeno, 2005). This new field, known as metabolic engineering, has become a major focus of applied biotechnological research during the past decade, and has been further enhanced by the development of technologies, especially recombinant DNA technology (Nielsen, 2001; Petri, 2004). As a model system, carotenoid biosynthetic pathways have been modified to generate unnatural compounds by directed evolution (Schmidt-Dannert, 2000; Umeno, 2004). The improvement of whole-cell biocatalysts can be achieved by genome shuffling, the recombination of the genomic information from many different parental strains, which has the potential to facilitate cell and metabolic engineering and provide a non-recombinant alternative to the rapid production of improved organisms (Patnaik, 2002; Zhang, 2002).

1.2 Protein Evolution

Proteins are involved in every aspect of a living organism, yet they are composed of only twenty natural amino acids. The magic of biological systems to develop such great protein functional diversity has long been appreciated by protein scientists. Nature itself has been repeatedly successful in modifying protein functions to adapt to ever changing environments. The study and understanding of protein evolution in nature clearly can provide important insight into the design of similar strategies.

It is generally accepted that nature evolves new protein functions by redesigning existing protein frameworks (Ostermeier, 2000). Despite the large number of proteins in nature, the number of different protein folds is estimated to be only 1000 (Chothia, 1993). Based on their structure and function, proteins can be classified into superfamilies and then subfamilies. Enzymes in the same superfamily adopt a similar protein fold but exhibit different catalytic activities. One good example is the (β/α)8 barrel. Proteins with the (β/α)8-barrel domain represent about 10% of enzymes with known structures, and the reactions they catalyze include oxidation, reduction, hydrolysis, and aldol cleavage (Reardon, 1995).

Modification of an existing protein structure through minor and/or major changes in gene sequence is the source of new enzymatic activities. Minor changes, including point mutations, deletions, and insertions of a few amino acids, are sometimes sufficient for a protein to tolerate small environmental changes and to optimize existing functions (Chen, 1993; Miyazaki, 1999). However, major structural changes through duplication, circular permutation, fusion, and insertion and deletion of long sequences provide opportunities for the evolution of new functions (Ostermeier, 2000; Lutz, 2002) (Figure 1). Such radical rearrangements allow the exploration of large sequence space and the activation of latent functionality, in combination with other sequences.

Protein diversity generated through natural evolution represents a major source of information for understanding the relationship between protein structure and function. Since the completion of the first bacterial genome, that of Haemophilus influenzae (Fleischmann, 1995), the number of published genome sequences has grown exponentially (van Nimwegen, 2003). Following the completion of a new genome sequence, genome annotation is applied to predict the function of the potential protein coding regions. First, the evolutionary relationships between these predicted gene products and known proteins are evaluated. For instance, orthologs are genes in different species that evolve from a common ancestral gene and encode proteins with the same function, and paralogs are genes in the same species that evolve from gene duplication and encode proteins with different functions. Then, the function of each predicted gene product is assigned according to its evolutionary relationship with known proteins. Therefore, the accuracy of genome annotation relies heavily on the accurate exploitation of protein evolutionary relationships (Thornton, 1999; Todd, 2001). On the other hand, the thorough scrutiny of orthologs and paralogs from different species, combined with known protein structural and functional information, can lead to a better understanding of protein evolution and structure-function relationships (Phillippe, 2003). This benefits protein engineers in designing new in vitro methods for protein evolution that better mimic natural evolutionary processes to improve existing enzymes and develop novel functions.


Figure 1: Schematic diagram of mechanisms for major structural changes in protein evolution. In practice, new functions often evolve by a combination of mechanisms. (Adapted from (Todd, 2001; Lutz, 2002))


1.3 Methodologies of Protein Engineering

In recent years, protein scientists have made great strides in the development of novel protein engineering methodologies (Arnold, 1999; Neylon, 2004; Lutz, 2004). All these techniques generally fall into one of two approaches: rational design and combinatorial mutagenesis. Figure 2 shows the general procedures of rational design and combinatorial mutagenesis, which can be repeatedly performed on the mutant enzymes to further improve the desired properties. For both approaches, the gene(s) encoding the enzyme(s) of interest, a suitable expression system and a sensitive detection system for desired properties are prerequisites.

1.3.1 Rational Protein Design

Rational protein design usually requires knowledge of the structure of the enzyme and detailed knowledge of the protein’s mode of action, and is therefore very information-intensive. The process of rational design starts with the choice of a suitable protein structure. Using computational molecular modeling, the residues that may contribute to the desired properties are identified and mutated. The alteration can be achieved through a single-point mutation, exchange of elements of secondary structure, exchange of whole domains, or generation of fusion proteins (Nixon, 1998). The purified mutant enzymes are characterized carefully for their activity and properties because even minor changes, such as a single point mutation, may cause a significant structural disturbance.

Figure 2: Schematic overview of general procedures of rational design and combinatorial mutagenesis. During rational protein design, mutants are designed on the basis of structural information and then constructed. After transformation into a host organism, the variant is expressed, purified and characterized for desired properties. Combinatorial mutagenesis starts with construction of mutant gene libraries by random mutagenesis or recombination. The mutants with desired properties are identified by selection or screening. Protein characterization and product analysis sort out desired and negative mutations. Both approaches can be repeated or combined for further improvement. (Adapted from (Bornscheuer, 2001))

Rational protein design has been quite successful in modifying enzyme substrate specificity (Cedrone, 2000) and in stabilizing enzymes against thermal inactivation and oxidation (Bornscheuer, 2001). Deletion of the four C-terminal residues which partially occupy the active site cleft of Lactococcus lactis PepC converts it into an with properties resembling those of an from the family (Mata, 1999). The substitution of phenylalanine for asparagine at position 190 of Bacillus licheniformis alpha-amylase (BLA) leads to a sixfold increase in the enzyme's half-life at 80 °C (Declerck, 2000), because deamidation of asparagine and glutamine residues is mainly responsible for the irreversible thermoinactivation of BLA (Tomazic, 1988a,b).

Successful examples of interconverting catalytic activity by rational design are rare, mainly due to our limited understanding of the protein structure-function relationship. In excellent work, Xiang et al. rationally evolved a crotonyl hydratase activity from a 4-chlorobenzoyl-CoA dehalogenase scaffold, an activity that is absent from the wild-type enzyme (Xiang, 1999). The activity conversion was achieved by introducing two catalytic glutamate residues at positions 117 and 137, substituting a five amino acid segment at positions 144-148 to ease the unfavorable steric and electronic interactions resulting from the introduction of the glutamate residue at position 117, and inserting a conserved proline residue at position 136 to help orient the catalytic side chain of the new glutamate residue at position 137 (Figure 3). The resultant 4-chlorobenzoyl-CoA dehalogenase mutant shows a crotonase activity, the syn hydration of crotonyl-CoA, with kcat = 0.06 s⁻¹ and Km = 50 µM, compared to kcat = 1000 s⁻¹ and Km = 40 µM for wild-type crotonase (Xiang, 1999).
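To put these kinetic constants on a common scale, the catalytic efficiency kcat/Km can be computed for both enzymes. The short Python sketch below does this using only the values quoted above; the function name catalytic_efficiency is an illustrative choice and is not taken from the cited work.

```python
def catalytic_efficiency(kcat_per_s, km_molar):
    """Return the apparent second-order rate constant kcat/Km in M^-1 s^-1."""
    if km_molar <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_s / km_molar


# Values quoted in the text (Xiang, 1999); Km is converted from µM to M.
mutant = catalytic_efficiency(0.06, 50e-6)        # engineered dehalogenase mutant
wild_type = catalytic_efficiency(1000.0, 40e-6)   # wild-type crotonase

# How many times more efficient the natural enzyme is than the mutant.
fold_difference = wild_type / mutant

print(f"mutant kcat/Km:    {mutant:.3g} M^-1 s^-1")
print(f"wild-type kcat/Km: {wild_type:.3g} M^-1 s^-1")
print(f"fold difference:   {fold_difference:.3g}")
```

With these numbers, the mutant has a kcat/Km of about 1.2 × 10³ M⁻¹ s⁻¹, against about 2.5 × 10⁷ M⁻¹ s⁻¹ for wild-type crotonase. The gap of roughly 2 × 10⁴-fold arises almost entirely from kcat, since the two Km values are similar.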

Figure 3: Proposed catalytic mechanisms of crotonase and 4-chlorobenzoyl-CoA dehalogenase. (A) The mechanism of the crotonase reaction involves the concerted syn addition of a proton from Glu164 to C(2) and hydroxide from water (blue) bound to Glu144 to C(3) across the si face of the double bond between C(2) and C(3). Two catalytic glutamate residues are labeled in red. (B) The 4-chlorobenzoyl-CoA dehalogenase catalyzes a multistep reaction. Asp145 adds to the benzoyl ring by forming a Meisenheimer intermediate. The departure of chloride is facilitated by the attack of a water (blue) activated by His90 and activation of the carbonyl by Trp137.

Mayo and coworkers suggested that engineering proteins for novel activities would benefit from computational methods that can search a larger sequence space than is possible by any purely experimental protocol (Bolon, 2002). They computationally created a novel active site for ester hydrolysis within a thioredoxin scaffold, which features histidine-mediated nucleophilic hydrolysis of p-nitrophenyl acetate (Bolon, 2001). Other successful de novo protein designs include a periplasmic receptor that can trigger gene expression by binding extracellular Zn²⁺ (Dwyer, 2003) and a novel α/β protein fold (Kuhlman, 2003). Although the catalytic activities of these computationally designed proteins were very modest, computational protein design methods provide a means to design novel catalytic activities that are inaccessible to natural enzymes.

However, several disadvantages of rational protein design restrict its scope. The method requires extensive structural and biochemical information that is not available for the majority of potentially interesting enzymes. This problem is more acute for protein sequences predicted from genomic sequences since their functions are predicted on the basis of their homology with known enzymes and not on any structural and mechanistic information. Additionally, a successful rational protein design requires not only a detailed knowledge of the existing function, but also a detailed knowledge of the desired function. To date, the catalytic activities of most mutated enzymes from rational protein design are modest compared with natural enzymes because of our limited understanding of protein folding, dynamics and stability.

There is, however, ample evidence that rational design efforts coupled with directed evolution should dramatically improve the scope of de novo protein design (Cedrone, 2000; Bornscheuer, 2001). While the former is better suited for the engineering of novel catalytic functions, the latter is better suited for optimizing enzyme activity. It is well known that subtle changes in the geometry of an active site suffice to generate totally unpredicted consequences for enzyme function (Koshland, 1998). Today, the fine-tuning of engineered enzymes can only be achieved by directed evolution, which explores sequence space that may contribute to enzyme function but is not obvious to rational design. An impressive example that combined rational protein design and directed evolution was presented by Cherry et al., in which a heme peroxidase from Coprinus cinereus was subjected to multiple rounds of directed evolution in an effort to produce a mutant suitable for use as a dye-transfer inhibitor in laundry detergent (Cherry, 1999). In the first round of random and site-directed mutagenesis, position G239 was identified as crucial for protein stability. A mutant with the best combination of available mutations from both site-directed and random mutagenesis was subjected to a second round of random mutagenesis, and the resultant mutants with improved stability were ranked based on activity/stability. The last step of the directed evolution was in vivo shuffling of a wild type gene with ten mutants that represented a spectrum between low activity/high stability and high activity/low stability variants, in an attempt to combine beneficial mutations into high activity/high stability mutants. A mutant with 174 times the thermostability and 100 times the oxidative stability of the wild-type enzyme was created.

1.3.2 Combinatorial Approaches

Combinatorial approaches, also known as directed evolution, mimic natural protein evolution to generate a protein with desired properties and activities by iterative screening or selection from a large pool of protein variants (Petrounia, 2000) (Figure 2).

In the past decade, directed evolution has become a major area of protein engineering in terms of improvement of protein functions and the development of novel technologies. In contrast to rational design, one advantage of directed evolution is that little or no information regarding protein structure and function is required. The accumulation of protein sequences from genome sequencing also contributes to the increasing popularity of directed evolution.

Generally, each directed evolution process includes two steps: the generation of molecular diversity, followed by identification of variants with desired activities or properties. The large diversity of protein repertoires is typically achieved at the corresponding DNA level due to the direct linkage between a protein and its encoding nucleic acid sequence. The identity of any selected protein can be determined directly by DNA sequencing. In addition, DNA material is straightforward to amplify and manipulate with the plethora of molecular biology tools available. Therefore, the methodology of choice in a directed evolution experiment is to construct a library of variant genes, and to screen or select for favorable properties among the protein products of these genes.

1.3.2.1 Generation of Molecular Diversity for Directed Evolution

The first step of directed evolution is to create protein-encoding DNA libraries by methods that may be divided broadly into those using random mutagenesis and those using recombination (Figure 4). These methods have been described in detail by Arnold and Georgiou (Arnold and Georgiou, 2003a). A study of the merits of these two approaches as strategies for obtaining proteins with novel activity led to the conclusion that recursive random mutagenesis produced essentially asexual populations, within which beneficial mutations drove each other into extinction (clonal interference); DNA shuffling and combinatorial cassette mutagenesis led instead to the accumulation of beneficial mutations within a single allele (Rowe, 2003).

Random mutagenesis methods directly generate sequence diversity in the form of random point mutations, insertions or deletions in the parental gene sequence. These methods can be divided further into random point mutation methods, which introduce mutations at random positions along the entire gene, and saturation mutagenesis methods, which introduce any one of the twenty amino acids at a specific position within a gene sequence (Ness, 2000). Iterated random point mutagenesis coupled with a selection or screen to evolve an improved protein is the most popular methodology for directed evolution (Shao, 1996). Random point mutagenesis of the gene for the protein to be improved can be performed by error-prone PCR (epPCR), by exposure to mutator strains, or by chemical and physical mutagens (Joyce, 1994; Nguyen, 2003; Taguchi, 1998).
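As a rough illustration of the two library types just described, the following Python sketch mutates a gene in silico. It is a toy model rather than a simulation of any published protocol: the function names, the uniform per-base mutation rate, and the unbiased choice of replacement base are assumptions made here for illustration, and real mutagenesis methods show mutational bias.

```python
import random

BASES = "ACGT"

def random_point_mutagenesis(gene, rate, rng):
    """Return a copy of `gene` in which each base is replaced by a
    different base with probability `rate` (assumed uniform, unbiased)."""
    mutated = []
    for base in gene:
        if rng.random() < rate:
            mutated.append(rng.choice([b for b in BASES if b != base]))
        else:
            mutated.append(base)
    return "".join(mutated)

def saturate_codon(gene, codon_index):
    """Return all 64 variants of `gene` that differ only at the codon
    numbered `codon_index` (0-based), one variant per possible codon."""
    start = 3 * codon_index
    if start + 3 > len(gene):
        raise ValueError("codon_index lies outside the gene")
    return [gene[:start] + a + b + c + gene[start + 3:]
            for a in BASES for b in BASES for c in BASES]

rng = random.Random(0)           # fixed seed so the run is repeatable
parent = "ATGGCCAAAGGT"          # hypothetical 4-codon gene
print(random_point_mutagenesis(parent, 0.1, rng))
print(len(saturate_codon(parent, 1)))
```

The first function scatters changes over the whole gene, while the second confines all diversity to one chosen codon, which is the distinction drawn in the text.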

Random point mutagenesis by exposure to mutator strains or chemical and physical mutagens is easy to set up, but random mutations can be introduced into the vector

Figure 4: Schematic overview of methods for the creation of DNA sequence diversity. (A) (1). Random point mutagenesis methods introduce mutations at random positions throughout the entire target gene sequence. (A) (2). Saturation mutagenesis methods incorporate randomized codons that code for the 20 amino acid residues into specific positions of the target gene sequences. The small bars with different colors represent different mutations. Random point and saturation mutagenesis methods do not incorporate foreign sequences into the parental genes. (B) (1). In homology-dependent recombination methods, the parental genes are digested randomly into small pieces, and the crossovers between the parental genes are generated by the annealing of these small DNA fragments based on their sequence homology. (B) (2). In homology-independent recombination methods, after the parental genes are digested into small fragments, the crossovers between the parental genes are generated by blunt-end ligation. Recombination methods bring existing sequence diversity from parental gene sequences into novel combinations.

carrying the gene of interest as well as the gene itself. Some mutations in the vector might disrupt the replication of the vector, and therefore lead to loss of some potential candidates. Error-prone PCR exploits the low fidelity of DNA polymerase in the presence of Mn2+ and biased concentrations of dNTPs to incorporate mutations along the gene of interest; the mutagenesis level can be controlled through the proportion of Mn2+ in the reaction and the number of amplification cycles (Cirino, 2003). Utilizing a number of nucleoside triphosphate analogues, one in five base pairs can be changed in a controllable manner (Zaccolo, 1996). Generally, random mutagenesis methods do not require structural and functional information for the proteins of interest. When enough structural information is available to target certain regions, but not enough for designing and testing specific single mutations, saturation mutagenesis methods, also known as oligonucleotide-directed randomization, can be performed to randomize only specific positions in target genes. This approach lies between structure-based rational design and random point mutagenesis. Saturation mutagenesis methods replace native codons with a mixture of all possible codons at specific positions in target gene sequences by incorporating synthetic DNA oligonucleotides containing randomized codons (Reidhaar-Olson, 1991). As with rational design, the factors limiting this approach are the need to identify critical regions for mutagenesis, the bias associated with synthesis of DNA oligonucleotides, and the low chance of generating beneficial mutations, so that vast libraries must be tested when multiple random changes are introduced.
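The growth of library size with the number of randomized positions can be made concrete with simple arithmetic. The sketch below is an estimate under stated assumptions, not a result from the cited work: it assumes fully randomized codons (64 per position; a reduced codon set can be passed instead) and assumes every variant is equally likely, so that the expected fraction of variants seen after sampling N clones from V variants is approximately 1 - exp(-N/V). The function names are illustrative.

```python
import math

def codon_library_size(n_positions, codons_per_position=64):
    """Number of distinct DNA variants when `n_positions` codons are
    each randomized over `codons_per_position` possibilities."""
    return codons_per_position ** n_positions

def clones_needed(n_variants, coverage=0.95):
    """Clones to sample so that the expected fraction of variants
    observed reaches `coverage`, assuming equiprobable variants:
    coverage = 1 - exp(-N / V)  =>  N = -V * ln(1 - coverage)."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    return math.ceil(-math.log(1.0 - coverage) * n_variants)

for n in (1, 2, 3, 4, 5):
    size = codon_library_size(n)
    print(n, size, clones_needed(size))
```

Under these assumptions roughly three times as many clones as variants must be tested for 95% coverage, and each additional randomized codon multiplies the requirement by 64, which is why simultaneous randomization of several positions quickly demands very large libraries.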

While random mutagenesis methods can be regarded as the “asexual” evolution of an improved protein by stepwise accumulation of single mutations, recombination-based methods represent another aspect of natural evolution: generation of novel diversity by sexual recombination between preexisting diversity. In nature, the exchange of homologous regions of chromosomes during meiosis in sexually reproducing organisms and genetic exchange in bacteria is the major generator of diversity (Ness, 2000). Under selective pressure, those organisms bearing a better mixture of beneficial alleles and fewer deleterious alleles are more likely to propagate. Through multiple generations under selective pressure, natural or imposed, the accumulation of beneficial alleles can result in remarkable modifications from ancestral organisms. In 1994, the first laboratory random recombination method, DNA shuffling, was developed by Stemmer et al., and it has become the most efficient and rapid method for directing the evolution of nucleic acids and proteins (Figure 5) (Stemmer, 1994a, b). In the past decade, several other recombination methods for directed evolution have been developed to improve protein activity and stability, to engineer allosteric interactions, and to alter substrate specificity (Petrounia, 2000; Tao, 2002). These methods can be generally divided into two groups: homology-dependent methods and homology-independent methods.

Homology-dependent recombination methods include DNA shuffling, its modified variants, and in vivo DNA shuffling in S. cerevisiae (Manivasakam, 1995). All require sequence homology between DNA fragments from the same or different genes to generate crossovers. DNA shuffling and family DNA shuffling are powerful tools to improve favorable protein properties by generating new combinations of beneficial sequences while flushing out neutral and deleterious sequences. These methods can be utilized to process DNA sequences from different sources, including a single gene, the same gene with different sequence diversity generated by random mutagenesis, gene

Figure 5: Schematic overview of DNA shuffling. A family of homologous genes is randomly fragmented (e.g., by treatment with DNase I). The gene fragments are assembled into a library of full-length chimeric genes with multiple crossovers by iterative cycles of denaturation, annealing, and DNA polymerase extension. The protein library is obtained by expression of the DNA library.

fragments with the same function from different species, genes encoding a whole metabolic pathway, and even whole genome sequences from different species (Ness, 2000). To improve the chances of recombination or increase the distribution of crossover events, a variety of experimental techniques have been developed based on DNA shuffling (Table 1) (Stevenson, 2002; Lutz, 2004). Several recombination methods utilizing synthetic degenerate oligonucleotides, such as degenerate homoduplex recombination, synthetic shuffling, and assembly of designed oligonucleotides, have been developed recently. These methods significantly increase library sequence diversity, especially the number of high resolution crossovers (i.e. sites of recombination very near each other), which accelerates directed evolution (Ostermeier, 2003).
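The dependence of annealing-based recombination on local sequence identity can be illustrated with a small in silico model. The sketch below is a deliberately simplified stand-in for DNA shuffling, not the experimental procedure: it assumes two pre-aligned parents of equal length and allows a crossover only where the preceding `window` bases are identical in both parents. The function names and the window length are assumptions chosen for illustration.

```python
import random

def homologous_sites(parent_a, parent_b, window=4):
    """Positions i at which a crossover is allowed: the `window` bases
    immediately before i are identical in both aligned parents."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned and of equal length")
    return [i for i in range(window, len(parent_a))
            if parent_a[i - window:i] == parent_b[i - window:i]]

def shuffle_pair(parent_a, parent_b, n_crossovers, rng, window=4):
    """Build one chimera by switching templates at up to `n_crossovers`
    randomly chosen homologous sites."""
    sites = homologous_sites(parent_a, parent_b, window)
    points = sorted(rng.sample(sites, min(n_crossovers, len(sites))))
    parents = (parent_a, parent_b)
    current = rng.randrange(2)      # template used for the first segment
    chimera, start = [], 0
    for point in points + [len(parent_a)]:
        chimera.append(parents[current][start:point])
        current = 1 - current       # switch template at each crossover
        start = point
    return "".join(chimera)

rng = random.Random(1)
gene_a = "ATGGCTAAAGGTCCGATT"       # hypothetical parent sequences
gene_b = "ATGGCAAAAGGACCGCTT"
print(homologous_sites(gene_a, gene_b))
print(shuffle_pair(gene_a, gene_b, 3, rng))
```

In this model, parents that share no identical stretch of the required length yield no permitted crossover sites, so only parental sequences are recovered.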

However, there are certain drawbacks associated with DNA shuffling and its variant techniques: lack of high resolution crossovers and dependence on homology for crossover generation (Ostermeier, 2003). The low resolution of crossovers makes it difficult to recombine short genes or to separate polymorphisms near each other (Moore, 2001). The dependence on homology results in a bias toward generating crossovers in regions with the highest homology. It is also well established that, with annealing-based methods, a severe bias toward no recombination occurs among parental sequences with less than 60% sequence identity (Stemmer, 1994a; Moore, 2001). The recombination methods involving synthetic oligonucleotides partially address these limitations, but require the synthesis of high quality, expensive oligonucleotides. The dependence on homology prevents annealing-based methods from exploring sequence space with low DNA sequence identity. Given the fact that protein structure is more conserved than DNA sequence, it has been shown by Monte Carlo simulations of directed evolution strategies

Table 1: Summary of homology-dependent in vitro recombination methods for the generation of DNA sequence diversity.

Method | Characterizations | Reference
DNA Shuffling | Fragmentation by DNase I digestion | Stemmer, 1994a,b
Random Priming Recombination (RPR) | Fragmentation by random priming synthesis | Shao, 1998
Restriction Fragment Shuffling | Fragmentation by restriction digestion | Kikuchi, 1999
Random Chimeragenesis on Transient Templates (RACHITT) | Annealing of gene fragments to a transient template | Coco, 2001
Staggered Extension Process (StEP) | Template switch by using extremely short extension time | Zhao, 1998
Single-stranded Shuffling | Single-stranded DNA as templates | Kikuchi, 2000
Recombination-dependent Exponential Amplification PCR (RDA-PCR) | Similar to StEP, with the use of gene-specific primers | Ikeuchi, 2003
Mutagenic and Unidirectional Reassembly (MURA) | Reassembly with a unidirectional primer to generate shuffling sequences truncated from one terminus | Song, 2002
Recombined Extension on Truncated Templates (RETT) | Random unidirectional priming on a single-stranded template | Lee, 2003
Degenerate Homoduplex Recombination (DHR) | Synthetic oligonucleotides encoding all polymorphisms from parental genes for one step recombination | Coco, 2002
Synthetic Shuffling | A group of synthetic degenerate oligonucleotides encoding all the genetic diversity for shuffling | Ness, 2002
Assembly of Designed Oligonucleotides (ADO) | Similar to synthetic shuffling with the addition of new diversity into oligonucleotides | Zha, 2003

that DNA recombination in regions lacking sequence homology is more likely to create productive hybrid proteins (Bogarad, 1999).

The need for a recombination method that allows exploration of DNA sequence space regardless of sequence homology motivated the development of homology-independent recombination methods (Table 2), including the Incremental Truncation for the Creation of Hybrid Enzymes (ITCHY) (Ostermeier, 1999a) (Figure 6). These methods generate crossovers between parental genes using ligation or pre-synthesized primers instead of annealing, which eliminates the dependence on DNA homology to generate crossovers. Theoretically, ITCHY and Sequence Homology Independent Protein Recombination (SHIPREC) are able to generate crossovers at every base pair along the length of the parental gene sequences (Ostermeier, 2003).
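The claim that a crossover can in principle occur at every base pair can be explored by enumerating an idealized single-crossover library. The sketch below is an illustration under simplifying assumptions, not a model of the experimental truncation kinetics: every 5' fragment of one gene is fused to every 3' fragment of the other with equal weight, and both genes are assumed to start in frame at position 0. The function names are hypothetical.

```python
def itchy_hybrids(gene_a, gene_b):
    """Yield (i, j, hybrid) for every fusion of the first i bases of
    gene_a with gene_b from base j onward (one crossover per hybrid)."""
    for i in range(1, len(gene_a) + 1):
        for j in range(len(gene_b)):
            yield i, j, gene_a[:i] + gene_b[j:]

def keeps_reading_frame(i, j):
    """True when the gene_b portion is read in its original frame,
    i.e. the fusion point has the same codon phase in both genes."""
    return (i - j) % 3 == 0

def is_parental_length(i, j):
    """True when the crossover joins structurally aligned positions,
    assuming the two genes are aligned index by index."""
    return i == j

gene_a = "ATGGCTAAAGGTCCG"          # hypothetical parent sequences
gene_b = "ATGTCTCGTGAACTG"
library = list(itchy_hybrids(gene_a, gene_b))
in_frame = [h for i, j, h in library if keeps_reading_frame(i, j)]
print(len(library), len(in_frame))
```

In this idealized enumeration only one fusion in three preserves the reading frame of the second gene, and only a small subset joins aligned positions, which gives a sense of why such libraries are usually combined with a selection or pre-selection step.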

However, one important limitation of ITCHY and SHIPREC is that both methods can only generate single crossover hybrids between just two parental sequences. Lutz et al. proposed a method, known as SCRATCHY, to generate multiple crossovers per sequence without homology dependency (Lutz, 2001). In SCRATCHY, the single crossovers generated by ITCHY are recombined to generate multiple crossovers by DNA shuffling. The number of crossovers generated by SCRATCHY can be further enhanced by PCR amplification of crossover-containing sections for DNA shuffling (Kawarasaki, 2003).

In parallel with the development of experimental techniques for directed evolution, tremendous efforts have been undertaken to develop computational modeling frameworks for directed evolution. These computational methods can be divided into two groups based on what the algorithms model: DNA or protein (Lutz, 2004).

The DNA algorithms simulate the experimental process for directed evolution and predict the molecular diversity of combinatorial DNA libraries generated under certain

Table 2: Summary of homology-independent in vitro recombination methods for the generation of DNA sequence diversity.

Method | Characterizations | Reference
Incremental Truncation for the Creation of Hybrid Enzymes (ITCHY) | Blunt-ended ligation of fragments of two parental genes truncated by ExoIII digestion | Ostermeier, 1999a,b
Sequence Homology Independent Protein Recombination (SHIPREC) | Circular permutation of two tandem parental genes followed by pre-selection for correctly folded proteins | Sieber, 2001
Structure-based Combinatorial Protein Engineering (SCOPE) | Recombination of structural building blocks determined by protein structures | O'Maille, 2002
Sequence-independent Site-directed Chimeragenesis (SISDC) | Recombination of structural building blocks determined by SCHEMA algorithm (Voigt, 2002) | Hiraga, 2003
Gene Reassembly | Recombination of gene fragments from parental genes by restriction digestion | Richardson, 2002
SCRATCHY | DNA shuffling of chimeric gene fragments from ITCHY libraries | Lutz, 2001


Figure 6: Schematic overview of ITCHY (Incremental Truncation for the Creation of Hybrid Enzymes). Two opposing incremental truncation libraries are generated from two parental genes by exonuclease III digestion, and are then randomly ligated together to generate an ITCHY library. For each pair of parental genes, two ITCHY libraries are constructed, each of which has one parental sequence at the N terminus. (Adapted from Stevenson, 2002)

conditions, including random mutagenesis and DNA shuffling (Moore, 2000; Maheshri, 2003). The protein algorithms predict the stability or activity of the resultant hybrid proteins by quantifying the impact of replacement, deletion and insertion of protein building blocks utilizing sequence and structure information of parental sequences.

Focusing on the local networks of interactions, the SCHEMA algorithm can identify the crossover points that generate hybrid proteins with minimal disruption of favorable interactions (Voigt, 2003). Those hybrid proteins with minimal disruption are most likely to fold into parental structures, and most likely to be functional. On the other hand, the second-order mean-field identification of residue-residue clashes in protein hybrids (SIRCH) was able to identify interactions between not only local but also distal secondary structures in dihydrofolate reductase (DHFR) (Moore, 2003). In SIRCH, the probability of all possible residue-residue combinations in a minimum Helmholtz free energy ensemble is evaluated using a second-order mean-field description based on the calculation of the total conformational energy of both the native and denatured states, followed by the detection of clashes in potential hybrids using the pairwise substitution patterns uncovered by the second-order mean-field description (Moore, 2003). Even with recent successes, more reliable in silico predictive algorithms for protein engineering are still being developed by iteration between computational modeling and experimental testing, which also provides insight into the structure-function relationships of proteins.

1.3.2.2 Screening and Selection

When a DNA library with great diversity is generated using the techniques described above, the next step of directed evolution is to identify the individual members with desired activities or properties, which is achieved by either screening or selection.

The design of screening and selection methods is essential for the success of directed evolution and determines the output of DNA libraries, as implied by the statement “you get what you select for” (Zhao, 1997). A practical screening and selection method has to meet certain requirements: (1) the establishment and maintenance of the linkage between genotype (DNA libraries) and phenotype (protein products of DNA libraries), which can be either a direct physical link as in a or mRNA-peptide fusion, or an indirect link by compartmentalization as in phage surface or cell surface display; (2) the sensitivity to detect a slight improvement in the desired properties or activity; (3) the simplicity of the assay, so that a large number of candidates can be analyzed. The development of recent screening and selection protocols has been described by Arnold and Georgiou (Arnold & Georgiou, 2003b).

Screening is a serial analysis of the desired properties of each member of the library and typically allows between 10^4 and 10^6 distinct members to be considered, possibly up to 10^8 members in the case of high-throughput screening. In screening, to identify individual variants with improved activities or properties, every member of the library is examined individually based on its performance as a protein. Typically, low- or medium-throughput screening is carried out with agar plate or microtiter plate based assays. With the advent of robotic automation and colony picking technologies, the throughput of screening has been increased substantially. Recently, utilizing fluorescence-activated cell sorting (FACS) in combination with cell surface display technologies, large libraries exceeding 10^9 members can be quantitatively screened for desired functions, including expression level, stability, ligand binding and catalysis, at a rate of 30,000 cells/second (Boder, 1998; Georgiou, 2000; Daugherty, 2000; Olsen, 2000; Becker, 2004).
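The throughput figures quoted above translate into practical effort through simple arithmetic. The sketch below is only a back-of-the-envelope estimate: the 96-well plate format and the assumption that every library member is examined exactly once (no oversampling, no dead time) are illustrative choices made here, and the function names are hypothetical.

```python
import math

def sorting_time_hours(library_size, cells_per_second, oversampling=1.0):
    """Hours of continuous sorting needed to examine the library
    `oversampling` times at a constant event rate."""
    return library_size * oversampling / cells_per_second / 3600.0

def plates_needed(library_size, wells_per_plate=96):
    """Microtiter plates needed to assay each member in its own well."""
    return math.ceil(library_size / wells_per_plate)

# A 10^9-member library at 30,000 cells/second
print(round(sorting_time_hours(10**9, 30000), 2))
# A 10^4-member library in an assumed 96-well format
print(plates_needed(10**4))
```

With these inputs, a single pass through a 10^9-member library takes a little over nine hours of sorting, whereas a 10^4-member plate screen already requires more than one hundred 96-well plates.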

Selection is a parallel analysis of the desired properties of all members of the library and allows simultaneous analysis of up to 10^14 members, which is usually considered the upper limit of the number of proteins that can be analyzed experimentally. Generally, based on the location of protein expression, selection methods can be divided into two categories: in vivo and in vitro selections (Table 3).

In vivo methods require that the DNA library be transformed into cells, and they utilize host transcription and translation machinery for the expression of proteins. When the proteins are expressed inside the cytoplasm, the functions examined have to be linked to visible phenotypes that include genetic complementation of nutritional deficits, a growth advantage in altered environments or a colorimetric assay (Jestin, 2004). In the cases of phage display and cell surface display, the proteins are on the surface of phage particles or cells and are accessible for direct examination. In vivo methods have been successfully applied for the selection of protein-protein interactions, protein solubility, and especially the development of novel substrate specificity and enzymatic catalysis (Bornscheuer, 1999; Jurgens, 2000; Sieber, 2001; Horswill, 2004).

One drawback of in vivo methods results from their requirement of transformation, which sets a limit on library size determined by transformation efficiency. Typically, the upper limit for Escherichia coli libraries is 10^10 to 10^11, and even lower for yeast libraries

Table 3: Summary of in vivo and in vitro selection methods and their applications.

Method | Characterization | Applications
In vivo selections:
Genetic Complementation | Based on cell survival under non-permissive conditions for desired activities or properties | Enzymatic catalysis (Jurgens, 2000); novel substrates (Bornscheuer, 1999); altered thermostability (Fridjonsson, 2002); protein interaction (Colas, 1996); protein-ligand binding (Licitra, 1996)
Phage-display (Smith, 1985) | Display of proteins of interest on the surface of the phage by fusing with one of the coat proteins | Catalytic (Baca, 1997); protein interaction (Shanmugavelu, 2000); protein-ligand binding (Beste, 1999); enzymatic catalysis (Ponsard, 2001)
Cell-surface Display | Display of proteins of interest on the surface of a living cell by fusing with a membrane protein | Protein interaction (Boder, 2000); (Olsen, 2000)
Plasmid Display (Cull, 1992) | Formation of translated protein and plasmid DNA complex utilizing DNA-binding protein | Protein-ligand binding (Cwirla, 1997)
In vitro selections:
Ribosome Display (Mattheakis, 1994) | Formation of non-covalent complex between translated peptide, ribosome, and mRNA | Protein interaction (Hanes, 2000)
mRNA-protein Fusion (Roberts, 1997) | Fusion of translated peptide and mRNA through a covalent linkage by puromycin | Protein-ligand binding (Wilson, 2001)
CIS display (Odegrip, 2004) | Formation of translated protein and DNA complex utilizing RepA | Protein interaction (Odegrip, 2004)
In vitro Compartmentalization (IVC) (Tawfik, 1998) | Compartmentalization utilizing water-in-oil emulsion | Protein-ligand binding (Yonezawa, 2003); enzyme catalysis (Griffiths, 2003)

because of lower transformation efficiency in yeast. Additionally, after each in vitro randomization step to further introduce novel sequence diversity in directed evolution, a new library has to be created and transformed for the next round of selection. Once proteins are expressed in vivo, growth disadvantage or even toxicity in the host environment can lead to a loss of potential candidates. Another major concern is false positives arising from the cells’ ability to survive by bypassing the desired selection pressure.

In the past decade, several exciting advances have been made in the area of in vitro selection methods, including ribosome display, mRNA-protein fusion, CIS display, and water-in-oil in vitro compartmentalization (Table 3). All these in vitro selection methods utilize cell-free translation systems for protein expression, and no transformation is necessary. Therefore, the libraries constructed in vitro can maintain their size without losing potential candidates. It has been demonstrated that an increase in library size improves the chance to select for the desired function and increases the diversity of molecules selected (Lancet, 1993). As shown in Table 3, in vitro selection methods have been mostly applied to the areas of protein interaction and protein-ligand binding, with recent advances in enzymatic catalysis. The major remaining challenges are to increase in vitro translation yield and to improve correct folding of translated polypeptides into their three-dimensional structures (Pluckthun, 2000).

In this thesis, two pairs of proteins were chosen in an attempt to interconvert their activities using rational and combinatorial technologies: Escherichia coli purT-encoded glycinamide ribonucleotide transformylase (PurT) and N5-carboxyaminoimidazole ribonucleotide synthetase (PurK), and Escherichia coli N-acetylneuraminate lyase (NAL) and dihydrodipicolinate synthase (DHDPS). Both pairs of enzymes share similarity in structures and chemical activities, but low DNA sequence homology. Several key structural elements for enzyme functionality were identified, providing further insight into protein structure-function relationships.

Chapter 2

Rational Domain Swapping between purT-encoded GAR transformylase (PurT) and N5-CAIR synthetase (PurK)

2.1 Introduction

The evolution of biosynthetic pathways has been a hot topic for decades, for which several theories have been proposed, including the retrograde hypothesis, forward evolution, gradual accumulation of mutant enzymes, and molecular recruitment (Nixon, 1997). In the theory of molecular recruitment, also known as “domain swapping”, new enzyme functions are evolved by recruitment of functional domains from enzymes catalyzing analogous reactions (Jensen, 1976). Naturally existing functional domains or modules are considered as building blocks for complex structures, and novel functions result from assembly of different combinations of these functional domains. Ostermeier defined domain swapping as the genetic rearrangement of proteins by combining pieces of genes that code for domains or subdomains (Ostermeier, 2000). The identification and characterization of these functional building blocks is relevant to the elucidation of questions regarding protein evolution, folding, and protein engineering for tailored functions (Lee, 2003).

Nature has been exploring domain swapping for the evolution of novel protein functions, especially for enzymes involved in biosynthetic or metabolic pathways, through mechanisms that include formation of multifunctional proteins, tandem duplication, domain recruitment, and circular permutation (Figure 1). Formation of multidomain proteins occurs more frequently in eukaryotes than in prokaryotes, since prokaryotic enzymes catalyzing successive reactions have often evolved into a single, multidomain protein in eukaryotes. Duplication of genes or gene fragments in tandem is another way to evolve new function, with the advantages of increasing stability, generating additional binding sites, and developing modular proteins. Domain recruitment is a mechanism by which functional units from one protein, such as ligand binding domains, are “recruited” by another protein and can then be utilized by different proteins. Circular permutation generates a protein with new N- and C-termini, and has its original N- and C-termini fused together. In nature, circular permutation happens after tandem duplication of a gene, and introduces a new reading frame in the first repeat and a new stop codon in the second repeat. These mechanisms have been applied in multiple areas of protein engineering, such as , creation of activators and inhibitors, improvement in stability or expression, modification of substrate specificity, improvement of catalytic efficiency, alteration of multimodular synthetases, improvement of therapeutic properties, creation of molecular biosensors, and creation of novel enzymes (Nixon, 1998; Beguin, 1999; Ostermeier, 2000).

2.1.1 GAR transformylase (PurT) and N5-CAIR synthetase (PurK)

In order to further understand the role of domain swapping in the evolution of biosynthetic pathways, two enzymes in the de novo purine biosynthetic pathway, purT-encoded glycinamide ribonucleotide transformylase (PurT) and N5-carboxyaminoimidazole ribonucleotide synthetase (PurK), were chosen as a model system. Nearly ubiquitous among organisms, the de novo purine biosynthetic pathway produces purines which can be utilized as building blocks for DNA and RNA synthesis, as an energy source for chemical reactions (ATP), as proton donors for cellular redox reactions (NADH, NADPH, FAD, etc.), and as signaling molecules in various regulatory pathways (cAMP) (Warren, 1996). In bacteria, plants, fungi, and yeast, the de novo purine biosynthetic pathway includes 12 enzymatic steps, catalyzing the conversion of phosphoribosylpyrophosphate to inosine monophosphate (IMP), which then serves as the precursor for the synthesis of adenosine and guanosine monophosphate (Figure 7). The de novo purine biosynthetic pathway has drawn considerable attention for its critical role and as a potential target for cancer chemotherapy, because cancer cells require large amounts of purines to maintain their rapid growth. Structures of all enzymes in the bacterial pathway except PurL are now available, with seven of them from E. coli.

PurT, glycinamide ribonucleotide (GAR) transformylase, catalyzes the third reaction of the de novo purine biosynthetic pathway, the conversion of GAR, ATP and formate to formyl GAR, ADP and Pi, while PurK, N5-carboxyaminoimidazole ribonucleotide (N5-CAIR) synthetase, catalyzes the fifth step of the pathway, the conversion of 5-aminoimidazole ribonucleotide (AIR), ATP and bicarbonate to N5-CAIR, ADP and Pi (Figure 8). There are several reasons for choosing these two enzymes:

(1) They have highly similar three-dimensional structures. Recently, high resolution three-dimensional structures of E. coli PurK, and of E. coli PurT in complex with GAR and different ATP analogs, were solved (Thoden, 1999; 2000; 2002). Based on the structure and amino acid sequence analysis, both enzymes belong to a newly emerging superfamily containing an ATP-grasp fold, which also includes biotin carboxylases (BC),

Figure 7: The de novo purine biosynthetic pathway in microorganisms. The gene name for each enzyme in E. coli is in parentheses. Two enzymes, PurN and PurT, can catalyze the third step, the formylation of GAR, utilizing different cofactors. In higher organisms, the carboxylation of AIR by PurK and Class I PurE is catalyzed by a single Class II PurE.

Figure 8: Reactions catalyzed by PurT transformylase (A) and PurK (B).

35 carbamoyl phosphate synthetase (CPS), D-alanine:D-alanine ligase (DDL), and GAR

synthetase (PurD), the second enzyme in the purine biosynthetic pathway, among others (Galperin, 1997). All these enzymes catalyze reactions that involve ATP-dependent ligation of a carboxyl group carbon of one substrate with an amino or enamine nitrogen of the second, and produce an acylphosphate intermediate for subsequent attack by a wide range of nucleophiles. Additionally, each ATP-grasp protein has a similar molecular architecture consisting of three motifs termed the A-, B-, and C-domains. As shown in Figure 9 (A) and (B), PurT and PurK are no exceptions. Furthermore, PurT and PurK are superimposable with a root-mean-square deviation of 1.1 Å for 177 structurally equivalent α-carbons (Thoden, 2000). The A-domain of both enzymes adopts a Rossmann-fold structure, which is shared by many nucleotide-binding proteins, including the N-terminal domains of PurN and PurD, two enzymes in the purine biosynthetic pathway. The linkage between the A- and B-domains, F74-F87 in PurK and A105-A122 in PurT, adopts a helix-residue-helix motif, which appears to be a structural hallmark of enzymes belonging to the ATP-grasp superfamily (Thoden, 1999). The B-domain is the smallest of the three structural motifs of PurT and PurK and contains a loop region essential for enzymatic activity, which will be discussed in detail later. The C-domain, the most complicated of the three motifs, forms a palm-and-thumb structure with the A- and B-domains. The C-domain varies in size and complexity among members of the ATP-grasp superfamily, with only one common structurally conserved strand-loop structure (the J loop), aside from overall protein topology (Thoden, 1999).
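The root-mean-square deviation quoted above for the PurT and PurK α-carbons can be made concrete with a short calculation. The following is a minimal sketch in Python, assuming the two coordinate sets have already been superimposed and paired one-to-one. The function name `rmsd` and the toy coordinates are illustrative assumptions and are not taken from the PurT or PurK structures.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points.

    Assumes the two sets are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between one pair of equivalent atoms.
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical coordinates (in Å) standing in for equivalent alpha-carbons.
toy_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(f"RMSD = {rmsd(toy_a, toy_b):.2f} Å")
```

For a real comparison, the 177 structurally equivalent α-carbon pairs would first have to be identified and optimally superimposed, which this sketch does not attempt.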

Figure 9: Three-dimensional structures of PurT (A) (Thoden, 2000) and PurK (B) (Thoden, 1999), and superposition of the α-carbons of PurT (in blue) and PurK (in red) (C). The A-, B-, and C-domains are color-coded in blue, green, and yellow, respectively. The substrates in complex with the enzymes are displayed in ball-and-stick representation. (A) PurT structure with GAR and 5’-adenylyl imidodiphosphate (AMPPNP). (B) PurK structure with ADP. (C) Superposition of PurT and PurK structures. PurT and PurK are displayed in red and blue ribbon representation, respectively. The superposition was done manually using Weblab Viewer Pro 3.7 by Molecular Simulations Inc.

(2) They have a similar active site organization. Besides the similarity in overall three-dimensional structure, PurT and PurK share high homology in their active site organization. For PurT and PurK, five structurally conserved loop regions form the active site and will be introduced in order from the N- to the C-terminus, as displayed in

Figure 10 (A) and (B). In PurT, the residues in the P loop are involved in binding the phosphate group of GAR, and the same function is attributed to the P loop in PurK based on their structural homology. A similar mononucleotide binding motif has been established in PurN (Klein, 1995), PurD (Wang, 1998), PurF (Krahn, 1997), and PurE (Mattews, 1999). Thus six of the 12 enzymes in the purine biosynthetic pathway have a similar binding motif for the phosphate group of the ribose 5-phosphate moiety, common to all intermediate metabolites in the pathway, which has led to the speculation that there is a common ribose 5-phosphate binding moiety among all 12 enzymes (Kappock, 2000). In the PurT structure with bound GAR and AMPPNP, the residues of the B loop are involved in binding AMPPNP, especially its phosphate groups (Figure 10C). The backbone NH groups of S161 and G162 are positioned within hydrogen-bonding distance of a γ- and a β-phosphoryl oxygen of ATP, respectively.

Additionally, the hydroxyl group of S161 is within hydrogen-bonding distance of the amino group of GAR and a γ-phosphoryl oxygen of ATP. It has also been shown that the conformation of the PurT B loop is highly sensitive to the chemical identity of the nucleotide situated in the binding pocket, especially the nature of the moiety occupying the γ-phosphate position (Thoden, 2002). In the PurK structure with bound MgADP, the B loop is disordered due to the missing γ-phosphate of ATP (Figure 10B). Generally, the B loops among the ATP-grasp proteins are Gly- and Ser-rich, as shown by structural alignments (Figure 11B). But the PurK B loop contains some unique features, including an extra residue, G124, and four uncommon residues: R121, R122, Y126, and D127, the latter two of which are conserved among various PurK sequences. One possible role for Y126 is to sequester the active site from the outside environment in conjunction with residues on the J loop, which will be introduced later.

Figure 10: Active sites of PurT (A) and PurK (B), and superposition of conserved residues in the PurT and PurK active sites (C). The B loop, the Ω loop, the P loop, the J loop, and the C loop are color-coded in blue, yellow, red, green, and pink, respectively. The substrates are shown in ball-and-stick representation. The superposition was done manually using Weblab Viewer Pro 3.7 by Molecular Simulations Inc. (A) PurT structure with GAR and AMPPNP. (B) PurK structure with ADP. (C) Superposition of conserved residues in the PurT and PurK active sites. PurK and PurT conserved residues within the active sites are depicted in red and blue line representations, respectively. The residues are indicated based on Escherichia coli numbering. The GAR and AMPPNP molecules are shown in ball-and-stick representation. Two Mg2+ ions are depicted in green.

Figure 11: Comparison of amino acid sequences of E. coli PurT and PurK (A), the B loop (B), and the J loop (C) from a structure-based alignment. The alignment was done using CE (combinatorial extension of the optimal path) (Shindyalov, 1998). Conserved residues in each enzyme are masked in green. Identical residues between different enzymes are marked by a red asterisk. (A) Comparison of PurT and PurK sequences. The boundaries of individual domains are marked by a red bar. Five structurally conserved loop regions forming the active site are labeled. (B) Comparison of B loop sequences in five ATP-grasp proteins. (C) Comparison of J loop sequences in seven ATP-grasp proteins. DDL, D-alanine:D-alanine ligase; GTS, glutathione synthetase; CPSN, N-terminal domain of carbamoyl phosphate synthetase; BC, biotin carboxylases.

The Ω loop, present in the C-domain of most members of the ATP-grasp superfamily, interacts with the B loop to provide a protective face over the β- and γ-phosphates of ATP and the putative acyl phosphate intermediate. In PurT and PurK, this Ω loop is very short. As mentioned above, the J loop is the only common feature in the C-domain among all members of the ATP-grasp superfamily (Figure 11C) (Thoden, 1999). A conserved Glu residue, E238 in PurK and E279 in PurT, is known to be a metal ion ligand. A conserved Arg residue, R242 in PurK and R283 in PurT, probably interacts with formate or bicarbonate, and then with the putative acyl phosphate intermediate, acting as an arm to deliver the intermediate to GAR or AIR. To facilitate the nucleophilic attack of the amine group on the acyl phosphate intermediate, D286 in PurT is believed to act as the general base that deprotonates the amine group of GAR, because GAR is protonated at physiological pH (Thoden, 2000). But its counterpart in PurK, N245, is not likely to perform the same function; deprotonation of the substrate amine group may instead be carried out by D127 in the PurK B loop. Although structural evidence for D127 in the PurK active site is not available, its counterpart in PurT, S161, is within hydrogen-bonding distance of the amine group of GAR (Thoden, 2000). In DDL, GTS, and the two domains of CPS, the J loop is near the end of the C-domain. In contrast, an additional globular motif follows the J loop in the C-domain of PurD, BC, PurT, and PurK. In this region, a conserved C loop contributes a significant portion of the mononucleotide binding site, in combination with the P loop from the A-domain.

The exact binding sites for formate in PurT and bicarbonate in PurK are not clear, though it was suggested that the binding sites for formate or bicarbonate are located at the center of the three domains, involving the γ-phosphate of ATP, the nitrogen atom for nucleophilic attack in GAR or AIR, and residues from the J loop (Thoden, 1999; 2000).

(3) They have a similar chemical reaction mechanism. The enzymes of the ATP-grasp protein superfamily catalyze reactions involving formation of acyl phosphate intermediates from an acid and ATP, which is a common strategy for ATP-dependent C-N ligation (Galperin, 1997). As shown in Figure 12, both the PurT and PurK reactions consist of formation of an acyl phosphate intermediate by ATP cleavage, followed by nucleophilic attack on the acyl phosphate intermediate by a nitrogen of the mononucleotide as the last step. It has been demonstrated experimentally that formyl phosphate is chemically and kinetically competent as an intermediate for PurT (Marolewski, 1997). By using a PurT mutant with the single mutation G162I, the formation of formyl phosphate was detected (Marolewski, 1997). In addition, wild-type PurT is capable of catalyzing the conversion of acetate and ATP to acetyl phosphate and ADP, implying the formation of formyl phosphate as an intermediate (Marolewski, 1994). No direct evidence for the formation of carboxyphosphate as an intermediate by PurK is available, owing to the poor stability of carboxyphosphate, whose half-life in water is estimated to be less than 70 ms (Sauers, 1975). But in the presence of ATP, AIR, and [18O]-bicarbonate, PurK catalyzes the transfer of 18O from [18O]-bicarbonate into [18O]Pi. Quantitation of the amount of [18O]Pi generated reveals that about 1 atom of 18O was transferred from bicarbonate to Pi for every ADP generated (Mueller, 1994). This observation is consistent with the hypothesis that carboxyphosphate is an intermediate for PurK. Two other ATP-grasp proteins, biotin carboxylase and carbamoyl phosphate synthetase, also utilize carboxyphosphate generated from ATP and bicarbonate as an intermediate in their reactions (Polakis, 1974; Ogita, 1988; Raushel, 1979; 1980).

Figure 12: Similarity in the chemical reactions catalyzed by PurT and PurK. For both enzymes, an ATP is cleaved to form an intermediate (formyl phosphate for PurT and carboxyphosphate for PurK) and ADP. The last step is nucleophilic attack on the intermediate by a nitrogen atom in GAR or AIR to form formyl GAR and N5-CAIR, respectively.

(4) Localization of conserved residues. The E. coli purT and purK genes share 48% DNA sequence identity, below the 60% DNA sequence identity required for annealing-based combinatorial methodologies. At the amino acid sequence level, these two enzymes are 55% homologous, with 27% of the residues being identical (Figure 11A). Besides the residues conserved throughout the ATP-grasp superfamily, the identical residues between PurT and PurK are congregated into several regions, five of which are the loops believed to form the active site (Figure 11). The P-, Ω-, and J-loop sequences are remarkably similar between PurT and PurK, and the greatest sequence divergence exists in the B loops (Figure 11B). This kind of sequence arrangement raises questions about how these proteins in the purine biosynthetic pathway have evolved, whether these conserved blocks are interchangeable between different enzymes without sacrificing enzymatic activity, and what possible role the non-conserved residues play in protein structure and functionality. These questions will be addressed in the next two chapters.

2.1.2 Protein evolution by domain swapping

In general, biosynthetic pathways are built by domain swapping. The proteins in pathways are composed of domains with defined catalytic activities that are fused to generate new activities. The purine biosynthetic pathway is no exception. The structural homology of the ribose 5-phosphate binding sites of the six crystallographically characterized purine enzymes suggests that a common ribose 5-phosphate binding moiety may be utilized by the 12 enzymes of the purine biosynthetic pathway. Another application of domain swapping in the evolution of the purine biosynthetic pathway enzymes is the formation of multifunctional proteins. GAR synthetase (GARS), GAR transformylase (GART), and AIR synthetase (AIRS), the enzymes catalyzing the second, third, and fifth steps in the purine pathway, are three separate proteins in Escherichia coli (Zalkin, 1999) and Bacillus subtilis (Ebbole, 1987); a bifunctional protein with GARS and AIRS activities and a monofunctional protein with GART activity in yeast (Henikoff, 1986b); and a single trifunctional protein in human (Aimi, 1990), mouse (Kan, 1993), and chicken (Daubner, 1985). The gene organization clearly shows a two-step evolutionary pathway for the fusion of these genes.

At the same time, some ATP-grasp proteins have also adopted domain swapping as their evolutionary strategy. Carbamoyl phosphate synthetase catalyzes the first step of the de novo pyrimidine biosynthetic pathway, the production of carbamoyl phosphate, through a reaction mechanism requiring one molecule of bicarbonate, two molecules of MgATP, and one molecule of glutamine (Figure 13A). The enzyme from E. coli is composed of two polypeptide chains. The smaller subunit, a Class I amidotransferase, catalyzes the hydrolysis of glutamine to glutamate and ammonia, and transfers ammonia to the catalytic site of the larger subunit. The larger subunit catalyzes the formation of carbamoyl phosphate using two molecules of MgATP and ammonia from the small subunit. It can be further divided into two halves, referred to as the carboxyphosphate and carbamoyl phosphate synthetic components (amino acid residues 1-400 and 553-993). Sequence analysis of the N- and C-terminal regions of the carB gene of Escherichia coli, which codes for the large subunit of CPS, suggests that it arose from tandem duplication of a smaller ancestral gene (Nyunoya, 1983). With 39% identity and 64% homology between residues 1-400 and 553-993, these two homologous regions of the larger subunit adopt the same conformations that are characteristic of members of the ATP-grasp superfamily (Figure 13B) (Thoden, 1997). Furthermore, sequence homology between the N-terminal half of the CPS large subunit and the N-terminal domain of acetyl-CoA carboxylase (Takai, 1988) and (Lim, 1988) suggests that these enzymes evolved by domain recruitment. Davidson et al. also suggest that the small subunit of CPS, the glutamine amidotransferase, probably evolved from an ancestral that was duplicated and inserted into other proteins by domain recruitment, on the basis of its sequence and functional similarity with several enzymes that utilize cleavage of glutamine (Davidson, 1993).

Figure 13: Reactions and structure of carbamoyl phosphate synthetase (CPS). (A) Reactions catalyzed by CPS. The small subunit of CPS catalyzes hydrolysis of glutamine to glutamate and ammonia. The formation of carboxyphosphate and carbamoyl phosphate is catalyzed by the N- and C-terminal domains of the large subunit of CPS, respectively. (B) Ribbon representation of the structure of the CPS large subunit. The N- and C-terminal domains are depicted in yellow and green, respectively; both adopt an ATP-grasp protein conformation. AMPPNPs in the active sites are displayed in ball-and-stick representation. (Adapted from (Thoden, 1997))

Previous evidence for the role of domain swapping in protein evolution was primarily based on sequence and structure comparisons between related enzymes. In order to demonstrate domain swapping, two functional chimeric GAR transformylases were created from the N-terminal GAR binding domain of PurN and the C-terminal catalytic/N10-formyl-tetrahydrofolate binding site of PurU; their specific activities were 100- to 1000-fold lower than that of wild-type PurN (Figure 14B) (Nixon, 1997). PurN, a second GAR transformylase, catalyzes the transfer of the formyl group from N10-formyl-tetrahydrofolate to the free amino group of GAR to give formyl-GAR and tetrahydrofolate. PurU, N10-formyl-tetrahydrofolate hydrolase, hydrolyzes N10-formyl-tetrahydrofolate to formate and tetrahydrofolate (Figure 14A). Sequence comparison of the E. coli purN and purU genes shows significant homology (≈60%) between the N10-formyl-tetrahydrofolate binding domain of PurN and a region in the C-terminal portion of PurU, which suggests that these two enzymes may have evolved through domain swapping. The formyl transfer activity of a rationally designed hybrid enzyme was improved further by random mutagenesis (Nixon, 2000). This work clearly suggests that functional enzymes can be reconstructed from existing functional protein domains, and that their enzymatic activity can be improved by directed evolution.

Figure 14: Reactions catalyzed by the PurN and PurU enzymes (A) and ribbon representation of the three-dimensional structure of PurN (B). The C-terminal domain, the N10-formyl-tetrahydrofolate binding domain of PurN, is depicted in blue. In functional chimeric enzymes, this domain is replaced by the C-terminal region of PurU. GAR and N10-formyl-tetrahydrofolate are displayed in ball-and-stick representation. (Adapted from (Nixon, 2000))

Based on the similarity in their structures, active site arrangement, reaction mechanism, and localized conserved residues, PurT and PurK form a good model system to demonstrate domain swapping in protein evolution. In this chapter, our first attempt to locate functional domains that could be switched between PurT and PurK focuses on the three distinct domains displayed in the three-dimensional structure. Six hybrid enzymes were rationally designed and created, each of which has one domain replaced with the corresponding domain from the other enzyme. The activities and stability of these chimeric proteins were characterized by in vivo and in vitro assays, which provide information for a better understanding of protein evolution.

2.2 Experimental:

2.2.1 Materials:

Restriction enzymes, T4 DNA ligase, Calf Intestinal Alkaline Phosphatase, pMAL-c2x vector, and amylose resin were obtained from New England Biolabs (Ipswich, MA). pBC(SK+) and pBluescript II KS(+) vectors were obtained from Novagen (Madison, WI). pDIM-PGX vector was obtained from Dr. Stefan Lutz (Lutz, 2001b). Taq polymerase was obtained from Promega (Madison, WI). Pfu turbo polymerase was obtained from Stratagene (La Jolla, CA). dNTPs and Complete, EDTA-free protease inhibitor were obtained from Roche (Indianapolis, IN). The QIAprep, QIAquick Gel and PCR purification kits, QIA plasmid preparation kit, and Ni-NTA agarose were obtained from Qiagen (Valencia, CA). PK/LDH from rabbit liver and are obtained from Sigma (St. Louis, MO). Superose 12 gel filtration column was obtained from Amersham-Pharmacia Biotech (Piscataway, NJ). BSA, all protein gel electrophoresis agents, and molecular weight markers were obtained from Bio-Rad (Hercules, CA). All DNA oligonucleotides were ordered from Integrated DNA Technologies (Coralville, IA). GAR and AIR were synthesized as previously described (Shim, 1998; Firestine, 1994, respectively). All other materials were obtained from commercial sources and were of the highest available quality.

2.2.2 Bacterial Strains:

DH5α and BL21(DE3) were obtained from Invitrogen (Carlsbad, CA) and Novagen (Madison, WI), respectively. TX680F’ [ara ∆ (gpt-rpo-lac) thi rbs-221 ilvB2102 ilvH1202 purN’-lacZ+Y+::KanR purT], an E. coli GAR transformylase auxotrophic strain, was a gift from Dr. J.M. Smith. CSH26,purK [purK, zba::tn10], an E. coli purK auxotrophic strain, was a gift from Dr. Gert Dandanell (Sorensen, 1997). PurT(-), E. coli K-12 MG1655 with chromosomal purN and purT deleted, and PurK(-), E. coli K-12 MG1655 with chromosomal purK deleted, were constructed by the one-step knockout method (Datsenko, 2000).

2.2.3 Methods:

Construction of hybrid enzymes

Figure 15: Construction of hybrid enzymes. (A) Organization of hybrid enzymes. Six hybrid enzymes were constructed using the PCR overlap extension method, each with one domain replaced by the corresponding domain from the other enzyme. The sequences of PurT and PurK are depicted in blue and red, respectively. The numbers represent amino acid residue positions. (B) Expression plasmid for the hybrid enzymes. The hybrid enzymes are cloned between the NdeI and SpeI sites.

Six hybrid enzymes were constructed using the PCR overlap extension method, each with one domain replaced by the corresponding domain from the other enzyme (Figure 15) (Ho, 1989). In the first PCR, the N-terminal and C-terminal fragments were amplified separately with the corresponding inside and outside primers. For N-terminal fragments, the outside primer introduced an NdeI site at the 5’ end to facilitate cloning, together with a start codon. For C-terminal fragments, the outside primer introduced a SpeI site at the 3’ end. The inside primers for the KTB and TKB hybrids are KAT Forward and TABK Reverse, and TAK Forward and KABT Reverse, respectively. PCR amplification was performed with 50 ng DNA template, 0.2 mM dNTPs, 1× PCR buffer, 2 mM MgCl2, 1 µM primers, and 2 units Taq/Pfu polymerase in a total volume of 100 µl. Reaction conditions were 5 min at 94°C, followed by 30 cycles of 30 seconds at 94°C, 30 seconds at 55°C, and 1 min at 72°C, followed by 10 min at 72°C. After purification with the QIAquick PCR purification kit, the PCR products were quantified by OD260 for the next step. In the second PCR, equal amounts of N-terminal and C-terminal fragments were combined and amplified by overlap extension to obtain the full-length fragment under conditions identical to those of the first PCR. The full-length fragments were cloned into the pDIM-PGX vector using the NdeI and SpeI sites, and their sequences were confirmed by DNA sequencing (Lutz, 2001). The hybrid enzymes were also subcloned into the pMAL-c2x vector via the XmnI and SpeI sites to increase protein expression and to improve protein solubility. The primers used for PCR are shown in Table 4.
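Overlap extension relies on the two first-round fragments sharing a complementary end so that they can anneal in the second PCR. As a minimal sketch in Python, the check below tests whether each Forward/Reverse inside primer pair listed in Table 4 is an exact reverse complement. The helper names (`reverse_complement`, `check_pairs`) are illustrative assumptions, and the stray space in the extracted KABT Reverse sequence is treated here as a typesetting artifact and removed.

```python
# Translation table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Inside primers from Table 4 as (Forward, Reverse), both written 5' to 3'.
INSIDE_PRIMERS = {
    "TAK": ("GAAGAGCTGCAGCTGCCCACTGCACCGTGGCAGTTACTT",
            "AAGTAACTGCCACGGTGCAGTGGGCAGCTGCAGCTCTTC"),
    "KAT": ("GATAAGCTCCACCTGCCGACTTCCACTTATCGTTTTGCC",
            "GGCAAAACGATAAGTGGAAGTCGGCAGGTGGAGCTTATC"),
    "TABK": ("GTTGAAGGCGTCGTTAAGTTTTCTGGTGAAGTGTCGCTGGTT",
             "AACCAGCGACACTTCACCAGAAAACTTAACGACGCCTTCAAC"),
    "KABT": ("GTCGAGCAGGGCATTAACTTCGACTTCGAAATTACCCTGGTA",
             "TACCAGGGTAATTTCGAAGTCGAAGTTAATGCCCTGCTCGAC"),
}


def check_pairs(pairs):
    """Map each junction name to True if its two primers are reverse complements."""
    return {
        name: reverse_complement(forward) == reverse
        for name, (forward, reverse) in pairs.items()
    }


for junction, ok in check_pairs(INSIDE_PRIMERS).items():
    print(junction, "complementary" if ok else "MISMATCH")
```

A check of this kind is only a consistency test on the primer table; it says nothing about melting temperature or secondary structure.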

Construction of E. coli PurT and PurK auxotrophic strains

To minimize the possibility of recovering wild-type genes by recombination during in vivo complementation, E. coli strains PurT(-) and PurK(-), with deletions of the chromosomal purT and purK genes, were constructed using the method previously described, with some modifications (Datsenko, 2000). An antibiotic (chloramphenicol or kanamycin) resistance gene was amplified using two primers containing flanking FRT (FLP recognition target) sequences and chromosomal sequences outside the gene to be replaced. After purification with the QIAquick PCR purification kit, the PCR products were digested with DpnI and repurified. The linear PCR products were transformed by electroporation into electrocompetent E. coli K-12 MG1655 containing a helper plasmid that codes for the Red recombination system (Murphy, 1998). After being spread on LB plates with the desired antibiotic, the cells were incubated at 37°C to remove the helper plasmid, which has a temperature-sensitive origin. The deletion of the chromosomal purT and purK genes was verified by PCR. To remove the antibiotic resistance gene from the chromosome, a second helper plasmid, a temperature-sensitive plasmid encoding FLP recombinase, was transformed into the cells carrying the antibiotic resistance gene on the chromosome. After incubation at 37°C, the loss of the antibiotic resistance gene, as well as of the helper plasmid, was confirmed by the loss of all antibiotic resistances.

Table 4: The primers used for construction of the hybrid enzymes.

Primer                  Primer sequence (5’-3’)

Outside primers
PurT (NdeI) Forward     GGAATTCCATATGACGTTATTAGGCACTGCG
PurK (NdeI) Forward     GGAATTCCATATGAAACAGGTTTGCGTCCTC
PurT (SpeI) Reverse     CCACTAGTTTAACCCTGTACTTTTACCTG
PurK (SpeI) Reverse     CCACTAGTTTAACCGAACTTACTCTGCGC
PurT (XmnI) Forward     GAATTAATTCATGACGTTATTAGGCACTGCG
PurK (XmnI) Forward     GAATTAATTCATGAAACAGGTTTGCGTCCTC

Inside primers
TAK Forward             GAAGAGCTGCAGCTGCCCACTGCACCGTGGCAGTTACTT
TAK Reverse             AAGTAACTGCCACGGTGCAGTGGGCAGCTGCAGCTCTTC
KAT Forward             GATAAGCTCCACCTGCCGACTTCCACTTATCGTTTTGCC
KAT Reverse             GGCAAAACGATAAGTGGAAGTCGGCAGGTGGAGCTTATC
TABK Forward            GTTGAAGGCGTCGTTAAGTTTTCTGGTGAAGTGTCGCTGGTT
TABK Reverse            AACCAGCGACACTTCACCAGAAAACTTAACGACGCCTTCAAC
KABT Forward            GTCGAGCAGGGCATTAACTTCGACTTCGAAATTACCCTGGTA
KABT Reverse            TACCAGGGTAATTTCGAAGTCGAAGTTAATGCCCTGCTCGAC

Screening of PurT and PurK activities by in vivo complementation

pDIM plasmids containing hybrid enzymes were purified and electroporated into E. coli auxotrophic strains. For each transformation, three single colonies were picked and streaked in an “X” shape on an LB plate with 100 µg/ml ampicillin to make a master printing plate. After incubation at 37°C overnight, the cells on the master plates were replica printed onto selective plates (M9 salt, 0.2% glucose, 0.06% casein, 2 µg/ml thiamine, 1.5% agar, 100 µg/ml ampicillin) with 0.3 mM isopropyl β-D-thiogalactoside (IPTG). Selections were performed at 30°C for up to 48 hr. To confirm the complementation, the cells that appeared on selective plates were restreaked on new selective plates to obtain single colonies, from which the plasmids were extracted and retransformed into auxotrophic strains, followed by the same selection procedure. pDIM plasmids containing wild-type purT and purK genes were used as controls and included on each selective plate. Plasmids extracted from the first and second selections were sequenced to confirm that no sequence changes had occurred. All DNA sequencing was performed at the Nucleic Acids Facility of Pennsylvania State University.

Overexpression and purification of hybrid enzymes

For protein overexpression, the hybrid enzymes were amplified from pDIM plasmids and subcloned into the pMAL-c2x vector via the XmnI and SpeI sites. pMAL-c2x plasmids containing KAT, TKB, and KABT were transformed into the PurT(-) strain, while pMAL-c2x plasmids containing TAK, KTB, and TABK were transformed into the PurK(-) strain. A 1% inoculum of an overnight culture of an individual colony was added to LB media (0.2% glucose, 100 µg/ml ampicillin) and grown to OD600 ~0.5 at 37°C. The cultures were transferred to 18°C and induced with 0.3 mM isopropyl β-D-thiogalactoside (IPTG). After 8 hr of shaking at 18°C, the cells were centrifuged and the pellets were stored at -70°C.

The primary purification of the overexpressed hybrid proteins was performed by affinity chromatography, using the N-terminal maltose-binding protein (MBP). Harvested cells were suspended in 40 ml of column buffer (20 mM Tris-HCl, pH 7.4, 200 mM NaCl, and 1 mM EDTA) containing 50 mg/ml Complete protease inhibitor and 1 mg/ml egg white lysozyme (Sigma) and subjected to two rounds of freezing and thawing, followed by sonication. After centrifugation, the supernatant was loaded onto an amylose resin column pre-equilibrated with column buffer and washed with 12 column volumes of column buffer. The fusion protein was eluted with 4 column volumes of column buffer containing 10 mM maltose. The protein solution was then loaded onto a Superose 12 gel filtration column (24 mL bed volume) pre-equilibrated in buffer X (20 mM Tris-HCl pH 7.4, 100 mM NaCl). Fractions containing hybrid proteins were verified by SDS-PAGE and pooled accordingly. The purified protein, at greater than 95% homogeneity, was dialyzed to remove salt and concentrated using Centricon spin filters (MWCO 10K, Amicon, Bedford, MA), then stored frozen at -70°C. Protein concentrations were determined by Bradford analysis against BSA.

Measurement of ATP hydrolysis

In an attempt to determine the effect of HCO3¯ on ATP cleavage by the hybrid proteins, the HCO3¯ in solutions was removed as previously described (Meyer, 1992). The ATPase activity of each hybrid protein was measured by monitoring the production of ADP via the coupled reactions of pyruvate kinase and lactate dehydrogenase (PK/LDH), as previously described with some modifications (Marolewski, 1994). In a total volume of 100 µl, 100 mM HEPES pH 7.4, 20 mM KCl, 8 mM MgCl2, 2 mM PEP, 0.2 mM NADH, and 10 units of PK/LDH were combined and allowed to incubate at 25°C. The reactions were initiated by addition of enzyme, and the decrease in NADH absorbance at 340 nm (ε = 6.22 mM-1 cm-1) was monitored on a Cary I spectrophotometer (Varian, Palo Alto, CA). For hybrid fusion proteins, the ATPase specific activity (µmol min-1 mg-1) was determined at saturating concentrations of substrates (ATP, 5 mM; GAR, 100 µM; AIR, 260 µM; formate, 3 mM; HCO3¯, 3 mM). For KABT and the wild-type PurT and PurK fusion proteins, the kinetic parameters were determined in triplicate over a concentration range of 10-200 µM AIR, 5-100 µM GAR, 0.1-10 mM ATP, and 0.1-2 mM formate. By varying the concentration of one substrate at saturating concentrations of the other substrates, the Michaelis constant for each substrate was determined by fitting to a double-reciprocal plot using the program KaleidaGraph from Synergy Software.
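The two calculations described above, converting an absorbance slope into a specific activity and extracting a Michaelis constant from a double-reciprocal plot, can be sketched in a few lines. This is a minimal illustration in Python, not the KaleidaGraph procedure used in this work. It assumes a 1 cm path length, one NADH oxidized per ADP formed, and an unweighted least-squares line; the function names and all numerical inputs are hypothetical.

```python
NADH_EPSILON = 6.22  # mM^-1 cm^-1 at 340 nm, as quoted in the text


def specific_activity(delta_a340_per_min, volume_ml, enzyme_mg, path_cm=1.0):
    """Specific activity in µmol min^-1 mg^-1 from the A340 slope.

    Assumes one NADH is consumed per ADP produced by the PK/LDH couple.
    """
    # mM per min is numerically equal to µmol per ml per min.
    rate_mm_per_min = abs(delta_a340_per_min) / (NADH_EPSILON * path_cm)
    return rate_mm_per_min * volume_ml / enzyme_mg


def double_reciprocal_fit(substrate, velocity):
    """Fit 1/v = (Km/Vmax)(1/[S]) + 1/Vmax by least squares; return (Km, Vmax)."""
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in velocity]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # Km / Vmax
    intercept = mean_y - slope * mean_x   # 1 / Vmax
    vmax = 1.0 / intercept
    return slope * vmax, vmax


# Hypothetical data generated from the Michaelis-Menten equation itself.
true_km, true_vmax = 50.0, 2.0            # µM and µmol min^-1 mg^-1 (made up)
conc = [10.0, 25.0, 50.0, 100.0, 200.0]   # µM
rates = [true_vmax * s / (true_km + s) for s in conc]
km, vmax = double_reciprocal_fit(conc, rates)
print(f"Km = {km:.1f} µM, Vmax = {vmax:.2f} µmol/min/mg")
```

Because the double-reciprocal transform weights the low-concentration points most heavily, a direct nonlinear fit to the Michaelis-Menten equation is often preferred when the data are noisy; the linear form is shown here only because it matches the method named in the text.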

Detection of Acyl Phosphate

The presence of acyl phosphate was determined by conversion to the hydroxamate derivative according to the method previously described (Pechere, 1967). In a total volume of 300 µl, 100 mM HEPES pH 7.4, 20 mM KCl, 8 mM MgCl2, and saturating concentrations of substrates were combined and allowed to incubate at 25°C. The reactions were initiated by addition of enzyme. At different time points, a reaction aliquot was incubated for 10 min with an equal volume of a freshly made 1/1 v/v mixture of 3 M hydroxylamine solution and ~3 M NaOH. One volume of 0.735 M trichloroacetate (TCA) and 2 volumes of 0.22 mM FeCl3 in 0.5 M HCl were then added. The absorbance at 490 nm was determined and compared to a succinohydroxamate standard for quantitation purposes.

2.3 Results and Discussion

Construction of hybrid proteins

The similarity in overall structure, active site arrangement, conserved amino acid residues, and reaction mechanism between PurT and PurK suggests the possibility that these two enzymes evolved by domain swapping. To identify possible functional domains in PurK and PurT, the three domains displayed in the three-dimensional structure were chosen as the first candidates, since each ATP-grasp protein has a similar molecular architecture consisting of three domains. Additionally, the A-domain of both enzymes adopts a Rossmann-fold structure, which is shared by many nucleotide-binding proteins, including the N-terminal domains of PurN and PurD, two enzymes in the purine biosynthetic pathway. On the basis of the three-dimensional structure and sequence analysis, a conserved threonine residue adjacent to the helix-residue-helix motif between the A- and B-domains, and a conserved phenylalanine residue in the linker region between the B- and C-domains, were chosen as the boundaries separating the three domains for the rational construction of the hybrid proteins. Six hybrid proteins, each with one domain replaced by the corresponding domain from the other protein, were constructed by overlap extension PCR to be expressed as single fusion proteins. As shown in Figure 15, TAK has the A-domain (residues M1-P129) from PurT followed by the B- and C-domains (residues T94-G355) from PurK; KAT has the A-domain (residues M1-P93) from PurK followed by the B- and C-domains (residues T130-G392) from PurT; KTB has the A-domain (residues M1-P93) from PurK followed by the B-domain (residues T130-K199) from PurT and the C-domain (residues F158-G355) from PurK; TKB has the A-domain (residues M1-P129) from PurT followed by the B-domain (residues T94-N157) from PurK and the C-domain (residues F200-G392) from PurT; TABK has the A- and B-domains (residues M1-K199) from PurT followed by the C-domain (residues F158-G355) from PurK; and KABT has the A- and B-domains (residues M1-N157) from PurK followed by the C-domain (residues F200-G392) from PurT.

For hybrid proteins, heterodimerization is one approach to attenuate unfavorable

interactions and provides more structural flexibility between domains from different

proteins, in such hybrid proteins are expressed as two polypeptide chains to form a

heterodimer by non-covalent interactions (Ostermeier, 1999a). KABT and TABK were constructed as heterodimers to attenuate unfavorable interactions between domains as a fusion protein. For KABT, the gene fragment encoding the A- and B- domains (residues

M1-N157) from PurK was subcloned into pBC (SK+) vector via BamHI and HindIII sites, and another gene fragment encoding the C- domain (residues F200-G392) from

PurT was subcloned into pBluescript II KS (+) also via BamHI and HindIII sites. For

TABK, the fragment encoding the A- and B- domains (residues M1-K199) from PurT was

subcloned into pBC (SK+) vector via BamHI and HindIII sites, and another gene

fragment encoding the C- domain (residues F158-G355) from PurK was subcloned into pBluescript II KS (+) also via BamHI and HindIII sites. The pBC (SK+) and pBluescript II KS (+) vectors carry different antibiotic resistance genes as selection markers. Both vectors containing the fragments for the same hybrid protein were transformed into E. coli auxotrophic strains at the same time. Cells carrying both vectors were selected by double antibiotic resistance and then subjected to in vivo selection on minimal media. Upon induction, the two protein fragments of the same hybrid protein were expressed from the two different vectors and assembled to form a heterodimeric protein.

In vivo activity of the hybrid proteins

The initial selection of functional hybrid proteins with either PurT or PurK activity was performed by in vivo complementation of the auxotrophic E. coli strains

TX680F’ for PurT activity and CSH26 for PurK activity, which have inactivated chromosomal purT and purN genes and an inactivated chromosomal purK gene, respectively. TX680F’ has been utilized in vitro to detect a PurN activity almost ten thousand times lower than that of the wild type PurN, as judged by the specificity constant kcat/Km(GAR) (Ostermeier,

1999b). Since both PurN and PurT catalyze the formation of formyl GAR, the selection

sensitivity for PurN activity in TX680F’ could be applied to the selection of

PurT activity. Because the specificity constant kcat/Km(GAR) of PurT (3.7 µM-1·s-1)

(Marolewski, 1994) is about 5 times larger than that of PurN (0.76 µM-1·s-1) (Ostermeier,

1999b), TX680F’ was expected to be able to detect a PurT activity in vitro that is fifty

thousand times lower than that of the wild type PurT. CSH26 has only been utilized to

detect the wild type purK genes from different species (Sørensen, 1997), and no report

about its in vitro selection sensitivity is available. Under the conditions tested, including different temperatures and induction of protein expression, the failure of all six hybrid proteins, expressed as single proteins, to complement the growth of the E. coli auxotrophic strains on minimal media suggests that they are not functional enzymes with either PurT or PurK activity.

In addition, the construction of KABT and TABK as heterodimers did not enable them to complement the growth of the E. coli auxotrophic strains on minimal media. During the selection, a small number of auxotrophic cells (usually 1 in ~10^5) were observed to bypass the selection pressure and grow on minimal media by recovering the wild type genes essential for cell survival through recombination, a phenomenon known as reversion. To eliminate the confusion caused by reversion, two new E. coli PurT and

PurK auxotrophic strains with the deletion of the chromosomal purT and purK genes

were constructed as previously described (Datsenko, 2000). In vivo selection of PurT or

PurK activities from all six hybrid proteins using these new auxotrophic strains provided

the same negative results, suggesting that these six hybrid proteins do not possess

sufficient activity to synthesize formyl GAR or N5-CAIR and thereby support the growth of the auxotrophic strains on minimal media.
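The estimate of the selection sensitivity for PurT given earlier in this section follows from simple arithmetic on the two specificity constants. The sketch below reproduces that reasoning; it assumes, as the argument in the text implies, that the auxotrophic strain responds to the same absolute kcat/Km(GAR) threshold regardless of which enzyme supplies the activity. The function name is hypothetical.

```python
KCAT_KM_PURT = 3.7    # µM^-1 s^-1, wild type PurT (Marolewski, 1994)
KCAT_KM_PURN = 0.76   # µM^-1 s^-1, wild type PurN (Ostermeier, 1999b)
PURN_DETECTABLE_FOLD = 1e4  # PurN activity ~10,000-fold below wild type was detected


def detectable_fold_reduction(kcat_km_enzyme, kcat_km_reference, reference_fold):
    """Fold reduction that remains detectable for an enzyme.

    Assumption: the strain responds to the same absolute kcat/Km threshold
    that was established with the reference enzyme.
    """
    threshold = kcat_km_reference / reference_fold
    return kcat_km_enzyme / threshold


ratio = KCAT_KM_PURT / KCAT_KM_PURN
purt_fold = detectable_fold_reduction(KCAT_KM_PURT, KCAT_KM_PURN, PURN_DETECTABLE_FOLD)
```

Here `ratio` is about 4.9 and `purt_fold` is about 4.9 × 10^4, in line with the rounded values of 5 and fifty thousand quoted in the text.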

Low protein expression and low stability, rather than a lack of sufficient catalytic activity, can also cause in vivo selection to fail. For hybrid proteins in particular, poor solubility is a serious problem because unfavorable interactions between regions from different proteins can lead to incorrect folding and aggregation. SDS-PAGE analysis after overexpression of the six hybrid proteins clearly demonstrated their poor solubility, which may contribute to the failure of in vivo complementation. To increase the protein expression level and improve solubility, the hybrid enzymes were subcloned into the pMAL-c2x vector via XmnI and SpeI sites and expressed as fusion proteins with an N-terminal maltose-binding protein (MBP) under the control of the Ptac promoter, a stronger promoter than the lac promoter in the pDIM vector. SDS-PAGE showed that, under isopropyl β-D-thiogalactoside (IPTG) induction, the MBP-fused hybrid proteins expressed from the pMAL-c2x vector accounted for at least 10% of total protein in the auxotrophic strains, with significantly improved solubility. However, they were still not able

to support the growth of auxotrophic strains on minimal media, which clearly indicated

that the hybrid proteins between PurT and PurK by rational domain swapping are not

enzymes efficient enough to complement auxotrophic strains on minimal media.

Overexpression and purification of hybrid enzymes

The initial attempt to purify the hybrid proteins as single proteins failed due to

protein aggregation during overexpression, suggesting the poor solubility of these hybrid

proteins. To improve their solubility, the six hybrids, as well as the parental wild type

genes, were cloned into pMAL-c2x vector and expressed as a fusion protein with N-

terminal maltose-binding protein (MBP). Upon induction at 18°C, the expression level of

the hybrid fusion proteins reached at least 10% of total E. coli proteins as verified by

SDS-PAGE electrophoresis, almost half of which existed as soluble proteins.

The soluble, overexpressed wild-type and hybrid fusion proteins were isolated

and purified to homogeneity (> 95 %) using two-step chromatography. Protein of >90 %

purity as indicated by SDS-PAGE was obtained in one step affinity chromatography by

utilizing the specific binding between the N-terminal maltose-binding-protein (MBP)

domain of the fusion proteins and the amylose resin (NEB), as described in the Materials

and Methods section. In the subsequent purification step by gel filtration, the wild type PurT fusion protein eluted as a single peak corresponding to the monomeric protein species, which agrees with the previous result (Marolewski, 1994). In contrast, the monomeric state of the wild type PurK fusion protein observed by gel filtration differs from the previous data, in which PurK apparently formed a dimer (Meyer, 1992). The disagreement on the oligomeric state of wild type PurK might result from the addition of the N-terminal MBP domain. In the same gel filtration experiments, all six hybrid fusion proteins also appeared to be monomers, but with various levels of higher molecular mass contaminants that may originate from aggregation of partially folded hybrid proteins.

After the second step purification by gel filtration, the homogeneity of the purified fusion protein was above 95 %.

The kinetic parameters of the wild type PurT and PurK fused with MBP were determined by following the cleavage of ATP using the coupling reactions of pyruvate kinase (PK) and lactate dehydrogenase (LDH). Comparison with previous PurT and PurK kinetic data suggests that the catalytic performance of these two enzymes (based on the wild type proteins) is impaired by the addition of the MBP domain (Table 5) (Meyer,

1992, Marolewski, 1994). However, the hybrid proteins showed severe aggregation and precipitation upon the removal of the MBP domain by factor Xa cleavage, and had to be purified only as fusion proteins.
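The PK/LDH coupled assay mentioned above reports ATP cleavage as NADH oxidation, one NADH per ADP released. A minimal sketch of the conversion from an absorbance slope to a specific activity and a turnover number is given below. It assumes detection at 340 nm with the standard NADH extinction coefficient of 6.22 mM-1·cm-1 and a 1 cm path length; neither detail is stated in this section. The function names and the molar mass argument are hypothetical.

```python
NADH_EXTINCTION_MM = 6.22  # mM^-1 cm^-1 at 340 nm (standard value, assumed here)


def specific_activity(delta_a340_per_min, assay_volume_ml, enzyme_mg, path_length_cm=1.0):
    """Specific activity in µmol ADP formed per min per mg of enzyme."""
    rate_mm_per_min = delta_a340_per_min / (NADH_EXTINCTION_MM * path_length_cm)
    umol_per_min = rate_mm_per_min * assay_volume_ml  # mM x ml = µmol
    return umol_per_min / enzyme_mg


def kcat_per_s(specific_activity_umol_min_mg, molar_mass_kda):
    """kcat = Vmax/[E]; 1 kDa equals 1 mg per µmol, and 60 converts min to s."""
    return specific_activity_umol_min_mg * molar_mass_kda / 60.0
```

For instance, a slope of 0.0622 absorbance units per minute in a 1 ml assay containing 0.01 mg of enzyme corresponds to 1.0 µmol/min/mg.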

ATP hydrolysis of wild type and hybrid proteins

As mentioned above, both PurT and PurK reactions consist of three steps: ATP cleavage, formation of intermediates (formyl phosphate for PurT and carboxyl phosphate for PurK), and nucleophilic attack on intermediates by mononucleotides (Figure 12).

Table 5: Kinetic parameters of ATPase activity of the two wild type enzymes (PurT and PurK), and KABT hybrid enzyme.

Km (µM) and kcat (s-1):

Protein    Km(GAR)    Km(AIR)   Km(ATP)    Km(HCO2¯)   Km(HCO3¯)   kcat
PurT (a)   10.1±0.5   NA        45±12      319±15      NA          37.6±0.8
PurT (b)   36±5       NA        85±14      1700±310    NA          2.1±0.1
KABT (b)   ND         ND        1400±300   ND          ND          (9±0.9)×10^-3
PurK (c)   NA         26        90         NA          ND          52
PurK (b)   NA         232±20    ND         NA          ND          26.4±1.5

kcat/Km (µM-1·s-1):

Protein    GAR     AIR    ATP         HCO2¯    HCO3¯
PurT (a)   38      NA     0.84        0.12     NA
PurT (b)   0.058   NA     0.025       0.0012   NA
KABT (b)   ND      ND     6.4×10^-6   ND       ND
PurK (c)   NA      2.0    0.58        NA       ND
PurK (b)   NA      0.11   ND          NA       ND

(a) adapted from (Marolewski, 1994). (b) fusion protein with an N-terminal MBP domain. (c) adapted from (Meyer, 1992). NA not applicable. ND not determined.

It has been shown that the ATPase activity of the wild type PurT and PurK is dependent on the presence of GAR and AIR, respectively (Meyer, 1992, Marolewski, 1994). The ability of the wild type PurT to catalyze the conversion of acetate and ATP to acetyl phosphate and ADP in the absence of GAR also suggested that the ATP cleavage and the formation of intermediate may occur in the absence of GAR (Marolewski, 1994). The failure to detect ADP in solution formed by the wild type PurT in the presence of formate and ATP may be explained by tighter binding of ADP or formyl phosphate in the active site.

The ATPase activity of PurT and PurK fusion proteins with an additional N-terminal MBP domain also showed a similar dependency on their respective substrates

GAR or AIR. However, the kinetic parameters of these ATPase activities changed significantly upon the addition of an N-terminal MBP domain, as compared to those of the wild type

PurT and PurK (Table 5). For the PurK fusion protein, the kcat for the ATPase activity has

decreased 2 fold, while the Km for AIR has increased almost 9 fold. For the PurT fusion

protein, the kcat for the ATPase activity has decreased almost 18 fold, while the Km for

ATP, GAR, and HCO2¯ have increased 2 fold, 4 fold, and 5 fold, respectively. At the

same time, PurT and PurK fusion proteins were observed to still be capable of supporting

the growth of their respective E. coli auxotrophic strains under limiting purine conditions.

The growth rates were indistinguishable from those of cells containing the wild type

enzymes, indicating that the production of formyl GAR or N5-CAIR was sufficient to

sustain normal rates of cell growth.
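The fold changes quoted in this paragraph can be recomputed directly from the kinetic parameters in Table 5. The sketch below does so; the values are transcribed from that table, whereas the dictionary layout and the `fold_change` helper are assumptions made for illustration.

```python
# kcat in s^-1, Km in µM; wild type values from the literature rows of Table 5.
WILD_TYPE = {
    "PurT": {"kcat": 37.6, "Km_GAR": 10.1, "Km_ATP": 45.0, "Km_HCO2": 319.0},
    "PurK": {"kcat": 52.0, "Km_AIR": 26.0},
}
# Corresponding values for the N-terminal MBP fusion proteins.
MBP_FUSION = {
    "PurT": {"kcat": 2.1, "Km_GAR": 36.0, "Km_ATP": 85.0, "Km_HCO2": 1700.0},
    "PurK": {"kcat": 26.4, "Km_AIR": 232.0},
}


def fold_change(protein, parameter):
    """Fold decrease for kcat, fold increase for any Km (hypothetical helper)."""
    wild_type = WILD_TYPE[protein][parameter]
    fusion = MBP_FUSION[protein][parameter]
    if parameter == "kcat":
        return wild_type / fusion
    return fusion / wild_type
```

With these inputs, `fold_change("PurT", "kcat")` is about 17.9 and `fold_change("PurK", "Km_AIR")` is about 8.9, consistent with the "almost 18 fold" and "almost 9 fold" changes stated above.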

Only the kinetic parameters of the ATPase activity of the KABT hybrid protein fused with an N-terminal MBP domain were determined (Table 5). Compared to the wild type PurT fusion protein, KABT has a 200 fold decrease in the kcat and a 16 fold increase in the Km for ATP, while it has a 2700 fold decrease in the kcat compared to the wild type PurK fusion protein.

Table 6: The ATPase activities of the wild type PurT and PurK, and the hybrid enzymes under different conditions.

Specific activity (µmol/min/mg) in the presence of:

Protein   None    HCO3¯   AIR     AIR & HCO3¯   HCO2¯   GAR     GAR & HCO2¯
PurK      0.23    1.09    8.72    10.0          0.35    0.46    0.40
PurT      0.021   0.031   0.033   0.030         0.16    4.64    5.14
KAT       0.045   0.040   0.039   0.041         0.047   0.054   0.061
TAK       0.66    0.69    0.91    0.96          0.65    0.61    0.67
TKB       0.073   0.073   0.087   0.13          0.061   0.10    0.099
KTB       0.028   0.026   0.053   0.077         0.022   0.033   0.037
KABT      0.16    0.15    0.14    0.14          0.14    0.10    0.12
TABK      0.26    0.23    0.25    0.27          0.20    0.22    0.28

All proteins were purified as fusion proteins with an N-terminal MBP domain. The ATPase activities were measured under saturating concentrations of each substrate, and determined by fitting into: kcat = Vmax/[E]. The standard error for each value is less than 20%.

The ATPase activities of all six hybrid fusion proteins, as well as the wild type

PurT and PurK fusion proteins, were tested in the presence of different substrates, and the results are shown in Table 6. The values for the ATPase activities of the wild type PurT and PurK in the absence of any substrate are due to the background rate of the assay and the low protein concentrations used. As expected, the ATPase activities of the wild type PurT and PurK peaked only in the presence of GAR and AIR, respectively.

Addition of HCO3¯ and HCO2¯ did not affect their ATPase activities much.

All six hybrid fusion proteins showed ATPase activities that were 2 orders of

magnitude lower than those of the wild type enzymes, although they cannot support the

growth of either purT or purK auxotrophic strains under limiting purine conditions. The

observation of the ATPase activities of the hybrid proteins suggests that a functional ATP binding pocket for ATP hydrolysis could be constructed between B- and C- domains from different parental proteins. Furthermore, unlike those of the wild type PurT and

PurK fusion proteins, these ATPase activities were observed even in the absence of AIR and GAR (Table 6). Based on the Km for ATP of KABT hybrid protein and the wild type

PurT, the chimeric ATP binding site formed by the PurK B- domain and the PurT C- domain has decreased its ATP binding affinity almost 16 fold, as compared to that of the wild type PurT. As an indication of a weaker ATP binding site, the increase of the Km for

ATP in KABT hybrid protein might result from unfavorable interactions in the interface

between the B- and C- domains from different parental proteins. Of all six hybrid proteins, TAK showed the highest ATPase activity as its B- and C- domains are both from

PurK. No correlation between each single domain and its effect on the ATPase activity of the hybrid proteins was observed.

As shown in Table 6, the ATPase activities of the six hybrid proteins showed different responses to different substrates. The ATPase activity of KAT only increased in

the presence of GAR or of GAR and formate, following the same trend as the wild

type PurT. Meanwhile, the ATPase activity of TAK only increased in the presence of AIR

or AIR and bicarbonate, which is also similar to the wild type PurK. In these two hybrid

proteins, only the A- domains were swapped, and the B- and C- domains remain intact.

Within the A- domain, most of the conserved residues between PurT and PurK are located in only two regions. The first region contains the residues that form the P loop, and the second region contains the amino acid residues 79-84 in PurT or 46-51 in PurK.

Figure 10C shows that the residues from these two regions form a ribose 5-phosphate binding site. While the residues in the P loop are involved in binding of the phosphate group of the ribose 5-phosphate, a conserved Glu residue (E82 in PurT and E49 in PurK) in the second region serves to bridge the 2’- and 3’- hydroxyl groups of the ribose 5-

phosphate. Since both GAR and AIR share this common ribose 5-phosphate moiety, the

residues in the A- domain of PurT and PurK are probably not involved in determining

their respective enzyme substrate specificities, i.e. they do not interact with the

glycinamide moiety of GAR or the 5-aminoimidazole moiety of AIR. Therefore, the KAT hybrid protein might still retain an intact GAR binding pocket and prefer binding to

GAR, just like the wild type PurT, as indicated by the enhancement of its ATPase activity in the presence of GAR. Likewise, the TAK hybrid protein might still retain an intact AIR binding pocket and prefer binding to AIR, just like the wild type PurK, as indicated by the enhancement of its ATPase activity in the presence of AIR. For the wild type PurT, GAR binding has been shown to enhance ATP binding, based on a comparison of the 45 µM Km value for ATP in the full reaction with the 77 µM Km value for

ATP in the side reaction (Marolewski, 1994). The full reaction by PurT means the

biologically relevant PurT reaction, the conversion from GAR, ATP, and formate to

formyl GAR, ADP, and Pi. The side reaction is the cleavage of ATP catalyzed by PurT in

the presence of acetate to generate acetyl phosphate and ADP, which does not require

GAR. The failure of the KAT and TAK hybrid proteins to support the growth of the E. coli auxotrophic strains on purine-free selective media might be explained by a lack of activity due to disruption of the active sites, or by activities too low to reach the sensitivity limit of the in vivo assay.

Surprisingly, the ATPase activities of the TKB and KTB hybrid proteins increased

in the presence of either GAR or AIR. Similar patterns were also observed when

GAR or AIR was added to the reaction mixture after the reaction was started by the addition of the hybrid proteins. In the TKB and KTB hybrid proteins, the B- domains were

switched, and the A- and C- domains remain intact. One reasonable interpretation of this

surprising finding is that in the wild type PurT and PurK, the elements determining their

respective substrate specificities exist in both the B- and C- domains. When the B-

domains were swapped, the resultant hybrid proteins TKB and KTB, could bind with

either GAR or AIR, which then enhanced their ATPase activities. As can be seen in

Figure 10A, the GAR binding pocket is situated at the apex of the A-, B-, and C- domain

of the wild type PurT. While the ribose 5-phosphate binding site is located in the A- domain, the residues interacting with the glycinamide moiety of GAR are located in the

B- and C- domains. In the B- domain, the hydroxyl oxygen of S161 in the B loop is within hydrogen-bonding distance of the amino group of GAR and one of the γ-phosphoryl oxygens of ATP, which might help position these two groups optimally for the reaction to occur. In the C- domain, D286 in the J loop also interacts with the amino group of GAR; its potential function is to deprotonate the amino group of GAR to facilitate nucleophilic attack. The guanidinium group of R363 in the C loop interacts with the carbonyl oxygen of the glycyl moiety of GAR. All these residues have their conserved counterparts in the wild type

PurK based on the structure-based alignment (Figure 11A). Although without direct structural evidence, these residues in PurK are presumed to have similar functions to their counterparts in PurT.

For the last pair of the hybrid proteins in which the C- domain was swapped,

KABT and TABK, the ATPase activities remain nearly unchanged in the presence of different substrates. These results are not unexpected because the C- domain is the largest and most

complex domain of the three domains, and changing it might cause severe structural disturbance. As shown in Figure 10, the C- domain is like a palm with the A- and B- domains sitting on the top like two fingers, with the active site situated at the interface of three domains. In the KABT and TABK hybrid proteins, both ATP and GAR/AIR binding

pockets are perturbed when the C- domains were swapped. Although the B- and C-

domains from different parental proteins still form a functional ATP binding pocket, the

mononucleotide binding pocket that is mostly situated in the interface of the A- and C- domains might be disrupted completely. Consequently, the ATPase activities of the KABT and TABK hybrid proteins lose their coordination with GAR or AIR.
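The substrate responses discussed for the three pairs of hybrids can be read off Table 6 as fold stimulation over the substrate-free rate. The sketch below performs that comparison for the six hybrids. The activities are transcribed from Table 6; the 1.1-fold cut-off used to call a "response" is an arbitrary illustrative assumption, and several of the changes are comparable in size to the standard error of up to 20% stated for the table.

```python
# Specific ATPase activity (µmol/min/mg) from Table 6:
# (no substrate, AIR only, GAR only) for each hybrid fusion protein.
ATPASE = {
    "KAT": (0.045, 0.039, 0.054),
    "TAK": (0.66, 0.91, 0.61),
    "TKB": (0.073, 0.087, 0.10),
    "KTB": (0.028, 0.053, 0.033),
    "KABT": (0.16, 0.14, 0.10),
    "TABK": (0.26, 0.25, 0.22),
}
RESPONSE_THRESHOLD = 1.1  # assumption: at least a 10% increase counts as a response


def fold_stimulation(hybrid):
    """Activity with AIR or GAR relative to the substrate-free activity."""
    none, air, gar = ATPASE[hybrid]
    return {"AIR": air / none, "GAR": gar / none}


def responds_to(hybrid):
    """True for each mononucleotide that raises the ATPase rate above the cut-off."""
    return {
        substrate: fold >= RESPONSE_THRESHOLD
        for substrate, fold in fold_stimulation(hybrid).items()
    }
```

Under this cut-off, KAT responds only to GAR, TAK only to AIR, TKB and KTB to both, and KABT and TABK to neither, which mirrors the pattern described in the text.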

Formation of acyl phosphate intermediates

Like the reactions catalyzed by other members of the ATP grasp protein

superfamily, PurT and PurK reactions also involve the formation of an acyl phosphate

intermediate. Marolewski et al. have shown that formyl phosphate is chemically and

kinetically competent for PurT as an intermediate, and the generation of formyl

phosphate was directly detected using a PurT mutant with a single mutation G162I

(Marolewski, 1997). G162 is located on the B-loop of PurT, which contains the residues

involved in binding the phosphate groups of ATP. In the wild type PurT, the formyl

phosphate intermediate is well protected in the active site. However, the mutation of

G162I might disrupt the loop structure and lead to the diffusion of formyl phosphate into

solution for detection. No free formyl phosphate was detected from the wild type PurT

reaction, probably due to lack of accumulation (Marolewski, 1994). In addition to its

biologically relevant transformation of GAR into formyl GAR, PurT also catalyzes the

conversion from ATP and acetate to acetyl phosphate and ADP in the absence of GAR as

another indication of formation of an acyl phosphate intermediate. No direct evidence for

the carboxyl phosphate as intermediate for PurK is available.

Table 7: Activities of the production of acetyl phosphate, formyl phosphate and carboxyl phosphate by the wild type PurT and PurK, and the hybrid enzymes.

Specific activity, (µmol/min/mg)×10^-4, for the production of:

Protein    Acetyl phosphate   Formyl phosphate   Carboxyl phosphate
PurT (a)   4.4×10^3           NA                 NA
PurT (b)   715                ND                 ND
PurK (b)   ND                 ND                 ND
KAT (b)    3.8                2.8                ND
TAK (b)    8.6                4.0                ND
TKB (b)    11.1               1.7                ND
KTB (b)    0.61               0.43               ND
KABT (b)   10.7               7.7                ND
TABK (b)   0.80               0.75               ND

(a) adapted from (Marolewski, 1994). (b) fusion protein with an N-terminal MBP domain. NA not applicable. ND not detected. The activities were measured under the saturating concentrations of each substrate, and determined by fitting into: kcat = Vmax/[E]. The standard error is less than 10%.

The capabilities of the hybrid proteins, as well as the two wild type proteins, to produce acetyl phosphate, formyl phosphate or carboxyl phosphate were tested by converting the free acyl phosphate to the hydroxamate, which was detected spectrophotometrically as described in the Materials and Methods section (Table 7). The attempt to detect carboxyl phosphate generated by the wild type and hybrid proteins did not succeed, which could be explained by the poor stability of carboxyl phosphate.

Additionally, the wild type PurK was unable to produce formyl phosphate or acetyl phosphate, as formate and acetate are not its natural substrates. Acetyl phosphate, but no formyl phosphate, was detected with the wild type PurT fusion protein, just as with the wild type PurT without the N-terminal MBP domain.

All hybrid proteins showed the ability to generate either acetyl phosphate or formyl phosphate, although at various rates. These activities are not affected by the presence of GAR or AIR, in contrast to the wild type proteins in which there is no release of ADP in the absence of GAR or AIR. For each hybrid protein except TKB, the catalytic

activity for the production of acetyl phosphate is in reasonable agreement with that for

the production of formyl phosphate, suggesting no preference for formate or acetate.

Compared to their ATPase activities, the hybrid proteins catalyze the production of acyl

phosphate with two orders of magnitude lower efficiency. For the wild type PurT, the

ratio of acetyl phosphate trapped to ADP produced was 0.70 ± 0.15 (Marolewski, 1994).

Due to highly optimized architecture in the active site of the wild type enzyme, the acetyl

phosphate unaccounted for is likely hydrolyzed either in solution or at the active site before being trapped. The ratio of acetyl phosphate trapped to ADP produced decreased to less than 0.01 in the hybrid proteins. More than 99% of ATP was hydrolyzed and released

into solution as ADP and Pi without being transferred to produce acetyl or formyl phosphate. This significant drop in the efficiency of Pi transfer could be well explained by the disruption of the active sites in the hybrid proteins.

2.4 Conclusion

Domain swapping has been shown to be a strategy for protein evolution, especially the evolution of metabolic pathways. As two enzymes in the purine biosynthetic pathway, PurT and PurK display typical features of proteins evolved by domain swapping, based on the similarity of their structures, reaction mechanisms, active site organization, and conserved residues.

To cover every combination of domain swapping between PurT and PurK, six chimeric hybrid proteins were designed and generated rationally with one domain replaced by the corresponding domain from the other protein, in an effort to identify functional domains utilized in protein evolution. Although failing to complement the E. coli auxotrophic strains in vivo as the wild type enzymes, these hybrid proteins still showed the partial activities of the wild type enzymes to catalyze the ATP cleavage and the formation of formyl or acetyl phosphate intermediates by in vitro studies. The establishment of two wild type catalytic activities in the hybrid proteins clearly indicated the strong structure-function relationships between PurT and PurK and the possibility of domain swapping as the strategy for their evolution.

It was shown that a functional ATP binding pocket could be constructed between the B- and C-domains from different proteins based on the observation of ATPase activities in these six hybrid proteins. However, the introduction of a foreign domain led to a significant drop in the ATP binding affinity and ATPase activity. Additionally, the effects of different substrates on the ATPase activities of each hybrid protein, combined with the structural data, revealed that in PurT and PurK, the A-domains might be involved in the binding of the ribose 5-phosphate moiety of the substrates, and the residues in the B- and C-domains might determine the substrate specificities. This assignment of different functions to distinct structural domains is considered a typical feature of proteins evolved by domain swapping. Furthermore, each hybrid protein showed the ability to catalyze the formation of acetyl or formyl phosphate, which is only one step away from accomplishing the wild type reactions. Although acetyl or formyl phosphate was produced by the hybrid proteins at a rate more than 100 fold lower than their ATPase activities, the observation of these catalytic activities in the hybrid proteins further supported the structure-function relationships between PurT and PurK. The significant decrease in the Pi transfer efficiency upon the introduction of a foreign domain was consistent with the previous observation that the ATPase activities of the hybrid proteins were substantially impaired due to the domain replacement.
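The comparison between acyl phosphate production and ATP hydrolysis can be checked against the tabulated specific activities. The sketch below divides the formyl phosphate production rate of each hybrid (Table 7) by its ATPase rate measured with formate alone (Table 6). Pairing these two columns is an assumption made here for illustration, as is the reading of the Table 7 header as units of 10^-4 µmol/min/mg; the names are hypothetical.

```python
# Formyl phosphate production in units of 1e-4 µmol/min/mg (Table 7).
FORMYL_PHOSPHATE_E4 = {
    "KAT": 2.8, "TAK": 4.0, "TKB": 1.7, "KTB": 0.43, "KABT": 7.7, "TABK": 0.75,
}
# ATPase specific activity with formate (HCO2-) only, µmol/min/mg (Table 6).
ATPASE_WITH_FORMATE = {
    "KAT": 0.047, "TAK": 0.65, "TKB": 0.061, "KTB": 0.022, "KABT": 0.14, "TABK": 0.20,
}


def transfer_ratio(hybrid):
    """Formyl phosphate formed per ATP hydrolyzed (both in µmol/min/mg)."""
    return FORMYL_PHOSPHATE_E4[hybrid] * 1e-4 / ATPASE_WITH_FORMATE[hybrid]


RATIOS = {hybrid: transfer_ratio(hybrid) for hybrid in FORMYL_PHOSPHATE_E4}
```

Every ratio obtained this way is below 0.01, that is, each hybrid hydrolyzes ATP more than 100 times faster than it releases formyl phosphate, in agreement with the statement above.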


Chapter 3

Identification of functional subdomains in purT-encoded GAR transformylase (PurT) and N5-CAIR synthetase (PurK) by combinatorial and rational methods

3.1 Introduction

Domain swapping has been shown to be a feasible approach for protein evolution,

especially for the enzymes involved in biosynthetic or metabolic pathways, not only by the

computational evidence based on sequence homology, but also by experimental studies of functional enzymes constructed through domain swapping (Nixon, 1997, Ostermeier,

2000). Given the similarity in their structures, the chemistry of their reactions, and conserved residues involved in substrates binding and catalysis, PurT and PurK were chosen to demonstrate domain swapping for protein evolution. The identification and characterization of the functional domains for protein evolution clearly will have an impact on improving our current understanding of protein structure-function relationships.

My initial effort to identify functional domains by swapping distinct domains according to the three-dimensional structures did not generate any hybrid proteins with the wild type enzyme functions. This could be due to disruption of the active sites, or to activities too low to reach the sensitivity limit of the in vivo assay. The poor activities and solubility of the hybrid proteins might originate from the limited sampling of sequence space in rational domain swapping. In each hybrid protein, the crossover points were selected based on the three-dimensional structures with the intention of not disrupting the overall structure; however, this left most of sequence space untouched. Given the high structural homology and low amino acid sequence homology between PurT and PurK, the crossover points chosen in the rational domain swapping might not be suitable for generating a functional chimeric protein.

In this chapter, additional sequence space along the entire purT and purK genes was explored by the generation of chimeric protein libraries using the incremental truncation method and in vivo selection, in an effort to generate functional chimeric proteins. At the same time, functional subdomains that might originate from domain swapping in each individual domain were identified using rational and combinatorial methods.


3.2 Experimental:

3.2.1 Materials:

Restriction enzymes, T4 DNA ligase, and Calf Intestinal Alkaline Phosphatase were obtained from New England Biolabs (Beverly, MA). pET22b vector was obtained from Novagen (Madison, WI). pDIM-PGX vector was obtained from Dr. Stefan Lutz

(Lutz, 2001b). , DNA exonuclease III, and Taq polymerase were obtained from Promega (Madison, WI). Pfu turbo polymerase was obtained from

Stratagene (La Jolla, CA). dNTPs and Complete, EDTA-free protease inhibitor were obtained from Roche (Indianapolis, IN). The QIAprep, QIAquick Gel and PCR purification kits, QIA plasmid preparation kit, and Ni-NTA agarose were obtained from

Qiagen (Valencia, CA). PK/LDH from rabbit liver and lysozyme were obtained from

Sigma (St. Louis, MO). HiTrapTM DEAE column was obtained from Amersham-

Pharmacia Biotech (Piscataway, NJ). BSA, all protein gel electrophoresis agents and

molecular weight markers were obtained from Bio-Rad (Hercules, CA). All DNA

oligonucleotides were ordered from Integrated DNA Technologies (Coralville, IA). α-

Phosphothioate nucleotides had been synthesized previously (Chen, 1983). GAR, and

AIR were synthesized as previously described, respectively (Shim, 1998; Firestine,

1994). All other materials were obtained from commercial sources and were of the

highest available quality.

3.2.2 Bacterial Strains:

DH5α and BL21(DE3) were obtained from Invitrogen (Carlsbad, CA), and

Novagen (Madison, WI), respectively. PurT(-), E. coli K-12 MG1655 with chromosomal purN and purT deleted, and PurK(-), E. coli K-12 MG1655 with chromosomal purK deleted, were constructed by the one-step knockout method (Datsenko, 2000).

3.2.3 Methods:

Construction of chimeric protein libraries by ITCHY

Figure 16: Schematic overview of the library construction using THIO-ITCHY. The procedure is described in detail in the Materials and Methods section. The α-phosphothioate nucleotides, depicted as stars, were incorporated randomly into DNA during the PCR amplification. The presence of α-phosphothioate nucleotides in the DNA strands prevents degradation by exonuclease III. (Adapted from (Lutz, 2001))

The chimeric protein libraries were constructed as previously described, with some modifications (Figure 16) (Lutz, 2001). The fragments containing the two parental genes in sequential order, separated by a unique HindIII site, were prepared and subcloned into the pDIM-PGX vector via NdeI and SpeI sites. The resultant pDIM vectors were linearized by HindIII digestion and gel purified using the QIAquick Gel and PCR purification kit. The linearized vectors were amplified with two primers, linker forward (5’-CTTACTGCAGCGCTCGAG-3’) and linker reverse (5’-CTTCTAGATATCGGATTC-3’), in the presence of deoxynucleoside 5'-O-(1-thio)triphosphates, so that α-phosphothioate nucleotides were incorporated directly into the DNA during the amplification. PCR amplification was performed with 10 ng DNA template, 167 µM dNTP, 33 µM αS-dNTP, 1× PCR buffer, 1 mM MgCl2, 1 µM each primer, and 2 units Taq polymerase in a total volume of 50 µl. The reaction conditions were 5 min at 94°C, followed by 30 cycles of 30 seconds at 94°C, 30 seconds at 55°C, and 5 min at 72°C, followed by 10 min at 72°C. After purification with the QIAquick PCR purification kit, the PCR products were quantified by OD260 for the next step. The amplified DNA was

mixed with exonuclease III (120 U/µg 5’ –end DNA) in 1 × buffer supplied by

manufacturer and incubated for 30 min at 37°C. The reactions were quenched by the

addition of PB buffer, and the DNA was purified using QIAquick PCR purification kit.

The purified DNA was then mixed with mung bean nuclease (2.3U/µg DNA) in 1 × buffer supplied by manufacturer and incubated for 30 min at 30°C to remove the single- stranded 5’-overhang. The DNA was purified again using QIAquick PCR purification kit.

The DNA fragments with blunt ends were prepared by mixing the DNA with Klenow fragment (1U/µg DNA) in 1 × Klenow buffer and 100 µm dNTP for 10 min at 37°C. The

DNA was purified again using QIAquick PCR purification kit. The blunt-ended plasmids were cyclized to make a plasmid library by intra-molecular ligation overnight at 4°C using 24 U T4 DNA ligase in 1 × buffer supplied by manufacturer in a total volume of

400 µl. The circular plasmid libraries were purified and desalted using QIAquick PCR purification kit. In the last step, the purified plasmid libraries were transformed into freshly prepared E. coli DH5α-E cells by electroporation, and the cells were spread on

LB agar plates containing antibiotics. After overnight incubation at 30°C, the libraries were recovered from the plates into 20 ml storage media (2× TY, 2% glucose, and 15% glycerol), concentrated by centrifugation, and stored frozen at -70°C.
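The nucleotide mix used in the PCR step can be checked with a short calculation. The Python sketch below computes the fraction of αS-dNTP in the nucleotide pool, assuming that the 167 µM dNTP and 33 µM αS-dNTP stated in the protocol are the only nucleotide species present. The function names are illustrative and are not part of the original protocol.

```python
from typing import Tuple


def thio_fraction(dntp_um: float, thio_dntp_um: float) -> float:
    """Fraction of alpha-S-dNTP in the total nucleotide pool."""
    total = dntp_um + thio_dntp_um
    if total <= 0:
        raise ValueError("total nucleotide concentration must be positive")
    return thio_dntp_um / total


def split_pool(total_um: float, ratio_denominator: int) -> Tuple[float, float]:
    """Split a total pool into (dNTP, alpha-S-dNTP) for a 1:N thio-to-total ratio."""
    thio = total_um / ratio_denominator
    return total_um - thio, thio


if __name__ == "__main__":
    frac = thio_fraction(167.0, 33.0)
    # Assumption: the pool totals 200 µM, as implied by 167 µM + 33 µM.
    print(f"alpha-S-dNTP fraction: {frac:.3f} (about 1:{1 / frac:.1f})")
```

Under the same assumption of a 200 µM total pool, `split_pool(200, 8)` and `split_pool(200, 10)` give the corresponding mixes for lower αS-dNTP ratios.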

In vivo Selection of ITCHY libraries

After extraction from E. coli DH5α-E cells using the QIA plasmid preparation kit, the ITCHY plasmid libraries were electroporated into E. coli auxotrophic strains and spread on LB agar plates containing the appropriate antibiotics. After overnight incubation at 30°C, the libraries were recovered from the plates into 10 ml of selective media (M9 salts, 0.2% glucose, 100 µg/ml ampicillin), and resuspended in 2 ml of selective media. The culture was shaken at 37°C for 2 hr before being plated in a series of dilutions on the selective plates with 0.3 mM isopropyl β-D-thiogalactoside (IPTG). The plates were incubated at 30°C for up to 72 hr. To confirm the complementation, cells that appeared on the selective plates within 48 hr were randomly chosen and restreaked on new selective plates to obtain single colonies, from which the plasmids were extracted and retransformed into E. coli auxotrophic strains, followed by the same selection procedure. pDIM plasmids containing the wild type purT and purK genes were used as controls and included on each selective plate. Plasmids extracted from the first and second selections were sequenced to confirm that no sequence changes had occurred and to derive the crossover points between the two parental genes. All DNA sequencing was performed at the Nucleic Acids Facility of the Pennsylvania State University.

Construction of hybrid enzymes & screening of PurT and PurK activities by in vivo complementation

The PurT and PurK hybrid proteins with the crossover points in individual domains were constructed and screened in vivo for PurT and PurK enzymatic activities, using the same methods as described in the Materials and Methods section in Chapter 2. The primers used for construction of the hybrid proteins are listed in Table 8.
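Two properties of the primers in Table 8 can be checked programmatically: each inside primer pair should be mutually reverse-complementary, and the outside primers should carry the cloning sites. The Python sketch below does this for the KAT2 pair and the PurT outside primers, with sequences copied from Table 8. The NdeI (CATATG) and SpeI (ACTAGT) recognition sequences are the standard ones, and the helper name is illustrative.

```python
# Translation table for DNA complementation (upper-case bases only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Sequences copied from Table 8.
KAT2_FORWARD = "GGTAACGGGCAGTTAGGCCGTGAAGTGGCAATCGAGTGTCAG"
KAT2_REVERSE = "CTGACACTCGATTGCCACTTCACGGCCTAACTGCCCGTTACC"
PURT_NDEI_FORWARD = "GGAATTCCATATGACGTTATTAGGCACTGCG"
PURT_SPEI_REVERSE = "CCACTAGTTTAACCCTGTACTTTTACCTG"

NDEI_SITE = "CATATG"
SPEI_SITE = "ACTAGT"

if __name__ == "__main__":
    print("KAT2 pair complementary:",
          reverse_complement(KAT2_FORWARD) == KAT2_REVERSE)
    print("NdeI site in PurT forward primer:", NDEI_SITE in PURT_NDEI_FORWARD)
    print("SpeI site in PurT reverse primer:", SPEI_SITE in PURT_SPEI_REVERSE)
```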

Purification of proteins with His6 tag

DNA fragments encoding the wild type PurK and KTBLK were amplified from the pDIM vectors and subcloned into the pET22b vector via NdeI and HindIII sites, and their sequences were confirmed by DNA sequencing. The pET22b vectors were transformed into the E. coli BL21 (DE3) strain for overexpression. Overnight cultures of individual colonies were inoculated at 1% into LB media with 100 µg/ml ampicillin and grown to OD600 ~0.5 at 37°C. The cultures were transferred to 18°C and induced with 0.3 mM isopropyl β-D-thiogalactoside (IPTG). After 24 hr of shaking at 18°C, the cells were centrifuged and the pellets were stored at -70°C.

Table 8: The primers used for construction of the hybrid enzymes with crossovers in individual domains.

Primers               Primer Sequences (5’-3’)

Outside Primers
PurT (NdeI) Forward   GGAATTCCATATGACGTTATTAGGCACTGCG
PurK (NdeI) Forward   GGAATTCCATATGAAACAGGTTTGCGTCCTC
PurT (SpeI) Reverse   CCACTAGTTTAACCCTGTACTTTTACCTG
PurK (SpeI) Reverse   CCACTAGTTTAACCGAACTTACTCTGCGC

Inside Primers
KAT1 Forward          GGAATTCCATATGAAACAGGTTTGCGTCCTCGGTTCCGGTGAACTGGGTAAAGAA
KAT2 Forward          GGTAACGGGCAGTTAGGCCGTGAAGTGGCAATCGAGTGTCAG
KAT2 Reverse          CTGACACTCGATTGCCACTTCACGGCCTAACTGCCCGTTACC
KAT3 Forward          CGCTGGCCGGAAACCGCATTAATCCAACTTGAAGAGGAAGGA
KAT3 Reverse          TCCTTCCTCTTCAAGTTGGATTAATGCGGTTTCCGGCCAGCG
TAKTBLK1 Forward      GCAACTCGCGTGATGTTATTAGGTAACGGGCAGTTAGGCCGT
TAKTBLK1 Reverse      ACGGCCTAACTGCCCGTTACCTAATAACATCACGCGAGTTGC
TAKTBLK2 Forward      GGCTCCGGTGAACTGGGTAAAATGCTGCGTCGTCAGGCAGGC
TAKTBLK2 Reverse      GCCTGCCTGACGACGCAGCATTTTACCCAGTTCACCGGAGCC
TAKTBLK3 Forward      GCTATTGCCACCGATATGCTGACCCGCGAGCTGGCGCGCCAT
TAKTBLK3 Reverse      ATGGCGCGCCAGCTCGCGGGTCAGCATATCGGTGGCAATAGC
KTB1 Forward          GCACCGTGGCAGTTACTTGCCAGCGAAAGCCTTTTCCGCGAG
KTB1 Reverse          CTCGCGGAAAAGGCTTTCGCTGGCAAGTAACTGCCACGGTGC
KTB2 Forward          CTGGCGATTGTTAAGCGTCGCCTGAGCTCTTCCGGCAAGGGG
KTB2 Reverse          CCCCTTGCCGGAAGAGCTCAGGCGACGCTTAACAATCGCCAG
KTB3 Forward          CGCGGTCAATGGCGTTTACGCTCTGCAGAGCAACTTGCTCAG
KTB3 Reverse          CTGAGCAAGTTGCTCTGCAGAGCGTAAACGCCATTGACCGCG
KTB4 Forward          CCGGCAGAGTGTTACGGCGAAGTAATTGTTGAAGGCGTCGTT
KTB4 Reverse          AACGACGCCTTCAACAATTACTTCGCCGTAACACTCTGCCGG
TKB1 Forward          TGCATTGTAAAACCGGTGATGGGTGGTTATGACGGTCGCGGT
TKB1 Reverse          ACCGCGACCGTCATAACCACCCATCACCGGTTTTACAATGCA
TKB2 Forward          AAGGGGCAGACGTTTATTCGTGCAAATGAAACCGAACAGTTA
TKB2 Reverse          TAACTGTTCGGTTTCATTTGCACGAATAAACGTCTGCCCCTT
TKB3 Forward          AAGTACGCTCAGCAAGGCGGTCCGGCAGAGTGTTACGGCGAA
TKB3 Reverse          TTCGCCGTAACACTCTGCCGGACCGCCTTGCTGAGCGTACTT
TKC1 Forward          GGCGTCCATTTCTGTGCACCACTGACGCATAACCTGCATCAGGC
TKC1 Reverse          GCCTGATGCAGGTTATGCGTCAGTGGTGCACAGAAATGGACGCC
TKC2 Forward          TATGGGTTGTTTGGTGTCGAGTGTTTTGTCACCCCGCAAGGT
TKC2 Reverse          ACCTTGCGGGGTGACAAAACACTCGACACCAAACAACCCATA
TKC3 Forward          TCTCAAGACTCTCAGAGTTTGAGCTGCATCTGCGGGCGATT
TKC3 Reverse          AATCGCCCGCAGATGCAGCTCAAACTCTGAGAGTCTTGAGA
TKC4 Forward          GATTTGCAGATTCGTTTATTTGACAAAGAAGTCCGTCCGGGG
TKC4 Reverse          CCCCGGACGGACTTCTTTGTCAAATAAACGAATCTGCAAATC
TKC5 Forward          GATGGCAGCCGTCGTCTGGGGCATCTGAATTTGACCGACAGC
TKC5 Reverse          GCTGTCGGTCAAATTCAGATGCCCCAGACGACGGCTGCCATC
KTBLKKC1 Forward      GATGGCAGCACCGTGTTTTATCCAGTAGGTCATCGCCAGGAA
KTBLKKC1 Reverse      TTCCTGGCGATGACCTACTGGATAAAACACGGTGCTGCCATC
KTBLKKC2 Forward      GTGGGCGTGATGGCGATGGAGCTATTTGTCTGTGGTGATGAG
KTBLKKC2 Reverse      CTCATCACCACAGACAAATAGCTCCATCGCCATCACGCCCAC
KTBLKKC3 Forward      GGTGCCAGCATCAGCCAGTTTGCCCTGCATGTACGTGCCTTC
KTBLKKC3 Reverse      GAAGGCACGTACATGCAGGGCAAACTGGCTGATGCTGGCAC
KTBLKKC4 Forward      CTGGTGCATCTGCACTGGTACGGTAAGCCGGAAATTGATGGC
KTBLKKC4 Reverse      GCCATCAATTTCCGGCTTACCGTACCAGTGCAGATGCACCAG
KTBLKKC5 Forward      GATGGCAGCCGTCGTCTGGGGCATCTGAATTTGACCGACAGC
KTBLKKC5 Reverse      GCTGTCGGTCAAATTCAGATGCCCCAGACGACGGCTGCCATC
KLT Forward           GTTAAGCGTCGCACTAGTAGTTCTGGTCGCGGTCAATGG
KLT Reverse           CCATTGACCGCGACCAGAACTACTAGTGCGACGCTTAAC
TLK Forward           GTAAAACCGGTGATGGGCGGTTATGACGGCAAGGGGCAGAC
TLK Reverse           CGTCTGCCCCTTGCCGTCATAACCGCCCATCACCGGTTTTAC

The primary purification of the overexpressed hybrid proteins was performed by affinity chromatography, utilizing the specific binding between the Ni-NTA resin and the C-terminal His tag. Harvested cells were suspended in 40 ml of lysis buffer (50 mM sodium phosphate buffer, pH 8.0, 300 mM NaCl, and 10 mM imidazole) containing 50 mg/ml complete protease inhibitor and 1 mg/ml egg white lysozyme (Sigma) and subjected to two rounds of freezing and thawing, followed by sonication. After centrifugation, the supernatant was loaded onto a Ni-NTA resin column pre-equilibrated with lysis buffer and washed with 12 column volumes of wash buffer (20 mM imidazole in lysis buffer). The His-tagged protein was eluted with 4 column volumes of elution buffer (250 mM imidazole in lysis buffer), and 1 ml fractions were collected. The fractions containing the desired proteins, as verified by SDS-PAGE, were pooled and dialyzed overnight against fresh lysis buffer to remove imidazole. The protein solution was then loaded onto a HiTrap™ DEAE column (5 ml bed volume) pre-equilibrated in buffer X (20 mM Tris-HCl pH 7.4, 100 mM NaCl). Eluted with a linear gradient of 0 to 1 M NaCl, the protein came out at 0.3 to 0.4 M NaCl. Fractions containing the desired proteins were verified by SDS-PAGE and pooled accordingly. The purified protein, at greater than 95% homogeneity, was dialyzed to remove the salts and concentrated using Centricon spin filters (MWCO 10K, Amicon, Bedford, MA), then stored frozen at -70°C. Protein concentrations were determined by the Bradford assay with BSA as the standard.

Kinetic Assay for the PurK activity

The PurK activities of the wild type PurK and KTBLK with a C-terminal His tag were measured by monitoring the production of ADP via the coupling reactions of pyruvate kinase and lactate dehydrogenase (PK/LDH), as previously described with some modifications (Mueller, 1994). In a total volume of 100 µl, 100 mM HEPES pH 8.0, 20 mM KCl, 8 mM MgCl2, 2 mM PEP, 0.2 mM NADH, and 10 units of PK/LDH were combined and allowed to incubate at 25°C. The reactions were initiated by the addition of enzymes, and the NADH consumption, followed as the decrease in absorbance at 340 nm (ε = 6.22 mM⁻¹ cm⁻¹), was monitored on a Cary I spectrophotometer (Varian, Palo Alto, CA). The kinetic parameters were determined in triplicate over a concentration range of 5-40 µM AIR and 5-100 µM ATP. By varying the concentration of one substrate at saturating concentrations of the other substrates, the Michaelis constants for each substrate were determined by fitting to a double-reciprocal plot using the program KaleidaGraph from Synergy Software.
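The two calculations behind this assay, converting the A340 slope into a rate and extracting Km from a double-reciprocal plot, can be sketched in Python as follows. This is a minimal illustration and not the KaleidaGraph procedure used in this work: it assumes a 1 cm light path, one NADH oxidized per ADP formed, and uses synthetic noise-free data generated from arbitrary parameters (Km = 50 µM, Vmax = 10 µM/min). The function names are hypothetical.

```python
import numpy as np

EPSILON_NADH_MM = 6.22  # mM^-1 cm^-1 at 340 nm, value given in the text


def rate_from_absorbance(delta_a340_per_min, path_cm=1.0):
    """Convert the A340 slope (per min) into µM ADP formed per min.

    Assumes one NADH is oxidized per ADP produced (PK/LDH coupling)
    and a 1 cm light path unless stated otherwise.
    """
    mm_per_min = abs(delta_a340_per_min) / (EPSILON_NADH_MM * path_cm)
    return mm_per_min * 1000.0  # mM -> µM


def lineweaver_burk(substrate_um, rates):
    """Fit 1/v = (Km/Vmax)(1/[S]) + 1/Vmax and return (Km, Vmax)."""
    inv_s = 1.0 / np.asarray(substrate_um, dtype=float)
    inv_v = 1.0 / np.asarray(rates, dtype=float)
    slope, intercept = np.polyfit(inv_s, inv_v, 1)
    vmax = 1.0 / intercept
    return slope * vmax, vmax


# Synthetic, noise-free data generated from Km = 50 µM and Vmax = 10 µM/min.
substrate = np.array([5.0, 10.0, 20.0, 40.0, 100.0])
velocity = 10.0 * substrate / (50.0 + substrate)
km_fit, vmax_fit = lineweaver_burk(substrate, velocity)
print(f"Km = {km_fit:.1f} µM, Vmax = {vmax_fit:.1f} µM/min")
```

With real, noisy data the double-reciprocal transform weights low-substrate points heavily, so a direct nonlinear fit of the Michaelis-Menten equation is a common alternative.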

3.3 Results and Discussion

Construction of chimeric protein libraries between PurT and PurK by ITCHY

The chimeric protein libraries between PurT and PurK were constructed using the incremental truncation for the creation of hybrid enzymes (ITCHY) technology, because the 48% identity at the DNA level between the purT and purK genes excludes the use of annealing-based methods to generate crossovers between these two genes. ITCHY was chosen for its ability to create chimeric proteins with crossovers at every nucleotide between two parental genes, regardless of their DNA sequence homology (Figure 6) (Ostermeier, 1999a, b). On the other hand, ITCHY technology has its own limitations. (1) To cover all possible positions that could generate functional chimeric proteins, two ITCHY libraries have to be set up, each with one parental protein sequence at the N-terminus followed by the other. (2) Only one crossover can be introduced between two parental genes by ITCHY. Poor solubility due to incorrect folding or structural disturbance between parts from different proteins is a major issue usually associated with single-crossover chimeric proteins, and it might cause the loss of potential candidates. New experimental protocols and computational algorithms have been designed with the intention of solving these problems (Lutz, 2001b; Kawarasaki, 2003; Saraf, 2004; Griswold, 2005).

Two PurT and PurK ITCHY libraries were constructed using THIO-ITCHY, which was developed from the original two-vector-based ITCHY (Figure 16) (Lutz, 2001a). The KT ITCHY library was constructed between the N-terminal gene fragment of PurK (PurK[1-277]) and the C-terminal gene fragment of PurT (PurT[112-392]) (Figure 17A), and the TK ITCHY library was constructed between the N-terminal gene fragment of PurT (PurT[1-319]) and the C-terminal gene fragment of PurK (PurK[79-355]) (Figure 17B). For both libraries, the overlapping region between the two parental genes consists of about 200 amino acid residues, which cover the B-domains and most of the C-domains. The A-domain was excluded from the overlapping region because of the structural and functional similarity between the PurT and PurK A-domains. While both A-domains adopt a Rossmann-fold structure, the conserved residues in both A-domains are likely only involved in the binding of the ribose 5-phosphate moiety of the substrates. The last module motif in the C-domain was also excluded from the overlapping region because of the structural and functional similarity between these motifs in PurT and PurK. The conserved residues in this motif are located in the C-loop and are responsible for the binding of the phosphate group of GAR or AIR. As each gene fragment would have 1 to (200 × 3) bp truncated, an ideal library containing one member of each desired fusion would have 3.6 × 10⁵ ([200 × 3]²) members.

Figure 17: The schematic representation of the pDIM-KT vector (A) and the pDIM-TK vector (B) for ITCHY libraries, and (C) DNA agarose gel analysis of TK ITCHY libraries incorporated with different concentrations of αS-dNTP after mung bean nuclease treatment. In (A) and (B), the two gene fragments are cloned in series into pDIM vectors via NdeI and SpeI sites, and separated by a HindIII site. The gene fragments from PurK and PurT are depicted in red and blue, respectively. The pDIM vector carries an ampicillin resistance gene as selection marker (Ap) and has a ColE1 origin for replication. The fusion proteins are under control of the lac promoter. In (C): λ DNA HindIII and φX174 DNA HaeIII molecular weight markers (lane 1); 100 bp DNA ladder (lane 2); TK linear ITCHY library at a 1:6 ratio of αS-dNTP over total dNTP (lanes 3 and 4); TK linear ITCHY library at a 1:8 ratio of αS-dNTP over total dNTP (lanes 5 and 6); TK linear ITCHY library at a 1:10 ratio of αS-dNTP over total dNTP (lanes 7 and 8).
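The theoretical library size quoted for the KT and TK libraries follows from counting the truncation endpoints on each gene fragment. The Python sketch below reproduces that arithmetic, assuming, as in the text, that each fragment can be truncated at any of the 3 × N nucleotide positions within an overlap of N residues and that the two truncations are independent. The function name is illustrative.

```python
def itchy_library_size(overlap_residues: int) -> int:
    """Number of distinct single-crossover fusions for a given overlap.

    Each of the two gene fragments can lose 1 to (3 * overlap_residues) bp,
    so the number of combinations is the square of that count.
    """
    truncations_per_fragment = overlap_residues * 3  # bp positions
    return truncations_per_fragment ** 2


if __name__ == "__main__":
    size = itchy_library_size(200)
    print(f"theoretical library size: {size:.1e}")
```

Only about one third of these fusions keep the downstream fragment in frame, so the number of in-frame hybrids is correspondingly smaller than the total counted here.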

In THIO-ITCHY, α-phosphothioate nucleotides were incorporated randomly into the target DNA fragment by PCR amplification of the entire plasmid in the presence of αS-dNTP. These nucleotide analogs have been shown to protect DNA fragments from exonuclease digestion upon incorporation (Putney, 1981), thus leading to the desired variation in truncation length upon nuclease treatment. In the PCR amplification step of the linearized pDIM vectors in the presence of αS-dNTP, three ratios of αS-dNTP over total dNTP (1:6, 1:8, 1:10) were used to determine the optimal concentration of αS-dNTP for the generation of the ITCHY libraries (Figure 17C). As shown in Figure 17C, the libraries amplified in the presence of higher concentrations of αS-dNTP tended to have more DNA fragments with sizes similar to the starting vectors after exonuclease treatment. This is because, at a higher concentration of αS-dNTP, the amplified DNA fragments were more likely to have α-phosphothioate nucleotides incorporated in the early stage of the amplification, which generated longer DNA fragments upon exonuclease digestion than those obtained at a lower concentration of αS-dNTP. The 1:6 ratio of αS-dNTP over total dNTP (167 µM dNTP, 33 µM αS-dNTP) was chosen as the optimal condition to create the ITCHY libraries. After transformation into DH5α-E, both ITCHY libraries contained approximately 3 × 10⁶ independent members, and they should therefore contain all possible fusions between the two gene fragments in the overlapping region, with a library size roughly nine times larger than the theoretical library size.
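How well a library of this size covers the theoretical sequence space can be estimated with a standard sampling argument. The Python sketch below computes the oversampling factor and the expected fraction of variants present at least once, under the simplifying assumption, not tested in this work, that every fusion is equally likely to be generated (Poisson approximation). With the rounded figures quoted above (3 × 10⁶ transformants, 3.6 × 10⁵ theoretical members) the oversampling factor comes out near 8.3.

```python
import math


def oversampling(n_transformants: float, library_size: float) -> float:
    """Ratio of independent transformants to theoretical library members."""
    return n_transformants / library_size


def expected_completeness(n_transformants: float, library_size: float) -> float:
    """Expected fraction of variants sampled at least once.

    Assumes all variants are equally probable, which is an idealization
    for a real truncation library.
    """
    return 1.0 - math.exp(-n_transformants / library_size)


if __name__ == "__main__":
    fold = oversampling(3e6, 3.6e5)
    frac = expected_completeness(3e6, 3.6e5)
    print(f"oversampling: {fold:.1f}-fold, expected completeness: {frac:.4f}")
```

Because real truncation libraries are not uniformly distributed, this figure is best read as an upper bound on the actual coverage.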

The distribution of crossovers between the parental gene fragments, as well as the variation in fragment size in the naïve (unselected) libraries, was investigated by DNA sequencing of plasmids from several randomly chosen colonies (Figure 18). For the KT naïve library, seven of the characterized sequences had crossovers located in the desired sequence space, while only one sequence was outside the targeted sequence space. For the TK naïve library, five of the characterized sequences had crossovers located in the desired sequence space, while three sequences were outside the targeted sequence space. The random distribution of crossovers over the desired sequence space in the naïve libraries indicated no apparent bias toward particular regions within the parental gene fragments.

In vivo selection of functional hybrid enzymes

Both the KT and TK plasmid libraries were recovered from DH5α-E cells and transformed into the E. coli PurT(-) or PurK(-) auxotrophic strains for the selection of catalytically active hybrid enzymes that might have PurT or PurK activities. The selections were performed as described in the Materials and Methods section. Only colonies that express functional hybrid enzymes complementing the deleted chromosomal genes could grow on minimal medium. Unfortunately, all plasmids from randomly chosen colonies that grew in the selection for PurT activity contained the wild type purT gene, while all plasmids from randomly chosen colonies that grew in the selection for PurK activity contained the wild type purK gene. The causes for the recovery of the wild type genes are not clear.

Figure 18: Distribution of crossovers in the naïve KT (A) and TK (B) ITCHY libraries. The crossovers between two parental gene fragments are shown as red stars. The yellow sections outline the targeted sequence space between two parental enzymes.

In order to remove the wild type purT and purK genes, the plasmids from the colonies that grew in the first selection were recovered and subjected to restriction digestion by HincII and MscI, respectively. There is one HincII site in the purT gene at position 1122 bp and one MscI site in the purK gene at position 81 bp, both of which are outside the overlapping region of the ITCHY libraries. These two sites were chosen so that only the plasmids with the wild type genes, not those with the desired fusion DNA fragments, were linearized upon restriction digestion. After the digestion, the DNA samples were treated with calf intestinal alkaline phosphatase to prevent self-ligation of the linearized plasmids, retransformed into PurT(-) or PurK(-) strains, and selected in vivo for PurT or PurK activities. As the linearized plasmids containing the wild type genes were eliminated by the host cells after the transformation, only the circular plasmids with the desired fusion genes were retained in the auxotrophic strains for the selection. No hybrid enzymes capable of catalyzing either the PurT or the PurK reaction were identified from either library in the second in vivo selection.
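The logic of this counter-selection, cutting only those plasmids that carry a diagnostic restriction site, can be illustrated with a short site search in Python. The recognition sequences are the standard ones for HincII (GTYRAC, where Y is C or T and R is A or G) and MscI (TGGCCA). The two test sequences are short made-up strings, not the purT or purK genes, and the function names are hypothetical.

```python
import re

# Recognition sequences written as regular expressions.
SITES = {
    "HincII": "GT[CT][AG]AC",  # GTYRAC
    "MscI": "TGGCCA",
}


def find_sites(seq: str, enzyme: str) -> list:
    """Return 1-based start positions of every recognition site in seq."""
    # The lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile(f"(?={SITES[enzyme]})")
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]


def would_be_cut(seq: str, enzyme: str) -> bool:
    """True if the enzyme has at least one site in the sequence."""
    return bool(find_sites(seq, enzyme))


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    with_site = "ATGACCGTTAACGGCTTA"     # contains GTTAAC
    without_site = "ATGACCGGCTTAGGATCC"  # no HincII site
    print(find_sites(with_site, "HincII"), would_be_cut(without_site, "HincII"))
```

A fusion that has lost the region containing the site escapes digestion, which is the property exploited in the second selection.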

The failure to identify hybrid enzymes active enough to complement the auxotrophic strains might have several causes. First, the targeted sequence spaces designed in the ITCHY libraries might not be large enough to include the crossover positions that generate functional hybrid enzymes. Second, the wild type genes that emerged during the selection might have outcompeted functional hybrid proteins even after the effort to eliminate the wild type genes by restriction digestion, because the activities of the wild type proteins might be a couple of orders of magnitude higher than those of hybrid enzymes. Additionally, the sensitivity of the in vivo selection might limit the ability to detect functional hybrid enzymes with low activities. The last possible reason is that the PurT and PurK activities cannot be reconstituted among the chimeric proteins between E. coli PurT and PurK generated by ITCHY, due to the significant amino acid sequence difference between the two enzymes. In PurT and PurK, the active sites are situated at the center of the three domains, and the optimal orientation among the three substrates has been suggested to be critical for the reactions to occur (Marolewski, 1997). Because of the low amino acid sequence identity (23%) between PurT and PurK, structural clashes between the portions from the different parental proteins would be expected in the hybrid proteins. These structural clashes, especially those at the domain interfaces, might be strong enough to disrupt the relative orientation of the three substrates, and thus lead to the complete loss of the catalytic activities.

Identification of functional subdomains in each individual domain by rational method

Whole-domain swapping by rational or combinatorial methods did not generate functional hybrid enzymes between PurT and PurK in the in vivo selection, suggesting that the three domains observed in the three-dimensional structures might not have been utilized as building blocks in the evolution of PurT and PurK. The search for functional units for domain swapping therefore shifted to the next level, the subdomains within each individual domain.

The B-domain was chosen as the first domain to be investigated because of its small size and simple structure. As shown in Figure 19, the starting constructs were TKB and KTB for identifying the essential structural elements in the PurT and PurK B-domains for the catalytic activities, respectively. With the B-domain swapped from the other protein, TKB and KTB are not functional proteins, as shown previously. A series of hybrid proteins were designed rationally with gradually longer insertions of the wild type sequence into the B-domain of TKB and KTB, and the sequences that resulted in the recovery of the original activities were identified using in vivo selection. Based on structural and sequence analysis, four positions in the PurT B-domain of KTB and three positions in the PurK B-domain of TKB were chosen as crossover points. Accordingly, four hybrid proteins based on KTB and three hybrid proteins based on TKB were created, and each had the region before the crossover point replaced by the corresponding region from the B-domain of the other protein (Figure 19). In this way, progressively more wild type sequence was brought into the hybrid proteins in order to identify the regions that recover the original activity in the B-domains. These hybrid proteins were subcloned into pDIM vectors, and selected in vivo for PurK and PurT activities after transformation into E. coli auxotrophic strains (Table 9). Two out of four hybrid enzymes from KTB, KTB 3 and KTB 4, showed PurK activity by supporting growth of the E. coli PurK(-) auxotrophic strain on minimal medium, and only TKB 3 from TKB showed PurT activity by supporting growth of the E. coli PurT(-) auxotrophic strain on minimal medium. While KTB 2 did not show PurK activity in vivo, its difference from KTB 3 is a region of 13 amino acid residues (PurK T123-R135), in which five residues (G125, Y126, D127, G128, G130) are conserved in PurK (Figure 11B). In the PurK structure, these 13 amino acid residues form the B loop. A site-directed mutagenesis study of PurT has indicated that mutation of the residues in the B loop significantly affected its catalytic activity (Marolewski, 1997). Surprisingly, TKB 2, the corresponding hybrid protein of KTB 3 in the PurT B-domain, did not show PurT activity, while the PurT activity was restored in TKB 3 with the addition of an extra α-helix (PurT S170-G185). Although it lacks conserved residues, the residues in this α-helix, particularly residue W188, might play a role in confining the movement of the PurT B loop, which is Gly and Ser rich. The PurK B-domain does not contain this α-helix, but the conserved Tyr residue in the B loop might play a similar role in positioning the PurK B loop residues in the PurK active site. A new hybrid protein, TKBLHT, which has the PurT B loop and the extra α-helix in the PurK B-domain of TKB, was constructed, but it still did not show PurT activity in vivo. This finding suggested that the structure and functionality of the PurT B-domain are sensitive to sequence changes.

Figure 19: Distribution of rationally chosen crossovers in each individual domain. The sequences from PurT and PurK are shown in blue and red, respectively. The crossovers were numbered according to the domain they belong to.

Table 9: In vivo activities of the hybrid proteins with crossovers in the B-domain.

Hybrid protein   PurT activity   PurK activity   Hybrid protein   PurT activity   PurK activity
KTB 1            -               -               TKBLT            -               -
KTB 2            -               -               TKBLHT           -               -
KTB 3            -               +               KTBLK            -               +
KTB 4            -               +               KABTLT           -               -
TKB 1            -               -               TABKLK           -               -
TKB 2            -               -               KLT              -               -
TKB 3            +               -               TLK              -               -

“+” means that the hybrid proteins can support the growth of auxotrophic strains on minimal medium, while “-” means they cannot.

To appraise the importance of the B loop for PurT and PurK activities, another six hybrid proteins were created and selected in vivo for PurT and PurK activities. TKBLT, KTBLK, KABTLT, TABKLK, KLT, and TLK are the same as TKB, KTB, KABT, TABK, the wild type PurK, and the wild type PurT, respectively, differing only in carrying the B loop from the other protein. The PurT B loop includes three serine residues from position 159 to 161, and the PurK B loop has a sequence of GYDG from position 125 to 128. Out of the six hybrid proteins, only KTBLK was observed to support the growth of the PurK(-) auxotrophic strain on minimal medium, suggesting that the PurK B loop is the only element necessary for PurK activity in the PurK B-domain. On the other hand, the PurT B loop needs some other structural elements in the PurT B-domain to support its function. In both PurT and PurK, the B loop is not the only element that determines their respective activities, which require the coordination of other elements from the A- and C-domains.

In order to measure the in vitro activity of KTBLK, the wild type PurK and KTBLK were purified with a C-terminal His tag. The kinetic parameters of their PurK activity were determined by following the production of ADP utilizing the coupling reaction of PK/LDH (Table 10). Compared to the wild type PurK, KTBLK has a 7-fold decrease in kcat, a 7-fold decrease in the Km for ATP, and a 6.4-fold decrease in the Km for AIR. Judged by the specificity constants, kcat/Km (ATP) and kcat/Km (AIR), the wild type PurK and KTBLK are equally active in terms of the PurK activity. These findings clearly indicated the essential role of the B loop for PurK activity. The Km (AIR) and Km (ATP) of KTBLK are smaller than those of the wild type PurK, which might result from the shift of the initial velocity profile to a lower level due to the smaller kcat value observed in KTBLK.

Table 10: Kinetic parameters of the PurK activity of PurK and KTBLK.

enzyme   kcat (s⁻¹)     Km (ATP) (µM)   Km (AIR) (µM)   kcat/Km (AIR) (µM⁻¹·s⁻¹)   kcat/Km (ATP) (µM⁻¹·s⁻¹)
PurK     5.9 ± 0.5      48 ± 8.5        66 ± 16         0.089                      0.12
KTBLK    0.87 ± 0.02    6.8 ± 0.7       10.3 ± 1.2      0.084                      0.13
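The specificity constants and fold changes discussed above can be recomputed directly from the kcat and Km values in Table 10. The Python sketch below uses the reported mean values only and ignores the stated uncertainties. The dictionary layout and function names are illustrative.

```python
# Mean values from Table 10 (kcat in s^-1, Km in µM); errors are ignored here.
KINETICS = {
    "PurK": {"kcat": 5.9, "Km_ATP": 48.0, "Km_AIR": 66.0},
    "KTBLK": {"kcat": 0.87, "Km_ATP": 6.8, "Km_AIR": 10.3},
}


def specificity(enzyme: str, substrate: str) -> float:
    """Specificity constant kcat/Km in µM^-1 s^-1 for 'ATP' or 'AIR'."""
    params = KINETICS[enzyme]
    return params["kcat"] / params[f"Km_{substrate}"]


def fold_decrease(parameter: str) -> float:
    """Wild type PurK value divided by the KTBLK value for one parameter."""
    return KINETICS["PurK"][parameter] / KINETICS["KTBLK"][parameter]


if __name__ == "__main__":
    for enzyme in KINETICS:
        print(enzyme,
              f"kcat/Km(AIR) = {specificity(enzyme, 'AIR'):.3f}",
              f"kcat/Km(ATP) = {specificity(enzyme, 'ATP'):.2f}")
    for parameter in ("kcat", "Km_ATP", "Km_AIR"):
        print(parameter, f"{fold_decrease(parameter):.1f}-fold lower in KTBLK")
```

Because kcat and both Km values fall by a similar factor, the ratios kcat/Km remain nearly unchanged between the two enzymes.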

To identify the functional subdomains in the A- and C-domains, the initial constructs for PurT and PurK activities were the wild type PurT and KTBLK, respectively. Both are functional enzymes with the original activity. A series of hybrid proteins were designed rationally with gradually longer insertions of the sequence from the other protein into the A- and C-domains of PurT and KTBLK, and the sequences that resulted in the loss of the catalytic activities were identified using in vivo selection. At the same time, the tolerance of each catalytic activity to the incorporation of foreign sequences was also studied. According to the location of conserved amino acid residues in the A- and C-domains, three and five crossovers were chosen in the A-domains and C-domains, respectively. For each A-domain, three hybrid proteins were created with the N-terminal region before the crossovers replaced by the corresponding region from the same domain of the other protein. For each C-domain, five hybrid proteins were created with the C-terminal region after the crossovers replaced by the corresponding region from the same domain of the other protein (Figure 19). By doing so, progressively more foreign sequence was introduced into the hybrid proteins, and the regions that caused the loss of activity after being introduced were identified using in vivo selection. All these hybrid proteins were subcloned into pDIM vectors, and selected in vivo for PurK and PurT activities after transformation into E. coli auxotrophic strains (Table 11). Out of 16 hybrid proteins tested, only TAKTBLK 1 and KAT 1 retained in vivo PurK and PurT activity, respectively, which suggested that only the residues in the first β sheet in the PurT and PurK structures are interchangeable between the A-domains of these two proteins without sacrificing the original activity. TAKTBLK 1 is KTBLK with the first 7 amino acid residues in the PurK A-domain replaced by the first 18 residues of PurT. KAT 1 is the wild type PurT with the first 18 amino acid residues replaced by the first 7 amino acid residues of PurK. The loss of the original activity caused by the introduction of longer foreign sequences, some of which are shorter than 20 amino acid residues, indicated that both the A- and C-domains in PurT and PurK are sensitive to sequence changes and that even small changes might result in non-functional hybrid proteins.

Table 11: In vivo activities of the hybrid proteins with crossovers in the A- and C-domains.

Hybrid protein   PurT activity   PurK activity   Hybrid protein   PurT activity   PurK activity
TAKTBLK 1        -               +               KAT 1            +               -
TAKTBLK 2        -               -               KAT 2            -               -
TAKTBLK 3        -               -               KAT 3            -               -
KTBLKKC 1        -               -               TKC 1            -               -
KTBLKKC 2        -               -               TKC 2            -               -
KTBLKKC 3        -               -               TKC 3            -               -
KTBLKKC 4        -               -               TKC 4            -               -
KTBLKKC 5        -               -               TKC 5            -               -

“+” means that the hybrid proteins can support the growth of auxotrophic strains on minimal medium, while “-” means they cannot.

The most interesting findings came from the comparison of the in vivo selection results of two pairs of hybrid proteins, TAKTBLK 1 and TAKTBLK 2, and KAT 1 and KAT 2 (Table 11). TAKTBLK 1 is KTBLK with the first 7 amino acid residues in the PurK A-domain replaced by the first 18 residues of PurT, while TAKTBLK 2 is KTBLK with the first 14 amino acid residues in the PurK A-domain replaced by the first 25 residues of PurT. KAT 1 is the wild type PurT with the first 18 amino acid residues replaced by the first 7 amino acid residues of PurK, while KAT 2 is the wild type PurT with the first 25 amino acid residues replaced by the first 14 amino acid residues of PurK. In each pair, the two hybrid proteins differ only in a region of 7 amino acid residues, which are located in the P loop in the A-domains, but the introduction of the P loop region from the other protein led to the complete loss of the original activity. With a sequence of 8-GNGQLGR-14 in PurK and a sequence of 19-GSGELGK-25 in PurT, the amino acid sequences of the PurK and PurT P loop regions are very similar. In the three-dimensional structures, all but the second amino acid residue in the P loop of both enzymes adopt similar conformations. In the TAKTBLK 2 hybrid protein, the side chain of the S20 residue from the PurT P loop penetrates into a hydrophobic pocket formed by residues Val6, Pro29 and Val31 in the PurK A-domain. In the KAT 2 hybrid protein, the side chain of the N9 residue from the PurK P loop adopts a different conformation and protrudes towards an α-helix between D46 and A53 in the PurT A-domain, which is absent in the PurK A-domain. These unfavorable interactions due to the introduction of a foreign P loop might lead to the loss of the original activities in TAKTBLK 2 and KAT 2 by disrupting the binding pocket for ribose 5-phosphate in the A-domain of both hybrid proteins.

Identification of functional subdomains in the A- and C-domains by combinatorial method

My previous study of the functionality of the hybrid proteins constructed rationally using in vivo selection showed that the residues in PurK B loop were important

for PurK activity as KTBLK is a functional enzyme with PurK activity, while all three

domains of PurT, as well as PurK A- and C-domains were highly sensitive to sequence

changes and some sequence changes as small as 14 amino acid residues let to the loss of

the catalytic activities. The combinatorial approach was adopted to investigate the

functional subdomains in PurT and PurK A- and C- domains to explore additional

sequence spaces. 102 Four ITCHY libraries, KA, KC, TA, and TC were constructed, each designed with the targeted sequence on one specific domain. For instance, KA library was designed for PurK A-domain and KC library was designed for PurK C-domain while TA library was designed for PurT A-domain and TC library was designed for PurT C- domain. Each library started with an inactive wild type PurT or KTBLK with a truncation at the either N- or C-terminus to avoid the selection of the starting constructs in the subsequent in vivo selection step. As shown in Figure 20A, KA library was constructed between a PurT A-domain (PurT M1-T129) and a truncated KTBLK without the first 14

amino acid residues, separated by a HindIII site. TA library was constructed between a

PurK A-domain (PurK M1-T93) and a truncated PurT without the first 26 amino acid

residues, separated by a HindIII site. KC library was constructed between a truncated

KTBLK without the last 44 amino acid residues and a PurT C-domain (PurT F200-G392),

separated by a HindIII site. TC library was constructed between a truncated PurT without

the last 33 amino acid residues and a PurK C-domain (PurK F159-G355), separated by a

HindIII site. Therefore, the overlapping regions of KA, KC, TA, and TC libraries were

PurK(15-93), PurK(158-312), PurT(27-130) and PurT(200-360), respectively.

Accordingly, the theoretical ITCHY library sizes of KA, KC, TA, and TC libraries were

5.6×10⁴ ([79×3]²), 2.2×10⁵ ([155×3]²), 9.7×10⁴ ([104×3]²), and 2.3×10⁵ ([161×3]²),

respectively. Using ITCHY technology, each library contained hybrid proteins with

crossovers in one specific domain, and was subjected to in vivo selection in order to

determine if the original activity could be restored upon the addition of foreign sequences from the other protein, and thus identify the interchangeable functional subdomains between PurK and PurT. PurK activity was selected in KA and KC libraries, and PurT activity was selected in TA and TC libraries, using their respective E. coli auxotrophic strains.

Figure 20: Design of ITCHY libraries for the A- and C-domains (A), and their respective control constructs (B) (C). The sequences from PurT and PurK are shown in blue and red, respectively. The amino acid residues were numbered according to the protein they belong to.
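As an arithmetic check on the theoretical library sizes quoted above, the following minimal Python sketch recomputes the bracketed expression (N×3)² for each overlapping region. The residue ranges and the formula are taken from the text. The names `OVERLAP_REGIONS`, `region_length`, `theoretical_library_size` and `library_sizes` are illustrative and not part of the original work. Reading N×3 as the number of nucleotide positions available for truncation in each parent is an assumption.

```python
# Overlapping regions targeted by each ITCHY library, as inclusive residue
# ranges quoted in the text (the dictionary name is illustrative).
OVERLAP_REGIONS = {
    "KA": (15, 93),    # PurK(15-93)
    "KC": (158, 312),  # PurK(158-312)
    "TA": (27, 130),   # PurT(27-130)
    "TC": (200, 360),  # PurT(200-360)
}


def region_length(first: int, last: int) -> int:
    """Number of residues in an inclusive residue range."""
    return last - first + 1


def theoretical_library_size(n_residues: int) -> int:
    """Library size as written in the text: (N x 3) squared.

    Assumption: N x 3 counts the nucleotide positions of the overlapping
    region, and squaring accounts for the two parental fragments.
    """
    return (n_residues * 3) ** 2


def library_sizes() -> dict:
    """Theoretical size of each library, keyed by library name."""
    return {
        name: theoretical_library_size(region_length(first, last))
        for name, (first, last) in OVERLAP_REGIONS.items()
    }


if __name__ == "__main__":
    for name, size in library_sizes().items():
        print(f"{name}: {size} (about {size:.1e})")
```

The computed values round to the figures given in the text for all four libraries.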

Before the construction and in vivo selection of ITCHY libraries, several control hybrid proteins were created for each library, and their functionalities were examined in vivo, with the purpose to minimize false positive rates of the ITCHY libraries (Figure

20B&C). As shown in Figure 20B, three control hybrid proteins were constructed for each library. The first control was the wild type PurT or KTBLK with truncation at the N-

or C-terminus, the second control was the truncated wild type PurT or KTBLK with the

domain of interest replaced by the corresponding domain from the other protein, and the

last control was the truncated wild type PurT or KTBLK with the insertion of the

corresponding region of the truncated sequence from the other protein to make a full

length protein. For instance, three control constructs for KA library were a KTBLK with a deletion of the first 14 amino acid residues, a TABKLK, and a KTBLK with the first 14

amino acid residues replaced by the first 26 amino acid residues of PurT, as shown in

Figure 20C. For each library, these three control hybrid proteins should not be functional in vivo, to minimize the numbers of functional hybrid proteins with crossovers in the proximity of domain boundaries and maximize the chance to identify functional hybrid proteins with crossovers in the middle of targeted sequence spaces.

Additionally, based on PurT and PurK structural superposition, PurT has an extra strand with 10 amino acid residues at the N-terminus in the A-domain, while PurK has an extra α-helix with 15 amino acid residues at the C-terminus in the C-domain (Figure 9).

To determine the effects of these extra structural elements with respect to the protein activities, one additional control construct was created for each library as shown in Figure 20C, with a truncation or an insertion at the N- or C-terminus of the wild type PurT and

KTBLK. This control construct should be a functional enzyme in vivo in order to identify functional hybrid proteins from the selection of its ITCHY library, because all hybrid proteins in the ITCHY libraries have the same truncation or insertion just like the control construct. After being subcloned into pDIM vectors via NdeI and SpeI sites and transformed into E. coli auxotrophic strains, the control constructs of KA and KC

libraries were selected in vivo for PurK activity, while the control constructs of TA and

TC libraries were selected in vivo for PurT activity. The selection results of the control

constructs for each library turned out to be exactly as expected, with the first three control

constructs being non-functional enzymes and the last control construct being a functional enzyme.

It is worth noting that the oligomeric state of wild type PurT has been a matter of controversy. PurT has been shown experimentally to exist as a monomer in solution (Marolewski, 1994), in contrast to the dimeric structure determined by crystallographic study (Thoden, 2000). In the PurT dimer interface, residues Thr2 – Ala12 of one monomer were suggested to form an anti-parallel β-sheet with residues Val335 – Gln341 of the other monomer (Thoden, 2000). The fact that the truncated PurT∆11 is still a functional protein with PurT activity provides further evidence for the monomeric state of PurT.

As the control experiments behaved exactly as expected, four ITCHY libraries designed for the A- and C-domains of PurT and PurK were generated by the THIO-ITCHY methodology with a 1:6 ratio of αS-dNTP to total dNTP (167 µM dNTP, 33 µM αS-dNTP) (Lutz, 2001a). After transformation into the E. coli DH5α-E strain, each ITCHY library contained (3-36)×10⁶ independent members, more than ten times larger than the theoretical library sizes. The distribution of crossovers between the parental gene fragments, as well as the variation in fragment size in the naïve (unselected) libraries, was investigated by DNA sequencing of plasmids from several randomly picked colonies

(Figure 21). For KA naïve library, three of the characterized sequences had crossovers

located in the desired sequence space while two sequences were outside the targeted

sequence space. For TA naïve library, five of the characterized sequences had crossovers

located in the desired sequence space while only one sequence was outside the targeted

sequence space. For KC naïve library, five of the characterized sequences had crossovers

located in the desired sequence space while only one sequence was outside the targeted

sequence space. For TC naïve library, all seven of the characterized sequences had

crossovers located in the desired sequence space. The random distribution of crossovers

over desired sequence space in the naïve libraries indicates no apparent bias toward

particular regions within the parental gene fragments.
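The numbers reported for the naïve libraries can be tallied with a short Python sketch, given here only as a bookkeeping aid. The clone counts, theoretical sizes, and concentrations are copied from the text. The function and constant names are hypothetical. Treating 167 µM as the unmodified dNTP concentration, so that the total is 200 µM, is an assumption.

```python
# Naive-library sequencing results quoted in the text:
# (crossovers inside the targeted region, clones sequenced).
NAIVE_CLONES = {"KA": (3, 5), "TA": (5, 6), "KC": (5, 6), "TC": (7, 7)}

# Theoretical library sizes quoted in the text.
THEORETICAL_SIZES = {"KA": 5.6e4, "KC": 2.2e5, "TA": 9.7e4, "TC": 2.3e5}

# Lower bound of the experimental library size range, (3-36) x 10^6.
MIN_EXPERIMENTAL_SIZE = 3e6


def thio_fraction(dntp_um: float = 167.0, thio_dntp_um: float = 33.0) -> float:
    """Fraction of alpha-S-dNTP in the total nucleotide pool.

    Assumption: the 167 uM figure refers to unmodified dNTP only.
    """
    return thio_dntp_um / (dntp_um + thio_dntp_um)


def pooled_in_target() -> tuple:
    """Total in-target crossovers and total sequenced clones, all libraries."""
    hits = sum(h for h, _ in NAIVE_CLONES.values())
    total = sum(t for _, t in NAIVE_CLONES.values())
    return hits, total


def worst_case_oversampling() -> float:
    """Smallest experimental library divided by the largest theoretical one."""
    return MIN_EXPERIMENTAL_SIZE / max(THEORETICAL_SIZES.values())


if __name__ == "__main__":
    hits, total = pooled_in_target()
    print(f"alpha-S-dNTP fraction: {thio_fraction():.3f}")
    print(f"in-target crossovers: {hits} of {total}")
    print(f"worst-case oversampling: {worst_case_oversampling():.1f}x")
```

With only five to seven clones sequenced per library, the per-library fractions are rough estimates, so the sketch reports the pooled count alongside the worst-case oversampling factor.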

All of the plasmid libraries were recovered from DH5α-E. KA and KC libraries were transformed into the E. coli PurK(-) auxotrophic strains for the selection of the catalytically active hybrid enzymes with PurK activity while TA and TC libraries were transformed into the E. coli PurT(-) auxotrophic strains for the selection of the

catalytically active hybrid enzymes with PurT activity. The selections were performed as

described in the Materials and Methods section. Only colonies which express functional

hybrid enzymes to complement the deleted chromosomal genes could grow on minimal

medium. No functional hybrid enzymes were identified from KA and TC libraries based on in vivo selection, indicating that both the PurK A-domain and the PurT C-domain are sensitive to sequence changes and that even small changes might result in non-functional hybrid proteins.

Figure 21: Distribution of crossovers in the naïve KA (A), TA (B), KC (C), and TC (D) ITCHY libraries. The crossovers between two parental gene fragments are shown as red stars. The yellow sections outline the targeted sequence space between two parental enzymes.

Only one functional hybrid protein with PurT activity that was an exactly aligned fusion was identified from the TA library, and its sequence was determined by DNA sequencing. In this hybrid protein (TA75), the first 75 amino acid residues of wild type PurT were replaced by the first 41 amino acid residues of the PurK A-domain (Figure 22). Most of the conserved residues in this region between PurT and PurK are located in the P loop, whose residues are involved in forming a binding pocket for the ribose 5-phosphate moiety of GAR. Based on the structural similarity of this region between PurT and PurK, these amino acid residues could be divided into two groups. The first group includes residues T13 – R43 in PurT and residues M1 – L32 in PurK, which adopt the same superimposable structure, a β-sheet–loop–helix–β-sheet. The residues in the second group adopt different structures in each protein: a helix-loop-helix structure formed by residues Y44 – E74 in PurT and a loop structure formed by residues D33 – V39 in PurK. In the TA75 hybrid protein, this helix-loop-helix structure in PurT was replaced by the loop structure in PurK without losing the capability of binding GAR, suggesting that it is not necessary for GAR binding. The reasons for the occurrence of this helix-loop-helix motif in the PurT structure are not clear. In addition, the second enzyme in the purine biosynthetic pathway, PurD, also has this helix-loop-helix structure in its A-domain, suggesting that PurT and PurD are closer in evolutionary relation than PurT and PurK.

Figure 22: Functional subdomains in PurK A-domain (A) and PurT C-domain (B). PurT and PurK three-dimensional structures are displayed in solid-ribbon representation. The substrates in complex with enzymes are displayed in a ball-and-stick representation. The functional subdomains are displayed in blue. The big arrows represent the direction of domain swapping. The structure-based alignments of subdomain sequences are also displayed. Conserved residues in each enzyme are marked in green. Identical residues between different enzymes are marked by red asterisks. AMPPNP, 5’-adenylyl imidodiphosphate.

From the KC library, two functional hybrid proteins with PurK activity that were exactly aligned fusions were isolated, and their sequences were determined by DNA sequencing. One hybrid protein (KC295) has the residues 295 – 355 in PurK C-domain replaced by residues 336 – 392 in PurT C-domain, while the second hybrid protein

(KC289) has the residues 289 – 355 in PurK C-domain replaced by residues 329 – 392 in

PurT C-domain (Figure 22). In KC289, an extra β-sheet between PurT E329 and E334 was introduced, and it might have a potential role to locate the residues in the C loop into their optimal positions by forming an anti-parallel β-sheet with the C loop. As mentioned previously, the residues in the C loop are believed to interact with the phosphate group of

GAR. In PurK, residue Y304 that is at the N- terminus of the C loop might play a similar role in positioning the C loop by protruding its big side chain on the top of the C loop.

The functional subdomains identified so far are all involved in forming a binding pocket for the ribose 5-phosphate moiety, which is common to all the substrates in the purine biosynthetic pathway. These results suggest the existence of a common ribose 5-phosphate binding motif in the enzymes of the purine biosynthetic pathway, which is also supported by crystallographic study. The failure to swap these functional subdomains in the other direction might be explained by unfavorable interactions that were strong enough to exceed the tolerance limit for a functional enzyme.

3.4 Conclusions:

The attempt to identify the functional domains and subdomains in PurT and PurK using rational and combinatorial methods generated a series of functional hybrid proteins (Figure 23). Using rational methods, one loop region with the sequence 125-GYDG-129 in the PurK B domain was identified as important for PurK activity. While PurT has a conserved 159-SSS-161 sequence at the corresponding region, this discrepancy in sequence obviously was the result of the . Using the combinatorial method, it was shown that PurK activity could tolerate a sequence change as high as 36% (127 of 355 amino acid residues substituted by PurT sequence in KC288), while PurT activity could only tolerate a 19% sequence change (74 of 392 substituted by PurK sequence in TA75). The relatively low tolerance of sequence change in PurT suggests that PurT activity might have developed after the appearance of PurK activity. Additionally, the interchangeable functional structural motifs between PurT and PurK identified by the combinatorial method are involved in the binding of the ribose 5-phosphate moiety of the substrates, which is in accord with the speculation that a common structural motif for ribose 5-phosphate exists in every enzyme of the purine biosynthetic pathway (Kappock, 2000).

Figure 23: Functional hybrid proteins identified by the rational methods (A) and combinatorial methods (B). The sequences from PurT and PurK are shown in blue and red, respectively. The amino acid residues were numbered according to the protein they belong to. The hybrid proteins with the same function were compiled together and labeled at the left.
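The two tolerance percentages quoted in this section follow directly from the residue counts given in parentheses. The minimal Python sketch below reproduces that arithmetic. The helper name `percent_substituted` is hypothetical, and the counts are copied from the text without independent verification.

```python
def percent_substituted(n_substituted: int, protein_length: int) -> float:
    """Percentage of residues replaced by sequence from the other parent."""
    return 100.0 * n_substituted / protein_length


if __name__ == "__main__":
    # PurK activity: 127 of 355 residues substituted by PurT sequence.
    print(f"PurK hybrid: {percent_substituted(127, 355):.1f}%")
    # PurT activity: 74 of 392 residues substituted by PurK sequence.
    print(f"PurT hybrid: {percent_substituted(74, 392):.1f}%")
```

Rounded to whole numbers, these values agree with the 36% and 19% stated above.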

During the search for functional domains in PurT and PurK, it was difficult for rational design to identify a suitable crossover, and thus a functional subdomain, because of the limited knowledge of protein structure-function relationships and the low sequence homology between PurT and PurK. The combinatorial methods were better suited to this task because of their ability to explore vast sequence space, and indeed succeeded in identifying two functional subdomains.

However, generally speaking, both PurT and PurK activities are sensitive to sequence and structural changes, based on the results from domain swapping using rational and combinatorial methods. In rational design, some changes of only several amino acid residues led to the complete loss of catalytic activity in the hybrid proteins. No distribution of functional crossovers along the overlapping region was observed in the ITCHY libraries. The structural rigidity might originate from their reactions, the way they organize their active sites, and low sequence homology. In PurT and PurK reactions, each enzyme requires three substrates, generates an unstable intermediate, and has multiple enzymatic steps. Additionally, the active sites are situated in the center of the three domains in each enzyme. The reactions only occur when all three substrates and the catalytic residues are in the correct positions. Any small structural change that affects the optimal orientation of these elements would have a significant impact on protein activity. The restriction to a single crossover per hybrid protein, imposed by the methodologies used for domain swapping, might also contribute to the failure to generate functional hybrid proteins, given the low sequence homology between PurT and PurK. The unfavorable interactions between parts from different proteins in the hybrid proteins might lead to the disruption of the active sites, or to low solubility. The low sequence homology between PurT and PurK was believed to be a result of the accumulation of mutations to optimize each activity after their divergence.

Chapter 4

Interconversion of enzymatic activities between N-acetylneuraminate lyase (NAL) and dihydrodipicolinate synthase (DHDPS), two (β/α)8 barrel proteins

4.1 Introduction:

4.1.1 The (β/α)8 barrel proteins

Since the discovery of the archetypal structure of triosephosphate isomerase (TIM) from chicken muscle (Banner, 1975), the (β/α)8 barrel fold has become the most commonly observed fold in enzymes (Figure 24) (Gerlt, 2003). With nearly 900 (β/α)8 barrel

structures in the , the (β/α)8-barrel proteins account for approximately

10% of all enzymes with known molecular structure, exceeding any other known fold in terms of overall number and functional diversity (Farber, 1990). A study of protein fold usage of several microbial genomes further emphasized the abundance of the (β/α)8 barrel

fold (Gerstein, 1998; 2000).

Figure 24: The three-dimensional structure of triosephosphate isomerase (5TIM.pdb) viewed from the C-terminal end of the barrel (A) and from the side (B). The α-helices and β-sheets are displayed in red and green, respectively.

A canonical (β/α)8 barrel consists of a cylindrical core of eight parallel β-strands surrounded by an outer wheel comprising typically eight α-helices (Figure 24). Some proteins display deviations from the classical (β/α)8-barrel fold, including variations in the number and orientation of β-strands, local structural distortions and lack of barrel closure (Nagano, 2002). For instance, one enolase from yeast has a βααβ(β/α)6 barrel structure (Wedekind, 1994) and quinolinic acid phosphoribosyltransferase only has a (β/α)6 fold (Eads, 1997). Additionally, many (β/α)8 barrel proteins contain extra structural elements that link to this central fold, at either the N- or C-terminus of the barrel, or in the loop segments (Pujadas, 1999). In the latest release (version 1.69, July 2005), the

Structural Classification of Proteins (SCOP) database (http://scop.mrc-lmb.cam.ac.uk/scop/) lists 31 (β/α)8 barrel fold families (Murzin, 1995). In the SCOP

database, the proteins are divided based on their amino acid sequence, structural and

functional similarities into families (clear evolutionary relationship), superfamilies

(possible common evolutionary ancestor) and folds (Murzin, 1995). Despite its eightfold

pseudosymmetry, the (β/α)8 barrel fold is believed to have fourfold symmetry on the basis of the packing of the side chains of residues in the β-sheets within the interior of the barrel, indicating that the smallest possible unit is the (β/α)2 unit (Wierenga, 2001).

With a few exceptions, such as narbonin and concanavalin B, most (β/α)8 barrel

proteins are enzymes (Vega, 2003). These enzymes catalyze a great variety of chemical

reactions, and include five of the six general classes of catalytic activities according to

the Enzyme Commission classification scheme, excluding only the (Pujadas,

1999). Additionally, a large proportion of enzymes in some metabolic pathways, such as

glycolysis and tryptophan biosynthesis, are (β/α)8 barrel proteins (Erlandsen, 2000). In

the enzymes with (β/α)8 barrel fold, the active site residues are located at the C- terminal

face of the barrel, which comprises the C- terminal ends of the β-sheets and the loops that

connect β-sheets with the subsequent α-helices. Hocker et al. suggested that the loops linking the α-helices with the subsequent β-sheets, which are located at the N-terminal face of the barrel, are important for stabilizing the barrel fold (Hocker, 2001). This

functional arrangement between the two faces of the (β/α)8 barrel allows for diverse

catalytic activities without compromising the fold stability (Petsko, 2000), and also serves as evidence of divergent evolution of the (β/α)8 barrel proteins (Farber, 1990). The widespread occurrence of the (β/α)8 barrel proteins, combined with their catalytic versatility and structural properties, accounts for the enormous interest in the (β/α)8 barrel proteins as targets in the fields of protein engineering (Hocker, 2001, 2005; Gerlt, 2003), protein evolution (Henn-Sax, 2001; Gerlt, 2001; Schmidt, 2003) and (Wu,

2002a,b; Forsyth, 2002).

4.1.2 The evolution of the (β/α)8 barrel proteins

An amino acid sequence analysis of the (β/α)8 barrel proteins from non-

homologous enzyme families does not reveal significant sequence similarity, which

precludes its utilization in the study of protein evolution. Historically, this fact led to the

suggestion that the (β/α)8 barrel proteins evolve through convergent paths in order to

obtain a stable protein fold (Lesk, 1989). However, continual accumulation of structural

and sequence information supports the divergent evolution theory that the (β/α)8 barrel proteins result from divergent evolution from one or several common ancestors (Farber,

1990; Reardon, 1995; Copley, 2000; Nagano, 2002; Gerlt, 2003).

Although the overall amino acid sequence homology between the (β/α)8 barrel proteins is generally low, structure-based sequence alignments reveal the presence of clusters of homologous residues among the proteins with similar reaction mechanisms, or proteins in the same or related pathways (Copley, 2000; Wise, 2002). Concentrating on the (β/α)8 barrel proteins of central metabolism, Copley and Bork utilized a protein

algorithm to demonstrate that 12 (β/α)8 barrel superfamilies in the SCOP database share a common evolutionary origin on the basis of sequence relationships detected with PSI-Blast, and proposed a phylogeny of these (β/α)8 barrel proteins of central metabolism according to their sequences, structures, and functions (Copley, 2000). By placing the known (β/α)8 barrel structures of central metabolism into

this phylogeny, they also suggested that these metabolic pathways might evolve through

molecular recruitment (Copley, 2000).

Given that the majority of the (β/α)8 barrel proteins are enzymes, and their active

sites are restricted to the C-terminal face of the barrel, divergent evolution was considered a more reasonable explanation than convergent evolution for the evolution of

the (β/α)8 barrel proteins (Farber, 1990). With more sequence and structural information

available, some common structural motifs were recognized as the more convincing

arguments to support divergent evolution of the (β/α)8 barrel proteins. In 1990, Farber and Petsko classified 17 (β/α)8 barrel structures into four families on the basis of their

structural features, and the sequence, structural, and functional similarity between

enzymes in each family suggested divergent evolution from a common ancestor (Farber,

1990).

A common phosphate binding motif was identified as the most conserved

structural motif among the (β/α)8 barrel proteins. Approximately two-thirds of the established (β/α)8 barrel protein families utilize substrates or cofactors that contain at

least one phosphate group (Nagano, 2002). First recognized in three (β/α)8 barrel

enzymes from the tryptophan biosynthetic pathway, this phosphate binding site consists

of the residues from the C-termini of the loops connecting β7-α7 and β8-α8, and the N-terminus of a short additional helix (α8’) in the loop connecting β8-α8 (Wilmanns, 1991). Some (β/α)8 barrel proteins without this phosphate binding site, such as 3-dehydroquinase, N-acetylneuraminate lyase (NAL) and dihydrodipicolinate synthase

(DHDPS), were suggested to arise from an ancestor with a phosphate binding site

(Copley, 2000).

It has been shown that almost one half of the characterized (β/α)8 barrel proteins require the presence of metal ions for catalysis (Nagano, 2002). In some enzymes from the enolase family, the metal ion binding sites are highly conserved (Hasson, 1998;

Chaudhuri, 2003). However, in terms of positions, types and numbers of amino acid residues, these metal ion binding sites are more versatile than the conserved phosphate binding site (Nagano, 2002). In addition, some conserved patterns of catalytic residues in terms of their positions were also observed from different families (Nagano, 2002).

The study of the (β/α)8 barrel proteins in metabolic pathways using directed evolution technologies also provided some insightful evidence to support divergent evolution of the (β/α)8 barrel proteins. HisA and HisF catalyze two successive reactions

in the histidine biosynthetic pathway while TrpF is involved in the tryptophan

biosynthetic pathway. These three enzymes belong to the (β/α)8 barrel proteins, and their

three-dimensional structures are superimposable with each other (Priestle, 1987; Lang,

2000). Both HisA and TrpF catalyze an Amadori rearrangement, which is the irreversible

isomerization of an aminoaldose into an aminoketose (Figure 25). HisF catalyzes a ring

closure reaction that yields imidazole glycerol phosphate (ImGP) and 5-aminoimidazole-

4-carboxamide ribotide (AICAR), which is involved in the de novo purine biosynthesis

(Figure 25).

Figure 25: Reactions catalyzed by phosphoribosylanthranilate (PRA) isomerase (TrpF), indole-3-glycerol phosphate (IGP) synthase (TrpC), phosphoribosylformimino-5-amino-1-phosphoribosyl-4-imidazole carboxamide (ProFAR) isomerase (HisA), and imidazole glycerol phosphate (ImGP) synthase (HisF). TrpF and TrpC, HisA and HisF catalyze two successive reactions in the tryptophan and histidine biosynthetic pathway, respectively. All four enzymes are (β/α)8 barrel proteins. TrpF and HisA catalyze the Amadori rearrangement of the aminoaldoses PRA and ProFAR into the aminoketoses CdRP and PRFAR. TrpC and HisF catalyze ring closure reactions that yield IGP, ImGP and AICAR. The ammonia molecule necessary for the HisF reaction is provided by the glutaminase (HisH) (not shown). PRA, phosphoribosylanthranilate; CdRP, 1-(2-carboxyphenylamino)-1-desoxyribulose-5-phosphate; IGP, indole-3-glycerol phosphate; ProFAR, N’-[(5’-phosphoribosyl)formimino]-5-aminoimidazole-4-carboxamide ribonucleotide; PRFAR, N’-[(5’-phosphoribulosyl)formimino]-5-aminoimidazole-4-carboxamide ribonucleotide; AICAR, 5-aminoimidazole-4-carboxamide ribotide; ImGP, imidazole glycerol phosphate. (Adapted from (Leopoldseder, 2004)).

A single mutation (D127V) in HisA from Thermotoga maritima was found to be sufficient to convert it into a protein with TrpF activity at the complete loss of HisA activity, though the TrpF activity observed in the HisA mutant is 10⁵-fold lower than that of wild type TrpF (Jurgens, 2000). The amino acid sequence identity between HisA and

TrpF is only 11%. The D127V mutation was believed to change substrate specificity of the enzyme (Jurgens, 2000). The corresponding D130V substitution in HisF from

Thermotoga maritima was also able to confer TrpF activity. The TrpF activity of this HisF(D130V) mutant is more than 10⁶-fold lower than that of wild type TrpF

(Leopoldseder, 2004). Saturation random mutagenesis study at position 127 of HisA and

position 130 of HisF suggested that removal of the negative charge introduced by the Asp

residue in these two positions is sufficient to establish a TrpF activity in HisA and HisF

(Leopoldseder, 2004). Binding studies of the TrpF substrate phosphoribosylanthranilate

(PRA) to wild type HisA and HisA(D127V) showed that the binding affinity of

HisA(D127V) to PRA was more than ten fold stronger than that of wild type HisA,

indicating that the generation of TrpF activity in HisA(D127V) might be caused by improved binding of the negatively charged TrpF substrate, PRA, following the removal of an Asp

residue (Leopoldseder, 2004).

Another successful example of changing substrate specificity between the (β/α)8 barrel proteins came from a study of members of the enolase superfamily containing the

(β/α)8 barrel fold. Enzymes from the enolase superfamily catalyze different reactions

using a conserved partial reaction, the abstraction of the α-proton of the carboxylate

substrate by an active site base to generate an enolate anion intermediate that is stabilized

by coordination to the essential Mg²⁺ ion (Gerlt, 2005). It has been demonstrated that the L-Ala-D/L-Glu epimerase from E. coli (AEE) mutant with the single mutation D297G and the muconate lactonizing enzyme II from Pseudomonas sp. P51 (MLE II) mutant with the single mutation E323G are able to catalyze the o-succinylbenzoate synthase (OSBS) reaction, as well as their respective wild type reactions (Figure 26) (Schmidt, 2003). Neither wild type progenitor catalyzes the OSBS reaction. Although all three enzymes belong to the MLE subgroup of the enolase superfamily, each of them catalyzes a different chemical reaction: 1,1-proton transfer by AEE, cycloisomerization by MLE, and β-elimination/dehydration by OSBS (Babbitt, 2000). The mutations in the AEE(D297G) mutant and MLE II(E323G) mutant are each located at the end of the β8 sheet of the

(β/α)8 barrel domain, and it has been postulated that these substitutions result in a

relaxation of substrate specificities of the wild type enzymes and allow the binding of the

OSBS substrate to the active site (Schmidt, 2003). The generation of functional

promiscuity in the (β/α)8 barrel proteins by single amino acid substitution exemplified by

these exciting findings suggests that these (β/α)8 barrel enzymes might have evolved from an ancestral enzyme with broader substrate specificity through divergent evolution, and also indicates that the (β/α)8 barrel enzymes are suitable targets for the design of new

catalytic activities.

It is widely agreed that new enzymes evolve from existing ones through the

duplication of genes encoding the existing enzymes. This allows one copy to retain the

original function while the duplicate copy undergoes mutations for divergence of

sequence and function at the same time (Jensen, 1976; O’Brien, 1999). In terms of the

relationship between the reactions catalyzed by the progenitor and the “new” enzyme,

divergent enzyme evolution has been shown to follow one of three general routes (Gerlt,

2001).

Figure 26: Reactions catalyzed by the L-Ala-D/L-Glu epimerase from E. coli (AEE), the muconate lactonizing enzyme II from Pseudomonas sp. P51 (MLE II), and the o-succinylbenzoate synthase (OSBS). Each catalyzes a different chemical reaction: 1,1-proton transfer by AEE, cycloisomerization by MLE, and β-elimination/dehydration by OSBS. (Adapted from (Schmidt, 2003)).

(1) The duplicate retains the substrate specificity of the progenitor and develops a different chemical mechanism through divergent evolution to catalyze new reactions. The evolution of the (β/α)8 barrel enzymes catalyzing the successive steps in histidine and tryptophan biosynthetic pathways is believed to follow this route (Henn-Sax, 2002).

(2) The duplicate retains the chemical mechanism of the progenitor and develops new

substrate specificity through divergent evolution to catalyze new reactions. Examples for this route include the members in the (β/α)8 barrel-containing enolase superfamily that share a conserved reaction mechanism but catalyze different reactions due to different substrate specificities (Babbitt, 1996).

(3) The duplicate only retains the active site architecture of the progenitor and develops both new substrate specificity and an altered chemical mechanism to catalyze a new reaction. Orotidine 5’-monophosphate decarboxylase and 3-keto-L-gulonate 6-phosphate

decarboxylase, two (β/α)8 barrel enzymes, are the only known examples for this route

(Wise, 2002). Although these two enzymes share a conserved active site architecture and a similar three-dimensional structure, they catalyze different reactions in terms of substrate specificity and reaction mechanism.

Recently, striking findings about the evolution of the (β/α)8 barrel suggested that even the (β/α)8 barrel structure itself might have evolved from gene

duplication and fusion of (β/α)4 half-barrels. The amino acid sequences and three-

dimensional structures of HisA and HisF have been revealed to have an internal two-fold

symmetry (Fani, 1994; Lang, 2000). As this sequence duplication is not easily observable

in other closely related (β/α)8 barrel proteins, HisA and HisF are likely the most

reminiscent of the hypothetical common ancestor of all the (β/α)8 barrel proteins, since HisA and HisF have retained ancestral features that have been obscured in other proteins (Copley, 2000). On the basis of these findings, it was postulated that the two enzymes have arisen from a common (β/α)4 half-barrel precursor through a series of gene duplication events (Lang, 2000). In an effort to support this hypothesis, the (β/α)8 barrel structures of HisA and HisF were expressed separately as the N- and C-terminal (β/α)4 half-barrels. The N- and C-terminal (β/α)4 half-barrels of HisF form well-defined

secondary and tertiary structures, and exist predominantly as dimers when expressed

separately. Co-expression in vivo or joint refolding of these two half barrels in vitro led

to assembly of a catalytically fully active heterodimeric complex that exhibits well-defined secondary and tertiary structures similar to those of wild type HisF (Hocker, 2001).

Moreover, a fusion protein with the HisA C-terminal (β/α)4 half-barrel followed by the HisF N-terminal (β/α)4 half-barrel, as well as a fusion protein generated by duplication of the gene encoding the HisA C-terminal (β/α)4 half-barrel, can also form well-defined secondary and tertiary structures similar to that of wild type HisF (Hocker, 2004). Besides

supporting the hypothesis that the (β/α)8 barrel structure might evolve from gene

duplication and fusion of (β/α)4 half-barrels, these results also highlighted a promising

scenario for directed evolution of new enzymatic activities: generation of novel functions

in the (β/α)8 barrel proteins by swapping of (β/α)4 half-barrels with distinct functional

properties (Henn-Sax, 2001).

4.1.3 Protein engineering of the (β/α)8 barrel proteins

The (β/α)8 barrel fold has been demonstrated to be a suitable scaffold for the evolution of novel enzymatic functions by either nature or protein engineers, mostly due

to its structural features. Even small changes in the (β/α)8 barrel proteins might be able to

affect their catalytic activities, such as single site mutations around the active site (Cheon,

2004), or deletion of two or three amino acid residues in the loop region (Norledge,

2001).

On the basis of the hypothesis that the (β/α)8 barrel structure might evolve from

gene duplication and fusion of (β/α)4 half-barrels, split-protein sensors, which can be

used for the analysis of protein–protein interactions in living cells, were developed by

splitting the phosphoribosyl anthranilate isomerase (Trp1p) from Saccharomyces

cerevisiae (Figure 27) (Tafelmeyer, 2004). A Trp1p library with random new N- and C- terminal ends was created by circular permutation. The library DNA fragments were ligated into the expression vector so that two polypeptides that can form an antiparallel coiled coil upon association were fused with an N- and C-terminal fragment of Trp1p upon expression. A DNA fragment containing a terminator sequence and a promoter was inserted between the original N- and C- termini of Trp1p for the separated expression of

the N- and C- terminal fragments of Trp1p. Upon the formation of coiled coil structure,

the N- and C-terminal fragments of Trp1p are forced into a tight and functional complex

that can complement an auxotrophic yeast strain EGY48 on medium without tryptophan.

Four split-Trp1ps were identified to be suitable to sense protein interaction in the cytosol,

two of which were also able to detect the interaction of membrane proteins. The position of one of the splits (Trp44) was found to be very close to the active site while in another variant (Trp204), a loop of eight highly conserved residues was deleted that was thought to be important for binding of the phosphate group of the substrate. These positions have not been predicted previously to tolerate disruption.

Figure 27: Schematic overview of the generation of Split-Trp1p proteins. As a starting point, a rearranged copy of the TRP1 gene (red) was created in which the original N and C termini of Trp1p were connected by a short linker containing a unique AvrII site. The linear fragment was incubated with T4 DNA ligase to circularize the gene. Treatment of the ligation mix with DNaseI resulted in randomly cut linear molecules and fragments corresponding to the size of TRP1 were isolated. Isolated fragments were cloned into a yeast expression vector containing two polypeptides that associate into an antiparallel coiled coil (green and blue boxes). It should be noted that due to the blunt-end cloning step, the majority of the clones will carry TRP1 fragments that are out of frame with one or both of the polypeptides that form the coiled coil or will be inserted in a wrong orientation into the plasmid. in yeast cells was used to insert a terminator sequence and the PGAL1 promoter (light gray box, see text for details) between the original N and C termini of Trp1p. Coexpression of the two fragments and selection for complementation of tryptophan auxotrophy of yeast cells allowed the isolation of functional split-Trp pairs. (Adapted from (Tafelmeyer, 2004)).
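The relationship between a circular permutant and the corresponding split pair can be sketched in a few lines of Python. This is only an illustration on a toy sequence, not the cloning procedure of Tafelmeyer (2004); the function names, the toy sequence, and the linker are hypothetical.

```python
def circular_permutant(seq: str, linker: str, cut: int) -> str:
    """Join the original C and N termini through a linker and reopen the
    sequence after residue `cut`, so residue cut+1 becomes the new N terminus."""
    if not 0 < cut < len(seq):
        raise ValueError("cut must fall inside the sequence")
    return seq[cut:] + linker + seq[:cut]


def split_pair(seq: str, cut: int) -> tuple[str, str]:
    """Fragments obtained when the permutant is interrupted at the linker:
    the original N-terminal part (1..cut) and C-terminal part (cut+1..end)."""
    if not 0 < cut < len(seq):
        raise ValueError("cut must fall inside the sequence")
    return seq[:cut], seq[cut:]


toy = "MSVINFTGSS"  # hypothetical 10-residue sequence, not Trp1p
print(circular_permutant(toy, "gg", 4))  # NFTGSSggMSVI
print(split_pair(toy, 4))                # ('MSVI', 'NFTGSS')
```

Each cut position in the circularized gene therefore defines one candidate split site, which is why a single randomly cut library can sample split positions across the whole protein.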

To lay a groundwork for engineering novel protein functions using the (β/α)8 barrel scaffold, the amino acid sequence mutability at 182 out of 250 positions in yeast

TIM was examined with respect to the protein functionality (Silverman, 2001). Based on an alignment of 43 unique TIM sequences from a wide range of species, positions that maintained a single amino acid or class of amino acids in ≥ 75% of the aligned sequences were designated as phylogenetically hydrophobic or polar, and positions that did not conserve any amino acid or physical property were designated as phylogenetically variable.

Therefore, each position in the (β/α)8 barrel was assigned to one of three groups according to the conservation of amino acid identity or physical properties across the aligned sequences at that position: phylogenetically hydrophobic, polar, or variable. On the basis of this phylogenetic alignment, degenerate libraries

using only seven amino acids were constructed by the use of oligonucleotide assembly and gene shuffling, followed by the functional selection in a TIM-deficient E. coli

auxotrophic strain. It was observed that residues at the interface between β-sheets and α-helices, turn sequences, α-helix capping and α-helix stop motifs were extremely mutable.

In contrast, residues forming the central core of the β-barrel, β-sheet stop motifs, as well

as a buried salt bridge were sensitive to substitution, which mainly clustered in the C-

terminal quarter of the barrel. Additionally, 142 out of the 182 positions of TIM can be

changed to at least one amino acid of a seven amino acid alphabet (K, E, Q, F, I, L, V, and A) without the sacrifice of its activity, and this simplification could greatly decrease the computational burden of the (β/α)8 barrel design.
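The ≥ 75% rule used to label alignment positions can be sketched as follows. This is a minimal illustration, not the procedure of Silverman (2001): the membership of the hydrophobic and polar classes, the handling of gaps, and the toy alignment are all assumptions.

```python
from collections import Counter

# Assumed class membership; the source does not list which residues
# were counted as hydrophobic or polar.
HYDROPHOBIC = set("AVLIMFWC")
POLAR = set("STNQDEKRHY")


def classify_column(column: str, threshold: float = 0.75) -> str:
    """Label one alignment column as hydrophobic, polar, or variable."""
    residues = [r for r in column.upper() if r != "-"]  # ignore gaps
    if not residues:
        return "variable"
    counts = Counter(residues)
    hydrophobic = sum(n for r, n in counts.items() if r in HYDROPHOBIC)
    polar = sum(n for r, n in counts.items() if r in POLAR)
    if hydrophobic / len(residues) >= threshold:
        return "hydrophobic"
    if polar / len(residues) >= threshold:
        return "polar"
    return "variable"


def classify_alignment(sequences: list[str], threshold: float = 0.75) -> list[str]:
    """Classify every column of a set of equally long aligned sequences."""
    return [classify_column("".join(col), threshold) for col in zip(*sequences)]


toy_alignment = ["LKA", "IEG", "VQS", "LKL"]  # hypothetical, not TIM
print(classify_alignment(toy_alignment))  # ['hydrophobic', 'polar', 'variable']
```

A position conserving a single amino acid falls into that residue's class automatically, so the sketch needs no separate identity test.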

4.1.4 N-acetylneuraminate lyase (NAL) and dihydrodipicolinate synthase (DHDPS)

In order to further improve the understanding of the evolution and structure-function relationships of the (β/α)8 barrel proteins, two (β/α)8 barrel enzymes, N-acetylneuraminate lyase (NAL) and dihydrodipicolinate synthase (DHDPS), were

investigated by attempting to interconvert the enzymatic activities between each other

using site-directed and random mutagenesis. Both NAL and DHDPS belong to the NAL subfamily of the class I aldolase family of the aldolase superfamily of the (β/α)8 barrel proteins, which also includes D-5-keto-4-deoxyglucarate dehydratase (KDGDH), trans-o-hydroxybenzylidenepyruvate hydratase-aldolase (HBPHA), trans-2’-carboxybenzalpyruvate hydratase-aldolase (CBPHA), and 2-keto-3-deoxygluconate aldolase (KDGA)

(Barbosa, 2000). All these enzymes share a common step in their reaction mechanisms, the formation of a Schiff base between a strictly conserved lysine residue and the C2 carbon of the common α-keto acid moiety of the substrate.

4.1.4.1 Introduction of NAL and DHDPS

NAL (EC 4.1.3.3) catalyzes the aldol cleavage of N-acetylneuraminate (NANA) to form pyruvate and N-acetylmannosamine (ManNAc) via a Schiff base intermediate (Figure 28). The initial step is the ring-opening of the α-anomer of NANA to its linear form, followed by the Schiff base formation, cleavage, formation of ManNAc by spontaneous ring-closure, and the release of pyruvate. Originally found in some neuraminate-producing bacteria such as Vibrio cholerae (Brug, 1959) and Clostridium perfringens (Popenoe, 1957), this enzyme is also widely distributed in animal tissues such as rat liver and brain (Comb, 1960), pig kidney (Brunetti, 1962), and bovine kidney (Sirbasku, 1970).

Figure 28: The reaction (A) and proposed mechanism (B) of NAL. (A) NAL catalyzes the aldol cleavage of N-acetylneuraminate (NANA) to form pyruvate and N-acetylmannosamine (ManNAc), and also the condensation of pyruvate and N-acetylmannosamine to NANA. (B) The proposed mechanism of NAL. (Adapted from (Barbosa, 2000)).

N-acyl derivatives of neuraminic acid are generically called sialic acids and are important building blocks in the polysaccharide chains in the glycoproteins and glycolipids of the cell coats in eukaryotes (Reglero, 1993). NAL can be used as a key enzyme for the enzymatic determination of sialic acids (Sugahara, 1980), whose level in serum and other body fluids is known to be significant in the diagnosis of malignant and inflammatory diseases (Dnistrian, 1983; Shamberger, 1984). In pathogenic bacteria such as E. coli K1 serotypes and Neisseria meningitidis, sialic acids are incorporated into a capsular homopolymer that has been identified as a pathogenic determinant (Vimr,

1995), often causing neonatal meningitis and urinary tract infections (Sarff, 1975;

Kaijser, 1977). In non-pathogenic bacteria such as E. coli, NAL plays a central role in regulating the intracellular concentration of sialic acids (Vimr, 1985a), which are toxic at high concentrations (Vimr, 1985b).

E. coli NAL can catalyze either the cleavage of NANA or the condensation of

NANA from pyruvate and ManNAc (Uchida, 1984), while Clostridium perfringens and animal NAL cannot catalyze the condensation reaction (Comb, 1960; Brunetti, 1962). E. coli NAL has low substrate specificity for both cleavage and condensation reactions. In the cleavage reaction, E. coli NAL is active against the NANA analogs with modifications at N or C9 positions, albeit at low rates compared to that of NANA

(Uchida, 1984; Kiefel, 2000). In the condensation reaction, though pyruvate is accepted as the only donor (Kim, 1988), E. coli NAL can accept a wide variety of hexoses, pentoses, D- and L- sugars as substrates (Kong, 1995, 1998; Fitz, 1995). The ability to

catalyze the condensation reactions, combined with its low substrate specificity, makes E.

coli NAL an ideal target for the enzymatic synthesis of sialic acids and derivatives

(Wada, 2003).

DHDPS (EC 4.2.1.52) catalyzes the condensation of (S)-aspartate-β-semialdehyde [(S)-ASA] and pyruvate to form 2,3-dihydrodipicolinate (DHDP), the key first step common to the biosynthesis of (S)-lysine and meso-diaminopimelate (Figure 29) (Yugari,

1965). The DHDPS reaction is under allosteric control by the feedback inhibitor (S)-lysine (Yugari, 1962). Depending on their regulatory properties with respect to (S)-lysine, DHDPS enzymes from different species can be categorized into three groups

(Blickling, 1997). Plant enzymes are strongly inhibited by (S)-lysine with an IC50 between 0.01 and 0.05 mM, including DHDPS from Triticum aestivum (Kumpaisal, 1989), Daucus carota sativa (Matthews, 1978), Nicotiana sylvestris (Ghislain, 1990), Zea mays (Frisch, 1991a), and Pisum sativum (Dereppe, 1992). Enzymes from Gram-negative bacteria such as E. coli (Yugari, 1965), Bacillus sphaericus (Bartlett, 1986), and Methanobacterium thermoautotrophicum (Bakhiet, 1984) are weakly inhibited by (S)-lysine with an IC50 between 0.25 and 1.0 mM. Enzymes from Gram-positive bacteria

such as B. licheniformis (Stahly, 1969), B. megaterium (Webster, 1970), B. subtilis

(Yamakura, 1974), Corynebacterium glutamicum (Cremer, 1988), B. cereus (Hoganson, 1975), and B. lactofermentum (Tosaka, 1978) appear not to be inhibited at all (IC50 > 10 mM).

Figure 29: The reaction (A) and proposed mechanism (B) of DHDPS. (A) DHDPS catalyzes the condensation of (S)-aspartate-β-semialdehyde [(S)-ASA] and pyruvate to form 2,3-dihydrodipicolinate (DHDP) via a Schiff base intermediate. (B) The proposed mechanism of DHDPS. (4S)-4-hydroxyl-2,3,4,5-tetrahydro-(2S)-dipicolinate (HPTA) was suggested to be the direct product of DHDPS, but undergoes spontaneous dehydration to form DHDP (Blickling, 1997). (Adapted from (Dobson, 2005)).

As DHDPS is widely distributed in plants and microorganisms, but not in animals, it attracts continued attention as a target for antibiotics and herbicides (Coulter, 1999;

Cox, 2000). Additionally, as the rate-limiting step in (S)-lysine biosynthesis, DHDPS also

attracts the attention of biotechnologists aiming to engineer crops rich in (S)-lysine, which is often the limiting nutrient in staple crops (Miflin, 2000). For these reasons,

DHDPS has been studied extensively in terms of both reaction mechanism and the regulation of its activity.

The DHDPS reaction was suggested to proceed via a ping-pong mechanism, which requires two stable enzyme forms and an irreversible step between the binding of the two substrates (Karsten, 1997). For DHDPS, the first step is the binding

of pyruvate to the free enzyme. The irreversible inhibition of DHDPS activity upon the

addition of sodium borohydride only in the presence of pyruvate led to the suggestion

that a Schiff base intermediate was formed between pyruvate and an enzymatic lysine

residue (Shedlarski, 1970). This intermediate was observed by both electrospray

ionization mass spectrometry (Borthwick, 1995) and X-ray crystallography (Blickling,

1997a). The lysine residue that forms the Schiff base intermediate was identified as K161

in E. coli DHDPS (Laber, 1992). The release of water and enamine formation has been

suggested to be the irreversible step (Karsten, 1997). Then, (S)-ASA binds to the stable

substituted-enzyme, followed by dehydration and cyclization to form (4S)-4-hydroxyl-2,3,4,5-tetrahydro-(2S)-dipicolinate (HPTA). 13C- and 1H-NMR spectroscopy studies indicated that HPTA is the only product of E. coli DHDPS, and that the conversion of HPTA to DHDP by dehydration occurs spontaneously (Blickling, 1997a). The mechanism by which (S)-lysine exerts regulatory control over E. coli DHDPS is not well understood, although kinetic and structural studies support the proposal that (S)-lysine is an allosteric inhibitor (Laber, 1992; Blickling, 1997a, b).
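For reference, the textbook initial-rate expression for a two-substrate ping-pong mechanism in the absence of products is given below, with A standing for pyruvate and B for (S)-ASA. The symbols ($V_{\max}$, $K_A$, $K_B$) are generic assumptions and are not values or notation taken from Karsten (1997).

```latex
v = \frac{V_{\max}[A][B]}{K_A[B] + K_B[A] + [A][B]}
\qquad\Longrightarrow\qquad
\frac{1}{v} = \frac{K_A}{V_{\max}}\,\frac{1}{[A]} + \frac{1}{V_{\max}}\left(1 + \frac{K_B}{[B]}\right)
```

Because the slope $K_A/V_{\max}$ does not depend on $[B]$, double-reciprocal plots of $1/v$ against $1/[A]$ at different fixed $[B]$ are parallel lines, which is the usual kinetic signature of this mechanism.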

According to their three-dimensional structures, both E. coli NAL and DHDPS form homotetramers (Figure 30) (Izard, 1994; Mirwaldt, 1995). Earlier gel filtration studies predicted that DHDPS forms a homotetramer (Shedlarski, 1970) while NAL forms a homotrimer (Uchida, 1984; Aisaka, 1991). The asymmetric unit contains the monomers 1 and 2, and the tetramer is formed together with monomers 3 and 4, which are related to 1 and 2 by crystallographic symmetry, respectively. Therefore, this homotetramer structure resembles more closely a dimer of dimers, with stronger connections between monomers 1 and 2, and rather weaker connections between monomers 3 and 4.

Each monomer of NAL or DHDPS contains an N-terminal (β/α)8 barrel domain

and a C-terminal helical domain with three helices (Figure 31) (Izard, 1994; Mirwaldt,

1995). The superposition of E. coli NAL and DHDPS monomer structures using CE (combinatorial extension of the optimal path) reveals a root-mean-square deviation of 1.80 Å for all α-carbons in the backbone (Shindyalov, 1998). As shown in Figure 28, the N-terminal (β/α)8 barrel domain contains the residues for catalysis and substrate binding, and the functions of the C-terminal helical domains are unknown. The C-terminal helical domain is fixed in its position towards the (β/α)8 barrel domain by several hydrogen bonds and ionic interactions, such as a salt bridge between R271 and E58 in NAL, and R268 and E55 in DHDPS, which are conserved in each protein.
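The root-mean-square deviation quoted above is the final metric of a structural superposition. A minimal sketch of that metric is shown below, assuming the two sets of α-carbon coordinates are already paired and superimposed; it does not reproduce the alignment or fitting steps of CE, and the function name and coordinates are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def rmsd(coords_a: list[Coord], coords_b: list[Coord]) -> float:
    """RMSD in the input units (e.g. Å) over paired, pre-superimposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty, equally long coordinate lists")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two hypothetical atom pairs: one coincident, one displaced by 2 Å along z.
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]))
```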

Figure 30: Ribbon representation of the E. coli NAL homotetramer bound with sialic acid alditol (A) and the structure of sialic acid alditol (B). (A) A CPK represented sialic acid alditol (yellow) is situated in the active site of each monomer (Izard, 1994). (B) Sialic acid alditol is a NANA analog, and acts as an inhibitor to NAL, with a hydroxyl group at the 2-position instead of the keto group in NANA (Deijl, 1983; Ooi, 2000).

Figure 31: Superposition of three-dimensional structures of E. coli NAL and DHDPS viewed from the C-terminal end of the barrel (A) and from the side (B), and the superposition of active site residues in E. coli NAL and DHDPS (C). The sialic acid alditol molecule is shown in ball-and-stick representation. The superposition was done manually using Weblab Viewer Pro 3.7 by Molecular Simulations Inc. (A) and (B) NAL and DHDPS structures are ribbon represented in green and red respectively, with a sialic acid alditol situated in the active site. The catalytically conserved Lys and Tyr residues are displayed in line representation in each structure. (C) DHDPS and NAL conserved residues within the active sites are depicted in red and green line representations, respectively. The residues are indicated based on E. coli numbering.

4.1.4.2 The active sites of NAL and DHDPS

As in all the (β/α)8 barrel proteins, the active sites of both NAL and DHDPS are

located at the C-terminal end of the barrel, and show similar architecture based on the

superposition of the two active sites (Figure 31C).

In the NAL active site, K165 is placed within the β-barrel and its ε-amino forms

the Schiff base with the α-keto acid at C-2 position of the substrate. Mutagenesis of

this Lys residue in the NAL from Clostridium perfringens to either Arg, Glu, or Ala

resulted in gradually decreased NAL activity (Kruger, 2001). While the mutagenesis of

this Lys residue to Arg led to a mutant that was 30 fold less active than the wild type

enzyme, the mutagenesis of this Lys residue to Gln and Ala led to a mutant that was

effectively inactive and a mutant that was completely inactive, respectively. Remarkably,

the exchange of Lys to Arg had almost no influence on the Km value, indicating that this

lysine residue might not be involved in the initial substrate binding (Kruger, 2001).

In the structures of H. influenzae NAL complexes with different substrate

analogues, the hydroxyl group of Y137 was observed to form a hydrogen bond with the

hydroxyl group at C-4 position of the substrate, as well as the carboxylate group at C-1

position of the substrate (Barbosa, 2000). Accordingly, Y137 was suggested to mediate

the proton abstraction by the carboxylate group in a substrate-assisted reaction (Barbosa,

2000). In addition, Y137 might be involved in stabilization of the side chain of K165

and the carboxylate group of the substrate (Lawrence, 1997). Changing this tyrosine residue in Clostridium perfringens NAL to Phe, His, Trp, or Cys resulted in single-site mutants with severely attenuated NAL activity, indicating the importance of Y137 in catalysis (Kruger, 2001). The backbone NH groups of S47 and T48 form hydrogen bonds with the carboxylate group at C-1 position of the substrate. Additionally, the hydroxyl group of S47 forms hydrogen bonds with Y137 and Y110, and the latter tyrosine residue is provided by the other monomer in the same asymmetric unit. These three residues were suggested to form a proton transfer network to shuttle protons to and from the active site, and to assist Y137 in its catalytic roles (Barbosa, 2000).

The functions of the other residues in the NAL active site have been determined based on both the structure of H. influenzae NAL bound to sialic acid alditol (Barbosa,

2000) and the kinetic study of site-directed mutants of Clostridium perfringens NAL

(Kruger, 2001). The carboxyl oxygen atom of G189 forms a hydrogen bond with the hydroxyl group at C-4 position of the substrate. The hydroxyl group of S208 forms a hydrogen bond with the hydroxyl group at C-7 position of the substrate, and its backbone amine group forms a hydrogen bond with the hydroxyl group at C-6 position of the substrate. The backbone amine group of D191 forms a hydrogen bond with the hydroxyl group at C-8 position of the substrate, while the carboxylate oxygens of E192 form hydrogen bonds with the hydroxyl groups at C-8 and C-9 positions of the substrate. The N-acetyl group is oriented outwards from the active site and makes only van der Waals’ interactions with the protein, although the carbonyl oxygen atom of the

N-acetyl group forms an intra-ligand interaction with the hydroxyl group at C-7 position of the substrate (Barbosa, 2000).

In addition to its size, the charge of D191 has been shown to be necessary for substrate binding. A D191N mutant of Clostridium perfringens NAL was shown to have a 28 fold lower kcat value and a 350 fold higher Km value compared to the wild type NAL (Kruger, 2001). On the other hand, the conservative D191E mutation resulted in a mutant that was nearly as active as the wild type enzyme and had a 10 fold lower substrate binding affinity (Kruger, 2001). The importance of E192 in substrate binding was confirmed by the fact that mutation of this Glu in Clostridium perfringens NAL to Asp or Gln showed no effect on NAL activity, but significantly lowered the substrate binding affinity. The mutation from Glu to Gln also shifted the substrate specificity towards more efficient cleavage of 5,9-N-acetylneuraminate (Neu5,9Ac2) than that achieved by the wild type enzyme (Kruger, 2001).
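As a worked illustration, the two fold changes reported for the D191N mutant can be combined into a change in catalytic efficiency. The calculation below assumes that the 28 fold and 350 fold figures are exact and that kcat/Km is an appropriate measure of efficiency for this comparison; it is not a value reported in the source.

```latex
\frac{(k_{cat}/K_m)_{\mathrm{D191N}}}{(k_{cat}/K_m)_{\mathrm{WT}}}
  = \frac{1}{28} \times \frac{1}{350}
  = \frac{1}{9800}
  \approx 1.0 \times 10^{-4}
```

Under these assumptions the mutant would be roughly four orders of magnitude less efficient than the wild type enzyme, with most of the loss coming from the Km term.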

Based on the similarity between NAL and DHDPS active sites, the DHDPS active site residues K161, Y133, Y107, T44, T45, and G186 are presumed to have similar roles as their corresponding counterparts in NAL since these residues adopt similar configurations on the basis of the superposition between the two active sites (Figure

31C). DHDPS mutants with the single mutation Y133F, T44V, or Y107F showed substantially reduced DHDPS activity, supporting their involvement in substrate binding and/or catalysis (Dobson, 2004). Besides these conserved residues common to both NAL and DHDPS, there are several residues in the DHDPS active site, such as R138, I203, and

N248, which are conserved. Situated at the entrance to the active site, the guanidine of

R138 has been shown to interact with the carboxyl groups of (S)-ASA by structural studies, suggesting that this residue might be essential for substrate binding (Blickling,

1997). Upon the refinement of E. coli DHDPS structures, the Nε atom of R138 was

observed to make a connection to the hydroxyl group of Y133 via a water molecule, while one N atom of the terminal guanidine still retained a hydrogen bond with the backbone oxygen of Y107 of the other monomer in the dimer (Dobson, 2005). This is an important observation as it suggests that the guanidine moiety of R138 might be more intimately involved in catalysis than first thought. Although the side chain of DHDPS D188 is oriented in an almost identical geometry to that of NAL D191, the carboxylate of D188 might be responsible for the interaction with the ammonium group of (S)-ASA (Barbosa, 2000). The distance between the phenolic hydroxy group of Y133 and the C-3 position of the bound pyruvate (>4 Å) makes a catalytic role for Y133 in enamine formation unlikely, while the backbone oxygen of a conserved residue I203 is in a good position (approximately 3.5 Å to C-3) for the abstraction of a proton in the C-3 position (Blickling, 1997). In NAL, the residue corresponding to I203 in DHDPS is also an Ile residue with almost identical configuration, presumably indicating similar function for these two Ile residues. Since (S)-ASA has been observed to exist in its hydrated form predominantly in aqueous solution (Tudor, 1993; Coulter, 1996), G186 and N248 are in favorable positions to coordinate the hydroxyl groups of (S)-ASA based on the structural studies (Blickling, 1997). However, as the biologically relevant form of (S)-ASA remains to be determined, the functions of I203 and N248 in catalysis and substrate binding remain speculative and require additional evidence.

On the basis of the similarity of the active site residues between NAL and DHDPS, Barbosa et al. suggested that the residues forming the respective active sites can be partitioned into two groups. The first group contains the residues that are shared by NAL and DHDPS, and involved in the binding of the α-keto acid moiety of the substrate and the aldol cleavage/condensation reactions. This group consists of residues T44, T45,

Y107, Y133, K161, G186, and I203 from DHDPS, and S47, T48, Y110, Y137, K165,

G189, and I206 from NAL (Figure 31C). The second group contains the residues that are unique to each particular enzyme, and involved in the binding with the remainder of the substrate and possibly the special catalytic steps. This group consists of residues R138 and D188 from DHDPS, and D191, E192 and S208 from NAL (Barbosa, 2000). This separation of the active site residues based on their functions provides an economy of evolution within the NAL subfamily. The interactions associated with the common Schiff base formation and aldol cleavage/condensation reaction steps are all mediated by a common set of residues and/or structural motifs, while the residues in the second group determine each particular enzymatic reaction. Although no three-dimensional structure is available for any other member of the NAL subfamily, primary sequence analysis revealed some conserved residues in each enzyme, which could be assigned into the second group. For instance, a conserved residue D181 in KDGA was proposed to fulfill an identical role in KDGA to D191 in NAL, and an R143 in KDGDH might play a similar carboxylate-binding role to R138 in DHDPS (Barbosa, 2000).

While the amino acid sequence identity between E. coli NAL and DHDPS is only

23.5%, the sequence homology between these two proteins reaches 51%, with a 53% homology between the N-terminal (β/α)8 barrel domains and a 43% homology between the C-terminal helical domains. Most of the conserved residues between NAL and

DHDPS are located in the loop regions and the N-termini of the α-helices. As shown

in Figure 32, most of the active site residues are located in the loops from β-strands to the

subsequent α-helices, while only the catalytically essential Lys and Tyr residues are in

the middle of β6 and β5 strands. Furthermore, most of the active site residues in the

second group are situated in the β7 and β8 loops, with only one exception that R138 in

DHDPS is in the β5 loop.
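The distinction between sequence identity and the broader similarity figure can be made concrete with a short sketch. The denominator (aligned columns without gaps) and the similarity groups below are assumptions, since the scoring scheme behind the reported 23.5% and 51% values is not stated here; the toy sequences are hypothetical.

```python
# Assumed similarity groups; the source does not state which scoring
# scheme was used for its "homology" figures.
SIMILAR_GROUPS = ["AVLIM", "FWY", "KRH", "DE", "NQ", "ST"]
GROUP_OF = {res: i for i, group in enumerate(SIMILAR_GROUPS) for res in group}


def _aligned_pairs(a: str, b: str) -> list[tuple[str, str]]:
    """Residue pairs from two aligned sequences, skipping gapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]


def percent_identity(a: str, b: str) -> float:
    pairs = _aligned_pairs(a, b)
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def percent_similarity(a: str, b: str) -> float:
    pairs = _aligned_pairs(a, b)
    if not pairs:
        return 0.0
    similar = sum(
        x == y or (x in GROUP_OF and y in GROUP_OF and GROUP_OF[x] == GROUP_OF[y])
        for x, y in pairs
    )
    return 100.0 * similar / len(pairs)


# Hypothetical aligned fragments: one E/D substitution and one gapped column.
print(percent_identity("ACDE-F", "ACDDGF"))    # 80.0
print(percent_similarity("ACDE-F", "ACDDGF"))  # 100.0
```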

Figure 32: Structure-based alignment of amino acid sequences of Escherichia coli DHDPS and NAL (A), the β7 loop (B), and the β8 loop (C). The alignment was done using CE (combinatorial extension of the optimal path) (Shindyalov, 1998). Secondary structural elements found in NAL are shown above the alignment. Conserved residues in each enzyme are masked in gray. Identical residues between different enzymes are marked by red asterisks. The active site residues common to both enzymes (the first group) are displayed in red, while the active site residues peculiar to each enzyme (the second group) are displayed in blue.

In an initial attempt to interconvert the enzymatic activity between E. coli NAL and DHDPS, the residues in the second group of each enzyme were targeted and mutated using site-directed and saturation mutagenesis, followed by the characterization of the effects of these mutations on enzymatic activities. Then, additional sequence space was explored in E. coli NAL and DHDPS genes using random mutagenesis, ITCHY, and

SCRATCHY, to interconvert the enzymatic activity between these two enzymes.

4.2 Experimental:

4.2.1 Materials:

Restriction enzymes, T4 DNA ligase, and Calf Intestinal Alkaline Phosphatase were obtained from New England Biolabs (Ipswich, MA). pET22b and pET16b vectors were obtained from Novagen (Madison, WI). pDIM-PGX vector was obtained from Dr.

Stefan Lutz (Lutz, 2001b). NuSeive agarose was obtained from VWR (West Chester,

PA). Taq polymerase was obtained from Promega (Madison, WI). Pfu turbo polymerase was obtained from Stratagene (La Jolla, CA). DNase I, dNTPs and complete, EDTA-free protease inhibitor were obtained from Roche (Indianapolis, IN). The QIAprep, QIAEX II gel extraction kit, QIAquick Gel and PCR purification kit, QIA plasmid preparation kit, and Ni-NTA agarose were obtained from Qiagen (Valencia, CA). Diaminopimelic acid

(DAP), N-acetyl mannosamine (ManNAc), N-acetyl neuraminic acid (NANA), E. coli N- acetyl neuraminate lyase (NAL), Dowex 1×8-200 ion-exchange resin, amberlite IR-

120(+) ion-exchange resin, PK/LDH from rabbit liver and lysozyme were obtained from

Sigma (St. Louis, MO). N-[acetyl-1-14C]-mannosamine was obtained from Moravek

Biochemicals Inc. (Brea, California). N-Acetylneuraminic acid aldolase (NAL) was also

obtained from TOYOBO ENZYMES (Osaka, Japan). N-acetyl neuraminic acid (NANA)

was also obtained from Jülich Fine Chemicals GmbH (Jülich, Germany). HiTrap™ DEAE column, HiTrap™ Q column and Superose 12 gel filtration column were obtained

from Amersham-Pharmacia Biotech (Piscataway, NJ). (S)-ASA was synthesized as previously described (Coulter, 1996). BSA, all protein gel electrophoresis agents and molecular weight markers were obtained from Bio-Rad (Hercules, CA). All DNA oligonucleotides were ordered from Integrated DNA Technologies (Coralville, IA). All other materials were purchased from commercial sources and were of the highest available quality.

4.2.2 Bacterial Strains:

DH5α and BL21(DE3) were obtained from Invitrogen (Carlsbad, CA), and

Novagen (Madison, WI), respectively. The E. coli DHDPS auxotrophic strain, AT997, was obtained from the E. coli Genetics Stock Center, Yale University (Bukhari, 1971).

Strain AT997 was maintained on LB medium supplemented with 50 µg/ml diaminopimelic acid (DAP). DHDPS(-), E. coli K-12 MG1655 with chromosomal dapA deleted, and NAL(-), E. coli K-12 MG1655 with chromosomal nanA deleted, and

BL21(DE3) nanA-, E. coli BL21(DE3) with chromosomal nanA deleted, were

constructed by a one-step knockout method (Datsenko, 2000).

4.2.3 Methods:

Construction of NAL and DHDPS mutants by site-directed mutagenesis

All NAL and DHDPS mutants were constructed by overlap extension PCR using

pDIM-NAL or pDIM-DHDPS as templates (Ho, 1989). In the first PCR, the N-terminal

and C-terminal fragments were amplified separately, with corresponding inside and outside primers. For N-terminal fragments, the outside primer introduced an NdeI site and a start codon at the 5’ end to facilitate cloning. For C-terminal fragments, the outside primer introduced a SpeI site at the 3’ end. PCR amplification was performed with 50 ng DNA template, 0.2 mM dNTPs, 1× PCR buffer, 2 mM MgCl2, 1 µM primers, and 2 units Taq/Pfu polymerase in a total volume of 100 µl. Reaction conditions were 5 min at 94°C followed by 30 cycles of 30 seconds at 94°C, 30 seconds at 55°C, and 1 min at 72°C,

followed by 10 min at 72°C. After purification with the QIAquick PCR purification kit,

the PCR products were quantified by OD260 for the next step. In the second PCR, equal

amounts of N-terminal and C-terminal fragments were combined and amplified using

overlap extension to obtain full length fragment under identical conditions to the first

PCR. The full length fragments were cloned into pDIM-PGX vector using NdeI and SpeI sites and their sequences were confirmed by DNA sequencing (Lutz, 2001). The mutants were also subcloned into pET22b vector via NdeI and HindIII sites for protein overexpression and purification. Primers used for PCR are shown in Table 12.
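In overlap extension PCR the two inside primers for a mutation are typically reverse complements of each other, so that the N-terminal and C-terminal fragments share an overlapping end carrying the mutation. The sketch below checks this for the NAL E192A primer pair listed in Table 12; the helper name is hypothetical, and the treatment of the degenerate bases N and S follows the standard IUPAC convention.

```python
# Complement table covering the bases that appear in Table 12, including
# the degenerate codes N (any base) and S (G or C), which are self-complementary.
_COMPLEMENT = str.maketrans("ACGTNSacgtns", "TGCANStgcans")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


forward = "AACGGTTACGACGCAATCTTCGCCTCT"  # NAL E192A Forward (Table 12)
reverse = "AGAGGCGAAGATTGCGTCGTAACCGTT"  # NAL E192A Reverse (Table 12)
print(reverse_complement(forward) == reverse)  # True
```

The same check can be applied to any other inside primer pair as a quick guard against transcription errors before ordering oligonucleotides.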

Construction of NAL and DHDPS hybrid proteins

The four hybrid enzymes NALαβ8, DHDPSαβ4, DHDPSαβ6, and DHDPSαβ8

were constructed by overlap extension PCR using their respective inside and outside

primers listed in Table 12, as described above (Ho, 1989). NALαβ8 has the NAL N-terminal (β/α)8 barrel domain (NAL 1-227) followed by the DHDPS C-terminal helical

domain (DHDPS 225-292). DHDPSαβ4 has the first (β/α)4 unit from DHDPS (DHDPS

1-129) followed by NAL 134-297. DHDPSαβ6 has the first (β/α)6 unit from DHDPS

(DHDPS 1-180) followed by NAL 184-297. DHDPSαβ8 has the DHDPS N-terminal

(β/α)8 barrel domain (DHDPS 1-224) followed by the NAL C-terminal helical domain (NAL 228-297). The full length fragments were cloned into pDIM-PGX vector using NdeI and SpeI sites and their sequences were confirmed by DNA sequencing (Lutz, 2001).

Table 12: Primers used in Chapter 4.

Primer                                   Primer sequence (5’-3’)

Outside primers
NAL(NdeI) Forward                        GGAATTCCATATGGCAACGAATTTACGTGGCGTA
DHDPS(NdeI) Forward                      GGAATTCCATATGTTCACGGGAAGTATTGTCGCG
NAL(HindIII) Reverse                     CCCAAGCTTCCCGCGCTCTTGCATCAACTG
DHDPS(HindIII) Reverse                   CCCAAGCTTCAGCAAACCGGCATGCTTAAG
NAL(SpeI) Reverse                        CCACTAGTTCACCCGCGCTCTTGCATCAACTG
DHDPS(SpeI) Reverse                      CCACTAGTTTACAGCAAACCGGCATGCTTAAG
DHDPR(NdeI) Forward                      GGAATTCCATATGCATGATGCAAACATCCGCGTT
DHDPR(BamHI) Reverse                     CGGGATTCTTACAAATTATTGAGATCAAGTAC

Inside primers
NAL L142R Forward                        TACAACATTCCAGCCAGGAGTGGGGTAAAACTG
NAL L142R Reverse                        CAGTTTTACCCCACTCCTGGCTGGAATGTTGTA
NAL E192A Forward                        AACGGTTACGACGCAATCTTCGCCTCT
NAL E192A Reverse                        AGAGGCGAAGATTGCGTCGTAACCGTT
NAL I247F Forward                        GATTTACTGTTCAAAACGGGCGTA
NAL I247F Reverse                        TACGCCCGTTTTGAACAGTAAATC
NAL V251N Forward                        CTGATCAAAACGGGCAATTTCCGCGGCCTGAAA
NAL V251N Reverse                        TTTCAGGCCGCGGAAATTGCCCGTTTTGATCAG
DHDPS R138L Forward                      GTGCCGTCCCTTACTGGCTGCGAT
DHDPS R138L Reverse                      ATCGCAGCCAGTAAGGGACGGCAC
DHDPS A189E Forward                      CGGCGATGATGAGAGCGCGCTGG
DHDPS A189E Reverse                      CCAGCGCGCTCTCATCATCGCCG
DHDPS V205S Forward                      GGGGTTATTTCCAGTACGACTAACGTC
DHDPS V205S Reverse                      GACGTTAGTCGTACTGGAAATAACCCC
DHDPS F244A Forward                      AACAAACTAGCTGTCGAACCCAAT
DHDPS F244A Reverse                      ATTGGGTTCGACAGCTAGTTTGTT
DHDPS F244I Forward                      AACAAACTAATTGTCGAACCCAAT
DHDPS F244I Reverse                      ATTGGGTTCGACAATTAGTTTGTT
DHDPS D187Y/A189E Forward                TTTGTTCTGCTGAGCGGCTACGATGAAAGCGCGCTGGACTTCATGCAATTGGGCGGT
DHDPS D187Y/A189E Reverse                GAAGTCCAGCGCGCTTTCATCGTAGCCGCTCAGCAGAACAAA
DHDPS V202G/S204G/V205S/T207Y Forward    CAATTGGGCGGTCATGGGGGTATTGGCAGTACGTATAACGTCGCAGCGCGT
DHDPS V202G/S204G/V205S/T207Y Reverse    ACGCGCTGCGACGTTATACGTACTGCCAATACCCCCATGACCGCCCAATTGCATGAAGTCCAGCGC
DHDPS V202G/S204G/V205S Forward          ATGCAATTGGGCGGTCATGGGGGTATTGGCAGTACGACTAACGTCGCAGCG
DHDPS V202G/S204G/V205S Reverse          CGCTGCGACGTTAGTCGTACTGCCAATACCCCCATGACCGCCCAATTGCAT
DHDPS β78 Lib Forward                    GATTTTGTTCTGCTGAGCGGCNNSNNSNNSAGCGCGCTGGACTTCATGCAATTGGGCGGTCATGGGNNSATTNNSNNSACGNNSAACGTCGCAGCGCGT
DHDPS β78 Lib Reverse                    GCCGCTCAGCAGAACAAAATC
DHDPS αβ4 Forward                        GCTGAGCATACTGACCTGCCGATGGTGGTGTACAACATTCCA
DHDPS αβ4 Reverse                        TGGAATGTTGTACACCACCATCGGCAGGTCAGTATGCTCAGC
DHDPS αβ6 Forward                        AAAGAGCTGGTTTCAGATGATCTTGTGCTCTATAACGGTTAC
DHDPS αβ6 Reverse                        GTAACCGTTATAGAGCACAAGATCATCTGAAACCAGCTCTTT
DHDPS αβ8 Forward                        TGCAAACTGGCAGCAGAAGTTGATATCCAGACCGCGAGAAA
DHDPS αβ8 Reverse                        TTTCTCGCGGTCTGGATATCAACTTCTGCTGCCAGTTTGCA
NAL αβ8 Forward                          GTTAAGGCGCTGAAAGAAGGCCATTTTGCCGAGGCACGCGTT
NAL αβ8 Reverse                          AACGCGTGCCTCGGCAAAATGGCCTTCTTTCAGCGCCTTAAC
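The residue ranges quoted for the four hybrid enzymes can be turned into expected polypeptide lengths with a few lines of code. This is only a bookkeeping sketch: the dictionary layout and function name are assumptions made for illustration, and the ranges are the ones stated in the text, with residue numbering taken as inclusive.

```python
from typing import Dict, List, Tuple

# Each segment is (parent enzyme, first residue, last residue), inclusive numbering.
Segment = Tuple[str, int, int]

HYBRIDS: Dict[str, List[Segment]] = {
    "NALαβ8": [("NAL", 1, 227), ("DHDPS", 225, 292)],
    "DHDPSαβ4": [("DHDPS", 1, 129), ("NAL", 134, 297)],
    "DHDPSαβ6": [("DHDPS", 1, 180), ("NAL", 184, 297)],
    "DHDPSαβ8": [("DHDPS", 1, 224), ("NAL", 228, 297)],
}


def hybrid_length(segments: List[Segment]) -> int:
    """Total number of residues contributed by all parental segments."""
    return sum(last - first + 1 for _, first, last in segments)


lengths = {name: hybrid_length(segs) for name, segs in HYBRIDS.items()}
```

All four hybrids come out within a few residues of the parental lengths (297 for NAL and 292 for DHDPS), which is what one would expect for crossovers placed at structurally equivalent positions.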

Construction of DHDPS library with randomized residues at position 187, 188, 189, 202,

204, 205, and 207

The dapA codons 187, 188, 189, 202, 204, 205, and 207 were randomized by the

overlap extension PCR method using pDIM-DHDPS as the template (Ho, 1989). In the first PCR, the N-terminal and C-terminal fragments were amplified separately, with the

DHDPS(NdeI) Forward and DHDPS β78 Lib Forward primers as the 5’primers, and the

DHDPS(SpeI) Reverse and DHDPS β78 Lib Reverse primers as the 3’primers (Table

12). The desired codons were replaced by NNS in the DHDPS β78 Lib Forward primer, where N represents equal molar mixtures of all four bases and S represents an equal molar mixture of G and C. PCR amplification was performed with 50ng DNA template,

0.2 mM dNTPs, 1×PCR buffer, 2 mM MgCl2, 1 µM primers, and 2 units Taq/Pfu

polymerase in a total volume of 100µl. Reaction conditions were 5 min at 94°C followed

by 30 cycles of 30 seconds at 94°C; 30 seconds at 55°C; 1 min at 72°C, followed by 10

min at 72°C. After purification with the QIAquick PCR purification kit, the PCR

products were quantified by OD260. In the second PCR reaction, equal amounts of N-

terminal and C-terminal fragments were combined and amplified using overlap extension

to obtain full length fragment under identical conditions to the first PCR. The full length

fragments were cloned into pDIM-PGX vector using NdeI and SpeI sites, and then

transformed into DH5α-E strain.
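The theoretical diversity of this library follows directly from the NNS scheme. The sketch below enumerates the NNS codons with the standard genetic code, counts the amino acids and stop codons they encode, and scales this to the seven randomized positions. The variable names are illustrative, and the calculation assumes ideal, unbiased incorporation of the degenerate bases.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNS: N = A/C/G/T at positions 1 and 2, S = G/C at position 3.
nns_codons = [a + b + c for a in "ACGT" for b in "ACGT" for c in "GC"]
encoded = [CODON_TABLE[codon] for codon in nns_codons]
amino_acids = set(encoded) - {"*"}
stop_codons = [codon for codon in nns_codons if CODON_TABLE[codon] == "*"]

# Library primer from Table 12; it should carry one NNS per randomized codon.
lib_primer = (
    "GATTTTGTTCTGCTGAGCGGCNNSNNSNNSAGCGCGCTG"
    "GACTTCATGCAATTGGGCGGTCATGGGNNSATTNNSNNS"
    "ACGNNSAACGTCGCAGCGCGT"
)
n_positions = lib_primer.count("NNS")

dna_variants = len(nns_codons) ** n_positions        # distinct DNA sequences
protein_variants = len(amino_acids) ** n_positions   # distinct full-length proteins
stop_free_fraction = (1 - len(stop_codons) / len(nns_codons)) ** n_positions
```

Under these assumptions the DNA-level diversity (32^7, about 3.4 × 10^10) far exceeds what a typical transformation can sample, so any selection explores only a fraction of the possible sequences.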

Size and in-frame selection

Two ITCHY libraries between NAL and DHDPS (pDIM-NDX and pDIM-DNX) with more than 8 × 10⁶ independent members were created using THIO-ITCHY by

Seunggoo Lee, a former post-doc in our group. The starting construct for pDIM-DNX

library had the DHDPS sequence (1-224) followed by the NAL sequence (72-297),

while the starting construct for pDIM-NDX library had the NAL sequence (1-227)

followed by the DHDPS sequence (69-292). The overlapping regions for both libraries

consisted of 153 amino acid residues, and covered the last six (β/α) units of the barrel

domain.

The gene fragments were isolated from these two ITCHY libraries by restriction

digestion of NdeI and SpeI. After agarose gel analysis, the DNA fragments with similar

size to the wild type genes were separated and purified, followed by the ligation into

pDIM-N6 vector for in-frame selection.

After ligation into pDIM-N6 vector, the gene fragments from size selection were

placed in front of a kanamycin resistance gene so that only the in-frame hybrid proteins

can be expressed as a fusion protein with neomycin phosphotransferase and therefore confer

kanamycin resistance. The in-frame selection was carried out on LB agar plates

containing 15 µg/ml kanamycin, 100 µg/ml ampicillin, and 100 µM IPTG.

DNA shuffling

After size selection and in-frame selection, the two hybrid gene libraries

(pDIMN6-DNX and pDIMN6-NDX) were individually amplified by PCR with Pfu DNA

polymerase (Stratagene), using the following primer pair: T3, 5'-

ATTAACCCTCACTAAAGGGA-3'; and Cneorev: 5'-TAGCCGAATAGCCTCTCCAC-3'.

After QIAquick PCR purification, the products were quantified by OD260 measurement

and mixed to equal amounts. Three micrograms of DNA mixture with 1:1 from each

library were shuffled essentially as described (Zhao, 1997) with some modifications. The

DNA mixture was diluted to 45 µl with 10 mM Tris-HCl (pH 7.4) and 5 µl 10× digestion

buffer (500 mM Tris-HCl pH 7.4, 100 mM MnCl2) was added. This mixture was

equilibrated at 15°C for 5 min on a thermocycler before 0.30 U DNase I (10 U/µl; Roche) was added. The digestion was done at 15°C and terminated after 2 min by adding 5 µl 0.5

M EDTA. The size of fragments was confirmed on a 2% NuSieve agarose gel before purification using the QIAEX II gel extraction kit (Qiagen). Spin-column purified fragments

(10 µl) were added to 10 µl 2× PCR premix [5-fold diluted cloned Pfu buffer, 0.4 mM each dNTP, 0.06 U cloned Pfu polymerase (Stratagene, La Jolla, CA)]. The DNA fragments were self-assembled under the following conditions: 3 min at 96°C followed by 40 cycles

of 1 min at 94°C, 1 min at 55°C, and 1 min + 5 s/cycle at 72°C, followed by 7 min at 72°C. One microlitre of this reaction was used as template in a 25-cycle PCR reaction in the presence of gene-specific primer pairs. PCR conditions (100 µl final volume): 30 µmol each primer, 1× Taq buffer, 0.2 mM each dNTP and 2.5 U Taq/Pfu (1:1) mixture. PCR program: 2 min at 96°C, 10 cycles of 30 s at 94°C, 30 s at 55°C, 45 s at 72°C, followed by another

14 cycles of 30 s at 94°C, 30 s at 55°C, 45 s + 20 s/cycle at 72°C, and finally 7 min at 72°C. This second PCR reaction yielded product bands of the original gene size, which then were

cloned into the pDIM vector via NdeI and SpeI sites. Their sequences were confirmed by

DNA sequencing.

Error-prone PCR

The dapA gene was randomized using the error-prone PCR method previously described (Matsumura, 2001). The gene fragment from pDIM- vector was amplified

using the primer pair: T3, 5'-ATTAACCCTCACTAAAGGGA-3'; and pDIMrev: 5'-

ATAAGGGCGACACGGAAATG-3'. The reaction mixture (total 50 µl) contained 1×

PCR buffer (60 mM Tris-Cl, pH8.5, 15 mM (NH4)2SO4, 2 mM MgCl2), 200 µM dNTPs,

1 µM primers, 10-50 ng template, 2.5 U Taq, and various amount of 10× mutagenic

buffer (8 mM dTTP, 8 mM dCTP, 48 mM MgCl2, 5 mM MnCl2). PCR reaction

conditions were 5 min at 94°C followed by 25 cycles of 30 seconds at 94°C; 30 seconds at 55°C; 1 min at 72°C, followed by 10 min at 72°C. Then Taq polymerase was

eliminated by 50 µg/ml after the addition of 5 mM EDTA and 0.5% sodium

dodecyl sulfate (SDS) for 15 min at 65 ˚C. After the purification with the QIAquick PCR

purification kit, the PCR products were quantified by OD260 and cloned into pDIM-PGX

vector using NdeI and SpeI sites, and then transformed into DH5α-E strain.

Screening of DHDPS and NAL activities by in vivo complementation

The pDIM plasmids containing the gene of mutants were purified and

electroporated into E. coli auxotrophic strains. The selective medium for DHDPS activity

contains M9 salt, 0.2% glucose, 1.5 % agar, 2 mM MgSO4, 100 µM CaCl2, 100 µg/ml

ampicillin, and 0.3 mM isopropyl β-D-thiogalactoside (IPTG). The selective medium for

NAL activity contains M9 salt, 0.2% NANA, 1.5 % agar, 2 mM MgSO4, 100 µM CaCl2,

100 µg/ml ampicillin, and 0.3 mM isopropyl β-D-thiogalactoside (IPTG). For each

transformation, three single colonies were picked and streaked in an “X” shape on an LB plate

with 100 µg/ml ampicillin to make a master printing plate. After incubation at 37°C overnight, the cells on master plates were replica printed onto selective plates. Selections were performed at 30°C for up to 48 hr. To affirm the complementation, the cells that

appear on selective plates were restreaked on new selective plates to obtain single

colonies, from which the plasmids were extracted and retransformed into auxotrophic

strains, followed by the same selection procedure. pDIM plasmids containing wild type

dapA and nanA genes were used as controls and included on each selective plate.

Plasmids extracted from first and second selections were sequenced to confirm no

sequence changes. All DNA sequencing was performed at the Nucleic Acids Facility of

Pennsylvania State University.

Overexpression of proteins with His6 tag

The pET vectors containing desired protein genes were transformed into E. coli

BL21 (DE3) strain for overexpression. 1% overnight culture of individual colonies was

inoculated into LB media with 100 µg/ml ampicillin and grown to OD600 ~0.5 at 37°C.

The cultures were transferred to 18°C and induced with 0.3 mM isopropyl β-D-thiogalactoside (IPTG). After 24 hr shaking at 18°C, the cells were centrifuged and pellets were stored at -70°C.

Purification of proteins with His6 tag from cell lysate

The primary purification of the overexpressed mutated proteins was performed by affinity chromatography, using the His tag attached to the N- or C-terminus of the protein. Harvested cells were suspended in 40 ml of lysis buffer (50 mM sodium phosphate buffer, pH 8.0, 300 mM NaCl, and 10 mM imidazole) containing 50 mg/ml complete protease inhibitor and 1 mg/ml egg white lysozyme (Sigma) and subjected to two rounds of freeze and thaw, followed by sonication. After centrifugation, the supernatant was loaded onto a Ni-NTA resin column pre-equilibrated with lysis buffer and washed with 12 column volumes of wash buffer (20 mM imidazole in lysis buffer). The protein with His tag was eluted with 4 column volumes of elution buffer (250 mM imidazole in lysis buffer) and 1 ml fractions were collected. The fractions containing desired proteins were pooled together and dialyzed overnight against fresh lysis buffer to remove imidazole. The protein solution was then loaded onto a HiTrap™ DEAE column

(5 mL bed volume) pre-equilibrated in buffer X (20 mM Tris-HCl pH 7.4, 100 mM

NaCl). After the elution by a linear gradient of 0 – 1M NaCl, fractions containing desired proteins were verified by SDS-PAGE and pooled accordingly. The purified protein, at greater than 95% homogeneity, was dialyzed to remove salt and concentrated using

Centricon spin filters (MWCO 10K, Amicon, Bedford, MA), then stored frozen at -70°C.

Protein concentrations were determined by the Bradford analysis against BSA.

Purification of proteins with His6 tag from inclusion body

Several DHDPS selectants were found to form inclusion bodies after being cloned into pET22b vector and overexpressed in E. coli BL21(DE3), but were purified by

refolding after solubilization in urea.

The primary purification was still performed by affinity chromatography, using

the His tag attached to the N- or C- terminal of the protein. Harvested cells were lysed

under the same conditions for soluble proteins, as described previously. After

centrifugation, the pellet was solubilized in buffer B (100 mM NaH2PO4, pH 8.0, 10 mM

Tris-Cl, and 8 M urea) at 5 ml per gram wet weight by gently vortexing. After

centrifugation, 50% Ni-NTA resin (Qiagen) was added into the supernatant at 1 ml resin

to 4 ml lysate and mixed by gently shaking for 60 min at room temperature. Then, the resin-lysate mixture was loaded onto an empty column. After the collection of the flow-through, the resin was washed three times with 3 column volumes of buffer C (100 mM NaH2PO4, pH 6.3, 10

mM Tris-Cl, and 8 M urea). The protein with His tag was eluted four times with 1

column volume of buffer D (100 mM NaH2PO4, pH 5.9, 10 mM Tris-Cl, and 8 M urea),

and then four times with 1 column volume of buffer E (100 mM NaH2PO4, pH 4.5,

10 mM Tris-Cl, and 8 M urea). The pH of each buffer was adjusted using

HCl right before use. The fractions containing desired proteins were verified by SDS-

PAGE and pooled together.

The proteins were refolded using the quick dilution method. The proteins solubilized in urea were added dropwise into refolding buffer (50 mM Tris-Cl, pH 7.4, 100 mM NaCl,

1 mM EDTA) containing 50 mg/ml complete protease inhibitor while gently stirring at 4

˚C, and kept overnight at 4 ˚C. The solution was centrifuged to remove proteins aggregated due to incorrect folding, followed by concentration using an Amicon concentrator (Millipore). The protein solution was then loaded onto a Superose 12 gel filtration column (24 mL bed volume) pre-equilibrated in buffer X (20 mM Tris-HCl pH

7.4, 100 mM NaCl). Fractions containing hybrid proteins were verified by SDS-PAGE and pooled accordingly. The purified protein, at greater than 95% homogeneity, was dialyzed to remove salt and concentrated using Centricon spin filters (MWCO 10K,

Amicon, Bedford, MA), then stored frozen at -70°C. Protein concentrations were determined by the Bradford analysis against BSA.

Cloning and purification of dihydrodipicolinate reductase (DHDPR)

Dihydrodipicolinate reductase (DHDPR) catalyzes the conversion of NADH and DHDP to NAD+ and 2,3,4,5-tetrahydrodipicolinate (THDP), the step following the DHDPS reaction in the biosynthesis of lysine (Scapin, 1995). The E. coli dapB gene that encodes

DHDPR was amplified from E. coli K12 genomic DNA using two gene specific primers

(Table 12), and cloned into pET16b vector via NdeI and BamHI sites. DHDPR with an

N- terminal His tag was purified using affinity chromatography, as described previously, then stored frozen at -70°C. Protein concentrations were determined by the Bradford analysis against BSA.

Enzymatic synthesis of N-[acetyl-1-14C]-neuraminic acid (14C-NANA)

N-[acetyl-1-14C]-neuraminic acid (14C-NANA) was synthesized in a reaction

mixture containing 100 mM Tris-Cl pH 7.7, 1 mg/ml BSA, 1 mg/ml sodium azide, 0.5 M

pyruvate, 10 µCi N-[acetyl-1-14C]-mannosamine (Moravek), and 5 U NAL (TOYOBO)

with a total volume of 200 µl. The reaction was carried out at 37˚C for 4 hours. After dilution with 10 ml water, the reaction mixture was loaded onto a Dowex 1X8-200

(formate form) anion-exchange column so that negatively charged NANA binds to the resin and neutral ManNAc flows through. The column was washed with 30 ml water twice, followed by elution with 30 ml 5 mM ammonium formate. After mixing with 3 g Amberlite IR-120(+) ion-exchange resin, 14C-NANA was lyophilized to dryness.

Dried 14C-NANA was dissolved in 2 ml water and loaded onto a HiTrap™ Q column (5

mL bed volume) pre-equilibrated in water. The column was washed with 7.5 ml water

and 7.5 ml 5 mM sodium bicarbonate, and then was eluted with a linear gradient from 5-

25 mM sodium bicarbonate followed by 7.5 ml 25 mM sodium bicarbonate. Fractions

containing 14C-NANA were verified using a LS 6500 scintillation counter (Beckman) and

pooled accordingly. 14C-NANA was lyophilized to dryness after mixing with Amberlite IR-120(+) ion-exchange resin. The final product 14C-NANA was dissolved in 1 ml water and was found to be an active substrate for E. coli NAL (Sigma).

In vitro kinetic assay for DHDPS activity

The DHDPS activity was determined using coupling reaction by DHDPR, the next enzyme in the biosynthesis of lysine (Yugari, 1965). Initial velocity data were collected on a Cary 1 spectrophotometer (Varian, Palo Alto, CA) equipped with thermospacers and connected to a circulating water bath to maintain a constant temperature of 25 ˚C. A total 100 µl reaction mixture contained 50 mM Tris-Cl, pH 7.4,

0.2 mM NADH, 10 mM pyruvate, 20 µg DHDPR, varied concentrations of the enzymes, and varied concentrations of ASA. DHDPR was included in the assays as a coupling enzyme at sufficient quantities to give linear reaction progress curves in all cases.

Pyruvate was kept at saturating concentration to obtain the kinetic parameters for ASA.

After incubation for 5 min, pyruvate was added to initiate the reaction. The progress of the reactions was monitored by following the reduction of dihydrodipicolinate by NADH

at 340 nm (ε340 = 6.22 mM⁻¹ cm⁻¹) as catalyzed by DHDPR. The Michaelis constants for

each substrate were determined by fitting the initial velocity data at different substrate concentrations to double reciprocal plot or the Michaelis-Menten equation using the program KaleidaGraph from Synergy Software.
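The double reciprocal analysis mentioned above can be illustrated with a short sketch. It fits a straight line to 1/v versus 1/[S] by ordinary least squares and reads Km and Vmax from the slope and intercept. The data points are hypothetical, noise-free values generated from the Michaelis-Menten equation, and the code is not a reconstruction of the KaleidaGraph procedure actually used.

```python
def michaelis_menten(s, vmax, km):
    """Initial velocity predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def fit_double_reciprocal(substrate, velocity):
    """Estimate (Km, Vmax) from a least-squares line through 1/v versus 1/[S]."""
    x = [1.0 / s for s in substrate]
    y = [1.0 / v for v in velocity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept  # intercept of the line equals 1 / Vmax
    km = slope * vmax       # slope of the line equals Km / Vmax
    return km, vmax


# Hypothetical data: Km = 3.0 mM and Vmax = 0.5 (arbitrary rate units).
asa_mM = [0.5, 1.0, 2.0, 4.0, 8.0]
rates = [michaelis_menten(s, vmax=0.5, km=3.0) for s in asa_mM]
km_fit, vmax_fit = fit_double_reciprocal(asa_mM, rates)
```

With real, noisy data the double reciprocal transform overweights the low-substrate points, which is one reason a direct nonlinear fit to the Michaelis-Menten equation is usually preferred when both options are available.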

Inhibition of DHDPS activity by NANA

The inhibitory effect of NANA on the DHDPS activity was determined using the coupling DHDPR reaction, as described above. A total 100 µl reaction mixture contained

50 mM Tris-Cl, pH 7.4, 0.2 mM NADH, 0.02 mM pyruvate, 0.025 mM ASA, 20 µg

DHDPR, 1 µg DHDPS, and varied concentrations of NANA. The DHDPS activities under various concentrations of NANA were determined, and compared with that without

NANA.

In vitro kinetic assay for NAL activity

The NAL activity was determined either using coupling reaction by lactate dehydrogenase (LDH) (Comb, 1962) or using 14C-NANA.

In the coupling reaction, initial velocity data were collected on a Cary 1

spectrophotometer (Varian, Palo Alto, CA) equipped with thermospacers and connected

to a circulating water bath to maintain a constant temperature of 25 ˚C. A total 100 µl

reaction mixture contained 20 mM potassium phosphate, pH 7.2, 0.2 mM NADH, varied

concentrations of the enzymes and NANA (Julich Fine Chemicals). After incubation for

5 min, NANA was added to initiate the reaction. The progress of the reactions was

monitored by following the reduction of pyruvate by NADH at 340 nm (ε340 = 6.22 mM⁻¹ cm⁻¹) as catalyzed by LDH. The Michaelis constants for each substrate were determined

by fitting the initial velocity data at different substrate concentrations to double reciprocal

plot or the Michaelis-Menten equation using the program KaleidaGraph from Synergy

Software.
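For both coupled assays, the measured slope of absorbance at 340 nm is converted to a reaction rate through the Beer-Lambert law using the extinction coefficient quoted above. The sketch below shows that conversion and the subsequent calculation of a turnover number. The 1 cm path length, the example slope, and the enzyme concentration are assumptions chosen for illustration and are not values from this work.

```python
EPSILON_NADH_340 = 6.22  # mM^-1 cm^-1, extinction coefficient of NADH at 340 nm


def nadh_rate_mM_per_min(delta_a340_per_min, path_cm=1.0):
    """Convert an absorbance slope (A340 units per min) to mM NADH consumed per min."""
    return delta_a340_per_min / (EPSILON_NADH_340 * path_cm)


def turnover_per_s(rate_mM_per_min, enzyme_mM):
    """Apparent turnover number (s^-1) from a rate and the enzyme concentration."""
    return rate_mM_per_min / enzyme_mM / 60.0


# Hypothetical example: slope of 0.0622 A340/min with 0.1 µM (0.0001 mM) enzyme.
rate = nadh_rate_mM_per_min(0.0622)       # mM NADH oxidized per minute
turnover = turnover_per_s(rate, 0.0001)   # s^-1
```

The path length must match the cuvette actually used; a short-path microcuvette would scale the calculated rate accordingly.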

In the radioactive assay, 14C-NANA was used for its higher sensitivity. A total

100 µl reaction mixture contained 20 mM potassium phosphate, pH 7.2, 1 mg/ml BSA,

0.2 mM NANA (Julich Fine Chemicals), 0.025 µCi 14C-NANA, and varied concentrations

of the enzymes. After incubation for 5 min, the enzymes were added to initiate the

reaction. At different time points, an aliquot of the reaction mixture (usually 30 µl) was

taken, and added into 1 ml water. After loading onto a Dowex 1X8-200 (formate form)

column in a Pasteur pipet, the flow-through was collected. Then the column was washed twice with 3 ml water, and eluted with 4 ml 200 mM sodium formate. The radioactivity of each fraction was determined using a LS 6500 scintillation counter (Beckman).

Because ManNAc generated from the cleavage of NANA was predominantly in the fractions of the flow-through and the first wash, the amount of NANA being cleaved was calculated to determine the NAL specific activity (µmol·min⁻¹·mg⁻¹) of each protein.
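One plausible way to carry out this calculation is sketched below: the fraction of radioactivity recovered in the flow-through and first wash is taken as the fraction of NANA cleaved, and this is scaled by the amount of NANA in the reaction. The formula, the function name, and all numerical inputs are assumptions for illustration; the text does not spell out the exact arithmetic that was used.

```python
def nal_specific_activity(cpm_cleaved, cpm_total, nana_mM, reaction_ul, minutes, enzyme_mg):
    """Specific activity in µmol NANA cleaved per min per mg enzyme.

    cpm_cleaved: counts in the flow-through plus first wash (14C-ManNAc)
    cpm_total:   counts in all fractions combined
    """
    fraction_cleaved = cpm_cleaved / cpm_total
    # mM x µl gives nmol; divide by 1000 to convert to µmol.
    nana_umol = nana_mM * reaction_ul / 1000.0
    return fraction_cleaved * nana_umol / (minutes * enzyme_mg)


# Hypothetical example: 25% of counts elute as ManNAc after 10 min
# in a 100 µl reaction with 0.2 mM NANA and 1 µg (0.001 mg) enzyme.
activity = nal_specific_activity(
    cpm_cleaved=2500.0,
    cpm_total=10000.0,
    nana_mM=0.2,
    reaction_ul=100.0,
    minutes=10.0,
    enzyme_mg=0.001,
)
```

Because only the ratio of counts enters the calculation, the result does not depend on the aliquot volume taken at each time point, provided the same aliquot is used for all fractions.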

Analysis of in vitro DHDPS activity by HPLC

The DHDPS activity of NAL was examined using HPLC to detect the production

of DHDP. The reaction mixture (total 500 µl) contains 100 mM HEPES, pH 7.5, 20 mM

pyruvate, 2.2 mM (S)-ASA, 1 mg/ml BSA, and 6 mg/ml NAL(purified) or 0.25 mg/ml

NAL(sigma). After incubation at room temperature for 5 min, NAL was added to start the reaction. A 150 µl aliquot was taken at 10 min, 30 min, and 1 hr from the NAL(purified) reaction or at 2 hr and 5 hr from the NAL(sigma) reaction, and stored at -70 ˚C to stop the reaction. After thawing, the proteins were removed from the reaction mixture by centrifugation using Centricon MWCO10 (Millipore). 50 µl of each sample was injected into a reverse phase C18 column, and eluted with a linear gradient of 100%-5% acetonitrile on a Waters 600 controller and pump (Millipore). Detection was at 270 nm because DHDP has its maximum absorbance at 270 nm (Yugari,

1965). An assay with the wild type DHDPS (0.1mg/ml) was also included as control.

4.3 Results and Discussion

Development of DHDPS activity in NAL scaffold by rational site-directed mutagenesis

In our attempt to incorporate DHDPS activity into the NAL scaffold, five NAL mutants, L142R, I247F, L142R/E192A, L142R/I247F and L142R/V251N, were created by site-directed mutagenesis on the basis of the superposition of NAL and DHDPS active sites. The first target residue, L142, is located on a short helical turn within β5 loop

(Figure 31 and 32). DHDPS has a conserved Arg residue at this position, which has been suggested to be involved in binding to (S)-ASA (Blickling, 1997) and catalysis (Dobson,

2005). The second target residue is E192, which is located within β7 loop. The residues in this loop are involved in binding NANA in NAL (Barbosa, 2000), while they are responsible for the coordination of the amino group of (S)-ASA in DHDPS (Blickling,

1997). The third target residue, V251, is located in the loop connecting the α9 and α10 helices, whose counterpart (N248) in DHDPS is supposed to coordinate the hydroxyl groups of hydrated (S)-ASA (Blickling, 1997). Mutation I247F at the C terminus of α9 helix was introduced to decrease the accessible volume of the NAL active site to simulate the DHDPS active site.

The DHDPS activity of these NAL mutants was first tested in E. coli auxotrophic strains, AT997 and DHDPS(-), by in vivo complementation on minimal medium. When the bacterial auxotrophic strains were complemented with the gene of DHDPS wild-type in vector pDIM, the cells reproducibly grew on minimal medium after incubation for

1.5 days. None of the five NAL mutants tested was found to be able to complement the

growth of either AT997 or DHDPS(-) on minimal medium after 3 days incubation under

all conditions tested, such as different incubation temperatures, different expression

vectors. The expression levels of all tested mutants in auxotrophic strains were in the same range, as verified by SDS-PAGE.

In order to determine the kinetic parameters of the NAL mutants, their gene fragments were subcloned into pET22b vector via NdeI and HindIII sites and purified with a C-terminal His tag. All but L142R/E192A were still able to catalyze the original activity, although less efficiently than the wild type NAL (Table 13). Mutation of E192A led to complete loss of NAL activity, indicating its important role for NAL activity.

Combined with the findings that changing this Glu residue in Clostridium perfringens

NAL to Asp or Gln showed no effect on NAL activity, but significantly lowered the substrate binding affinity (Kruger, 2001), it can be concluded that both the charge and the size of the E192 residue are important for substrate binding. For L142R, the decrease in

NAL activity was mainly caused by a decrease in kcat while Km for NANA was slightly

affected by the mutation. The decreases in both kcat and NANA binding of I247F and

L142R/I247F suggested that changing the accessible volume of the active site might

affect substrate binding, but catalytic activity could be affected even more due to the

change. Compared to L142R, L142R/V251N exhibited a two-fold increase in Km for

NANA while kcat was slightly affected by the mutation. The decreases in kcat and NANA

binding are consistent with a stepwise perturbation of the active site.

The in vitro DHDPS activity of each protein was tested using coupling DHDPR

reaction (Yugari, 1965), and the results are shown in Table 13. The pyruvate binding

affinity was not determined due to the limited amount of (S)-ASA available.

Interestingly, the wild type NAL showed a very weak DHDPS activity. However, both the turnover rate and the binding affinity for (S)-ASA were much lower, and the

specificity constant kcat /Km for (S)-ASA was several orders of magnitude lower than that

of the wild type DHDPS. This activity promiscuity in NAL, and its tolerance of a wide range of aldoses as substrates for its condensation reaction, are features typical of a multipotent ancestral enzyme. The complete loss of DHDPS activity in I247F was consistent with its significant decrease in NAL activity, indicating that a change in the accessible volume of the active site might substantially affect protein activity. L142R has an increased DHDPS activity, mostly caused by an increase in kcat, whereas the binding affinity for (S)-ASA was slightly improved. This result was also in accord with the prediction that R138 in DHDPS might play a role in catalysis, as well as substrate binding, based on the structural data (Dobson, 2005). The significance of this Arg residue to DHDPS activity was also demonstrated by the recovery of DHDPS activity in L142R/I247F due to the introduction of the L142R mutation, while I247F alone was found to lack DHDPS activity. Compared to L142R, L142R/I247F has a decreased DHDPS activity, mostly caused by a decrease in kcat, whereas the binding affinity for (S)-ASA was slightly impaired, which suggests that a change in the accessible volume of the active site might have a more serious impact on catalysis than on substrate binding, and might cause conformational changes of catalytically essential residues. Compared to L142R, the decrease in kcat, and especially the increase in Km for (S)-ASA, of L142R/V251N implied that N248 might not be involved in (S)-ASA binding as presumed based on the structural data (Blickling, 2005).

Table 13: Kinetic parameters of the NAL and DHDPS activities of the wild type NAL and DHDPS, and the NAL mutants.

               NAL activity                                              DHDPS activity
Enzyme         kcat (s⁻¹)   Km (NANA) (mM)   kcat/Km (NANA) (mM⁻¹·s⁻¹)   kcat (s⁻¹)    Km (ASA) (mM)   Relative kcat/Km (ASA)
NAL            30.4±1.7     1.94±0.15        15.7                        0.08±0.002    8.65±1.7        1
DHDPS          NA           NA               NA                          16.3±0.6      0.26±0.03       6800
L142R          3.4±0.3      1.53±0.16        2.2                         0.53±0.03     3.07±0.43       19
I247F          0.51±0.03    4.56±0.36        0.11                        NA            NA              NA
L142R/E192A    NA           NA               NA                          NA            NA              NA
L142R/I247F    0.53±0.02    6.13±0.37        0.09                        0.09±0.001    5.81±1.21       1.7
L142R/V251N    2.95±0.07    3.54±0.26        0.83                        0.31±0.01     6.77±0.41       5.0

NA, not applicable.
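The relative specificity constants in the last column of Table 13 can be recomputed from the tabulated kcat and Km values for (S)-ASA. The sketch below does this using the mean values only, ignoring the reported errors; the dictionary and variable names are illustrative.

```python
# DHDPS-activity parameters from Table 13: (kcat in s^-1, Km for (S)-ASA in mM).
DHDPS_ACTIVITY = {
    "NAL": (0.08, 8.65),
    "DHDPS": (16.3, 0.26),
    "L142R": (0.53, 3.07),
    "L142R/I247F": (0.09, 5.81),
    "L142R/V251N": (0.31, 6.77),
}

# Specificity constant kcat/Km (mM^-1 s^-1) for each enzyme.
specificity = {name: kcat / km for name, (kcat, km) in DHDPS_ACTIVITY.items()}

# Express every value relative to wild type NAL.
reference = specificity["NAL"]
relative = {name: value / reference for name, value in specificity.items()}
```

The recomputed ratios agree with the rounded values in the table: roughly 19 for L142R, 1.7 for L142R/I247F, 5 for L142R/V251N, and close to 6800 for wild type DHDPS.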

The overall improvement of DHDPS activity can be best compared when looking at the specificity constant kcat /Km for (S)-ASA. The increase was 19 fold in L142R, 1.7

fold in L142R/I247F, and 5 fold in L142R/V251N. This 19 fold improvement of DHDPS

activity caused by the L142R mutation in NAL has also been observed by Joerger et al.

(Joerger, 2003), which clearly demonstrates the importance of this Arg residue to DHDPS

activity.

Identification of DHDPS product generated by NAL using HPLC

The ability of NAL to catalyze DHDPS reaction was further examined using

HPLC analysis. Detection was at 270 nm because DHDP has been found to have a maximum absorbance at 270 nm (Yugari, 1965). After the elution with a linear gradient of 100%-5% acetonitrile, no peak was observed at 270 nm from a mixture of pyruvate and (S)-ASA, suggesting that DHDP could not be generated by a non-enzymatic reaction (Figure 33A). When the wild type DHDPS was added into the reaction mixture, a new peak with a small shoulder was observed after only 3 min. The spectrum of this new compound showed a maximum absorbance at 270 nm, which is the same as that of DHDP (Figure 33B). Combined with the fact that the absorbance of this peak increased after longer incubation, this new compound was believed to be DHDP produced by DHDPS activity.

E. coli NAL proteins from two different sources were used to test their ability to generate DHDP: one was purified with a C-terminal His tag and the other was commercially available from Sigma. After the elution with a linear gradient of 100%-5% acetonitrile, the same peak as the one shown in the DHDPS reaction was observed only

10 min after NAL(C-His) was added into the reaction mixture (Figure 33C), but it took 2 hr to appear for NAL(sigma) because of its lower protein concentration (Figure 33D). For both NAL preparations, the spectrum of the peak was found to be identical to that from the

DHDPS reaction with a maximum absorbance at 270 nm, and the absorbance of the peak also increased with time. These observations clearly indicated that NAL can catalyze the condensation of pyruvate and (S)-ASA to form DHDP.

Figure 33: The identification of DHDP generated by the wild type DHDPS and NAL using HPLC. (A) The reaction mixture only with (S)-ASA and pyruvate after 2 hour incubation. (B) DHDPS reaction mixtures at 3 min (red) and 10 min (black). The spectrum of the major peak from 210 to 400 nm is inset. (C) NAL(C-His) reaction mixtures at 10 min (red) and 60 min (black). The spectrum of the major peak (marked by red triangle) from 210 to 400 nm is inset. (D) NAL(sigma) reaction mixtures at 2 hour (red) and 5 hour (black). The spectrum of the major peak from 210 to 400 nm is inset.

Development of NAL activity in DHDPS scaffold by rational site-directed mutagenesis

The structural analysis and previous in vitro kinetic data showed that NAL and

DHDPS are closely related evolutionarily, and NAL might serve as an ancestor for the evolution of DHDPS activity. Because the irreversibility of the DHDPS reaction has been suggested to originate from the spontaneous dehydration of HPTA to form DHDP, it is highly likely that DHDPS is also able to catalyze a cleavage reaction just like NAL. To incorporate NAL activity in the DHDPS scaffold, the residues that are distinct between the active sites of NAL and DHDPS were chosen as primary targets (Figure 31C). The first target residue was R138, which is located at a short helical turn within β5 loop. When this Arg residue is modeled into the NAL active site in a conformation as found in DHDPS, the guanidinium group lies right on top of the NAL active site. While this Arg residue is critical for DHDPS catalysis and substrate binding, its presence might interfere with

NANA binding based on the structural data. The next two targets are A189 and V205, which are located within the β7 and β8 loop, respectively. Their counterparts in NAL,

E192 and S208, are involved in NANA binding according to the structural data (Barbosa,

2000). And the substitutions of E192 in NAL have been shown to cause a substantial decrease in NANA binding (Kruger, 2001), a shift of substrate specificity (Williams,

2005), or even the complete loss of NAL activity. Mutations F244I and F244A at the C-terminus of α9-helix were added to increase the accessible volume of the DHDPS active site because NANA is a bigger substrate (Figure 31C). The last two targets are the β7 and β8 loops, which contain most of the second group residues in NAL and DHDPS active sites.

Therefore, these two loops in DHDPS were replaced by the corresponding sequences of

NAL (β7: D187Y/A189E; β8: V202G/S204G/V205S/T207Y). Later, a short version of 168 β8 loop (V202G/S204G/V205S) was introduced due to the insolubility resulted from the mutation T207Y. The following DHDPS mutants were created and cloned in pDIM vector by NdeI and SpeI sites: R138L, A189E, V205S, F244I, F244A, R138L/A189E,

R138L/V205S, R138L/F244A, A189E/F244A, D187Y/A189E (β7),

V202G/S204G/V205S/T207Y (β8), D187Y/A189E/V202G/S204G/V205S/T207Y (β7

β8), D187Y/A189E/V202G/S204G/V205S (β7newβ8), D187Y/A189E/F244I (β7 F244I),

D187Y/A189E/F244A (β7F244A), V202G/S204G/V205S/T207Y/F244A (β8F244A),

V202G/S204G/V205S/T207Y/F244I (β8F244I), D187Y/A189E/V202G/S204G/V205S/

T207Y/F244A (β7β8F244A), D187Y/A189E/V202G/S204G/V205S/T207Y/F244I

(β7β8F244I), R138L/D187Y/A189E/V202G/S204G/V205S (R138Lβ7newβ8),

D187Y/A189E/V202G/S204G/V205S/F244A (β7newβ8F244A), R138L/D187Y/A189E/

V202G/S204G/V205S/F244A (R138Lβ7newβ8 F244A).

The in vivo selection for NAL activity was carried out in the E. coli auxotrophic strain NAL(-), in which the chromosomal nanA gene is deleted, on minimal medium with NANA as the only carbon source. NANA has been shown to induce NAL activity and to support the growth of E. coli K12 as the sole carbon source (Vimr, 1985a). After the plasmids carrying the DHDPS mutant genes, as well as the wild type NAL and DHDPS genes, were transformed into NAL(-), the transformed cells were replicated from a master plate onto a minimal plate and incubated at 30 ˚C or 37 ˚C for 3 days. Only the NAL(-) cells carrying the plasmid with the wild type NAL gene grew reproducibly on minimal medium after 2 days, and none of the DHDPS mutants was observed to support NAL(-) growth on minimal medium. After prolonged incubation (> 3 days), even NAL(-) cells carrying only the empty vector were able to grow on minimal medium, indicating that results obtained after more than 3 days were no longer reliable.

In order to support bacterial growth with NANA as the only carbon source, the NAL activity must be high enough to provide all the material required for cell growth, and that growth has to be observable within 3 days. If the NAL activities of the DHDPS mutants are below this threshold, they cannot be identified by in vivo complementation. To avoid this limitation of in vivo selection, the following DHDPS mutants were subcloned into the pET22b vector via NdeI and HindIII sites and purified to >95% homogeneity with a C-terminal His tag using affinity chromatography, followed by in vitro kinetic characterization: R138L, A189E, V205S, F244I, F244A, R138L/A189E, R138L/V205S, R138L/F244A, A189E/F244A, D187Y/A189E (β7), V202G/S204G/V205S/T207Y (β8), and D187Y/A189E/V202G/S204G/V205S/T207Y (β7β8). After overexpression, two mutants, V202G/S204G/V205S/T207Y (β8) and D187Y/A189E/V202G/S204G/V205S/T207Y (β7β8), were found to be insoluble, which might result from steric interference between the N-terminal barrel domain and the C-terminal helical domain caused by the T207Y substitution. Without the T207Y substitution, a new DHDPS mutant, D187Y/A189E/V202G/S204G/V205S (β7newβ8), was found to be about 30% in the soluble fraction and was purified accordingly.

In the first in vitro kinetic assay for NAL activity, the progress of the reaction was monitored by following the reduction of pyruvate to lactate by NADH, as catalyzed by lactate dehydrogenase (LDH), which was measured at 340 nm (ε340 = 6.22 mM⁻¹ cm⁻¹) on a Cary I spectrometer (Varian). Unfortunately, none of the purified DHDPS mutants except F244A was found to catalyze the cleavage of NANA to pyruvate and ManNAc, even with a saturating concentration of NANA, a long incubation time (5 hours), and a high protein concentration (>3 mg/ml). The kcat of F244A for the cleavage of NANA to ManNAc and pyruvate was estimated to be 5 × 10⁻⁵ s⁻¹, which is almost 6 × 10⁵ fold lower than that of the wild type NAL, while the Km of F244A for NANA was not determined because of the low activity. The generation of NAL activity in the DHDPS scaffold by opening up the DHDPS active site indicated that NAL and DHDPS share the same catalytic machinery, but that the accessibility of the active sites limits the reactions they can catalyze. At the same time, although all the NAL active site residues identified from the structural data were introduced into the DHDPS scaffold, the failure to generate or improve the NAL activity suggests that some unidentified residues outside the active site might play a direct or indirect role in catalysis. It is generally accepted that residues distant from the active site can have a significant impact on enzymatic activity or reaction mechanism through an intramolecular interaction network (Benkovic, 2003).
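The conversion from an absorbance slope in the LDH-coupled assay to a turnover number can be sketched as follows. This is only an illustration of the Beer-Lambert arithmetic, not the procedure used in this work: the 1 cm path length, the monomer mass of about 31,000 g/mol, and the example slope are assumptions chosen for illustration, and only ε340 = 6.22 mM⁻¹ cm⁻¹ and the >3 mg/ml protein concentration come from the text.

```python
# Beer-Lambert conversion for an NADH-coupled assay (one NADH oxidized per pyruvate reduced).
EPSILON_NADH_340 = 6.22  # mM^-1 cm^-1, extinction coefficient quoted in the text


def rate_mM_per_min(delta_a340_per_min, path_cm=1.0):
    """Rate of NADH consumption (mM/min) from the slope of A340 versus time."""
    return abs(delta_a340_per_min) / (EPSILON_NADH_340 * path_cm)


def kcat_per_s(delta_a340_per_min, enzyme_mg_per_ml, mw_g_per_mol, path_cm=1.0):
    """Turnover number (s^-1); meaningful only at saturating substrate."""
    enzyme_mM = enzyme_mg_per_ml / mw_g_per_mol * 1e3  # mg/ml equals g/l; mol/l -> mM
    return rate_mM_per_min(delta_a340_per_min, path_cm) / enzyme_mM / 60.0


# Hypothetical inputs: slope of -0.0018 A340 per minute, 3 mg/ml enzyme,
# assumed monomer mass of 31,000 g/mol, 1 cm cuvette.
example_kcat = kcat_per_s(-0.0018, enzyme_mg_per_ml=3.0, mw_g_per_mol=31_000)
print(f"kcat = {example_kcat:.2e} s^-1")
```

With these assumed inputs the slope is below 0.002 absorbance units per minute, which illustrates why a turnover number on the order of 10⁻⁵ s⁻¹ calls for long incubation times and high protein concentrations in a spectrophotometric assay.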

To evaluate the effects of the different active site changes on DHDPS activity, the residual DHDPS activity and the (S)-ASA binding affinity of each purified DHDPS mutant were also determined using the coupled DHDPR reaction, as described in the Materials and Methods section (Table 14). All DHDPS mutants carrying a substitution of R138 lost their DHDPS activity completely, indicating the essential role of R138 in DHDPS activity. This finding is consistent with the prediction, based on the structural data, that R138 might be involved in catalysis and (S)-ASA binding (Mirwaldt, 1995; Dobson, 2005). The mutation A189E led to a 2 fold decrease in the kcat, while the Km for (S)-ASA decreased slightly, suggesting that the addition of an extra negatively charged residue to the β7 loop did not affect (S)-ASA binding, since there are already two negatively charged residues in the β7 loop. The mutation V205S caused a 4 fold increase in the Km for (S)-ASA, while the kcat improved slightly, implying that this change affected only substrate binding, not catalysis. The F244A and F244I mutations were introduced to increase the accessible volume of the DHDPS active site to accommodate NANA as a substrate. Although located at the same position, these two mutations affected different kinetic parameters of the DHDPS activity: F244I led to a 5 fold increase in the Km for (S)-ASA and an almost unchanged kcat, whereas F244A led to a 3.5 fold decrease in the kcat and a slight increase in the Km for (S)-ASA. The F244A mutation might open up the DHDPS active site so that bound (S)-ASA can move more freely, and the resulting entropic increase might be responsible for the decrease in the kcat. In the DHDPS structure, the α9 loop, in which residue F244 is located, lies closer to the active site than in NAL. The side chain of an Ile residue protrudes into the DHDPS active site if it is modeled into the DHDPS structure in the conformation found in NAL, which might interfere with (S)-ASA binding and result in the increase in the Km for (S)-ASA. The overall influence on DHDPS activity is comparable between these two mutants in terms of the specificity constant, kcat/Km for (S)-ASA (15.8 vs 12.1 mM⁻¹·s⁻¹). The A189E/F244A double mutation led to an 8 fold decrease in the kcat and had no effect on the Km for (S)-ASA. Given the 2 fold decrease in the kcat in the A189E mutant and the 3.5 fold decrease in the F244A mutant, whose product (about 7 fold) is close to the decrease observed for the double mutant, the effects of these two mutations on the kcat are additive, indicating that there is no interaction between residues F244 and A189. The double mutation D187Y/A189E introduces the NAL β7 loop into the DHDPS active site and led to a 2 fold decrease in the kcat and a 16 fold increase in the Km for (S)-ASA. Compared to the kinetic parameters of the A189E mutant, the addition of a Tyr residue to the DHDPS active site significantly impaired the (S)-ASA binding affinity because of its large size. The last DHDPS mutant has five mutations, D187Y/A189E/V202G/S204G/V205S, by which the NAL β7 and β8 loops were introduced into the DHDPS active site. In terms of the specificity constant, kcat/Km for (S)-ASA, this mutant is the least active one, with a 7 fold decrease in the kcat and a 5.5 fold increase in the Km for (S)-ASA.

Table 14: Kinetic parameters of the DHDPS activities of the DHDPS mutants with mutations in the active site.

enzyme                           kcat (s⁻¹)    Km (ASA) (mM)   kcat/Km (ASA) (mM⁻¹·s⁻¹)
DHDPS                            16.3±0.6      0.26±0.03       63
R138L                            ND            ND              ND
A189E                            7.7±0.6       0.18±0.03       43
V205S                            18.6±1.2      1.01±0.16       18.5
F244A                            4.74±0.41     0.30±0.05       15.8
F244I                            17.2±0.3      1.42±0.04       12.1
R138L/A189E                      ND            ND              ND
R138L/V205S                      ND            ND              ND
R138L/V205S                      ND            ND              ND
A189E/F244A                      2.2±0.1       0.27±0.03       8.1
D187Y/A189E                      8.6±0.6       4.3±0.7         2
D187Y/A189E/V202G/S204G/V205S    2.5±0.3       1.5±0.2         1.7

ND, not detectable.
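The fold changes and specificity constants discussed for Table 14 can be recomputed with a few lines of Python. The sketch below uses only the mean values from the table (errors are ignored) and checks the independence argument for A189E and F244A under the usual assumption that independent effects multiply in fold change, which is equivalent to being additive in activation free energy. The dictionary and function names are illustrative only.

```python
# Mean kcat (s^-1) and Km for (S)-ASA (mM) copied from Table 14; errors omitted.
TABLE_14 = {
    "DHDPS": (16.3, 0.26),
    "A189E": (7.7, 0.18),
    "V205S": (18.6, 1.01),
    "F244A": (4.74, 0.30),
    "F244I": (17.2, 1.42),
    "A189E/F244A": (2.2, 0.27),
    "D187Y/A189E": (8.6, 4.3),
    "D187Y/A189E/V202G/S204G/V205S": (2.5, 1.5),
}
WILD_TYPE = "DHDPS"


def specificity_constant(enzyme):
    """kcat/Km in mM^-1 s^-1."""
    kcat, km = TABLE_14[enzyme]
    return kcat / km


def kcat_fold_decrease(enzyme):
    return TABLE_14[WILD_TYPE][0] / TABLE_14[enzyme][0]


def km_fold_increase(enzyme):
    return TABLE_14[enzyme][1] / TABLE_14[WILD_TYPE][1]


# Independence check: the product of the single-mutant effects should match the double mutant.
predicted = kcat_fold_decrease("A189E") * kcat_fold_decrease("F244A")
observed = kcat_fold_decrease("A189E/F244A")

for name in TABLE_14:
    print(f"{name:32s} kcat/Km = {specificity_constant(name):5.1f}  "
          f"kcat down {kcat_fold_decrease(name):4.1f}x  Km up {km_fold_increase(name):4.1f}x")
print(f"A189E x F244A predicted {predicted:.1f}x, double mutant observed {observed:.1f}x")
```

The predicted and observed decreases for the double mutant agree within a few percent, which is the numerical basis for treating the two substitutions as non-interacting.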

The second in vitro assay for NAL activity uses radioactive NANA, N-[acetyl-1-¹⁴C]-neuraminic acid (¹⁴C-NANA), for its high sensitivity. ¹⁴C-NANA was synthesized by the NAL-catalyzed condensation of N-[acetyl-1-¹⁴C]-mannosamine (¹⁴C-ManNAc) and pyruvate, as described in the Materials and Methods section. After the cleavage of ¹⁴C-NANA, the remaining ¹⁴C-NANA and the ¹⁴C-ManNAc product were separated on a Dowex 1×8-200 column, and their radioactivity was determined in an LS 6500 scintillation counter (Beckman), followed by calculation of the specific activity (µmol/min/mg). The NAL activity of each purified DHDPS mutant, as well as that of the wild type NAL and DHDPS, was determined in this way. The wild type NAL had a specific activity of 6.9 µmol/min/mg, consistent with the value determined by the coupled LDH reaction. Interestingly, the wild type DHDPS (C-His) showed a weak NAL activity of 4.5 × 10⁻⁵ µmol/min/mg, more than 10⁵ fold lower than that of the wild type NAL. To verify this finding, the dapA gene was subcloned into the pET16b vector via NdeI and BamHI sites, and the wild type DHDPS was purified with an N-terminal His tag from a BL21(DE3) strain with the chromosomal nanA gene deleted, followed by characterization of its NAL activity using ¹⁴C-NANA. The similar NAL activity detected for DHDPS (N-His) indicated that the wild type DHDPS is indeed able to catalyze the cleavage of NANA to pyruvate and ManNAc, albeit more than 10⁵ times less efficiently than the wild type NAL. The relatively small accessible volume of the DHDPS active site with respect to the size of NANA might contribute to this low NAL activity. Surprisingly, none of the purified DHDPS mutants, not even F244A, showed the ability to cleave ¹⁴C-NANA, in contrast to the previous result that F244A can catalyze the cleavage of NANA. F244A was therefore repurified from the BL21(DE3) nanA- strain and tested for NAL activity using both in vitro methods. The lack of NAL activity in the newly purified F244A indicated that the previous result was an artifact, which might have been caused by protein precipitation.

The finding that the wild type DHDPS can catalyze the NAL reaction, while DHDPS mutants carrying active site residues from NAL cannot, was unexpected and is difficult to interpret. One possible explanation is that both the DHDPS active site and NANA need to adopt particular conformations for binding and then for the reaction to occur, and that the binding between NANA and DHDPS is very weak because of the high energy barrier to reach these conformations. This weak binding is suggested by the fact that the DHDPS activity of the wild type DHDPS was not inhibited in the presence of 10 mM NANA, a 100 fold excess over pyruvate (data not shown). Although the mutations were introduced into the DHDPS active site to mimic the NAL active site, these changes might prevent the DHDPS active site from adopting the conformations to which NANA can bind, and thereby prevent the reaction from occurring.

Construction of DHDPS library with randomized residues at positions 187, 188, 189, 202, 204, 205, and 207

The structural and kinetic data have shown that the residues within the β7 and β8 loops are involved in substrate binding in NAL and DHDPS, and that their substitution leads to the complete loss or attenuation of activity, lower substrate binding affinity, or a shift of substrate specificity (Barbosa, 2000; Kruger, 2001; Williams, 2005). As shown in Figure 32B and C, the amino acid residues within the β7 and β8 loops are quite diverse between NAL and DHDPS.

In order to evolve NAL activity in the DHDPS scaffold, the NAL active site was copied into the DHDPS scaffold by site-directed mutagenesis, but no NAL activity was observed for any of the DHDPS mutants either in vivo or in vitro. To explore more fully the possibility of generating a NAL active site in the DHDPS scaffold, each position in the DHDPS β7 and β8 loops that carries a different residue from the corresponding position in NAL was randomized to all 20 amino acids by saturation mutagenesis, and a DHDPS library was constructed as described in the Materials and Methods section, followed by in vivo selection for NAL activity in the E. coli auxotrophic strain NAL(-) on selective medium. The targeted positions were 187, 188, and 189 in the β7 loop, and 202, 204, 205, and 207 in the β8 loop. DNA sequencing of several plasmids from randomly chosen colonies of the naïve library showed that the residues at these positions were randomized.

No DHDPS mutant with NAL activity was identified by in vivo selection of the DHDPS library with randomized residues in the β7 and β8 loops on the selective medium, which might be due to several reasons. First, it might simply not be possible to reconstitute NAL activity in the DHDPS scaffold. A more likely scenario is that some DHDPS mutants retained NAL activity comparable to that of the wild type DHDPS, but these activities were too low to be observed because they were below the sensitivity limit of in vivo selection. Another explanation comes from the intrinsic bias of the degenerate DNA oligonucleotides that results from the degeneracy of the genetic code. During oligonucleotide synthesis, the codon for each target residue was replaced by NNS, where N represents an equimolar mixture of all four bases and S represents an equimolar mixture of G and C. Therefore, NNS represents 32 different codons, encoding all 20 amino acids and an amber stop codon (TAG). However, the degeneracy of the genetic code leads to a form of codon bias, as there are three times as many codons for some amino acids, such as Ser and Arg, as for others, such as Trp and Met. As the number of randomized positions increases, the sequence diversity of the library becomes more biased towards the amino acids with high degeneracy, which results in the loss of some potential candidates. In our library, seven positions were randomized, which means that a variant carrying Ser or Arg at all seven positions is 3⁷ = 2187 times more likely to appear than one carrying Trp or Met at all seven positions.
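The codon counts behind this argument can be verified by enumerating the NNS codons against the standard genetic code. The following is a minimal sketch; the variable names are illustrative, and the only inputs are the standard codon table and the definition of NNS given above.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNS: any base at positions 1 and 2, G or C at position 3.
nns_codons = [first + second + third for first in BASES for second in BASES for third in "GC"]
counts = Counter(CODON_TABLE[codon] for codon in nns_codons)
stop_codons = [codon for codon in nns_codons if CODON_TABLE[codon] == "*"]

n_positions = 7  # number of randomized positions in the library
bias = (counts["S"] / counts["W"]) ** n_positions

print(f"{len(nns_codons)} NNS codons, {len(counts) - 1} amino acids, stop codons: {stop_codons}")
print(f"Ser: {counts['S']}, Arg: {counts['R']}, Trp: {counts['W']}, Met: {counts['M']} codons")
print(f"All-Ser versus all-Trp variant at {n_positions} positions: {bias:.0f}-fold")
```

The same enumeration shows that Leu is also encoded by three NNS codons, so the bias is not limited to Ser and Arg.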

Construction of hybrid proteins between NAL and DHDPS by rational design

The fourfold symmetry of the (β/α)8 barrel structure suggests that the barrel is assembled from (β/α)2N precursors (Gerlt, 2003). It has been shown experimentally that a compact (β/α)8 barrel structure can be assembled from two (β/α)4 half barrels and even retain its catalytic function (Hocker, 2001). Therefore, novel protein functions might be evolved by mixing precursors that separately deliver substrate binding or catalysis, which can also be considered one form of “domain swapping”.

As shown in Figure 34A, the (β/α)8 barrel structure of NAL and DHDPS can be divided into four (β/α)2 units. In both enzymes, the first (β/α)2 unit contains the residues (T44 and T45 in DHDPS, S47 and T48 in NAL) that are involved in binding the first carboxylate group of the substrates. The second (β/α)2 unit contains the residues (Y106 and Y107 in DHDPS, F109 and Y110 in NAL) that are involved in dimerization and in the formation of the active site of the other monomer. The third (β/α)2 unit contains the residues (Y133, R138, and K161 in DHDPS, Y137 and K165 in NAL) that are involved in catalysis. The last (β/α)2 unit contains the residues (G186 and D188 in DHDPS, G189, D191, E192, and S208 in NAL) that are involved in substrate binding. Because catalysis and substrate binding are assigned to distinct (β/α)2 units in NAL and DHDPS, it might be possible to generate NAL activity by swapping (β/α)2 units between NAL and DHDPS.

Four hybrid proteins with different (β/α)2 units were created, as shown in Figure 34B. DHDPS (β/α)4 has the N-terminal (β/α)4 half barrel from DHDPS (DHDPS 1-129), followed by the C-terminal (β/α)4 half barrel and the helical domain from NAL (NAL 134-297). DHDPS (β/α)6 has the first three (β/α)2 units from DHDPS (DHDPS 1-180), followed by the last (β/α)2 unit and the helical domain from NAL (NAL 184-297). DHDPS (β/α)8 has the N-terminal (β/α)8 barrel domain from DHDPS (DHDPS 1-224), followed by the C-terminal helical domain from NAL (NAL 228-297). The last hybrid, NAL (β/α)8, has the N-terminal (β/α)8 barrel domain from NAL (NAL 1-227), followed by the C-terminal helical domain from DHDPS (DHDPS 225-292).
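The composition of the four hybrids can be written down as residue ranges to check their lengths and the numbering offset across each junction. This is a bookkeeping sketch only: the class and function names are hypothetical, and the C-terminal segment of NAL (β/α)8 is taken to be DHDPS residues 225-292, on the assumption that the helical domain of this hybrid comes from DHDPS as stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    parent: str  # "DHDPS" or "NAL"
    first: int   # first residue, 1-based, inclusive
    last: int    # last residue, inclusive

    @property
    def length(self) -> int:
        return self.last - self.first + 1


# Residue ranges as given in the text (N-terminal segment, C-terminal segment).
HYBRIDS = {
    "DHDPS(β/α)4": (Segment("DHDPS", 1, 129), Segment("NAL", 134, 297)),
    "DHDPS(β/α)6": (Segment("DHDPS", 1, 180), Segment("NAL", 184, 297)),
    "DHDPS(β/α)8": (Segment("DHDPS", 1, 224), Segment("NAL", 228, 297)),
    "NAL(β/α)8": (Segment("NAL", 1, 227), Segment("DHDPS", 225, 292)),  # assumed DHDPS numbering
}


def hybrid_length(segments):
    """Total number of residues in the hybrid."""
    return sum(segment.length for segment in segments)


def numbering_offset(segments):
    """Shift in parental residue numbering across the crossover."""
    n_term, c_term = segments
    return c_term.first - (n_term.last + 1)


for name, segments in HYBRIDS.items():
    print(f"{name}: {hybrid_length(segments)} residues, offset {numbering_offset(segments):+d}")
```

The offsets of three to four residues match the numbering shift between equivalent active site residues in the two enzymes (for example A189 in DHDPS and E192 in NAL), which is consistent with the crossovers having been placed at structurally aligned positions.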

After cloning into the pDIM vector via NdeI and SpeI sites and transformation into the E. coli auxotrophic strain NAL(-), serial dilutions of the transformed cells were drop-spotted on selective medium (M9+NANA) and permissive medium (M9+glucose) for in vivo selection of NAL activity. pDIM-PurT was also included as a control. As expected, all constructs grew on the permissive medium (M9+glucose) with glucose as the carbon source, and colonies appeared after only 1 day of incubation (Figure 34D). After 3 days of incubation, colonies were observed on the selective medium (M9+NANA) (Figure 34C). For the PurT control, colonies were observed only in the first two spots, indicating that PurT was not able to complement NAL(-) growth on the selective medium. For the DHDPS(β/α)4, DHDPS(β/α)6, and DHDPS(β/α)8 hybrid proteins, colonies were observed in the first five spots, but the cell density in each spot was lower than that on the permissive medium, suggesting that these colonies were false positives and that their growth resulted from reversion rather than complementation. These findings indicated that the (β/α)2 units in NAL could not be replaced by those in DHDPS without loss of NAL activity.

For NAL(β/α)8, small colonies were observed to be evenly distributed in each spot, with a cell density similar to that obtained on the permissive medium, which suggested that NAL(β/α)8 could complement NAL(-) growth on the selective medium, albeit at a slow growth rate. NAL(β/α)8 has the whole (β/α)8 barrel structure from NAL and the helical domain from DHDPS. On the basis of these findings, it can be concluded that NAL activity requires the integrity of the NAL (β/α)8 barrel structure and is sensitive to sequence changes within it, and that the residues within the C-terminal helical domain do not participate in NAL catalysis. The attempt to purify the NAL (β/α)8 hybrid protein for in vitro characterization failed because of its poor solubility.

Figure 34: Schematic overview of the NAL and DHDPS secondary structures and their active site residues (A), schematic overview of the four hybrid proteins between NAL and DHDPS (B), and their in vivo complementation results in NAL(-) on M9+NANA medium (C) and on M9+glucose medium (D). (A) The four (β/α)2 units are labeled in cyan, green, gray, and pink, respectively, from the N- to the C-terminus. The active site residues in NAL (green) and DHDPS (red) are displayed at their relative locations in the secondary structures. (B) Four hybrid proteins were created between NAL (green) and DHDPS (red) and cloned into the pDIM vector via NdeI and SpeI sites. (C) (D) After the pDIM vectors containing the hybrid proteins were transformed into NAL(-), serial dilutions of the transformed cells (2 µl of ~10^N cells ml⁻¹) were drop-spotted onto M9+NANA medium (selective medium) and M9+glucose medium (permissive medium). pDIM-PurT was included as a control.

Creation of multiple-crossover DNA library between NAL and DHDPS using SCRATCHY

The previous effort to engineer NAL activity into the DHDPS scaffold using rational methods was not successful, possibly because only a limited region of sequence space was investigated. To explore additional sequence space outside the active sites and improve the NAL activity of DHDPS, several technologies for generating combinatorial protein libraries were examined, namely ITCHY, SCRATCHY, random mutagenesis, and DNA shuffling.

Two ITCHY libraries between NAL and DHDPS (pDIM-NDX and pDIM-DNX) with more than 8 × 10⁶ independent members were created using THIO-ITCHY by Seunggoo Lee, a former post-doc in our group. The starting construct for the pDIM-DNX library had the DHDPS sequence (1-224) followed by the NAL sequence (72-297), while the starting construct for the pDIM-NDX library had the NAL sequence (1-227) followed by the DHDPS sequence (69-292). The overlapping regions for both libraries covered the last six (β/α) units of the barrel domain. In vivo selection for either NAL or DHDPS activity in E. coli auxotrophic strains did not yield any functional hybrid protein from these two ITCHY libraries.

One major drawback of hybrid protein libraries generated by ITCHY is that, by the nature of the technology, only one crossover can be introduced between the two parental genes, which usually results in poor solubility. To introduce multiple crossovers between the genes of interest without sequence identity limitations, a method called SCRATCHY was developed, which combines ITCHY and DNA shuffling (Lutz, 2001b). In general, SCRATCHY consists of the construction, size selection, and in-frame selection of two ITCHY libraries between two parental genes, followed by DNA shuffling of these two ITCHY libraries (Figure 35). SCRATCHY has been shown experimentally to create functional hybrid proteins with multiple crossovers (Lutz, 2001b; Kawarasaki, 2003), and even hybrid proteins with novel substrate specificity (Griswold, 2005).

After the hybrid gene fragments were excised from the plasmids by restriction digestion with NdeI and SpeI, the DNA fragments similar in size to the wild type genes were separated by gel electrophoresis and purified, followed by ligation into the pDIM-N6 vector for in-frame selection. Even after size selection, two thirds of the library are expected to carry a frameshift at the crossover position, leading to extensive amino acid changes and premature truncation of the proteins. In the pDIM-N6 vector, the gene fragments are fused to a neoR gene so that only in-frame hybrids are expressed as fusion proteins with a functional neomycin phosphotransferase encoded by neoR, and therefore confer kanamycin resistance. After size selection and in-frame selection, two ITCHY libraries, pDIMN6-NDX (7.3 × 10⁵ members) and pDIMN6-DNX (2.8 × 10⁵ members), were available for DNA shuffling.
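The two-thirds frameshift expectation follows from simple counting, as the sketch below illustrates. It assumes an idealized library in which every truncation length of the 5' parent and every deletion length at the 5' end of the 3' parent is equally likely; the gene lengths of 876 and 891 nucleotides are derived from the 292 and 297 residues of DHDPS and NAL and are used only for illustration.

```python
def in_frame_fraction(len_5prime_nt, len_3prime_nt):
    """Fraction of single-crossover fusions that keep the 3' parent in its native frame.

    A fusion is in frame when the number of nucleotides kept from the 5' parent
    and the number deleted from the start of the 3' parent agree modulo 3.
    """
    in_frame = 0
    total = 0
    for kept in range(len_5prime_nt + 1):
        for deleted in range(len_3prime_nt + 1):
            total += 1
            if (kept - deleted) % 3 == 0:
                in_frame += 1
    return in_frame / total


# Assumed coding lengths: dapA 292 codons, nanA 297 codons (stop codons excluded).
fraction = in_frame_fraction(292 * 3, 297 * 3)
print(f"in frame: {fraction:.3f}, frameshifted: {1 - fraction:.3f}")
```

Size selection narrows the range of fragment lengths but does not change this ratio, which is why a separate in-frame selection step is needed.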

The hybrid genes were amplified using each ITCHY library individually as template, and equal amounts of DNA from both libraries were mixed and shuffled as described in the Materials and Methods section. However, under all conditions tested, including different fragment sizes (from 40 to 500 bp) and different PCR conditions for the reassembly step, DNA sequencing showed that the final full length gene fragment always had either the dapA or the nanA sequence, indicating that no hybrid proteins with multiple crossovers could be generated between DHDPS and NAL by SCRATCHY.

Figure 35: Schematic overview of SCRATCHY (adapted from Lutz, 2001b).

The low DNA sequence identity (46%) between DHDPS and NAL was considered the major obstacle to generating hybrid proteins with multiple crossovers between DHDPS and NAL by SCRATCHY. In two successful examples of SCRATCHY, the DNA identity between the two parental proteins, rat-θ2 and human-θ1 glutathione transferases, was as high as 63% (Kawarasaki, 2003; Griswold, 2005). In another example, the two parental proteins, E. coli and human GAR transformylase, shared 50% identity at the DNA level, which is similar to that between NAL and DHDPS. However, only 16 functional hybrid constructs from in vivo selection of two ITCHY libraries were subjected to DNA shuffling (Lutz, 2001b). In our case, each ITCHY library had more than 1 × 10⁵ members after size selection and in-frame selection and was subjected to DNA shuffling. As the number of different hybrid sequences used for DNA shuffling increases, the DNA fragments without crossovers increasingly outnumber those containing one crossover after DNase I treatment, and therefore the chance that two fragments each containing a crossover anneal to generate multiple crossovers decreases in the reassembly step. The situation then resembles DNA shuffling between the dapA and nanA genes themselves, which was unable to generate crossovers between the parental genes (data not shown).

Enhancement of the NAL activity in the DHDPS scaffold by random mutagenesis

The wild type DHDPS has been shown to have a weak NAL activity, but mutations that brought the NAL active site residues into the DHDPS scaffold did not improve this activity; on the contrary, they abolished it. In order to improve the NAL activity of DHDPS, randomly mutagenized DHDPS libraries were created by error-prone PCR and DNA shuffling, followed by in vivo selection for NAL activity in the E. coli auxotrophic strain NAL(-) on selective medium.

Two randomly mutagenized DHDPS libraries with different mutagenesis rates were created using error-prone PCR, as described in the Materials and Methods section. Based on DNA sequencing of several plasmids from randomly chosen members of each library, the mutagenesis rate of one library was 0.3-0.4% per PCR, and that of the other was 0.8-1% per PCR. Equal numbers of cells from each library were pooled and plated on the selective medium for in vivo selection. Another randomly mutagenized DHDPS library with 1.6 × 10⁷ individual members was created using DNA shuffling.
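To put these mutagenesis rates in perspective, the expected number of nucleotide changes per gene can be estimated as below. The sketch rests on two assumptions that are not stated in the text: that the rate is the fraction of nucleotide positions changed per gene copy, and that mutations are distributed according to a Poisson model. The gene length of 876 nucleotides is derived from the 292 residues of DHDPS.

```python
import math

GENE_LENGTH_NT = 292 * 3  # assumed dapA coding length, stop codon excluded


def expected_mutations(rate_percent, length_nt=GENE_LENGTH_NT):
    """Mean number of nucleotide changes per gene at a given per-position rate."""
    return rate_percent / 100.0 * length_nt


def fraction_unmutated(rate_percent, length_nt=GENE_LENGTH_NT):
    """Poisson probability that a clone carries no nucleotide change at all."""
    return math.exp(-expected_mutations(rate_percent, length_nt))


# Rate ranges reported for the two error-prone PCR libraries.
for label, low, high in [("lower-rate library", 0.3, 0.4), ("higher-rate library", 0.8, 1.0)]:
    print(f"{label}: {expected_mutations(low):.1f} to {expected_mutations(high):.1f} "
          f"nucleotide changes per gene, "
          f"{fraction_unmutated(high):.4f} to {fraction_unmutated(low):.4f} unmutated")
```

Under these assumptions the two libraries carry roughly three and eight nucleotide changes per gene on average, not all of which alter the encoded amino acid.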

No DHDPS mutant from the DNA shuffling library was found to have NAL activity by in vivo selection after 3 days of incubation. Three DHDPS mutants from the error-prone PCR library (DH-EP13, 53, and 55) were selected for their ability to support the growth of NAL(-) on selective medium. To confirm the complementation results, the plasmids carrying each mutant were extracted and retransformed into fresh NAL(-) cells, and the in vivo screening for NAL activity was repeated on the selective medium. The same results were observed in the second complementation experiment, confirming that the mutations introduced into the dapA gene gave these three mutants the ability to support the growth of NAL(-) on the selective medium. The amino acid sequences of the mutants were deduced from their DNA sequences. DH-EP13 carried one mutation, H240L; DH-EP53 carried fifteen mutations, I6T, T45S, A67V, I71N, T89A, D127G, N156D, I158V, H200Q, T206A, A210T, M214L, M237T, D246V, and K257E; and DH-EP55 carried three mutations, V60A, L66P, and S111P.

In order to purify these mutants for in vitro characterization, the gene fragments were subcloned into the pET22b vector via NdeI and HindIII sites and transformed into the BL21(DE3) nanA- strain. After overexpression, DH-EP13 was found in the soluble fraction of the cell lysate and was purified with a C-terminal His tag using Ni-NTA affinity chromatography, while DH-EP53 and DH-EP55 were found in the insoluble fraction of the cell lysate and were purified by refolding after solubilization in 8 M urea. For each mutant, the residual DHDPS activity was determined with the coupled DHDPR reaction, and the NAL activity was determined using ¹⁴C-NANA as substrate. The in vitro kinetic constants of each mutant, as well as those of the wild type NAL and DHDPS, are shown in Table 15. DH-EP13 had an approximately 200 fold decrease in the kcat and a slight increase in the Km for (S)-ASA for its DHDPS activity, and a 2 fold increase in its NAL activity when compared to the wild type DHDPS. Located in the middle of the α9 helix in DHDPS, residue H240 is situated between the barrel and the helical domain, right behind the β7 and β8 loops. One nitrogen atom of its imidazole group is within hydrogen bonding distance of the hydroxyl oxygen of T206 in the β8 loop (Figure 36). The mutation H240L might break this interaction and increase the flexibility of the β8 loop. In NAL, the residue corresponding to DHDPS H240 is the conserved residue I243. DH-EP53 had a 50 fold decrease in the kcat and a 17 fold increase in the Km for (S)-ASA for its DHDPS activity, and an 18 fold increase in its NAL activity when compared to the wild type DHDPS. Twelve of the 15 mutations in DH-EP53 are in the barrel domain, while the other three are in the helical domain. Interestingly, 8 of the 12 mutations in the barrel domain are located at the N-terminal end of the barrel, mostly in the loops connecting α-helices and the subsequent β-sheets. Since these loops are suggested to be important for the stability of the barrel structure (Hocker, 2001), the accumulation of mutations in these loop regions might explain the poor solubility of DH-EP53. Of the remaining four mutations, A210T and M214L are located at the N-terminus of the β8 sheet, also quite far away from the active site, and only T45S and T206A are close to the active site. Situated in the β2 loop, T45 is involved in binding the carboxylate group of pyruvate in DHDPS, but the conservative mutation T45S might not have a significant effect on the DHDPS activity. Located in the β8 loop, T206 might interact with H240, the residue that was substituted by a Leu residue in DH-EP13. The mutation T206A might likewise break this interaction and increase the flexibility of the β8 loop. DH-EP55 had a 2.3 fold decrease in the kcat and a 10 fold increase in the Km for (S)-ASA for its DHDPS activity, and a 16 fold increase in its NAL activity when compared to the wild type DHDPS. Residues V60 and L66 are located in the α2 helix and S111 at the N-terminus of the α4 helix, all distant from the active site. Therefore, the V60A, L66P, and S111P mutations might influence the activities through structural changes. For instance, the L66P mutation might shorten the α2 helix, as it is located at the C-terminus of this helix. S111P, combined with P110, would substantially increase the rigidity of the β4 loop, which contains the conserved Y106 and Y107 residues that are involved in dimerization and catalysis.

Table 15: Kinetic parameters of the NAL and DHDPS activities of the DHDPS mutants.

            DHDPS activity                                            NAL activity
enzyme      kcat (s⁻¹)       Km (ASA) (mM)   Relative kcat/Km (ASA)   Specific activity (µmol/min/mg)   Relative specific activity
NAL         0.08±0.002       8.65±1.7        1                        6.9±0.5                           150000
DHDPS       16.3±0.6         0.26±0.03       6800                     (4.5±0.6)×10⁻⁵                    1
DH-EP13     0.07±0.004       0.31±0.04       24                       (1.0±0.2)×10⁻⁴                    2.2
DH-EP53     0.32±0.09        4.49±1.66       7.7                      (8.2±0.5)×10⁻⁴                    18
DH-EP55     7.1±0.8          2.66±0.43       288                      (7.1±0.4)×10⁻⁴                    16
DH-EPSH1    0.0015±0.0004    ND              ND                       (3.3±0.6)×10⁻⁴                    7.3

ND, not determined.

Figure 36: The active site of DH-EP13. The sialic acid alditol molecule is shown in ball-and-stick representation. DHDPS and NAL conserved residues within the active sites are depicted in red and green line representations, respectively. DHDPS T206 and H240 are depicted in ball-and-stick representation, and a potential hydrogen bond between them is labeled in green. In DH-EP13, the mutation H240L might break the hydrogen bond with T206. In DH-EP53, 2 of the 15 mutations are close to the active site. T45S might not lead to a big change in terms of substrate binding and catalysis. The T206A mutation might also break the hydrogen bond between H240 and T206. The residues are indicated based on E. coli numbering.

DNA shuffling of three DHDPS mutants with enhanced NAL activity

To further improve the NAL activity in the DHDPS scaffold, the DH-EP13, DH-EP53, and DH-EP55 gene fragments were amplified from their plasmids and subjected to DNA shuffling as described in the Materials and Methods section, because DNA shuffling can remove deleterious mutations and generate new combinations of beneficial mutations. After ligation into the pDIM vector via the NdeI and SpeI sites, the DNA shuffling library, with 1.2 × 10^6 individual members, was transformed into the E. coli auxotrophic strain NAL(-), followed by in vivo selection for NAL activity on selective medium.

During the in vivo selection, NAL(-) cells containing pDIM-DH-EP53 were included as a control, and only the colonies that were larger than those of the control after 3 days of incubation were picked. After the complementation was confirmed, four functional DHDPS mutants were identified that complemented NAL(-) better than DH-EP53. However, DNA sequencing of these four mutant genes revealed that three of them were identical to DH-EP55. The fourth (DH-EPSH1) contained mutations that were not present in any of the three parental sequences, and the colonies containing this mutant were observed on the selective medium only after 2 days of incubation. DH-EPSH1 had 11 mutations: I9V, I97V, N156S, D187Y, A189E, V202G, S204G, V205S, T207Y, V245A, and L286H, of which D187Y, A189E, V202G, S204G, V205S, and T207Y correspond to the active site residues in the β7 and β8 loops of NAL. The reasons these new mutations arose are not completely understood; they might be due to the mutagenic nature of DNA shuffling.
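The point-mutation labels used above follow the usual convention of wild-type residue, position, and replacement residue. As a minimal sketch (the helper names are hypothetical and not from the original work), the DH-EPSH1 mutations can be parsed and divided into the six that the text assigns to the β7 and β8 loops and the remaining five:

```python
import re

# Mutations reported in the text for DH-EPSH1.
DH_EPSH1 = ["I9V", "I97V", "N156S", "D187Y", "A189E", "V202G",
            "S204G", "V205S", "T207Y", "V245A", "L286H"]

# Subset the text assigns to the beta7 and beta8 active site loops.
ACTIVE_SITE = {"D187Y", "A189E", "V202G", "S204G", "V205S", "T207Y"}

# One-letter residue, position, one-letter residue (e.g. "T207Y").
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'T207Y' into ('T', 207, 'Y')."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def split_by_location(mutations, active_site):
    """Return (active-site mutations, other mutations), preserving order."""
    near = [m for m in mutations if m in active_site]
    far = [m for m in mutations if m not in active_site]
    return near, far


near, far = split_by_location(DH_EPSH1, ACTIVE_SITE)
print(len(DH_EPSH1), "mutations:", len(near), "active site,", len(far), "other")
print("other positions:", [parse_mutation(m)[1] for m in far])
```

The second group contains the mutations at I9, I97, N156, V245, and L286.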

To obtain its in vitro kinetic parameters, the DH-EPSH1 gene fragment was ligated into pET22b via the NdeI and HindIII sites. Because the DH-EPSH1 protein, which carried a C-terminal His tag, was found in the insoluble fraction of the cell lysate, it was solubilized in 8 M urea and purified by refolding. As with the other DHDPS mutants, the residual DHDPS activity of DH-EPSH1 was determined with the DHDPR coupled reaction, and its NAL activity was determined using 14C-NANA as the substrate (Table 15). DH-EPSH1 had a more than 10,000-fold decrease in kcat for its DHDPS activity and a 7.3-fold increase in its NAL activity when compared to the wild type DHDPS; its Km for (S)-ASA was not determined because of its low DHDPS activity. Besides the active site residues, I9, I97, N156, V245, and L286 are located in DHDPS in the center of the β1 sheet, the α3 loop, the α5 loop, the α9 loop, and the N-terminus of the α10 helix, respectively. The roles of these five residues in catalysis are hard to interpret because of their distance from the active site. The instability of the mutant might result from the T207Y mutation, as mentioned before.

The relatively low in vitro NAL activity of DH-EPSH1 seemed at odds with its in vivo complementation result. Such a discrepancy between in vitro and in vivo experiments is, in fact, not unusual. Presumably, the differences between the in vivo and in vitro enzymatic activities originate from the compromised overall structural integrity of the hybrids under in vitro conditions. In vitro experiments are usually performed in a substantially more dilute environment than that of the cell, which renders the enzymes prone to denaturation and aggregation. In vivo, by contrast, this destabilization could be at least partially compensated by macromolecular crowding, providing sufficient enzymatic activity for complementation.

4.4 Conclusion

As two members of the NAL subfamily of the aldolase superfamily of (β/α)8 barrel proteins, DHDPS and NAL share homology in structure and reaction mechanism, which indicates that they are evolutionarily related. Our attempt to interconvert the enzymatic activities of NAL and DHDPS by rational and combinatorial technologies led to some interesting observations and speculations about the evolutionary relationship between NAL and DHDPS.

The wild type NAL was shown experimentally to catalyze the DHDPS reaction. Using radioactive NANA, DHDPS was also observed to catalyze the NAL reaction. The catalytic promiscuity of E. coli NAL and DHDPS provides further functional evidence for divergent evolution. NAL presents some typical features of a multipotent ancestral enzyme, such as the existence of a second activity in its scaffold and the tolerance of a wide range of aldoses as substrates for its condensation reaction. It has been shown that a single mutation, L142R, in NAL led to the complementation of the E. coli DHDPS-deficient strain AT997, thus conferring a selective advantage (Joerger, 2003). Although we did not observe this complementation in vivo, it was still surprising that the single L142R mutation in NAL increased its DHDPS activity 19-fold. This significant effect of a single mutation on the DHDPS activity of NAL increases the likelihood that DHDPS activity evolved from the NAL scaffold through gene duplication followed by optimization.

However, the essential role of DHDPS in lysine biosynthesis and cell viability, and its ubiquity in plants and bacteria, imply that DHDPS activity might have arisen quite early in the course of evolution. Because NAL activity is not essential for cell viability, it might instead have developed in the DHDPS scaffold, since DHDPS shows a weak NAL activity. If this evolutionary scheme is true, our attempt to improve the NAL activity in DHDPS would mimic the optimization of the NAL activity. The mutations that introduced the NAL active site residues into the DHDPS active site by site-directed mutagenesis did not improve the NAL activity of the DHDPS mutants, indicating that some elements outside the NAL active site might also contribute to the optimization of NAL activity. Random mutagenesis of DHDPS generated several DHDPS mutants with NAL activity 18-fold higher than that of the wild type DHDPS, at the cost of a significant reduction in DHDPS activity. One particular mutant, DH-EP53, has a 16-fold increase in kcat for the NAL activity and a 23-fold decrease in the specificity constant kcat/Km (ASA) for the DHDPS activity when compared to the wild type DHDPS. Additionally, a pair of residues (T206 and H240) was identified in DHDPS, and the breakage of the hydrogen bond between them might play a role in enhancing the NAL activity in the DHDPS scaffold. The relatively low NAL activity observed in the DHDPS scaffold, combined with the difficulty of improving this activity, also supports the evolutionary scheme from NAL to DHDPS through a divergent path.

Bibliography

Aimi, J., H. Qiu, J. Williams, H. Zalkin and J. E. Dixon (1990). "De novo purine

nucleotide biosynthesis: cloning of human and avian cDNAs encoding the

trifunctional glycinamide ribonucleotide synthetase-aminoimidazole

ribonucleotide synthetase-glycinamide ribonucleotide transformylase by

functional complementation in E. coli." Nucleic Acids Res 18(22): 6665-72.

Arnold, F. H., Ed. (2000). Evolutionary Protein Design. Advances in Protein Chemistry.

New York, Academic Press.

Arnold, F. H. (2001). "Combinatorial and computational challenges for biocatalyst

design." Nature 409(6817): 253-7.

Arnold, F. H. and G. Georgiou, Eds. (2003). Directed Enzyme Evolution: Screening and

Selection Methods. Totowa, NJ, Humana Press.

Arnold, F. H. and G. Georgiou, Eds. (2003). Directed Evolution Library Creation:

Methods and Protocols. Totowa, NJ, Humana Press.

Arnold, F. H. and A. A. Volkov (1999). "Directed evolution of biocatalysts." Curr Opin

Chem Biol 3(1): 54-9.

Babbitt, P. C. and J. A. Gerlt (2000). "New functions from old scaffolds: how nature

reengineers enzymes for new functions." Adv Protein Chem 55: 1-28.

Babbitt, P. C., M. S. Hasson, J. E. Wedekind, D. R. Palmer, W. C. Barrett, G. H. Reed, I.

Rayment, D. Ringe, G. L. Kenyon and J. A. Gerlt (1996). "The enolase

superfamily: a general strategy for enzyme-catalyzed abstraction of the alpha-

protons of carboxylic acids." Biochemistry 35(51): 16489-501.

Baca, M., T. S. Scanlan, R. C. Stephenson and J. A. Wells (1997). "Phage display of a

catalytic antibody to optimize affinity for transition-state analog binding." Proc

Natl Acad Sci U S A 94(19): 10063-8.

Bakhiet, N., F. W. Forney, D. P. Stahly and L. Daniels (1984). "Lysine biosynthesis in

Methanobacterium thermoautotrophicum is by the diaminopimelic acid pathway."

Curr. Microbiol. 10(4): 195-198.

Banner, D. W., A. C. Bloomer, G. A. Petsko, D. C. Phillips, C. I. Pogson, I. A. Wilson, P.

H. Corran, A. J. Furth, J. D. Milman, R. E. Offord, J. D. Priddle and S. G. Waley

(1975). "Structure of chicken muscle triose phosphate isomerase determined

crystallographically at 2.5 angstrom resolution using amino acid sequence data."

Nature 255(5510): 609-14.

Barbosa, J. A., B. J. Smith, R. DeGori, H. C. Ooi, S. M. Marcuccio, E. M. Campi, W. R.

Jackson, R. Brossmer, M. Sommer and M. C. Lawrence (2000). "Active site

modulation in the N-acetylneuraminate lyase sub-family as revealed by the

structure of the inhibitor-complexed Haemophilus influenzae enzyme." J Mol Biol

303(3): 405-21.

Bartlett, A. T. M. and P. J. White (1986). "J. Gen. Microbiol." 132: 3169-3177.

Becker, S., H. U. Schmoldt, T. M. Adams, S. Wilhelm and H. Kolmar (2004). "Ultra-

high-throughput screening based on cell-surface display and fluorescence-

activated cell sorting for the identification of novel biocatalysts." Curr Opin

Biotechnol 15(4): 323-9.

Beguin, P. (1999). "Hybrid enzymes." Curr Opin Biotechnol 10(4): 336-40.

Benkovic, S. J. and S. Hammes-Schiffer (2003). "A perspective on enzyme catalysis."

Science 301(5637): 1196-202.

Beste, G., F. S. Schmidt, T. Stibora and A. Skerra (1999). "Small antibody-like proteins

with prescribed ligand specificities derived from the lipocalin fold." Proc Natl

Acad Sci U S A 96(5): 1898-903.

Blickling, S. and J. Knablein (1997). "Feedback inhibition of dihydrodipicolinate

synthase enzymes by L-lysine." Biol Chem 378(3-4): 207-10.

Blickling, S., C. Renner, B. Laber, H. D. Pohlenz, T. A. Holak and R. Huber (1997).

"Reaction mechanism of Escherichia coli dihydrodipicolinate synthase

investigated by X-ray crystallography and NMR spectroscopy." Biochemistry

36(1): 24-33.

Boder, E. T., K. S. Midelfort and K. D. Wittrup (2000). "Directed evolution of antibody

fragments with monovalent femtomolar antigen-binding affinity." Proc Natl Acad

Sci U S A 97(20): 10701-5.

Boder, E. T. and K. D. Wittrup (1998). "Optimal screening of surface-displayed

polypeptide libraries." Biotechnol Prog 14(1): 55-62.

Bogarad, L. D. and M. W. Deem (1999). "A hierarchical approach to protein molecular

evolution." Proc Natl Acad Sci U S A 96(6): 2591-5.

Bolon, D. N. and S. L. Mayo (2001). "Enzyme-like proteins by computational design."

Proc Natl Acad Sci U S A 98(25): 14274-9.

Bolon, D. N., C. A. Voigt and S. L. Mayo (2002). "De novo design of biocatalysts." Curr

Opin Chem Biol 6(2): 125-9.

Bornscheuer, U. T., J. Altenbuchner and H. H. Meyer (1999). "Directed evolution of an

esterase: screening of enzyme libraries based on pH-indicators and a growth

assay." Bioorg Med Chem 7(10): 2169-73.

Bornscheuer, U. T. and M. Pohl (2001). "Improved biocatalysts by directed evolution and

rational protein design." Curr Opin Chem Biol 5(2): 137-43.

Borthwick, E. B., S. J. Connell, D. W. Tudor, D. J. Robins, A. Shneier, C. Abell and J. R.

Coggins (1995). "Escherichia coli dihydrodipicolinate synthase: characterization

of the imine intermediate and the product of bromopyruvate treatment by

electrospray mass spectrometry." Biochem J 305 (Pt 2): 521-4.

Brug, J., R. J. Esser and G. B. Paerels (1959). "The enzymic cleavage of N-acetylneuraminic

acid." Biochim Biophys Acta 33(1): 241-2.

Brunetti, P., G. W. Jourdian and S. Roseman (1962). "The sialic acids. III. Distribution

and properties of animal N-acetylneuraminic aldolase." J Biol Chem 237: 2447-

53.

Buckland, B. C., D. K. Robinson and M. Chartrain (2000). "Biocatalysis for

pharmaceuticals--status and prospects for a key technology." Metab Eng 2(1): 42-

8.

Bukhari, A. I. and A. L. Taylor (1971). "Genetic analysis of diaminopimelic acid- and

lysine-requiring mutants of Escherichia coli." J Bacteriol 105(3): 844-54.

Cadwell, R. C. and G. F. Joyce (1994). "Mutagenic PCR." PCR Methods Appl 3(6):

S136-40.

Cedrone, F., A. Menez and E. Quemeneur (2000). "Tailoring new enzyme functions by

rational redesign." Curr Opin Struct Biol 10(4): 405-10.

Chaudhuri, B. N., M. R. Sawaya, C. Y. Kim, G. S. Waldo, M. S. Park, T. C. Terwilliger

and T. O. Yeates (2003). "The crystal structure of the first enzyme in the

pantothenate biosynthetic pathway, ketopantoate hydroxymethyltransferase, from

M tuberculosis." Structure (Camb) 11(7): 753-64.

Chen, J. T. and S. J. Benkovic (1983). "Synthesis and separation of diastereomers of

deoxynucleoside 5'-O-(1-thio)triphosphates." Nucleic Acids Res 11(11): 3737-51.

Chen, K. and F. H. Arnold (1993). "Tuning the activity of an enzyme for unusual

environments: sequential random mutagenesis of subtilisin E for catalysis in

dimethylformamide." Proc Natl Acad Sci U S A 90(12): 5618-22.

Cheon, Y. H., H. S. Park, J. H. Kim, Y. Kim and H. S. Kim (2004). "Manipulation of the

active site loops of D-hydantoinase, a (beta/alpha)8-barrel protein, for modulation

of the substrate specificity." Biochemistry 43(23): 7413-20.

Cherry, J. R., M. H. Lamsa, P. Schneider, J. Vind, A. Svendsen, A. Jones and A. H.

Pedersen (1999). "Directed evolution of a fungal peroxidase." Nat Biotechnol

17(4): 379-84.

Chothia, C. (1992). "Proteins. One thousand families for the molecular biologist." Nature

357(6379): 543-4.

Cirino, P. C., K. M. Mayer and D. Umeno (2003). "Generating mutant libraries using

error-prone PCR." Methods Mol Biol 231: 3-9.

Coco, W. M., L. P. Encell, W. E. Levinson, M. J. Crist, A. K. Loomis, L. L. Licato, J. J.

Arensdorf, N. Sica, P. T. Pienkos and D. J. Monticello (2002). "Growth factor

engineering by degenerate homoduplex gene family recombination." Nat

Biotechnol 20(12): 1246-50.

Coco, W. M., W. E. Levinson, M. J. Crist, H. J. Hektor, A. Darzins, P. T. Pienkos, C. H.

Squires and D. J. Monticello (2001). "DNA shuffling method for generating

highly recombined genes and evolved enzymes." Nat Biotechnol 19(4): 354-9.

Colas, P., B. Cohen, T. Jessen, I. Grishina, J. McCoy and R. Brent (1996). "Genetic

selection of peptide aptamers that recognize and inhibit cyclin-dependent kinase

2." Nature 380(6574): 548-50.

Comb, D. G. and S. Roseman (1960). "The sialic acids. I. The structure and enzymatic

synthesis of N-acetylneuraminic acid." J Biol Chem 235: 2529-37.

Comb, D. G. and S. Roseman (1962). "N-acetylneuraminic acid aldolase." Methods

Enzymol. 5: 391-394.

Coulter, C. V., J. A. Gerrard, J. A. E. Kraunsoe, D. L. Moore and A. J. Pratt (1996).

"Aspartate semi-aldehyde: synthetic and structural studies." Tetrahedron 52:

7127.

Coulter, C. V., J. A. Gerrard, J. A. E. Kraunsoe and A. J. Pratt (1999). "Escherichia coli

dihydrodipicolinate synthase and dihydrodipicolinate reductase: kinetic and

inhibition studies of two putative herbicide targets." Pesticide Science 55(9): 887-

895.

Cox, R. J., A. Sutherland and J. C. Vederas (2000). "Bacterial diaminopimelate

metabolism as a target for antibiotic design." Bioorg Med Chem 8(5): 843-71.

Cremer, J., C. Treptow, L. Eggeling and H. Sahm (1988). "Regulation of enzymes of

lysine biosynthesis in Corynebacterium glutamicum." J Gen Microbiol 134(12):

3221-9.

Cull, M. G., J. F. Miller and P. J. Schatz (1992). "Screening for receptor ligands using

large libraries of peptides linked to the C terminus of the lac repressor." Proc Natl

Acad Sci U S A 89(5): 1865-9.

Cwirla, S. E., P. Balasubramanian, D. J. Duffin, C. R. Wagstrom, C. M. Gates, S. C.

Singer, A. M. Davis, R. L. Tansik, L. C. Mattheakis, C. M. Boytos, P. J. Schatz,

D. P. Baccanari, N. C. Wrighton, R. W. Barrett and W. J. Dower (1997). "Peptide

agonist of the thrombopoietin receptor as potent as the natural cytokine." Science

276(5319): 1696-9.

Datsenko, K. A. and B. L. Wanner (2000). "One-step inactivation of chromosomal genes

in Escherichia coli K-12 using PCR products." Proc Natl Acad Sci U S A 97(12):

6640-5.

Daubner, S. C., J. L. Schrimsher, F. J. Schendel, M. Young, S. Henikoff, D. Patterson, J.

Stubbe and S. J. Benkovic (1985). "A multifunctional protein possessing

glycinamide ribonucleotide synthetase, glycinamide ribonucleotide

transformylase, and aminoimidazole ribonucleotide synthetase activities in de

novo purine biosynthesis." Biochemistry 24(25): 7059-62.

Daugherty, P. S., B. L. Iverson and G. Georgiou (2000). "Flow cytometric screening of

cell-based libraries." J Immunol Methods 243(1-2): 211-27.

Davidson, J. N., K. C. Chen, R. S. Jamison, L. A. Musmanno and C. B. Kern (1993).

"The evolutionary history of the first three enzymes in pyrimidine biosynthesis."

Bioessays 15(3): 157-64.

Declerck, N., M. Machius, G. Wiegand, R. Huber and C. Gaillardin (2000). "Probing

structural determinants specifying high thermostability in Bacillus licheniformis

alpha-amylase." J Mol Biol 301(4): 1041-57.

Deijl, C. M. and J. F. Vliegenthart (1983). "Configuration of substrate and products of N-

acetylneuraminate pyruvate-lyase from Clostridium perfringens." Biochem

Biophys Res Commun 111(2): 668-74.

Dereppe, C., G. Bold, O. Ghisalba, E. Ebert and H.-P. Schär (1992). "Purification and

Characterization of Dihydrodipicolinate Synthase from Pea." Plant Physiol.

98(3): 813-821.

Dnistrian, A. M. and M. K. Schwartz (1983). "Lipid-bound sialic acid as a tumor

marker." Ann Clin Lab Sci 13(2): 137-42.

Dobson, R. C., M. D. Griffin, G. B. Jameson and J. A. Gerrard (2005). "The crystal

structures of native and (S)-lysine-bound dihydrodipicolinate synthase from

Escherichia coli with improved resolution show new features of biological

significance." Acta Crystallogr D Biol Crystallogr 61(Pt 8): 1116-24.

Dobson, R. C., K. Valegard and J. A. Gerrard (2004). "The crystal structure of three site-

directed mutants of Escherichia coli dihydrodipicolinate synthase: further

evidence for a catalytic triad." J Mol Biol 338(2): 329-39.

Dwyer, M. A., L. L. Looger and H. W. Hellinga (2003). "Computational design of a Zn2+

receptor that controls bacterial gene expression." Proc Natl Acad Sci U S A

100(20): 11255-60.

Eads, J. C., D. Ozturk, T. B. Wexler, C. Grubmeyer and J. C. Sacchettini (1997). "A new

function for a common fold: the crystal structure of quinolinic acid

phosphoribosyltransferase." Structure 5(1): 47-58.

Ebbole, D. J. and H. Zalkin (1987). "Cloning and characterization of a 12-gene cluster

from Bacillus subtilis encoding nine enzymes for de novo purine nucleotide

synthesis." J Biol Chem 262(17): 8274-87.

Erlandsen, H., E. E. Abola and R. C. Stevens (2000). "Combining structural genomics

and enzymology: completing the picture in metabolic pathways and enzyme

active sites." Curr Opin Struct Biol 10(6): 719-30.

Fani, R., P. Lio, I. Chiarelli and M. Bazzicalupo (1994). "The evolution of the histidine

biosynthetic genes in prokaryotes: a common ancestor for the hisA and hisF

genes." J Mol Evol 38(5): 489-95.

Farber, G. K. and G. A. Petsko (1990). "The evolution of alpha/beta barrel enzymes."

Trends Biochem Sci 15(6): 228-34.

Firestine, S. M. and V. J. Davisson (1994). "Carboxylases in de novo purine biosynthesis.

Characterization of the Gallus gallus bifunctional enzyme." Biochemistry 33(39):

11917-26.

Fitz, W., J. R. Schwark and C. H. Wong (1995). "Aldotetroses and C (3)-modified

aldohexoses as substrates for N-acetylneuraminic acid aldolase: a model for the

explanation of the normal and the inversed stereoselectivity." J. Org. Chem, 60:

3663.

Fleischmann, R. D., M. D. Adams, O. White, R. A. Clayton, E. F. Kirkness, A. R.

Kerlavage, C. J. Bult, J. F. Tomb, B. A. Dougherty, J. M. Merrick and et al.

(1995). "Whole-genome random sequencing and assembly of Haemophilus

influenzae Rd." Science 269(5223): 496-512.

Forsyth, W. R. and C. R. Matthews (2002). "Folding mechanism of indole-3-glycerol

phosphate synthase from Sulfolobus solfataricus: a test of the conservation of

folding mechanisms hypothesis in (beta(alpha))(8) barrels." J Mol Biol 320(5):

1119-33.

Fridjonsson, O., H. Watzlawick and R. Mattes (2002). "Thermoadaptation of alpha-

galactosidase AgaB1 in Thermus thermophilus." J Bacteriol 184(12): 3385-91.

Frisch, D. A., B. G. Gengenbach, A. M. Tommey, J. M. Sellner, D. A. Somers and D. E.

Myers (1991). "Isolation and Characterization of Dihydrodipicolinate Synthase

from Maize." Plant Physiol. 96(2): 444–452.

Frisch, D. A., A. M. Tommey, B. G. Gengenbach and D. A. Somers (1991). "Direct

genetic selection of a maize cDNA for dihydrodipicolinate synthase in an

Escherichia coli dapA- auxotroph." Mol Gen Genet 228(1-2): 287-93.

Galperin, M. Y. and E. V. Koonin (1997). "A diverse superfamily of enzymes with ATP-

dependent carboxylate-amine/thiol ligase activity." Protein Sci 6(12): 2639-43.

Georgiou, G. (2000). "Analysis of large libraries of protein mutants using flow

cytometry." Adv Protein Chem 55: 293-315.

Gerlt, J. A. and P. C. Babbitt (2001). "Divergent evolution of enzymatic function:

mechanistically diverse superfamilies and functionally distinct suprafamilies."

Annu Rev Biochem 70: 209-46.

Gerlt, J. A., P. C. Babbitt and I. Rayment (2005). "Divergent evolution in the enolase

superfamily: the interplay of mechanism and specificity." Arch Biochem Biophys

433(1): 59-70.

Gerlt, J. A. and F. M. Raushel (2003). "Evolution of function in (beta/alpha)8-barrel

enzymes." Curr Opin Chem Biol 7(2): 252-64.

Gerstein, M. (1998). "Patterns of protein-fold usage in eight microbial genomes: a

comprehensive structural census." Proteins 33(4): 518-34.

Gerstein, M. (2000). "Integrative database analysis in structural genomics." Nat Struct

Biol 7 Suppl: 960-3.

Ghislain, M., V. Frankard and M. Jacobs (1990). "Dihydrodipicolinate synthase of

Nicotiana sylvestris, a chloroplast-localized enzyme of the lysine pathway."

Planta 180: 480-486.

Griffiths, A. D. and D. S. Tawfik (2003). "Directed evolution of an extremely fast

phosphotriesterase by in vitro compartmentalization." Embo J 22(1): 24-35.

Griswold, K. E., Y. Kawarasaki, N. Ghoneim, S. J. Benkovic, B. L. Iverson and G.

Georgiou (2005). "Evolution of highly active enzymes by homology-independent

recombination." Proc Natl Acad Sci U S A 102(29): 10082-7.

Hasson, M. S., I. Schlichting, J. Moulai, K. Taylor, W. Barrett, G. L. Kenyon, P. C.

Babbitt, J. A. Gerlt, G. A. Petsko and D. Ringe (1998). "Evolution of an enzyme

active site: the structure of a new crystal form of muconate lactonizing enzyme

compared with mandelate racemase and enolase." Proc Natl Acad Sci U S A

95(18): 10396-401.

Henikoff, S. (1986). "The Saccharomyces cerevisiae ADE5,7 protein is homologous to

overlapping Drosophila melanogaster Gart polypeptides." J Mol Biol 190(4): 519-

28.

Henn-Sax, M., B. Hocker, M. Wilmanns and R. Sterner (2001). "Divergent evolution of

(betaalpha)8-barrel enzymes." Biol Chem 382(9): 1315-20.

Henn-Sax, M., R. Thoma, S. Schmidt, M. Hennig, K. Kirschner and R. Sterner (2002).

"Two (betaalpha)(8)-barrel enzymes of histidine and tryptophan biosynthesis have

similar reaction mechanisms and common strategies for protecting their labile

substrates." Biochemistry 41(40): 12032-42.

Hiraga, K. and F. H. Arnold (2003). "General method for sequence-independent site-

directed chimeragenesis." J Mol Biol 330(2): 287-96.

Ho, S. N., H. D. Hunt, R. M. Horton, J. K. Pullen and L. R. Pease (1989). "Site-directed

mutagenesis by overlap extension using the polymerase chain reaction." Gene

77(1): 51-9.

Hocker, B. (2005). "Directed evolution of (betaalpha)(8)-barrel enzymes." Biomol Eng

22(1-3): 31-8.

Hocker, B., S. Beismann-Driemeyer, S. Hettwer, A. Lustig and R. Sterner (2001).

"Dissection of a (betaalpha)8-barrel enzyme into two folded halves." Nat Struct

Biol 8(1): 32-6.

Hocker, B., J. Claren and R. Sterner (2004). "Mimicking enzyme evolution by generating

new (betaalpha)8-barrels from (betaalpha)4-half-barrels." Proc Natl Acad Sci U S

A 101(47): 16448-53.

Hocker, B., C. Jurgens, M. Wilmanns and R. Sterner (2001). "Stability, catalytic

versatility and evolution of the (beta alpha)(8)-barrel fold." Curr Opin Biotechnol

12(4): 376-81.

Hoganson, D. A. and D. P. Stahly (1975). "Regulation of dihydrodipicolinate synthase

during growth and sporulation of Bacillus cereus." J Bacteriol 124(3): 1344-50.

Horswill, A. R., S. N. Savinov and S. J. Benkovic (2004). "A systematic method for

identifying small-molecule modulators of protein-protein interactions." Proc Natl

Acad Sci U S A 101(44): 15591-6.

Ikeuchi, A., Y. Kawarasaki, T. Shinbata and T. Yamane (2003). "Chimeric gene library

construction by a simple and highly versatile method using recombination-

dependent exponential amplification." Biotechnol Prog 19(5): 1460-7.

Izard, T., M. C. Lawrence, R. L. Malby, G. G. Lilley and P. M. Colman (1994). "The

three-dimensional structure of N-acetylneuraminate lyase from Escherichia coli."

Structure 2(5): 361-9.

Jensen, R. A. (1976). "Enzyme recruitment in evolution of new function." Annu Rev

Microbiol 30: 409-25.

Jestin, J. L. and P. A. Kaminski (2004). "Directed enzyme evolution and selections for

catalysis based on product formation." J Biotechnol 113(1-3): 85-103.

Jurgens, C., A. Strom, D. Wegener, S. Hettwer, M. Wilmanns and R. Sterner (2000).

"Directed evolution of a (beta alpha)8-barrel enzyme to catalyze related reactions

in two different metabolic pathways." Proc Natl Acad Sci U S A 97(18): 9925-30.

Kaijser, B., L. A. Hanson, U. Jodal, G. Lidin-Janson and J. B. Robbins (1977).

"Frequency of E. coli K antigens in urinary-tract infections in children." Lancet

1(8013): 663-6.

Kan, J. L., M. Jannatipour, S. M. Taylor and R. G. Moran (1993). "Mouse cDNAs

encoding a trifunctional protein of de novo purine synthesis and a related single-

domain glycinamide ribonucleotide synthetase." Gene 137(2): 195-202.

Kappock, T. J., S. E. Ealick and J. Stubbe (2000). "Modular evolution of the purine

biosynthetic pathway." Curr Opin Chem Biol 4(5): 567-72.

Karsten, W. E. (1997). "Dihydrodipicolinate synthase from Escherichia coli: pH

dependent changes in the kinetic mechanism and kinetic mechanism of allosteric

inhibition by L-lysine." Biochemistry 36(7): 1730-9.

Kawarasaki, Y., K. E. Griswold, J. D. Stevenson, T. Selzer, S. J. Benkovic, B. L. Iverson

and G. Georgiou (2003). "Enhanced crossover SCRATCHY: construction and

high-throughput screening of a combinatorial library containing multiple non-

homologous crossovers." Nucleic Acids Res 31(21): e126.

Kiefelt, M. J., J. C. Wilson, S. Bennett, M. Gredley and M. von Itzstein (2000).

"Synthesis and evaluation of C-9 modified N-acetylneuraminic acid derivatives as

substrates for N-acetylneuraminic acid aldolase." Bioorg Med Chem 8(3): 657-64.

Kikuchi, M., K. Ohnishi and S. Harayama (1999). "Novel family shuffling methods for

the in vitro evolution of enzymes." Gene 236(1): 159-67.

Kikuchi, M., K. Ohnishi and S. Harayama (2000). "An effective family shuffling method

using single-stranded DNA." Gene 243(1-2): 133-7.

Kim, M. J., W. J. Hennen, W. J. Sweers and C. H. Wong (1988). "Enzymes in

carbohydrate synthesis: N-acetylneuraminic acid aldolase catalyzed reactions and

preparation of N-acetyl-2-deoxy-D-neuraminic acid derivatives." J. Am. Chem.

Soc, 110: 6481.

Klein, C., P. Chen, J. H. Arevalo, E. A. Stura, A. Marolewski, M. S. Warren, S. J.

Benkovic and I. A. Wilson (1995). "Towards structure-based drug design: crystal

structure of a multisubstrate adduct complex of glycinamide ribonucleotide

transformylase at 1.96 A resolution." J Mol Biol 249(1): 153-75.

Kong, D. C. M. and M. von Itzstein (1995). "The first synthesis of a C-7 Nitrogen-

containing sialic acid analogue, 5-acetamido-7-azido-3,5,7-trideoxy-D-glycero-D-

galacto-2-nonulopyrasonic acid (7-azido-7-deoxy-Neu5Ac)." Tet. Lett. 36(6):

957-960.

Kong, D. C. M. and M. von Itzstein (1998). "The chemoenzymatic synthesis of 9-substituted-3,9-dideoxy-D-glycero-D-galacto-2-nonulosonic acids." Carbohydr. Res. 305: 323-329.

Koshland, D. E., Jr. (1998). "Conformational changes: how small is big enough?" Nat

Med 4(10): 1112-4.

Krahn, J. M., J. H. Kim, M. R. Burns, R. J. Parry, H. Zalkin and J. L. Smith (1997).

"Coupled formation of an amidotransferase interdomain ammonia channel and a

phosphoribosyltransferase active site." Biochemistry 36(37): 11061-8.

Kruger, D., R. Schauer and C. Traving (2001). "Characterization and mutagenesis of the

recombinant N-acetylneuraminate lyase from Clostridium perfringens: insights

into the reaction mechanism." Eur J Biochem 268(13): 3831-9.

Kuchner, O. and F. H. Arnold (1997). "Directed evolution of enzyme catalysts." Trends

Biotechnol 15(12): 523-30.

Kuhlman, B., G. Dantas, G. C. Ireton, G. Varani, B. L. Stoddard and D. Baker (2003).

"Design of a novel globular protein fold with atomic-level accuracy." Science

302(5649): 1364-8.

Kumpaisal, R., T. Hashimoto and Y. Yamada (1987). "Purification and Characterization

of Dihydrodipicolinate Synthase from Wheat Suspension Cultures." Plant

Physiol. 85(1): 145-151.

Laber, B., F. X. Gomis-Ruth, M. J. Romao and R. Huber (1992). "Escherichia coli

dihydrodipicolinate synthase. Identification of the active site and crystallization."

Biochem J 288 (Pt 2): 691-5.

Lancet, D., E. Sadovsky and E. Seidemann (1993). "Probability model for molecular

recognition in biological receptor repertoires: significance to the olfactory

system." Proc Natl Acad Sci U S A 90(8): 3715-9.

Lang, D., R. Thoma, M. Henn-Sax, R. Sterner and M. Wilmanns (2000). "Structural

evidence for evolution of the beta/alpha barrel scaffold by gene duplication and

fusion." Science 289(5484): 1546-50.

Lee, S. G., S. Lutz and S. J. Benkovic (2003). "On the structural and functional

modularity of glycinamide ribonucleotide formyltransferases." Protein Sci 12(10):

2206-14.

Lee SH, R., Min Jung Kang, Eun-Sun Wang, Zhe Piao, Yeon Jin Choi, Kyung Hwa Jung,

John Y. J. Jeon and Yong Chul Shin (2003). "A new approach to directed gene

evolution by recombined extension on truncated templates (RETT)." Journal of

Molecular Catalysis B: Enzymatic.

Licitra, E. J. and J. O. Liu (1996). "A three-hybrid system for detecting small ligand-

protein receptor interactions." Proc Natl Acad Sci U S A 93(23): 12817-21.

Lim, F., C. P. Morris, F. Occhiodoro and J. C. Wallace (1988). "Sequence and domain

structure of yeast pyruvate carboxylase." J Biol Chem 263(23): 11493-7.

Lutz, S. and S. J. Benkovic (2002). Engineering Protein Evolution. Directed Molecular Evolution

of Proteins. S. Brakmann and K. Johnsson, Eds., Wiley-VCH.

Lutz, S., M. Ostermeier and S. J. Benkovic (2001). "Rapid generation of incremental

truncation libraries for protein engineering using alpha-phosphothioate

nucleotides." Nucleic Acids Res 29(4): E16.

Lutz, S., M. Ostermeier, G. L. Moore, C. D. Maranas and S. J. Benkovic (2001).

"Creating multiple-crossover DNA libraries independent of sequence identity."

Proc Natl Acad Sci U S A 98(20): 11248-53.

Lutz, S. and W. M. Patrick (2004). "Novel methods for directed evolution of enzymes:

quality, not quantity." Curr Opin Biotechnol 15(4): 291-7.

Manivasakam, P., S. C. Weber, J. McElver and R. H. Schiestl (1995). "Micro-homology

mediated PCR targeting in Saccharomyces cerevisiae." Nucleic Acids Res 23(14):

2799-800.

Marolewski, A., J. M. Smith and S. J. Benkovic (1994). "Cloning and characterization of

a new purine biosynthetic enzyme: a non-folate glycinamide ribonucleotide

transformylase from E. coli." Biochemistry 33(9): 2531-7.

Marolewski, A. E., K. M. Mattia, M. S. Warren and S. J. Benkovic (1997). "Formyl

phosphate: a proposed intermediate in the reaction catalyzed by Escherichia coli

PurT GAR transformylase." Biochemistry 36(22): 6709-16.

Mata, L., J. C. Gripon and M. Y. Mistou (1999). "Deletion of the four C-terminal

residues of PepC converts an aminopeptidase into an oligopeptidase." Protein Eng

12(8): 681-6.

Mathews, II, T. J. Kappock, J. Stubbe and S. E. Ealick (1999). "Crystal structure of

Escherichia coli PurE, an unusual mutase in the purine biosynthetic pathway."

Structure Fold Des 7(11): 1395-406.

Matsumura, I. and A. D. Ellington (2001). "Mutagenic PCR of Protein-coding genes for

In Vitro Evolution." Methods in Molecular Biology vol 182: In Vitro Mutagenesis,

2nd ed.: 261-269.

Mattheakis, L. C., R. R. Bhatt and W. J. Dower (1994). "An in vitro polysome display

system for identifying ligands from very large peptide libraries." Proc Natl Acad

Sci U S A 91(19): 9022-6.

Matthews, B. F. and J. M. Widholm (1978). "Regulation of lysine and threonine synthesis

in carrot cell suspension cultures and whole carrot roots." Planta 141(3): 315-321.

Meyer, E., N. J. Leonard, B. Bhat, J. Stubbe and J. M. Smith (1992). "Purification and

characterization of the purE, purK, and purC gene products: identification of a

previously unrecognized energy requirement in the purine biosynthetic pathway."

Biochemistry 31(21): 5022-32.

Miflin, B. (2000). "Crop improvement in the 21st century." J Exp Bot 51(342): 1-8.

Mirwaldt, C., I. Korndorfer and R. Huber (1995). "The crystal structure of

dihydrodipicolinate synthase from Escherichia coli at 2.5 A resolution." J Mol

Biol 246(1): 227-39.

Miyazaki, K. and F. H. Arnold (1999). "Exploring nonnatural evolutionary pathways by

saturation mutagenesis: rapid improvement of protein function." J Mol Evol

49(6): 716-20.

Moore, G. L. and C. D. Maranas (2003). "Identifying residue-residue clashes in protein

hybrids by using a second-order mean-field approach." Proc Natl Acad Sci U S A

100(9): 5091-6.

Moore, G. L., C. D. Maranas, S. Lutz and S. J. Benkovic (2001). "Predicting crossover

generation in DNA shuffling." Proc Natl Acad Sci U S A 98(6): 3226-31.

Mueller, E. J., E. Meyer, J. Rudolph, V. J. Davisson and J. Stubbe (1994). "N5-

carboxyaminoimidazole ribonucleotide: evidence for a new intermediate and two

new enzymatic activities in the de novo purine biosynthetic pathway of

Escherichia coli." Biochemistry 33(8): 2269-78.

Murphy, K. C. (1998). "Use of bacteriophage lambda recombination functions to promote

gene replacement in Escherichia coli." J Bacteriol 180(8): 2063-71.

Murzin, A. G., S. E. Brenner, T. Hubbard and C. Chothia (1995). "SCOP: a structural

classification of proteins database for the investigation of sequences and

structures." J Mol Biol 247(4): 536-40.

Nagano, N., C. A. Orengo and J. M. Thornton (2002). "One fold with many functions: the

evolutionary relationships between TIM barrel families based on their sequences,

structures and functions." J Mol Biol 321(5): 741-65.

Ness, J. E., S. B. Del Cardayre, J. Minshull and W. P. Stemmer (2000). "Molecular

breeding: the natural approach to protein design." Adv Protein Chem 55: 261-92.

Ness, J. E., S. Kim, A. Gottman, R. Pak, A. Krebber, T. V. Borchert, S. Govindarajan, E.

C. Mundorff and J. Minshull (2002). "Synthetic shuffling expands functional

protein diversity by allowing amino acids to recombine independently." Nat

Biotechnol 20(12): 1251-5.

Neylon, C. (2004). "Chemical and biochemical strategies for the randomization of protein

encoding DNA sequences: library construction methods for directed evolution."

Nucleic Acids Res 32(4): 1448-59.

Nguyen, A. W. and P. S. Daugherty (2003). "Production of randomly mutated plasmid

libraries using mutator strains." Methods Mol Biol 231: 39-44.

Nielsen, J. (2001). "Metabolic engineering." Appl Microbiol Biotechnol 55(3): 263-83.

Nixon, A. E. and S. J. Benkovic (2000). "Improvement in the efficiency of formyl

transfer of a GAR transformylase hybrid enzyme." Protein Eng 13(5): 323-7.

Nixon, A. E., M. Ostermeier and S. J. Benkovic (1998). "Hybrid enzymes: manipulating

enzyme design." Trends Biotechnol 16(6): 258-64.

Nixon, A. E., M. S. Warren and S. J. Benkovic (1997). "Assembly of an active enzyme

by the linkage of two protein modules." Proc Natl Acad Sci U S A 94(4): 1069-73.

Norledge, B. V., A. M. Lambeir, R. A. Abagyan, A. Rottmann, A. M. Fernandez, V. V.

Filimonov, M. G. Peter and R. K. Wierenga (2001). "Modeling, mutagenesis, and

structural studies on the fully conserved phosphate-binding loop (loop 8) of

triosephosphate isomerase: toward a new substrate specificity." Proteins 42(3):

383-9.

Nyunoya, H. and C. J. Lusty (1983). "The carB gene of Escherichia coli: a duplicated

gene coding for the large subunit of carbamoyl-phosphate synthetase." Proc Natl

Acad Sci U S A 80(15): 4629-33.

O'Brien, P. J. and D. Herschlag (1999). "Catalytic promiscuity and the evolution of new

enzymatic activities." Chem Biol 6(4): R91-R105.

O'Maille, P. E., M. Bakhtina and M. D. Tsai (2002). "Structure-based combinatorial

protein engineering (SCOPE)." J Mol Biol 321(4): 677-91.

Odegrip, R., D. Coomber, B. Eldridge, R. Hederer, P. A. Kuhlman, C. Ullman, K.

FitzGerald and D. McGregor (2004). "CIS display: In vitro selection of peptides

from libraries of protein-DNA complexes." Proc Natl Acad Sci U S A 101(9):

2806-10.

Ogita, T. and J. R. Knowles (1988). "On the intermediacy of carboxyphosphate in biotin-

dependent carboxylations." Biochemistry 27(21): 8028-33.

Olsen, M. J., D. Stephens, D. Griffiths, P. Daugherty, G. Georgiou and B. L. Iverson

(2000). "Function-based isolation of novel enzymes from a large library." Nat

Biotechnol 18(10): 1071-4.

Ooi, H. C., S. M. Marcuccio and W. R. Jackson (2000). "A new preparation of the

diastereoisomeric N-acetylneuraminate alditols." Aust. J. Chem. 53: 171-174.

Orengo, C. A., A. D. Michie, S. Jones, D. T. Jones, M. B. Swindells and J. M. Thornton

(1997). "CATH--a hierarchic classification of protein domain structures."

Structure 5(8): 1093-108.

Ostermeier, M. (2003). "Synthetic gene libraries: in search of the optimal diversity."

Trends Biotechnol 21(6): 244-7.

Ostermeier, M. (2003). "Theoretical distribution of truncation lengths in incremental

truncation libraries." Biotechnol Bioeng 82(5): 564-77.

Ostermeier, M. and S. J. Benkovic (2000). "Evolution of protein function by domain

swapping." Adv Protein Chem 55: 29-77.

Ostermeier, M., A. E. Nixon, J. H. Shim and S. J. Benkovic (1999). "Combinatorial

protein engineering by incremental truncation." Proc Natl Acad Sci U S A 96(7):

3562-7.

Ostermeier, M., J. H. Shim and S. J. Benkovic (1999). "A combinatorial approach to

hybrid enzymes independent of DNA homology." Nat Biotechnol 17(12): 1205-9.

Patnaik, R., S. Louie, V. Gavrilovic, K. Perry, W. P. Stemmer, C. M. Ryan and S. del

Cardayre (2002). "Genome shuffling of Lactobacillus for improved acid

tolerance." Nat Biotechnol 20(7): 707-12.

Pechere, J. F. and J. P. Capony (1967). "On the colorimetric determination of acyl

phosphate." Anal. Biochem. 22: 536-539.

Petri, R. and C. Schmidt-Dannert (2004). "Dealing with complexity: evolutionary

engineering and genome shuffling." Curr Opin Biotechnol 15(4): 298-304.

Petrounia, I. P. and F. H. Arnold (2000). "Designed evolution of enzymatic properties."

Curr Opin Biotechnol 11(4): 325-30.

Petsko, G. A. (2000). "Enzyme evolution. Design by necessity." Nature 403(6770): 606-

7.

Philippe, H., D. Casane, S. Gribaldo, P. Lopez and J. Meunier (2003). "Heterotachy and

functional shift in protein evolution." IUBMB Life 55(4-5): 257-65.

Pluckthun, A., C. Schaffitzel, J. Hanes and L. Jermutus (2000). "In vitro selection and

evolution of proteins." Adv Protein Chem 55: 367-403.

Polakis, S. E., R. B. Guchhait, E. E. Zwergel, M. D. Lane and T. G. Cooper (1974).

"Acetyl coenzyme A carboxylase system of Escherichia coli. Studies on the

mechanisms of the biotin carboxylase- and carboxyltransferase-catalyzed

reactions." J Biol Chem 249(20): 6657-67.

Ponsard, I., M. Galleni, P. Soumillion and J. Fastrez (2001). "Selection of

metalloenzymes by catalytic activity using phage display and catalytic elution."

Chembiochem 2(4): 253-9.

Popenoe, E. A. and R. M. Drew (1957). "The action of an enzyme of Clostridium

perfringens on orosomucoid." J Biol Chem 228(2): 673-83.

Priestle, J. P., M. G. Grutter, J. L. White, M. G. Vincent, M. Kania, E. Wilson, T. S.

Jardetzky, K. Kirschner and J. N. Jansonius (1987). "Three-dimensional structure

of the bifunctional enzyme N-(5'-phosphoribosyl)anthranilate isomerase-indole-3-

glycerol-phosphate synthase from Escherichia coli." Proc Natl Acad Sci U S A

84(16): 5690-4.

Pujadas, G. and J. Palau (1999). "TIM barrel fold: structural, functional and evolutionary

characteristics in natural and designed molecules." Biologia Bratislava 54(3):

231-253.

Putney, S. D., S. J. Benkovic and P. R. Schimmel (1981). "A DNA fragment with an

alpha-phosphorothioate nucleotide at one end is asymmetrically blocked from

digestion by exonuclease III and can be replicated in vivo." Proc Natl Acad Sci U

S A 78(12): 7350-4.

Radzicka, A. and R. Wolfenden (1995). "A proficient enzyme." Science 267(5194): 90-3.

Raushel, F. M. and J. J. Villafranca (1979). "Determination of rate-limiting steps of

Escherichia coli carbamoyl-phosphate synthase. Rapid quench and isotope

partitioning experiments." Biochemistry 18(15): 3424-9.

Raushel, F. M. and J. J. Villafranca (1980). "Phosphorus-31 nuclear magnetic resonance

application to positional isotope exchange reactions catalyzed by Escherichia coli

carbamoyl-phosphate synthetase: analysis of forward and reverse enzymatic

reactions." Biochemistry 19(14): 3170-4.

Reardon, D. and G. K. Farber (1995). "The structure and evolution of alpha/beta barrel

proteins." Faseb J 9(7): 497-503.

Reglero, A., L. B. Rodriguez-Aparicio and J. M. Luengo (1993). "Polysialic acids." Int J

Biochem 25(11): 1517-27.

Reidhaar-Olson, J. F., J. U. Bowie, R. M. Breyer, J. C. Hu, K. L. Knight, W. A. Lim, M.

C. Mossing, D. A. Parsell, K. R. Shoemaker and R. T. Sauer (1991). "Random

mutagenesis of protein sequences using oligonucleotide cassettes." Methods

Enzymol 208: 564-86.

Richardson, T. H., X. Tan, G. Frey, W. Callen, M. Cabell, D. Lam, J. Macomber, J. M.

Short, D. E. Robertson and C. Miller (2002). "A novel, high performance enzyme

for starch liquefaction. Discovery and optimization of a low pH, thermostable

alpha-amylase." J Biol Chem 277(29): 26501-7.

Roberts, R. W. and J. W. Szostak (1997). "RNA-peptide fusions for the in vitro selection

of peptides and proteins." Proc Natl Acad Sci U S A 94(23): 12297-302.

Rowe, L. A., M. L. Geddie, O. B. Alexander and I. Matsumura (2003). "A comparison of

directed evolution approaches using the beta-glucuronidase model system." J Mol

Biol 332(4): 851-60.

Saraf, M. C., A. R. Horswill, S. J. Benkovic and C. D. Maranas (2004). "FamClash: a

method for ranking the activity of engineered enzymes." Proc Natl Acad Sci U S

A 101(12): 4142-7.

Sarff, L. D., G. H. McCracken, M. S. Schiffer, M. P. Glode, J. B. Robbins, I. Orskov and

F. Orskov (1975). "Epidemiology of Escherichia coli K1 in healthy and diseased

newborns." Lancet 1(7916): 1099-104.

Sauers, C. K., W. P. Jencks and S. Groh (1975). "Alcohol-bicarbonate-water system.

Structure-reactivity studies on the equilibriums for formation of alkyl

monocarbonates and on the rates of their decomposition in aqueous alkali." J. Am.

Chem. Soc. 97(19): 5546-5553.

Scapin, G., J. S. Blanchard and J. C. Sacchettini (1995). "Three-dimensional structure of

Escherichia coli dihydrodipicolinate reductase." Biochemistry 34(11): 3502-12.

Schmid, A., J. S. Dordick, B. Hauer, A. Kiener, M. Wubbolts and B. Witholt (2001).

"Industrial biocatalysis today and tomorrow." Nature 409(6817): 258-68.

Schmidt-Dannert, C., D. Umeno and F. H. Arnold (2000). "Molecular breeding of

carotenoid biosynthetic pathways." Nat Biotechnol 18(7): 750-3.

Schmidt, D. M., E. C. Mundorff, M. Dojka, E. Bermudez, J. E. Ness, S. Govindarajan, P.

C. Babbitt, J. Minshull and J. A. Gerlt (2003). "Evolutionary potential of

(beta/alpha)8-barrels: functional promiscuity produced by single substitutions in

the enolase superfamily." Biochemistry 42(28): 8387-93.

Shamberger, R. J. (1984). "Serum sialic acid in normals and in cancer patients." J Clin

Chem Clin Biochem 22(10): 647-51.

Shanmugavelu, M., A. R. Baytan, J. D. Chesnut and B. C. Bonning (2000). "A novel

protein that binds juvenile hormone esterase in fat body tissue and pericardial

cells of the tobacco hornworm Manduca sexta L." J Biol Chem 275(3): 1802-6.

Shao, Z. and F. H. Arnold (1996). "Engineering new functions and altering existing

functions." Curr Opin Struct Biol 6(4): 513-8.

Shao, Z., H. Zhao, L. Giver and F. H. Arnold (1998). "Random-priming in vitro

recombination: an effective tool for directed evolution." Nucleic Acids Res 26(2):

681-3.

Shedlarski, J. G. and C. Gilvarg (1970). "The pyruvate-aspartic semialdehyde condensing

enzyme of Escherichia coli." J Biol Chem 245(6): 1362-73.

Shim, J. H. and S. J. Benkovic (1998). "Evaluation of the kinetic mechanism of

Escherichia coli glycinamide ribonucleotide transformylase." Biochemistry

37(24): 8776-82.

Shindyalov, I. N. and P. E. Bourne (1998). "Protein structure alignment by incremental

combinatorial extension (CE) of the optimal path." Protein Eng 11(9): 739-47.

Sieber, V., C. A. Martinez and F. H. Arnold (2001). "Libraries of hybrid proteins from

distantly related sequences." Nat Biotechnol 19(5): 456-60.

Silverman, J. A., R. Balakrishnan and P. B. Harbury (2001). "Reverse engineering the

(beta/alpha)8 barrel fold." Proc Natl Acad Sci U S A 98(6): 3092-7.

Sirbasku, D. A. and S. B. Binkley (1970). "Purification and properties of N-

acetylneuraminate lyase from beef kidney cortex." Biochim Biophys Acta 206(3):

479-82.

Smith, G. P. (1985). "Filamentous fusion phage: novel expression vectors that display

cloned antigens on the virion surface." Science 228(4705): 1315-7.

Song, J. K., B. Chung, Y. H. Oh and J. S. Rhee (2002). "Construction of DNA-shuffled

and incrementally truncated libraries by a mutagenic and unidirectional

reassembly method: changing from a substrate specificity of phospholipase to that

of lipase." Appl Environ Microbiol 68(12): 6146-51.

Sorensen, I. S. and G. Dandanell (1997). "Identification and sequence analysis of

Sulfolobus solfataricus purE and purK genes." FEMS Microbiol Lett 154(2): 173-

80.

Stahly, D. P. (1969). "Dihydrodipicolinic acid synthase of Bacillus licheniformis."

Biochim Biophys Acta 191(2): 439-51.

Stemmer, W. P. (1994). "DNA shuffling by random fragmentation and reassembly: in

vitro recombination for molecular evolution." Proc Natl Acad Sci U S A 91(22):

10747-51.

Stemmer, W. P. (1994). "Rapid evolution of a protein in vitro by DNA shuffling." Nature

370(6488): 389-91.

Stevenson, J. D. and S. J. Benkovic (2002). "Combinatorial approaches to engineering

hybrid enzymes." J Chem Soc Perkin Trans 2: 1483-1493.

Sugahara, K., K. Sugimoto, O. Nomura and T. Usui (1980). "Enzymatic assay of serum

sialic acid." Clin Chim Acta 108(3): 493-8.

Szczebara, F. M., C. Chandelier, C. Villeret, A. Masurel, S. Bourot, C. Duport, S.

Blanchard, A. Groisillier, E. Testet, P. Costaglioli, G. Cauet, E. Degryse, D.

Balbuena, J. Winter, T. Achstetter, R. Spagnoli, D. Pompon and B. Dumas

(2003). "Total biosynthesis of hydrocortisone from a simple carbon source in

yeast." Nat Biotechnol 21(2): 143-9.

Tafelmeyer, P., N. Johnsson and K. Johnsson (2004). "Transforming a (beta/alpha)8-

barrel enzyme into a split-protein sensor through directed evolution." Chem Biol

11(5): 681-9.

Taguchi, S., A. Ozaki and H. Momose (1998). "Engineering of a cold-adapted protease

by sequential random mutagenesis and a screening system." Appl Environ

Microbiol 64(2): 492-5.

Takai, T., C. Yokoyama, K. Wada and T. Tanabe (1988). "Primary structure of chicken

liver acetyl-CoA carboxylase deduced from cDNA sequence." J Biol Chem

263(6): 2651-7.

Tao, H. and V. W. Cornish (2002). "Milestones in directed enzyme evolution." Curr

Opin Chem Biol 6(6): 858-64.

Tawfik, D. S. and A. D. Griffiths (1998). "Man-made cell-like compartments for

molecular evolution." Nat Biotechnol 16(7): 652-6.

Thoden, J. B., S. Firestine, A. Nixon, S. J. Benkovic and H. M. Holden (2000).

"Molecular structure of Escherichia coli PurT-encoded glycinamide

ribonucleotide transformylase." Biochemistry 39(30): 8791-802.

Thoden, J. B., S. M. Firestine, S. J. Benkovic and H. M. Holden (2002). "PurT-encoded

glycinamide ribonucleotide transformylase. Accommodation of adenosine

nucleotide analogs within the active site." J Biol Chem 277(26): 23898-908.

Thoden, J. B., H. M. Holden, G. Wesenberg, F. M. Raushel and I. Rayment (1997).

"Structure of carbamoyl phosphate synthetase: a journey of 96 A from substrate to

product." Biochemistry 36(21): 6305-16.

Thoden, J. B., T. J. Kappock, J. Stubbe and H. M. Holden (1999). "Three-dimensional

structure of N5-carboxyaminoimidazole ribonucleotide synthetase: a member of

the ATP grasp protein superfamily." Biochemistry 38(47): 15480-92.

Thornton, J. M., C. A. Orengo, A. E. Todd and F. M. Pearl (1999). "Protein folds,

functions and evolution." J Mol Biol 293(2): 333-42.

Todd, A. E., C. A. Orengo and J. M. Thornton (2001). "Evolution of function in protein

superfamilies, from a structural perspective." J Mol Biol 307(4): 1113-43.

Tomazic, S. J. and A. M. Klibanov (1988). "Mechanisms of irreversible thermal

inactivation of Bacillus alpha-amylases." J Biol Chem 263(7): 3086-91.

Tomazic, S. J. and A. M. Klibanov (1988). "Why is one Bacillus alpha-amylase more

resistant against irreversible thermoinactivation than another?" J Biol Chem

263(7): 3092-6.

Tosaka, O. and K. Takinami (1978). "Pathway and regulation of lysine biosynthesis in

Brevibacterium lactofermentum." Agric. Biol. Chem. 42: 95.

Tudor, D. W., T. Lewis and D. J. Robins (1993). "Synthesis of the Trifluoroacetate Salt

of Aspartic Acid β-Semialdehyde, an Intermediate in the Biosynthesis of L-

Lysine, L-Threonine, and L-Methionine." Synthesis: 1061-1062.

Uchida, Y., Y. Tsukada and T. Sugimori (1984). "Purification and properties of N-

acetylneuraminate lyase from Escherichia coli." J Biochem (Tokyo) 96(2): 507-

22.

Umeno, D. and F. H. Arnold (2004). "Evolution of a pathway to novel long-chain

carotenoids." J Bacteriol 186(5): 1531-6.

Umeno, D., A. V. Tobias and F. H. Arnold (2005). "Diversifying carotenoid biosynthetic

pathways by directed evolution." Microbiol Mol Biol Rev 69(1): 51-78.

van Nimwegen, E. (2003). "Scaling laws in the functional content of genomes." Trends

Genet 19(9): 479-84.

Vasserot, A. P., C. D. Dickinson, Y. Tang, W. D. Huse, K. S. Manchester and J. D.

Watkins (2003). "Optimization of protein therapeutics by directed evolution."

Drug Discov Today 8(3): 118-26.

Vega, M. C., E. Lorentzen, A. Linden and M. Wilmanns (2003). "Evolutionary markers

in the (beta/alpha)8-barrel fold." Curr Opin Chem Biol 7(6): 694-701.

Vimr, E., S. Steenbergen and M. Cieslewicz (1995). "Biosynthesis of the polysialic acid

capsule in Escherichia coli K1." J Ind Microbiol 15(4): 352-60.

Vimr, E. R. and F. A. Troy (1985). "Identification of an inducible catabolic system for

sialic acids (nan) in Escherichia coli." J Bacteriol 164(2): 845-53.

Vimr, E. R. and F. A. Troy (1985). "Regulation of sialic acid metabolism in Escherichia

coli: role of N-acylneuraminate pyruvate-lyase." J Bacteriol 164(2): 854-60.

Voigt, C. A., C. Martinez, Z. G. Wang, S. L. Mayo and F. H. Arnold (2002). "Protein

building blocks preserved by recombination." Nat Struct Biol 9(7): 553-8.

Wada, M., C. C. Hsu, D. Franke, M. Mitchell, A. Heine, I. Wilson and C. H. Wong

(2003). "Directed evolution of N-acetylneuraminic acid aldolase to catalyze

enantiomeric aldol reactions." Bioorg Med Chem 11(9): 2091-8.

Wang, W., T. J. Kappock, J. Stubbe and S. E. Ealick (1998). "X-ray crystal structure of

glycinamide ribonucleotide synthetase from Escherichia coli." Biochemistry

37(45): 15647-62.

Warren, M. S., K. M. Mattia, A. E. Marolewski and S. J. Benkovic (1996). "The

transformylase enzymes of de novo purine biosynthesis." Pure & Appl. Chem.

68(No. 11): 2029-2036.

Webster, F. H. and R. V. Lechowich (1970). "Partial purification and characterization of

dihydrodipicolinic acid synthetase from sporulating Bacillus megaterium." J

Bacteriol 101(1): 118-26.

Wedekind, J. E., R. R. Poyner, G. H. Reed and I. Rayment (1994). "Chelation of serine

39 to Mg2+ latches a gate at the active site of enolase: structure of the bis(Mg2+)

complex of yeast enolase and the intermediate analog

phosphonoacetohydroxamate at 2.1-A resolution." Biochemistry 33(31): 9333-42.

Wierenga, R. K. (2001). "The TIM-barrel fold: a versatile framework for efficient

enzymes." FEBS Lett 492(3): 193-8.

Williams, G. J., T. Woodhall, A. Nelson and A. Berry (2005). "Structure-guided

saturation mutagenesis of N-acetylneuraminic acid lyase for the synthesis of sialic

acid mimetics." Protein Eng Des Sel 18(5): 239-46.

Wilmanns, M., C. C. Hyde, D. R. Davies, K. Kirschner and J. N. Jansonius (1991).

"Structural conservation in parallel beta/alpha-barrel enzymes that catalyze three

sequential reactions in the pathway of tryptophan biosynthesis." Biochemistry

30(38): 9161-9.

Wilson, D. S., A. D. Keefe and J. W. Szostak (2001). "The use of mRNA display to

select high-affinity protein-binding peptides." Proc Natl Acad Sci U S A 98(7):

3750-5.

Wise, E., W. S. Yew, P. C. Babbitt, J. A. Gerlt and I. Rayment (2002). "Homologous

(beta/alpha)8-barrel enzymes that catalyze unrelated reactions: orotidine 5'-

monophosphate decarboxylase and 3-keto-L-gulonate 6-phosphate

decarboxylase." Biochemistry 41(12): 3861-9.

Wu, Y. and C. R. Matthews (2002). "A cis-prolyl peptide bond isomerization dominates

the folding of the alpha subunit of Trp synthase, a TIM barrel protein." J Mol Biol

322(1): 7-13.

Wu, Y. and C. R. Matthews (2002). "Parallel channels and rate-limiting steps in complex

protein folding reactions: prolyl isomerization and the alpha subunit of Trp

synthase, a TIM barrel protein." J Mol Biol 323(2): 309-25.

Xiang, H., L. Luo, K. L. Taylor and D. Dunaway-Mariano (1999). "Interchange of

catalytic activity within the 2-enoyl-coenzyme A hydratase/isomerase superfamily

based on a common active site template." Biochemistry 38(24): 7638-52.

Yamakura, F., Y. Ikeda, K. Kimura and T. Sasakawa (1974). "Partial purification and

some properties of pyruvate-aspartic semialdehyde condensing enzyme from

sporulating Bacillus subtilis." J Biochem (Tokyo) 76(3): 611-21.

Yonezawa, M., N. Doi, Y. Kawahashi, T. Higashinakagawa and H. Yanagawa (2003).

"DNA display for in vitro selection of diverse peptide libraries." Nucleic Acids

Res 31(19): e118.

Yugari, Y. and C. Gilvarg (1962). "Coordinate end-product inhibition in lysine synthesis

in Escherichia coli." Biochim Biophys Acta 62: 612-4.

Yugari, Y. and C. Gilvarg (1965). "The condensation step in diaminopimelate synthesis."

J Biol Chem 240(12): 4710-6.

Zaccolo, M., D. M. Williams, D. M. Brown and E. Gherardi (1996). "An approach to

random mutagenesis of DNA using mixtures of triphosphate derivatives of

nucleoside analogues." J Mol Biol 255(4): 589-603.

Zalkin, H. and P. Nygaard (1999). Biosynthesis of purine nucleotides. Escherichia coli

and Salmonella: Cellular and molecular biology. F. C. Neidhardt. Washington, D.

C., ASM Press: 561-579.

Zha, D., A. Eipper and M. T. Reetz (2003). "Assembly of designed oligonucleotides as an

efficient method for gene recombination: a new tool in directed evolution."

Chembiochem 4(1): 34-9.

Zhang, Y. X., K. Perry, V. A. Vinci, K. Powell, W. P. Stemmer and S. B. del Cardayre

(2002). "Genome shuffling leads to rapid phenotypic improvement in bacteria."

Nature 415(6872): 644-6.

Zhao, H. and F. H. Arnold (1997). "Combinatorial protein design: strategies for screening

protein libraries." Curr Opin Struct Biol 7(4): 480-5.

Zhao, H. and F. H. Arnold (1997). "Optimization of DNA shuffling for high fidelity

recombination." Nucleic Acids Res 25(6): 1307-8.

Zhao, H., L. Giver, Z. Shao, J. A. Affholter and F. H. Arnold (1998). "Molecular

evolution by staggered extension process (StEP) in vitro recombination." Nat

Biotechnol 16(3): 258-61.

VITA

Hui Li

Biographical Information

Born: January 17, 1974 in Tianjin, China

Academic Background

1999-2005 Ph.D. in the Integrative Biosciences Option, The Pennsylvania State University, University Park, PA

1996-1999 Master of Science in Biochemistry, Institute of Molecular Biology, Nankai University, China

1992-1996 Bachelor of Science in Biochemistry and Molecular Biology, College of Life Sciences, Nankai University, China