Quick viewing(Text Mode)

Bioengineering Coagulation Factor Xa Substrate Specificity Into

Bioengineering Coagulation Factor Xa Substrate Specificity Into

Bioengineering Factor Xa Substrate Specificity into

Streptomyces griseus

by

> Michael J. Page

B.Sc, Carleton University, 1998

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY in

THE FACULTY OF GRADUATE STUDIES

Department of Biochemistry and Molecular Biology

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA April 2004

© Michael J. Page, 2004 Abstract

Extended substrate specificity is exhibited by a number of highly evolved members of the SI peptidase family, such as the vertebrate blood coagulation . Dissection of this substrate specificity has been hindered by the complexity and physiological requirements of these proteases. In order to understand the mechanisms of extended substrate specificity, a bacterial trypsin-like , Streptomyces griseus trypsin (SGT), was chosen as a scaffold for the introduction of extended substrate specificity through structure-based genetic engineering.

Recombinant and mutant SGT proteases were produced in a B. subtilis expression system, which constitutively secretes active into the extracellular medium at greater than 15 mg/L of culture. Comparison of the recombinant wild-type protease to the natively produced enzyme demonstrated near identity in enzymatic and structural properties. To begin construction of a high specificity protease, four mutants in the S1 substrate binding pocket

(T190A, T190S, T190V, and T190P) were produced and examined for differences in the

Arg:Lys preference. Only the T190P mutant of SGT demonstrated a significant increase in PI arginine to lysine preference - a three-fold improvement to 16:1 - with only a minor

reduction in catalytic activity (kcat reduction of 25%). The 1.9 A resolution crystal structure of

T190P mutant of SGT in complex with the small molecule inhibitor benzamidine was subsequently determined. The model shows that the increased preference for Arg over Lys side chains in the SI pocket is the result of the second shell residues of the SI pocket, particularly by the N-terminal residue of the protease which does not conflict with the introduced proline ring.

ii Using the T190P mutant of SGT as a starting point, coagulation factor Xa (FXa) substrate specificity determinants were then introduced by additional site-directed mutagenesis. To aid in purification of the recombinant proteases a hexa-histidine tag was added to the C-terminus of the protein. Addition of the purification tag reduced the ability of the expression host to produce the enzyme (3 mg/L of culture) but simplified the purification of SGT from the culture medium. Various combinations of a two-residue loop and a number of point mutations at positions 99, 172, 174, 180, and 217 were constructed and characterized in SGT. The mutant bearing mutations at all positions except residue 217 demonstrated a moderate preference for FXa substrates as determined using chromogenic synthetic peptides.

However, the kinetic properties of the mutant enzyme suggested that the 172-loop, a member of the S3/S4 substrate binding pocket, is not in a conformation similar to FXa. Addition of the

Y217E mutation was designed to stabilize the loop but led to a specific protease similar to coagulation factor XIa and not FXa. These results confirm the evolutionary relationship amongst the vertebrate coagulation proteases and demonstrate the importance and flexibility of the 172-loop. Further, Na+ binding, a novel property found in several coagulation proteases, is suggested to play a role in stabilization of the 172-loop and in turn played an important role in the evolution of the vertebrate coagulation cascade.

iii Table of Contents

Abstract ii

Table of Contents iv

List of Tables vii

List of Figures viii

List of Abbreviations x

Acknowledgments xi

Chapter 1. Introduction 1 1.1 Focus of Research 1 1.2 Overview 1 1.3 Nomenclature of Proteases 4 1.4 Definition of Substrate Specificity 6 1.5 Aspartic, Cysteine, Metallo-, and Threonine Proteases 7 1.6 Catalytic Mechanism of Serine Proteases 9 1.7 Blood Coagulation and 14 1.8 Substrate Specificity of Serine Proteases 17 1.9 Genetic Manipulation of Serine Proteases 20 1.10 Streptomyces griseus Trypsin 21 1.11 Statement of Hypothesis 22 1.12 Objectives and Outline 22

Chapter 2. Recombinant Protein Expression of Streptomyces griseus Trypsin in Bacillus subtilis 25 2.1 Introduction 25 2.2 Materials & Methods 28 2.2.1 Plasmids, Bacterial Strains, and Growth Conditions 28 2.2.2 DNA manipulation 28 2.2.3 Protein Purification 29 2.2.4 Kinetic Analysis , 30 2.2.5 Crystallization 30 2.3 Results and Discussion 31

iv 2.3.1 Production and Purification of Recombinant SGT 31 2.3.2 Kinetic Analysis 36 2.3.3 Crystallization and Structure Determination 37 2.3.4 Comparison of the recombinant and -derived crystal structure 40 2.3.5 Wild-type Native and Recombinant SGT Substrate Binding 41 2.4 Conclusions 42

Chapter 3. Engineering the Primary Substrate Specificity of Streptomyces griseus Trypsin 43 3.1 Introduction 43 3.2 Materials & Methods 44 3.2.1 DNA Manipulation and Protein Purification 44 3.2.2 Kinetic Analysis 45 3.2.3 Crystallization and Structure Determination 45 3.3 Results and Discussion 46 3.3.1 Production of SGT Mutants 46 3.3.2 Kinetic Analysis 46 3.3.3. Crystallization and Structure Refinement 48 3.3.4 Ca*+ of B. subtilis derived SGT 50

3.3.5 T190S and the Loss of y-CH3 51 3.3.6 T190V and the Effect of a Branched Side Chain 53 3.3.7 T190A and the Loss of y-OH 53 3.3.8 T190P 54 3.3.9 Second Shell Residues 58 3.4 Conclusions 58

Chapter 4. Engineering Coagulation Factor Xa Substrate Specificity into Streptomyces griseus Trypsin 59 4.1 Introduction 59 4.1.1 Overview 59 4.1.2 Choice of Mutations to Mimic FXa-like Specificity 62 4.2 Materials & Methods 65 4.2.1 Plasmids, Bacterial Strains, and Growth Conditions 65 4.2.2 Construction of a Hexahistidine-tagged SGT 65 4.2.3 Sequence analysis of the SI Family peptidases 67 4.2.4 DNA Manipulation 67 4.2.5 Purification of His-tagged SGT and Mutants thereof 68 4.2.6 Characterization of Substrate Specificity 69 4.2.7 Macromolecular Substrate Specificity 69 4.3 Results & Discussion 70 4.3.1 Production of His-tagged SGT 70 4.3.2 Techniques for Characterization of Substrate Specificity of Serine Proteases 71

v 4.3.3 Extended Substrate Specificity of SGT 72 4.3.4 Effect of the 99-loop on the Substrate Specificity of SGT 74 4.3.5 Mechanisms of P3 Selectivity in SI Family Peptidases 76 4.3.6 Role of the 172-loop and Residue 217 in the SI Peptidases 79 4.3.7 Additional Elements Needed for Reconstructing FXa-like Specificity 86 4.3.8 Utility of a FXa-like Protease 89 4.4 Conclusions & Future Directions 92

Chapter 5. General Discussion and Outlook 94 5.1 Substrate specificity determinants of the SI family of Serine Proteases 94 5.2 Molecular Evolution of the SI family of Serine Proteases 98 5.3 Paper, Rock, Scissors Genetic Screening of Trypsin-like Proteases 99 5.5 Future Opportunities 107 5.6 Significance of the Work 108 5.7 Conclusions 109

Appendix A: Structural Alignment of Selected SI Family Peptidases 110

Bibliography Ill

vi List of Tables

Table 1.1 A short list of biological processes involving proteolysis 3 Table 1.2 A short list of pathologies involving abnormal proteolysis 3 Table 1.3 Select examples of successful protein engineering of serine proteases 21 Table 2.1 Purification table for recombinant SGT (bSGT) from B. subtilis extracellular supernatant 34 Table 2.2 PI arginine to lysine preference of SGT 37 Table 2.3 Data collection and refinement statistics of wild-type recombinant SGT 39 Table 3.1 Oligonucleotides used to mutate residue 190 in the SGT gene 45 Table 3.2 ES-MS analysis of the four mutants of SGT 46 Table 3.3 PI Arginine to Lysine preference of mutant SGT enzymes 47 Table 3.4 Ki values of benzamidine for recombinant SGT and the four mutant forms 48 Table 3.5 Data collection and refinement statistics for the T190P mutant of SGT 49 Table 4.1 Mutants of SGT constructed to mimic the substrate specificity of FXa 62 Table 4.2 Oligonucleotides used to mutate the SGT gene to mimic residues found in FXa....68 Table 4.3 Steady-state kinetic parameters for the hydrolysis of a series of p-nitroanilide chromogenic substrates by the YSFMP mutant of SGT 83 Table 4.4 Steady-state kinetic parameters for the hydrolysis of a series of p-nitroanilide chromogenic substrates by the YSFMPE mutant of SGT 85 Table 4.5 Cleavage sites of proteases used in processing recombinant proteins 90 Table 5.1 Substrate specificities of SI family peptidases 96 Table 5.2 Substrate specificity determinants of SI family sub-family A peptidases 100 Table 5.3 Inhibition constants of ecotin against a variety of SI family peptidases 104

vii List of Figures

Figure 1.1 Distribution of identified proteases based on catalytic type 2 Figure 1.2 Derivation of the Michaelis-Menten equation 6 Figure 1.3. Catalytic mechanism of a typical zinc metalloprotease 9 Figure 1.4 of serine proteases 10 Figure 1.5 Three dimensional structure of a typical 12 Figure 1.6 Catalytic mechanism of a serine protease 13 Figure 1.7 Overview of vertebrate blood coagulation 15 Figure 1.8 Protein domains of the vertebrate blood coagulation proteases 16 Figure 1.9 Schecter & Berger nomenclature of protease specificity 18 Figure 2.1 Plasmid map of SGT gene cloned into pWB980 33 Figure 2.2. ES-MS spectrum of purified recombinant SGT 35 Figure 2.3 SDS-PAGE of purified recombinant SGT 36 Figure 2.4 Ramachandran plot of the crystal structure of recombinant wild-type SGT 38

Figure 2.5 Superimposition of the Ca traces of native and recombinant wild-type SGT 41 Figure 3.1 Ramachandran plot of the crystal structure of the T190P mutant of SGT 50 Figure 3.2 Comparison of the Ca2+ binding site in SGT enzymes 52 Figure 3.3 Comparison of the SI binding pocket in SGT enzymes 56 Figure 4.1 Residues involved in the extended substrate specificity of coagulation proteases .63 Figure 4.2 Plasmid construction for the production of recombinant His-tagged SGT 66 Figure 4.3 Purification of a typical His-tagged mutant of SGT 71 Figure 4.4 Substrates used to characterize mutants of SGT with altered substrate specificity 73

Figure 4.5 Normalized kcat/Km values for the T190P, LP and YP mutants of SGT 75 Figure 4.6 S3 & S4 binding pockets of FXa 77 Figure 4.7 S3 and S4 binding pockets of FVIIa 78 Figure 4.8 Conformation of the 172-loop in SI peptidases 80 Figure 4.9 Normalized KJKm values for the YFP and YSFP mutants of SGT 82 Figure 4.10 Na+-binding site in 88 Figure 4.11 Prothrombin processing by mutants of SGT 91

viii Figure 5.1 Simplified phylogenetic tree of the SI family of peptidases 95 Figure 5.2 Theory behind the Paper, Rocks, Scissors genetic screen 103 Figure 5.3 Dimeric structure of ecotin 104 Figure 5.4 Structure of GFP 106

ix List of Abbreviations

A Angstrom unit (1 A = 0.1 run) AMC Aminomethylcoumarin aPC Activated B Crystallographic thermal factor (A2) Bz Benzoyl dNTP Dinucleotidetriphosphate Fo, Fc Observed and calculated structure factors FX Coagulation (similar for other coagulation proteases) FXa Activated coagulation factor X (similar for other coagulation proteases) kcat Catalytic constant

Km Michaelis constant LB Luria - Bertani broth NMWL Nominal molecular weight limit PCR Polymerase Chain Reaction Pip Pipecoyl pNA para-nitroanilide r.m.s. Root mean squared

Rfree R factors based test set of excluded reflections

Rmerge Shk! Si |li(hkl) - (hkl)(| / Shk! Si Ii(hkl)

R-cryst £ ||F0bs| - |Fcalc|| / 2 |F0bs| S.D. Standard deviation at 95% confidence interval SDS-PAGE Sodium dodecyl sulfate - polyacrylamide gel electrophoresis Tos Tosyl Tris-HCI Tris(hydroxymethyl)aminomethane hydrochloride Acknowledgments

I am indebted to Ross and Jeff for countless hours of good times, their use of "Number 1", and their ability to define new verbs.

I would especially like to thank Sui-Lam Wong for providing the essential components of the expression system.

I appreciate my parents for their constant support both emotionally and financially throughout my life.

I thank my friends, for their help, advice, time and patience. I especially thank Iain, Marty, Ismail, Angus, Mike K., and Mark and wish them the best in all endeavors.

A special thanks to Tanya for all of her helpful comments.

This thesis is dedicated to my beloved Doe

xi Chapter 1. Introduction

1.1 Focus of Research

How do you create a highly efficient and highly specific enzyme? Over several million years, nature has evolved many enzymes that possess high degrees of specificities. Using methods of genetic manipulation, properties of enzymes can be altered, including their substrate specificity. The present dissertation involves taking a primitive bacterial enzyme and adding substrate specificity where little existed previously. By doing so, we will learn about how molecular evolution has produced substrate specificity in one family of enzymes, the serine proteases. I will use a human protease involved in blood coagulation as our guide and then extend the concepts learned to a system that could produce a variety of other specificities. To begin, I will discuss what is a protease, what do they do and why they are important. From there, I will describe the system and the methods applied to generate a specific protease, what has been done by others and then by the author. Lastly, I will discuss the opportunities that may result from this work.

1.2 Overview

Proteolytic enzymes play a diverse number of roles in a variety of essential biological processes, both as non-specific catalysts of protein degradation and as highly specific agents that control physiological events. Hydrolysis of a peptide bond is the key function that proteases fulfill in vivo, and this may result in the activation or destruction of its substrate.

Numerous biological processes involving proteolytic activity have been characterized and a wealth of information has been gathered on the five major catalytic classes of these enzymes

(Figure 1.1, Table 1.1). Roughly 2 % of all genes in most organisms are proteases, second

1 only to transcription factors. Hence, the importance of these types of enzymes in biological and commercial settings cannot be understated.

Unknown

Figure 1.1 Distribution of identified proteases based on catalytic type.

Numerous pathological conditions are the result of excessive or insufficient proteolytic activity (Table 1.2). These clinical situations can arise from genetic defects in the proteases themselves, their natural substrates or their natural inhibitors. A number of academic laboratories and pharmaceutical companies are devoted to the production of therapeutic products, such as small molecule inhibitors or recombinant proteins, to minimize the effects of protease-related pathologies [1,2]. Given the large number of closely related proteases found in man, the design of potent inhibitors with minimal cross reactivity is an arduous task. Detailed information on the geometry and electronic configuration of the protease is a requirement for proper inhibitor design. These studies have been hampered

2 Biological Proteolytic Event Ref. Process

Apoptosis Control of physiological cell death [3] Blood Proteolytic cascades of clot formation, fibrinolysis Coagulation [4] Blood Pressure Renin-angiotensin and -kinin systems [5] Breakdown of protein into tri- and dipeptides; liberation of Digestion hormones promoting digestion [6] Sperm-Egg interaction, ovulation, ovum implantation and Fertilization parturition [7] Complement activation, antigen presentation, chemokine Immunity and chemotaxin activation; [8] Intracellular Proteasome-Ubiquitin system protein level [9] Protein Zymogen activation and protein sorting Processing [10] Tissue Turnover and repair of the extracellular environment Remodeling [11] Table 1.1 A short list of biological processes involving proteolysis.

Pathology Proteolytic Event Ref. Alzheimer's Processing of amyloid precursor protein [12] Disease Cancer Regulation of apoptosis, tumor growth and invasion [3] Chronic Excessive activation of pro-inflammatory cytokines Inflammation [13] Insufficient levels of coagulation factor activity and slowed Hemophilia [14] clot formation Myocardial Unregulated coagulation leading to restricted blood flow [15] Infarction Parasite Infection Regulation of the parasitic life-cycle [16] Processing of viral coat proteins and other essential Viral Replication replication machinery required for viral infection [17] Table 1.2 A short list of pathologies involving abnormal proteolysis.

3 by the ability to produce large quantities of the target enzyme and detailed three dimensional

structural models of them.

Industrial usage of proteolytic enzymes is widespread and commercially important.

For example, the production of the protease for use in detergents is on the scale of

tons per annum and accounts for 40% of enzyme sales worldwide. Other industrial

applications of proteases include the production of food stuffs, leather, pharmaceuticals,

diagnostic reagents, waste management, and silver recovery [18,19]. Recent advances in

molecular biology technology have allowed for the development of proteases with improved

properties for industrial use. Some successful examples of protein engineering include

thermostability, resistance to oxidation, alteration of pH optima, and increased catalytic

efficiency [20]. However, the use of highly specific proteases for site specific proteolysis has

for the most part been limited to academic endeavors due to the difficulties associated with

producing high purity enzymes from the complex biological systems from which they derive.

In the present dissertation, a novel system has been developed to produce a

recombinant protease and the substrate specificity of the enzyme altered. The results

contribute to our understanding of the substrate specificity determinants of the trypsin-like

family of serine proteases. Furthermore, a framework for designing novel specificities not

observed in nature is then presented based on the work.

1.3 Nomenclature of Proteases

Prior to a literature review, a few words must be mentioned on the nomenclature used

to describe proteolytic enzymes. The International Union of Biochemical and Molecular

Biology Nomenclature Committee (IUBMB-NC) denotes acting on peptide bonds

4 as E.C.3.4, yet only a small fraction of the known proteases have been given full classification. Given the wide distribution of proteolytic enzymes and their historical significance, a variety of common names are used. Papain, commonly used to tenderize meat, owes its name to the papaya plant from which it is derived. Other proteases are named based on the physiological process in which they were first described, such as coagulation factor X which is involved in vertebrate blood coagulation. Unfortunately, a significant number of proteases owe their name to having similar biochemical properties yet are involved in disparate biological processes or having different catalytic mechanisms. For example, two of the many proteases found in the human liver, B and cathepsin D, are named similarly but belong to the cysteine and aspartic protease family, respectively. Thus, the nomenclature for each particular protease has become rather muddled with convention.

A variety of terms have been used to refer to enzymes that catalyze the hydrolysis of a peptide bond. The term peptidase has been suggested by the IUBMB-NC as a preferred alternative to protease [21]. Peptidase refers more easily to both exopeptidases (enzymes that cleave peptide bonds at the ends of a polypeptide chain), (those that cleave peptide bonds in the middle of polypeptide chains), and oligopeptidases (those that cleave only short polypeptides). Exopeptidases are further subdivided into aminopeptidases (cleaving at the N-terminus) and carboxypeptidases (cleaving at the C-terminus). Other terms found in the literature include proteinase and proteolytic enzyme. In the present thesis, the terms protease and peptidase will be used interchangeably to describe the enzymes studied in the research; all of which are endopeptidases.

5 1.4 Definition of Substrate Specificity

Enzyme kinetics are traditionally defined by Michaelis-Menten kinetics which apply a steady state assumption to the reaction (Figure 1.2). Substrate specificity is traditionally defined by the ratio of the maximum reaction velocity catalyzed by the enzyme per unit time

(kcat) to the Michaelis-Menten constant (Km). kcal is equal to the maximal reaction velocity

(Vmax) divided by the total amount of enzyme in the reaction. In theory, both constants (kcat

and Km) are linked mathematically as Km is dependent on the rates of each step in the reaction

(Km = (k.i + k2) / ki). In the present study, the traditional definition of substrate specificity will be applied (S = k^/Km).

B [ES] = k, [E][S] / (k, + k ) E + s ES —> E + p 2 (1) Fraction of enzyme in ES: F = (ES] / ([E] + [ES]) (2) Initial Rate of Reaction: v = d[P] / dt = k [ES]

(1) + (2) yields: F= [S] / {((k 1 + k2 ) / k, ) + [S]} (3) Steady state assumption: d[ES] / dt = 0 Initial rate of reaction: v0 = vmax F (4) Rate of ES formation = Rate of ES breakdown

(3) + (4) yields: v0 = vmax [S] / «(k., + k2 ) / k,) + [S]} (5)

k,[El[Sl = k.^ES] +k2[ES]

MichaelisConstant: K =(k1+k2)/k1 (6)

(valid when k,» k2)

(5)+ (6) v0 = vmax[S]/(K +[S])

Figure 1.2 Derivation of the Michaelis-Menten equation. For a typical enzyme

catalyzed reaction (A) the rate of change of the enzyme (E) in the enzyme-substrate

complex (ES) is assumed to be constant throughout the reaction. The Michaelis-

Menten equation can then be derived (B) by introducing the Michaelis constant (Km),

which is only true if the rate of ES formation (ki) is much larger than the rate of

product formation (k2).

6 1.5 Aspartic, Cysteine, Metallo-, and Threonine Proteases

Historically, four mechanistic classes of proteolytic enzymes have been recognized based on their catalytic mechanism - aspartic, cysteine, metallo-, and serine proteases. With

the advent of whole genome sequencing this classification system has become inadequate as

the variety of catalytic mechanisms identified in nature has expanded rapidly. At present,

nearly 18,000 gene sequences for peptidases have been identified and over 2000 peptidases

have been characterized. Barrett has devised a classification scheme based on statistically

significant similarities in sequence and structure of all known proteolytic enzymes, and terms

this database MEROPS (www.merops.ac.uk) [22,23]. This system divides all known

proteases into 40 clans and over 169 sub-families, not including a group of putative proteases

of unknown mechanism. Only the four historic classifications of proteolytic enzymes will be

mentioned here.

Hydrolysis of a polypeptide backbone requires three key mechanistic hurdles to be

overcome for efficient catalysis to proceed. Peptide bonds are particularly stable due to the

electron resonance between the amide nitrogen and carbonyl group of the bond. By the use of

a general acid, proteases overcome this partial double bond character through the generation

of a negatively charged tetrahedral intermediate that is stabilized by the active site. Secondly,

water is a poor nucleophile and must be activated, typically via a general base. Lastly, amines

are poor leaving groups and must be expelled from the active site prior to completion of the

catalytic cycle [24]. Proteases accomplish these tasks efficiently and increase the rate of

reaction ~1010-fold over the uncatalyzed reaction. Moreover, proteases can catalyze similar

reactions in the hydrolysis of amides, esters, anilides, and thioesters.

7 Aspartic proteases and metalloproteases catalyze the hydrolysis of the polypeptide backbone through activation of a water molecule. Metalloproteases comprise the second largest family of proteases known in nature and typically utilize a zinc ion in their active site; however, cobalt or manganese are also found (Figure 1.3). In many metalloproteases, a single metal ion is utilized in the catalysis; however two metal ions acting cocatalytically are typically found in proteases containing cobalt or manganese. Three amino acid side chains, typically His, Glu, Asp or Lys residues, are involved in the co-ordination of the metal ion and at least one other residue is required for catalysis [25]. The catalytic residue is typically Glu in many metallopeptidases but alternatives exist. Activation of the water molecule in the aspartic family of proteases is the result a pair of Asp residues that co-ordinate the activated water molecule [26].

In contrast to nucleophilic attack of the amide backbone by an activated water molecule, cysteine, serine, and threonine peptidases utilize an amino acid side chain. Cysteine proteases compromise the third largest family of known peptidases and employ a nucleophilic sulfhydryl from a cysteine residue to catalyze the hydrolysis of an amide bond [27]. Although less abundant in nature, threonine peptidases are deeply involved in the key biological process of intracellular degradation of polypeptides by the proteasome [28]. The overall catalytic mechanism of both of these families of protease is more similar to that found in serine proteases, where a nucleophile as well as a proton donor is required for catalysis. The proton donor in all cysteine and threonine peptidases which have been identified is a His residue, which is also true of all known serine proteases.

8 Figure 1.3. Catalytic mechanism of a typical zinc metalloprotease. The zinc ion serves

to polarize the carbonyl group of the substrate as well as facilitate the deprotonation of

the water molecule. Several hydrogen bonding partners stabilize the intermediates but

are not included in the diagram for simplicity.

1.6 Catalytic Mechanism of Serine Proteases

Nearly a third of all known proteases are classified in the serine protease family of enzymes. The family name stems from the nucleophilic serine residue in the active site of the enzyme, and the catalytic potency of this residue is dependent on the Asp-Ser-His charge relay system or catalytic triad which was originally proposed by Blow over 30 years ago

(Figure 1.4) [29]. These three residues are found in an identical structural position in four different three-dimensional protein folds that catalyze the hydrolysis of peptide bonds,

9 suggesting four distinct evolutionary origins. Common examples of these folds are represented by , subtilisin, carboxypeptidase Y, and Clp protease. A number of other enzyme families, including asparaginases, esterases, acylases, and P-lactamases, utilize the Asp-Ser-His catalytic triad or variants to generate a strong nucleophile and promote catalysis [30]. For the remainder of the introduction, I will limit the discussion to the chymotrypsin family (SI peptidase family) of serine proteases, which includes trypsins and . Moreover, I will utilize the chymotrypsin numbering system suggested by Blow to refer to a particular amino acid residue. It must be noted that many of the concepts discussed in relation to chymotrypsin-like proteases apply similarly to other types of proteases.

His57 His57 / Ser195 / Ser195

Asp 102 CHJ Asp102 —CH,

Figure 1.4 Catalytic triad of serine proteases. The Asp, His, and Ser combination is

found in serine proteases and other enzyme families that require a nucleophilic serine

side chain.

Central to the catalytic triad is the existence of a hydrogen bond between residue

Aspl02 and His57, which facilitates the abstraction of the proton from Serl95 and generates a potent nucleophile. Some controversy exists over whether this hydrogen bond can be described as a low barrier hydrogen bond (LBHB), an instance where the pK values between

10 the donor and acceptor are matched. Rejection of the LBHB theory mainly stems from the argument that it would provide no significant improvement to catalytic rate enhancement

[31,32]. Increasing experimental and theoretical data are supporting this theory and the debate continues [33,34]. Stabilization of the catalytic triad is mediated through a network of additional hydrogen bonds provided by several highly conserved amino acid residues surrounding the triad, particularly Ala56 and Ser214 in the chymotrypsin family of serine proteases. Significant effort has been placed in the development of small molecule compounds that mimic the activity the catalytic triad, but have met with limited success due to the complexity of the chemistry involved to generate the nucleophilic serine.

Activation of chymotrypsin-like serine proteases requires proteolytic processing of an inactive zymogen precursor protein. This cleavage occurs at the identical position in all known members of the family: between residues 15 and 16 [24]. The newly created N- terminus produces a conformational change in the enzyme and stabilizes the and substrate binding site through formation of an electrostatic interaction with Aspl94 [29].

Two P-barrel domains, each formed by six anti-parallel p-strands, and a C-terminal a-helix comprise the mature form of the enzyme (Figure 1.5). Both the catalytic residues and substrate binding site lie in the cleft between the P-barrel domains, and enzyme-substrate interactions occur with both domains. A minimum of three disulphide bonds is required to stabilize the overall structure; however five or six are commonly found in the family of enzymes.

11 Figure 1.5 Three dimensional structure of a typical serine protease of the SI family of

peptidases. Components of the catalytic triad are shown in stick form and are located

in between the two (3-barrel domains.

Figure 1.6 depicts the generally accepted mechanism of serine protease catalyzed hydrolysis of a peptide bond [24]. Initially, the hydroxyl oxygen of Ser 195 attacks the carbonyl of the peptide substrate as a result of His57 in the catalytic triad acting as a general base (Steps I and II). The oxyanion tetrahedral intermediate is stabilized by the backbone atoms of Glyl93 (not depicted) and Serl95 that generate a positively charged pocket within the active site (Step III). Collapse of the tetrahedral intermediate generates the acyl-enyzme intermediate and stabilization of the newly created N-terminus is mediated by His57 (Steps IV and V). Evidence for the existence of the acyl-enzyme intermediate was provided in 1954 by

Hartley and Kilbey [35]. In these initial experiments a pre-steady state burst of product

12 His57 His 57

/ Ser195 H,C/ Ser195

\CH , \CH , / /

\ /" o. a---" N—C H O / ^ Asp102 CH. H O Asp102—CH, Acyl-enzyme Intermediate

II His57 VI His57

H,C/ Ser195 „ J Ser19S \

w7 V\ \

°. a- O C, N—C'l / / V Asp102 CH, Asp102 CM; H O Formation of the Michaelis Complex

111 His57 VII His57 / Ser195 / Sen 95 H,C H2C

\CH , \ / . © _ CHj

rr. o. a- a--'

/ V_e Asp102 CH, Asp102 CH; Tetrahcdral Intermediate Tetrahcdral Intermediate

IV His57 VIII His57 / / Serf 95 H,C Serf 95 \ \ CH, CHJ , / H O '*-H-O'

a--' o—c H BO Asp102—CH, Asp102 CH, H O Free C-terminus generated

Figure 1.6 Catalytic mechanism of a serine protease. Formation of a tetrahedral

intermediate is the key conformational step in the acylation and deacylation reactions.

13 correctly identified that a bond to a hydroxyl moiety within chymotrypsin was involved in the reaction mechanism. In the second half of the mechanism, a water molecule displaces the free polypeptide fragment and attacks the acyl-enzyme intermediate (Step VI). Again, the oxyanion hole stabilizes the second tetrahedral intermediate of the pathway and collapse of this intermediate liberates a new C-terminus.

1.7 Blood Coagulation and Fibrinolysis

Vertebrate blood coagulation and fibrinolysis can serve as a useful paradigm for the study of proteolysis in a biological setting. The process serves as a model for pathologies associated with improper proteolysis, molecular evolution through gene duplication and divergence, as well as understanding molecular recognition and substrate specificity. Initially recognized as a cascade of events that leads to amplification and rate enhancement, the feedback pathways of the clotting cascade have only recently become elucidated.

At the site of an injury that leads to disruption of the integrity of a blood vessel, a rapid and specific response must be employed to prevent excessive blood loss and to restrict bacterial infection [36]. In vivo, the formation of a clot requires a minimum of five proteases: coagulation factor XI (FXI), coagulation factor IX (FIX), coagulation factor VU

(FVII), coagulation factor X (FX), and prothrombin. These enzymes circulate in the blood stream at low concentrations in inactive, zymogen forms. Activation of these zymogens requires the site-specific proteolysis of one or more peptide bonds, liberating a free N- terminus and promoting proteolytic activity (active forms of the enzymes are denoted with a lower case "a", such as FXa) [37]. Localization of the proteases to membrane surfaces at the site of injury is provided by three co-factors: activated coagulation factor V (FVa), activated

14 coagulation factor VIII (FVIIIa) and tissue factor (TF) [38]. In combination with a phospholipid bilayer these co-factors promote the rate of clot formation ~106-fold (Figure

1.7). Down-regulation of the pathway is mediated in part by another protease, activated protein C (aPC), which cleaves two of the co-factors, FVa and FVIIIa, at specific positions in

the protein and inactivates them [39]. Thrombin plays a pivotal role in the process as it

activates protein C, FV, and FVTII as well as a number of other signaling proteins that recruit

cells and proteins to the site of damage [40].

Figure 1.7 Overview of vertebrate blood coagulation. Five proteases are involved in

the formation of a cross-linked fibrin blood clot (factors XIa, IXa, Vila, Xa, and

thrombin). Three accessory proteins (factors Villa and Va, and tissue factor (TF)) are

involved in co-localization on a phospholipid surface (PL) and catalytic rate

enhancement of the entire process. Protein C (PC) is activated (aPC) by thrombin and

leads to inhibition of the process by cleaving the co-factors.

15 Biochemical characterization of the purified components of the blood coagulation pathway in vitro has shown that each protease in the pathway prefers to recognize and cleave a particular sequence of amino acids. Substrate specificity of these proteases, however, is not extremely strict owing to the requirement for the process to occur rapidly [41,42]. Substrate specificity is a combination of additional regions of the enzyme that contribute to molecular recognition as well as to the local architecture of the substrate binding site in the active site of the protease. Interactions between proteins are provided by additional protein domains in the polypeptide sequence (Figure 1.8) [43]. Configuration of these domains provides some clues to the evolutionary history of vertebrate blood coagulation.

Prothrombin ©000©^ FVII ©©©©©^

Protein C 00000T£RoI] Key: Apple Domain EGF Domain

Propeptide Cf^j) Fibronection Type I Domain

^) Gia Domain ^jn^ Fibronectin Type II Domain

Kringle Domain I PROT i Protease Domain

Signal Peptide

Figure 1.8 Protein domains of the vertebrate blood coagulation proteases.

16 Through gene sequence analysis, gene duplication and divergence of the coagulation proteases probably occurred prior to the emergence of the vertebrate lineage. Comparison of the published gene sequences from a number of organisms ranging from jawless vertebrates to humans shows that the domain organization of the coagulation machinery is highly conserved in all vertebrates [44]. Slight variations are known to exist, however, including the absence of the contact system (FXI, FXII, and kallikrein) in fish. Doolittle has proposed that the formation of the core of the pathway occurred roughly 450 million years ago [43]. In the intervening time, a significant amount of molecular evolution has taken place resulting in numerous changes to the gene sequence thereby resulting in parallel optimization of the relevant proteins for their physiological role.

1.8 Substrate Specificity of Serine Proteases

Hydrolysis of a polypeptide chain requires proper recognition, orientation and binding of the polypeptide backbone. Thus, the residues adjacent to the scissile bond have a significant impact on the rate of hydrolysis. Nearly 30 years ago, Schecter and Berger described the substrate-protease interaction and their system has been adopted in the literature

(Figure 1.9)[45]. In this model the scissile peptide bond is surrounded by subsites on the protease. Substrate amino acids are termed P (for peptide) and the subsites of the protease that interact with them are called S (for subsite). Substrate residues extending towards the N- terminus of the substrate are numbered P2, P3, P4 and so forth. Conversely, substrate residues extending towards the C-terminus are labeled P2', P3\ P4' and onwards. Regions in the protease are numbered according to the substrate. For example, the PI residue is bound in the

SI pocket. Theoretically, a large amount of variation in substrate specificity can result from

17 the 20 possibilities of amino acid side chains, and a wide diversity of specificity is observed in the nature.

Protease

Scissile Bond

Figure 1.9 Schecter & Berger nomenclature of protease specificity.

Broadly specific proteases recognize and act at a site dictated by a single amino acid in a polypeptide chain whereas highly specific proteases recognize a short motif consisting of three to eight amino acid residues. The biological process in which the protease is involved dictates the level of specificity. Digestive enzymes found in the gut, such as trypsin and chymotrypsin, recognize and cleave polypeptides based on the presence of a single type of amino acid (Arg/Lys and Phe/Trp/Tyr, respectively). Hence, a polypeptide substrate would typically be degraded into multiple fragments for further processing and absorption. In contrast, a number of biological processes require more specific proteolysis. As mentioned previously, the vertebrate blood clotting cascade relies on the specific cleavage of each member of the pathway to function properly. A number of other biological processes require similar levels of specificity, particularly when used for signaling purposes such as hormone and chemokine activation [46]. However, a trade off exists between the level of substrate specificity and the catalytic efficiency of the enzyme.

18 By demonstrating a high degree of selectivity, the preferred substrate has a slow rate of association with the enzyme. In turn, this generates a decreased catalytic rate relative to non-specific enzymes. In vivo such a scenario is not often preferred and alternate mechanisms of substrate specificity are employed. As mentioned in the discussion of the blood coagulation system, additional protein domains are associated with a protease domain to aid protein-. protein interactions. Thus, physiology has placed a barrier on the level of specificity that a protease might possess.

Proteases of the blood coagulation system display a marked preference for certain amino acid side chains in the PI to P4 positions and hydrolyze them rapidly. All coagulation proteases have trypsin-like specific at the primary (PI) position and prefer to hydrolyze peptide bonds on the C-terminal side of Arg or Lys residues [47]. Extended substrate specificity exists in all coagulation proteases in the S2 to S4 binding pockets. On the basis of their specificity, both FXa and thrombin are widely used for the site-specific cleavage of recombinant proteins after Ile-Glu-Gly-Arg and Leu-Val-Pro-Arg sequences in a polypeptide chain, respectively [48]. Similar sequences to those preferred are known to be hydrolyzed in vivo by the two proteases. Kinetic analysis of these proteases has revealed that both enzymes can effectively hydrolyze other sequences of amino acids [41,49-51]. For example, FXa was initially thought to have a strict preference for a small amino acid side chains at P2 (Gly).

However, large residues (Trp, Phe) at this position are preferred in vitro. Thrombin can be genetically manipulated to prefer one of its two cleavage sites, which are different in sequence and structure [52,53]. Based on these discrepancies, it seems possible to engineer at least this level of substrate specificity into a broadly specific trypsin-like protease.

19 1.9 Genetic Manipulation of Serine Proteases

Great strides in biotechnology have been made in the past decade that allow for the

design of enzymes with desirable properties. Examples of successful modifications made

through protein engineering include increased stability or activity at the extremes of

temperatures or pH, resistance to oxidation, and stability in non-aqueous environments [20].

Methods to introduce these properties involve mutagenesis of a target enzyme either through

structure based design or by random mutagenesis combined with some form of genetic

selection [54]. A rational design strategy requires significant amounts of information, particularly three dimensional structures of the initial enzyme as well as the knowledge of the regions that would be involved. Conversely, a randomized mutagenesis procedure combined

with selection or screening requires no information of the sequence, structure or mechanism

and has been widely adopted for the alteration of biochemical properties of enzymes [55].

Central to all forms of protein engineering is the creation of a novel protein by adding a novel function that was not possessed by the target protein.

Creation of a highly specific protease suffers from the practical difficulties associated

with producing a kinetically worse enzyme. As mentioned previously, for a protease to exhibit

a high degree of substrate specificity, some compensation in catalytic efficiency must be made. A number of studies have shown that it is possible to switch substrate specificities

amongst disparate members of the protease family. For example, Hedstrom demonstrated the conversion of a trypsin-like enzyme, which prefers PI Arg/Lys residues, into a chymotrypsin- like enzyme that prefers Phe/Trp/Tyr at PI [56-59]. The change in specificity required mutagenesis of three surface loops in the enzyme as well as a number of other point mutations. Importantly, the regions changed do not contact the substrate directly and the

20 resulting enzyme is inefficient at catalyzing the hydrolysis of peptide bonds. These observations demonstrate the inherent complexity of designing improved substrate specificity in the chymotrypsin family. Other modifications of the substrate specificity in the SI family of proteases have also been successful (Table 1.3). A number of specificity determinants have been uncovered by mutagenesis targeted to probe additional features of the enzyme, such as zymogen processing and protease-inhibitor interactions [60-64]. The wealth of biochemical and structural information available suggests the ability to design extended substrate specificity of the S2 to S4 pockets.

Protease Engineered Property Ref. Trypsin Conversion to -like primary [65] specificity Increased PI specificity towards Arg [66] side chains Increased PI specificity towards Lys [67] side chains SI' Engineering to favor basic residues [68] Kallikrein Modification of S2 binding pocket [69] FIXa Conversion to FXa-like extended [70] substrate specificity Thrombin Alteration of P2-P4 preference creating [52,53] an protease

Table 1.3 Select examples of successful protein engineering of serine proteases.

1.10 Streptomyces griseus Trypsin

Streptomyces griseus trypsin (SGT) was initially purified from Pronase - a commercial preparation of secreted proteases - and characterized on the basis of its hydrolysis of Na-benzoyl-L-arginine ethyl ester (B AEE) and casein and its inhibition by soybean trypsin inhibitor [71]. Further characterization of its specificity and inhibition identified SGT as a

21 typical broad specificity trypsin-like serine protease of the SI family that hydrolyzes polypeptide chains on the C-terminal side of basic residues (Arg and Lys) [72]. Based on

sequence alignments, SGT is more similar to bovine trypsin than to other bacterial serine proteases [73,74]. The structure of SGT was subsequently determined by x-ray crystallography and refined to 1.7 A, and revealed a three-dimensional fold that is also more

similar to mammalian serine proteases than bacterial proteases [75,76]. Although similarities exist at the sequence and structural level, SGT differs from its mammalian homologues in its reduced number of amino acid insertions in the polypeptide chain when aligned by either

sequence or structure (Appendix A). Moreover, SGT contains only three disulfide bonds rather than the five or six typically observed in the SI family of proteases. These differences

suggest that SGT could be used as a model scaffold to study the substrate specificity and other properties demonstrated by mammalian proteases.

1.11 Statement of Hypothesis

If all substrate specificity determinants of coagulation factor Xa are known, then their

introduction into Streptomyces griseus trypsin will result in a protease with similar substrate

selectivity.

1.12 Objectives and Outline

Although a wealth of data is available, several questions remain about the overall mechanism of specificity of serine proteases. Amongst the poorly characterized details of

serine proteases are the flexibility of the active site and its influence on substrate specificity,

the role of water molecules in the active site and the possibility of designing ultra-high

22 specificity serine proteases that can be tailored to desired reactions. In this study, SGT is

developed as a model for mammalian serine protease specificity as it has similarity in both

sequence and structure, and is derived from a bacterial source that should allow for production

in other bacteria and hence allow genetic modification of the protein.

Based on the structural similarities observed near and around the active site of S.

griseus trypsin compared to mammalian serine proteases, site-directed mutagenesis should

produce a catalytically active protease with the specificity of factor X. Four questions are to be addressed in this study:

(1) Does the recombinant SGT protein produced from Bacillus subtilis have similar

enzymatic properties as the wild type protein?

(2) What mutations are required to increase the primary specificity of the enzyme

towards Arg side chains?

(3) What point mutations or surface loops near the active site confer a greater degree of

extended specificity in the enzyme?

(4) What are the complete requirements to convert SGT to coagulation factor Xa-like

specificity?

By engineering substrate specificity into a protease where little exists, a number of benefits will result. As mentioned previously, proteases are used in the site specific cleavage

of recombinant proteins and the enzymes used are costly due to their production from blood.

Proteases resulting from a bacterial expression system would cost far less and facilitate

increased usage. Through the design of specificity similar to a coagulation factor, one can

examine the mechanisms by which the proteases generate specificity and the roles of other regions of the polypeptide that might be involved in specificity. For example, nearly all

23 coagulation factor proteases have a specific sodium binding site and the binding of sodium results in a 3- to 5-fold rate increase in catalysis [77]. Sodium binding in thrombin also alters the substrate specificity of the enzyme, yet dissection of this process has been limited owing to the interconnected relationship amongst biochemical events within the enzyme.

Development of a recombinant bacterial expression system for S. griseus trypsin

(Chapter 2) was a crucial obstacle to overcome in this research. Using this expression system, the primary specificity of the protease was genetically engineered to favor Arg over

Lys side chains at the PI position (Chapter 3). Subsequent mutagenesis of the S2 to S4 pockets of SGT was carried out to engineer coagulation factor Xa substrate specificity to the enzyme (Chapter 4). On the basis of these results, a framework for the production of novel proteases with substrate specificities not observed in nature is outlined and other future directions are discussed (Chapter 5).

24 Chapter 2. Recombinant Protein Expression of

Streptomyces griseus Trypsin in Bacillus subtilis

2.1 Introduction

Development of an efficient, cost effective and scaleable recombinant protein expression system is the first step in protein engineering. Soluble, active and high purity protein must result from an efficient expression system. A number of organisms have been used for this purpose including those from bacteria, fungi, yeast, and eukaryotic cell lines.

Protein expression in lower organisms, such as gram-positive and gram-negative bacteria, costs significantly less but the drawback of these systems is their inability to produce complex proteins. Longer production times and higher costs are associated with eukaryotic based systems; however, their use is usually required when the protein to be produced is large (>60 kDa), has a complex fold and contains disulphide bonds or requires post-translational modification (such as glycosylation). For protein engineering, bacterial, fungal, or yeast expression hosts are typically used due to their amenability to genetic manipulation and efficiency of protein production.

Successful expression of a recombinant protein in lower organisms requires a number of favorable biochemical features. Low toxicity, simple structural fold, lack of disulphide bonds, lack of post-translational modification, and small size tend to help production.

Recombinant expression is also significantly influenced by the properties of the gene that encodes the polypeptide. Optimal codon usage and lack of secondary structure have been shown to be problematic in the expression of several proteins [78]. In many instances,

25 however, having favorable characteristics at the genetic and protein level may still not result

in successful expression and a great deal of trial and error in different systems is needed.

Bacterial protein expression is widely performed in Escherichia coli. As a host for genetic manipulation, E. coli is unrivalled in the diversity and simplicity of methods established for genetic manipulation. Purification and alteration of DNA in these gram- negative bacteria is straightforward and highly reproducible. Indeed, much of the history of molecular biology is the result of the study of this organism. Unfortunately, E. coli is not as

adept at the production of foreign proteins. A number of attempts have been made to engineer the genome of this organism to increase its capacity for recombinant protein production, yet no universal solution has been found [79]. For this reason a number of other bacteria have been investigated as alternatives including Bacilli [80,81], Lactobacilli [82], Streptomyctes

[83], Pseuodomonads [84] and Caulobacter [85]. Importantly, these alternatives have the

ability to secrete heterologous proteins outside of the cell. Secretion eases subsequent purification, reduces the toxicity associated with the protein, and improves the rate of formation of disulphide bonds.

On the basis of its bacterial source, Streptomyces griseus trypsin is suggested as a good target for recombinant protein expression and subsequent protein engineering. SGT has many features that are required for successful protein engineering. The structure of the enzyme has been determined at high resolution and a wealth of structure-function information has been described for highly similar enzymes [75,76,86]. Although it has a number of desirable properties for recombinant expression such as its small size and simple fold, several features of the gene and protein could complicate the production of the recombinant protein.

Translation of the mRNA encoding the protein may be hampered by the high guanine and

26 cytosine content of the SGT gene (70%). Several authors have shown that sub-optimal codon usage in the first several codons can dramatically decrease protein expression [87,88]. In the reducing environment of a bacterial cell, the nascent polypeptide chain may have difficulty forming the three disulphide bonds required to stabilize the structure of SGT [89]. Lastly, if the protease is produced in an active form, it may degrade components of the cell as well other SGT polypeptides. These properties indicate that production of the recombinant protein may be difficult.

Once produced by an expression host, a recombinant protein must be purified to homogeneity. In this process, some form of protein capture to concentrate and crudely purify the protein is typically linked with one or more chromatographic separations. Throughout this procedure the loss of protein, whether due to instability or the activity of contaminants, must be minimized. Given the large body of literature on serine proteases, particularly trypsin-like proteases of the chymotrypsin family, a number of reagents are available commercially and can be used in a variety of purification methods. One particular advantage of using a bacterial expression host is the lack of post-translational modification in the target protein which minimizes sample heterogeneity in the protein purified. Thus, recombinant trypsin-like proteases derived from a bacterial source should be easily purified in high yield.

To begin introducing substrate specificity into SGT, an efficient expression system was required. I chose a bacterial expression system that would facilitate down stream processing and future high-throughput studies. B. subtilis is an excellent expression system for SGT due to its ability to secrete active recombinant protein into the extracellular environment. Comparison of the enzymatic properties and three dimensional structure of the purified protein to the natively derived protein from Streptomyces griseus show that both

27 proteins are identical. These studies provide the basis for the subsequent engineering of substrate specificity.

2.2 Materials & Methods

2.2.1 Plasmids, Bacterial Strains, and Growth Conditions

Escherichia coli was grown using standard methods [90]. B. subtilis strain WB700 was grown in super-rich medium [91] or on tryptose blood agar base (Difco) at 37°C. For the

B. subtilis carrying plasmid pWB980 [92], kanamycin was added to a final concentration of

10 p,g ml"1 in both liquid and solid media.

2.2.2 DNA manipulation

Procedures for genomic (5. griseus (ATCC 10137)) and plasmid DNA manipulation were carried out using established protocols [90]. Plasmid DNA was purified using a QIAprep spin miniprep kit (Qiagen). Enzymes were obtained from New England Biolabs and Roche

Molecular Biochemicals. For PCR amplification of the SprT gene encoding SGT, the following oligonucleotides were designed to maintain reading frame of the sacB signal peptide present in plasmid pWB980:

5' -ggaagcttttgcaGTCGTCGGCGGAACCCGCGCGG-3'

5' -ggtotagattaGAGCGTGCGGGCGGCCGAGG-3'

(restriction sites are underlined, the SprT gene specific sequences are given in upper case).

The PCR fragment was first cloned into pBluescript KS+ (Stratagene) and then sub-cloned into pWB980 using the Hindlll and Xbal restriction enzyme sites contained in the oligonucleotide primers. Transformation of B. subtilis strain WB700 was performed by the

28 method of Spizizen [93]. DNA sequence analysis of the cloned gene was performed using the

Big Dye Terminator kit and analyzed on an ABI 3700 DNA Sequencer (Applied Biosystems).

2.2.3 Protein Purification

In order to purify the native and recombinant protease to homogeneity, a purification strategy was developed at low pH to minimize autolysis. After centrifugation to remove cellular debris (5,000 x g, 1 hr), recombinant SGT was purified from the supernatant of 1 L of

B. subtilis WB700 culture. Sequential ammonium sulphate fractionation was carried out at

30% and 85% saturation. The 85% (NH^SCM fraction pellet was resuspended in 20 mM sodium acetate buffer pH 4.5, dialyzed against the same buffer and applied to a column (15 cm x 1.5 cm) of SP Sepharose Fast Flow (Amersham Pharmacia). After extensive washing with 20 mM sodium acetate buffer containing 50 mM NaCI, pH 4.5, the bound proteins were eluted with 20 mM sodium acetate buffer, pH 4.5, containing 150 mM NaCI. The active fractions were pooled and applied to a Benzamidine Sepharose 4 Fast Flow column (8 cm x

0.75 cm) (Amersham Pharmacia). The column was washed with 20 mM sodium acetate buffer containing 500 mM NaCI, pH 4.5, and the enzyme was eluted in the same buffer containing in addition 40 mM benzamidine HC1 (Sigma). Fractions containing active protease were pooled, concentrated and dialyzed against 10 mM Tris-HCI buffer containing 150 mM NaCI and 20

mM CaCl2, pH 7.6 using a 10,000 NMWL Ultrafree-4 centrifugal filter unit (Millipore). Gel filtration through a column (45 cm x 0.75 cm) of Sephadex G-75 (Amersham Pharmacia) was

performed using 10 mM Tris-HCI containing 150 mM NaCI and 20 mM CaCl2, pH 7.6.

Similarly, native SGT was isolated from 1 g of extracellular filtrate from S. griseus (Sigma).

The final protein concentration was determined by UV absorbance at 280 nm, using the

29 extinction coefficient 37,100 M"1 cm"1 [94] or by a BCA protein assay kit (Pierce). Active site titration was performed using 4-nitrophenyl p'-guanidinobenzoate and a standard curve of p- nitrophenol (Sigma). Sodium dodecylsulphate polyacrylamide gel electrophoresis and

Coomassie Blue staining were performed according to standard procedures [90]. N-terminal protein microsequence analysis was performed by the University of Victoria - Genome BC

Proteomics Centre (Victoria, Canada). Electrospray-mass spectrometry was carried out on a

PE-Sciex API 300 triple quadrupole mass spectrometer (Sciex) equipped with an Ionspray ion

source. The mass spectrometry was performed by Dr. S. He in the Withers laboratory (Dept. of Chemistry, UBC).

2.2.4 Kinetic Analysis

Kinetic analysis was performed in 10 mM Tris-HCI buffer containing 150 mM NaCI,

20 mM CaCl2, and 0.1 % PEG 8000, pH 7.6. A standard of 7-amino 4-methylcoumarin

(AMC) was used to quantify the rates of hydrolysis of the fluorogenic substrates Tos-Gly-Pro-

Arg-AMC and Tos-Gly-Pro-Lys-AMC (Bachem). A minimum of six substrate concentrations ranging from 1 to 50 \iM was used. The final concentration of the enzyme in each assay was

0.5 nM. Non-linear regression of the initial reaction rates and calculation of the kinetic parameters were performed using the Graphpad Prism 3.0 software (Graphpad).

2.2.5 Crystallization

In previous studies, crystals of the native SGT were obtained through batch

crystallization using (NH4)2S04 [75,76]. In the current study, proteins were crystallized using similar conditions (10-15 mg/mL protein, 1.5 M (NH^SO^ 10 mM calcium acetate, pH 6.2)

30 except hanging drop vapor diffusion was utilized where the reservoir contained 1.55 M

(NH4)2S04. Crystals appeared in two to three weeks to dimensions of approximately 0.3 x 0.3 x 0.3 mm. Data were collected at 100 K (Oxford Cryostream) with a Mar345 detector mounted on a Rigaku RU-200 X-ray generator (50 kV, 100 mA) with Osmic focusing

mirrors. Crystals were soaked briefly in 20% glycerol, 2.2 M (NH4)2S04for cryoprotection prior to data collection. Data were processed using the HKL package and refined using CNS version 1.1 in combination with Xtalview [95-97]. The previously reported native SGT structure (PDB entry 1SGT) was used as a model for rigid body refinement of the structure

[76].

2.3 Results and Discussion

2.3.1 Production and Purification of Recombinant SGT

Previous studies demonstrated the ability to produce soluble trypsin-like enzymes in the periplasmic space of E. coli [98-100]. In our experiments, however, E. coli was incapable of generating soluble SGT despite using a variety of plasmid constructs in a number of bacterial host strains. The presence of three disulfide bonds, the high G+C% content of the

SprT gene (70%) and the toxicity of the recombinant protein are possible reasons for the lack of production of recombinant proteins in E. coli [78,88]. To overcome these limitations B. subtilis WB700, a strain that is deficient in seven proteases, was used to produce sufficient yields of recombinant SGT for kinetic and structural analysis [101]. Unlike E. coli, B. subtilis is capable of secreting proteins into the extracellular environment, which facilitates rapid detection, purification and analysis of recombinant proteins.

31 Secretion of proteins into the extracellular medium is facilitated by the presence of a single plasma membrane in the gram-positive bacterium B. subtilis. Following translation by the ribosome, a nascent polypeptide chain is targeted for secretion by the presence of a cleavable amino-terminal signal peptide. On the basis of genome sequence analysis, over 300 proteins (-7.3 % of all genes) have been postulated for secretion in B. subtilis [102-104]. The ability to secrete such a wide diversity of proteins by this organism has been used for the production of a number of recombinant proteins [105-107].

Protein secretion in all bacteria, including B. subtilis, is primarily due to the ATP- dependent Sec pathway. Recognition of the signal peptide is mediated by the signal recognition particle protein complex, which shuttles the unfolded protein to the cell membrane [108]. Removal of the signal peptide occurs as the denatured protein translocates across the cell membrane and is typically carried out by the type I signal peptidase SipS in B. subtilis [109]. In our expression system, secretion of SGT into the extracellular environment was mediated by fusing the gene to the signal peptide sequence of levansucrase (SacB)

(Figure 2.1). Thus, the N-terminus of the protein is not accessible to the active site and the protease is inactive until it is secreted from the cell. Cleavage after the Ala-Phe-Ala sequence at the junction of the fusion protein by SipS generates the correct N-terminus. SGT can then fold in the comparatively non-reducing environment outside of the cell. By mimicking the native organism for the production of the recombinant SGT, we have developed an efficient and novel expression system for the production of trypsin-like enzymes.

32 Xbal (4376)

Replicase

Kanamycin nucleotidyltransferase

Figure 2.1 Plasmid map of SGT gene cloned into pWB980 for recombinant

protein expression in B. subtilis. The gene was cloned to maintain the reading

frame of the SacB signal peptide which facilitates secretion of the recombinant

protein into the extracellular environment.

Recombinant protein yields of >15 mg/L of culture medium were obtained within 24 hours of growth at 37°C. The four step purification typically produced 10-15 mg/L of B. subtilis culture with an overall yield of 80% (Table 2.1). Four separation techniques were applied to yield the highest purity enzyme possible. The methods were chosen based on their compatibility and mild conditions. For example, an affinity chromatography step using

soybean trypsin inhibitor was found to bind SGT effectively. However, removal of the protein

33 from this type of column required pH 2.0 and it was feared that such conditions may destroy unstable mutants of SGT.

Vol. Total Total Specific Purification Yield (mL) Protein Activity Activity (fold) (%) (mg) (mmol/s) (pmol/s/mg) Media 1000 16000 59 4 1 100

(NH4)2S04 50 400 53 134 34 90 precipitate SP Sepharose 15 18.3 51 2774 694 87 Benzamidine 5 12.4 49 3856 964 81 Sepharose G-75 Superdex 4 12.2 48 3955 989 81

Table 2.1 Purification table for recombinant SGT (bSGT) from B. subtilis

extracellular supernatant. Activity was measured by the hydrolysis of the

chromogenic substrate Bz-Ue-Glu-Arg-pNA at 40 uM in 10 mM Tris-HCI, 20

mM CaCl2, pH 7.6.

Native and recombinant SGT were purified to homogeneity prior to analysis. Purity was assessed by several different criteria. By electrospray ionization mass spectrometry, the expected and observed masses for the native and recombinant protein were within experimental error (23106.9 and 23107.0 amu, respectively). For both proteins the integrated data from the m/z+ fragments yielded a single unambiguous peak with minimal background

(Figure 2.2). In addition, when the protease was treated with phenylmethanesulfonyl fluoride,

SDS-PAGE showed a single band of the expected molecular weight (Figure 2.3). Only two autolytic fragments of SGT were observed when the protease was boiled prior to SDS-PAGE in the absence of a strong inhibitor.

34 Amino-terminal sequence analysis revealed that the wild-type enzyme had a single

unambiguous sequence NH2-Val-Val-Gly-Gly-Thr-Arg corresponding to the published SGT sequence (12). Together with the protein assays and active site titration data, these results suggest that the final protein preparation was greater than 99% pure.

BioSpec Reconstruct for +Q1: 5.92 min (6 scans) from BSGT 1.80e5 cps 23107. 0

1.4e5H

21047.0 21637.0 LA L< A ^^Jwakla^i i .. 24355.0 21000 23000 25000 Mass, amu

Figure 2.2. ES-MS spectrum of purified recombinant SGT. The single m/z

peak at 23,107.0 amu with minimal background indicates the high purity of

the recombinant protein.

35 kDa A 97.4 66.2 45.0 31.0

21.5

14.4

Figure 2.3 SDS-PAGE of purified recombinant SGT. Lane A: bSGT (0.5 //g)

inactivated with PMSF prior to addition of SDS-PAGE loading buffer and

boiling. Lane B: bSGT (1.0 jug) without PMSF inhibition. The central lane of

the gel contains low-range protein molecular mass standards (Bio-Rad) whose

masses are given on the left-hand side of the gel. The gel was stained with

Coomassie Brilliant Blue.

2.3.2 Kinetic Analysis

Two fluorogenic peptide substrates, Tos-Gly-Pro-Arg-AMC and Tos-Gly-Pro-Lys-

AMC, were used to monitor the PI Arg:Lys preference of the native and recombinant proteases (Table 2.2). The kinetic parameters of the native and recombinant wild-type SGT are similar to previously reported values using the same pair of substrates [66]. On the basis of the similar rates of hydrolysis of these peptide substrates, we can conclude that the recombinant protein behaves identically to the native protease.

36 Tos-Gly-Pro-Arg-AMC Tos-Gly-Pro-Lys-AMC SR/SK*

kcat kcat / Km kcat Km kcat / Km

(min") (uM) (min" uM") (min) (u.M) (min" u.M")

SGT 4880 2.3 2122. 1670 3.6 464 4.6

±410 ±0.2 ± 120 ±0.4

bSGT 4570 2.0 2285 1520 3.2 475 4.8

± 1210 ±0.2 ±60 ±0.2

* SR/SK = (Tos-Gly-Pro-Arg-AMC k^/ Km)/( Tos-Gly-Pro-Lys-AMC kc, / Km)

Table 2.2 PI arginine to lysine preference of SGT enzymes. Arg:Lys preference

was measured by amidolytic activity of the native (SGT), recombinant SGT

(bSGT) using two fluorogenic peptides, Tos-Gly-Pro-Arg-AMC and Tos-Gly-

Pro-Lys-AMC. Values obtained in triplicate ± S.D.

2.3.3 Crystallization and Structure Determination

X-ray diffraction data were obtained for the recombinant wild-type SGT at 1.5 A resolution. Data collection and refinement statistics are given in Table 2.3. The recombinant protease crystallized in the C222] space group and contained one molecule per asymmetric unit and a Matthews coefficient of 2.2 A3/Da (Table 2.3). The structure was deposited in the

PDB database as lOSS. During refinement, low Rcryst and Rfree values were obtained (0.19 and

0.22). These were accompanied by excellent stereochemistry indicating a high quality model.

Inspection of the Ramachandran plot revealed that all non-glycine backbone atoms are in allowed regions, with only Asnl78 adopting a conformation in the generously allowed region

(Figure 2.4). The overall B-factors for the polypeptide atoms were low (-13 A2), and regions with high B-factors were limited to solvent exposed regions that are not involved in crystal packing. The structure of SGT is the highest quality reported to date likely due to the

37 decreased radiation damage as the present crystals were analyzed under cryogenic conditions with a shorter collection time.

Phi (degrees)

Figure 2.4 Ramachandran plot of the crystal structure of recombinant wild-type SGT.

All non-glycine residues are in the allowed conformation. The plot was calculated

using PROCHECK[110]

38 Data collection

Resolution (A) 1.5 (1.55- 1.65)

Total Observations 25923

Completeness (%) 84.1 (74.1)

Average redundancy 2.8

I/oT 20.4 (5.1)

Rmerge (%) 4.1 (18.2)

Refinement statistics

Space group C222i

Cell dimensions (a,b,c) * 50.04, 69.82, 119.65

Molecules per asymmetric unit 1

Rcryst 0.196

Rfree 0.222

Protein atoms* 1623

Solvent atoms per asymmetric unit 211

Average B-factor for protein (A2) 12.9

Average B-factor for water (A2) 21.0

Occupancy of Ca2+ (B A2) 0.53 (14.98)

Bond length deviations (A) 0.007

Bond angle deviations (°) 1.4

a = P = y = 90°; including alternate side chain conformations

Table 2.3 Data collection and refinement statistics of wild-type recombinant

SGT. Statistics for the highest resolution shell are given in parentheses.

39 2.3.4 Comparison of the recombinant and Pronase-derived crystal structure

All amino acid residues in the model of the recombinant SGT were clearly identified and positioned. In 1SGT [76], several residues had weak or absent side-chain density. In the present structure, most of these densities were clearly resolved, although several residues lacked density in the terminal atoms of their side-chains (Thr20, Gln75, Lys82, Thr98,

Ser236, and Arg243) indicating disorder of these solvent exposed atoms. Residues 77 and 79 were modeled as Gly and Ala in 1SGT but are two Ser residues by DNA sequence analysis

[74]. These Ser residues were clearly resolved in the current electron density map. The electron density of one sulphate ion was observed in the oxyanion hole of the substrate binding site and was included in the structure. The position of this sulphate is conserved in anionic salmon trypsin (PDB entry 1BIT), bovine trypsin (1TLD), and porcine (3EST) [111-113]. Alternate conformations were observed for Glnl92, which either points into the solvent or forms a pair of hydrogen bonds with the backbone NH of Glyl48 of an adjacent SGT molecule. In 1SGT the same residue was noted as having high mobility [76].

Differences in the Ca+2 binding site in the recombinant crystal structure are discussed in

Chapter 3. The overall differences between the native (1SGT) and wild-type recombinant

enzyme are minor (Figure 2.5), with a root mean square deviation of all 892 atoms in the Ca backbone of 0.27 A. The largest deviation in backbone occurs at the 174-loop where peptide bond of Alal77a adopts a 180° rotation compared to the native structure.

40 Figure 2.5 Superimposition of the Ca traces of native and recombinant wild-type SGT.

The two structures (PDB ID 1SGT and 10S8) are indistinguishable as indicated by the

small r.m.s deviation between the Cu backbone atoms of the two structures (0.27 A).

2.3.5 Wild-type Native and Recombinant SGT Substrate Binding

A negatively charged residue (D189) is present in the SI pocket and confers the primary specificity of SGT and other trypsin-like enzymes towards positively charged Arg or

Lys side chains [114]. Based on the structure of 1SGT, D189 is located at the base of a narrow cylindrical cleft that can accommodate these side chains [76]. The slight preference for Arg over Lys in this pocket is due to the requirement for a bridging water molecule between the shorter lysyl-side chain and D189 [66]. The y-OH of residue T190 interacts directly with the substrate via hydrogen bonding. Both Lys and Arg side chains adopt

41 favorable conformations for interaction with D189 and the hydroxyl group of T190. Kinetic analysis of the native and recombinant SGT proteases demonstrated an Arg:Lys preference of

4:1.

Extended substrate specificity is largely absent in all broadly specific trypsin-like enzymes. Accessibility of the catalytic triad in SGT is not hindered by the structure of the active site. Crystal structures of other trypsin-like enzymes have demonstrated that the peptide backbone of substrate residues PI to P3 forms an anti-parallel (3-sheet with residues 214 to

216 in the enzyme [24]. Analysis of the substrate specificity of the S2 to S4 pockets in bovine trypsin has shown an absence for preference of any side chain at these positions [115,116].

The structure of SGT suggests an identical mode of substrate binding and an absence of substrate specificity in the S2 to S4 pockets.

2.4 Conclusions

A novel expression system for the production of recombinant SGT has been developed using B. subtilis. High purity protease resulted from a four-step purification protocol. The recombinant protein demonstrates identical biochemical and structural properties to the

Streptomyces derived protease. On the basis of the high level of production, purity, and ability to crystallize the protein, the recombinant protein is highly amenable for engineering extended substrate specificity into the enzyme.

42 Chapter 3. Engineering the Primary Substrate Specificity of

Streptomyces griseus Trypsin

3.1 Introduction

Substrate specificity is a key concept in the analysis of serine proteases. Sequence analysis studies show that the SI family of trypsin-like enzymes likely evolved from a common ancestral gene [117]. The culmination of incremental evolutionary steps led to the appearance of a number of highly specific proteases such as those found in the vertebrate blood coagulation cascades. These proteases fulfill regulatory roles in cellular processes that are distinct from their more primitive roles as degradative and protective enzymes [118].

Much data have been collected on the specificity determinants of serine proteases, resulting in a classification system based on their primary specificity (SI pocket). The specificity of trypsin-like enzymes at the SI pocket is largely defined by the presence of a negatively charged side chain at position 189 [114]. Optimal binding of the positively charged substrate

(Arg or Lys side chains) to residue 189 is mediated by residue 190 [66,119]. In trypsin-like serine proteases, position 190 is occupied by a limited number of amino acids. Degradative proteases with low primary specificity (Arg:Lys preference of 4:1) display Gin, Thr or Ser at position 190, whereas proteases with high primary specificity (Arg:Lys preferences greater than 7:1) contain Ala or Ser at this position [120]. Previous studies have shown that mutagenesis of position 190 can be used to manipulate the substrate specificity of trypsin to favor cleavage after either Arg or Lys side chains [66,67,119].

Similar to degradative vertebrate trypsin-like enzymes, SGT demonstrates a primary substrate preference of ArgLys of 4:1. In the previous chapter I described the production of

43 fully active recombinant SGT from B. subtilis. Using this system, mutants of SGT were constructed with altered preference for arginine to lysine (Arg:Lys). Mutations were designed to mimic those found in other trypsin-like enzymes.

SGT mutant T190P is considerably more active and less Arg-specific when compared with the previously published S190P mutation created in rat anionic trypsin [67]. Kinetic and structural analysis of the mutant protease shows that both the activity and specificity of the enzyme is affected by residues surrounding residue 190. These results further our understanding of the primary substrate specificity of trypsin-like enzymes. Based on the ease of production and purification of the recombinant protein in our B. subtilis expression system,

SGT is an ideal scaffold for the introduction of additional mutations to enhance the substrate specificity of the S2 to S4 binding pockets.

3.2 Materials & Methods

3.2.1 DNA Manipulation and Protein Purification

Using the previously described SGT gene cloned into pBluescript KS+ plasmid, mutagenesis was performed on the gene using a QuikChange site-directed mutagenesis kit

(Stratagene) as described by the manufacturer. Oligonucleotides used for mutagenesis are provided in Table 3.1. DNA sequence analysis of the cloned gene and mutants was performed using the Big Dye Terminator kit and analyzed on an ABI 3700 DNA Sequencer (Applied

Biosystems). Mutant SGT genes were sub-cloned into plasmid pWB980 and transformed into

B. subtilis WB700 as described previously. Mutant SGT proteins were expressed and purified in an identical manner as the wild-type.

44 Mutation Oligon ucleotide T190A 5' -GGCGTCG ACGCCTGCC AGGGT-3'

T190P 5' -GGCGTCGACCCCTGCC AGGGT-3'

T190V 5' -GGCGTCG ACTTCTGCC AGGGT-3'

T190V 5' -GGCGTCG ACGTCTGCC AGGGT-3'

Table 3.1 Oligonucleotides used to mutate residue 190 in the SGT gene. The

reverse complement sequences of these oligonucleotides were also used in the

mutagenesis.

3.2.2 Kinetic Analysis

Kinetic analysis was performed in 10 mM Tris HC1 buffer containing 150 mM NaCI,

20 mM CaCl2, and 0.1 % PEG 8000, pH 7.6. A standard of 7-amino 4-methylcoumarin

(AMC) was used to quantify the rates of hydrolysis of the fluorogenic substrates Tos-Gly-Pro-

Arg-AMC and Tos-Gly-Pro-Lys-AMC (Bachem). A minimum of six substrate concentrations ranging from 1 to 500 u.M was used. Enzyme concentrations ranged from 0.5 to 10 nM.

Benzamidine concentrations ranged from 5 to 200 nM. Non-linear regression of the initial reaction rates and calculation of the kinetic parameters were performed using the Graphpad

Prism 3.0 software (Graphpad).

3.2.3 Crystallization and Structure Determination

Crystallization conditions for the T190P mutant of SGT were similar to the wild-type protein; however 25 mM benzamidine was included in the sample buffer. Crystals appeared in two to three weeks to dimensions of 0.3 x 0.3 x 0.3 mm. Data collection and structure refinement were identical to those used for the wild-type recombinant structure. Electron

45 density of the inhibitor in the SI pocket of the mutant enzyme was evident, and was included in the final model.

3.3 Results and Discussion

3.3.1 Production of SGT Mutants

Expression levels of the four mutant proteases were comparable to the wild-type SGT protein suggesting that the mutations were not detrimental to the overall stability and folding of the proteases. To ensure accurate kinetics, the complete method using the four-step purification protocol was applied to each protein. ES-MS analysis of the recombinant proteases demonstrated the presence of the mutation, as well as high purity of the sample

(Table 3.2).

Theoretical Mass Observed Mass Difference (amu) (amu) (amu)

T190A 23077.0 23079.0 2~0

T190P 23103.0 23098.8 4.2

T190S 23093.0 23090.0 3.0

T190V 23107.1 23102.0 5.1

Table 3.2 ES-MS analysis of the four mutants of SGT.

3.3.2 Kinetic Analysis

Two fluorogenic peptide substrates, Tos-Gly-Pro-Arg-AMC and Tos-Gly-Pro-Lys-

AMC, were used to monitor the PI Arg:Lys preference of recombinant and mutant proteases

(Table 3.3). Values for the native and wild-type recombinant SGT are provided in the table

46 for reference. Mutants T190P and T190A demonstrated a significant increase in PI Arg:Lys

preference over the wild-type enzyme of 18:1 and 8:1, respectively. All four mutants showed

increased Km values for the substrates tested suggesting that the SI pocket of SGT is

optimized for substrate binding, a feature that has been observed in other trypsin-like enzymes

[66,67,119]. A similar trend of fold differences for the Ki value of the small molecule

inhibitor benzamidine was observed, with the exception of the T190P mutant (Table 3.4).

Tos-Gly-Pro-Arg-AMC Tos-Gly-Pro-Lys-AMC SR/SK*

kcat kcat / Km kcat kcat / Km (min") (MM) (min" uM") (min-) (MM) (min" uM")

SGT 4880 ±410 2.3 ±0.2 2122 1670±120 3.6 ± 0.4 464 4.6 bSGT 4570±1210 2.0 ±0.2 2285 1520 ±60 3.2 ±0.2 475 4.8 T190A 4950 ± 470 12.9 ± 1.0 384 192 ± 32 4.4 ± 0.6 44 8.7 T190P 3610±130 67 ±5 54 527 ±2 166 ±2 3 18 T190S 6036 ± 561 6.1 ±0.6 990 2584 ±141 10.1 ± 1.2 256 3.9 T190V 2300 ±177 224 ± 20 11 791 ± 13 231 ±2 3.4 3.2

SR/SK = ( Tos-Gly-Pro-Arg-AMC kcat / Kra)/(Tos-Gly-Pro-Lys-AMC kcat / Kra)

Table 3.3 PI Arginine to Lysine preference of mutant SGT enzymes. Arg:Lys

preference was measured by amidolytic activity of the native (SGT),

recombinant (bSGT) and mutants of SGT using two fluorogenic peptides, Tos-

Gly-Pro-Arg-AMC and Tos-Gly-Pro-Lys-AMC. Values obtained in triplicate ±

S.D.

47 K; (uM)

bSGT 2.7 ±0.8

T190A 9.9 ± 0.7

T190P 16.4 ±2.2

T190S 4.8 ±0.8

T190V 197 ± 37

Table 3.4 Kj values of benzamidine for recombinant SGT and the four mutant

forms.

3.3.3. Crystallization and Structure Refinement

X-ray diffraction data were obtained for the T190P mutant of SGT at 1.9 A resolution.

Data collection and refinement statistics are given in Table 3.5. As with the recombinant wild- type, the mutant crystallized in the C222i space group, contained one molecule per asymmetric unit and a Matthews coefficient of 2.3 A3/Da. The mutant structure was deposited

in the PDB database as 10S8. During refinement, low Rcryst and Rfree values were obtained

(0.17 and 0.21). These were accompanied by excellent stereochemistry (Figure 3.1).

48 T190P SGT

Data collection

Resolution (A) 1.9 (1.93 -2.05)

Total Observations 14471

Completeness (%) 89.7 (79.1)

Average redundancy 7.1

Vol 44.4 (25.2)

Rmerge (%) 3.1 (5.6)

Refinement statistics

Space group C222,

Cell dimensions (a,b,c) * 50.08, 69.60, 119.83

Molecules per asymmetric unit 1

Rcryst 0.167

Rfree 0.212

Protein atoiW 1632

Solvent atoms per asymmetric unit 193

Average B-factor for protein (A2) 12.8

Average B-factor for water (A2) 21.4

Occupancy of Ca2+ (B A2) 0.43 (17.50)

Bond length deviations (A) 0.007

Bond angle deviations (°) 1.4

* a = P = y = 90°;T including alternate side chain conformations

Table 3.5 Data collection and refinement statistics for the T190P mutant of

SGT in complex with benzamidine. Statistics for the highest resolution shell are given in parentheses.

49 Phi (degrees)

Figure 3.1 Ramachandran plot of the crystal structure of the T190P mutant of SGT.

All non-glycine residues are in the allowed conformation. The plot was calculated

using PROCHECK [110]

3.3.4 Ca2+ binding site of B. subtilis derived SGT

In the previously reported structure (1SGT) the calcium binding site consisted of the

Asp 165 and Glu230 carboxylate groups, two well ordered water molecules and the carbonyl oxygen atoms of residues Alal77a and Glul80 (Figure 3.2) [76]. In the present structures,

Aspl65 was shown to adopt a different conformation than observed previously. In the wild- type recombinant SGT structure, the carboxylate of Asp 165 forms a bidentate electrostatic interaction with Argl69 rather than the structural Ca2+ ion. In the T190P model, Aspl65 adopts a pair of alternative conformations, either facing the Ca2+ ion or towards Argl69.

50 Modeling of the two positions at half occupancy generated a lower B -factor for the conformation involved in the electrostatic interaction with Argl69. In both structures, the carbonyl oxygen of Alal77a was oriented away from the Ca2+ ion and does not appear to be involved in the interaction. Three ordered water molecules are associated with the ion, rather than the two previously observed. In contrast, a single disordered water molecule with a high

B-factor(41.4 A2) was found near the Ca2+ ion in the T190P crystal structure. In other trypsin-like proteases, Ca2+ ions have been found previously with high B-factors relative to the overall structure suggesting less than full occupancy (25). The reduced occupancy of the ion is not surprising as a related enzyme, Streptomyces erythraeus trypsin, lacks Glu230 and no calcium is evident in the structure [121]. Moreover, the suggested Ca2+ binding site in SGT is completely different than that observed in mammalian trypsin-like enzymes [76]. When taken with previous observations that the calcium ion plays no role in catalysis but plays a role only in structural stability in low/high pH solutions, these data suggest that the calcium binding site has weak affinity for the ion [72].

3.3.5 T190S and the Loss of 7-CH3

Degradative proteases involved in digestive and protective functions typically possess

Ser or Thr residues at position 190. However, proteases exhibiting higher substrate specificity may also possess these residues [120]. The T190S mutant of SGT demonstrated no significant increase in overall substrate specificity, yet a minor increase in catalytic activity (kc at increase

of 25%) and a 3-fold increase in Km (Table 3.3) were observed for both the Arg and Lys

containing substrates. Loss of the y-CH3 is unlikely to significantly increase the solvent accessibility of D189, and it is more likely that the increased mobility of the y-OH results in

51 the increased Km values for both substrates. This is valid if koai is used as a measure of the

stabilization of the transition state. In eukaryotic trypsin-like enzymes that contain Ser at

position 190, the space that would be occupied by a y-CH3 is filled by methyl groups from either residue 16 or 138 usually in the form of isoleucine side chains. In SGT, both of these residues are valine and possess one fewer methyl group.

Figure 3.2 Comparison of the Ca2+ binding site in SGT enzymes: (A) native,

(B) wild-type recombinant and (C) T190P mutant of SGT. The number of

water molecules (•) that co-ordinate the structural calcium ion (O), as well as

the conformation of the amino acid ligands differ in all three structures.

52 3.3.6 T190V and the Effect of a Branched Side Chain

In contrast to the kinetics of T190S, the T190V mutant demonstrated a 2-fold

reduction in kcat in combination with nearly a 100-fold increase in Km for both Arg and Lys

containing substrates (Table 3.3). Replacement of the y-OH with a methyl group removes the

hydrogen bonding capacity of the residue and reduces the solvent accessibility of D189. This results in a destabilized transition state complex relative to the wild-type protease and a

weaker electrostatic interaction between the substrate and enzyme. The 100-fold increase in

Km for Arg containing substrates compared to the 70-fold increase in Km for Lys containing

substrates indicates the small increase in volume has a more significant effect on the longer

and bulkier arginyl side chain. D189 is not typically observed with VI90 or 116 in naturally

occurring trypsin-like enzymes likely due to the poor catalytic efficiency of these

combinations of side chains.

3.3.7 T190A and the Loss of y-OH

A majority of vertebrate enzymes involved in physiological regulation, such as the

coagulation factor serine proteases, possess Ala at position 190 and exhibit Arg:Lys substrate

specificities ranging from 7:1 for coagulation factor Xa to greater than 14:1 for bovine

thrombin [119]. The T190A mutation in SGT demonstrates the molecular basis for the predominance of Ala at this position in highly specific proteases favoring a PI Arg residue.

Optimal rates of catalysis at low concentrations of substrate tend to be requisite characteristics

of these vertebrate enzymes. Kinetic analysis of the T190A mutant revealed no change in kcat

for Arg containing substrates but rather a 10-fold reduction in kcat for Lys containing

substrates (Table 3.3). As noted previously, the increase in solvent accessibility of D189

53 should stabilize the transition state complex. However, the loss of hydrogen bonding

(previously provided by the y-OH) exhibits a more pronounced effect on the Lys containing substrate due to its requirement of an ordered bridging water molecule to D189.

3.3.8 T190P

Unlike the T190A mutation, whose effects were predominantly on the kcat of the Lys-

containing substrate, the T190P mutation affected the Km significandy. The Km for the Arg containing substrate was 35-fold higher than the wild-type, compared to the 46-fold increase

for the Lys containing substrate (Table 3.3). Similarly, the reduction in kcat for the Arg substrate (25%) was significantly less than the Lys substrate (66%). Together, these changes generate an overall Arg to Lys preference of 18 to 1. A previous report [66] analyzed the

S190P mutant of rat anionic trypsin, which is analogous to the T190P constructed in SGT.

When using the same pair of substrates used in this study, wild-type rat anionic trypsin exhibits a similar primary substrate specificity to SGT. However, the S190P mutation in rat anionic trypsin results in a highly specific protease that favors the Arg substrate 135-fold over the Lys containing substrate. This increase in substrate specificity was combined with a

greater than 10-fold reduction in kcat for both Arg and Lys substrates. These authors suggested that Tyr228 may be involved in steric clashing with the proline ring at position 190, leading to reduced activity. In SGT, residue 228 is also Tyr suggesting an alternate binding mode of this mutant. To address these discrepancies, the crystal structure of the T190P mutant in complex with the small molecule inhibitor benzamidine was investigated.

Binding of the benzamidine inhibitor to the T190P mutant is nearly identical to that observed in other trypsin-like proteases [112]. The proline ring of residue 190 does not adopt

54 a conformation that occludes the negatively charged carboxylate group of Asp 189, nor does it

conflict with Tyr228 suggesting that the previously characterized specificity of the S190P

mutant in rat anionic trypsin was the result of second shell residues at positions 16 or 138

(Figure 3.3). The mutation does not significantly affect the conformation of any of the residues surrounding T190P, including the critical Aspl89. The local r.m.s. deviation is low

(0.70 A) for all 96 atoms within a 5 A radius of residue 190. However, the backbone carbonyl

group of Aspl89 is rotated 45° relative to the wild-type structure due steric constraints of the

proline residue. Rotation of this carbonyl group does not disrupt the hydrogen bond with the backbone nitrogen of residue 17. Hence, the moderate increase in Arg to Lys substrate

specificity of this mutant is the likely the result of the strengthened interaction between the

substrate and Asp 189 in a more hydrophobic environment. The effect on lysyl-side chains is

more predominant due to the lack of hydrogen bonding of the proline ring to the substrate or a bridging water molecule and would reduce the rate of association of the side chain with

Aspl89.

55 Figure 3.3 Comparison of the SI binding pocket in SGT enzymes: (A) the

recombinant wild-type structure and (B) the T190P mutant of SGT complexed

with the benzamidine inhibitor (Benz). In the wild-type structure, the y-OH

points towards the S1 pocket and provides a H-bonding group for the substrate.

Mutation of residue 190 to proline removes this H-bonding capacity without

disrupting the critical D189.

Although Asp 189 adopts a similar conformation to that found in the wild-type protease, it is possible that the binding of the benzamidine inhibitor stabilizes the conformation of this side chain through formation of the electrostatic interaction. Analysis of

56 the inhibition constants of benzamidine with the wild-type recombinant protease and four mutants reveals the basis for the increased specificity without loss of catalytic activity.

Similar fold differences for the Km values for the peptide substrates and values relative to the wild-type are observed for all mutants except T190P. The Kj value is six-fold higher than

the wild-type, whereas the Km values for the Arg and Lys containing substrates increase 35- fold and 46-fold, respectively (Tables 3.3 & 3.4). As the inhibitor forms a direct electrostatic interaction with Asp 189, the minor difference of the inhibitory constants suggests that in the

T190P mutant the interaction occurs in a more hydrophobic environment and is not accompanied by structural rearrangement of the S1 binding pocket.

Whereas T190A demonstrates the interactions present at the SI binding pocket in the majority of proteases with a PI Arg preference, the kinetic analysis of T190P suggests a potential intermediate in the evolution of the vertebrate coagulation cascade. Only two natural proteases have been identified that possess Asp 189 and Pro 190 - hagfish prothrombin and human kallikrein 10, yet neither protein has been characterized with respect to substrate specificity. Both genes have been characterized by DNA sequence analysis of a number of overlapping cDNA library clones [122-124]. The presence of Pro at this position suggests a similar specificity as that observed for the T190P of SGT. Moreover, the residues surrounding residue 190 are identical to those found in SGT. DNA sequence analysis of the hagfish prothrombin gene reveals that many of the features attributed to substrate specificity, such as

the 60- and 99-loops, are identical. Hence, the reduced catalytic activity and higher Km values caused by the proline at this position may be compensated by an increased concentration of the enzyme or substrate in the blood stream of this primitive vertebrate.

57 3.3.9 Second Shell Residues

A number of catalytic or structural studies involving the structure based design of

enzyme properties have demonstrated an important role for second shell residues surrounding

the mutation of interest [125]. In serine proteases, residue 190 interacts directly with the side

chains of residues 16 (the N-terminus of the protein), 138 and 228. Residue 228 is a highly

conserved tyrosine in known trypsin-like enzymes. Residues 16 and 138 are restricted to

hydrophobic side chains (Val, He, and Leu). The differences in kinetic parameters observed

between mutations made in this study, rat anionic trypsin and human trypsin (type I) are likely

due to the presence or absence of methyl groups within this pair of residues [66,67,119].

Moreover, mutagenesis of residue 16 in rat anionic trypsinogen II has been demonstrated to

affect primary substrate specificity of the enzyme [126]. A more detailed investigation of

these residues is required to understand the basis of substrate specificity within this family of

important enzymes.

3.4 Conclusions

On the basis of the ease of production of SGT in the B. subtilis expression system, it is

possible to enhance the specificity at the S2 to S4 binding pockets by either structure-based

design or a directed evolution strategy. Introduction of an affinity tag, such as a hexahistidine

tag, would speed the purification process of the protein and facilitate characterization of recombinant mutant proteases. The ability to crystallize this molecule readily supports SGT as

a model scaffold for understanding the mechanisms of substrate specificity in the highly evolved serine protease family.

58 Chapter 4. Engineering Coagulation Factor Xa Substrate

Specificity into Streptomyces griseus Trypsin

4.1 Introduction

4.1.1 Overview

Architecture of the active site plays a key role in the physiological functions of serine

proteases. Although the catalytic machinery is similar, if not identical, within the family of

serine proteases, the residues comprising the active site dictate function. Differences in the

active site lead to substrate specificity and the level of regulation by protease inhibitors.

Detailed understanding of the molecular basis of substrate specificity is needed to design

inhibitors for therapeutic applications. Moreover, the ability to tailor protease specificity to

meet specialized needs is an attainable goal. At present, it has not been established what

maximum levels of substrate specificity could exist on the serine protease scaffold. Few

studies have involved improving the extended substrate specificity of serine proteases.

Proteases have been extensively characterized with respect to enzymatic properties.

Rates of catalysis against numerous libraries of peptide and polypeptide substrates that are both natural and synthetic in origin are readily available [41,127-131]. Inhibition constants are

similarly abundant due to the importance of physiological regulation and for impeding the progression of a pathology or preventing one from developing [132-137]. In addition, a vast

amount of sequence and structural data can be found in the databases. Indeed, the quantity of information compiled on proteases is daunting. Ultimately these data reflect properties that

59 result from approximately fifty amino acids that create and surround the active site of serine proteases.

Less than twenty amino acid residues are involved in enzyme-substrate interactions in the SI peptidases [24]. A number of residues involved in substrate binding exert their influence via backbone contacts or by stabilization of the entire structure of the protease domain. Some examples include residues 214 to 216 which are involved in formation of the anti-parallel P-strand between enzyme and substrate and three disulfide bonds between residues 42 and 58, 168 and 182, and 191 and 220 that stabilize the entire domain. In both instances, these residues are highly conserved throughout the serine protease family and do not significantly modulate specificity, but rather assist in the enzyme-substrate interactions and stabilize the transition state of the catalytic process [138,139]. Thus, variation in substrate specificity can be reduced in complexity to roughly ten amino acid positions. These residues are located on all sides of the active site and their effects can be altered by their local environments and proximal residues.

Extended substrate specificity is a hallmark of the serine proteases of vertebrate blood coagulation. These proteases serve as useful models to understand substrate specificity throughout the entire family of SI peptidases. Coagulation factor Xa (FXa) is important due to its central role in coagulation and wide spread use in biotechnology-related applications requiring site specific proteolysis. The predominant feature of FXa substrate specificity is a two residue loop at position 99 [140]. Insertion at this site in the sequence positions a large aromatic side chain, Tyr99, in the S2 pocket and restricts P2 substrate residues to small Gly side chains. Less is known about the other determinants of substrate specificity in FXa.

60 Kinetic data for the hydrolysis of a number of different peptide substrates suggests that the S3 and S4 pockets of FXa display minimal selectivity [41]. The S3 specificity pocket is typically the least stringent of the pockets throughout the S1 peptidase family due to the solvent exposure of the P3 side chain. Moderately sized hydrophobic amino acids are preferred by the S4 binding pocket which is dominated by hydrophobic side chains [116].

Given the selectivity of these pockets, it seems possible to engineer FXa-like specificity into a broadly specific protease. Moreover, it is likely that a protease with a stricter preference for the Ile-Glu-Gly-Arg FXa cleavage sequence could be engineered.

Measures of substrate specificity of a protease can differ substantially based on the type of substrate employed. Proteolytic cleavage of a short peptide can misrepresent the rate of hydrolysis of a single bond in a folded protein. Detailed characterization of FXa using comprehensive peptide substrate libraries revealed that large, planar hydrophobic residues were preferred in the S2 pocket over small residues [41,42]. Such an observation contrasts with sequence analysis of in vivo FXa substrates. Amino acid sequences of prothrombin, the predominant in vivo substrate of FXa, from organisms spanning 450 million years of vertebrate evolution show conservation of the Ile-Glu/Asp-Gly-Arg recognition sequence

[122,123]. Moreover, numerous studies have employed FXa for the site specific proteolysis of recombinant proteins [127]. Reports of non-specific proteolysis by FXa are obviously not easy to find in the literature. However, the large number of successful results confirms selectivity for the cleavage sequence is high.

61 4.1.2 Choice of Mutations to Mimic FXa-like Specificity

Previously, I described the design of a protease with a high primary specificity for Arg in the PI position using SGT. Using the optimal mutant of SGT from this work as a starting point, the design of FXa-like specificity was pursued. To convert the specificity of SGT into

FXa, six mutations were introduced into SGT to enhance selectivity of the S2, S3 and S4 pockets (Table 4.1). Mutations were chosen based on sequence conservation in known FXa proteins from various species and inspection of various x-ray crystal structures of members of the SI peptidase family. These mutations in SGT are involved in enzyme-substrate interactions as well as optimization of the overall architecture of the active site (Figure 4.1).

Many of the mutations created also mimic what is found in activated protein C (aPC), factor

IXa (FLXa), factor XIa, and factor Vila (FVIIa).

Name Mutations

T190P T190P

LP 99-loop, T190P

YP 99-loop, T99Y, T190P

YFP 99-loop, T99Y, N174F, T190P

YSFP 99-loop, T99Y, Y172S, N174F, T190P

YSFMP 99-loop, T99Y, Y172S, N174F, E180M, T190P

YSFMPE 99-loop, T99Y, Y172S, N174F, E180M, T190P,

Y217E

Table 4.1 Mutants of SGT constructed to mimic the substrate specificity of FXa. For

the 99-loop mutation, two residues (Lys and Glu at positions 96 and 97) were inserted

similar to that found in the FXa polypeptide sequence.

62 Selectivity of the S2 pocket in FXa is generated by the insertion of two amino acid residues at position 99. Extension of the polypeptide chain at this location positions the Tyr99 side chain into the S2 pocket and restricts access to small aliphatic side chains [140]. Two mutations in SGT were required to engineer selectivity for P2 side chains. First, a two residue insertion was created. Second, Thr99Tyr was added to the loop construct to constrain the S2 pocket in a similar fashion to FXa.

Figure 4.1 Residues involved in the extended substrate specificity of coagulation

proteases. The Na+ binding site is known to play a role in the substrate specificity of

thrombin and lies adjacent to the SI pocket.

E180M was introduced into SGT to remove the electrostatic effects of the negatively charged Glu residue and mimic a residue conserved in the entire SI family peptidases.

Sequence analysis shows that Met 180 is highly conserved and exists in approximately 60% of the family. However, a functional role for residue 180 has not been described in the literature.

Residue Glu 180 in SGT is one of several negatively charged side chains near the S4 pocket

63 that generates an overall negative electrostatic potential in the region. As hydrophobic P4 residues are preferred in FXa, it was conjectured that removal of the charged moiety in the S4 pocket would improve substrate binding.

Y172S and N174F mutations in SGT were constructed to facilitate proper positioning of the 172-loop and increase the hydrophobicity of the S4 pocket. In SGT, Tyrl72 is buried in the core of the structure in a similar fashion to other trypsin-like enzymes [76]. However,

Ser 172 in FXa has a conformation that exposes the side chain to solvent and is involved in the positioning of residue 174 [141]. Mutation at the equivalent positions in bovine trypsin led to a flexible 172-loop and suggests replacement of residues 172 to 174 is not sufficient to create the S4 pocket of FXa [142]. Thus, stabilization of the loop should be important for optimization of the extended substrate specificity in FXa.

Y217E was created in SGT to secure the 172-loop conformation observed in FXa.

Residue 217 is poorly conserved in the SI family, yet conserved in all coagulation proteins identified from a variety of vertebrate species [122,123]. In the crystal structure of human

FXa, residue 217 forms two charge assisted hydrogen bonds with Serl73 via the hydroxyl group of the side chain and amide group of the backbone [143]. Unfortunately, the Glyl73Ser mutant of SGT has not been characterized yet.

Using B. subtilis as an expression host, a number of mutants at the residues discussed above were constructed in SGT and their enzymatic properties determined. Substrate specificity was investigated by comparison of the rates of hydrolysis of a small library of commercially available chromogenic peptides. The mutant bearing all seven mutations did not produce a protease with FXa-like specificity, but rather specificity towards coagulation factor

XIa (FXIa) substrates was observed. An intermediate mutant with six mutations led to a

64 protease with moderate FXa-like specificity. These results confirm previous studies that demonstrated a role for the 99-loop and 172-loop in FXa and related enzymes. Residue 217 is confirmed as a determinant of substrate specificity. However, questions about the detailed role of residue 217 are raised by the data. Further, residue 180 has been identified as a determinant of the substrate specificity in the SI family of peptidases.

4.2 Materials & Methods

4.2.1 Plasmids, Bacterial Strains, and Growth Conditions

E. coli was grown using standard methods [90]. Plasmid DNA was purified using a

QIAprep spin miniprep kit (Qiagen) and manipulated using standard protocols [90]. Enzymes were obtained from New England Biolabs and Roche Molecular Biochemicals. B. subtilis strain WB700 was grown in super-rich medium [91] or on tryptose blood agar base (Difco) at

37°C. For the B. subtilis carrying plasmid pWB980 [92], kanamycin was added to a final concentration of 10 pig raL"1 in both liquid and solid media.

4.2.2 Construction of a Hexahistidine-tagged SGT

Plasmid SGT pET-28(a)+ was previously constructed in an unsuccessful attempt to express recombinant SGT by cloning the gene into the Xbal and Xhol restriction endonuclease cleavage sites of the plasmid. Inspection of electrospray mass spectrometry data of aged samples of recombinant SGT from B. subtilis suggested that Arg243 was weakly susceptible to autolysis. The mutation Arg243Ser was constructed using the QuikChange site- directed mutagenesis kit (Stratagene) as suggested by the manufacturer using the oligonucleotide 5'- GCCTCGGCCGCCAGCACGCTCGAGCAC-3' and reverse complement

65 oligonucleotide. The C-terminal portion of the SGT gene was sub-cloned from pET28(a)+ into bSGT pWB980 via PvuII and Avrll cleavage sites (Figure 4.2). Thus, SGT was cloned in frame with a hexa-histidine tag (HisR pWB980). As this plasmid construct was to be used for sub-cloning mutants of SGT from the pBluescript KS+ E. coli plasmid, a small portion of the gene was deleted by digesting with Narl and self-ligation to yield ANarl HisR pWB980. As a result, successful sub-cloning of mutants would restore protease activity.

Xbal Xhol Avrll Xbal Xhol ^ 1 KHHOHH^ + cpET-28(a) +

Xbal Xhol Avrll SGTTTS^S + Arg243Ser

Hindll I Xbal AvrU H bSGT pWB980

Hindlll Xhol

HisR pWB980 ^)

Hindlll Xhol Narl

ANarl HisR pWB980

Figure 4.2 Plasmid construction for the production of recombinant His-tagged SGT

(HisR pWB980) and a deletion mutant construct for easily identifying successful sub-

cloning of mutant SGT genes (ANarl HisR pWB980).

66 4.2.3 Sequence analysis of the SI Family peptidases

All sequences for the SI family sub-family A peptidases were downloaded from the

MEROPS database (http://www.merops.ac.uk, release date 04-03-2002) [22]. Sequences were pre-aligned by the curators of the database. The complete alignment of 740 proteases was pasted into Microsoft Excel, and then converted into single columns corresponding to individual residues in the polypeptide chain. Distribution of amino acids at each position of interest was obtained using the COUNTIF function within the program. The file served as a useful database that linked amino acids at a defined position in the polypeptide sequence to characterized enzymes. Regions where insertions or deletions occurred in the family were not handled well by this method and inspection of known crystal structures and published literature were required.

4.2.4 DNA Manipulation

Using the previously described SGT gene cloned into pBluescript KS+ plasmid (Chpt.

2.2.2), mutagenesis was performed on the gene using a QuikChange site-directed mutagenesis kit (Stratagene) as described by the manufacturer. Oligonucleotides used for mutagenesis are provided in Table 4.2. DNA sequence analysis of the cloned gene and mutants was performed using the Big Dye Terminator kit and analyzed on an ABI 3700 DNA Sequencer (Applied

Biosystems). Mutant SGT genes were sub-cloned into plasmid ANarl HisR pWB980 via

Hindlll and PvuII restriction sites and transformed into B. subtilis WB700 by the method of

Spizizen [93].

67 Mutation Oligonucleotide

99-Loop 5' -CAGGCCCCCCGGCT AC AACAAGGAGGGCACCGGC AAGGACTGG-3' T99Y 5' -C AAGGAGGGCTACGGCA AGGAC-3' E180M 5' -CTCGTGGCCAACG AGATGATCTGCGCCGGATAC-3' N174F 5' -TCCGCGT ACGGCTTCDGAGCTCGTGGCC-3' Y172S 5' -GCCGCTCCGCGTCCGGCTTCGAGCT-3' N174F Y217E 5' - AGCTGGGGCG AGGGCTGCGCC-3'

Table 4.2 Oligonucleotides used to mutate the SGT gene to mimic residues found in

FXa. The reverse complement sequences of these oligonucleotides were also used in

the mutagenesis.

4.2.5 Purification of His-tagged SGT and Mutants thereof

Overnight B. subtilis 20 mL cultures were used to inoculate 250 mL of super-rich broth containing 10 [ig/mL kanamycin and grown for 16 hrs at 37°C. The supernatant was harvested by centrifugation (30 min., 5000 rpm) and then passed over a Talon affinity resin column (BD Biosciences) (10 cm x 0.75 cm) equilibrated in wash buffer (50 mM sodium phosphate, 500 mM NaCI, pH 8.2). The column was washed with 10 column volumes of wash buffer and the recombinant protein eluted with wash buffer containing 100 mM imidazole. Active fractions containing recombinant protein were concentrated using a 10,000

NMWL Ultrafree-4 centrifugal filter unit (Millipore) and dialyzed with 100 mM Tris-HCI,

150 mM NaCI, 20 mM CaCl2, pH 7.6 in the same unit. Recombinant proteins were stable at

4°C for months. Protein quantification was identical to that described previously (Chpt.

2.2.3).

68 4.2.6 Characterization of Substrate Specificity

Kinetic analysis was performed in 10 mM Tris-HCI buffer containing 150 mM NaCI,

20 mM CaCl2, and 0.1 % PEG 8000, pH 7.6 at 25°C. Reactions (300 uT) were prepared in 96- well microplates using either a RoboSeq 4204 or RoboGo laboratory automation system

(MWG Biotech AG, Ebersburg, Germany) and measured using a Labsystems Multiskan

Ascent plate reader. Peptide substrates were handled as suggested by their manufacturers

(Diapharma, American Diagnostica). Assessment of specificity using chromogenic peptide substrates was estimated by direct analysis of the hydrolysis of each substrate at 40 \iM at two

stages of the purification. In this method, the specificity constant (kcat/Km) was determined from the slope of the natural logarithm of substrate remaining as function of time. Detailed kinetic analyses were performed on a minimum of six substrate concentrations ranging from

20 to 600 pM and enzyme concentrations of 10 to 70 nM. Higher substrates concentration were not examined due to solubility difficulties with the peptides.

4.2.7 Macromolecular Substrate Specificity

Human prothrombin and FXa were purchased from Haematological Technologies.

Prothrombin (3.5 pg) was digested with mutants of SGT (60 ng) and FXa (60 ng) overnight at room temperature in 10 mM Tris-HCI buffer containing 150 mM NaCI, 20 mM CaCb, and

0.1 % PEG 8000, pH 7.6. Proteolytic fragments were resolved by SDS-PAGE following a standard protocol [90].

69 4.3 Results & Discussion

4.3.1 Production of His-tagged SGT

In order to facilitate purification and analysis of a large number of mutants,

simplification of the four step purification scheme described previously was required (Chpt.

2.2.3). Initial attempts to produce recombinant SGT in E. coli were unsuccessful, yet produced a construct bearing SGT in frame with a hexa-histidine tag in the pET-28a(+) plasmid. Several histidine residues in succession promote binding to metal ions such as Cu+2,

+2 +2

Ni , or Co . As the metal ions can be immobilized onto an appropriate chromatography medium, capture of recombinant protein from complex samples is greatly simplified. The C- terminal region of SGT bearing the tag was amenable for subcloning from pET-28a(+) into the B. subtilis plasmid pWB980 through a fortuitous Avrll restriction site. To ensure the tag remained on the protease, the Arg243Ser mutation was introduced into SGT to protect against potential autolytic cleavage.

Yields of SGT bearing a hexa-histidine tag at the C-terminus of the protein were three to five times lower than the non-tagged construct. Maximal yields of recombinant protein bearing the tag did not exceed 5 mg/L of culture compared to 15 mg/L for the wild-type construct. Binding of the recombinant protease to various commercially available Ni+2 or +2

Co -chelated resins was poor suggesting that the tag was partially buried in the protein.

Inspection of the crystal structure of SGT suggests that the first two His residues may lie in a

shallow cleft on the enzyme surface. However, high purity protein resulted from the purification (Figure 4.3). The reduced yield of SGT bearing the tag may hinder folding of the enzyme. Alternatively, the tag may promote interactions with the negatively charged peptidoglycan found on the cell wall of the bacteria. However, the ease of protein purification

70 based on the tag outweighed the demand for greater amounts of protein. Yields of each of the

mutants were similar suggesting the mutations were not detrimental to protein folding. These

results further support B. subtilis as an excellent expression host for the production of

proteases and other proteins that are difficult to produce in E. coli.

A B C D

m -23 kDa

Figure 4.3 Purification of a typical His-tagged mutant of SGT from B. subtilis culture

using Talon metal affinity resin. Lane A, 1 mL supernatant; Lane B, 1 mL column

flow through; Lane C, 1 mL wash buffer. Lane D: concentrated recombinant protein (2

Hg). Samples were concentrated using trichloroacetic acid in lanes A-C.

4.3.2 Techniques for Characterization of Substrate Specificity of Serine Proteases

Characterization of the substrate specificity of wild-type FXa using comprehensive libraries of small peptide substrates revealed the selectivity of the protease is not strict

[41,42]. Interestingly, the P2 preference for large, planar hydrophobic substrates over Gly side chains has been noted in these studies. Moreover, P3 and P4 side chains are weakly selected for. On the basis of this specificity, FXa is thought to resemble a low efficiency trypsin rather than a highly selective thrombin [41]. One drawback of the libraries used to analyze the substrate specificity is the potential error caused by differences in conformation of each of the peptides in which each peptide may adopt differing conformations and lead to bias.

71 In the present study, we have used commercial substrates that differ from the natural

polypeptide substrates or short synthetic peptides composed of L-amino acids. These

substrates show enhanced specificity for members of the coagulation cascade and have been

widely used in clinical applications. Non-standard amino acids are present including optimal

blocking groups at their N-termini to generate highly specific substrates (Figure 4.4).

Comparison of the rates of hydrolysis of these substrates shows that the mutations created in

SGT have altered the active site geometry and substrate specificity of the enzyme.

4.3.3 Extended Substrate Specificity of SGT

With the T190P mutant of SGT as the starting point, extended substrate specificity at

the S2 to S4 positions was introduced by substitution of residues found in FXa. The T190P

mutant was chosen over the T190A mutation for the higher Km values. As the Km value for most peptides bearing the T190P mutant was approximately 50 to 100 pM, chromogenic

substrates would be useful in the kinetic analysis of mutants. In contrast, the T190A mutant of

SGT displayed Km values below 10 uM for each substrate. Kinetic dissection of subsequent mutants based on the T190A mutation would require the use of fluorescent substrates that are less commonly used for coagulation proteases. Moreover, the T190P mutant permitted

estimation of the specificity constant (kcat/Km) using a direct interpretation of the rate of hydrolysis of peptide substrate. In this method, a single substrate concentration that is well

below the Km value is hydrolyzed and the rate measured by spectrophotometry. The natural logarithm of substrate remaining as function of time is plotted as a straight line whose slope

approximates kcat/Km. A number of mutants of SGT were characterized by this method and it was invaluable for determining successful alterations of substrate specificity.

72 S-2222 S-2238 Bz-lle-Glu(y-OR)-Gly-Arg-pNA H-D-Phe-Pip-Arg-pNA R=H (50%) and R=CH3(50%)

S-2288 S-2302 H-D-lle-Pro-Arg-pNA H-D-Pro-Phe-Arg-pNA

S-2366 pyroGlu-Pro-Arg-pNA

Figure 4.4 Substrates used to characterize mutants of SGT with altered substrate specificity

(Diapharma, West Chester, Ohio). Non-standard amino acids are incorporated into the substrate for improved discrimination among proteases.

73 4.3.4 Effect of the 99-loop on the Substrate Specificity of SGT

Substrate specificity of the S2 binding pocket in coagulation proteases is affected by the presence of a two or three amino acid insertion termed the 99-loop. In FXa, the insertion of two residues facilitates positioning of Tyr99 and restricts access to the S2 pocket.

Characterization of substrate selectivity of SGT revealed preferences inherent within the protease (Figure 4.5). Spectrozyme PCa and S-2366 were preferred two-fold over all other substrates utilized. Both substrates have Pro at P2 and this likely leads to presentation of the

Arg side chain in a conformation more favorable for hydrolysis. Importantly, the FXa preferred substrates were not preferred by the T190P mutant of SGT. Two mutants were designed to show the importance of this loop in generating selectivity in the S2 pocket (Figure

4.5). First, two residues, Lys96 and Glu97, were introduced to lengthen the 99-loop in SGT

(denoted "LP bSGT"). All substrates examined were hydrolyzed with highly similar reaction rates to the T190P mutant using LP bSGT. As this mutant presented the smaller, hydrophilic side chain Thr at position 99, no selectivity in the S2 binding pocket was anticipated. Both of the inserted residues should orient their carbonyl oxygens into the S3/S4 pocket if the loop has the same conformation observed in other proteases. Introduction of T99Y led to a non• specific protease having similar preference for all substrates. These results suggested that the

99-loop was in a similar conformation to that observed in FXa and that additional determinants of substrate specificity were needed to reconstitute the desired specificity.

Notably, substrates bearing P2 Gly were more effectively hydrolyzed suggesting a similar conformation of the loop to FXa.

Structural similarity of the FXa 99-loop is observed in aPC. This likely explains the maintained preference for aPC preferred substrates (Spectrozyme PCa and S-2366) in each of

74 the mutants characterized, including the mutant LP of SGT. In aPC, residue 99 is also Thr.

Mutation of this position in aPC with substitutions found in FXa led to a protease with a

similar a substrate specificity and inhibitory profile to FXa. In the same study, the opposite mutations in FXa (Y99T) led to a similar switching of specificity [140]. Importantly, as determined through peptide substrates, substrate specificity did not correlate with the hydrolysis of macromolecular substrates and demonstrates the crucial role for additional protein-protein interactions in the coagulation proteases.

M S-2266 • S-2366 • S-2222

• SpectrozymeFXa

B SpectrozymePCa • S-2302 T190P

Figure 4.5 Normalized kcat/Km values for the T190P, LP and YP mutants of SGT.

Values derived from independent measurements done in triplicate (± 10% S.D.). Weak

preference for aPC substrates is exhibited by the enzyme initially. Introduction of the

two residue loop and T99Y generates a broadly specific protease.

A number of other serine proteases have insertions at position 99. Large insertions

(greater than 5 residues) are found in several , the Cls protease, as well as [144,145]. Compared to SGT, other proteases have a two or three residue insertion at this region including thrombin and pancreatic elastase II. In each of these enzymes, the loop plays a role in determining the substrate specificity of the enzyme of both the S2 and S3 substrate binding pockets. Increasing the length of the insertion correlates with

75 decreasing catalytic efficiency [145]. Futures studies for engineering substrate specificity

should apply variable lengths of amino acid insertions and compositions of the 99-loop.

4.3.5 Mechanisms of P3 Selectivity in SI Family Peptidases

Selectivity for P3 residues is poor in nearly all SI peptidases. The enzyme-substrate

interaction is limited due to solvent exposure of the P3 side chain. In FXa, P3 binding is generated by the side chains of residues 192 and 215. Throughout the entire family of SI peptidases, residue 215 is highly conserved as a large planar side chain (W, F, Y). All crystal

structures determined to date have shown the side chain in a conformation that borders the S4

specificity pocket (Figure 4.6). However, the backbone carbonyl group of residue 215 is involved in a hydrogen bond with the P3 residue of the substrate. In the crystal structures of wild-type recombinant bSGT and T190P mutant, residue 192 displayed a high degree of flexibility and was modeled as two conformations. Flexibility of this residue has been demonstrated in several coagulation proteases and the alternate conformations facilitate interactions with the P3 and P2' specificity pockets (Figure 4.7). As both SGT and FXa both possess Gin residues at position 192, mutation was not required. However, future work should involve mutagenesis of this residue due to its involvement in substrate specificity and inhibition by protease inhibitors.

Residue 192 has evolved to not disrupt substrate binding and plays a role in enzyme inhibition. Studies involving protein C, FXa, and thrombin have shown that residue 192 is important in protease-inhibitor contacts and is a key basis for differential inhibition [146-148].

Met/Glu/Gln substitutions at position 192 in coagulation factor FXa did not significantly alter substrate specificity using peptide substrates similar to those used in this study [143].

76 Presumably, mutation of the residue to a Lys side chain in SGT would generate an increased preference for acidic side chains at P3. Unfortunately, Lys 192 has not been introduced into any SI peptidases by site-directed mutagenesis yet the residue exists naturally in some members of the family.

170-1

Figure 4.6 S3 & S4 binding pockets of FXa. Gin 192 plays a limited role in

determining the selectivity of the S3 pocket stemming from inherent flexibility of the

side chain. In the FXa structure without a substrate, the side chain points away from

S3 pocket. The model depicted is the superimposition of FXa (PDB ID 1HCG) and

FVIIa in complex with 1,5-dansyl-Glu-Gly-Arg-chloromethyl ketone (PDB ID

1CVW), with the FVIIa structure hidden.

77 Figure 4.7 S3 and S4 binding pockets of FVIIa. In the crystal structure of human

coagulation factor Vila in complex with 1,5-dansyl-Glu-Gly-Arg-chloromethyl ketone

(PDB ID 1CVW), Lys 192 interacts with the negatively charged side chain in P3 of the

inhibitor.

Sequence analysis of the S1 peptidases reveals a subset of proteases that contain positively charged side chains at position 192. These proteases show S3 selectivity for acidic

P3 residues. Notably, venombin A and bilineobin from the moccasin (Agkistrodon bilineatus) have Lys 192 [149,150]. These proteases are found in the of the snake and

78 mimic FXa by activating thrombin at the same position in the polypeptide. Although crystal structures of these proteases have not been reported, molecular modeling of bilineobin suggests that residue 192 occupies an identical position to that observed in FXa, thrombin and

SGT [151]. Creation of an electrostatic interaction between the substrate and enzyme may decrease the rate of deacylation during catalysis. Rates of substrate hydrolysis by Lys 192 bearing proteases are 10-fold lower than observed for the coagulation proteases. Thus, the

Q192K mutant of SGT should be characterized and may yield a stronger preference for acidic side chains at P3 with a concomitant reduction in catalytic efficiency. However, it is likely that the high flexibility of residue 192 will limit the maximal stringency for P3 side chains.

Two mechanisms have evolved to increase the selectivity of the S3 pocket in the SI family of peptidases. In rat mast cell protease II, the absence of a disulphide bond between residues 191 and 220 generates additional enzyme-substrate contacts and enlarges the S3 pocket [152]. P3 selectivity can also be generated by a secondary protein as evident in the structure of staphylokinase in complex with the protease domain of . Staphylokinase acts as a co-factor and inserts several side chains into the S3/S4 pocket. As a result the substrate specificity and inhibitory profile of plasmin are drastically altered [153]. Thus, the serine protease scaffold is highly amenable for further engineering of selectivity in the S3 binding pocket.

4.3.6 Role of the 172-loop and Residue 217 in the SI Peptidases

In order to form the S4 pocket in FXa, a stretch of residues from 172 to 174 adopt a conformation distinct from that observed in broadly specific trypsin like enzymes (Figure

4.8). In SGT and other non-specific serine proteases, a large hydrophobic side chain at residue

79 172 buries itself into core of the enzyme. In turn, the backbone of residues 173 and 174 adopts

a conformation that exposes the side chains of these residues away from the S4 pocket. In

FXa, FIXa, thrombin and aPC residue 172 is Thr, Met, or Ser. In these proteases the 172-loop

adopts an "up" conformation and bounds the S4 pocket. In order to generate the proper conformation of the 172-loop, two mutations were made in SGT: Y172S and N174F.

Figure 4.8 Conformation of the 172-loop in SI peptidases: SGT (A), FXa (B), aPC

(C) and FVIIa (D). In FXa and aPC, the 172-loop adopts a conformation such that

residue 172 is not buried in the core of the enzyme which is evident in all structures of

broad specificity trypsin-like enzymes. FVIIa has a large insertion at residue 170, and

generates substrate specificity in a different manner than other coagulation proteases.

80 Introduction of N174F alone into SGT yielded a protease with selectivity for S-2366, a substrate designed for quantification of FXIa and aPC. Based on the crystal structure of

T190P SGT, N174F should not affect the specificity of the S4 pocket unless the conformation of the 172-loop is distorted. The alteration of substrate specificity may be due to the Phe side chain at position 174 replacing residue Tyrl72 its buried conformation. Subsequent mutagenesis of Y172S suggested that Phe 174 was in a buried conformation as it resulted in a very little change in substrate specificity compared to the YFP mutant of SGT (Figure 4.9). A recent study reported the crystal structure of rat anionic trypsin with similar mutations in the

172-loop, and suggests the basis for aPC and FXIa-like specificity [142].

In the crystal structure of a mutant of rat anionic trypsin with the Ser-Ser-Phe sequence substituted at the 172 to 174 positions, the 172 loop adopts a novel conformation in which Phe 174 buries inward in the structure of the enzyme similar to that anticipated in the mutants of SGT [142]. As a result, the conformation of the side chains at positions 172 and

173 do not mimic those observed in FXa even though the polypeptide sequence is the same.

The S4 pocket is enlarged as the 172-loop extends farther away from the structure. If a similar conformation occurred in the YFP and YSFP mutants of SGT then the preference for substrates with larger side chains at P4 might be more favored. Both S-2366 and S-2266 have large hydrophobic groups at P4 (pyroglutamic acid and D-valine, respectively).

81 1.00 090 080 • S-2266 0 70 • S-2366 0 60 050 • S-2222 0 40

0 so 0?0

0 10 000 YFP YSFP

Figure 4.9 Normalized kcJKm values for the YFP and YSFP mutants of SGT.

Introduction of Y172S yielded no substantial change in substrate specificity. Values

derived from independent measurements done in triplicate (± 10% S.D.).

Addition of the E180M mutation to the YSFP SGT mutant (YSFMP) led to significant improvement in FXa-like specificity (Table 4.3). In any known crystal structure of an SI family peptidase, the Met side chain at residue 180 makes no direct contacts with the substrate or any component of the protein. A limited diversity of amino acid variation is observed at position 180 in the SI peptidases with a Met residue present in -60% of the all proteins in the family. Only a few of SI peptidases bear a positively charged side chain, such as Lys or Arg, at this position. Positioning of residue 180 is achieved through a P-hairpin formed by residues

177 to 180. The NH group of residue 180 forms a hydrogen bond with the C=0 backbone of residue 177. In SGT, the carboxylate moeity of Glu 180 also forms a hydrogen bond with the

NH group of the amide backbone of Vai 177. Mutation of residue 180 should alter the electrostatic environment of the S3 pocket as well as permit an alternate conformation of the

172-loop.

Dissection of the kinetic constants of the YSFMP mutant bearing the E180M mutation indicates moderate reconstitution of FXa-like properties. Boc-Leu-Gly-Arg was the most

82 specific substrate for the mutant protease indicating the S3 pocket is hydrophobic in the

mutant protease. Substrates designed for the quantification of FXa had the lowest Km values

of all substrates (S-2222 and Spectrozyme FXa). However, substrates with the lowest Km

values were not associated with the highest turnover numbers. Efficient turnover of non-FXa

preferred substrates still occurred suggesting the extended binding pocket of FXa was not

present. Based on the observed properties of the YFP, YSFP and YSFMP mutants of SGT,

stabilization of the 172-loop was implicated as being necessary for generating FXa-like

extended substrate specificity. Further, crystallographic analysis is required to observe the

conformation of the 172-loop caused by these mutations.

Substrate Km (uM) kcat (min) kcat/ Km FXa W Km (uM" min") (uM min") LGR 460 4664 10.1 - B oc-Leu-Gly-Arg-pN A ±12 ±811 S-2222 241 1765 7.3 4.9 Bz-Ue-Glu(Y-OR)-Gly-Arg-pNA ±36 ±959

(R=-HorCH3at50%) S-2366 511 3683 7.2 - pyroGlu-Pro-Arg-pNAHCl ±67 ± 1209 Spectrozyme PCa 623 3788 6.1 4.1 H-D-Lys(g-Cbo)-Pro-Arg-pNA ± 139 ±620 Spectrozyme FXa 356 1800 5.1 18.2 M-D-CHG-Gly-Arg-pNA ±20 ±240 S-2266 2271 4017 1.8 - H-D-Val-Leu-Arg-pNA ±468 ± 1687 S-2238 1396 1056 0.8 1.2 H-D-Phe-Pip-Arg-pNA ±80 ±398 S-2302 1467 512 0.3 - H-D-Pro-Phe-Arg-pNA ±80 ±220 S-2251 1857 527 0.3 - H-D-Val-Leu-Lys-pNA ± 103 ±260

Table 4.3 Steady-state kinetic parameters for the hydrolysis of a series of p-nitroanilide chromogenic substrates by the YSFMP mutant of SGT compared to that observed with FXa under similar reaction conditions. (FXa data from ref. [140]). Values obtained in triplicate (±

S.D.).

83 Within the coagulation proteases, residue 217 plays a role in substrate selectivity.

Mutations at this position have been characterized in thrombin, coagulation factor Vila

(FVIIa), and coagulation factor IXa (FIXa) [143,154,155]. In these studies, residue 217 was

described as a determinant of P2/P3 selectivity via formation of direct contact with the

substrate. Conclusions drawn were based on analysis of substrate specificity using only a few peptide substrates and primarily through enzyme-inhibitor interactions evident in crystal

structures. The latter can lead to misrepresentation of the true function of peripheral residues

of the active site, such as residue 217, as the interaction between enzyme and inhibitor is

typically far tighter than for a natural substrate. Moreover, later studies established the

significance of a sodium binding site in several of the coagulation factor proteases. Binding of

a sodium ion improves the catalytic efficiency of the enzyme by structural changes in the protease domain. Residue 217 has been implicated in stabilization of the ion binding site

[156]. Inspection of crystal structures of FXa, Vila, and thrombin in complex with peptide based inhibitors supports neither of these theories completely [141,157,158]. Residue 217 is typically a negatively charged side-chain throughout the SI family of peptidases, yet this residue is a Tyr in SGT. In several crystal structures of coagulation proteases, the carboxyl group of the Glu side chain is involved in the formation of two charge assisted hydrogen bonds with residue 173 in the 172-loop. Hence, it was postulated that introduction of this side chain would lead to stabilization of the loop in an upwards conformation and enhance FXa- like extended substrate specificity.

Kinetic analysis of the YSFMPE mutant of SGT bearing the Glu217 yielded a protease more similar to FXIa and not FXa (Table 4.4). Little is known about the substrate specificity of FXIa in vitro and no crystal structure has been reported for the protein. Inspection of the

84 polypeptide sequence of human FXIa shows the presence of Metl80 and Glu217, and a three residue insertion loop at position 99 that would present either Ser or Gly into the S2 pocket

[159]. Crystallographic analysis is required to determine whether the introduced Glu217 side chain adopts a conformation that restricts the P4 pocket or whether it forms an electrostatic interaction with Arg222. As in the YSFP and YSFMP mutations, the Tyrl72 in the polypeptide sequence of FXIa may possess a 172-loop in the down conformation. Hence, all of the determinants thought to generate FXa-like specificity are present in FXIa with the exception of Tyr99 and may explain substrate specificity of the YSFMPE mutant of SGT.

Substrate Km (uM) kcat (min) kcat/ Km FXa kcat/ Km (uM" min") (uM" min") S-2366 526 6691 12.7 - pyroGlu-Pro-Arg-pNAHCl ±23 ± 1409 S-2302 568 5487 9.7 - H-D-Pro-Phe-Arg-pNA ±39 ± 1569 Spectrozyme FXa 167 819 4.9 18.2 M-D-CHG-Gly-Arg-pNA ±37 ± 180 S-2266 488 1624 3.3 - H-D-Val-Leu-Arg-pNA ±33 ±343 Spectrozyme PCa 1952 5870 3.0 4.1 H-D-Lys(g-Cbo)-Pro-Arg-pNA ±34 ±1553 LGR 177 196 1.1 - Boc-Leu-Gly-Arg-pNA ±22 ±86 S-2238 316 198 0.6 1.2 H-D-Phe-Pip-Arg-pNA ±39 ±86 S-2222 2341 1066 0.5 4.9 Bz-ne-Glu(y-OR)-Gly-Arg-pNA ± 126 ±701

(R = H or CH3 at 50%)

Table 4.4 Steady-state kinetic parameters for the hydrolysis of a series of p-

nitroanilide chromogenic substrates by the YSFMPE mutant of SGT compared to that

observed with FXa under similar reaction conditions. (FXa data from ref. [140]).

Values obtained in triplicate ± S.D.

85 4.3.7 Additional Elements Needed for Reconstructing FXa-like Specificity

Additional mutations must be made to improve further FXa-like selectivity of the SGT

mutants constructed in the present study. Non-additive effects of mutations are common for

mutations that are residues in close proximity [153,160,161]. Hence, the absence of one

specificity determinant may hinder the ability of other residues to function properly. Studies

attempting reconstitution of FXa specificity in other proteases may yield clues as to what

element is absent from the YSFMPE mutant of SGT. Unfortunately, these studies for the most

part have focused exclusively on the 99-loop [70].

Rat anionic trypsin (RAT) was mutated in several of the regions characterized in this

study but did not lead to an enzyme with FXa-like properties [142]. Mutations of residue 190,

the 99-loop, and the 172-loop were combined in RAT but were characterized on the basis of

enzyme inhibition rather than substrate specificity. Inhibition constants (K;) of inhibitors of

FXa were typically 10-fold higher with the combined mutant of RAT. On the basis of

structure data, certain inhibitors could assist the stabilization the 172-loop in the proper

upwards conformation. Based on the data from SGT mutants in the present study, it appears

that the conformation of the 172-loop is the key ingredient missing for reconstitution of the

extended substrate specificity of FXa.

Several possibilities exist for further stabilization of the 172-loop. Introduction of

Y217E was unsuccessful at improving the selectivity of the YSFMP mutant of SGT and may be due to the requirement of a hydroxyl group at the side chain of residue 173, which is a Gly

in SGT. Residues within a 6 A radius of residues 172 to 174 in FXa include residues 167 to

176, 182, 215 to 217, 224 and 227. Comparison of the crystal structures of SGT and FXa

show that most of these residues are identical and exist in highly similar conformations in

86 both proteases. Substitutions of the residues adjacent to the 172-loop in the polypeptide

sequence may further stabilize the loop. However, the largest differences in structure are

evident at residues 224 and 227. Importantly, these residues are involved in the formation of

the Na+ binding site in FXa. It is possible that introduction of a Na+ binding site into

YSFMPE SGT will reconstruct the enzymatic properties of FXa. If the site were present, burial of residue 172 or 174 into the core of the enzyme could be disfavored. Importantly, the

YSFMPE mutant has kinetic properties similar to FXIa which also does not possess a Na+ binding site. Sequence analysis of the coagulation proteases indicates that in their

evolutionary past, all other coagulation proteases had a Na+ binding site.

Site specific sodium binding has been demonstrated to play a key role in the catalytic

efficiency of the coagulation factor proteases including FXa, aPC, and thrombin. Di Cera

demonstrated the crucial role of residue 225 in the serine proteases for binding a single

sodium ion near the active site of the enzyme [162]. Binding of sodium and no other alkali

metal to these enzymes generates a 3- to 5-fold increase in catalytic efficiency. In nearly all

serine proteases, residue 225 is occupied by a Pro and these proteases do not demonstrate

increased catalytic activity in the presence of sodium ions. However, in the coagulation factor proteases, this residue is Tyr. Absence of Pro at position 225 allows proper positioning of the carbonyl group of the preceding residue in the polypeptide sequence to bind the Na+ ion [77].

In addition to residue 225, an extensive network of hydrogen bonds and water molecules facilitates stabilization of a sodium ion in a position that is very near the S1 binding pocket

(Figure 4.10). Much research has been devoted to the characterization of the ion binding site

in thrombin, activated protein C and coagulation factor Xa [156,163,164]. Disruption of any component of the ion binding site readily abolishes binding and catalytic activity. The

87 complexity of these enzymes has limited our ability to understand the structural and

thermodynamic effects of sodium binding. However, the crystal structure of thrombin in the

absence of Na+ suggests that removal of the ion destabilizes the entire SI pocket and changes

the conformation of the Cysl68-Cysl82 disulfide bond [165]. The SI pocket lies adjacent to residue 172 in SGT or residue 174 in Y172S N174F mutants of SGT in the buried conformation. Therefore, the conformation of the loop could be altered by the presence of an occupied Na+-binding site

Figure 4.10 Na+-binding site in thrombin. Octahedral co-ordination stabilizes the ion.

Backbone carbonyl groups and water molecules are involved in the interaction. Three

ion-pairs surround the site and provide stability.

Introduction of a sodium binding site into a mutant SGT to improve substrate specificity would involve significant mutagenesis of the gene as nearly all of the residues that

88 comprise the site are absent in SGT. In thrombin, the Na+ binding site is located between two surface loops beginning at residues 180 and 220 [77]. The ion binding site is 15-20 A distal from the catalytic triad, yet lies within 5 A from D189. A cylindrical cavity occupied by up to sixteen water molecules helps to stabilize the site. Bound Na+ is coordinated octahedrally by two carbonyl oxygen atoms provided by R221aand K224, and four buried water molecules.

One of these water molecules hydrogen bonds to the side chain of D189 establishing a direct link between the sodium ion and the SI site [52]. Overall stability of theNa+ site is provided by three ion pairs, R221a-E146, K224-E217, and D222-R187. Sodium binding involves residues adjacent to the substrate binding pocket and may play a role in generating stringent substrate specificity. In particular, residues 192 and 217 were both targets for mutagenesis in the present study and have been implicated in sodium binding.

4.3.8 Utility of a FXa-like Protease

Factor Xa is routinely used for the cleavage of recombinant fusion proteins, yet the preference for the IEGR cleavage sequence is not strict. For example, in honey bee prepromelittin, a related sequence, VLGR, was readily cleaved by FXa [166]. Although this situation is not common and can be prevented by inspection of the polypeptide sequence of the recombinant protein, a more specific protease is desirable. A number of vendors supply alternative proteases with differing recognition sequences, including TEV protease [167], thrombin [127], enterokinase [127], and PreScission™ protease [127] (Table 4.5). However, many plasmids currently employed contain the FXa cleavage motif in-frame with common restriction sites for cloning and recombinant expression of proteins. To assess the ability of mutant SGT proteases to hydrolyze the bond after the Ile-Glu-Gly-Arg sequence in

89 macromolecular substrates, prothrombin was digested with limited amounts of each mutant constructed in SGT (Figure 4.11). Prothrombin was chosen as it is the substrate of FXa in vivo and has a number of potential cleavages sites.

Protease Cleavage Sequence Notes Enterokinase Asp-Asp-Asp-Asp-Lys PI' can not be a Pro. Factor Xa Ile-Glu/Asp-Gly-Arg^ Secondary cleavage sites (due to low P3 and P4 preference) Thrombin Leu-Val-Pro-Arg^Gly-Ser Secondary cleavage sites. Considerably more expensive than FXa TEV protease Glu-Asn-Leu-Tyr-Phe- Less useful for removal Gln^Gly of C-terminal tags PreScission Leu-Glu-Val-Leu-Phe- Remaining sequence protease GlniGly-Pro after hydrolysis is problematic

Table 4.5 Cleavage sites of proteases used in processing recombinant proteins.

Accurate processing of prothrombin was not achieved by any of the mutants constructed in this study. Overnight digestions are typically used for site specific proteolysis of recombinant fusion proteins and hence a similar strategy was employed. Close inspection of the digestion pattern shown in Figure 4.11 shows that only YSFMP SGT yields a proteolytic fragment of similar size to the B-chain of thrombin. However, additional cleavages occur and leave only trace amounts of the desired fragment. The YSFMPE mutant of SGT did not process prothrombin to yield any fragment suggestive of FXa-like specificity and confirms the data provided through hydrolysis of small peptide substrates. Prothrombin activation is a poor representative of what would be anticipated for the cleavage of typical recombinant proteins. In particular, several regions of the protein are optimized for hydrolysis

90 in vivo. Therefore, it is likely that the YSFMP mutant of SGT could be used as an alternative for the site specific proteolysis of recombinant proteins as an alternative to FXa.

B

—Prothrombin —Prethrombin 1

—Thrombin B-chain —Fragment 1.2

—Protease

—Fragment 1

j 1 ~"1 Prothrombin

Fragment 1 Fragment 2

FXa Activation Fragment 1.2 A-chain B-chain Fragments

Figure 4.11 Prothrombin processing by mutants of SGT: T190P (B), YP (C), YSFP

(D), YSFMP (E) and YSFMPE (F) mutants of SGT compared to FXa (G) and

undigested prothrombin (A). Only the YSFMP mutant of SGT processed trace

amounts of a product the same size as thrombin. The YSFMPE mutant of SGT

contains all known specificity determinants of FXa, but did not generate any of the

fragments associated with the activation of prothrombin.

91 4.4 Conclusions & Future Directions

Extended substrate specificity in FXa is the result of four amino acids at positions 99,

174,180, and 192. These residues are positioned accurately by residues 172 and 217, and a

two amino acid insertion at position 99. Other residues that surround these positions are likely

involved in minor optimization of the electrostatic environment and stability of the region.

Substitution of the key residues of FXa into SGT created a protease with similar substrate

specificity that was more similar to FXIa rather than FXa. Based on these findings, the

development of a protease with more stringent specificity for the preferred FXa cleavage

sequence may be possible through additional mutation of SGT particularly by addition of the

Na+ binding site as found in FXa or thrombin. Central to the success of this research was the

use of B. subtilis for production of the recombinant protein and mutants thereof. The ease of protein purification and low cost of production is a significant advantage over previously reported systems. Structural and sequence similarity of SGT to FXa suggests that substrate

specificity of any coagulation protease could be re-created if not bettered.

Increased stringency for the Ile-Glu-Gly-Arg FXa could be generated by mutagenesis

of the 99-loop and position 192 in the polypeptide sequence of the YSFMPE mutant of bSGT.

Characterization of the specificity of FXa shows a weak preference at the P2, P3 and P4 positions of the substrate. Optimization of the 99-loop created in SGT will be needed to

decrease flexibility and restrict access to the S2 pocket. However, it is unknown what mutations will create rigidity in the 99-loop. Mutagenesis of residue 192 to a Lys amino acid

should increase the specificity of the S3 pocket for negatively charged side chains. Increased

stringency in the S4 pocket may be created by strengthening the interaction of residues 173

and 217. In particular, mutants bearing Ser, Thr, or Lys at position 192 may stabilize the loop

92 further. Each of these mutations will.likely cause a decrease in the catalytic efficiency of the enzyme. The suggested amino acids are likely not observed in the wild-type FXa protein due to the physiological requirement of efficiency. Evolution has decreased the potential harmful effect of less than perfect substrate specificity through linkage of additional protein domains that facilitate protein-protein and protein-lipid interactions.

93 Chapter 5. General Discussion and Outlook

5.1 Substrate Specificity Determinants of the SI family of Serine Proteases

Increasing the substrate specificity through mutagenesis of the SI to S4 pockets in

SGT was successful. Six mutations were combined, a two residue insertion and five point mutations, to mimic the active site architecture of coagulation factor Xa. Only the combined mutant demonstrated a strong preference for the desired Ile-Glu-Gly-Arg recognition sequence. Introduction of a seventh mutation in SGT, Y217E, did not further improve the specificity towards FXa preferred substrates. As anticipated, specificity results largely from the amino acid residues which constitute the enzyme-substrate interface but additional regions of the protease are important. A further increase in specificity will require stabilization of the active site architecture and optimization of the electrostatic environment of the entire active site. These mutations will involve second shell or more distal residues throughout the protein.

Molecular evolution over millions of years has accomplished these tasks and has generated diverse proteases and substrate specificities.

A number of proteases share similar architecture and substrate specificity determinants as SGT. In the MEROPS database, SGT is classified as a member of Clan SA, Family SI,

Subfamily A peptidases [168]. Nearly 1000 proteases in this family have been identified, including the vertebrate and invertebrate coagulation factor proteases, kallikreins, , and, complement proteases (Figure 5.1). Notably, these proteases demonstrate trypsin-like, chymotrypsin-like, and elastase-like primary substrate specificities (Table 5.1). Although the overall architecture of these proteases is similar, variations in the active site of the enzyme facilitate differing specificities.

94 j— Insect Trypsins i~L SGT Invertebrate Trypsins r Plasmin I L Vertebrate Trypsins ,_| '— Vertebrate

— Kallikreins I I— Coagulation Factor Proteases

I— C1r C1s Proteases Elastases

Granzymes &

r-S1A- Insect Chymotrypsins

Complement Factor B

— S1B—— Glutamyl

\— S1C—Protease Do S1- S1D—Lysyl endopeptidase

— S1E — Streptogrisin A

I— S1F —Astrovirus serine protease

Figure 5.1 Simplified phylogenetic tree of the SI family of peptidases. In each of the

SI sub-families of peptidases, a diversity of substrate specificities are found. Sub• family A is considerably larger than the other sub-families which are limited in distribution to gram positive/negative bacteria and viruses.

95 Preferred Peptidases PI Notes Ref. Substrate Monomelic, dimeric and oligomeric structure A R,K [41,169] alters specificity P4:1 > V, P3: E > G; Preference for D at PI, and Granzyme B D [41,170,171] results from residue 226 Granzyme H W,Y,F [172] Granzyme M M,L [173] Human glandular R S1' preference for S [128] kallikrein 2 Trypsin R,K Slight S1 preference for R > K [41,174] Plasmin R,K Strong S1 preference for K > R [41] Cruzain Not P/I Very broad specificity [41] Duodenase L,K,Y,F [175] Many proteases have specificities Trocarin R>K [176,177] similar to coagulation factor proteases Table 5.1 Substrate specificities of SI family peptidases.

Chymotrypsin-like proteases require proper positioning of large hydrophobic side chains in the primary binding pocket for efficient catalysis. Hedstrom demonstrated that conversion of a trypsin-like enzyme to a chymotrypsin-like enzyme required extensive mutagenesis of the SI pocket [56,57,59]. In addition to point mutations at positions 189, 216, and 226 several loops adjacent to the pocket were required to generate the change in the primary substrate specificity but resulted in a poorly active enzyme. Importantly, the altered loops do not contact the substrate direcdy. Improvement of the catalytic efficiency against amide substrates was achieved by mutagenesis of Y172W [57]. In the coagulation proteases, the side chain of residue 172 is exposed to the solvent and involved in the S4 substrate binding pocket [178]. In the present study, mutation of residue 172 in SGT was required to mimic FXa-like specificity of the S4 pocket. However, burial of the aromatic side chain at position led to an alternate conformation in the 172-loop. These studies demonstrate that multiple amino acids act in concert and that similar positions in the polypeptide sequence can affect differing specificity pockets in the folded protein.

96 Amino acid insertions in the protein sequence are a major component in the substrate

specificity of serine proteases. Hydrogen bonds, dipole moments, steric constraints, and

altered electrostatic environments can be generated by the introduction of one or more

residues into the protein. In the present work, addition of a two residue loop at position 99

was important in altering the properties of the S2 pocket. Loop insertions are found in many

members of the SI family of proteases.

In addition to position 99, loop insertions are found at other positions in the protease

sequence in the SI family peptidases. Thrombin possesses a 10 amino acid insertion at position 60 as well as a shorter loop at position 148 that are involved in substrate recognition

[179,180]. These loops have limited flexibility that has only been demonstrated by mutagenesis of the protease [181]. Other positions amenable for loop insertions include residue 170 found in several coagulation proteases [182], the "kallikrein loop" at position 90

[183], and also at residue 70 in the neuropsins [184]. The loops can also be involved in biochemical properties other than substrate specificity such as enzyme regulation, protein- protein interactions and zymogen activation.

Extended substrate specificity within the SI family of peptidases can involve residues on the N- and C-termini of the scissile bond of the substrate. Vertebrate coagulation proteases possess substrate specificity in the SI to S4 pockets. Small side chains at PI' are also favored by many proteases due to their decreased steric hindrance with the enzyme. Members of the kallikrein family display selectivity at P2' for Arg side chains. Rat and human mast cell proteases have a marked preference for acidic residues at P2' due to the presence of Lys40,

Argl43, and Lysl92 [185]. Hence, it seems possible that a similar strategy employed to

97 generate coagulation factor Xa specificity in SGT could be applied to generate proteases highly selective for amino acids on both sides of the scissile bond.

5.2 Molecular Evolution of the SI family of Serine Proteases

Over the course of evolution, proteases have been incorporated into a wide variety of cellular processes. Although the protease domain provides the catalytic machinery, additional protein domains are linked to add functionality. Substrate recognition and cellular (or extracellular) localization are two common roles of these domains. For example, the CUB domains facilitate protein-protein interactions [186], Gia domains promote protein-lipid interactions [187] and fibronectin domains localize proteins to extracellular fibrin depositions

[188,189]. Based on the diversity of associated domains, construction of an accurate phylogenetic tree is a difficult task.

A number of approaches have been taken to dissect the evolution of the serine proteases. These studies can be divided into those that examine the entire protease sequence including the associated protein domains, and those which examine only the protease domain or parts thereof [190-194]. All studies have supported the existence of a single ancestor for the entire SI family of peptidases [120]. Approximately 40 amino acid positions are highly conserved throughout the family to produce a consistent three dimensional structure.

Variations in these residues are limited to conservative mutations that preserve the polarity and size of the side chain [117]. The catalytic triad His-Asp-Ser is present in all members of the family [195]. A carboxyl-group containing amino acid at position 194 is also absolutely conserved amongst the family. Asp/Glul94 provides the electrostatic interaction with the N- terminus of the protease domain (residue 16) [196]. Formation of this interaction stabilizes the

98 oxyanion hole and the active site catalytic triad. Limited diversity at a particular residue and close proximity to the active site suggests involvement in enzyme-substrate interactions and specificity.

Analysis of the amino acid distribution at each residue in the substrate binding pockets shows a limited number of possibilities exist in nature (Table 5.2). Determinants characterized in the present work (residues 180, 190, and 217) are moderately conserved, and can be mutated further to yield novel proteases. Other active site residues with less sequence conservation (residues 215 and 228) have not been characterized with respect to substrate specificity but likely can be manipulated to influence specificity. Together with the variability in size and composition of the loops that surround the active site, the ability to design proteases with specificities that do not exist in nature seems possible. At present, it has not been established how highly specific proteases could be developed in the laboratory aside from structure based design. Elements of the present dissertation could be used in such a system.

5.3 Paper, Rock, Scissors Genetic Screening of Trypsin-like Proteases

Design of substrate specificity by structure based techniques is limited by our inability to understand the complex interactions that occur in a protein structure. Mutations thought to exhibit an effect often.result in absent or opposing results to that expected [125,197]. In the present study, both T190P and Y217E mutations in SGT yielded significantly different kinetic properties from what were anticipated. Thus, alternate approaches are required. Randomized mutagenesis of the whole gene or parts thereof combined with genetic selection has been widely adopted for the addition or manipulation of enzymatic properties.

99 Substrate Binding Pocket

S1 S2 S3/P4

16 17 189 190 194 221 228 214 180 192 215 216 217

I V D S D A Y S M Q W G E A.A. 72 60 69 36 99 50 72 93 61 46 63 90 14 %

V I S A E G F T Q K F V Y A.A. 19 22 13 31 1 29 19 2 7 11 19 5 13 %

L A G T N N A H N Y A V A.A. 1 2 9 20 5 2 1 3 10 8 1 7 %

A L N Q S E F H I L A.A. 1 2 2 2 1 3 2 1 1 7 %

V V D T D A Y S E Q W G Y SGT

I V D A D A Y S M Q W G E FXa .

YSFMPE V V D P D A Y s M Q W G E bSGT

Table 5.2 Substrate specificity determinants of SI family sub-family A peptidases

(not including loop regions). Sequence alignment of 740 proteases in the SI family

peptidases shows a limited diversity of amino acids exists at the substrate binding

region in all known members of the family (Sequence analysis described in Chapter

4.2.3). The final mutant construct (YSFMPE) did not yield the desired specificity,

even though all known specificity determinants were incorporated

Converting trypsin, or other primitive proteases, into highly specific enzymes suffers from the disadvantage that the wild-type enzyme will almost always be more active towards any substrate than the target enzyme. Typical directed evolution strategies employ creating novel activities or substrate specificities that are completely disparate from the wild-type enzyme [198,199]. For example, the wild-type enzyme does not catalyze a particular reaction

100 or does so poorly and the screen selects for increased activity. Such a screen can not be

applied to increase proteolytic substrate specificity from a primitive enzyme. A genetic screen

for novel substrate specificity must also compensate for the large diversity of possible

substrates arising from 20 amino acids at each position. A three residue stretch of amino acids

can have 8000 different possible permutations. Moreover, the side chains have a degree of

similarity that can not easily be accounted for. Thus, identification of the desired mutant

protein must rely on strategies employed in nature. In particular, protease inhibitors may be

useful for influencing the molecular evolution of proteases.

Recent molecular evolution of HIV proteases serves as an excellent model for directed

evolution to escape inhibition. For the virus to reproduce, a number of proteins are produced

as precursors that must be cleaved for activation [200,201]. On the basis of this requirement, a

number of protease inhibitors have been designed and applied clinically in the treatment of

this disease [202]. Unfortunately, the virus is known to mutate rapidly and a number of

mutations have been described that directly lead to resistance against inhibition [203]. These

mutations are located at the active site cleft, as well as at distal regions of the protease.

Notably, resistance to inhibition produced altered substrate specificity of the enzyme [204].

Mutations in the protein substrates at the site of cleavage have been demonstrated to

accompany the mutations that generate inhibitor resistance [205]. These observations suggest

that evolution directed by inhibition is a valid concept for the design of highly specific proteases.

Genetic screening for novel substrate specificity could be generated by combining

three components: a protease, an inhibitor that affects the wild-type but not the target protease, and a means by which to visualize activity. Paper, Rock, Scissors genetic screening

101 is put forth to accomplish these tasks (Figure 5.2). Co-expression of an inhibitor with the

protease will provide the direction for the evolution by distinguishing proteases that are

similar to the wild-type in the active site. Variable sized loops at different positions in the

polypeptide sequence of SGT could be added and the whole gene randomly mutated by

various means [206,207]. Only active proteases with altered active site geometry or properties

would overcome the inhibition and potentially yield mutants with improved substrate

specificity. Detection of protease activity could apply a sensitive fluorescent substrate that is

added directly to the solid growth medium. Formation of a halo would show proteolytic

activity and signal a potentially important mutant that could be characterized further.

Significant technical hurdles are evident for this system to function. In particular, the

generation of large mutant libraries in B. subtilis is significantly more difficult than E. coli

and no known protease substrate would be amenable for in vivo selection. To overcome these

limitations new techniques and novel proteins must be developed.

E. coli trypsin inhibitor, ecotin, is a potent inhibitor of trypsin-like serine proteases

with several properties amenable for use in directed evolution. The binding mode of ecotin is

identical to that observed with a natural substrate [208]. Ecotin displays a number of useful properties, including thermostability and stability in extreme pH. The inhibitor exists as a

dimer in solution and has two regions involved in inhibitor-protease interactions (Figure 5.3)

[209]. Marked differences in inhibitory strengths are observed using the wild-type protein and high specificity enzymes (Table 5.3). Coagulation Factor Xa and thrombin have a 100- to

1000-fold lower inhibition constant compared to broadly specific proteases [210]. A more potent inhibitor against trypsin-like enzymes can be constructed through mutagenesis of the

PI Met to Arg [211]. In contrast, the inhibitor can be converted to a monomeric form with

102 1000-fold less inhibitory strength towards trypsin-like enzymes [132]. Thus, ecotin can be genetically manipulated to exhibit a broad range of inhibitory strengths spanning at least six

orders of magnitude. Directed evolution to generate substrate specificity could use ecotin

mutants iteratively, such that the inhibitory strength was increased after each round of

selection.

c Active Protease + + + - Altered Specificity + Inhibited + - + - Production of Halo +

Figure 5.2 Theory behind the Paper, Rocks, Scissors genetic screen. A. Initially, the

wild-type protease is blocked from hydrolyzing a substrate that is included in the solid

growth medium and no halo is evident. B. Random mutagenesis of the protease gene

will lead to mutant proteases that escape inhibition and a halo surrounding growing

bacterial colonies will signal potential clones. C. Several possibilities are accounted

for by this screen include the removal of non-active mutants and discrimination from

enzymes with wild-type characteristics.

103 Figure 5.3 Dimeric structure of ecotin. The primary binding site interacts with the target protease in a conformation identical to that observed with substrates. A secondary protease binding site is provided by the other chain.

Protease Kj (nM)

Bovine trypsin <1

Factor Xa 54

Human leukocyte elastase 55

Human FXIIa 89

Human Kallikrein 163

Table 5.3 Inhibition constants of ecotin against a variety of SI family peptidases.

Thrombin, activated protein C, tissue-type and plasmin are poorly inhibited by wild-type ecotin.

104 Ecotin binds a protease in a similar fashion as a substrate and this property could be

used to further control the directed evolution of protease specificity. Mutagenesis of the three

residues preceding the PI residue could be used to alter the Ki of the inhibitor. The mutated

sequence of the protein would be the opposite of the recognition site of the protease. For

example, monomeric ecotin presenting Gln-Lys-Trp-Met in P4 to PI should poorly inhibit

coagulation factor Xa which prefers the sequence Ue-Glu-Gly-Arg. However, this mutant

inhibitor should still restrict the activity of SGT. Ecotin derives from a bacterial source and

production of the recombinant inhibitor in B. subtilis should be straightforward. Given the

high level of recombinant protein expression in E. coli, yields in B. subtilis should provide a

sufficient molar excess of inhibitor.

Detection of a highly specific protease in high throughput screening requires a

substrate that is well defined, readily quantified, and preferably inexpensive. A commonly

used method for detection of protease activity in bacterial colonies is the addition of skim milk powder to the solid medium. Protease activity leads to a zone of clearance, or halo,

around the colony. Unfortunately, high specificity proteases do not hydrolyze skim milk

effectively. Alternatively, peptide substrates similar to that employed in the present work can be synthesized with fluorescent leaving groups. However, these peptides are costly to produce. Green fluorescent protein (GFP) is commonly used as reporter protein for gene

expression and cellular localization and could be engineered for detection of protease activity

[212-214]. The fluorophore is generated from three adjacent residues, Ser-Tyr-Gly, that are

sequestered on the inside an 11-stranded p-barrel (Figure 5.4) [215]. The protein displays high thermostability and extreme resistance to proteolysis resulting from the tightly packed

structure [216]. Recently, Williams described three regions in GFP that could be rendered

105 sensitive to site specific proteolysis [217]. However, cleavage of the protein at any one of

these sites did not lead to a decrease in fluorescence. These results were likely due to the

stability of the P -barrel structure which did not unfold after hydrolysis of a single bond.

Combining two insertion mutations lead to a protein that did not fold properly, forming

inclusion bodies in their E. coli expression system, and further research was not continued (M.

Williams, personal communication). Random mutagenesis and screening of a double insertion

mutant of GFP could be readily performed to select for soluble protein that fluoresces under

UV light [218]. GFP would then be a converted to a useful reagent for the detection of

proteolytic activity in vivo. Active protease would be evident by the formation of a halo

surrounding a B. subtilis colony if the fluorescent protein was added directly to the solid growth medium in the proposed screen.

Figure 5.4 Structure of GFP and potential regions for insertion of a protease

recognition sequence (PDB code 1EMA). For example, Ile-Glu-Gly-Arg-Ser inserted

at positions 172 and 189 would allow FXa to cleave GFP twice, releasing a strand of

the P -barrel, and removing the fluorescent properties of the molecule.

106 In summary, a directed evolution strategy for the development of novel proteolytic

substrate specificities is possible. Two proteins must be created and characterized prior to

validating such an approach. The recombinant expression of SGT in B. subtilis provides a

useful starting point towards this goal and is a substantial improvement over previously reported systems [59,98]. Mutagenesis of the cleavage sequence presented by GFP and the

inhibitor will permit screening for any specificity desired. Novel substrate binding pockets may result that differ from that observed elsewhere in the family, yet produce similar and more stringent substrate specificities

5.5 Future Opportunities

A number of novel features exist in the trypsin scaffold that are poorly understood and

could be generated on a simplified scaffold such as SGT. The role of sodium binding in

generating catalytic efficiency was previously discussed as a potential avenue for research.

Induced fit mechanisms of substrate specificity have been described, particularly in the [219]. Autolytic activation induced by receptor binding is a well known phenomenon in the vertebrate blood coagulation cascade [220]. Zymogen activation mechanisms are known to differ in the SI family peptidases [221-224]. Linkage of the protease domain to other protein domains could be studied to generate novel function and proteolytic activities. Lastly, little is understood in the mechanism of catalytic rate enhancement caused by phospholipid and co-factor binding in the coagulation cascade [225].

These molecular properties are likely not distinct mechanisms acting alone. Structural proximity and direct interactions with adjacent amino acid side chains suggest complex relationships with substrate binding and the catalytic process have yet to be fully understood.

107 5.6 Significance of the Work

In the present dissertation, a novel expression system for trypsin-like enzymes has

been produced and optimized. Previous studies have for the most part used eukaryotic

proteases which are inherently more difficult to work with based on their evolution to meet

physiological function. I have demonstrated that primitive trypsin-like enzymes derived from

a bacterial source can be used as a scaffold for engineering substrate specificity and functional

properties similar to eukaryotic proteases. A number of mutations were created in SGT that

increased the primary specificity for Arg containing substrates and the extended substrate

specificity was improved to partially mimic FXa. These results show that the substrate

specificity of any protease in the S1 family of peptidases could be duplicated using a similar

approach.

Perfect mimicry of FXa substrate specificity was not achieved likely due to the requirement of additional second shell residues involved in optimization of the binding pocket. Future work could focus on further reproduction of FXa specificity through similar

amino acid substitutions. An alternative approach that involves random mutagenesis and a

novel genetic screen has also been described that may yield highly specific proteases that bear

little sequence similarity with known proteases. Directed evolution of extended substrate

specificity is technically challenging and, if successful, can significantly expand the repertoire

of protease technology.

Highly selective serine proteases would be useful in a number of applications. The

YSFMP mutant of SGT bearing six mutations could be used for the site specific proteolysis of recombinant proteins as an alternative to FXa. In the future, it is anticipated that proteases could be designed with levels of substrate specificity approaching that found in restriction

108 endonucleases [226]. These enzymes could be used for hydrolysis of proteins without the need for manipulation of the DNA sequence for addition of a protease recognition site.

Moreover, peptide ligation through reverse proteolysis has been described and could be combined with highly selective mutants of SGT [227]. Thus, novel proteins could be produced more rapidly in an approach comparable to combinatorial chemistry.

5.7 Conclusions

Substrate specificity of the SI family peptidases is derived from a few amino acids in the protein sequence. Due to the requirement of second shell residues for optimization of the substrate binding site, structure based design of selectivity is a difficult yet achievable goal.

Engineering specificity is then limited by what exists in nature. Molecular evolution has not explored many of the possibilities of substrate specificity due to the physiological requirement for efficient catalysis. Directed evolution, as proposed in the present dissertation or by other means, is the next likely step in the progression of protease technology.

109 Appendix A: Structural Alignment of Selected SI Family Peptidases

Structures of SI family peptidases were aligned using the combinatorial extension method available on-line at http://cl.sdsc.edu/ce.html [228]. Conserved residues in all 8 polypeptides are denoted with * and mutations constructed in SGT are listed below the alignment. 60-loop

1SGT 1 WGGTRAAQGEFPFMVRLSM GCGGALYAQDIVLTAAHCV - --SGSGNNT--SITATGG 1TLD 1 IVGGYTCGANTVPYQVSLNSGYH FCGGSL INSQWWSAAHCY K S- -GIQVRLG 3 TGI 1 IVGGYTCQENSVPYQVSLNSGYH FCGGSLINDQWWSAAHCY K S- -RIQVRLG 1PFX 1 IVGGENAKPGQFPWQVLLNGKIDA- - FCGGSIINEKWWTAAHCIEP- - G -V- -KITWAG 1AUT 1 LIDGKMTRRGDS PWQVVLLDSKKKL -ACGAVLIHPSWVLTAAHCMDES- K KLLVRLG 1HCG 1 IVGGQECKDGECPWQALLINEENEG- FCGGTILSEFYILTAAHCLYQA- K RFKVRVG 1CVW 1 IVGGKVCPKGECPWQVLLLVNGAQ- - LCGGTLINTIWWSAAHCFDKI - K NWRNLIAVLG 1PPB 1 IVEGSDAEIGMS PWQVMLFRKS PQELLCGASLISDRWVLTAAHCLLYP PWDKNF TENDLLVRIG

99-loop I I 1 SGT : 54 WDLQ- -S-G-AAVKVRSTKVLQAPGYN G-TGKDWALIKLAQPIN QPTLKIAT-T T 1TLD: 52 EDNINWE-G-NEQFISASKSIVHPSYN-SNT-LNNDIMLIKLKSAASLNSRVASISLPT-S C 3TGI: 52 EHNINVLE-G-NEQFVNAAKIIKHPNFD-RKT-LNNDIMLIKLSSPVKLNARVATVALPS-S C 1PFX: 55 EYNTEETEP- -TEQRRNVIRAIPHHSYNATVNKYSHDIALLELDEPLTLNSYVTPICIAD-KEYTNI - F 1AUT: 56 EYDLRRWE-K-WELDLDIKEVFVHPNYS-KST-TDNDIALLHLAQPATLSQTIVPICLPD-SGLAEREL 1HCG: 56 DRNTEQEE-G-GEAVHEVEWIKHNRFT-KET-YDFDIAVLRLKTPITFRMNVAPACLPE-RDWAESTL 1CVW : 58 EHDLSE-H-DGDEQSRRVAQVIIPSTYV- PGT -TNHDIALLRLHQPWLTDHWPLCLPERTFSERT-L 1PPB: 65 KHSRTRYE-RNIEKISMLEKIYIHPRYN-WRENLDRDIALMKLKKPVAFSDYIHPVCLPD-RETAAS-L KE Y * * 96 99 -172-loop-

1SGT: 104 AYNQGTFTVAGWGA- NRE - GG SQQRYLLKANVPFVSDAACRSAY GNELVANEEICAGY - 1TLD: 110 ASAGTQCLISGWGN- TKSSGT SYPDVLKCLKAPILSDSSCKSAY PGQIT-SNMFCAGY- 3TGI: 110 APAGTQCLISGWGN- TLSSGV NEPDLLQCLDAPLLPQADCEAS Y PGKIT - DNMVCVGF - 1PFX: 120 LK-FGSGYVSGWGR- VFNRG RSATILQYLKVPLVDRATCLRST KFTIY-SNMFCAGF - 1AUT: 120 NQAGQETLVTGWGY• HSSREKEAKRN-RTFVLNFIKIPWPHNECSEVM SNMVS - ENMLCAGI - 1HCG: 120 MT-QKTGIVSGFGR- THEKGRQS TRLKMLEVPYVDRNSCKLS S SFIIT - QNMFCAGY - 1CVW: 122 AF-VRFSLVSGWGQ- LLDRG ATALELMVLNVPRLMTQDCLQQSRKVGDS PNIT - E YMFCAGY - 1PPB: 130 LQAGYKGRVTGWGN- LKETWTANVGKGQPSVLQWNLPIVERPVCKDST RIRIT-DNMFCAGY- * * S F M * ** 172 174 180

1SGT 160 - PDT---GGVDTC- -QGDSGGPMFRK DNADEWIQVGIVSWGYGCARPGYPGVYTEVSTFASAI 1TLD 166 -L- E- --GGKDSC--QGDSGGPWCS GKLQGIVSWGSGCAQKNKPGVYTKVCNYVSWI 3TGI 166 -L- E- --GGKDSC--QGDSGGPWCN GELQGIVSWGYGCALPDNPGVYTKVCNYVDWI 1PFX 174 -H-E- --GGKDSC--QGDSGGPHVTEVEGT-- SFLTGIISWGEECAVKGKYGIYTKVSRYVNWI 1AUT 180 -L- G- --DRQDAC--EGDSGGPMVASFHGT-- WFLVGLVSWGEGCGLLHNYGVYTKVSRYLDWI 1HCG 174 -D-T- - -KQEDAC--QGDSGGPHVTRFKDT-- YFVTGIVSWGEGCARKGKYGIYTKVTAFLKWI 1CVW 181 -S- D- --GSKDSC- -KGDSGGPHATHYRGT-- WYLTGIVSWGQGCATVGHFGVYTRVSQYIEWL 1PPB 191 -K- PDEGKRGDAC--EGDSGGPFVMKSPFNNR WYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWI *p* ****** ****E*** * ** * * 190 217

%ldentity to SGT Ref. 1SGT 217 ASAARTL Streptomyces griseus trypsin 100 .0 [76] 1TLD 217 KQTIASN Bovine beta-trypsin 35. 1 [112] 3 TGI 217 QDTIAAN Rat anionic trypsin 32 .7 [229] 1PFX 229 KEKTK-- Human coagulation factor IXa 34 .0 [230] 1AUT 235 HGHIRDK Human activated protein C 32. 2 [231] 1HCG 229 DRSMKTR Human coagulation factor Xa 32 .5 [141] 1CVW 236 QKLMRSE Human coagulation factor Vila 35. 9 [158] 1PPB 251 QKVIDQF Human alpha-thrombin 33 .2 [157]

110 Bibliography

1. Katakura, S., Nagahara, T., Hara, T. & Iwamoto, M. (1993). A novel factor Xa inhibitor: structure-activity relationships and selectivity between factor Xa and thrombin. Biochem. Biophys. Res. Commun. 197, 965-72.

2. Sandberg, H., Almstedt, A., Brandt, J., Gray, E., Holmquist, L., Oswaldsson, U., Sebring, S. & Mikaelsson, M. (2001). Structural and functional characteristics of the B-domain-deleted recombinant factor VIII protein, r-VIII SQ. Thromb. Haemost. 85, 93-100.

3. Denault, J. B. & Salvesen, G. S. (2002). Caspases: keys in the ignition of cell death. Chem. Rev. 102, 4489-500.

4. Davie, E. W., Fujikawa, K. & Kisiel, W. (1991). The coagulation cascade: initiation, maintenance, and regulation. Biochemistry 30, 10363-70.

5. Natesh, R., Schwager, S. L., Sturrock, E. D. & Acharya, K. R. (2003). Crystal structure of the human angiotensin-converting enzyme-lisinopril complex. Nature All, 551-4.

6. Neurath, H. (1984). Evolution of proteolytic enzymes. Science 224, 350-7.

7. Klemm, U., Muller-Esterl, W. & Engel, W. (1991). , the peculiar sperm- specific serine protease. Hum. Genet. 87, 635-41.

8. Sattar, R., Ali, S. A. & Abbasi, A. (2003). Bioinformatics of granzymes: sequence comparison and structural studies on granzyme family by homology modeling. Biochem. Biophys. Res. Commun. 308, 726-35.

9. Kostova, Z. & Wolf, D. H. (2003). For whom the bell tolls: protein quality control of the endoplasmic reticulum and the ubiquitin-proteasome connection. EMBO J. 22, 2309-17.

10. Taylor, N. A., Van De Ven, W. J. & Creemers, J. W. (2003). Curbing activation: proprotein convertases in homeostasis and pathology. FASEB J. 17, 1215-27.

11. Stamenkovic, I. (2003). Extracellular matrix remodelling: the role of matrix metalloproteinases. J. Pathol. 200, 448-64.

12. Selkoe, D. J. (1998). The cell biology of beta-amyloid precursor protein and presenilin in Alzheimer's disease. Trends Cell Biol. 8, 447-53.

13. Yager, D. R. & Nwomeh, B. C. (1999). The proteolytic environment of chronic wounds. Wound Repair Re gen. 7,433-41.

Ill 14. Bolton-Maggs, P. H. & Pasi, K. J. (2003). Haemophilias A and B. Lancet 361, 1801-9.

15. Guzman, L. A. & Lincoff, A. M. (1997). Thrombolytic Therapy: The Treatment of Choice in Acute Myocardial Infarction. J. Thromb. 4, 337-343.

16. McKerrow, J. H., Sun, E., Rosenthal, P. J. & Bouvier, J. (1993). The proteases and pathogenicity of parasitic protozoa. Annu. Rev. Microbiol. Al, 821-53.

17. Maliar, T., Balaz, S., Tandlich, R. & Sturdik, E. (2002). Viral proteinases-possible targets of antiviral drugs. Acta Virol. 46, 131-40.

18. Gupta, R., Beg, Q. K. & Lorenz, P. (2002). Bacterial alkaline proteases: molecular approaches and industrial applications. Appl. Microbiol. Biotechnol. 59, 15-32.

19. Rao, M. B., Tanksale, A. M., Ghatge, M. S. & Deshpande, V. V. (1998). Molecular and biotechnological aspects of microbial proteases. Microbiol. Mol. Biol. Rev. 62, 597-635.

20. Bryan, P. N. (2000). Protein engineering of subtilisin. Biochim. Biophys. Acta 1543, 203-222.

21. Barrett, A. J. (1997). Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB). Enzyme Nomenclature. Recommendations 1992. Supplement 4: corrections and additions (1997). Eur. J. Biochem. 250, 1-6.

22. Rawlings, N. D. & Barrett, A. J. (1999). MEROPS: the peptidase database. Nucleic Acids Res. 21, 325-31.

23. Barrett, A. J., Tolle, D. P. & Rawlings, N. D. (2003). Managing peptidases in the genomic era. Biol. Chem. 384, 873-82.

24. Hedstrom, L. (2002). Serine protease mechanism and specificity. Chem. Rev. 102, 4501-24.

25. Lowther, W. T. & Matthews, B. W. (2002). Metalloaminopeptidases: common functional themes in disparate structural surroundings. Chem. Rev. 102,4581-608.

26. Dunn, B. M. (2002). Structure and mechanism of the pepsin-like family of aspartic peptidases. Chem. Rev. 102, 4431-58.

27. Lecaille, F., Kaleta, J. & Bromme, D. (2002). Human and parasitic papain-like cysteine proteases: their role in physiology and pathology and recent developments in inhibitor design. Chem. Rev. 102, 4459-88.

28. Orlowski, M. & Wilk, S. (2000). Catalytic activities of the 20 S proteasome, a multicatalytic proteinase complex. Arch. Biochem. Biophys. 383, 1-16.

112 29. Matthews, B. W., Sigler, P. B., Henderson, R. & Blow, D. M. (1967). Three- dimensional structure of tosyl-alpha-chymotrypsin. Nature 214, 652-6.

30. Dodson, G. & Wlodawer, A. (1998). Catalytic triads and their relatives. Trends Biochem. Sci. 23, 347-52.

31. Warshel, A., Papazyan, A. & Kollman, P. A. (1995). On low-barrier hydrogen bonds and . Science 269, 102-6.

32. Warshel, A. & Papazyan, A. (1996). Energy considerations show that low-barrier hydrogen bonds do not offer a catalytic advantage over ordinary hydrogen bonds. Proc. Natl. Acad. Sci. USA 93, 13665-70.

33. Topf, M., Varnai, P. & Richards, W. G. (2002). Ab initio QM/MM dynamics simulation of the tetrahedral intermediate of serine proteases: insights into the active site hydrogen-bonding network. J. Am. Chem. Soc. 124, 14780-8.

34. Frey, P. A., Whitt, S. A. & Tobin, J. B. (1994). A low-barrier hydrogen bond in the catalytic triad of serine proteases. Science 264, 1927-30.

35. Hartley, B. S. & Kilbey, B. A. (1954). Biochem. J. 56, 289-97.

36. Veldman, A., Hoffman, M. & Ehrenforth, S. (2003). New insights into the coagulation system and implications for new therapeutic options with recombinant factor Vila. Curr. Med. Chem. 10, 797-811.

37. Ruf, W., Dorfleutner, A. & Riewald, M. (2003). Specificity of coagulation factor signaling. J. Thromb. Haemost. 1, 1495-503.

38. Ahmad, S. S., London, F. S. & Walsh, P. N. (2003). The assembly of the factor X- activating complex on activated human platelets. J. Thromb. Haemost. 1, 48-59.

39. Dahlback, B. & Villoutreix, B. O. (2003). Molecular recognition in the protein C anticoagulant pathway. J. Thromb. Haemost. 1, 1525-34.

40. Davie, E. W. (1995). Biochemical and molecular aspects of the coagulation cascade. Thromb. Haemost. 74, 1-6.

41. Harris, J. L., Backes, B. J., Leonetti, F., Mahrus, S., Ellman, J. A. & Craik, C. S. (2000). Rapid and general profiling of protease specificity by using combinatorial fluorogenic substrate libraries. Proc. Natl. Acad. Sci. USA 97, 7754-9.

42. Backes, B. J., Harris, J. L., Leonetti, F., Craik, C. S. & Ellman, J. A. (2000). Synthesis of positional-scanning libraries of fluorogenic peptide substrates to define the extended substrate specificity of plasmin and thrombin. Nat. Biotechnol. 18, 187-93.

113 43. Davidson, C. J., Tuddenham, E. G. & McVey, J. H. (2003). 450 million years of hemostasis. J. Thromb. Haemost. 1, 1487-94.

44. Jiang, Y. & Doolittle, R. F. (2003). The evolution of vertebrate blood coagulation as viewed from a comparison of puffer fish and sea squirt genomes. Proc. Natl. Acad. Sci. USA 100, 7527-32.

45. Schechter, I. & Berger, A. (1967). On the size of the active site in proteases. I. Papain. Biochem. Biophys. Res. Commun. 27, 157-62.

46. Overall, C. M. (2002). Molecular determinants of metalloproteinase substrate specificity: matrix metalloproteinase substrate binding domains, modules, and exosites. Mol. Biotechnol. 22, 51-86.

47. Mattler, L. E. & Bang, N. U. (1977). Serine protease specificity for peptide chromogenic substrates. Thromb. Haemost. 38, 776-92.

48. Nagai, K. & Thogersen, H. C. (1984). Generation of beta-globin by sequence-specific proteolysis of a hybrid protein produced in Escherichia coli. Nature 309, 810-2.

49. Richardson, P. L. (2002). The determination and use of optimized protease substrates in drug discovery and development. Curr. Pharm. Des. 8, 2559-81.

50. Turk, B. E., Huang, L. L., Piro, E. T. & Cantley, L. C. (2001). Determination of protease cleavage site motifs using mixture-based oriented peptide libraries. Nat. Biotechnol. 19, 661-7.

51. Maly, D. J., Leonetti, F., Backes, B. J., Dauber, D. S., Harris, J. L., Craik, C. S. & Ellman, J. A. (2002). Expedient solid-phase synthesis of fluorogenic protease substrates using the 7-amino-4-carbamoylmethylcoumarin (ACC) fluorophore. J. Org. Chem. 67, 910-5.

52. Gruber, A., Cantwell, A. M., Di Cera, E. & Hanson, S. R. (2002). The thrombin mutant W215A/E217A shows safe and potent anticoagulant and effects in vivo. J. Biol. Chem. 277, 27581-4.

53. Dang, Q. D., Guinto, E. R. & di Cera, E. (1997). Rational engineering of activity and specificity in a serine protease. Nat. Biotechnol. 15,146-9.

54. Fersht, A. & Winter, G. (1992). Protein engineering. Trends Biochem. Sci. 17, 292-5.

55. You, L. & Arnold, F. H. (1996). Directed evolution of subtilisin E in Bacillus subtilis to enhance total activity in aqueous dimethylformamide. Protein Eng. 9, 77-83.

56. Kurth, T., Ullmann, D., Jakubke, H. D. & Hedstrom, L. (1997). Converting trypsin to chymotrypsin: structural determinants of SI' specificity. Biochemistry 36, 10098-104.

114 57. Hedstrom, L., Farr-Jones, S., Kettner, C. A. & Rutter, W. J. (1994). Converting trypsin to chymotrypsin: ground-state binding does not determine substrate specificity. Biochemistry 33, 8764-9.

58. Hedstrom, L., Perona, J. J. & Rutter, W. J. (1994). Converting trypsin to chymotrypsin: residue 172 is a substrate specificity determinant. Biochemistry 33, 8757-63.

59. Hedstrom, L., Szilagyi, L. & Rutter, W. J. (1992). Converting trypsin to chymotrypsin: the role of surface loops. Science 255, 1249-53.

60. Hedstrom, L. (1996). Trypsin: a case study in the structural determinants of enzyme specificity. Biol. Chem. 377,465-70.

61. Rezaie, A. R. & Olson, S. T. (1997). Contribution of lysine 60f to SI' specificity of thrombin. Biochemistry 36, 1026-33.

62. Grutter, M. G., Priestle, J. P., Rahuel, J., Grossenbacher, H., Bode, W., Hofsteenge, J. & Stone, S. R. (1990). Crystal structure of the thrombin- complex: a novel mode of serine protease inhibition. EMBO J. 9, 2361-5.

63. Wang, S. X., Esmon, C. T. & Fletterick, R. J. (2001). Crystal structure of thrombin- ecotin reveals conformational changes and extended interactions. Biochemistry 40, 10038-46.

64. Vindigni, A., Winfield, M., Ayala, Y. M. & Di Cera, E. (2000). Role of residue Y99 in tissue plasminogen activator. Protein Sci. 9, 619-22.

65. Hung, S. H. & Hedstrom, L. (1998). Converting trypsin to elastase: substitution of the SI site and adjacent loops reconstitutes esterase specificity but not amidase activity. Protein Eng. 11,669-73.

66. Evnin, L. B., Vasquez, J. R. & Craik, C. S. (1990). Substrate specificity of trypsin investigated by using a genetic selection. Proc. Natl. Acad. Sci. USA 87, 6659-63.

67. Perona, J. J., Evnin, L. B. & Craik, C. S. (1993). A genetic selection elucidates structural determinants of arginine versus lysine specificity in trypsin. Gene 137, 121- 6.

68. Kurth, T., Grahn, S., Thormann, M., Ullmann, D., Hofmann, H. J., Jakubke, H. D. & Hedstrom, L. (1998). Engineering the SI' subsite of trypsin: design of a protease which cleaves between dibasic residues. Biochemistry 37,11434-40.

69. Wang, J., Chao, J. & Chao, L. (1992). Specificity determinants of rat tissue kallikrein probed by site-directed mutagenesis. Protein Eng. 5, 569-75.

115 70. Hopfner, K. P., Brandstetter, H., Karcher, A., Kopetzki, E., Huber, R., Engh, R. A. & Bode, W. (1997). Converting blood coagulation factor TXa into factor Xa: dramatic increase in amidolytic activity identifies important active site determinants. EMBO J. 16, 6626-35.

71. Trop, M. & Birk, Y. (1968). The trypsin-like enzyme from Streptomyces griseus (pronase). Biochem. J. 109, 475-6.

72. Olafson, R. W. & Smillie, L. B. (1975). Enzymic and physicochemical properties of Streptomyces griseus trypsin. Biochemistry 14, 1161-7.

73. Olafson, R. W., Jurasek, L., Carpenter, M. R. & Smillie, L. B. (1975). Amino acid sequence of Streptomyces griseus trypsin. Cyanogen bromide fragments and complete sequence. Biochemistry 14, 1168-77.

74. Kim, J. C, Cha, S. H., Jeong, S. T., Oh, S. K. & Byun, S. M. (1991). Molecular cloning and sequence of Streptomyces griseus trypsin gene. Biochem. Biophys. Res. Commun. 181, 707-13.

75. Read, R. J., Brayer, G. D., Jurasek, L. & James, M. N. (1984). Critical evaluation of comparative model building of Streptomyces griseus trypsin. Biochemistry 23, 6570-5.

76. Read, R. J. & James, M. N. (1988). Refined crystal structure of Streptomyces griseus trypsin at 1.7 A resolution. J. Mol. Biol. 200, 523-51.

77. Di Cera, E., Guinto, E. R., Vindigni, A., Dang, Q. D., Ayala, Y. M., Wuyi, M. & Tulinsky, A. (1995). The Na+ binding site of thrombin. /. Biol. Chem. 270, 22089-92.

78. Hannig, G. & Makrides, S. C. (1998). Strategies for optimizing heterologous protein expression in Escherichia coli. Trends Biotechnol. 16, 54-60.

79. Prinz, W. A., Aslund, F., Holmgren, A. & Beckwith, J. (1997). The role of the thioredoxin and glutaredoxin pathways in reducing protein disulfide bonds in the Escherichia coli cytoplasm. J. Biol. Chem. 272, 15661-7.

80. Udaka, S., Tsukagoshi, N. & Yamagata, H. (1989). Bacillus brevis, a host bacterium for efficient extracellular production of useful proteins. Biotechnol. Genet. Eng. Rev. 7, 113-46.

81. Wong, S. L. (1995). Advances in the use of Bacillus subtilis for the expression and secretion of heterologous proteins. Curr. Opin. Biotechnol. 6, 517-22.

82. Verrips, C. T. & van den Berg, D. J. (1996). Barriers to application of genetically modified lactic acid bacteria. Antonie Leeuwenhoek 70, 299-316.

83. Binnie, C, Cossar, J. D. & Stewart, D. I. (1997). Heterologous biopharmaceutical protein expression in Streptomyces. Trends Biotechnol. 15, 315-20.

116 84. Schweizer, H. P., Hoang, T. T., Propst, K. L., Ornelas, H. R. & Karkhoff-Schweizer, R. R. (2001). Vector design and development of host systems for Pseudomonas. Genet. Eng. (TV. Y). 23, 69-81.

85. Bingle, W. H., Nomellini, J. F. & Smit, J. (2000). Secretion of the Caulobacter crescentus S-layer protein: further localization of the C-terminal secretion signal and its use for secretion of recombinant proteins. J. Bacteriol. 182, 3298-301.

86. Perona, J. J., Hedstrom, L., Rutter, W. J. & Fletterick, R. J. (1995). Structural origins of substrate discrimination in trypsin and chymotrypsin. Biochemistry 34, 1489-99.

87. Makrides, S. C. (1996). Strategies for achieving high-level expression of genes in Escherichia coli. Microbiol. Rev. 60, 512-38.

88. Kane, J. F. (1995). Effects of rare codon clusters on high-level expression of heterologous proteins in Escherichia coli. Curr. Opin. Biotechnol. 6, 494-500.

89. Jurado, P., Ritz, D., Beckwith, J., de Lorenzo, V. & Fernandez, L. A. (2002). Production of functional single-chain Fv antibodies in the cytoplasm of Escherichia coli. J. Mol. Biol. 320, 1-10.

90. Sambrook, J., Fritsch, E. F. & Maniatis, T. (1989). Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

91. Hailing, S. M., Sanchez-Anzaldo, F. J., Fukuda, R., Doi, R. H. & Meares, C. F. (1977). Zinc is associated with the beta subunit of DNA-dependent RNA polymerase of Bacillus subtilis. Biochemistry 16, 2880-4.

92. Wu, S. C. & Wong, S. L. (1999). Development of improved pUBl 10-based vectors for expression and secretion studies in Bacillus subtilis. J. Biotechnol. 72, 185-95.

93. Spizizen, J. (1958). Transformation of biochemically deficient strains of Bacillus subtilis by deoxyribonucleate. Proc. Natl. Acad. Sci. USA 44,1072-1078.

94. Hatanaka, Y., Tsunematsu, H., Mizusaki, K. & Makisumi, S. (1985). Interactions of derivatives of guanidinophenylalanine and guanidinophenylglycine with Streptomyces griseus trypsin. Biochim. Biophys. Acta 832, 274-9.

95. McRee, D. E. (1999). XtalView/Xfit—A versatile program for manipulating atomic coordinates and electron density. J. Struct. Biol. 125, 156-65.

96. Brunger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse- Kunstleve, R. W., Jiang, J. S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr. D. Biol. Crystallogr. 54 (Pt 5), 905-21.

117 97. Otwinowski, Z. & Minor, D. (1997). Processing of X-ray Diffraction Data Collected in Oscillation Mode. Methods Enzymol., Macromolecular Crystallography, Part A (Carter, C. W. & Sweet, R. M., Eds.), 276, Academic Press.

98. Vasquez, J. R., Evnin, L. B., Higaki, J. N. & Craik, C. S. (1989). An expression system for trypsin. J. Cell. Biochem. 39, 265-76.

99. Fukuoka, S. & Nyaruhucha, C. M. (2002). Expression and functional analysis of rat P23, a gut hormone-inducible isoform of trypsin, reveals its resistance to proteinaceous trypsin inhibitors. Biochim. Biophys. Acta. 1588, 106-12.

100. Manosroi, J., Tayapiwatana, C, Gotz, F., Werner, R. G. & Manosroi, A. (2001). Secretion of active recombinant human tissue plasminogen activator derivatives in Escherichia coli. Appl. Environ. Microbiol. 67, 2657-64.

101. Ye, R., Kim, J. H., Kim, B. G., Szarka, S., Sihota, E. & Wong, S. L. (1999). High- level secretory production of intact, biologically active staphylokinase from Bacillus subtilis. Biotechnol. Bioeng. 62, 87-96.

102. Tjalsma, H., Bolhuis, A., Jongbloed, J. D., Bron, S. & van Dijl, J. M. (2000). Signal peptide-dependent protein transport in Bacillus subtilis: a genome-based survey of the secretome. Microbiol. Mol. Biol. Rev. 64, 515-47.

103. Antelmann, H., Tjalsma, H., Voigt, B., Ohlmeier, S., Bron, S., van Dijl, J. M. & Hecker, M. (2001). A proteomic view on genome-based signal peptide predictions. Genome Res. 11, 1484-502.

104. Kunst, F., Ogasawara, N., Moszer, I., Albertini, A. M., Alloni, G., Azevedo, V., Bertero, M. G., Bessieres, P., Bolotin, A., Borchert, S., Borriss, R., Boursier, L., Brans, A., Braun, M., Brignell, S. C, Bron, S., Brouillet, S., Bruschi, C. V., Caldwell, B., Capuano, V., Carter, N. M., Choi, S. K, Codani, J. J., Connerton, I. F., Danchin, A. & et al. (1997). The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390, 249-56.

105. Nagarajan, V., Ramaley, R., Albertson, H. & Chen, M. (1993). Secretion of streptavidin from Bacillus subtilis. Appl. Environ. Microbiol. 59, 3894-8.

106. Kakudo, S., Yoshikawa, K, Tamaki, M., Nakamura, E. & Teraoka, H. (1992). Secretory expression of a glutamic-acid-specific endopeptidase (SPase) from Staphylococcus aureus ATCC12600 in Bacillus subtilis. Appl. Microbiol. Biotechnol. 38, 226-33.

107. Witte, V., Wolf, N., Diefenthal, T., Reipen, G. & Dargatz, H. (1994). Heterologous expression of the clostripain gene from Clostridium histolyticum in Escherichia coli and Bacillus subtilis: maturation of the clostripain precursor is coupled with self- activation. Microbiology 140 (Pt 5), 1175-82.

118 108. van Wely, K. H., Swaving, J., Freudl, R. & Driessen, A. J. (2001). Translocation of proteins across the cell envelope of Gram-positive bacteria. FEMS Microbiol. Rev. 25, 437-54.

109. Dalbey, R. E., Lively, M. O., Bron, S. & van Dijl, J. M. (1997). The chemistry and enzymology of the type I signal peptidases. Protein Sci. 6, 1129-38.

110. Vaguine, A. A., Richelle, J. & Wodak, S. J. (1999). SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model. Acta Crystallogr. D. Biol. Crystallogr. 55 (Pt 1), 191-205.

111. Meyer, E., Cole, G., Radhakrishnan, R. & Epp, O. (1988). Structure of native porcine pancreatic elastase at 1.65 A resolutions. Acta Crystallogr. B. 44 (Pt 1), 26-38.

112. Bartunik, H. D., Summers, L. J. & Bartsch, H. H. (1989). Crystal structure of bovine beta-trypsin at 1.5 A resolution in a crystal form with low molecular packing density. Active site geometry, ion pairs and solvent structure. J. Mol. Biol. 210, 813-28.

113. Berglund, G. I., Smalas, A. O. & Hordvik, A. (1995). Structure of Anionic Salmon Trypin in a 2nd Crystal Form. Acta Crystallogr. D. Biol. Crystallogr. 51, 725-730.

114. Graf, L., Jancso, A., Szilagyi, L., Hegyi, G., Pinter, K., Naray-Szabo, G., Hepp, J., Medzihradszky, K. & Rutter, W. J. (1988). Electrostatic complementarity within the substrate-binding pocket of trypsin. Proc. Natl. Acad. Sci. USA 85,4961-5.

115. Edwards, P. D., Mauger, R. C, Cottrell, K. M., Morris, F. X., Pine, K. K., Sylvester, M. A., Scott, C. W. & Furlong, S. T. (2000). Synthesis and enzymatic evaluation of a PI arginine aminocoumarin substrate library for trypsin-like serine proteases. Bioorg. Med. Chem. Lett. 10, 2291-4.

116. Furlong, S. T., Mauger, R. C, Strimpler, A. M., Liu, Y. P., Morris, F. X. & Edwards, P. D. (2002). Synthesis and physical characterization of a PI arginine combinatorial library, and its application to the determination of the substrate specificity of serine peptidases. Bioorg. Med. Chem. 10, 3637-47.

117. Rypniewski, W. R., Perrakis, A., Vorgias, C. E. & Wilson, K. S. (1994). Evolutionary divergence and conservation of trypsin. Protein Eng. 7, 57-64.

118. Perona, J. J. & Craik, C. S. (1997). Evolutionary divergence of substrate specificity within the chymotrypsin-like serine protease fold. J. Biol. Chem. 272, 29987-90.

119. Sichler, K., Hopfner, K. P., Kopetzki, E., Huber, R., Bode, W. & Brandstetter, H. (2002). The influence of residue 190 in the SI site of trypsin-like serine proteases on substrate selectivity is universally conserved. FEBS Lett. 530, 220-4.

119 120. Rose, T. & Di Cera, E. (2002). Substrate recognition drives the evolution of serine proteases. J Biol Chem 277, 19243-6.

121. Yamane, T., Kobuke, M., Tsutsui, H., Toida, T., Suzuki, A., Ashida, T., Kawata, Y. & Sakiyama, F. (1991). Crystal structure of Streptomyces erythraeus trypsin at 2.7 A resolution. J. Biochem. (Tokyo). 110, 945-50.

122. Banfield, D. K. & MacGillivray, R. T. (1992). Partial characterization of vertebrate prothrombin cDNAs: amplification and sequence analysis of the B chain of thrombin from nine different species. Proc. Natl. Acad. Sci. USA 89, 2779-83.

123. Banfield, D. K., Irwin, D. M., Walz, D. A. & MacGillivray, R. T. (1994). Evolution of prothrombin: isolation and characterization of the cDNAs encoding chicken and hagfish prothrombin. J. Mol. Evol. 38, 111-SI.

124. Liu, X. L., Wazer, D. E., Watanabe, K. & Band, V. (1996). Identification of a novel serine protease-like gene, the expression of which is down-regulated during breast cancer progression. Cancer Res. 56, 3371-9.

125. Tobin, M. B., Gustafsson, C. & Huisman, G. W. (2000). Directed evolution: the 'rational' basis for 'irrational' design. Curr. Opin. Struct. Biol. 10, 421-7.

126. Hedstrom, L., Lin, T. Y. & Fast, W. (1996). Hydrophobic interactions control zymogen activation in the trypsin family of serine proteases. Biochemistry 35,4515- 23.

127. Jenny, R. J., Mann, K. G. & Lundblad, R. L. (2003). A critical review of the methods for cleavage of fusion proteins with thrombin and factor Xa. Protein Expr. Purif. 31, 1-11.

128. Cloutier, S. M., Chagas, J. R., Mach, J. P., Gygi, C. M., Leisinger, H. J. & Deperthes, D. (2002). Substrate specificity of human kallikrein 2 (hK2) as determined by phage display technology. Eur. J. Biochem. 269, 2747-54.

129. Sheppeck, J. E., 2nd, Kar, H., Gosink, L., Wheatley, J. B., Gjerstad, E., Loftus, S. M., Zubiria, A. R. & Jane, J. W. (2000). Synthesis of a statistically exhaustive fluorescent peptide substrate library for profiling protease specificity. Bioorg. Med. Chem. Lett. 10, 2639-42.

130. Pacini, L., Vitelli, A., Filocamo, G., Bartholomew, L., Brunetti, M., Tramontano, A., Steinkuhler, C. & Migliaccio, G. (2000). In vivo selection of protease cleavage sites by using chimeric Sindbis virus libraries. J. Virol. 74, 10563-70.

131. Rosse, G., Kueng, E., Page, M. G., Schauer-Vukasinovic, V., Giller, T., Lahm, H. W., Hunziker, P. & Schlatter, D. (2000). Rapid identification of substrates for novel proteases using a combinatorial peptide library. J. Comb. Chem. 2, 461-6.

120 132. Pal, G., Szilagyi, L. & Graf, L. (1996). Stable monomeric form of an originally dimeric serine proteinase inhibitor, ecotin, was constructed via site directed mutagenesis. FEBS Lett. 385, 165-70.

133. Herron, D. K., Goodson, T., Jr., Wiley, M. R., Weir, L. C., Kyle, J. A., Yee, Y. K, Tebbe, A. L., Tinsley, J. M., Mendel, D., Masters, J. J., Franciskovich, J. B., Sawyer, J. S., Beight, D. W., Ratz, A. M., Milot, G., Hall, S. E., Klimkowski, V. J., Wikel, J. H., Eastwood, B. J., Towner, R. D., Gifford-Moore, D. S., Craft, T. J. & Smith, G. F. (2000). 1,2-Dibenzamidobenzene inhibitors of human factor Xa. J. Med. Chem. 43, 859-72.

134. Abato, P., Conroy, J. L. & Seto, C. T. (1999). Combinatorial library of serine and cysteine protease inhibitors that interact with both the S and S' binding sites. J. Med. Chem. 42,4001-9.

135. Kiczak, L., Koscielska, K, Otlewski, J., Czerwinski, M. & Dadlez, M. (1999). Phage display selection of PI mutants of BPTI directed against five different serine proteinases. Biol. Chem. 380, 101-5;

136. Meldal, M. (1998). The solid-phase library assay. Methods Mol. Biol. 87, 75-82.

137. Pannekoek, H., van Meijer, M., Schleef, R. R., Loskutoff, D. J. & Barbas, C. F., 3rd. (1993). Functional display of human plasminogen-activator inhibitor 1 (PAI-1) on phages: novel perspectives for structure-function analysis by error-prone DNA synthesis. Gene 128, 135-40.

138. Ross, J., Jiang, H., Kanost, M. R. & Wang, Y. (2003). Serine proteases and their homologs in the Drosophila melanogaster genome: an initial analysis of sequence conservation and phylogenetic relationships. Gene 304, 117-31.

139. Terado, T., Nonaka, M. I., Nonaka, M. & Kimura, H. (2002). Conservation of the modular structure of complement factor I through vertebrate evolution. Dev. Comp. Immunol. 26, 403-13.

140. Rezaie, A. R. (1996). Role of residue 99 at the S2 subsite of factor Xa and activated protein C in enzyme specificity. J. Biol. Chem. 271, 23807-14.

141. Padmanabhan, K, Padmanabhan, K. P., Tulinsky, A., Park, C. H., Bode, W., Huber, R., Blankenship, D. T., Cardin, A. D. & Kisiel, W. (1993). Structure of human des(l- 45) factor Xa at 2.2 A resolution. J. Mol. Biol. 232, 947-66.

142. Reyda, S., Sohn, C, Klebe, G., Rail, K., Ullmann, D., Jakubke, H. D. & Stubbs, M. T. (2003). Reconstructing the binding site of factor Xa in trypsin reveals ligand-induced structural plasticity. J. Mol. Biol. 325, 963-77.

121 143. Hsu, Y. C, Hamaguchi, N., Chang, Y. J. & Lin, S. W. (2001). The distinct roles that Gin-192 and Glu-217 of factor IX play in selectivity for macromolecular substrates and inhibitors. Biochemistry 40, 11261-9.

144. Gaboriaud, C, Rossi, V., Bally, I., Arlaud, G. J. & Fontecilla-Camps, J. C. (2000). Crystal structure of the catalytic domain of human complement els: a serine protease with a handle. EMBO J. 19, 1755-65.

145. Carvalho, A. L., Sanz, L., Barettino, D., Romero, A., Calvete, J. J. & Romao, M. J. (2002). Crystal structure of a prostate kallikrein isolated from stallion seminal plasma: a homologue of human PSA. J. Mol. Biol. 322, 325-37.

146. Rezaie, A. R. & Esmon, C. T. (1996). Molecular basis of residue 192 participation in determination of coagulation protease specificity. Eur. J. Biochem. 242, 477-84.

147. Rezaie, A. R. & Esmon, C. T. (1995). Contribution of residue 192 in factor Xa to enzyme specificity and function. 7. Biol. Chem. 270,16176-81.

148. Rezaie, A. R. & Esmon, C. T. (1993). Conversion of glutamic acid 192 to glutamine in activated protein C changes the substrate specificity and increases reactivity toward macromolecular inhibitors. J. Biol. Chem. 268, 19943-8.

149. Nikai, T., Ohara, A., Komori, Y., Fox, J. W. & Sugihara, H. (1995). Primary structure of a coagulant enzyme, bilineobin, from Agkistrodon bilineatus venom. Arch. Biochem. Biophys. 318, 89-96.

150. Komori, Y., Nikai, T., Ohara, A., Yagihashi, S. & Sugihara, H. (1993). Effect of bilineobin, a thrombin-like proteinase from the venom of common cantil (Agkistrodon bilineatus). Toxicon 31, 257-70.

151. Castro, H. C, Silva, D. M., Craik, C. & Zingali, R. B. (2001). Structural features of a snake venom thrombin-like enzyme: thrombin and trypsin on a single catalytic platform? Biochim. Biophys. Acta 1547, 183-95.

152. Remington, S. J., Woodbury, R. G., Reynolds, R. A., Matthews, B. W. & Neurath, H. (1988). The structure of rat mast cell protease II at 1.9-A resolution. Biochemistry 27, 8097-105.

153. Esmon, C. T. & Mather, T. (1998). Switching serine protease specificity. Nat. Struct. Biol. 5, 933-7.

154. Tsiang, M., Jain, A. K., Dunn, K. E., Rojas, M. E., Leung, L. L. & Gibbs, C. S. (1995). Functional mapping of the surface residues of human thrombin. J. Biol. Chem. 210, 16854-63.

122 155. Chang, Y. J., Hamaguchi, N., Chang, S. C, Ruf, W., Shen, M. C. & Lin, S. W. (1999). Engineered recombinant factor VIIQ217 variants with altered inhibitor specificities. Biochemistry 38,10940-8.

156. Zhang, E. & Tulinsky, A. (1997). The molecular environment of the Na+ binding site of thrombin. Biophys. Chem. 63, 185-200.

157. Bode, W., Mayr, I., Baumann, U., Huber, R., Stone, S. R. & Hofsteenge, J. (1989). The refined 1.9 A crystal structure of human alpha-thrombin: interaction with D-Phe- Pro-Arg chloromethylketone and significance of the Tyr-Pro-Pro-Trp insertion segment. EMBO J. 8, 3467-75.

158. Kemball-Cook, G., Johnson, D. J., Tuddenham, E. G. & Harlos, K. (1999). Crystal structure of active site-inhibited human coagulation factor Vila (des-Gla). J. Struct. Biol. 127, 213-23.

159. Fujikawa, K., Chung, D. W., Hendrickson, L. E. & Davie, E. W. (1986). Amino acid sequence of human factor XI, a blood coagulation factor with four tandem repeats that are highly homologous with plasma prekallikrein. Biochemistry 25, 2417-24.

160. Wells, J. A. (1990). Additivity of mutational effects in proteins. Biochemistry 29, 8509-17.

161. LiCata, V. J. & Ackers, G. K. (1995). Long-range, small magnitude nonadditivity of mutational effects in proteins. Biochemistry 34, 3133-9.

162. Guinto, E. R., Caccia, S., Rose, T., Futterer, K., Waksman, G. & Di Cera, E. (1999). Unexpected crucial role of residue 225 in serine proteases. Proc. Natl. Acad. Sci. USA 96, 1852-7.

163. Schmidt, A. E., Padmanabhan, K., Underwood, M. C, Bode, W., Mather, T. & Bajaj, S. P. (2002). Thermodynamic linkage between the SI site, the Na+ site, and the Ca2+ site in the protease domain of human activated protein C (APC). Sodium ion in the APC crystal structure is coordinated to four carbonyl groups from two separate loops. J. Biol. Chem. 277, 28987-95.

164. Underwood, M. C, Zhong, D., Mathur, A., Heyduk, T. & Bajaj, S. P. (2000). Thermodynamic linkage between the SI site, the Na+ site, and the Ca2+ site in the protease domain of human coagulation factor xa. Studies on catalytic efficiency and inhibitor binding. J. Biol. Chem. 275, 36876-84.

165. Pineda, A.O., Savvides, S.N., Waksman, G., & di Cera, E. (2003). Crystal structure of the anticoagulant slow form of thrombin. J. Biol. Chem. 277, 40177-80.

166. He, M., Jin, L. & Austen, B. (1993). Specificity of factor Xa in the cleavage of fusion proteins. J. Protein Chem. 12, 1-5.

123 167. Phan, J., Zdanov, A., Evdokimov, A. G., Tropea, J. E., Peters, H. K., 3rd, Kapust, R. B., Li, M., Wlodawer, A. & Waugh, D. S. (2002). Structural basis for the substrate specificity of tobacco etch virus protease. J. Biol. Chem. 277, 50564-72.

168. Rawlings, N. D., Tolle, D. P. & Barrett, A. J. (2004). MEROPS: the peptidase database. Nucleic Acids Res. 32, D160-4.

169. Bell, J. K., Goetz, D. H., Mahrus, S., Harris, J. L., Fletterick, R. J. & Craik, C. S. (2003). The oligomeric structure of human granzyme A is a determinant of its extended substrate specificity. Nat. Struct. Biol. 10, 527-34.

170. Odake, S., Kam, C. M., Narasimhan, L., Poe, M., Blake, J. T., Krahenbuhl, O., Tschopp, J. & Powers, J. C. (1991). Human and murine cytotoxic T lymphocyte serine proteases: subsite mapping with peptide thioester substrates and inhibition of enzyme activity and cytolysis by isocoumarins. Biochemistry 30, 2217-27.

171. Pham, C. T., Thomas, D. A., Mercer, J. D. & Ley, T. J. (1998). Production of fully active recombinant murine granzyme B in yeast. J. Biol. Chem. 273, 1629-33.

172. Edwards, K. M., Kam, C. M., Powers, J. C. & Trapani, J. A. (1999). The human cytotoxic T cell granule serine protease granzyme H has chymotrypsin-like () activity and is taken up into cytoplasmic vesicles reminiscent of granzyme B- containing endosomes. J. Biol. Chem. 214, 30468-73.

173. Smyth, M. J., Sayers, T. J., Wiltrout, T., Powers, J. C. & Trapani, J. A. (1993). Met- ase: cloning and distinct chromosomal location of a serine protease preferentially expressed in human natural killer cells. J. Immunol. 151, 6195-205.

174. Page, M. J., Wong, S. L., Hewitt, J., Strynadka, N. C. & MacGillivray, R. T. (2003). Engineering the primary substrate specificity of Streptomyces griseus trypsin. Biochemistry 42, 9060-6.

175. Zamolodchikova, T. S., Vorotyntseva, T. I. & Antonov, V. K. (1995). Duodenase, a new serine protease of unusual specificity from bovine duodenal mucosa. Purification and properties. Eur. J. Biochem. 227, 866-72.

176. Joseph, J. S., Chung, M. C, Jeyaseelan, K. & Kini, R. M. (1999). Amino acid sequence of trocarin, a prothrombin activator from Tropidechis carinatus venom: its structural similarity to coagulation factor Xa. Blood 94, 621-31.

177. Matsui, T., Fujimura, Y. & Titani, K. (2000). Snake venom proteases affecting hemostasis and thrombosis. Biochim. Biophys. Acta 1411, 146-56.

178. Kamata, K., Kawamoto, H., Honma, T., Iwama, T. & Kim, S. H. (1998). Structural basis for chemical inhibition of human blood coagulation factor Xa. Proc. Natl. Acad. Sci. USA 95, 6630-5.

124 179. DiBella, E. E. & Scheraga, H. A. (1996). The role of the insertion loop around tryptophan 148 in tthe activity of thrombin. Biochemistry 35, 4427-33.

180. Toso, R., Pinotti, M., High, K. A., Pollak, E. S. & Bernardi, F. (2002). A frequent human coagulation Factor VII mutation (A294V, cl52) in loop 140s affects the interaction with activators, tissue factor and substrates. Biochem. J. 363, 411-6.

181. van de Locht, A., Bode, W., Huber, R., Le Bonniec, B. F., Stone, S. R., Esmon, C. T. & Stubbs, M. T. (1997). The thrombin E192Q-BPTI complex reveals gross structural rearrangements: implications for the interaction with and thrombomodulin. EMBO J. 16, 2977-84.

182. Soejima, K., Yuguchi, M., Mizuguchi, J., Tomokiyo, K., Nakashima, T., Nakagaki, T. & Iwanaga, S. (2002). The 99 and 170 loop-modified factor Vila mutants show enhanced catalytic activity without tissue factor. J. Biol. Chem. 277, 49027-35.

183. Timm, D. E. (1997). The crystal structure of the mouse glandular kallikrein-13 (prorenin converting enzyme). Protein Sci. 6, 1418-25.

184. Oka, T., Hakoshima, T., Itakura, M., Yamamori, S., Takahashi, M., Hashimoto, Y., Shiosaka, S. & Kato, K. (2002). Role of loop structures of neuropsin in the activity of serine protease and regulated secretion. J. Biol. Chem. 277, 14724-30.

185. Karlson, U., Pejler, G., Tomasini-Johansson, B. & Hellman, L. (2003). Extended substrate specificity of rat mast cell protease 5, a rodent alpha-chymase with elastase- like primary specificity. J. Biol. Chem. 278, 39625-31.

186. Gal, P. & Ambrus, G. (2001). Structure and function of complement activating enzyme complexes: Cl and MBL-MASPs. Curr. Protein Pept. Sci. 2, 43-59.

187. Huang, M., Rigby, A. C, Morelli, X., Grant, M. A., Huang, G., Furie, B., Seaton, B. & Furie, B. C. (2003). Structural basis of membrane binding by Gia domains of vitamin K-dependent proteins. Nat. Struct. Biol. 10, 751-6.

188. Horrevoets, A. J., Smilde, A., de Vries, C. & Pannekoek, H. (1994). The specific roles of finger and kringle 2 domains of tissue-type plasminogen activator during in vitro fibrinolysis. J. Biol. Chem. 269, 12639-44.

189. Smith, B. O., Downing, A. K., Driscoll, P. C, Dudgeon, T. J. & Campbell, I. D. (1995). The solution structure and backbone dynamics of the fibronectin type I and epidermal growth factor-like pair of modules of tissue-type plasminogen activator. Structure 3, 823-33.

190. Krem, M. M. & Di Cera, E. (2001). Molecular markers of serine protease evolution. EMBOJ. 20, 3036-45.

125 191. Patthy, L. (1990). Evolutionary assembly of blood coagulation proteins. Semin Thromb. Hemost. 16, 245-59.

192. Patthy, L. (1990). Evolution of blood coagulation and fibrinolysis. Blood Coagul. Fibrinolysis 1, 153-66.

193. Patthy, L. (1985). Evolution of the proteases of blood coagulation and fibrinolysis by assembly from modules. Cell. 41, 657-63.

194. Krem, M. M., Rose, T. & Di Cera, E. (2000). Sequence determinants of function and evolution in serine proteases. Trends Cardiovasc. Med. 10, 171-6.

195. Takeda-Shitaka, M. & Umeyama, H. (1998). Elucidation of the cause for reduced activity of abnormal human plasmin containing an Ala55-Thr mutation: importance of highly conserved Ala55 in serine proteases. FEBS Lett. 425, 448-52.

196. Hedstrom, J., Sainio, V., Kemppainen, E., Haapiainen, R., Kivilaakso, E., Schroder, T., Leinonen, J. & Stenman, U. H. (1996). Serum complex of trypsin 2 and alpha 1 antitrypsin as diagnostic and prognostic marker of acute pancreatitis: clinical study in consecutive patients. BMJ 313, 333-7.

197. Nixon, A. E. & Firestine, S. M. (2000). Rational and "irrational" design of proteins and their use in biotechnology. IUBMB Life 49, 181-7.

198. Voigt, C. A., Kauffman, S. & Wang, Z. G. (2000). Rational evolutionary design: the theory of in vitro protein evolution. Adv. Protein Chem. 55, 79-160.

199. Arnold, F. H. & Moore, J. C. (1997). Optimizing industrial enzymes by directed evolution. Adv. Biochem. Eng. Biotechnol. 58,1-14.

200. Kramer, R. A., Schaber, M. D., Skalka, A. M., Ganguly, K, Wong-Staal, F. & Reddy, E. P. (1986). HTLV-III gag protein is processed in yeast cells by the virus pol- protease. Science 231, 1580-4.

201. Lightfoote, M. M., Coligan, J. E., Folks, T. M., Fauci, A. S., Martin, M. A. & Venkatesan, S. (1986). Structural characterization of reverse transcriptase and endonuclease polypeptides of the acquired immunodeficiency syndrome retrovirus. J. Virol. 60, 771-5.

202. Brik, A. & Wong, C. H. (2003). HIV-1 protease: mechanism and drug discovery. Org. Biomol. Chem. 1, 5-14.

203. Condra, J. H., Schleif, W. A., Blahy, O. M., Gabryelski, L. J., Graham, D. J., Quintero, J. C, Rhodes, A., Robbins, H. L., Roth, E., Shivaprakash, M. & et al. (1995). In vivo emergence of HIV-1 variants resistant to multiple protease inhibitors. Nature 314, 569-71.

126 204. Wilson, S. I., Phylip, L. H., Mills, J. S., Gulnik, S. V., Erickson, J. W., Dunn, B. M. & Kay, J. (1997). Escape mutants of HIV-1 proteinase: enzymic efficiency and susceptibility to inhibition. Biochim. Biophys. Acta 1339,113-25.

205. Robinson, L. H., Myers, R. E., Snowden, B. W., Tisdale, M. & Blair, E. D. (2000). HIV type 1 protease cleavage site mutations and viral fitness: implications for drug susceptibility phenotyping assays. AIDS Res. Hum. Retroviruses 16, 1149-56.

206. Chirumamilla, R. R., Muralidhar, R., Marchant, R. & Nigam, P. (2001). Improving the quality of industrially important enzymes by directed evolution. Mol. Cell. Biochem. 224, 159-68.

207. Dalby, P. A. (2003). Optimising enzyme function by directed evolution. Curr. Opin. Struct. Biol. 13, 500-5.

208. McGrath, M. E., Erpel, T., Bystroff, C. & Hetterick, R. J. (1994). Macromolecular chelation as an improved mechanism of protease inhibition: structure of the ecotin- trypsin complex. EMBO J. 13, 1502-7.

209. McGrath, M. E., Erpel, T., Browner, M. F. & Hetterick, R. J. (1991). Expression of the protease inhibitor ecotin and its co-crystallization with trypsin. J. Mol. Biol. 222, 139-42.

210. Seymour, J. L., Lindquist, R. N., Dennis, M. S., Moffat, B., Yansura, D., Reilly, D., Wessinger, M. E. & Lazarus, R. A. (1994). Ecotin is a potent anticoagulant and reversible tight-binding inhibitor of factor Xa. Biochemistry 33, 3949-58.

211. Pal, G., Sprengel, G., Patthy, A. & Graf, L. (1994). Alteration of the specificity of ecotin, an E. coli serine proteinase inhibitor, by site directed mutagenesis. FEBS Lett. 342, 57-60.

212. Leffel, S. M., Mabon, S. A. & Stewart, C. N., Jr. (1997). Applications of green fluorescent protein in plants. BioTechniques 23, 912-8.

213. Cormack, B. (1998). Green fluorescent protein as a reporter of transcription and protein localization in fungi. Curr. Opin. Microbiol. 1, 406-10.

214. Phillips, G. J. (2001). Green fluorescent protein—a bright idea for the study of bacterial protein localization. FEMS Microbiol. Lett. 204, 9-18.

215. Ormo, M., Cubitt, A. B., Kallio, K, Gross, L. A., Tsien, R. Y. & Remington, S. J. (1996). Crystal structure of the Aequorea victoria green fluorescent protein. Science 273, 1392-5.

216. Tsien, R. Y. (1998). The green fluorescent protein. Annu. Rev. Biochem. 67, 509-44.

127 217. Chiang, C. F., Okou, D. T., Griffin, T. B., Verret, C. R. & Williams, M. N. (2001). Green fluorescent protein rendered susceptible to proteolysis: positions for protease- sensitive insertions. Arch. Biochem. Biophys. 394, 229-35.

218. Waldo, G. S. (2003). Genetic screens and directed evolution for protein solubility. Curr. Opin. Chem. Biol. 7, 33-8.

219. Taylor, F. R., Bixler, S. A., Budman, J. I., Wen, D., Karpusas, M., Ryan, S. T., Jaworski, G. J., Safari-Fard, A., Pollard, S. & Whitty, A. (1999). Induced fit activation mechanism of the exceptionally specific serine protease, complement . Biochemistry 38, 2849-59.

220. Eigenbrot, C. (2002). Structure, function, and activation of coagulation factor VII. Curr. Protein. Pept. Sci. 3, 287-99.

221. Venkateswarlu, D., Perera, L., Darden, T. & Pedersen, L. G. (2002). Structure and dynamics of zymogen human blood coagulation factor X. Biophys. J. 82, 1190-206.

222. Eigenbrot, C, Kirchhofer, D., Dennis, M. S., Santell, L., Lazarus, R. A., Stamos, J. & Ultsch, M. H. (2001). The factor VE zymogen structure reveals reregistration of beta strands during activation. Structure {Comb.) 9, 627-36.

223. Hink-Schauer, C, Estebanez-Perpina, E., Wilharm, E., Fuentes-Prior, P., Klinkert, W., Bode, W. & Jenne, D. E. (2002). The 2.2-A crystal structure of human pro-granzyme K reveals a rigid zymogen with unusual features. J. Biol. Chem. 277, 50923-33.

224. Budayova-Spano, M., Lacroix, M., Thielens, N. M., Arlaud, G. J., Fontecilla-Camps, J. C. & Gaboriaud, C. (2002). The crystal structure of the zymogen catalytic domain of complement protease Clr reveals that a disruptive mechanical stress is required to trigger activation of the Cl complex. EMBO J. 21, 231-9.

225. Krishnaswamy, S., Jones, K. C. & Mann, K. G. (1988). Prothrombinase complex assembly. Kinetic mechanism of enzyme assembly on phospholipid vesicles. J. Biol. Chem. 263, 3823-34.

226. Pingoud, A. & Jeltsch, A. (1997). Recognition and cleavage of DNA by type-II restriction endonucleases. Eur. J. Biochem. 246, 1-22.

227. Bordusa, F. (2002). Proteases in organic synthesis. Chem. Rev. 102, 4817-68.

228. Shindyalov, I. N. & Bourne, P. E. (1998). Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 11, 739-47.

229. Pasternak, A., Ringe, D. & Hedstrom, L. (1999). Comparison of anionic and cationic trypsinogens: the anionic activation domain is more flexible in solution and differs in its mode of BPTI binding in the crystal structure. Protein Sci. 8, 253-8.

128 230. Brandstetter, H., Bauer, M., Huber, R., Lollar, P. & Bode, W. (1995). X-ray structure of clotting factor IXa: active site and module structure related to Xase activity and hemophilia B. Proc. Nat.lAcad. Sci USA 92, 9796-800.

231. Mather, T., Oganessyan, V., Hof, P., Huber, R., Foundling, S., Esmon, C. & Bode, W. (1996). The 2.8 A crystal structure of Gla-domainless activated protein C. EMBO J. 15, 6822-31.

129