CRYSTALLOGRAPHIC ANALYSIS OF ESCHERICHIA COLI TYPE I SIGNAL PEPTIDASE IN COMPLEX WITH PEPTIDE-BASED INHIBITORS

Chuanyun Luo

M.Sc., Yunnan University, China, 1999

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

In the Department of Molecular Biology and Biochemistry

© Chuanyun Luo 2007

SIMON FRASER UNIVERSITY

Spring 2007

All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without permission of the author.

APPROVAL

Name: Chuanyun Luo

Degree: Master of Science

Title of Thesis: Crystallographic Analysis of Escherichia coli Type I Signal Peptidase in Complex with Peptide-based Inhibitors

Examining Committee:

Chair: Dr. Neil Branda, Professor, Department of Chemistry

Dr. Mark Paetzel, Senior Supervisor, Assistant Professor, Department of Molecular Biology and Biochemistry

Dr. Rosemary B. Cornell, Supervisor, Professor, Department of Molecular Biology and Biochemistry

Dr. Christopher Beh, Supervisor, Assistant Professor, Department of Molecular Biology and Biochemistry

Dr. Edgar C. Young, Internal Examiner, Assistant Professor, Department of Molecular Biology and Biochemistry

Date Defended/Approved: Mar 19, 2007

SIMON FRASER UNIVERSITY LIBRARY

DECLARATION OF PARTIAL COPYRIGHT LICENCE

The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users.

The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection (currently available to the public at the "Institutional Repository" link of the SFU Library website at:

The author has further agreed that permission for multiple copying of this work for scholarly purposes may be granted by either the author or the Dean of Graduate Studies.

It is understood that copying or publication of this work for financial gain shall not be allowed without the author's written permission.

Permission for public performance, or limited permission for private scholarly use, of any multimedia materials forming part of this work, may have been granted by the author. This information may be found on the separately catalogued multimedia material and in the signed Partial Copyright Licence.

The original Partial Copyright Licence attesting to these terms, and signed by this author, may be found in the original bound copy of this work, retained in the Simon Fraser University Archive.

Simon Fraser University Library Burnaby, BC, Canada

Revised: Spring 2007

ABSTRACT

Escherichia coli type I signal peptidase (SPase I) is a membrane-bound endopeptidase that cleaves amino-terminal signal peptides from secreted proteins and some membrane proteins. To contribute to the structure-based design of a novel class of antibiotics, X-ray crystal structures of SPase I in complex with peptide-based inhibitors have been solved. Nine SPase I-inhibitor complexes were co-crystallized, five of which diffracted to 2.6 Å or better. Two crystal structures were solved by molecular replacement and refined. Crystallographic analysis of these two SPase I-inhibitor complexes will be useful for evaluating and optimizing these novel antibiotic drug candidates.

The crystal contacts and structural variations of the SPase I molecules from each SPase I crystal structure solved to date were analyzed and compared in a first step towards the design of new constructs of SPase I that may crystallize more readily and in a more ordered fashion.

KEYWORDS: signal peptidase, Ser/Lys dyad catalysis, protease, inhibitor, co-crystallization, X-ray crystallographic analysis, crystal packing contacts, crystal engineering

To my family

Thank you for all your love and support

ACKNOWLEDGEMENTS

I would like to express my deep and sincere gratitude to my senior supervisor, Dr. Mark Paetzel, for providing such a great opportunity for me to study and work in his lab, and for giving me the chance to appreciate the beauty and challenges of protein chemistry. Dr. Paetzel taught me a great deal, and his help, guidance, and encouragement were essential to the success of this thesis project.

I am deeply grateful to the members of my committee, Dr. Rosemary Cornell and Dr. Christopher Beh, for serving on my supervisory committee and for providing valuable discussions, advice, and suggestions over these years. Their detailed reviews, constructive criticisms and excellent advice had a strong influence on my thesis. I would also like to thank Dr. Edgar Young and Dr. Neil Branda for being members of my examining committee.

I would like to thank Dr. Malcolm Page and his colleagues at Basilea Pharmaceutica Ltd for providing the inhibitors. I want to thank Duncan for his help in providing maintenance and suggesting various computational programs and techniques. My warm thanks to Kathryn, Livleen, Ash, and Nancy in the MBB departmental office for their organizational and administrative support during my Master's training program, and I would like to thank the private graduate fellowship of SFU for supporting me twice during my Master's study.

Dr. Mark Paetzel's lab is a wonderful workplace, and I will always cherish the friendships and memories from all members of the lab. I would like to express my warm and sincere thanks to our lab manager Deidre, who was a pleasure to work with and was very helpful in providing lab supplies, information, and encouragement. I would like to thank Dr. Jaeyong Lee for teaching me the computational programs, and for helping me with problems during the crystal structure solving process as well as in my research. I would like to thank Dr. David Oliver for proofreading my whole thesis and providing valuable input on both my thesis writing and my presentation. I would like to thank Dr. Anat Feldman for her helpful discussions on plasmid cloning and her encouragement. I want to thank the graduate students Yuliya and Apollos for sharing their knowledge and proofreading my thesis. I want to thank the new graduate students Ivy, Kelly, Alison, Charles, Sung-Eun, and ISS student Alan, as well as my previous lab colleagues Karen and Eve, for their support and encouragement.

My warm thanks to all of my friends from the Chemistry department, the Biological Sciences department, and the Molecular Biology and Biochemistry department for their friendship and support.

Especially, I would like to give my special thanks to my husband Yong, my daughter Jiaohan and my parents, whose love and support enabled me to complete this work.

TABLE OF CONTENTS

Approval
Abstract
Dedication
Acknowledgements
Table of Contents
List of Figures
List of Tables
Glossary
Chapter 1: INTRODUCTION
  1.1 Protein Export and Bacterial Type I Signal Peptidase
  1.2 Signal Peptides
  1.3 Bacterial Type I Signal Peptidase
    1.3.1 The First Characterization
    1.3.2 The Investigation of the SPase Catalytic Mechanism
    1.3.3 Three-Dimensional Structure
  1.4 Substrates (Pre-proteins) of Bacterial SPase I
  1.5 A Proposed Catalytic Mechanism for Bacterial SPase I
  1.6 Mutants of Bacterial SPase I at the Active and Substrate-binding Site
  1.7 Inhibitors of
    1.7.1 General Concept of Inhibitor
    1.7.2 Serine Protease Inhibitor
  1.8 Inhibitors of Bacterial SPase I
  1.9 Bacterial SPase I: A Novel Antibacterial Drug Target
  1.10 Crystallographic Analysis: A Vital Tool in Drug Discovery
  1.11 Co-crystals of Protein-inhibitor Complexes
    1.11.1 Co-crystallization
    1.11.2 Soaking
  1.12 Overview of Objectives
Chapter 2: THE CRYSTALLIZATION AND PRELIMINARY X-RAY ANALYSIS OF E. COLI SPASE I IN COMPLEX WITH PEPTIDE-BASED INHIBITORS
  2.1 Introduction
  2.2 Materials and Methods
    2.2.1 Recombinant Plasmids, Peptide-based Inhibitors, and Chemicals
    2.2.2 Overexpression of Protein
    2.2.3 Isolation of Protein Inclusion Bodies
    2.2.4 Solubilization of Inclusion Bodies and Protein Refolding
    2.2.5 Protein Purification by Column Chromatography
    2.2.6 Searching for Initial Co-crystallization Conditions of SPase I with Peptide-based Inhibitors
    2.2.7 Inhibitor Soaking into Preformed Crystals
    2.2.8 X-ray Diffraction Data Collection and Preliminary X-ray Diffraction Analysis
  2.3 Results and Discussion
    2.3.1 Purification of SPase Δ2-75, SPase Δ2-75 S90A, and SPase Δ2-75 K145A
    2.3.2 Crystallization Conditions for SPase Δ2-75 and SPase Δ2-75 S90A and the Result of Soaking Inhibitors
    2.3.3 Co-crystallization Conditions for SPase I in Complex with Different Peptide-based Inhibitors
    2.3.4 Preliminary X-ray Crystallographic Analysis of E. coli SPase I in Complex with Peptide-based Inhibitors
Chapter 3: CRYSTAL STRUCTURE OF SPASE Δ2-75 IN COMPLEX WITH A GLYCO-LIPOHEXAPEPTIDE
  3.1 Introduction
  3.2 Materials and Methods
    3.2.1 The Complex of SPase Δ2-75 and BAL 4850C
    3.2.2 Co-crystallization of SPase Δ2-75 with BAL 4850C
    3.2.3 X-ray Diffraction Data Collection
    3.2.4 Phasing, Model Building, and Refinement
    3.2.5 Structural Analysis
    3.2.6 Figure Preparation
  3.3 Results and Discussion
    3.3.1 A New Co-crystallization Condition
    3.3.2 Crystallographic Structure Solution of SPase Δ2-75 in the Complex with BAL4850C
    3.3.3 Model Features of SPase Δ2-75 in the Complex with BAL4850C
    3.3.4 Structure Comparison of SPase Δ2-75 in Complex with BAL 4850C and Previously Solved SPase Δ2-75 Structures
    3.3.5 Glyco-lipohexapeptide Inhibitor (BAL 4850C) at the Active Site
    3.3.6 Analysis of the Interactions between the Glyco-lipohexapeptide (BAL4850C) and SPase I
    3.3.7 Comparison of the Substrate Binding Site of SPase Δ2-75/BAL 4850C with Apo-, Acyl-enzyme, and Lipopeptide-enzyme Complex
Chapter 4: CRYSTAL STRUCTURE OF SPASE Δ2-75 IN A TERNARY COMPLEX WITH ARYLOMYCIN A2 AND A SULTAM/MORPHOLINO DERIVATIVE BAL0019193
  4.1 Introduction
  4.2 Materials and Methods
    4.2.1 The Ternary Complex of SPase Δ2-75 with Arylomycin A2 and a Sultam/Morpholino Derivative BAL0019193
    4.2.2 Co-crystallization of the Ternary Complex of SPase Δ2-75 with Arylomycin A2 and BAL0019193
    4.2.3 X-ray Diffraction Data Collection
    4.2.4 Phasing, Model Building, and Refinement
    4.2.5 Structural Analysis
    4.2.6 Figure Preparation
  4.3 Results and Discussion
    4.3.1 Co-crystallization, Structure Solution, and Refinement
    4.3.2 Structural Comparison of SPase Δ2-75 in Complex with Arylomycin A2/BAL0019193 and Previously Solved SPase Δ2-75 Structures
    4.3.3 Inhibitor Arylomycin A2 and BAL0019193 at the Active Site
    4.3.4 Analysis of Interactions among Arylomycin A2 and BAL0019193 as well as SPase I
    4.3.5 Comparison of the Substrate Binding Site of SPase Δ2-75/Arylomycin A2/BAL0019193 with Apo-enzyme, Acyl-enzyme, Lipopeptide-enzyme, and Glycolipopeptide-enzyme Complex
Chapter 5: THEORETICAL EXAMINATION OF THE CRYSTAL-PACKING CONTACTS AND STRUCTURAL VARIATION IN E. COLI SPASE Δ2-75 CRYSTALS
  5.1 Introduction
  5.2 Materials and Methods
    5.2.1 The Structure Data and the Analysis of Crystal-Packing Contacts
    5.2.2 The Structural Variation Analysis
    5.2.3 Surface Area of Contact
    5.2.4 Figure Preparation
  5.3 Results and Discussion
    5.3.1 Crystallographic Characteristics of E. coli SPase Δ2-75 Crystals
    5.3.2 Overall Crystal Packing in the Three Crystal Lattices
    5.3.3 The Crystal-packing Contacts
    5.3.4 Variability in Loop Structure of SPase Δ2-75
    5.3.5 Comparison of Dynamic Regions of SPase Δ2-75 Molecules in Different Crystals
    5.3.6 The Dynamic Side Chains of the Active Site and Binding Site
    5.3.7 Future Directions of Crystal Engineering
Chapter 6: CONCLUSION
Appendix: CRYSTAL ENGINEERING OF E. COLI SPASE Δ2-80 FOR CRYSTALLIZATION IMPROVEMENT
  1. Introduction
  2. Materials and Methods
    2.1 Template, Primers and Gene Amplification
    2.2 Generating the Sticky Ends of the Insert DNA of SPase Δ2-80
    2.3 Construction of Recombinant Expression Plasmids
    2.4 SPase Δ2-80 DNA Sequencing
    2.5 Overexpression, Purification and Crystallization of Protein SPase Δ2-80
    2.6 In Vitro SPase Activity of SPase Δ2-80
  3. Results and Discussion
    3.1 PCR Amplification of SPase Δ2-80 Gene
    3.2 Cloning of SPase Δ2-80 pET Constructs
    3.3 The Result of SPase Δ2-80 DNA Sequencing
    3.4 Expression, Purification and Crystallization Conditions of Protein SPase Δ2-80
    3.5 Signal Peptidase Activity of SPase Δ2-80
Reference List

LIST OF FIGURES

Figure 1.1: Schematic representation of a Gram-negative bacterium
Figure 1.2: The bacterial SPase I and the SecYEG-translocase
Figure 1.3: A typical bacterial signal peptide
Figure 1.4: Membrane topology of E. coli SPase I
Figure 1.5: A ribbon representation of the general fold of SPase Δ2-75
Figure 1.6: The surface representation of SPase Δ2-75
Figure 1.7: The active and substrate-binding sites of E. coli SPase I
Figure 1.8: The structure of β-lactam inhibitor
Figure 1.9: The active site of SPase Δ2-75 in the complex with a β-lactam inhibitor
Figure 1.10: The proposed catalytic mechanism of bacterial SPase I
Figure 1.11: Structure of E. coli SPase I and view of the active and substrate-binding site
Figure 1.12: Classification of
Figure 1.13: Classification of serine protease inhibitors
Figure 1.14: The structure of Arylomycin A2
Figure 1.15: The active site of SPase Δ2-75 in the complex with Arylomycin A2
Figure 1.16: The structure of glycolipopeptide-based inhibitors
Figure 2.1: An outline of the experimental procedure
Figure 2.2: SDS-PAGE analysis of SPase Δ2-75 overexpression
Figure 2.3: The isolation of SPase Δ2-75 inclusion bodies
Figure 2.4: The gel filtration profile of SPase Δ2-75
Figure 2.5: SDS-PAGE analysis of gel filtration chromatography
Figure 2.6: The purification of SPase Δ2-75
Figure 2.7: The structures of inhibitors of bacterial SPase I (1)
Figure 2.8: The structures of inhibitors of bacterial SPase I (2)
Figure 2.9: The structures of inhibitors of bacterial SPase I (3)
Figure 3.1: The structure of a glyco-lipohexapeptide inhibitor of bacterial SPase I
Figure 3.2: Ramachandran plot of the model SPase Δ2-75/BAL 4850C
Figure 3.3: The geometry of the sugar group of BAL4850C
Figure 3.4: Electron density for glyco-lipohexapeptide BAL4850C bound in the active site of signal peptidase
Figure 3.5: The overall binding theme of BAL4850C
Figure 3.6: Diagram of BAL4850C interacting with SPase Δ2-75 at the active and substrate-binding sites
Figure 3.7: The active site superposition of SPase Δ2-75/BAL 4850C complex crystal structure with apo-enzyme, β-lactam inhibitor acyl-enzyme, and lipopeptide-enzyme crystal structures
Figure 3.8: Three conserved waters at the active site of SPase I
Figure 3.9: A superposition of the SPase inhibitors (glyco-lipohexapeptide BAL4850C and Arylomycin A2) at the active site of E. coli SPase I
Figure 4.1: Inhibitor structure of bacterial type I signal peptidase
Figure 4.2: Two different morphologic crystals of the ternary complex of SPase Δ2-75/Arylomycin A2/BAL0019193 co-existing in one drop
Figure 4.3: Ramachandran plot of the model SPase Δ2-75/Arylomycin A2/BAL0019193
Figure 4.4: Electron density for inhibitor Arylomycin A2 and BAL0019193 bound at the active site of E. coli SPase I
Figure 4.5: The overall binding theme of Arylomycin A2 and BAL0019193
Figure 4.6: Diagram of inhibitor Arylomycin A2 and BAL0019193 interacting with SPase Δ2-75 at the active and substrate-binding sites
Figure 4.7: The active site superposition of the ternary SPase Δ2-75/Arylomycin A2/BAL0019193 complex crystal structure with apo-enzyme, β-lactam inhibitor acyl-enzyme, lipopeptide-enzyme, and glycolipopeptide-enzyme crystal structures
Figure 4.8: Structural superposition of the SPase inhibitor Arylomycin A2 that is co-present with BAL0019193 in the ternary complex and Arylomycin A2 that is present alone in the lipopeptide-enzyme complex at the active site of E. coli SPase
Figure 5.1: Creation of a crystal
Figure 5.2: Primitive orthorhombic and tetragonal unit cell
Figure 5.3: Screw axes
Figure 5.4: Overall crystal-packing contacts (packing crystal lattice) in different crystal forms
Figure 5.5: Ribbon representations of the crystal-packing contact patches on each molecule of the orthorhombic crystals (PDB code 1B12, space group P2₁2₁2)
Figure 5.6: Ribbon representations of the crystal-packing contact patches on each molecule of the tetragonal crystals (PDB code 1KN9, space group P4₁2₁2)
Figure 5.7: Ribbon representations of the crystal-packing contact patches on each molecule of the tetragonal crystals (PDB code 1T7D, space group P4₃2₁2)
Figure 5.8: A schematic representation of the potential surface mutation sites on the primary structure of E. coli SPase Δ2-75
Figure 5.9: The surface Lys and Glu residues of SPase Δ2-75 molecule
Figure 5.10: Representations of the four dynamic loops of SPase Δ2-75 molecule
Figure 5.11: Structural superposition of the four molecules in the asymmetric unit of the orthorhombic covalent β-lactam containing crystal structure (penem-enzyme, space group P2₁2₁2, PDB code: 1B12)
Figure 5.12: Structural superposition of the four molecules in the asymmetric unit of the tetragonal apo-crystal (apo-enzyme, space group P4₁2₁2, PDB code: 1KN9)
Figure 5.13: Structural superposition of the two molecules in the asymmetric unit of the tetragonal lipopeptide containing crystal (lipopeptide-enzyme, space group P4₃2₁2, PDB code: 1T7D)
Figure 5.14: The active site of SPase Δ2-75 in the complex with Arylomycin A2
Figure 5.15: Structural superposition of ten molecules in the asymmetric unit of three distinct crystal forms (penem-enzyme, space group P2₁2₁2, PDB code: 1B12; apo-enzyme, space group P4₁2₁2, PDB code: 1KN9; lipopeptide-enzyme, space group P4₃2₁2, PDB code: 1T7D)
Figure 5.16: A close-up view of the active site and binding sites of all chain superposition of SPase I
Figure 5.17: Structural superposition of SPase Δ2-75 with another Ser/Lys protease UmuD'

LIST OF TABLES

Table 1.1: The historical time-line of the first characterization of E. coli SPase I
Table 1.2: Mutants of bacterial SPase I at the active and substrate-binding site
Table 1.3: Small-molecule inhibitors of serine protease (1)
Table 1.4: Small-molecule inhibitors of serine protease (2)
Table 1.5: Inhibitors of bacterial SPase I
Table 2.1: The initial crystallization conditions of SPase Δ2-75 S90A
Table 2.2: Crystals of E. coli SPase I in complex with peptide-based inhibitors
Table 2.3: The progress on each complex of SPase I/inhibitor
Table 3.1: The initial co-crystallization conditions of SPase Δ2-75 in complex with glyco-lipohexapeptide inhibitor (BAL4850C)
Table 3.2: Side chains with undefined electron density in the crystal structure model of SPase Δ2-75 in the complex with BAL 4850C
Table 3.3: Crystallographic data collection and refinement statistics of SPase Δ2-75/BAL 4850C complex
Table 3.4: Inhibitor-SPase Δ2-75 contact distance (glyco-lipohexapeptide BAL4850C)
Table 4.1: The initial co-crystallization conditions of SPase Δ2-75 in ternary complex with Arylomycin A2 and BAL0019193
Table 4.2: Crystallographic data collection and refinement statistics of the ternary SPase Δ2-75/Arylomycin A2/BAL0019193 complex
Table 4.3: A comparison of SPase Δ2-75 crystals and crystal structures
Table 4.4: Inhibitor-SPase Δ2-75 contact distance (Arylomycin A2 and BAL0019193)
Table 5.1: Crystallographic characteristics of SPase Δ2-75 crystals
Table 5.2: Hydrogen-bonded atom pairs (distance < 3.3 Å) from the inter-chains of the asymmetric unit and symmetry related molecules of SPase Δ2-75 crystals
Table 5.3: Residues involved in van der Waals interactions (distance < 5 Å) from the inter-chains of the asymmetric unit of SPase Δ2-75 crystals
Table 5.4: The solvent accessible area of lysine, glutamic acid, arginine, and glutamine residues of SPase Δ2-75 molecule
Table 5.5: Selected χ values of the side chain from the residues that construct the active and binding site

GLOSSARY

Å Angstroms (10⁻¹⁰ m)

ATP Adenosine Triphosphate

AU Asymmetric Unit, the basic building block of a crystal: the smallest unit that has no internal crystallographic symmetry and can be rotated and translated to generate one unit cell using only the crystallographic symmetry operations. An asymmetric unit may contain one molecule, more than one molecule, or a half or a quarter of a molecule if the molecule possesses internal symmetry.

B-factor Also called the temperature factor, a numerical indicator that measures the thermal motion and disorder of an atom. It can also point to errors built into the model. B-factors are influenced by local crystal packing: the lower the B-factor, the more ordered the packed region, and vice versa.

Completeness The number of crystallographic reflections measured in an X-ray diffraction data set, expressed as a percentage of the total number of reflections present at the specified resolution. Typically, completeness is nearly 100% for a very good data set, 80-90% for a good data set, and 40% for a poor data set.

Competitive inhibitor Acts as a molecular competitor against the substrate and binds to the substrate-binding site. Competitive inhibitors can bind to the free enzyme (E), but not to the enzyme-substrate complex (ES).

Crystal An ordered, three-dimensional, repeating array of molecules.

Crystal lattice The regularly spaced grid (defined by lengths and angles) formed by the origins of the individual unit cells.

Crystallographic Refinement A cyclic process of improving agreement between the molecular model and the crystallographic data. In each cycle, the computing program applies small optimizing shifts, such as adjustments of atomic positions, of the conformation of small regions, and of atomic oscillations or vibrations around the atomic positions, in order to obtain a highly precise structural model that matches the data.

DDM

DMSO Dimethyl Sulphoxide

Inhibitor An inhibitor is a molecule that reduces or eliminates the catalytic activity of an enzyme.

Irreversible inhibitor An irreversible inhibitor forms a strong covalent bond with a specific functional group of a catalytic residue of an enzyme and disables the enzyme permanently.

kDa kilodalton

Matthews Coefficient Also called the specific volume (Vm). It is the crystal volume per unit of protein molecular weight: Vm = (volume of asymmetric unit) / (molecular weight), in Å³/Da. The average is 2.4 Å³/Da, within a range of 1.9-4.2 Å³/Da.

# Mole./AU Number of molecules in one asymmetric unit (AU).

Mosaicity The width of the distribution of mis-orientation angles of all unit cells in a crystal, caused by imperfect order in the crystal. It has units of degrees. Lower mosaicity indicates better-ordered crystals and hence better diffraction.

Non-competitive inhibitor A non-competitive inhibitor never binds to the active site of the enzyme. It can bind to E and ES with identical affinity.

PAGE Polyacrylamide Gel Electrophoresis

PCR Polymerase Chain Reaction

PDB Protein Data Bank

PEG Polyethylene Glycol

RCSB Research Collaboratory for Structural Bioinformatics

Redundancy A statistical parameter, such as the standard deviation, is used to measure agreement among repeated measurements. Redundancy is calculated as (number of measured reflections) / (number of unique reflections).

R-factor A measure of agreement between the crystal structure model and the original X-ray diffraction data. From the model, the expected intensity of each reflection in the diffraction pattern is calculated, and then these calculated "data" are compared with the experimental data, which consist of measured positions and intensities. An R-factor is calculated as $R = \sum \left| |F_{obs}| - |F_{calc}| \right| / \sum |F_{obs}|$. In the equation, $F_{obs}$ and $F_{calc}$ are the observed and calculated structure factors, respectively. Each $|F_{obs}|$ is derived from the measured intensity of a reflection in the diffraction pattern, and each $|F_{calc}|$ is the corresponding value for the same reflection calculated from the current model. Values of R range from zero (perfect agreement of calculated and observed intensities) to about 0.6. An R-factor greater than 0.5 implies that agreement between observed and calculated intensities is very poor. A preliminary model with R near 0.4 is promising, and is likely to be improved by various refinement methods. A desirable target R-factor for a protein model refined with data to 2.5 Å is 0.2.

Rfree Calculated in the same manner as the R-factor, but using only a small set (5-10%) of randomly chosen intensities (the "test set") that are set aside from the beginning and not used during refinement. They are used only in the quality control process of assessing the agreement between calculated and observed data. At any stage in refinement, Rfree measures how well the current structure model predicts a subset of the measured reflection intensities that were not included in the refinement, whereas R measures how well the current model predicts the entire data set that produced the model. Rfree is more reliable than Rwork (the R-factor for the other 90-95% of the data, the "working set") for evaluating the accuracy of the model.

Rmerge A measure of agreement among multiple measurements of the same (not symmetry-related) reflections, with the different measurements being in different frames of data or different data sets. Rmerge is calculated as $R_{merge} = \sum \left| |I_{o,i}| - |I_{ave,i}| \right| / \sum |I_{ave,i}|$, where $I_{ave,i}$ is the average structure factor amplitude of reflection i, and $I_{o,i}$ represents the individual measurements of reflection i and its symmetry-equivalent reflections.

Reversible inhibitor Reversible inhibitors bind to enzymes through weak covalent bonds or non-covalent interactions such as hydrogen bonds, electrostatic interactions, hydrophobic interactions and van der Waals interactions. Reversible inhibitors can be sub-classified as competitive, non-competitive, uncompetitive and mixed inhibitors.

RMSD Root Mean Square Deviation. A measure of how well the final crystallographic model conforms to expected values of bond lengths and bond angles.

SDS Sodium Dodecyl Sulphate

Space Group A mathematical description of a crystal lattice with a certain type of symmetry and unit cell. The symmetry type is defined by the set of crystallographic symmetry operations that characterize a crystal, which may include rotations, translations, and screw axes. For example, in space group P2₁2₁2, P means that the unit cell type is primitive. The symmetry type 2₁2₁2 indicates that in the unit cell there is a two-fold screw axis (2₁) along each of the x- and y-axes, and a normal two-fold axis (2) along the z-axis.

SPase I Type I Signal Peptidase (Bacterial)

SPase II Type II Signal Peptidase (Bacterial)

SPase Δ2-75 The construct of E. coli SPase I lacking residues 2 through 75, which correspond to the two transmembrane segments and the cytoplasmic region.

SPase Δ2-80 The construct of E. coli SPase I lacking residues 2 through 80, which correspond to the two transmembrane segments and the cytoplasmic region.

Tris

Unit Cell The repeating unit that can generate the crystal through three-dimensional translation operations alone.

Uncompetitive inhibition Uncompetitive inhibition results from the inhibitor combining directly with the enzyme-substrate complex, without the formation of an enzyme-inhibitor complex. In other words, an uncompetitive inhibitor binds only to the enzyme-substrate complex (ES), not to the free enzyme (E), and the resulting EIS complex is catalytically inactive.
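Several of the statistics defined in this glossary (R-factor, redundancy, completeness, and the Matthews coefficient) are simple ratios, and a short sketch may help make the definitions concrete. The Python below is illustrative only and is not taken from this thesis or from any crystallographic software package. The function names and the example numbers are hypothetical, and dividing by the number of molecules in `matthews_coefficient` is an assumption about how the molecular weight in the asymmetric unit is counted. In practice these values are reported by data-processing and refinement programs.

```python
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), as in the glossary entry."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must list the same reflections")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def redundancy(n_measured: int, n_unique: int) -> float:
    """(number of measured reflections) / (number of unique reflections)."""
    return n_measured / n_unique


def completeness(n_unique_measured: int, n_possible: int) -> float:
    """Measured unique reflections as a percentage of those possible."""
    return 100.0 * n_unique_measured / n_possible


def matthews_coefficient(asu_volume_a3: float,
                         molecular_weight_da: float,
                         n_molecules: int = 1) -> float:
    """Vm in cubic Angstroms per Dalton.

    Assumption: the protein mass in the asymmetric unit is the molecular
    weight of one molecule times the number of molecules it contains.
    """
    return asu_volume_a3 / (molecular_weight_da * n_molecules)


if __name__ == "__main__":
    # Hypothetical numbers chosen only to exercise the functions.
    observed = [100.0, 50.0, 25.0]
    calculated = [90.0, 55.0, 25.0]
    print("R-factor:", r_factor(observed, calculated))
    print("Redundancy:", redundancy(1000, 250))
    print("Completeness (%):", completeness(950, 1000))
    print("Vm (A^3/Da):", matthews_coefficient(60000.0, 25000.0))
```

Under the glossary definitions, applying `r_factor` only to the reflections set aside as the test set would correspond to Rfree, and applying it to the remaining working set would correspond to Rwork.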

CHAPTER 1: INTRODUCTION

1.1 Protein Export and Bacterial Type I Signal Peptidase

Many newly synthesized proteins must be transported to their correct compartments to perform their specific functions in both prokaryotic and eukaryotic cells [1]. A prokaryotic cell is relatively simple in structure when compared with a eukaryotic cell. However, it still has separate compartments (Figure 1.1). In a Gram-negative bacterial cell, going from inside to outside, there are the cytosol (1), the cytoplasmic membrane (2), the periplasmic space (3), the outer membrane (4), and the exterior environment (5) [2]. A great deal of work has gone into investigating how proteins travel across compartments in the bacterial cell [3].

Figure 1.1: Schematic representation of a Gram-negative bacterium. A Gram-negative bacterium has separate compartments. Going from inside to outside, there are the cytosol (1), the cytoplasmic membrane (2), the periplasmic space (3), the outer membrane (4), and the exterior environment (5).

Initially, exported proteins are synthesized as pre-proteins having an amino-terminal extension, known as the signal peptide (or leader peptide), that signals their transport [4]. Pre-protein export involves two steps. In the first step, starting at the ribosome in the cytosol, pre-proteins are targeted to the cytoplasmic membrane. The major targeting route is SecB-dependent [5-7]. SecB, a cytosolic chaperone, is a component of the SecYEG-translocase. The SecYEG-translocase is a multi-protein complex comprising the proteins SecA, SecB, SecD, SecE, SecF, SecG, SecY, YajC, and YidC [5, 8, 9]. The SecYEG heterotrimer forms a complex integrated into the cytoplasmic membrane and acts as a protein-conducting channel (Figure 1.2). Once a pre-protein emerges from the ribosome in the cytosol, SecB interacts with the hydrophobic region within the main body of the pre-protein to prevent aggregation or tight folding. SecB then ushers the pre-protein toward the SecYEG-translocase at the cytoplasmic membrane, where the homodimeric SecA is peripherally associated [7]. SecB passes the pre-protein to SecA, which interacts with the signal peptide [8].

Figure 1.2: The bacterial SPase I and the SecYEG-translocase. Bacterial SPase I and the SecYEG-translocase provide the dominant export pathway for the majority of exported bacterial pre-proteins. The pre-protein is targeted to the cytoplasmic membrane surface with the assistance of the molecular chaperone SecB. SecA, an ATPase, drives the pre-protein chain across the membrane through the SecYEG channel, using the energy of ATP hydrolysis. During or shortly after translocation of the pre-protein across the membrane, the signal peptide is cleaved off by the bacterial SPase I. The functions of other components such as SecD, SecF, YajC and YidC are less certain.

The second step of pre-protein export is the transport of the pre-protein across the cytoplasmic membrane, a process called translocation. SecA uses the energy of ATP hydrolysis to drive the pre-protein through the SecYEG channel. The pre-protein substrate is tethered to the membrane via the signal peptide.

During or shortly after the pre-protein crosses the SecYEG channel, SPase I, a membrane-bound endopeptidase that is presumably close to the SecYEG-translocase, cleaves away the signal peptide of the pre-protein and releases the mature protein from the cytoplasmic membrane. SPase I contains amino-terminal transmembrane segments for anchoring into the membrane and a large carboxy-terminal catalytic domain that resides on the outer surface of the cytoplasmic membrane [10]. The mature proteins of Gram-negative bacteria either arrive at the periplasmic space and fold into their native conformations, sometimes with the assistance of distinct periplasmic molecular chaperones such as Skp, SurA, and PpiD [11-13], or continue to the outer membrane for integration. Proteins in Gram-positive bacteria are secreted directly across the cell wall into the surrounding growth medium [1]. The function of other components of the SecYEG-translocase, such as SecD, SecF, YajC, and YidC, remains largely unclear. There are reports that SecDFYajC exists as a hetero-trimeric complex [14], and that YidC mediates membrane protein insertion in bacteria and assists in membrane protein assembly and folding [6, 14, 15].

The majority of signal peptides of pre-proteins are processed by two principal SPases with different cleavage specificities. Type I signal peptidase (SPase I) is responsible for non-lipoproteins, while Type II signal peptidase (SPase II) processes lipoproteins [5, 16]. SPase I is very common and exists in all three kingdoms of life [5]. SPase II is only found in eubacteria [10].

Bacterial SPase I (SPase, EC 3.4.21.89) consists of a family of membrane-bound serine proteases. They reside mainly on the cytoplasmic membrane and are responsible for the cleavage of signal peptides from the majority of secreted proteins and some membrane proteins. This family of enzymes belongs to a novel protease clan (SF) and protease family S26 [17], which uses a catalytic Serine/Lysine dyad instead of the classical Serine/Histidine/Aspartic acid catalytic triad [18, 19]. They are essential enzymes for bacterial viability. A lack of bacterial SPase I activity leads to an accumulation of pre-proteins in the plasma membrane of the cell, which eventually results in lysis of the membrane and cell death [20].

1.2 Signal Peptides

To ensure correct transport, newly synthesized pre-proteins are equipped with a signal peptide, which is removed by distinct SPases during or after pre-protein translocation. A signal peptide is a short continuous amino acid sequence at the N-terminus of a pre-protein, which determines the cellular fate of the pre-protein. Signal peptides play multiple roles, such as (i) discriminating between different targeting pathways, and (ii) presenting pre-proteins to distinct SPases [21]. Signal peptides from Gram-positive bacteria are longer than those of other organisms. The average length of a signal peptide is 32 amino acids in Gram-positive bacteria, 25.1 amino acids in Gram-negative bacteria, and 22.6 amino acids in eukaryotic species [22]. In addition, the signal peptides of Gram-positive bacteria have more positively charged residues, such as lysine and arginine, in the N-terminal region (N-region) than those of Gram-negative bacteria or the eukaryotic ER [23].

Although signal peptides have little sequence homology, they share common structural features and function in the same way in both prokaryotes and eukaryotes [24]. This observation implies that both the signal peptides and SPase substrate specificity are conserved throughout evolution. A signal peptide has three distinct regions: the amino-terminal (N-), hydrophobic (H-), and carboxyl-terminal (C-) regions (Figure 1.3) [4]. The N-region contains 1-5 residues (at least one arginine or lysine) and has a net positive charge [25]. The H-region is the central hydrophobic core of the signal peptide and has 10-15 amino acid residues. This region spans the membrane in an α-helical conformation. A glycine or proline is usually present at the end of the H-region and marks the end of the α-helix [22]. The C-region is about 3-7 residues in length and is neutral but contains some polar amino acids [25].

[Figure 1.3 diagram labels: N-region, positively charged, 1-5 residues; H-region, hydrophobic, α-helical, 10-15 residues; C-region, extended β-conformation, 3-7 residues, ending at the cleavage site.]

Figure 1.3: A typical bacterial signal peptide. A signal peptide has a positively charged N-terminus (N-region), a hydrophobic central region (H-region), and a neutral but polar C-terminus (C-region). Helix-breaking residues (Pro or Gly) are often found at the boundary between the H-region and C-region (the -6 or P6 position relative to the cleavage site). The SPase recognition motif consists of small aliphatic residues at the -1 (P1) and -3 (P3) positions relative to the cleavage site. The most common residue at these positions is Ala.

The N-region and H-region are required for efficient translocation, and the C-region specifies the cleavage site of the signal peptide for SPases [26]. It is believed that the N-region of the signal peptide promotes export by electrostatic interaction with the negatively charged phospholipid headgroups [27, 28]. The H-region adopts an α-helical conformation within the membrane and is essential for targeting and membrane insertion [29]. The C-region contains the recognition motif that SPase uses to determine the site of signal peptide cleavage and is responsible for exposing the cleavage site at the periplasmic surface. This region adopts a β-strand conformation to orient the scissile bond toward bacterial SPase I [18].

The cleavage site is located at the end of the C-region of the signal peptide and is marked by an SPase recognition motif. This recognition motif is made up of small, uncharged residues at the -1 and -3 positions (or P1 and P3 positions according to the Schechter and Berger nomenclature) [30], upstream of the cleavage site. Although Ala is preferred at both the -1 and -3 positions, statistical analysis of amino acid sequences surrounding the cleavage sites has shown that Ala, Gly and Ser can appear at the -1 position, and that larger amino acid residues such as Thr, Leu, Ile, and Val can be accommodated at the -3 position [22]. Traditionally, this recognition motif for SPase cleavage is called the "Ala-X-Ala" rule [23].
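The recognition motif described above can be illustrated with a minimal Python sketch. The function name, the residue sets, and the example sequences are illustrative assumptions, not part of the original work: the sets simply encode one reading of the statistical preferences quoted from [22] (Ala, Gly or Ser at -1; those residues plus Thr, Leu, Ile or Val at -3), and the sequences are hypothetical pre-proteins.

```python
# Residues tolerated at each position (one-letter codes), as read from the
# text above. These sets are an assumption made for illustration only.
ALLOWED_P1 = frozenset("AGS")      # -1 (P1) position: Ala, Gly, Ser
ALLOWED_P3 = frozenset("AGSTLIV")  # -3 (P3) position: small residues plus Thr, Leu, Ile, Val


def fits_recognition_motif(preprotein: str, signal_length: int) -> bool:
    """Check the -1 and -3 residues of a proposed cleavage site.

    `signal_length` is the number of residues in the signal peptide, so the
    scissile bond lies between residue `signal_length` and the next residue.
    """
    if signal_length < 3 or signal_length >= len(preprotein):
        raise ValueError("cleavage site must lie inside the sequence")
    p1 = preprotein[signal_length - 1]  # residue at -1
    p3 = preprotein[signal_length - 3]  # residue at -3
    return p1 in ALLOWED_P1 and p3 in ALLOWED_P3


# Hypothetical pre-protein with Ala at both -1 and -3 (an Ala-X-Ala site).
good_site = "MKKTAIALAVALAGFATVAQA" + "APKD"
# Same sequence with Lys at -3 and Arg at -1, which the motif excludes.
bad_site = "MKKTAIALAVALAGFATVKQR" + "APKD"

print(fits_recognition_motif(good_site, 21))
print(fits_recognition_motif(bad_site, 21))
```

The check looks only at positions -1 and -3, mirroring the "Ala-X-Ala" rule; it does not model the H-region or the helix-breaking residue near -6, so it should be read as a filter on candidate sites rather than a cleavage-site predictor.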

1.3 Bacterial Type I Signal Peptidase

Knowledge of the chemistry and enzymology of bacterial SPase I has largely been obtained from the study of the typical Gram-negative model bacterium Escherichia coli. E. coli SPase I is the most thoroughly studied and best-characterized bacterial SPase I so far. Here, a summary of the structure and function of E. coli SPase I is provided.

1.3.1 The First Characterization

E. coli SPase I is encoded by the lepB gene [31], and E. coli has a single copy of this gene [32]. A time-line of the discoveries of the important properties of E. coli SPase I is summarized in Table 1.1. SPase I was first isolated from inner membrane extracts of E. coli, and its endoproteolytic activity against bacteriophage coat proteins was observed, in 1978 and 1980 [33, 34]. SPase I was first cloned and sub-cloned from E. coli genomic DNA in 1981 [31]. The structural gene was first sequenced by the Sanger method, and the amino acid sequence was deduced, in 1983 [35]. SPase I was first overexpressed under the control of an arabinose B promoter in 1985 [20]. SPase I was first classified as a novel serine protease using site-directed mutagenesis in 1992 [36]. SPase I enzymatic activity was first kinetically characterized (Km and kcat against a pre-protein of outer membrane protein A, pro-OmpA nuclease A, PONA for short) in 1995 [37].

SPase I was first crystallized in 1995 [38], and its structure was first revealed by X-ray crystallography in 1998 [18]. Subsequently, much more work has been done to understand most aspects of this unique enzyme. The enzyme is now easily obtained in large amounts (40 mg per liter of E. coli culture) by isopropyl-β-D-thiogalactopyranoside (IPTG)-induced overexpression and purification using ion exchange chromatography [38, 39]. Much attention has gone into using E. coli SPase I as a drug target for developing novel antibiotics through a structure-based drug design approach [40].

Table 1.1: The historical time-line of the first characterization of E. coli SPase I

Year   Discovery                                                              Reference
2004   Crystal structure of enzyme/lipopeptide inhibitor complex              Paetzel et al. [40]
2002   Crystal structure of apo-enzyme                                        Paetzel et al. [41]
1998   First crystal structure                                                Paetzel et al. [18]
1997   Chemical modification supporting the Ser/Lys dyad catalytic mechanism  Paetzel et al. [42]
1995   Crystallized                                                           Paetzel et al. [38]
1995   Kinetic parameters (Km, kcat)                                          Tschantz et al. [37]
1992   Classified as a novel serine protease                                  Sung et al. [36]
1985   Overexpressed                                                          Dalbey et al. [20]
1983   Gene sequenced                                                         Wolfe et al. [35]
1981   Cloned and sub-cloned                                                  Date et al. [31]
1980   Purified from E. coli membrane                                         Zwizinski et al. [34]
1978   Endoproteolytic activity                                               Chang et al. [33]

1.3.2 The Investigation of the SPase Catalytic Mechanism

E. coli SPase I has 323 amino acids, a molecular weight of 35,988 Da and a pI of 6.9. It is a membrane-bound endopeptidase containing two amino-terminal transmembrane segments (residues 4-28, H1, and 58-76, H2), one small cytoplasmic region (residues 29-58, P1) and a large carboxyl-terminal periplasmic catalytic domain (residues 76-323, P2) [43, 44]. Both the N- and C-termini face the periplasmic space, as shown by proteinase K digestion [35, 43], gene-fusion [45], and disulfide cross-linking studies (Figure 1.4) [46, 47].

[Figure 1.4 diagram labels: SPase I, 323 aa, ~36 kDa; SPase Δ2-75, 248 aa, ~28 kDa.]

Figure 1.4: Membrane topology of E. coli SPase I. H1 and H2 are the two transmembrane segments. P1 is the small cytoplasmic region and P2 is the large carboxyl-terminal periplasmic catalytic domain. P2 is also called SPase Δ2-75 because it lacks residues 2 to 75.
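As a small arithmetic check on the topology described above, the following Python sketch records the residue ranges quoted in the text and computes the length of each region. The dictionary and function names are illustrative assumptions; the ranges are taken as quoted, including the shared boundary residues (58 and 76).

```python
# Inclusive residue ranges for E. coli SPase I, as quoted in the text.
REGIONS = {
    "H1": (4, 28),    # first transmembrane segment
    "P1": (29, 58),   # small cytoplasmic region
    "H2": (58, 76),   # second transmembrane segment
    "P2": (76, 323),  # periplasmic catalytic domain (SPase delta2-75)
}


def region_length(name: str) -> int:
    """Number of residues in a region, counting both end points."""
    start, end = REGIONS[name]
    return end - start + 1


def regions_containing(residue: int) -> list[str]:
    """Names of all regions whose quoted range includes the residue."""
    return [name for name, (start, end) in REGIONS.items() if start <= residue <= end]


for name in REGIONS:
    print(name, region_length(name))
```

With these ranges, P2 spans 323 - 76 + 1 = 248 residues, which matches the 248 aa given for SPase Δ2-75, and both catalytic residues discussed later (Ser90 and Lys145) fall within P2.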

Prior to the determination of the crystal structure of E. coli SPase I, the catalytic mechanism of this enzyme was explored through deletion mutagenesis, site-directed mutagenesis, and site-directed chemical modification. The deletion mutants H1 (Δ5-22), P1 (Δ30-52), H2 (Δ62-68), and even a large deletion Δ4-60 were active in vivo [44]. Interestingly, in vitro the enzyme was still active after the complete deletion of its two transmembrane segments H1 and H2 and the cytoplasmic domain P1 (residues 2-75) [37-39]. This construct of E. coli SPase I (without H1-P1-H2), called SPase Δ2-75, contains only the carboxyl-terminal periplasmic catalytic domain (residues 76-323); it was soluble and able to cleave signal peptides with high fidelity [48]. Replacing the positively charged Arg77 with either the neutral Leu or the negatively charged Asp completely abolished catalytic activity, and even the substitution of the positively charged Lys for Arg77 impaired catalysis [44]. These results suggested that the catalytic domain of this enzyme is its large carboxyl-terminal periplasmic domain (residues 76-323, P2) and that the region starting at residue 77 is very important for catalytic activity. However, we have shown that SPase Δ2-80 is active despite the deletion of Arg77, and that its activity is only 10-fold lower than that of SPase Δ2-75 (Appendix, Figure A.5).

In general, proteases are categorized into four classes based on their catalytic mechanism [49-52]: serine, cysteine, metallo, and aspartic proteases. To determine to which class SPase I belongs, site-directed mutagenesis and protease assays using the bacteriophage M13 procoat protein as the substrate were used. Even after replacing all of the Cys residues with Ser, or replacing each His with Ala, the enzyme retained full catalytic activity, indicating that it is neither a cysteine protease nor a classical serine protease. Mutagenesis of all of the Ser and Asp residues suggested that the enzyme most likely belongs to a novel class of serine proteases and that Ser90 is one of the catalytic residues [36]. Later, an investigation showed that Lys145 is the only basic residue that is essential for E. coli SPase I activity [53]. Therefore, a Ser/Lys dyad catalytic mechanism was proposed [53, 54].

Supporting evidence for Ser90/Lys145 dyad catalysis came from site-directed chemical modification [42]. The Ser90 to Cys90 mutant was active but was inhibited by the cysteine-specific reagent N-ethylmaleimide. In contrast, the wild-type enzyme was not affected by this reagent, suggesting that Ser90 is the catalytic residue [54]. The change of Lys145 to Cys145 inactivated the enzyme. However, when this mutant protein was treated with 2-bromoethylamine-HBr, its catalytic activity was recovered because the reagent introduced a lysine analogue (γ-thia-lysine, S-CH2-CH2-NH3+) at position 145. In contrast, treatment of this mutant protein with (2-bromoethyl)trimethylammonium-HBr, which introduces a positively charged but non-titratable lysine analogue {S-CH2-CH2-N(CH3)3+} at position 145, did not restore catalytic activity. Hence, the essential Lys145 is a catalytic residue and should act as a general base in catalysis [42].

In summary, E. coli SPase I is classified as a novel serine protease, which uses a Ser/Lys catalytic dyad in place of the Ser/His/Asp triad commonly seen in classical serine proteases. The Ser/Lys dyad catalytic mechanism of E. coli SPase I is consistent with its crystal structure, first solved by Paetzel and co-workers in 1998 [18]. Detailed structural information regarding the mechanism of this enzyme is given in the next section.

1.3.3 Three-Dimensional Structure

SPase Δ2-75 has a molecular weight of 27,952 Da [39] and a pI of 5.6 [37], and corresponds to the large, soluble catalytic domain of E. coli SPase I. The SPase Δ2-75 gene has been sub-cloned, and the corresponding protein has been expressed, purified [39], characterized [37] and crystallized [38]. Paetzel and co-workers (1998) successfully obtained a well-ordered co-crystal of SPase Δ2-75 in complex with a β-lactam inhibitor {allyl (5S,6S)-6-[(R)-acetoxyethyl]-penem-3-carboxylate}, and solved its three-dimensional (3D) crystal structure by multiple isomorphous replacement with anomalous signal, using heavy-atom derivatives including ethylmercury phosphate and methylmercury acetate [18].

To date, three crystal structures of E. coli SPase I have been solved and the structural data have been deposited in the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB) [55]. The structures are the SPase Δ2-75 apo-enzyme (PDB code 1KN9, space group P41212, resolution 2.5 Å), SPase Δ2-75 in complex with a β-lactam (penem) inhibitor (PDB code 1B12, space group P21212, resolution 1.9 Å), and SPase Δ2-75 in complex with the lipopeptide-based inhibitor arylomycin A2 (PDB code 1T7D, space group P43212, resolution 2.5 Å) [18, 40, 41]. All of these crystal structures show the same basic protein architecture (Figure 1.5), with slight differences in the flexible regions and in the active site that depend on whether an inhibitor is bound. Among these three structures, the structure of SPase Δ2-75 in complex with the β-lactam (penem) inhibitor was the first to be solved, and at high resolution (1.9 Å). This structure provides direct evidence regarding the catalytic mechanism and the substrate specificity of this enzyme [18].

The overall protein fold of SPase Δ2-75 is composed of two anti-parallel β-sheet domains (Figure 1.5). Domain I contains the catalytic residues Ser90 and Lys145, and all of the conserved residues shared among SPase I enzymes from different species [10]. An extended β-hairpin protrudes from domain I, and domain II (residues 154-264) has a disulphide bond formed between Cys170 and Cys176. Domain II is only present in signal peptidases from Gram-negative bacteria and its role in the enzymatic activity remains unclear [18].

Figure 1.5: A ribbon representation of the general fold of SPase Δ2-75. Domain I is shown in blue with a β-hairpin extension (residues 107-122, highlighted in magenta). The catalytic residues Ser90 (red) and Lys145 (green) are shown as sticks. Domain II is shown in grey. The disulphide bond (between residues Cys170 and Cys176) is in yellow and shown as sticks. The inhibitor is not shown for the sake of clarity. The coordinates of SPase Δ2-75 are from the Protein Data Bank (accession code 1B12). The figure was prepared using PyMol [56].

Domain I has a large hydrophobic surface that is made up of residues Phe79, Ile80, Tyr81, Phe100, Leu102, Trp300, Met301, Phe303, Trp310, Leu314, and Ile319. These hydrophobic residues run along the β-sheet comprised of the β-strands from amino acids 81-85, 99-105, 291-307 and 310 (Figures 1.5 and 1.6). This unusual hydrophobic surface present in soluble SPase Δ2-75 may be involved in membrane association, as proposed by Paetzel et al. in 1998. This hydrophobic surface has two tryptophan residues (Trp300 and Trp310). Aromatic residues commonly appear at the protein/membrane interfacial surface of integral membrane proteins, where they form the so-called "aromatic belt" [57]. Trp300 and Trp310 are essential for optimal catalytic activity of E. coli SPase I [58]. In addition, although the soluble SPase Δ2-75 domain lacks the two transmembrane segments and the small cytoplasmic region of full-length E. coli SPase I, it still needs the non-ionic detergent Triton X-100 for optimal activity [37, 38]. Triton X-100 may mimic the lipid membrane that interacts with the large hydrophobic surface on SPase Δ2-75 and thereby stabilize the enzyme as in its native environment. In turn, this phenomenon implies that phospholipids play an important physiological role in catalysis by E. coli SPase I.

Indeed, in the presence of phospholipids (bilayers or micelles), SPase Δ2-75 shows an increased ability to process the substrate pro-OmpA nuclease A to its mature form [37]. SPase Δ2-75 can bind to the inner and outer membranes of E. coli, though the native population of E. coli SPase I is found mostly in the inner membrane in vivo. In vitro, SPase I can partially penetrate into lipid monolayers, mediated by phosphatidylethanolamine (PE), which is the most abundant lipid in the E. coli inner membrane [59].

Figure 1.6: The surface representation of SPase Δ2-75. A large hydrophobic surface (in green), proposed as the membrane association surface, is shown; it involves Trp300, Trp310, and the N-terminus. The hydrophobic substrate-binding sites S1 and S3 (in cyan) are labelled with reference to the position of the catalytic residue Ser90 (S90, in red). The coordinates of SPase Δ2-75 are from the Protein Data Bank (accession code 1B12). The figure was prepared using PyMol [56].

The surface analysis (Figure 1.6) of the crystal structure of SPase Δ2-75 in complex with the penem inhibitor also revealed two shallow hydrophobic cavities, named S1 and S3, near the active-site residue Ser90 [18]. S1 and S3 correspond to the -1 (P1) and -3 (P3) positions of the recognition motif on the substrate. The S1 substrate-binding site was assigned under the guidance of the methyl group at C16 of the penem inhibitor (Figures 1.8 and 1.9), because this methyl group mimics the -1 (P1) alanine side chain of the substrate. The S1 substrate-binding site is made up primarily of hydrophobic residues and includes Ile86, Pro87, Ser88, Ser90, Met91, Leu95, Tyr143, Ile144, and Lys145 (Figures 1.7, 1.8 and 1.9) [18]. After modelling an acylated tetra-peptide substrate (Ala-Ala-Ala-Ala) guided by the atom positions of the penem, the S3 substrate-binding site was assigned to a hydrophobic pocket formed by residues including Ile144, Asp142, Val132, Ile101, Ile86 and Phe84 (Figures 1.7 and 1.9). The residues Ile144 and Ile86 bridge the S1 and S3 binding pockets [18, 40, 41]. Both S1 and S3 are hydrophobic depressions, and S3 is shallower and broader than S1. This is consistent with the Ala-X-Ala (-1, -3) substrate specificity at the cleavage site, and with the observation that larger residues can be accommodated at the -3 (P3) position than at the -1 (P1) position of the substrate [18, 40, 41]. These results indicate that the S1 and S3 cavities on SPase I play an important role in substrate binding.

Figure 1.7: The active and substrate-binding sites of E. coli SPase I. The active site of the SPase Δ2-75/penem complex crystal structure is shown in stick representation in cyan. The substrate-binding pockets S1 and S3 are indicated. All residues are labelled in one-letter code. The inhibitor is removed for clarity. The coordinates of SPase Δ2-75 are from the Protein Data Bank (accession code 1B12). The figure was prepared using PyMol [56].

The inhibitor {allyl (5S,6S)-6-[(R)-acetoxyethyl]-penem-3-carboxylate} (Figure 1.8) has a scissile peptide bond analogue between C7 and N4. The electron density shows a covalent bond between Ser90 Oγ and the carbonyl carbon (C7) of the inhibitor (continuous electron density surrounds both Ser90 Oγ and C7), which suggests that the Ser90 nucleophile attacks the carbonyl carbon (C7) and that the bond between C7 and N4 is broken (Figures 1.8 and 1.9). The electron density also shows that the ε-amino group of Lys145 points toward Ser90 (2.9 Å away from Ser90) and is the only titratable group near the active-site nucleophile. Therefore, the ε-amino group of Lys145 must act as a general base that abstracts the hydroxyl proton from Ser90, which activates Ser90 for nucleophilic attack. This observation provided the first direct evidence that this enzyme is a novel serine protease that uses the Ser/Lys dyad catalytic mechanism [18].

Figure 1.8: The structure of the β-lactam inhibitor {allyl (5S,6S)-6-[(R)-acetoxyethyl]-penem-3-carboxylate}

In the vicinity of the active site, other residues such as Ser88 and Ser278 are also important for catalysis. An oxyanion hole is formed by the main-chain amide (-NH) of Ser90 and the side-chain hydroxyl Oγ of Ser88 to stabilize the tetrahedral intermediate of Ser/Lys dyad catalysis. It has been suggested that Ser278 helps to orient the general base Lys145 toward Ser90. Any side chain that is bigger than that of Gly at position 272 will clash with Lys145 [18].

Figure 1.9: The active site of SPase Δ2-75 in complex with a β-lactam inhibitor. The surface representation of the active site of SPase Δ2-75 is shown with a stick representation of the important residues that are involved in catalysis. The β-lactam (5S,6S) inhibitor is coloured according to element (yellow for carbon, red for oxygen, blue for nitrogen and orange for sulfur). The inhibitor is covalently bound to the Oγ of Ser90 through its C7 (the Ser90 Oγ nucleophile attacks C7, and the bond between C7 and N4 of the inhibitor is cleaved). The oxyanion hole is formed by the main-chain nitrogen of Ser90 and the Oγ of Ser88 through hydrogen bonding (dashed lines) with O8 of the inhibitor. The methyl group (C16) of the inhibitor sits in the S1 substrate-binding site. The Lys145 Nζ is hydrogen bonded (dashed lines) to the Ser90 Oγ and Ser278 Oγ. The substrate-binding sites S1 and S3 are labelled. The coordinates of SPase Δ2-75 are from the Protein Data Bank (accession code 1B12). The figure was prepared using PyMol [56].

1.4 Substrates (Pre-proteins) of Bacterial SPase I

Relatively little is known regarding the full-length pre-protein substrate specificity of bacterial SPase I. In the early studies of bacterial SPase I, several full-length pre-proteins were used as substrates for the enzymatic activity assay. These are the pre-protein of outer membrane protein A (pro-OmpA) of E. coli, the procoat protein of M13 bacteriophage (M13 procoat is an integral protein that associates with the cytoplasmic face of the host plasma membrane upon infection), and pre-maltose binding protein (pre-MBP, a periplasmic protein). M13 procoat protein and pre-MBP were used mostly for in vitro enzymatic processing.

The best-characterized full-length pre-protein substrate of E. coli SPase I to date is the pro-OmpA nuclease A fusion protein (PONA, with a calculated molecular weight of 18,839.7 Da). E. coli SPase I processes PONA with high affinity and fidelity. PONA is a hybrid secreted pre-protein that was constructed by attaching the Staphylococcus aureus nuclease A (Mr = 16,811.2 Da) to the signal peptide of E. coli outer membrane protein A (OmpA) [60]. Using PONA as a substrate, the kinetic parameters for the enzymatic activity of SPase I were obtained [37, 61]. Paetzel and co-workers in 1997 first obtained the pH-rate profile of this enzyme using this substrate [42]. The pH profile showed a classical bell-shaped curve with maximum efficiency at pH 9.0. The apparent pKa values for the two titratable catalytic residues were 8.7 and 9.3. At pH 9.0 and in the presence of the detergent Triton X-100, the Km was 16.5 μM and the kcat was 8.73 s-1 for wild-type E. coli SPase I cleavage of PONA. The Km was 32 μM and the kcat was 3.0 s-1 for the soluble catalytic domain SPase Δ2-75 [37]. Using PONA as a substrate, the substrate specificity of E. coli SPase I was also studied. Carlos and co-workers (2000) showed that neither the membrane-spanning segments of bacterial SPase I nor the H-region of the signal peptide of the pre-protein is important for cleavage fidelity in vitro, since enzyme constructs lacking the transmembrane segments still recognized the Ala-X-Ala motif on the signal peptide and cleaved the correct scissile bond on PONA. In addition, mutations in the H-region of the PONA signal peptide did not influence the cleavage fidelity [48].
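The kinetic values quoted above can be compared on a common scale with a short Python sketch. It is an illustration under stated assumptions rather than the analysis used in the cited studies: the class and function names are hypothetical, the rate law is the standard Michaelis-Menten equation, and the pH dependence is modelled with the textbook two-pKa bell curve, 1 / (1 + 10^(pKa1 - pH) + 10^(pH - pKa2)), which is assumed here and not taken from [42].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KineticParameters:
    """Km and kcat for cleavage of PONA, as quoted in the text [37]."""
    name: str
    km_micromolar: float
    kcat_per_second: float

    def catalytic_efficiency(self) -> float:
        """kcat/Km in M^-1 s^-1 (Km converted from micromolar to molar)."""
        return self.kcat_per_second / (self.km_micromolar * 1e-6)

    def turnover_rate(self, substrate_micromolar: float) -> float:
        """Michaelis-Menten rate per enzyme molecule, v/[E], in s^-1."""
        s = substrate_micromolar
        return self.kcat_per_second * s / (self.km_micromolar + s)


WILD_TYPE = KineticParameters("full-length SPase I", 16.5, 8.73)
DELTA_2_75 = KineticParameters("SPase delta2-75", 32.0, 3.0)


def active_fraction(ph: float, pka_low: float = 8.7, pka_high: float = 9.3) -> float:
    """Fraction of enzyme in the active protonation state.

    Assumed model: one group must be deprotonated (pka_low) and one must be
    protonated (pka_high), which gives a bell-shaped pH-rate profile.
    """
    return 1.0 / (1.0 + 10 ** (pka_low - ph) + 10 ** (ph - pka_high))


for enzyme in (WILD_TYPE, DELTA_2_75):
    print(enzyme.name, f"{enzyme.catalytic_efficiency():.3g} M^-1 s^-1")
```

With the quoted values, kcat/Km is about 5.3 x 10^5 M-1 s-1 for the full-length enzyme and about 9.4 x 10^4 M-1 s-1 for SPase Δ2-75, a difference of roughly 5.6-fold. Under the assumed two-pKa model, the active fraction peaks midway between the two pKa values, at pH 9.0, in agreement with the reported optimum.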

1.5 A Proposed Catalytic Mechanism for Bacterial SPase I

Based on the direct experimental data from the crystal structure of E. coli SPase I, the available biochemical analysis, and the modelling of peptide substrates, Paetzel and co-workers proposed a catalytic mechanism for pre-protein cleavage by bacterial SPase I (Figure 1.10) [5]. (1) The C-region (P1 to P4) of the signal peptide of the pre-protein, in a β conformation, binds to β-strand 142-149 of SPase I (which contains the catalytic residue Lys145) through β-sheet-type hydrogen bonding. The recognition motif of the cleavage site, the side chains of the -1 (P1) and -3 (P3) residues of the signal peptide, fits into the hydrophobic S1 and S3 substrate-binding sites on SPase I to expose the cleavage site (the scissile peptide bond). The Nζ of the Lys145 side chain then acts as the general base, abstracting the proton from the Ser90 Oγ and making the Oγ of Ser90 more nucleophilic. (2) The deprotonated Oγ of Ser90 attacks the scissile bond; an oxyanion hole, formed by the main-chain amide (-NH) of Ser90 and the side-chain hydroxyl Oγ of Ser88, stabilizes the newly formed tetrahedral intermediate I by hydrogen bonding. (3) The ammonium group of the Lys145 side chain donates a proton to the leaving amide group at the N-terminus of the mature protein. This releases the mature protein from SPase I, and the acyl-enzyme intermediate is formed. (4) To regenerate the active site of SPase I, the Nζ of Lys145 again acts as a general base to activate the deacylating water, and tetrahedral intermediate II is formed. The ammonium group of Lys145 then donates a proton back to the Ser90 Oγ, and the cleaved signal peptide dissociates from SPase I.

[Figure 1.10 diagram labels: mature protein; signal peptide; tetrahedral intermediate I; acyl-enzyme intermediate; tetrahedral intermediate II.]

Figure 1.10: The proposed catalytic mechanism of bacterial SPase I

1.6 Mutants of Bacterial SPase I at the Active and Substrate-binding Site

Paetzel and co-workers have compared the sequences of a number of SPase I enzymes from species ranging from bacteria to humans and identified five regions that are conserved throughout evolution [5]. These sequence regions are designated regions A-E; region A corresponds to the transmembrane segments, whereas regions B-E all lie within the carboxyl-terminal catalytic domain [5, 10]. In E. coli SPase I, region B (residues 88-95) contains the nucleophile Ser90 and a conserved Met91. The residue Met91 contributes to the formation of the substrate-binding S1 pocket. Region C (residues 127-134) contains Val132, which contributes to the formation of the substrate-binding S3 pocket. Region D (residues 142-153) contains the general base Lys145 and a conserved Arg146. Arg146 forms a salt bridge with Asp273, and residues Tyr143, Ile144, and Lys145 contribute to the formation of the active and substrate-binding sites in E. coli SPase I. Region E (residues 272-282) contains the conserved Gly272, Asp273, Asn274, Ser278, Asp280, and Arg282. Gly272 cannot be replaced by any other residue; otherwise, a clash with the general base Lys145 will occur. Ser278 helps to position Lys145 properly toward the nucleophile Ser90. The crystal structure revealed that regions B-E all reside around the SPase I active site and contribute to a conserved catalytic protein fold of SPase I (Figure 1.11) [10, 18].

Substitutions of these conserved residues have been studied, mainly by site-directed mutagenesis, to probe the function of SPase I [5]. In vitro, activity assays of SPase I mutants analyze the hydrolytic cleavage of the N-terminal signal peptide of the substrate PONA. In vivo, the activity assay measures the processing of the N-terminal signal peptide from the translocated pre-protein pro-OmpA in a temperature-sensitive E. coli signal peptidase strain, IT41 (DE3) [44, 62]. The E. coli strain IT41 (DE3) contains a mutant lepB gene for SPase I and shows normal growth at 32°C but slow growth at 42°C. Transformation of this strain with a plasmid containing wild-type SPase I complements the mutant lepB gene and allows growth at the non-permissive temperature of 42°C. Therefore, the processing of pro-OmpA to OmpA can proceed at 42°C in E. coli strain IT41 (DE3) complemented with wild-type SPase I. In contrast, the efficiency of processing pro-OmpA to OmpA at 42°C is reduced in E. coli strain IT41 (DE3) complemented with a mutant SPase I. A summary of the mutants of E. coli SPase I at the active and substrate-binding sites made so far is given in Table 1.2.

Evidence from Paetzel et al. (1997), Tschantz et al. (1995), and Sung et al. (1992) supports the model that Ser90 and Lys145 are essential catalytic residues of E. coli SPase I. Mutation of the catalytic residues Ser90 and Lys145 to Ala (S90A and K145A) causes a decrease in activity of greater than 100,000-fold in vitro, and the mutant enzymes have no detectable activity in vivo [36, 42, 54].

Carlos et al. (2000) investigated the role of Ser88 in E. coli SPase I [61]. They found that substitution of Ser88 with Cys (S88C) or Ala (S88A) resulted in a 740-fold and a 2440-fold reduction in activity, respectively. In contrast, the S88T mutant maintains wild-type enzyme activity. The S88T substitution preserves a side-chain hydroxyl group at position 88, which is proposed to stabilize the oxyanion intermediate during catalysis by E. coli SPase I. In addition, thermal inactivation and circular dichroism (CD) spectroscopic analyses did not detect any global conformational distortion of these mutant enzymes compared with wild-type E. coli SPase I, suggesting that the decreases in activity of these mutant enzymes are not due to a conformational change of the mutant proteins but to the change of the Ser88 side chain. All of these findings support the conclusion that Ser88 is important in catalysis by SPase I.

Figure 1.11: Structure of E. coli SPase I and view of the active and substrate-binding site
A. A ribbon representation of the general fold of SPase Δ2-75. Residues that reside at the active and substrate-binding site, as well as residues surrounding the active site (region E, red dashed circle), are represented as sticks and lines. B. Close-up view of the active and binding site in the crystal structure of the SPase Δ2-75/penem complex. Residues that reside at the active and binding site are shown in stick representation in cyan. Conserved residues (region E) that surround the active site are shown in line representation and coloured by element: carbon in green, oxygen in red, nitrogen in blue. The substrate-binding pockets S1 and S3 are indicated. All residues are labelled in one-letter code. The residues at the active and binding site are labelled in black, whereas the region E residues are labelled in red. The inhibitor is removed for clarity. The coordinates of SPase Δ2-75 are from the Protein Data Bank, code 1B12.

Karla et al. (2005) screened residues within the substrate-binding pocket of SPase I that are responsible for the fidelity and substrate specificity of SPase [63]. In this study, all of the residues that contribute to the substrate-binding pockets S1 and S3 were redesigned to either increase or decrease the pocket size. They concluded that, except for the single mutant I86A and the double mutant I86A/I144A, most mutants, including F84A/W, M91A, and L95A, were still capable of cleaving substrate and showed only a 10- to 20-fold reduction in activity (Table 1.2). Mutation of G89 to Ala (G89A) had no effect on activity, whereas L95R abolished all enzymatic activity. The introduction of the large, charged residue Arg at position 95 was expected to interfere with substrate binding, and it inactivates SPase I. The double mutant I86A/I144A is a poor enzyme, having only 1/1,000 of the activity of wild-type SPase I. In addition, the I144C mutation alters the fidelity of SPase, resulting in multiple cleavages of PONA [63]. Therefore, residues Ile86 and Ile144 are important for substrate specificity in SPase. This is consistent with the crystal structure, in which these two residues reside at the bridge between the substrate-binding sites S1 and S3 and are important for shaping both sites (Figures 1.7 and 1.11) [18].

Klenotic et al. (2000) scanned all of the conserved residues in region E, which are critical for maintaining enzymatic activity and which surround the active site in the crystal structure (Figure 1.11) [18, 64]. They found that neither the salt bridge between R146 and D273 nor the hydrogen bond between T94 and D273 is required for activity, since the R146A, R146M, R146A/T94V, and R146A/D273A mutants, as well as the triple mutant T94V/R146A/D273A, maintained full activity. The effect of D273A/N on activity is not clear because of poor expression and difficulty in extraction, but D273E is fully active. Based on these data, the authors propose that D273 is essential for activity only when its partner R146 is present [64]. G272 is essential for SPase I activity, since the G272A mutant showed a 750-fold reduction in activity. This is consistent with its structural role: G272 lies within van der Waals distance of both the general base Lys145 and Ser278. Any residue at position 272 with a side chain larger than a hydrogen atom would sterically obstruct the side-chain atoms of Lys145 and thereby perturb the active site. Ser278 is required for SPase I activity both in vitro and in vivo; S278A displayed a 300-fold reduction in activity, which is consistent with the structural role of Ser278. Ser278 resides within hydrogen-bonding distance of Lys145 and positions Lys145 relative to Ser90. Ser278 is also a conserved interaction partner of Asp280 and Arg282. Asp280 is located within hydrogen-bonding distance of Ser278 and is believed to facilitate the correct positioning of Ser278.

However, Asp280 and Arg282 are not required for activity, since the D280E and R282M mutants showed only a slight decrease in activity.

Table 1.2: Mutants of bacterial SPase I at the active and substrate-binding site

Residue | Function and structural role of residue | Mutant enzyme | Activity reduction in vitro (fold) | Activity relative to WT in vivo | Reference

Active site (catalysis)
S90 | Catalytic nucleophile | S90A | >100,000 | inactive | Sung et al. [36]
S88 | Oxyanion hole | S88A, S88C, S88T | 2440, 740 | little less, little less, 50% | Carlos et al. [61]
K145 | Catalytic general base | K145C, K145R, K145A, K145H | >100,000 | inactive | Paetzel et al. [42]

Binding site (substrate-binding pocket)
F84 | S3 | F84A, F84W | 2, 10 | full | Karla et al. [63]
I86 | S1/S3 | I86A | 200 | active | Karla et al. [63]
G89 | S1 | G89A | 0 | full | Karla et al. [63]
M91 | S1 | M91A | 20 | full | Karla et al. [63]
I130 | S1 | I130A | 0 | full | Karla et al. [63]
V132 | S3 | V132A, V132I | 100, 0 | full | Karla et al. [63]
D142 | S3 | D142E, D142I | 10, 0 | full | Karla et al. [63]
Y143 | S1/S3 | Y143A, Y143W | 10, 0 | full | Karla et al. [63]
I144 | S1/S3 | I144A (C or S) | 2, 2, 10 | full | Karla et al. [63]
I86/I144 | S1/S3 | I144A/I86A | 1,000 | little less | Karla et al. [63]

Box E (vicinity of the active site)
T94 | H-bond with Asp273 | T94V | 0 | full | Klenotic et al. [64]
R146 | Salt bridge with Asp273 | R146A, R146M | 0 | full | Klenotic & Kim et al. [64, 65]
G272 | | G272A | 750 | impair | Klenotic et al. [64]
D273 | | D273A, D273N | 100-200 | impair | Klenotic & Sung et al. [36, 64]
R146/D273/T94 | | R146A/D273A/T94V | 0 | no effect | Klenotic et al. [64]
N274 | | N274A | 0 | little less | Klenotic et al. [64]
N277 | | N277A, N277D | 0 | not clear | Klenotic et al. [64]
D280 | Positions Ser278; Ser278 then positions K145 to Ser90 | D280A, D280E | >1,000 | little less | Klenotic et al. [64]
R282 | | R282M | | | [64, 65]
G285 | | G285A | 10 | full | Klenotic et al. [64]

1.7 Inhibitors of Serine Protease

1.7.1 General Concept of Inhibitor

An inhibitor is a molecule that reduces or eliminates the catalytic activity of an enzyme. Generally, the broad variety of enzyme inhibitors can be divided into two major classes, irreversible and reversible inhibitors [66] (Figure 1.12). An irreversible inhibitor, also known as "a suicide inhibitor", forms a strong covalent bond with a specific functional group of a catalytic residue of the enzyme and permanently disables the enzyme. Reversible inhibitors can bind to enzymes through either covalent or non-covalent interactions. These covalent bonds are usually unstable and weak, whereas non-covalent interactions can be formed and broken quickly and easily. Therefore, the inhibition is effective and instantaneous but not permanent.

Figure 1.12: Classification of enzyme inhibitors
Enzyme inhibitors are divided into two major classes, irreversible and reversible inhibitors. Reversible inhibitors are sub-classified into four groups: competitive, non-competitive, uncompetitive, and mixed inhibitors.

Reversible inhibitors can be sub-classified as competitive inhibitors, non-competitive inhibitors, uncompetitive inhibitors, and mixed inhibitors [66].

Competitive inhibitors, as the name suggests, compete with the substrate for binding to the enzyme. A competitive inhibitor can bind to the free enzyme (E) but not to the enzyme-substrate complex (ES) [66]. It reduces substrate binding in one of two ways. In the first, the inhibitor resembles the substrate structurally and chemically, directly occupies the active site, and blocks substrate binding. In the second, the inhibitor binds to a separate inhibitor-binding site, which may be remote from the active site. On binding, however, the inhibitor causes a three-dimensional conformational change in the enzyme that alters the active site such that the substrate can no longer bind. In this case, the inhibitor does not necessarily resemble the substrate structurally or chemically, and the inhibition is an allosteric competitive inhibition. Because the binding of inhibitor and substrate is mutually exclusive, the substrate cannot bind if the inhibitor binds first, and vice versa; competitive inhibition can therefore be overcome by increasing the concentration of substrate.

A non-competitive inhibitor never binds to the active site of the enzyme. It can bind to E and ES with identical affinity [66]. The extent of inhibition depends only on the inhibitor concentration and is not affected by the substrate concentration, since the inhibitor and the substrate do not compete for the enzyme. Uncompetitive inhibition results from direct binding of the inhibitor to the enzyme-substrate complex, without the formation of an enzyme-inhibitor complex. In other words, an uncompetitive inhibitor binds only to the enzyme-substrate complex (ES), not to the free enzyme (E), and the resulting ESI complex is catalytically inactive [66]. Mixed inhibition may be considered a combination of the kinds of inhibition described above.
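The three pure classes of reversible inhibition described above can be compared numerically with the standard textbook Michaelis-Menten rate laws. The following is a minimal sketch, not taken from the thesis: the function name `relative_rate`, the parameters `km` and `ki`, and the example concentrations are all illustrative assumptions, and concentrations are in arbitrary but consistent units.

```python
def relative_rate(s, i, km=1.0, ki=1.0, mode="competitive"):
    """Return v/Vmax for substrate concentration s and inhibitor concentration i.

    Textbook rate laws; km and ki are hypothetical example constants.
    """
    alpha = 1.0 + i / ki  # scaling factor introduced by the inhibitor
    if mode == "competitive":       # inhibitor binds free enzyme (E) only
        return s / (alpha * km + s)
    if mode == "noncompetitive":    # inhibitor binds E and ES with equal affinity
        return s / (alpha * (km + s))
    if mode == "uncompetitive":     # inhibitor binds the ES complex only
        return s / (km + alpha * s)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("competitive", "noncompetitive", "uncompetitive"):
        low = relative_rate(1.0, 1.0, mode=mode)
        high = relative_rate(100.0, 1.0, mode=mode)
        print(f"{mode:15s} v/Vmax at [S]=Km: {low:.3f}   at [S]=100 Km: {high:.3f}")
```

Running the script prints the fractional rate for each class at a low and a high substrate concentration, which makes it easy to see how each class responds to added substrate under the assumed constants.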

1.7.2 Serine Protease Inhibitor

Serine proteases have serine as the catalytic residue at their active sites. They carry out a number of hydrolytic reactions in intra- and extracellular environments [67]. They exist in viruses and in all living organisms from bacteria to mammals. Excessive serine protease activity is a potential hazard to the protein-rich environment and is responsible for serious disorders, for example in neural development. Therefore, this activity must be precisely controlled at the levels of transcription, post-transcriptional regulation, secretion, activation, and degradation [68]. A second level of regulation is inhibition of their hydrolytic activity [69]. Cellular serine protease inhibitors provide a natural balance to their cognate proteases.

The inhibitors of serine proteases are classified, based on chemical formula, as either protein-based inhibitors or small-molecule compounds, the latter including peptide-based inhibitors and chemical compound inhibitors (Figure 1.13). Structurally, protein-based inhibitors comprise α-helical, β-sheet, α/β, and small disulfide-rich proteins [69]. Protein-based serine protease inhibitors can be sub-classified as canonical inhibitors (standard mechanism), non-canonical inhibitors, and serpins (serine protease inhibitors) (Figure 1.13) [69].


Figure 1.13: Classification of serine protease inhibitors Serine protease inhibitors are divided into two major classes, protein-based and small molecule. Protein-based serine protease inhibitors are sub-classified into three groups, canonical, non-canonical, and serpins. Small-molecule serine protease inhibitors are sub-classified into five groups, alkylating, acylating, phosphonylating, sulfonylating and miscellaneous inhibitors.

Canonical inhibitors are small proteins comprising fewer than 200 amino acids and are found in organisms from viruses through mammals. Bovine pancreatic trypsin inhibitor (BPTI), turkey ovomucoid third domain (OMTKY3), and eglin are examples of canonical inhibitors. These inhibitors block the active site by mimicking the enzyme-substrate interaction and forming a Michaelis complex. The inhibition is competitive and does not cause major conformational changes in either the inhibitor or the protease. The interaction between enzyme and inhibitor may occur through a loop resting across the free side of a β-sheet (e.g. eglin C), an exposed convex protease-binding loop (e.g. the trypsin-aprotinin complex), or a pronounced intrusion of the P1 residue (e.g. BPTI) [70-72]. Usually, the exposed convex protease-binding loop and the P1 residue have a shape complementary to the concave active site of the enzyme.

Similar to canonical inhibitors, serpins are a group of structurally related large proteins, 350-500 amino acids in size, which interact with their target protease in a substrate-like manner, forming a covalent acyl-enzyme complex. However, serpins undergo a dramatic conformational change and disrupt the active site of the protease.

Known non-canonical inhibitors include hirudin, TAP (tick anticoagulant peptide), and ornithodorin, which use their N-terminal segment to bind to the protease active site, forming a short parallel β-sheet [73]. These inhibitors can also bind non-competitively to areas of the serine protease other than the active site. An example of this is the recognition of thrombin by hirudin [74].

Small-molecule inhibitors of serine proteases are excellent candidates with which to develop therapeutic drugs that are selective for a wide spectrum of serine protease targets. To date, most small-molecule inhibitors of serine proteases, whether discovered (natural) or developed (synthetic), are irreversible inhibitors [75]. Powers et al. (2002) further sub-classified irreversible small-molecule inhibitors of serine proteases, on the basis of the inhibition reaction, as alkylating, acylating, phosphonylating, sulfonylating, and miscellaneous inhibitors [75]. We selected some examples of small-molecule inhibitors of serine proteases, with special consideration of known X-ray crystallographic structures of serine protease-inhibitor complexes (Table 1.3), based on a search of the MEROPS database, the PDB, and the literature [75, 76].

Table 1.3: Small-molecule inhibitors of serine protease (1)

Inhibitor | PDB ID | Reference

Irreversible inhibitors

Alkylating
Chloromethyl ketones (abbreviation: AA-RCOCH2Cl; AA, amino acid) | 1HA1 | Vijayalakshmi et al. [77]
 | n.a. | Poulos et al. [78]
 | 1PPG | Wei et al. [79]
Tos-Lys-CH2Cl (TLCK) | 1ARC | Tsunasawa et al. [80]
Tos-Phe-CH2Cl (TPCK) | n.a. | Powers et al. [75]
1,5-DNS-Glu-Gly-Arg-CH2Cl; DNS, dansyl (dimethylaminonaphthalene-5-sulfonyl) | 1cw | Kemball-Cook et al. [81]

Acylating
1. Aza-peptide esters | n.a. | Gupton et al. [82]
2. Peptidyl carbamate esters (R'NHCOOR or RSCONR'R"), for porcine pancreatic elastase (PPE) | n.a. | Powers et al. [75]
2. Peptidyl carbamate esters, for human leukocyte elastase (HLE) | n.a. | Powers et al. [75]
3. Peptidyl acyl hydroxamates | n.a. | Powers et al. [75]
 | 1SCN | Steinmetz et al. [83]
 | 1ELG, 1ELF | Ding, X. et al. [84]
4. β-Lactams and analogues | | Powers et al. [75]
4-1. Monocyclic β-lactams (azetidinones or monobactams); R" = Et, R' = p-NO2C6H4 | 1BTU | Wilmouth, R. C. et al. [85]
4-2. γ-Lactam; R = CH3, NO2, CF3; (3S,4S)-N-para-toluenesulphonyl-3-ethyl-4-(carboxylic acid)pyrrolidin-2-one | 1E34-1E38 | Wright, P. A. et al. [86]
 | n.a. | Supuran et al. [87]
 | n.a. | Supuran et al. [87]
5. Heterocyclic inhibitors
5-1. Isocoumarins: DCI, 3,4-dichloroisocoumarin (X = H, Y = Cl) | n.a. | Harper et al.
 | 1JIM |
3-(2-Bromoethoxy)-4-chloro-7-amino-isocoumarin | 9EST | Vijayalakshmi, J. et al. [90]
Isocoumarin | 8EST | Powers et al. [91]
5-2. Benzoxazinones: [1-(5-methyl-4-oxo-4H-3,1-benzoxazin-2-yl)-2-methyl-propyl] carbamic acid 1,1-dimethylethyl ester | 1INC | Radhakrishnan, R. et al. [92]
5-3. Saccharins | n.a. | Powers et al. [75]

Phosphonylating
1. Peptide phosphonates: Cbz-D-Dpa-Pro-Mpg(OPh)2; Mpg, α-amino-δ-methoxypropyl phosphonate | 1H81 | Skordalakes et al. [93]
 | 1MAX | Bertrand et al. [94]
 | 1P11, 1P12 | Bone, R. et al. [95]
2. Phosphonyl fluorides | 1CGH | Hof, P. et al. [96]
Diisopropylfluorophosphate (DFP) | 1AT3 | Hoog et al. [97]
 | 1DFP | Cole et al. [98]
 | 1XZK | Longhi et al. [99]

Sulfonylating
Sulfonyl fluorides: PMSF (phenylmethanesulfonyl fluoride) | 1KLT |

n.a.: not applied

Table 1.4: Small-molecule inhibitors of serine protease (2)

Inhibitor | PDB ID | Reference

Reversible inhibitors

1. Peptidyl aldehydes (transition-state mimics)
1-1. Leupeptin, e.g. N-acetyl-L-leucyl-L-leucyl-D,L-argininaldehyde | 2AG1 | Radisky et al. [101]
 | 1JRS, 1JRT | Kurinov et al. [102]
1-2. Antipain, [(S)-1-carboxy-2-phenylethyl]-carbamoyl-L-arginyl-L-valyl-argininal | 1BCR | Ballock et al. [103]
1-3. Elastatinal, N-[(S)-1-carboxy-isopentyl]-carbamoyl-alpha-[2-iminohexahydro-4(S)-pyrimidyl]-(S)-glycyl-(S)-glutaminyl-(S)-alaninal | n.a. | Telang et al. [104]
1-4. Chymostatin | 1SGC | Delbaere et al. [105]
 | 2SGA | Moult et al. [106]
 | 3SGA, 4SGA, 5SGA | James et al. [107]
 | 1BCS | Ballock et al. [108]
2. Boronic acids (transition-state analogs) | | Bone et al. [109]
3. Benzamidine (S1 pocket binding) | 3PTB | Marquart et al. [110]
 | 1J15 | Reyda et al. [111]
 | 1ZHM, 1ZHP | Jin et al. [112]
 | 1DVVT | Banner et al. [113]

Among the irreversible serine protease inhibitors, chloromethyl ketones and their derivatives are peptide-based alkylating inhibitors. Alkylating inhibitors such as Tos-Phe-CH2Cl (TPCK) and Tos-Lys-CH2Cl (TLCK) (see chemical structures in Table 1.3) are widely used for inhibiting chymotrypsin and trypsin, respectively [76]. In alkylating inhibition, the catalytic Ser residue of the enzyme makes a nucleophilic attack on the carbonyl carbon of the ketone to form a tetrahedral adduct, and the histidine general base is alkylated by the chloromethyl ketone functional group [75].

Acylating inhibitors also represent a large group of serine protease inhibitors, which includes aza-peptides, carbamates, peptidyl acyl hydroxamates, β-lactams, and a variety of heterocyclic derivatives [75]. Similarly, in acylating inhibition, the catalytic Ser residue of the enzyme makes a nucleophilic attack on the carbonyl carbon to form a covalent acyl-enzyme intermediate. Aza-peptides are peptides containing an aza-amino acid residue, in which the α-carbon has been replaced by a nitrogen atom (Table 1.3). Peptidyl carbamate esters (RNHCOOR') and thiocarbamates (RNHCOSR') are specific for the inhibition of porcine pancreatic elastase (PPE) and human leukocyte elastase (HLE), but have no effect on trypsin and chymotrypsin [76]. Peptidyl acyl hydroxamates are specific inhibitors of dipeptidyl peptidase IV. β-Lactams consist of monocyclic β-lactams and bicyclic β-lactams, and inhibit a variety of serine proteases including HLE, PPE, E. coli signal peptidase, chymotrypsin, trypsin, and thrombin. γ-Lactams can also acylate elastase and other serine proteases [76]. Acylating heterocyclic compounds include isocoumarins, benzoxazinones, saccharins, and miscellaneous heterocyclic inhibitors, and can inactivate most serine proteases. 3,4-Dichloroisocoumarin (DCI) is a general isocoumarin-derived inhibitor of serine proteases. Inhibition by acylating heterocyclic compounds involves a nucleophilic reaction with the serine residue to form a stable acyl-enzyme intermediate. The attack by the catalytic serine residue results in opening of the heterocyclic ring. Phosphonylating inhibitors, including peptide phosphonates and phosphonyl fluorides, inhibit trypsin and thrombin. Sulfonyl fluorides are the typical sulfonylating inhibitors.

Compared with the large number of irreversible inhibitors, there are fewer examples of serine proteases in complex with reversible inhibitors. Many aldehydes are known as natural reversible inhibitors of serine peptidases [114]. They form hemiacetal or thiohemiacetal conjugates with the essential hydroxyl or thiol group of the enzyme, and these conjugates are transition-state analogues [76]. Leupeptins such as N-acetyl-L-leucyl-L-leucyl-D,L-argininaldehyde, antipain, elastatinal, and chymostatin comprise one class of aldehydes. A peptidyl boronic acid inhibitor binds in a substrate-like fashion, with the Boc group in the P4 position [109]. The nucleophilic Ser forms a covalent bond with the boron atom, producing a tetracoordinate boronate species. Benzamidine inhibition involves van der Waals and electrostatic contacts with the S1 specificity pocket [110-113].

1.8 Inhibitors of Bacterial SPase I

There are many reports that bacterial SPase I is not inhibited by the standard serine protease inhibitors [5]. One reason may be that SPase I uses a unique Ser/Lys dyad catalytic mechanism instead of the Ser/His/Asp triad catalytic mechanism commonly seen in classic serine proteases. Nevertheless, four types of inhibitors have been found for E. coli SPase I (Table 1.5). The first type is represented by signal peptides, which inhibit hydrolysis by the enzyme in a competitive manner. The 23-amino-acid signal peptide from M13 procoat (MKKSLVLKASVAVATLVPMLSFA) can inhibit the processing of procoat and pre-MBP in vitro [115]. Another type of inhibitor is a pre-protein that carries a signal peptide with a proline at the +1 position relative to the cleavage site. The proline may twist the β-conformation of the c-region that fits into the binding sites of the enzyme, which interferes with catalysis [116]. The most effective inhibitors found so far are a group of β-lactam compounds referred to as the penem-type inhibitors [117-119]. Penem-type inhibitors mimic the natural peptide bond of the substrate. They attach covalently to Ser90 Oγ, and the reaction is irreversible. Among the penem inhibitors, the 5S stereoisomers are more potent. The 5S stereoisomers are the mirror-image compounds of the 5R β-lactams that are required for the inhibition of β-lactamases and penicillin-binding proteins. The compound allyl (5S,6S)-6-[(R)-acetoxyethyl]-penem-3-carboxylate has a 50% inhibitory concentration (IC50) of less than 1 µM against E. coli. The crystal structure of E. coli SPase Δ2-75 in complex with this penem inhibitor indicates that Ser90 Oγ makes a nucleophilic attack on the si-face of the scissile peptide bond of the SPase substrate [18]. In contrast, a classical serine protease usually attacks the re-face of a scissile bond [5].

Recently, a family of novel lipopeptide-based inhibitors, including lipohexapeptides and glyco-lipohexapeptides, was isolated and identified from Streptomyces sp. (an aerobic soil bacterium that produces antibiotics) [120, 121]. Interestingly, these inhibitors are synthesized non-ribosomally. Non-ribosomal peptide synthetases (NRPS) are large multi-enzyme complexes [122]. Unlike the ribosome, an NRPS does not read an mRNA code and makes only one peptide. NRPS can produce a wide diversity of peptides, with features such as cyclic and branched backbones, D-amino acids, and structural modifications including N-methyl and N-formyl groups, glycosylation, acylation, halogenation, and hydroxylation. Arylomycins are lipohexapeptides that have been shown to be inhibitors of SPase. One of the arylomycins, Arylomycin A2 (Figure 1.14), has been co-crystallized with E. coli SPase Δ2-75 [40].

Figure 1.14: The structure of Arylomycin A2
P1, P3, and P5 are designated corresponding to the enzyme binding sites S1, S3, and S5, according to the Schechter and Berger nomenclature.

Table 1.5: Inhibitors of bacterial SPase I

Inhibitor name | Chemical structure | PDB ID | Reference
23-amino acid signal peptide | MKKSLVLKASVAVATLVPMLSFA | n.a. | Wickner et al. [115]
Pre-protein with a proline at +1 relative to signal peptide cleavage site | | | Barkocy-Gallagher et al. [116]
β-Lactam | | | Kuo, Black, Allsop et al. [117-119]
e.g. allyl (5S,6S)-6-[(R)-acetoxyethyl]-penem-3-carboxylate | Figure 1.8 | 1B12 | Paetzel et al. [18]
Lipopeptide-based | | | Schimana et al. [120]
e.g. Arylomycin A2 | Figure 1.14 | 1T7D | Paetzel et al. [40]
Glycolipopeptide-based | Figure 1.16 | n.a. | Kulanthaivel et al. [121]
e.g. BAL 4850C | Figure 3.1 | Crystal structure solved | Chapter 3 of this thesis
Sultam/morpholino, e.g. BAL 0019193 | Figure 2.7(1) and 4.1 | Crystal structure solved | Chapter 4 of this thesis

n.a.: not applied

The crystal structure of SPase Δ2-75 in complex with Arylomycin A2 revealed that the inhibitor forms hydrogen bonds to the active site [40]. The COOH-terminal O45 of the inhibitor is hydrogen bonded to the catalytic residues Ser90 Oγ and Lys145 Nζ, and to the oxyanion hole residue Ser88 Oγ. The carboxyl group of its COOH terminus resides within the substrate-binding site S1 of the enzyme. The C30 methyl group points into the substrate-binding pocket S3. The C9 methyl group points toward a shallow pocket, which is a possible S5 binding pocket.

Figure 1.15: The active site of SPase Δ2-75 in complex with Arylomycin A2
The active site of SPase Δ2-75 is shown as a semi-transparent surface, and the important residues involved in catalysis and binding are displayed in stick representation. Both the N-terminus (residues 81-85) and the region of residues 142-145, which adopt a β-strand conformation, are shown. The inhibitor Arylomycin A2 is represented as sticks and coloured by element (yellow for carbon, red for oxygen, blue for nitrogen). The O45 of arylomycin binds non-covalently (by hydrogen bonding) to residues S90, S88, and K145. The O44 is hydrogen bonded to residue I144. The N33 is hydrogen bonded to D142.

Glyco-lipohexapeptides, found in the same Streptomyces sp., share the structural features of Arylomycin A2 except that an extra deoxy-α-mannose sugar is attached [121]. Glyco-lipohexapeptides have modest antibacterial activity, at 4-8 µM against E. coli [121]. Until now, no crystal structure of SPase in complex with a glyco-lipopeptide has been solved or reported. This thesis reports the crystal structure of SPase I in complex with a glyco-lipohexapeptide (Chapter 3). The impact of the discovery of this novel family of natural inhibitors is that they are new antibiotic candidates with an original structural scaffold for future rational structure-based drug design.

R1 = OH or H; R2 = H or iso or non-iso C15-17 fatty acid

Figure 1.16: The structure of glycolipopeptide-based inhibitors

1.9 Bacterial SPase I: A Novel Antibacterial Drug Target

Bacterial SPase I is a membrane-bound endopeptidase that cleaves amino-terminal signal peptides from secretory proteins and some membrane proteins. It is a well-characterized novel serine protease whose structure has previously been determined at 1.9 Å [18]. It is an attractive target for novel antibiotics. Its activity is essential for cell viability, and its active site is accessible on the outer leaflet of the cytoplasmic membrane, making drug delivery relatively simple. In addition, it utilizes a novel Ser/Lys dyad catalytic mechanism with unique stereochemical requirements for substrate recognition and catalysis, making the selection of a novel inhibitor possible. Significantly, inhibitors selected against bacterial SPase I should not be toxic to mammalian cells, since bacterial and eukaryotic SPase I differ in location (cytoplasmic membrane vs. endoplasmic reticulum membrane) and in catalytic mechanism (Ser-Lys vs. Ser-His). Therefore, bacterial SPase I is an ideal target for the development of a novel antibiotic.

1.10 Crystallographic Analysis: A Vital Tool in Drug Discovery

Crystallographic analysis of protein-inhibitor complexes is a vital component of the innovative structure-based approach to drug design. Determination of the three-dimensional crystallographic structure of the protein-inhibitor complex provides detailed information about the chemical interactions between the protein and the inhibitor, and produces a visual model of the active site into which "new" inhibitors can be designed to fit. This approach dramatically decreases the number of compounds that must be screened in the traditional drug discovery process, and allows a much faster response time in developing new drugs such as antibiotics, which is critically important in this era of ever-increasing antibiotic resistance.

1.11 Co-crystals of Protein-inhibitor Complexes

Although the structure-based drug design approach greatly reduces the number of inhibitors that need to be screened in the discovery of a novel antibiotic, dozens of three-dimensional models of protein-inhibitor recognition still have to be determined. Because of this requirement, obtaining diffraction-quality co-crystals of protein-inhibitor complexes becomes an essential process.

Two methods, co-crystallization and soaking, are commonly used to generate co-crystals of protein-inhibitor complexes [123, 124]. In the co-crystallization method, protein and inhibitor molecules are combined in solution before crystal growth experiments are set up to search for optimal crystallization conditions. This process commonly has to be repeated for each new inhibitor. In the soaking method, the inhibitor is incubated with pre-formed apo-crystals of the target protein. The soaking method is usually applied when the crystallization conditions of the target protein have been well established. It is also used when the crystallization condition is known for one type of protein-inhibitor complex: an originally co-crystallized inhibitor can be replaced with a new inhibitor by bathing the original co-crystal in a solution of the new inhibitor [125]. With its advantages of speed, convenience, and reproducibility, the soaking method is more compatible with the structure-based drug design approach. It provides a large number of diffraction-quality crystals of known structure, and the iterative crystallization step for each new inhibitor is omitted.

1.11.1 Co-crystallization

To set up successful experiments for obtaining the co-structure of a protein-inhibitor complex, several factors should be considered. Molecular homogeneity (purity) is a key component in the success of crystallization. However, when crystallizing a protein-inhibitor complex, it is impossible to avoid introducing heterogeneity (impurity), since the binding of protein and inhibitor has to reach equilibrium [123]. If free inhibitor (I) is added to the target protein (P), an equilibrium is established between the protein, the inhibitor, and the protein-inhibitor complex (PI), as $P + I \rightleftharpoons PI$. According to this equilibrium, we can define a dissociation constant, Kd:

$$K_d = \frac{[P][I]}{[PI]}$$

To evaluate the occupancy of the protein by the inhibitor, the fractional saturation Y can be defined as $Y = \frac{[PI]}{[P] + [PI]}$. Since $[PI] = \frac{[P][I]}{K_d}$, it follows that $Y = \frac{[I]}{K_d + [I]}$. The fraction of protein in the inhibitor-bound form, Y, is therefore determined by the ratio of [I] to Kd.

To obtain at least 90% occupancy, more inhibitor than protein must be added, such that the free [I] at equilibrium is greater than 10 x Kd. In practice, therefore, the initial concentration of inhibitor added should be much greater than 10 x Kd to ensure that the majority of protein molecules are homogeneously in the inhibitor-bound form. If the Kd is unknown, a molar ratio of inhibitor to protein of up to 10:1 is commonly used. If the inhibitor is not plentiful, the molar ratio of inhibitor to protein should be at least 1:1. These adjustments help to ensure the homogeneity of the inhibitor-bound protein molecules.
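The relationship between free inhibitor concentration, Kd, and occupancy can be checked with a short calculation. The sketch below is illustrative only: the function names and the example concentrations are assumptions, not values from this work, and all concentrations must be in the same (arbitrary) units. The last function solves the binding equilibrium for the complex concentration when only the total amounts of protein and inhibitor are known, which is the usual situation when setting up a co-crystallization drop.

```python
import math


def fractional_saturation(free_inhibitor, kd):
    """Y = [I] / (Kd + [I]), with [I] the free inhibitor concentration."""
    return free_inhibitor / (kd + free_inhibitor)


def free_inhibitor_needed(target_y, kd):
    """Free [I] that gives occupancy target_y (0 < target_y < 1)."""
    return kd * target_y / (1.0 - target_y)


def complex_from_totals(p_total, i_total, kd):
    """[PI] at equilibrium from total protein and total inhibitor.

    Solves [PI]^2 - (Pt + It + Kd)[PI] + Pt*It = 0 and keeps the
    physically meaningful (smaller) root.
    """
    b = p_total + i_total + kd
    return (b - math.sqrt(b * b - 4.0 * p_total * i_total)) / 2.0


if __name__ == "__main__":
    kd = 1.0  # hypothetical Kd, arbitrary units
    print(f"Y at free [I] = 10 x Kd: {fractional_saturation(10 * kd, kd):.3f}")
    print(f"free [I] for Y = 0.90:  {free_inhibitor_needed(0.90, kd):.2f} x Kd")
    # Hypothetical drop: protein at 1.0, inhibitor added at 10:1, Kd = 0.1
    pi = complex_from_totals(1.0, 10.0, 0.1)
    print(f"occupancy at 10:1 ratio: {pi / 1.0:.3f}")
```

Because the quadratic uses total rather than free concentrations, it accounts for the inhibitor consumed by binding, which matters when the protein concentration is comparable to or larger than Kd.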

The ratio of [I] to Kd depends not only on the concentration but also on the solubility and the binding affinity of the inhibitor. Compound solubility directly influences protein-inhibitor occupancy. Most synthesized chemical inhibitors are highly lipophilic and are not water soluble [126]. Solvents other than water are often more suitable for dissolving a lipophilic compound. Dimethyl sulfoxide (DMSO), for example, is an effective solvent, and concentrations of 5% DMSO or greater have been found to improve compound solubility [126, 127]. In addition, a number of crystallization reagents containing polyethylene glycols (PEGs), alcohols, or other additives may help with the solubility of the inhibitor compound [128]. The solubility of the inhibitor in solution must be equal to, or greater than, the concentration required for good occupancy of the protein. Furthermore, the pH of the crystallization mother liquor can influence both inhibitor solubility and binding affinity, because it influences the ionization state of the functional groups of the inhibitor and of the active and binding sites of the protein.

1.11.2 Soaking

Unlike small-molecule crystals, protein crystals are loosely packed because protein molecules are large and have irregular surfaces, forming many solvent channels across a crystal. The size and configuration of these solvent channels are determined by the crystal lattice, and their average size is 20-100 Å in diameter [129]. These solvent channels are full of solvent molecules, mostly water. Typically, the water content of a protein crystal is 30% to 80%, based on Matthews' calculation [130]. Upon soaking, the inhibitor

molecule must diffuse into the crystal through the solvent channels, and bind to

the active and binding sites of protein molecules.

A critical factor for soaking is that the protein crystal lattice must be compatible with inhibitor diffusion and binding [123, 125]. During the soaking

process, the crystal lattice may undergo mechanical stress because of inhibitor

molecule accessing and binding to the protein. In other words, inhibitor binding

may trigger a conformation change of protein molecule, which may alter the

surface shape of the protein molecule and the protein crystal-contact interactions

as well. Consequently, the crystal lattice may change, resulting in cracking or

dissolving of the pre-formed apo-crystal or the original co-crystal. To overcome

crystal lattice incompatibility, it is better to grow more than one type of crystal,

each with a different crystal lattice (space group). When one form of crystal fails

for soaking, switching to a second crystal form may yield success since different

crystal lattices may tolerate the protein conformational change differently upon

the inhibitor binding. Note that if the crystal contacts in a lattice involve the active and binding sites of the protein molecules, and thereby block access of the inhibitor, the soaking method will most likely be unsuccessful.

The equilibrium occupancy in soaking may not be identical to that in co-crystallization, but will be similar [123]. At least 25-30% of the binding sites need to be occupied for inhibitor density to be visible in electron-density maps [131]. Therefore, in the soaking method, [I] does not necessarily have to be 10 x Kd, since soaking does not require the high molecular homogeneity needed for crystallization.

Soaking time has to be considered and must be tested experimentally. In pre-formed crystals, where protein molecules are fixed in the crystal lattice, inhibitor molecules must travel from the crystal surface to the interior through the solvent-filled channels in order to reach the active and binding sites on the protein molecules. This process is slow, and inhibitor diffusion is driven solely by the inhibitor concentration gradient [132]. On the way to the crystal interior, inhibitor molecules are subject to size exclusion and to the chemical properties of the solvent channel, because the inhibitor may interact with the polar or hydrophobic groups of the residues that line the channel. In addition, crystal-growth mother liquor is commonly used to dissolve the inhibitor when setting up a soaking experiment. Bulky and viscous molecules such as PEGs, used at high concentration in crystal-growth reagents, may accumulate within the solvent channels and slow down inhibitor diffusion. In rare cases, inhibitor molecules may aggregate to form large particles of 30-400 nm, even in aqueous solution at micromolar concentration [133].

1.12 Overview of Objectives

To develop a novel antibiotic based on bacterial SPase I, the Paetzel lab, in collaboration with the drug company Basilea Pharmaceutica Ltd in Basel, Switzerland, is working to co-crystallize SPase Δ2-75 with a number of peptide-based inhibitors. These peptide-based inhibitors were isolated from natural products by high-throughput screening. They are structurally very close to the natural substrate cleavage site of bacterial SPase I.

The specific goals in this project are:

1. Screening and optimizing co-crystallization conditions for E. coli SPase Δ2-75 and related mutants in complex with a number of peptide-based inhibitors.

2. Collecting X-ray diffraction data from crystals of E. coli SPase Δ2-75/inhibitor and mutant/inhibitor complexes using the new SFU Macromolecular X-ray Diffraction Data Collection Facility and synchrotron radiation.

3. Processing the X-ray diffraction data collected, and solving and refining structures.

4. Improving the X-ray diffraction quality of E. coli SPase Δ2-75/inhibitor crystals by mutagenesis and protein truncations.

The research presented in my thesis is a starting point toward our ultimate goal, which is to develop a novel antibiotic. I used X-ray crystallography to study E. coli SPase I in complex with a novel family of peptide-based inhibitors, and attempted to obtain information regarding the interactions between this enzyme and the inhibitors.

CHAPTER 2: THE CRYSTALLIZATION AND PRELIMINARY X-RAY ANALYSIS OF E. COLI SPASE I IN COMPLEX WITH PEPTIDE-BASED INHIBITORS

2.1 Introduction

When X-ray crystallographic analysis is used to evaluate known drug candidates or to direct structure-based design of new drug leads, the ability to obtain ligand-free native crystals of a target protein, or co-crystals of a target protein in complex with inhibitors, is essential. However, crystallization of macromolecules (e.g. proteins), with or without an inhibitor, is not an easy task. For example, even when crystallization conditions for a target protein have been optimized, obtaining co-crystals with inhibitors is often not successful [123].

Furthermore, even when crystallization conditions for co-crystals of some inhibitor complexes are known, applying these conditions to other new inhibitor complexes and obtaining co-crystals could also prove difficult.

Preparation of a sufficient amount of pure and active protein is a crucial step that directly contributes to the quality of crystals for X-ray diffraction analysis. In this work, three proteins were prepared. The major focus was SPase Δ2-75 (residues 76-323, Mr 27,952 Da, pI 5.6), which corresponds to the large, soluble, periplasmic catalytic domain of E. coli SPase I. We used SPase Δ2-75 as the principal target for screening the new inhibitors because it has previously been crystallized both in its apo-enzyme form and in complexes with two distinct inhibitors [18, 38, 40]. In addition, two inactive mutants of SPase Δ2-75, SPase Δ2-75 S90A and SPase Δ2-75 K145A, in which the catalytic residues Ser90 and Lys145, respectively, have been changed to Ala, were purified. The methods of protein overexpression and purification followed the previous description of Paetzel and co-workers [38], with some modifications. Here, the methods are described mainly for SPase Δ2-75, but they were applied to all three proteins.

We have a number of peptide-based inhibitors, provided by the drug company Basilea Pharmaceutica Ltd in Basel, Switzerland. Many of these peptide-based inhibitors are natural products identified by high-throughput screening. They can be sub-classified as glyco-lipopeptides, lipopeptides, and peptides, based on their chemical formulae (see Figures 2.7, 2.8, and 2.9). They are structurally very close to the natural substrate cleavage site of bacterial SPase I. To obtain high-quality, well-ordered co-crystals of protein in complex with peptide-based inhibitors, we applied both co-crystallization and soaking.

This chapter summarizes protein overexpression, purification, protein/inhibitor crystallization, X-ray data collection, and preliminary X-ray analysis (for an outline, see Figure 2.1). This general procedure was applied to all of the protein/inhibitor complexes in this study. Further methods specific to each individual protein/inhibitor complex, such as the final optimization of crystal growth conditions and the advanced X-ray diffraction data collection, processing, phasing, model building, inhibitor docking, structure refinement, and structure analysis, are found in the following chapters corresponding to each protein/inhibitor complex. In this chapter, crystallographic terms are introduced gradually. Definitions for most of the crystallographic terms are found either in the glossary or in Chapter 5.

Here, we report a number of new conditions for the co-crystallization of E. coli SPase Δ2-75 in complex with peptide-based inhibitors. For the first time, we have crystallized an inactive enzyme, E. coli SPase Δ2-75 S90A, both inhibitor-free and in complex with a peptide-based inhibitor. The main reason for attempting to crystallize the inactive SPase Δ2-75 S90A and SPase Δ2-75 K145A as apo-enzymes was to see whether the active-site conformation is changed. Inhibitors were bound to these two inactive proteins to ease crystallization through the 'ligand effect'. Diffraction data were collected for a number of these crystals, with resolutions ranging from 2.0 Å to 2.6 Å.

[Figure 2.1 flowchart: Overexpression; Inclusion Body Purification; Soluble Protein; Pure Protein + Inhibitor; Co-crystallization; Protein/Inhibitor; X-ray Diffraction Data Collection; Data Processing; Model Building and Refinement; Direct Crystal Engineering; Direct Rational Drug Design]

Figure 2.1: An outline of the experimental procedure

2.2 Materials and Methods

2.2.1 Recombinant Plasmids, Peptide-based Inhibitors, and Chemicals

E. coli strain BL21 (DE3) containing the plasmids pET3d/SPase Δ2-75, pET3d/SPase Δ2-75 S90A, and pET3d/SPase Δ2-75 K145A was originally from the Dalbey laboratory collection and was used for overexpression of E. coli SPase Δ2-75 and its mutants SPase Δ2-75 S90A and SPase Δ2-75 K145A, respectively [37]. Peptide-based inhibitors (see the inhibitor structures in Figures 2.7, 2.8, and 2.9; note that a non-peptide small molecule, a sultam/morpholino inhibitor, is included) were obtained from the drug company Basilea Pharmaceutica Ltd, Basel, Switzerland. Unless otherwise mentioned, all other chemicals were purchased from Sigma-Aldrich, Anachemia, BioShop, and CALEDON.

2.2.2 Overexpression of Protein

SPase Δ2-75 was expressed in the Escherichia coli BL21 (DE3) strain using the plasmid pET3d. E. coli strain BL21 (DE3) bearing the plasmid pET3d/SPase Δ2-75 was streaked on a Luria-Bertani (LB) agar plate supplemented with ampicillin (100 µg/ml). The selection plate was incubated at 37 °C overnight to obtain fresh single transformant colonies. A single fresh colony was picked, transferred into 50 mL LB medium with ampicillin (100 µg/ml), and grown overnight at 37 °C with shaking at 250 rpm. Ten millilitres of this overnight culture was inoculated into 1 L fresh LB/ampicillin (100 µg/ml) medium and cultured at 37 °C with shaking at 250 rpm. When the optical density (OD600) of the culture had reached 0.6 to 0.8, protein overexpression was induced by adding isopropyl-β-D-thiogalactopyranoside (IPTG) (BioShop Canada Inc.) to a final concentration of 0.5 mM. The incubation was continued for 3 hours. The recombinant cells were pelleted by centrifugation at 5,000 rpm, 4 °C, for 10 minutes. The overexpressed protein accumulated as inclusion bodies in the bacterial cytoplasm. Protein expression was examined using 12% sodium dodecyl sulphate polyacrylamide gel electrophoresis (SDS-PAGE). The SDS-PAGE gel was stained using Coomassie Brilliant Blue R-250. Low molecular weight markers were loaded on the same gel to identify SPase Δ2-75 (Figure 2.2). The induced recombinant cells were then frozen and stored at -20 °C.

2.2.3 Isolation of Protein Inclusion Bodies

The frozen induced recombinant cell pellet from a 4 L expression culture was resuspended in 15 mL of lysis buffer (20 mM Tris-HCl, pH 7.4, 0.1 M NaCl, and 1 mM EDTA), and the cells were broken by passing them through a French Press cell at 10,000 psi 5-7 times. The inclusion bodies (pink precipitate in the pellet) were separated from soluble proteins and cell debris by centrifugation for 5 minutes at 4 °C, 5,000 rpm. Relatively pure inclusion bodies were obtained by washing at least four times with 30 mL washing buffer (20 mM Tris-HCl, pH 7.4, 10 mM EDTA, and 0.5% Triton X-100). An aliquot of each supernatant and of the resuspended inclusion bodies was subjected to 12% SDS-PAGE (Figure 2.3).

2.2.4 Solubilization of Inclusion Bodies and Protein Refolding

The inclusion bodies (200 mg, wet weight) were dissolved in 100 mL denaturing buffer (4 M guanidine hydrochloride in 20 mM Tris-HCl, pH 7.4) and stirred at room temperature. The denatured protein was refolded by diluting it into the refolding buffer (20 mM Tris-HCl, pH 7.4, 10 mM EDTA, and 0.5% Triton X-100) at a 1:3 (protein:buffer, v/v) ratio. The protein solution was maintained at room temperature with gentle stirring for 2 hours and was then centrifuged at 15,000 rpm (JA-17, Beckman) for 1 hour to remove any protein aggregates. The supernatant was filtered through glass wool. The soluble protein was dialyzed against 9 L pre-chilled dialysis buffer (20 mM Tris-HCl, pH 7.4, and 1.7 mM Triton X-100) at 4 °C. The dialysis buffer was changed three times.

2.2.5 Protein Purification by Column Chromatography

Ion-exchange chromatography and gel filtration chromatography were used to further purify SPase Δ2-75. SPase Δ2-75 is negatively charged at pH 7.4 because its pI is 5.6 [37]. Therefore, SPase Δ2-75 binds to anion-exchange resins and flows through cation-exchange resins.

A pre-packed cation-exchange column with SP Sepharose Fast Flow resin (Amersham Biosciences) was used to bind positively charged contaminants. Ten millilitres of SP Sepharose was placed into the column. The resin was washed with 20 column volumes of distilled water and equilibrated with 10 column volumes of dialysis buffer. The dialyzed protein solution was gently loaded onto the column, and the flow-through containing SPase Δ2-75 was collected. The flow-through from the SP Sepharose cation-exchange column was loaded directly onto another self-packed anion-exchange column with Q Sepharose Fast Flow resin (Amersham Biosciences). The resin was washed and equilibrated as described above. After the protein was loaded, 10 column volumes of dialysis buffer were applied to wash away any contaminants. The protein was eluted by applying a high-salt solution (30 mL) of 0.7 M NaCl in 20 mM Tris-HCl, pH 7.4, and 1.7 mM Triton X-100. The eluted protein was desalted by dialysis (1 L) with three changes as described above. The protein was concentrated using an ultrafiltration concentrator (Amicon) with a YM-10 membrane (Amicon, 10 kDa molecular weight cutoff, Millipore), and the purification result was visualized using 12% SDS-PAGE stained with Coomassie Brilliant Blue R-250 (Figure 2.6A).

The relatively pure protein sample was applied to a pre-packed S-100 gel filtration column (HiPrep 26/60, Sephacryl S-100, Amersham Biosciences) run by an AKTA Prime system (Pharmacia). The mobile phase was 100 mM NaCl, 20 mM Tris-HCl, pH 7.4, and 1.7 mM Triton X-100. The column was equilibrated with the mobile phase before sample injection. The protein sample (2.5 mg/ml) was pre-treated by centrifugation for 20 minutes at 4 °C, 14,000 rpm (JA-17, Beckman) and filtered through a 0.22 µm filter. The flow rate was 1 ml/min and the fraction size was 3 ml (Figure 2.4). The column eluent was resolved using 12% SDS-PAGE (Figures 2.5 and 2.6B). The fractions containing SPase Δ2-75 were pooled and concentrated to 10-20 mg/ml using a centrifugal membrane filter tube (Millipore, with a molecular weight cutoff of 10 kDa). The concentrated protein sample was loaded onto a pre-packed desalting column (HiTrap, 5 ml, Sephadex G-25 Superfine, Amersham Biosciences) for buffer exchange into 20 mM Tris-HCl, pH 7.4, and 1.7 mM Triton X-100. Pure SPase Δ2-75 samples were stored at -80 °C in 100 µl aliquots until use. The protein concentration was determined by measuring the absorbance at 562 nm, using the BCA Protein Assay Kit (Pierce, USA).

The bicinchoninic acid (BCA) assay for quantitative measurement of protein concentration is based on the reduction of Cu2+ to Cu+ by protein in alkaline solution. The peptide bonds and residues including cysteine, tryptophan, and tyrosine are oxidized and are responsible for reducing Cu2+ to Cu+. Two BCA molecules react with one Cu+ to form a purple complex. This water-soluble complex absorbs strongly at 562 nm, and the absorbance is approximately linearly proportional to protein concentration over a broad range (25-2000 µg/ml), according to the manufacturer's instructions (Pierce, USA). The standard curve was obtained by preparing a series of known concentrations (25-2000 µg/ml) of bovine serum albumin (BSA), and the concentration of each unknown was determined from the standard curve. Briefly, a working reagent was made by mixing reagents A and B at a 50:1 (A:B, v/v) ratio. The reactions were set up by adding 400 µl working reagent to 50 µl protein sample (working reagent:sample at an 8:1 v/v ratio) and incubated at 37 °C for 30 min. The absorbance of each sample was measured at 562 nm using a UV spectrophotometer (Agilent), and the absorbance of a blank buffer solution (20 mM Tris-HCl, pH 7.4, and 1.7 mM Triton X-100) was subtracted. The activity of the re-solubilized and purified protein was tested by an enzymatic activity assay based on PONA processing.
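The standard-curve step of the BCA assay described above can be sketched as a simple linear fit. This is an illustrative sketch only: the function names are hypothetical, a straight-line model is assumed over the working range, and the absorbance readings in the example are made-up numbers, not measurements from this work.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_a562(a562, slope, intercept, dilution_factor=1.0):
    """Protein concentration (ug/ml) of an unknown from its blank-corrected A562."""
    return (a562 - intercept) / slope * dilution_factor


if __name__ == "__main__":
    # Hypothetical BSA standards (ug/ml) and blank-corrected A562 readings;
    # made-up numbers for illustration, not data from this thesis.
    bsa_ug_per_ml = [25, 125, 250, 500, 1000, 2000]
    a562 = [0.03, 0.14, 0.27, 0.52, 1.01, 1.98]
    slope, intercept = fit_line(bsa_ug_per_ml, a562)
    unknown = concentration_from_a562(0.60, slope, intercept, dilution_factor=10)
    print(f"slope = {slope:.5f}, intercept = {intercept:.4f}, unknown = {unknown:.0f} ug/ml")
```

Samples at 10-20 mg/ml lie far above the 25-2000 µg/ml working range, so they would need to be diluted before measurement; the `dilution_factor` argument scales the result back to the undiluted sample.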

2.2.6 Searching for Initial Co-crystallization Conditions of SPase I with Peptide-based Inhibitors

Pure SPase Δ2-75, SPase Δ2-75 S90A, and SPase Δ2-75 K145A were prepared using the methods described above. The inhibitor (10 mM), dissolved in dimethyl sulphoxide (DMSO), was added to the protein (10-20 mg/ml, 0.3577-0.7155 mM) at an approximately 1:1 molar ratio. The protein/inhibitor complex solution was kept on ice for one hour and then stored at -80 °C.
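The concentrations quoted above follow directly from the molecular mass of SPase Δ2-75 (27,952 Da). The short sketch below reproduces the mg/ml to mM conversion and estimates how much 10 mM inhibitor stock gives a chosen molar ratio. The function names and the 100 µl protein aliquot in the example are assumptions made for illustration.

```python
MR_SPASE_DELTA2_75 = 27952.0  # Da, as stated in the text


def mg_per_ml_to_mM(mg_per_ml, mr_da):
    """Convert mg/ml to mM: (g/L) / (g/mol) = mol/L, then x1000 for mM."""
    return mg_per_ml / mr_da * 1000.0


def stock_volume_ul(protein_mM, protein_volume_ul, stock_mM, molar_ratio=1.0):
    """Volume of inhibitor stock (ul) giving `molar_ratio` mol inhibitor per mol protein."""
    return molar_ratio * protein_mM * protein_volume_ul / stock_mM


if __name__ == "__main__":
    for conc in (10.0, 20.0):
        conc_mM = mg_per_ml_to_mM(conc, MR_SPASE_DELTA2_75)
        # Assumed 100 ul protein aliquot and the 10 mM DMSO stock from the text.
        vol = stock_volume_ul(conc_mM, 100.0, 10.0)
        dmso_percent = 100.0 * vol / (100.0 + vol)
        print(f"{conc:.0f} mg/ml = {conc_mM:.4f} mM; "
              f"add {vol:.2f} ul stock ({dmso_percent:.1f}% DMSO v/v)")
```

The same helper also shows the DMSO fraction that a given ratio introduces into the protein solution, which is worth checking before raising the inhibitor excess.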

All initial co-crystallization trials (including crystallization trials for SPase Δ2-75 S90A and SPase Δ2-75 K145A) were set up by the sitting-drop vapour-diffusion method at room temperature, using 24-well sitting-drop plates (Hampton Research). Nearly 200 initial crystallization conditions were screened for each protein/inhibitor complex using the commercial Hampton Research Crystal Screen 1 (50 reagents), Crystal Screen 2 (48 reagents), MembFac screen (48 reagents), and PEG/Ion screen (48 reagents). Mixtures of the protein/inhibitor complex solution and an equal volume of reservoir reagent (1 µl + 1 µl) were equilibrated against the reservoir solution (0.5 mL).

2.2.7 Inhibitor Soaking into Preformed Crystals

Crystals of apo-enzyme SPase Δ2-75 (10-20 mg/ml) were grown under conditions similar to those reported previously. The optimal growth condition for all batches of SPase Δ2-75 in this study was 0.4-0.5 M ammonium dihydrogen phosphate, 0.1 M sodium citrate, pH 5.6. The drop consisted of 1 µl of protein and 1 µl of reservoir solution, and the reservoir volume was 1 ml. The average size of the crystals was 0.25 x 0.25 x 0.4 mm.

The inhibitor was dissolved in DMSO (10 mM) and then diluted (1:5 v/v ratio) into mother liquor with a higher precipitant concentration (0.8 M ammonium dihydrogen phosphate, 0.1 M sodium citrate, pH 5.6). Two microliters of this inhibitor soaking solution was added to the crystals to achieve a molar ratio of inhibitor to protein of at least 5:1. Soaking time was varied from 3 min to one week.
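The inhibitor-to-protein ratio in a soaking drop can be estimated with a few lines of arithmetic. The sketch below is a rough check under two stated assumptions, neither of which is specified in the text: the 1:5 dilution is read as one part stock in five parts total (2 mM final), and all of the protein originally placed in the 1 µl drop is counted, which overestimates the protein actually present in the crystals. The function names are hypothetical.

```python
def diluted_mM(stock_mM, stock_parts, total_parts):
    """Concentration after diluting `stock_parts` of stock to `total_parts`."""
    return stock_mM * stock_parts / total_parts


def nmol(conc_mM, volume_ul):
    """Amount in nmol (mM x ul = nmol)."""
    return conc_mM * volume_ul


def soak_molar_ratio(stock_mM, stock_parts, total_parts, soak_ul,
                     protein_mg_per_ml, protein_ul, mr_da):
    """Moles of inhibitor added per mole of protein originally in the drop."""
    inhibitor = nmol(diluted_mM(stock_mM, stock_parts, total_parts), soak_ul)
    protein = nmol(protein_mg_per_ml / mr_da * 1000.0, protein_ul)
    return inhibitor / protein


if __name__ == "__main__":
    for conc in (10.0, 20.0):
        # ASSUMPTION: 1 part stock in 5 parts total; 1 ul protein per drop.
        ratio = soak_molar_ratio(10.0, 1, 5, 2.0, conc, 1.0, 27952.0)
        print(f"{conc:.0f} mg/ml protein -> inhibitor:protein = {ratio:.1f}:1")
```

Under these assumptions the ratio stays above 5:1 across the 10-20 mg/ml range; if the dilution is instead read as one part stock plus five parts mother liquor, the ratio at 20 mg/ml falls slightly below 5:1.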

2.2.8 X-ray Diffraction Data Collection and Preliminary X-Ray Diffraction Analysis

X-ray data were collected for the complexes of protein with peptide-based inhibitors using the SFU Macromolecular X-ray Diffraction Data Collection Facility (Rigaku-MSC, USA). Before data collection, the crystal was transferred by pipette from the growth drop to a cryoprotectant for 0.5 to 2 min. The cryoprotectant was generally the crystallization mother liquor with 30% glycerol (v/v, replacing 30% of the water). A single crystal was picked up with a nylon loop and flash-cooled by placing it directly into a nitrogen stream at 100 K. X-rays (Cu Kα radiation, wavelength 1.5418 Å) were generated by a Rigaku MicroMax-007 Microfocus rotating-anode generator running at 40 kV and 20 mA and equipped with Osmic Confocal VariMax High Flux optics. Each diffraction image was recorded on an R-AXIS IV++ imaging-plate detector with a 0.5° oscillation. The crystal-to-detector distance was typically set to 200-250 mm. Data were collected, indexed, integrated, and scaled using the program CrystalClear (unless otherwise specified) [134].
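The link between crystal-to-detector distance and the resolution recorded at the edge of the image follows from Bragg's law. The sketch below assumes a flat detector perpendicular to the beam with no 2θ offset. The detector half-width of 150 mm is an assumed value that should be checked against the instrument documentation, and the function name is hypothetical.

```python
import math

CU_KALPHA_A = 1.5418  # wavelength in angstroms, as stated in the text


def resolution_at_radius(distance_mm, radius_mm, wavelength_a=CU_KALPHA_A):
    """Bragg spacing d (angstroms) for a spot `radius_mm` from the beam centre.

    Flat detector normal to the beam: tan(2*theta) = r / D,
    and Bragg's law gives d = lambda / (2 * sin(theta)).
    """
    two_theta = math.atan2(radius_mm, distance_mm)
    return wavelength_a / (2.0 * math.sin(two_theta / 2.0))


if __name__ == "__main__":
    half_width_mm = 150.0  # ASSUMED detector half-width, not taken from the text
    for distance in (200.0, 250.0):
        d = resolution_at_radius(distance, half_width_mm)
        print(f"D = {distance:.0f} mm -> d at detector edge = {d:.2f} A")
```

Moving the detector closer records higher-resolution reflections but also brings neighbouring spots closer together, which is the trade-off when the unit cell is large.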

2.3 Results and Discussion

2.3.1 Purification of SPase Δ2-75, SPase Δ2-75 S90A, and SPase Δ2-75 K145A

Purified SPase Δ2-75, SPase Δ2-75 S90A, and SPase Δ2-75 K145A were obtained from a number of E. coli culture batches (4-12 liters per batch). Multiple purification batches were necessary because each batch of protein behaved slightly differently, although the same purification procedure was performed each time. These proteins were isolated from inclusion bodies at a high protein concentration (Figure 2.2; data are not shown for SPase Δ2-75 S90A and SPase Δ2-75 K145A, and there was no noticeable difference in the expression level of the mutants). Inclusion bodies are dense, insoluble aggregates of misfolded polypeptide. Their formation requires the extra steps of solubilization and refolding during protein purification. Refolding can be a problem if the inclusion bodies cannot be denatured and refolded properly into the native 3D structure [135, 136]. However, an advantage of inclusion bodies is that largely pure protein can be obtained by one-step centrifugation because of their high density [38]. In the case of SPase Δ2-75 and its mutants, denatured inclusion bodies could be solubilized and refolded by the dilution method [39]. After several washes of the inclusion bodies, the protein was relatively pure (Figure 2.3). Further chromatographic purification enhanced protein purity (Figures 2.4, 2.5, and 2.6A). The purity of all three proteins was assessed by SDS-PAGE, and the proteins were found to be homogeneous (Figure 2.6B). The expression, denaturation, refolding, and purification of the mutant proteins behaved similarly to those of SPase Δ2-75, and the purity and yield of each batch of mutant protein were similar to those of SPase Δ2-75 (data not shown for SPase Δ2-75 S90A and SPase Δ2-75 K145A).

Figure 2.2: SDS-PAGE analysis of SPase Δ2-75 overexpression. Lysates were obtained from E. coli BL21 (DE3) cells containing the plasmid pET3d/SPase Δ2-75 after expression was induced with 0.5 mM IPTG. Lane 1, low range molecular weight markers (113, 91, 50, 35, 28, and 21 kDa); lanes 2-4, with IPTG induction; lanes 5-7, without IPTG induction. The arrow indicates the expressed SPase Δ2-75 (~28 kDa). Samples were subjected to 12% SDS-PAGE and stained with Coomassie Brilliant Blue R-250.

Figure 2.3: The isolation of SPase Δ2-75 inclusion bodies. Lane 1, low range molecular weight markers (113, 91, 50, 35, 28, and 21 kDa); lane 2, the contaminants in the supernatant of the cell lysate; lanes 3-10, four washes of inclusion bodies (in wash order, odd-numbered lanes are the supernatant and even-numbered lanes are the resuspended inclusion bodies after each wash). The arrow indicates SPase Δ2-75 (~28 kDa). Samples were subjected to 12% SDS-PAGE and stained with Coomassie Brilliant Blue R-250.

Interestingly, in gel filtration chromatography, SPase Δ2-75 eluted in the void volume (Figure 2.4) when compared with the column standard curve (data not shown). One explanation is that the final concentration of Triton X-100 (CMC: 0.9 mM, Mr: 631.0, d: 1.07) in the protein sample was 8.5 mM; because this is well above the CMC, most detergent molecules form micelles in the solution. There is evidence that SPase Δ2-75, although it lacks the two membrane segments, can attach to detergent or lipid micelles [37, 59]. If more than one SPase Δ2-75 molecule attaches to each micelle, SPase Δ2-75 will appear to have a much larger molecular mass than that of its monomer. The apparent high molecular mass of the SPase I is not due to aggregation of misfolded protein, since it was active against PONA (data not shown).
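The detergent figures in this argument can be cross-checked with a short calculation. The sketch below uses the Triton X-100 values given in the text (Mr 631.0, density 1.07, CMC 0.9 mM). It assumes that the 0.1% Triton X-100 quoted for the mobile phase in Figure 2.4 is a volume percentage, and the function names are hypothetical.

```python
TRITON_MR = 631.0      # g/mol, from the text
TRITON_DENSITY = 1.07  # g/ml, from the text
TRITON_CMC_MM = 0.9    # mM, from the text


def percent_vv_to_mM(percent_vv, density_g_per_ml=TRITON_DENSITY, mr=TRITON_MR):
    """Convert % (v/v) of neat detergent to a millimolar concentration."""
    grams_per_litre = percent_vv / 100.0 * 1000.0 * density_g_per_ml
    return grams_per_litre / mr * 1000.0


def fold_above_cmc(conc_mM, cmc_mM=TRITON_CMC_MM):
    """How many times the detergent concentration exceeds its CMC."""
    return conc_mM / cmc_mM


if __name__ == "__main__":
    print(f"0.1% v/v Triton X-100 = {percent_vv_to_mM(0.1):.2f} mM")
    print(f"8.5 mM is {fold_above_cmc(8.5):.1f} x CMC")
```

On this reading, 0.1% (v/v) corresponds to the 1.7 mM used in the buffers, and the 8.5 mM in the concentrated sample is five times that value and several-fold above the CMC.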

Figure 2.4: The gel filtration profile of SPase Δ2-75. The mobile phase was 100 mM NaCl, 20 mM Tris-HCl, pH 7.4, and 0.1% Triton X-100. The flow rate was 1 ml/min and the fraction size was 3 ml. Peak 2 is the pure SPase Δ2-75. Peaks 1, 3, and 4 are contaminants. The column used was a pre-packed S-100 gel filtration column (HiPrep 26/60, Sephacryl S-100, Amersham Biosciences).

Figure 2.5: SDS-PAGE analysis of gel filtration chromatography. Lanes 1 and 20, low range molecular weight markers (113, 91, 50, 35, 28, and 21 kDa); lanes 2-14, the pure SPase Δ2-75 (peak 2); lanes 15-19, the emerging contaminant (peak 3). The arrow indicates SPase Δ2-75 (~28 kDa). Samples were subjected to 12% SDS-PAGE and stained with Coomassie Brilliant Blue R-250.

Figure 2.6: The purification of SPase Δ2-75. A. Lane 1, low range molecular weight markers (113, 91, 50, 35, 28, and 21 kDa); lane 2, purified SPase Δ2-75 after Q anion-exchange column chromatography. B. Lane 1, broad range molecular weight markers (209, 124, 80, 49, 35, 29, 20, and 7 kDa); lane 2, purified SPase Δ2-75 after gel filtration column chromatography, which was the protein sample used for crystallization. The arrow indicates SPase Δ2-75 (~28 kDa). Samples were subjected to 12% SDS-PAGE and stained with Coomassie Brilliant Blue R-250.

2.3.2 Crystallization Conditions for SPase Δ2-75 and SPase Δ2-75 S90A and the Result of Soaking Inhibitors

Previously, SPase Δ2-75, SPase Δ2-75 S90A, and SPase Δ2-75 K145A were kinetically characterized [37, 54]. However, only SPase Δ2-75 has been crystallized and structurally analysed [18]. In this work, all batches of purified SPase Δ2-75 produced large crystals up to 0.25 x 0.25 x 0.5 mm in size. The growth condition used was 0.4-0.5 M ammonium dihydrogen phosphate ((NH4)H2PO4), 0.1 M Na citrate, pH 5.6, which differed slightly from the condition of 0.7 M (NH4)H2PO4 and 0.1 M Na citrate, pH 5.6, in previous reports [38, 41]. Soaking new inhibitors into pre-grown SPase Δ2-75 crystals for X-ray data collection was unsuccessful because of poor diffraction (resolution > 3.5 Å). SPase Δ2-75 K145A could not be crystallized even though a broad range of conditions was tried (Hampton Research Crystal Screen 1 and 2, MembFac screen, and PEG/Ion screen). Four crystallization hits were found for SPase Δ2-75 S90A (Table 2.1). The crystal quality of SPase Δ2-75 S90A was significantly improved when it was complexed with the inhibitor Ro-66-0771. The diffraction resolution from these crystals was approximately 3.5 Å, using the SFU Macromolecular X-ray Diffraction Data Collection Facility. Soaking of inhibitors for X-ray data collection could not be performed because of the small size of the SPase Δ2-75 S90A crystals.

Table 2.1: The initial crystallization conditions of SPase Δ2-75 S90A

Initial Conditions | Crystal Form
#3 of Screen 1: 0.4 M NH4H2PO4 | rectangular
#22 of Screen 1: 0.2 M NaOAc·3H2O, 0.1 M Tris-HCl pH 8.5, 30% w/v PEG 4000 | needle clusters
#19 of MembFac: 0.1 M MgCl2·6H2O, 0.1 M Na3 citrate dihydrate pH 5.6, 4% v/v 2-methyl-2,4-pentanediol | rectangular
#20 of PEG/Ion: 0.2 M Mg formate, 20% PEG 3350 | long needles

2.3.3 Co-crystallization Conditions for SPase I in Complex with Different Peptide-based Inhibitors

Pure SPase Δ2-75 and SPase Δ2-75 S90A were used for co-crystallization with various peptide-based inhibitors obtained from the drug company Basilea Pharmaceutica Ltd, Basel, Switzerland. Optimal co-crystallization conditions were found for nine protein/inhibitor complexes (Table 2.2). The discovery of co-crystallization conditions for each protein-inhibitor complex followed two steps: (i) screening for initial hits and (ii) refining those hits to produce large, well-diffracting crystals. Initial searches for co-crystallization conditions began with Crystal Screen 1 and 2, MembFac, and PEG/Ion (Hampton Research). The optimal condition for each complex was explored using several grid screens: a wide grid of pH (4.5 to 9.0, using different types of buffers), a wide grid of precipitant (e.g. if the precipitant was PEG, various types and concentrations of PEG were used), detergents, and additives (salts, sugars, and alcohols). In addition, the effect of temperature on crystal quality was tested by placing crystal plates at 4 °C, 18 °C, 20 °C, and room temperature; 20 °C was found to be the best temperature for growing these crystals.

2.3.4 Preliminary X-ray Crystallographic Analysis of E. coli SPase I in Complex with Peptide-based Inhibitors

The co-crystallization of E. coli SPase I with peptide-based inhibitors produced crystals of different morphologies and in different crystallographic space groups (Table 2.2). Most of the complex crystals were tetragonal and had a large unit cell. Because of the large unit cell, the diffraction spots were very close to each other in reciprocal space, which made it difficult to improve the resolution of the data. Synchrotron radiation at short wavelength will help to solve this problem.

The crystals of SPase Δ2-75 complexed with the glyco-lipopeptide BAL4850C (Figure 2.7) belonged to the tetragonal space group P43212, with unit cell parameters a = 72.0, b = 72.0, c = 262.6 Å, and two molecules in the asymmetric unit (AU) (Table 2.2). The ternary complex crystals of SPase Δ2-75 with two inhibitors, Arylomycin A2 and the sultam/morpholino BAL0019193, also belonged to the tetragonal space group P4,2,2, with unit cell parameters a = 69.0, b = 69.0, c = 258.3 Å, α, β, γ = 90°, and two molecules in the AU (Table 2.2). Another data set (from a synchrotron) for this ternary complex belonged to the tetragonal space group P43212, with unit cell parameters a = 70.01 Å, b = 70.01 Å, c = 259.89 Å, α, β, γ = 90°, and two molecules in the AU (see Chapter 4).

[Figure 2.7 panels: glyco-lipopeptide-based inhibitor BAL 4850C (MW 1043); lipopeptide-based inhibitor Arylomycin A2 (D-MeSer, D-Ala, Gly, MeHpg, L-Ala, L-Tyr, iso-C12 fatty acid); morpholino derivative BAL0019193 (MW 220)]

Figure 2.7: The structures of inhibitors of bacterial SPase I (1). Glyco-lipopeptide inhibitor BAL 4850C, lipopeptide inhibitor Arylomycin A2, and a sultam/morpholino BAL0019193.

The X-ray crystallographic analyses of SPase Δ2-75 in complex with the glyco-lipopeptide inhibitor BAL4850C, and in the ternary complex with the two inhibitors Arylomycin A2 and BAL0019193, are complete, and the two structures were solved (details in Chapters 3 and 4).

Some co-crystals are yellow, reflecting the yellow colour of the inhibitors, which is possibly due to their azobenzene functional group (see Figures 2.8 and 2.9). Crystals of SPase Δ2-75/Ro 67-5412 were grown under the same crystallization conditions (10% w/v PEG 3350, 0.1 M (NH4)2H citrate, 0.05 M Na citrate pH 5.6) but with different additives. Additives such as 5 mM NaCl or 5% ethanol gave various morphologies and unit cells (Table 2.2). The various crystals diffracted to a resolution of 2.5 Å. Interestingly, the co-crystallization conditions for the inhibitors Ro 67-5412, Ro 62-9327, Ro 62-0771, and Ro 67-5337 complexed with SPase Δ2-75 are extremely similar. We hypothesize that this is due to the structural similarity of these inhibitors (Figure 2.8). Similarly, the co-crystallization conditions for the inhibitors SB 299679 and SB 273461 complexed with SPase Δ2-75 are identical, probably for the same reason (Table 2.2 and Figure 2.9).

Preliminary molecular replacement (MR) for phase solutions of the complexes of SPase Δ2-75 with the inhibitors Ro 67-5412, Ro 62-9327, and Ro 62-0771 is in progress. The space groups and other parameters for these complexes are listed in Table 2.2 and are not described here.

Figure 2.8: The structures of inhibitors of bacterial SPase I (2). Peptide-based inhibitors Ro 62-9327/001, Ro 67-5412/001, Ro 66-0771/001.

Figure 2.9: The structures of inhibitors of bacterial SPase I (3). Peptide-based inhibitors Ro 67-5337/001, SB-299679, SB-273461.

Table 2.2: Crystals of E. coli SPase I in complex with peptide-based inhibitors

Part 1: Name of the complex and optimal condition; X-ray data collection statistics

Complex: SPase Δ2-75/BAL 4850C
  Optimal condition: 22% w/v PEG 4000; 0.2 M KCl; 0.025 M n-Decyl-β-D-maltoside (DDM)
  Drop: 2 µL mix + 2 µL res + 2 µL DDM
  Space group: P43212 (#96)
  Unit cell dimensions (Å): a = 72.0, b = 72.0, c = 262.6
  Vm (Å³/Da): 3.03
  # Mole./AU, resolution (Å), no. of total reflections, no. of unique reflections, Rmerge, mosaicity (°), completeness (%), redundancy, mean I/σ(I): blank

Complex: Ternary SPase Δ2-75/Arylomycin A2/a sultam/morpholino inhibitor BAL0019193
  Optimal condition: 25% w/v PEG 2000; 0.1 M (NH4) formate; 0.1 M cacodylate pH 6.5; 5% tert-amyl alcohol
  Drop: 1 µL mix + 1 µL res
  Space group: P41212 (#92)
  Unit cell dimensions (Å): a = 69.0, b = 69.0, c = 258.3
  Vm (Å³/Da), # Mole./AU, resolution (Å), no. of total reflections, no. of unique reflections, Rmerge, mosaicity (°), completeness (%), redundancy, mean I/σ(I): blank

Complex: SPase Δ2-75/Ro 67-5412
  Optimal condition: 10% w/v PEG 3350; 0.1 M (NH4)2H citrate; 0.05 M Na citrate pH 5.6; 5% ethanol
  Drop: 1 µL mix + 1 µL res
  Space group: P4212 (#90)
  Unit cell dimensions (Å): a = 112.3, b = 112.3, c = 197.8
  Vm (Å³/Da): 2.79
  # Mole./AU: 4
  Resolution (Å): 44.77-2.52 (2.62-2.52)
  No. of total reflections: 291675
  No. of unique reflections: 42220
  Rmerge: 0.129 (0.383)
  Mosaicity (°): 0.3
  Completeness (%): 97.0 (98.5)
  Redundancy: 6.91 (7.59)
  Mean I/σ(I): 9.3 (5.0)

Complex: (as above)
  Optimal condition: 10% w/v PEG 3350; 0.1 M (NH4)2H citrate; 0.05 M Na citrate pH 5.6
  Drop: 1 µL mix + 1 µL res
  Space group: P422 (#89)
  Unit cell dimensions (Å): a = 112.0, b = 112.0, c = 296.0
  Remaining statistics: blank

Complex: SPase Δ2-75/Ro 67-5412
  Optimal condition: 10% w/v PEG 3350; 0.1 M (NH4)2H citrate; 0.05 M Na citrate pH 5.6
  Drop: 1 µL mix + 1 µL res. Crystals were soaked in 2 µL of 2 mM inhibitor solution.
  All statistics: blank

Complex: SPase Δ2-75/Ro 62-9327
  Optimal condition: 10% w/v PEG 3350; 0.1 M (NH4)2H citrate; 0.05 M Na citrate pH 5.6; 5% tert-butanol
  Drop: 2 µL mix + 2 µL res
  Space group: P41212 (#92)
  Unit cell dimensions (Å): a = 112.2, b = 112.2, c = 198.2
  Vm (Å³/Da): 2.79
  # Mole./AU: 4
  Resolution (Å): 34.99-2.50 (2.59-2.50)
  No. of total reflections: 591730
  No. of unique reflections: 44289
  Rmerge: 0.143 (0.480)
  Mosaicity (°): 0.3
  Completeness (%): 99.3 (99.5)
  Redundancy: 13.30 (14.17)
  Mean I/σ(I): 12.1 (5.9)

Complex: (as above)
  Optimal condition: 10% w/v PEG 3350; 0.1 M (NH4)2H citrate; 0.05 M Na citrate pH 5.6; 5 mM NaCl
  Drop: 2 µL mix + 2 µL res
  Space group: P4
  Unit cell dimensions (Å): a = 112.8, b = 112.8, c = 298.5
  Vm (Å³/Da): 2.83
  # Mole./AU: 6
  Resolution (Å): 34.69-2.50 (2.59-2.50)
  No. of total reflections: 351806
  No. of unique reflections: 116550
  Rmerge: 0.162 (0.331)
  Mosaicity (°): 0.6
  Completeness (%): 91 (89.5)
  Redundancy: 3.02 (1.80)
  Mean I/σ(I): 5.4 (2.5)

Mix: Protein/inhibitor (molar ratio 1:1) mixture
Res: Reservoir
Vm: The crystal volume per unit of protein molecular weight (volume of asymmetric unit / m.w.) (Å³/Da)
# Mole./AU: Number of molecules in one asymmetric unit
Rmerge: $R_{merge} = \sum |I_{o,i} - I_{ave,i}| / \sum |I_{ave,i}|$, where $I_{ave,i}$ is the average structure factor amplitude of reflection I and $I_{o,i}$ represents the individual measurements of reflection I and its symmetry equivalent reflection.
Values in parentheses correspond to the highest resolution shell.

Part 2: Name of the complex and optimal condition; co-crystals

Complex: SPase Δ2-75/Ro 67-5337
  Optimal condition: 10% w/v PEG 3350; 0.1 M (NH4)2H citrate; 0.05 M Na citrate pH 5.6; 5% tert-butanol
  Drop: 2 µL mix + 2 µL res
  Average crystal size: 0.25 x 0.25 x 0.40 mm
  Crystals are very good for X-ray data collection.

Complex: SPase Δ2-75 S90A/Ro 66-0771
  Optimal condition: 0.6 M NH4H2PO4; 0.1 M Na citrate pH 4.6; 0.5% iso-amyl alcohol
  Drop: 2 µL mix + 2 µL res
  Average crystal size: 0.15 x 0.15 x 0.3 mm
  Crystals diffract to no better than 3.5 Å using the SFU X-ray facility.

Complex: SPase Δ2-75/SB 273461
  Optimal condition: 20% w/v PEG 3350; 0.1 M NaSCN; 0.05 M Na citrate pH 4.6
  Drop: 2 µL mix + 2 µL res
  Average crystal size: 0.09 x 0.09 x 0.2 mm
  The crystals are not big enough for data collection using the SFU X-ray facility.

Complex: SPase Δ2-75/SB 299679
  Optimal condition: 20% w/v PEG 3350; 0.1 M NaSCN; 0.05 M Na citrate pH 4.6
  Drop: 2 µL mix + 2 µL res
  Average crystal size: 0.1 x 0.1 x 0.2 mm
  The crystals are not big enough for data collection using the SFU X-ray facility.
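The Rmerge statistic defined in the footnotes of Table 2.2 can be sketched in a few lines. This is a minimal illustration only, assuming the observations have already been reduced to a common unique index for each set of symmetry-equivalent reflections; the function name and the toy intensities are hypothetical and are not data from this work.

```python
from collections import defaultdict


def r_merge(observations):
    """Compute R_merge from (hkl, intensity) pairs.

    `observations` holds repeated or symmetry-equivalent measurements that
    have already been mapped to a common unique index hkl.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        mean_i = sum(intensities) / len(intensities)
        # Sum of deviations of each measurement from the group average
        numerator += sum(abs(i - mean_i) for i in intensities)
        # The average is counted once per measurement in the denominator
        denominator += abs(mean_i) * len(intensities)
    return numerator / denominator


# Made-up intensities for two unique reflections (not thesis data).
toy = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
       ((0, 0, 4), 50.0), ((0, 0, 4), 40.0), ((0, 0, 4), 45.0)]
print(round(r_merge(toy), 4))
```

The redundancy column of the table is simply the number of total reflections divided by the number of unique reflections, for example 291675 / 42220 for the first Ro 67-5412 data set.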

The crystals of SPase Δ2-75 with the inhibitors Ro 67-5337, SB 299679, and SB 273461, and of SPase Δ2-75 S90A with the inhibitor Ro 62-0771, were either too small for data collection or diffracted poorly. More effort is needed to obtain larger and higher-quality crystals of these complexes (Table 2.2 and Figure 2.9). The research progress on each SPase I/inhibitor complex is summarized in Table 2.3.

Table 2.3: The progress on each complex of SPase I/inhibitor

State                 Crystal &    Soaking  Optimized        X-ray data collection,  Phase  Refinement  Crystal structure
                      co-crystal            crystallization  index, and scale (R)
                                            condition
SPase Δ2-75
  BAL4850C            √            X        √                √ (2.4 Å)               √      √           √ Rw 22.5, Rf 26.0
  Arylomycin A2 +     √            X        √                √ (2.0 Å)               √      √           √ Rw 20.7, Rf 25.0
    BAL0019193
  Ro-62-9327          √            X        √                √ (2.5 Å)               X
  Ro-67-5412          √            X        √                √ (2.5 Å)               X
  Ro-66-0771          √            X        √                √ (2.5 Å)               X
  Ro-67-5337          √            X        √                √ (2.5 Å)               X
  SB-299679           √            X        √
  SB-273461           √            X        √
  Ro-66-0233          X            X
  Ro-63-2961          X            X
SPase Δ2-75 S90A
  Apo-enzyme          √            n.a.     √                X
  Ro-66-0771          √            X        √                √ (3.5 Å)
  penem               √
SPase Δ2-75 K145A
  Apo-enzyme          X            n.a.

Columns are ordered by stage of progress (from left to right).

√: success; X: not successful; R: resolution in Å; n.a.: not applicable. A blank cell indicates that the work has not yet been done.

CHAPTER 3: CRYSTAL STRUCTURE OF SPASE Δ2-75 IN COMPLEX WITH A GLYCO-LIPOHEXAPEPTIDE

3.1 Introduction

The lipopeptide-based inhibitor BAL4850C is a glyco-lipohexapeptide, a natural product isolated from a microbial source, and a novel inhibitor of bacterial SPase I (communication from Basilea Pharmaceutica Ltd). It consists of three structural features: a hexapeptide, an unsaturated fatty acyl chain, and a deoxy-α-mannose sugar (Figure 3.1). The first two residues of the peptide have D-stereochemistry. The amino acid residue MeHpg is N-methyl-4-hydroxyphenylglycine. Three residues of the hexapeptide (MeHpg - L-Ala - L-Tyr) form a single ring via a (3,3)-biaryl cross-linkage between the ortho-carbon atom of the MeHpg phenol ring and the ortho-carbon atom of the Tyr phenol ring. The 17-carbon unsaturated fatty acyl chain (with one double bond between carbons 56 and 57) is attached via an amide bond to the amino terminus (D-MeSer), and the sugar is attached to the MeHpg phenol ring at O25. BAL4850C is a colourless powder with a molecular weight of 1043 Da.

BAL4850C is one member of a glyco-lipopeptide family identified as potent inhibitors of bacterial SPase I [121]. Kinetic analysis showed that the members of this family are competitive inhibitors with Ki values from 50 to 158 nM against bacterial SPase I [121]. In addition, these glyco-lipopeptides display modest inhibition of the growth of clinical bacteria such as Streptococcus pneumoniae (MIC 8-64 µM) and E. coli (MIC 4-8 µM) [121]. MIC is the minimum concentration of a compound that completely inhibits bacterial growth.

Figure 3.1: The structure of a glyco-lipohexapeptide inhibitor of bacterial SPase I (BAL 4850C)

Interestingly, apart from differences in the fatty acid, the only difference between BAL4850C and Arylomycin A2 is that BAL4850C has an extra sugar group attached. X-ray crystallographic studies of SPase Δ2-75 in complex with the novel natural inhibitor BAL4850C, a glyco-lipohexapeptide, reveal the non-covalent interactions between the protein and the inhibitor. The goal of this study was to observe the interactions between the glyco-lipopeptide inhibitor BAL4850C and SPase Δ2-75 using X-ray crystallographic analysis. In particular, it is important to know whether the deoxy-α-mannose sugar group of BAL4850C plays any role in the inhibition of SPase Δ2-75. Here, the crystal structure of E. coli SPase Δ2-75 in complex with the glyco-lipopeptide SPase I inhibitor BAL4850C at 2.44 Å resolution is presented. This structure provides insights into the binding mode of this natural product inhibitor of the essential enzyme type I signal peptidase.

3.2 Materials and Methods

3.2.1 The Complex of SPase Δ2-75 and BAL 4850C

SPase Δ2-75 was expressed and purified using the methods described in sections 2.2.2-2.2.5 of Chapter 2, which were modified from a previous study [38]. The inhibitor BAL 4850C was provided by Basilea Pharmaceutica Ltd (Basel, Switzerland). Prior to co-crystallization, SPase Δ2-75 (18.0 mg/ml in a buffer of 20 mM Tris-HCl and 8.5 mM Triton X-100) was combined with BAL 4850C (10 mM in DMSO) at a 1:1 molar ratio and incubated on ice for one hour.

3.2.2 Co-crystallization of SPase Δ2-75 with BAL 4850C

Initial co-crystallization trials for SPase Δ2-75 in complex with BAL 4850C were carried out by sitting-drop vapour diffusion (see section 2.2.6 in Chapter 2). Over 40 initial conditions produced crystals of some kind. Among these, seven were from Hampton Research Screen 1, three from Screen 2, five from MembFac, and twenty-nine from PEG/Ion (Table 3.1). The initial crystals formed in different morphologies such as needles, rectangles, disks, and dots. Most of the crystals at this stage were either small or irregular in shape. We chose to optimize a promising condition of 20% PEG 3350, 0.2 M KCl (PEG/Ion #8). A grid of pH values ranging from 4.6 to 9.0 gave no improvement in crystallization. In contrast, a grid of PEG 3350, 4000, and 6000 over a range of 14% to 24%, raised in steps of 2%, resulted in bigger and more ordered-looking needle-shaped crystals. Starting from the resulting condition of 22% PEG 4000, 0.2 M KCl, different additives including alcohols and salts were tested but did not improve the crystal size or appearance. The addition of the detergent n-dodecyl-β-D-maltoside (DDM, CMC 0.17 mM) improved the size of the crystals. The DDM concentration used was 0.025 M. The final optimized reservoir condition, which produced high-quality crystals for data collection, was 22% PEG 4000, 0.2 M KCl, 0.025 M DDM. The drop was made up of 2 µl of the protein/inhibitor mixture, 2 µl of reservoir solution, and 2 µl of 0.025 M DDM, over a 1 ml reservoir at 20 °C. The crystals (average size 0.2 x 0.1 x 0.5 mm) formed from a light precipitate after approximately two weeks.

Table 3.1: The initial co-crystallization conditions of SPase Δ2-75 in complex with glyco-lipohexapeptide inhibitor (BAL4850C)

Screen kit  #   Salt                      Buffer                          Precipitant
Screen 1    6   0.2 M MgCl2·6H2O          0.1 M Tris-HCl pH 8.5           30% w/v PEG 4000
Screen 1    9   0.2 M NH4OAc              0.1 M Na3 citrate·2H2O pH 5.6   30% w/v PEG 4000
Screen 1    10  0.2 M NH4OAc              0.1 M NaOAc·3H2O pH 4.6         30% w/v PEG 4000
Screen 1    20  0.2 M (NH4)2SO4           0.1 M NaOAc·3H2O pH 4.6         25% w/v PEG 4000
Screen 1    40  none                      0.1 M Na3 citrate·2H2O pH 5.6   20% w/v PEG 4000, 20% v/v iso-propanol
Screen 1    41  none                      0.1 M HEPES-Na pH 7.5           20% w/v PEG 4000, 10% v/v iso-propanol
Screen 1    42  0.05 M KH2PO4             none                            20% w/v PEG 8000
Screen 2    23  1.6 M (NH4)2SO4           0.1 M MES pH 6.5                10% v/v dioxane
Screen 2    26  0.2 M (NH4)2SO4           0.1 M MES pH 6.5                30% w/v PEG MME
Screen 2    43  0.2 M NH4H2PO4            0.1 M Tris pH 8.5               50% MPD
MembFac     2   0.1 M Zn(OAc)2·2H2O       0.1 M NaOAc·3H2O pH 4.6         12% w/v PEG 4000
MembFac     7   none                      0.1 M NaOAc·3H2O pH 4.6         1.0 M MgSO4·7H2O
MembFac     13  0.1 M Li2SO4·H2O          0.1 M Na3 citrate·2H2O pH 5.6   12% w/v PEG 4000
MembFac     16  none                      0.1 M Na3 citrate·2H2O pH 5.6   1.0 M MgSO4·7H2O
MembFac     17  0.1 M NaCl                0.1 M Na3 citrate·2H2O pH 5.6   12% w/v PEG 4000
PEG/Ion     1   0.2 M NaF                 none                            20% w/v PEG 3350
PEG/Ion     2   0.2 M KF                  none                            20% w/v PEG 3350
PEG/Ion     3   0.2 M NH4F                none                            20% w/v PEG 3350
PEG/Ion     4   0.2 M LiCl                none                            20% w/v PEG 3350
PEG/Ion     5   0.2 M MgCl2·6H2O          none                            20% w/v PEG 3350
PEG/Ion     6   0.2 M NaCl                none                            20% w/v PEG 3350
PEG/Ion     7   0.2 M CaCl2·2H2O          none                            20% w/v PEG 3350
PEG/Ion     8   0.2 M KCl                 none                            20% w/v PEG 3350
PEG/Ion     9   0.2 M NH4Cl               none                            20% w/v PEG 3350
PEG/Ion     10  0.2 M NaI                 none                            20% w/v PEG 3350
PEG/Ion     11  0.2 M KI                  none                            20% w/v PEG 3350
PEG/Ion     12  0.2 M NH4I                none                            20% w/v PEG 3350
PEG/Ion     15  0.2 M LiNO3               none                            20% w/v PEG 3350
PEG/Ion     17  0.2 M NaNO3               none                            20% w/v PEG 3350
PEG/Ion     18  0.2 M KNO3                none                            20% w/v PEG 3350
PEG/Ion     20  0.2 M Mg formate          none                            20% w/v PEG 3350
PEG/Ion     23  0.2 M NH4 formate         none                            20% w/v PEG 3350
PEG/Ion     25  0.2 M Mg(OAc)2·4H2O       none                            20% w/v PEG 3350
PEG/Ion     27  0.2 M NaOAc·3H2O          none                            20% w/v PEG 3350
PEG/Ion     28  0.2 M Ca(OAc)2·2H2O       none                            20% w/v PEG 3350
PEG/Ion     29  0.1 M KOAc                none                            20% w/v PEG 3350
PEG/Ion     30  0.2 M NH4OAc              none                            20% w/v PEG 3350
PEG/Ion     36  0.2 M Na2 tartrate·2H2O   none                            20% w/v PEG 3350
PEG/Ion     38  0.2 M (NH4)2 tartrate     none                            20% w/v PEG 3350
PEG/Ion     43  0.2 M NH4H2PO4            none                            20% w/v PEG 3350
PEG/Ion     44  0.2 M (NH4)2HPO4          none                            20% w/v PEG 3350
PEG/Ion     47  0.2 M K3 citrate·H2O      none                            20% w/v PEG 3350
PEG/Ion     48  0.2 M (NH4)2H citrate     none                            20% w/v PEG 3350
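The PEG optimization grid described in section 3.2.2 (three PEG types, 14% to 24% in steps of 2%) can be enumerated as follows. This is only a sketch of the bookkeeping: holding the salt at 0.2 M KCl across the whole grid is an assumption made here for illustration, and the names used are hypothetical.

```python
from itertools import product

PEG_TYPES = ("PEG 3350", "PEG 4000", "PEG 6000")
PEG_PERCENT = range(14, 25, 2)  # 14, 16, ..., 24 % (w/v)
SALT = "0.2 M KCl"  # assumption: salt kept as in the starting condition


def peg_grid():
    """List every (PEG type, concentration) combination in the grid."""
    return [f"{pct}% w/v {peg}, {SALT}"
            for peg, pct in product(PEG_TYPES, PEG_PERCENT)]


conditions = peg_grid()
print(len(conditions))
for line in conditions:
    print(line)
```

The grid contains 18 conditions, one of which is the 22% PEG 4000, 0.2 M KCl condition that was carried forward to the additive and detergent trials.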

3.2.3 X-ray Diffraction Data Collection

X-ray diffraction data were collected from crystals of SPase Δ2-75 in complex with BAL 4850C at the SFU Macromolecular X-ray Diffraction Data Collection Facility (Rigaku-MSC, USA) (see also section 2.2.8 in Chapter 2 for details of X-ray data collection). Before data collection, the crystal was transferred by pipette from the growth drop to a cryoprotectant (24% v/v PEG 4000, 0.2 M KCl, 30% v/v of 0.025 M DDM, 20% v/v glycerol) for 30 s. A single crystal was picked up with a nylon loop (Hampton Research) and flash-cooled by placing it directly into a liquid nitrogen stream at 100 K. The X-rays (wavelength 1.5418 Å) were generated as Cu Kα radiation by a Rigaku MicroMax-007 microfocus rotating-anode generator running at 40 kV and 20 mA and equipped with Osmic Confocal VariMax High Flux optics. The crystal-to-detector distance was set to 200 mm. All 280 frames were recorded on an R-AXIS IV++ imaging-plate detector with a 0.5° oscillation angle and an exposure time of 240 s per frame. The data showed significant diffraction to a resolution of 2.44 Å (dmin = 2.44 Å). The mosaicity was 0.3°. Data were collected, indexed, and scaled using the program CrystalClear [134].

The crystals belong to the tetragonal space group P43212. The unit cell dimensions were determined to be a = 72.01 Å, b = 72.01 Å, c = 262.57 Å, and α = β = γ = 90°. The Matthews coefficient (the specific volume Vm, i.e. the crystal volume per unit of protein molecular weight) is 3.03 Å³/Da for two molecules in the asymmetric unit. The fraction of the crystal volume occupied by solvent was 59.3%, calculated with the program Matthews in the CCP4i suite [130, 137]. Diffraction data were collected to 2.44 Å, yielding an Rmerge of 0.105 for 247952 measured reflections, of which 26680 were unique. The data set was 99.6% complete in the resolution range 32.20-2.44 Å and 100% complete in the highest resolution shell (2.53-2.44 Å). For crystal and data collection statistics see Table 3.3.
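The Matthews coefficient and solvent fraction quoted above can be checked with a short calculation. The sketch below assumes eight symmetry operators for space group P43212 and the usual protein volume term of 1.23 Å³/Da; the molecular weight of about 28 kDa per SPase Δ2-75 chain is an approximate value assumed here for illustration and is not taken from the text, and the function names are hypothetical.

```python
def matthews_coefficient(cell_volume_A3, n_symmetry_ops, n_mol_per_au, mw_da):
    """Vm in cubic angstroms per dalton."""
    return cell_volume_A3 / (n_symmetry_ops * n_mol_per_au * mw_da)


def solvent_fraction(vm):
    """Matthews' estimate, using 1.23 A^3/Da for the protein volume."""
    return 1.0 - 1.23 / vm


a = b = 72.01
c = 262.57
volume = a * b * c            # tetragonal cell: all angles are 90 degrees
ASSUMED_MW_DA = 28000.0       # assumption: approximate mass of one chain
vm = matthews_coefficient(volume, 8, 2, ASSUMED_MW_DA)
print(f"Vm = {vm:.2f} A^3/Da")
print(f"solvent = {100 * solvent_fraction(3.03):.1f} %")
```

With these inputs the calculated Vm is close to the reported 3.03 Å³/Da, and a Vm of 3.03 Å³/Da corresponds to a solvent content of roughly 59%.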

3.2.4 Phasing, Model Building, and Refinement

A molecular replacement solution was found using the program Phaser in the CCP4i suite [137]. The search model was Molecule A from the 2.5 Å crystal structure of SPase Δ2-75 in complex with Arylomycin A2 (PDB code 1T7D). The resolution range of the reflections used in the rotation/translation search was 30 to 3 Å, and two copies were searched for in the asymmetric unit. The topology and parameter files for the inhibitor BAL4850C were generated using the program PRODRG [138]. Coordinates for BAL4850C were docked manually and consistently into clear difference electron density (Fo - Fc) near the active site. In addition, the main-chain trace and side-chain assignment for the dynamic regions corresponding to residues Phe196-Asn200 and Asp304-Leu314 were built manually, beginning with nine Ala residues and then switching each Ala to the native residue when there was clear electron density for its full side chain. Water molecules were added at well-defined peaks (2.0 σ and greater in Fo - Fc maps) found between 2 and 3.8 Å from O and N atoms in the protein. Model building and analysis were performed with the program XFIT in the XTALVIEW suite and with Coot [139, 140]. Refinement of the structure was carried out using the program Refmac 5 in the CCP4i suite as well as CNS [137, 141]. Cycles of refinement were carried out for both the protein and inhibitor models using rigid-body and restrained NCS refinement in Refmac 5, while simulated annealing, energy minimization, and B-factor refinement were performed in CNS. In addition, a cycle of TLS refinement was carried out using the TLS Motion Determination Server and the TLS refinement option in CCP4i [137, 142]. In all cycles of refinement, 5% of the reflections were set aside for cross-validation. All data were included in the final round of refinement. Final refinement and analysis statistics of the complex are provided in Table 3.3.

3.2.5 Structural Analysis

The stereochemistry of the structural model was analyzed with the program PROCHECK [143]. Superpositions of the structures were calculated with the program Superpose [144]. The substrate binding site was measured using the program CASTp [145] with a solvent probe of 1.4 Å.

3.2.6 Figure Preparation

Figures were prepared using the programs ISIS Draw version 2.5 (MDL Information System, Inc.), XFIT [139], Raster3D [146], and PyMol [56].

3.3 Results and Discussion

3.3.1 A New Co-crystallization Condition

The previous crystallization condition for the SPase Δ2-75/Arylomycin A2 complex (15% w/v PEG 4000, 20% propanol-1, 0.1 M Na citrate pH 6.0, and 0.5% Triton X-100) was unsuccessful when applied directly to the co-crystallization of the SPase Δ2-75/BAL4850C complex. Unfortunately, such a subtle structural difference between Arylomycin A2 and BAL4850C (mainly one sugar group) was enough to change the crystallization conditions. Co-crystallization conditions for the SPase Δ2-75/BAL4850C complex were therefore screened, and a unique condition was obtained: 22% w/v PEG 4000, 0.2 M KCl, 0.025 M DDM. Although the crystal morphology of the SPase Δ2-75/BAL4850C complex (needles) differed from that of SPase Δ2-75/Arylomycin A2 (rectangular), the two share the same space group, P43212, and similar unit cell dimensions. Notably, the new detergent DDM was found to be very beneficial for the formation of larger needle-shaped crystals. The final optimized crystals gave ordered diffraction to beyond 2.44 Å.

3.3.2 Crystallographic Structure Solution of SPase Δ2-75 in Complex with BAL4850C

The crystal structure of SPase Δ2-75 in complex with the inhibitor BAL4850C was successfully determined by molecular replacement using data in the 30-3 Å resolution range. A molecular replacement solution was found using the program Phaser in the CCP4i suite [137]. The search model was Molecule A from the published 2.5 Å crystal structure of SPase Δ2-75 in complex with Arylomycin A2 (PDB code 1T7D) [40]. Cycles of rigid-body, restrained, NCS, and TLS refinement using the program Refmac 5 in the CCP4i suite [137], combined with refinement in CNS [141], were used to match the model to the experimental electron density. Residue-by-residue model checking and manual rebuilding with the program XFIT in the XTALVIEW suite [139] and with Coot [139, 140] improved the accuracy of the model. Throughout refinement, 5% of the data were left out for cross-validation (Rfree). The final refinement yielded a model with R = 22.5% and Rfree = 26.0% at 2.44 Å resolution, which fits the experimental electron density.
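The R and Rfree values quoted above are both instances of the conventional crystallographic R-factor, evaluated on the working reflections and on the reflections set aside for cross-validation, respectively. The following is a minimal sketch of that bookkeeping with made-up amplitudes. Flagging every 20th reflection (5%) is a simplification assumed here; refinement programs normally select the test set at random. The function names are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def split_test_set(n_reflections, every=20):
    """Return (work, test) index lists; every 20th reflection is 5 %."""
    test = [i for i in range(n_reflections) if i % every == 0]
    work = [i for i in range(n_reflections) if i % every != 0]
    return work, test


# Made-up amplitudes (not thesis data).
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 50.0, 45.0]
print(r_factor(f_obs, f_calc))

work, test = split_test_set(100)
print(len(work), len(test))
```

Because the test reflections are never used to adjust the model, Rfree is expected to be somewhat higher than R, as is the case for the values reported here.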

3.3.3 Model Features of SPase Δ2-75 in Complex with BAL4850C

The final model of SPase Δ2-75 in complex with BAL 4850C contains two molecules comprising 429 amino acid residues, 158 water molecules, and 3580 non-hydrogen atoms. An inhibitor molecule is bound at the active/binding site of each molecule.

In the Ramachandran plot, 88.4% of the residues lie in the most favoured regions and no residues lie in disallowed regions (Figure 3.2). The average B-factor of SPase Δ2-75 is 56.0 Å², whereas that of the inhibitors is higher, with an average value of 70.6 Å². The r.m.s. deviations of bonds and angles are 0.025 Å and 1.957°, respectively (Table 3.3). Four disordered surface loops comprising residues F106-P125, S171-E177, F196-T205, and D304-L314, a characteristic of the crystal structures of SPase Δ2-75 described in previous papers (Figure 5.10) [18, 40, 41], were also observed in this model.

Some residues, particularly those within the loop regions (F106-P125, S171-E177, F196-T205, D304-L314), showed very weak electron density at the early stage of refinement, but density progressively appeared as refinement proceeded. These residues were modelled as alanine at the early stage of refinement and then switched to the native amino acids in the sequence whenever electron density corresponding to their side chains appeared. However, in the final model some residues were left as alanine or glycine because of the lack of electron density for their side chains: residues 79, 140, 205, and 206 in molecule A and residues 121, 122, 198, 199, 203, 304, 306, 307, 309, and 310 in molecule B (Table 3.2). The data collection and final refinement statistics are summarized in Table 3.3.

Table 3.2: Side chains with undefined electron density in the crystal structure model of SPase Δ2-75 in complex with BAL 4850C
Columns (given separately for Molecule A and Molecule B): Residue #, SPase sequence, Model sequence.

[Ramachandran plot. Axis label: Phi (degrees). Plot statistics panel: residues in most favoured regions [A,B,L]; residues in additional allowed regions [a,b,l,p]; residues in generously allowed regions [~a,~b,~l,~p]; residues in disallowed regions; number of non-glycine and non-proline residues; number of end-residues (excl. Gly and Pro); number of glycine residues (shown as triangles); number of proline residues; total number of residues.]

Figure 3.2: Ramachandran plot of the model SPase Δ2-75/BAL 4850C. The red area indicates the most favoured region, and the yellow and light yellow areas indicate allowed regions. Black solid triangles indicate glycine residues and black solid squares indicate residues other than glycine.

Table 3.3: Crystallographic data collection and refinement statistics of SPase Δ2-75/BAL 4850C complex

Data Collection
  Wavelength (Å)
  Space group
  Unit-cell dimensions (Å)
  Molecules in the asymmetric unit
  Resolution range (Å)
  Vm (Å³/Da)
  % Solvent
  Total observed reflections
  Unique reflections
  Average redundancy
  Completeness (%)
  Mosaicity (°)
  Rmerge (%)
  Mean I/σ(I)
Refinement
  Residues
  Atoms
  Waters
  Rwork
  Rfree
  r.m.s. deviations
    Bonds (Å)
    Angles (°)
  Estimated overall coordinate error (Å)
  Overall B (Å²) for protein atoms: Chain A: 57.68; Chain B: 54.29
  Overall B (Å²) for water atoms: 43.04
  Overall B (Å²) for inhibitor: In chain A: 85.31; In chain B: 55.83
Ramachandran analysis
  Most favoured
  Additional allowed
  Generously allowed
  Disallowed

$R_{merge} = \sum |I_{o,i} - I_{ave,i}| / \sum |I_{ave,i}|$, where $I_{ave,i}$ is the average structure factor amplitude of reflection I, and $I_{o,i}$ represents the individual measurements of reflection I and its symmetry equivalent reflection.
$R_{work} = \sum |F_{obs} - F_{calc}| / \sum F_{obs}$
$R_{free} = \sum_{hkl \in T} (|F_{obs}| - |F_{calc}|)^2 / \sum_{hkl \in T} |F_{obs}|^2$, where $hkl \in T$ are reflections belonging to a test set of 10% of the data, and $F_{obs}$ and $F_{calc}$ are the observed and calculated structure factors, respectively.
The data collection statistics in brackets are the values for the highest resolution shell (2.53 - 2.44 Å).

3.3.4 Structure Comparison of SPase Δ2-75 in Complex with BAL 4850C and Previously Solved SPase Δ2-75 Structures

The asymmetric unit contains two copies of the SPase Δ2-75/BAL 4850C complex with nearly identical conformations. Superposition of the two molecules in the asymmetric unit yields root mean square deviation (RMSD) values of 0.77 Å (α-carbon, Cα) and 0.82 Å (backbone). Most of the structural differences are located in surface loops, particularly the four loops comprising residues 106-125, 171-177, 196-205, and 304-314. After removing these four disordered regions, the RMSD for Cα and backbone atoms drops to 0.43 and 0.49 Å, respectively.
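The effect of leaving out the disordered loops on the RMSD can be illustrated with a short sketch. It assumes that the two chains have already been superimposed and that Cα coordinates are available as a mapping from residue number to position; the coordinates below are made up for illustration, and the function names are hypothetical rather than part of the program Superpose.

```python
import math

# Disordered loop ranges named in the text (inclusive residue numbers)
DISORDERED_LOOPS = [(106, 125), (171, 177), (196, 205), (304, 314)]


def in_loops(resnum, loops=DISORDERED_LOOPS):
    """True if the residue number falls inside one of the loop ranges."""
    return any(start <= resnum <= end for start, end in loops)


def rmsd(coords_a, coords_b, skip=lambda resnum: False):
    """RMSD over residues present in both chains.

    coords_a / coords_b map residue number -> (x, y, z) of the C-alpha atom
    and are assumed to be already superimposed.
    """
    shared = [r for r in coords_a if r in coords_b and not skip(r)]
    if not shared:
        raise ValueError("no residues in common")
    total = sum(math.dist(coords_a[r], coords_b[r]) ** 2 for r in shared)
    return math.sqrt(total / len(shared))


# Made-up coordinates (not thesis data): residue 110 lies in a loop.
chain_a = {90: (0.0, 0.0, 0.0), 110: (5.0, 0.0, 0.0), 145: (1.0, 1.0, 1.0)}
chain_b = {90: (0.0, 0.3, 0.0), 110: (5.0, 2.0, 0.0), 145: (1.0, 1.0, 1.4)}
print(round(rmsd(chain_a, chain_b), 3))
print(round(rmsd(chain_a, chain_b, skip=in_loops), 3))
```

In this toy case a single poorly matching loop residue dominates the overall value, which mirrors the drop in RMSD seen when the four loops are excluded.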

The binding of BAL4850C does not alter the overall conformation of SPase Δ2-75: the overall structure of SPase Δ2-75 in complex with BAL 4850C is quite similar to those of the apo-enzyme, penem-enzyme, and lipopeptide-enzyme structures, containing two anti-parallel β-sheet domains and an extended β-hairpin protruding from domain I (Figure 1.5). The first three residues (residues 76-78) are flexible and were not visible in the electron density. Superposition (all chains) with the molecules from the apo-enzyme, penem-enzyme, and lipopeptide-enzyme structures yields RMSD values for the Cα trace and the main-chain backbone of 0.72 and 0.68 Å (vs. apo-enzyme), 0.66 and 0.68 Å (vs. penem-enzyme), and 0.51 and 0.54 Å (vs. lipopeptide-enzyme), respectively. Again, the slight structural variation derives mainly from the four dynamic loops (residues 106-125, 171-177, 196-205, and 304-314).

3.3.5 Glyco-lipohexapeptide Inhibitor (BAL 4850C) at the Active Site

The initial Fo - Fc difference map shows well-defined electron density corresponding to BAL 4850C (Fo and Fc are the observed and calculated structure factors, respectively). The electron density for BAL 4850C clearly shows a large ring-shaped density corresponding to the ring formed by three residues of the hexapeptide (MeHpg - L-Ala - L-Tyr), a rod-shaped density corresponding to the N-terminal tripeptide (D-MeSer-D-Ala-Gly), and a strong bulge corresponding to the sugar group attached to the MeHpg - L-Ala - L-Tyr ring via O25. In particular, the detailed shape of the electron density for the sugar is consistent with the geometry of a deoxy-α-mannose, with O78 axial and O70 and O80 equatorial when viewed from the side of the chair-like sugar ring (Figure 3.3), in agreement with NMR analysis [121]. Overall, a large circular shape with a rod-shaped extension is consistent with the general shape of the inhibitor BAL4850C (Figure 3.4A).

[Figure 3.3 label: The sugar group (6-deoxy-α-mannose) is connected to the ring of BAL 4850C via O25.]

Figure 3.3: The geometry of the sugar group of BAL4850C. A. The chemical structure of a 6-deoxy-α-mannose and its chair-like conformation. B and C. A cross-validated 2Fo-Fc electron density map contoured at 1σ surrounding the 6-deoxy-α-mannose in the inhibitor of molecule A. D and E. A cross-validated 2Fo-Fc electron density map contoured at 1σ surrounding the 6-deoxy-α-mannose in the inhibitor of molecule B.

Interestingly, the electron density for the fatty acid tail (17-carbon unsaturated fatty acyl chain) of BAL 4850C appeared after simulated annealing refinement with CNS. With the lipid tail fitted into the density and after additional cycles of refinement, the fatty acid tail clearly makes its main molecular interactions with part of a large hydrophobic surface of SPase Δ2-75 (Figures 3.4B and 3.5), which is predicted to bind to the lipids of the membrane [18]. Electron density for the 17-carbon unsaturated fatty acid tail of BAL 4850C is observed for only one inhibitor molecule of the asymmetric unit (at 0.5σ), suggesting that the fatty acid tail is highly dynamic and therefore disordered. No electron density was seen for the fatty acid tail (12-carbon saturated fatty acyl chain) in the Arylomycin A2 crystal structure [40].

Figure 3.4: Electron density for glyco-lipohexapeptide BAL4850C bound in the active site of signal peptidase. A. A cross-validated 2Fo-Fc electron density map contoured at 1σ surrounding the signal peptidase inhibitor/glyco-lipohexapeptide BAL4850C. The molecule of inhibitor BAL 4850C is shown in stick representation and coloured by element, with carbon atoms in yellow, oxygen atoms in red, and nitrogen atoms in blue. B. A cross-validated 2Fo-Fc electron density map contoured at 0.5σ surrounding the fatty acid tail (in magenta) of the inhibitor BAL4850C. Along with it, a cross-validated 2Fo-Fc electron density map contoured at 0.5σ surrounding the N-terminus (residues 82-80) of the protein is shown in green. The molecule of inhibitor BAL 4850C is shown in stick representation and coloured by element, with carbon atoms in magenta, oxygen atoms in red, and nitrogen atoms in blue. The residues of the protein are shown in stick representation and coloured by element, with carbon atoms in green, oxygen atoms in red, and nitrogen atoms in blue. The van der Waals interactions between the fatty acid tail of the inhibitor and the N-terminus of the protein are shown by dashed lines with distances in Å. Red labels are for inhibitor atoms and black labels are for protein residues (in one-letter code).

Figure 3.5: The overall binding theme of BAL4850C. A. The protein is shown in ribbon representation and coloured in green. The inhibitor is shown as sticks and coloured by element, with carbon atoms in yellow, nitrogen atoms in blue, and oxygen atoms in red. B. SPase Δ2-75 is shown in molecular surface representation with the following colour scheme: basic residues are in blue, acidic residues are in red, Cys residues are in orange, and all others are in grey. The inhibitor is represented and coloured as in diagram A. Red labels are for inhibitor atoms and black labels are for protein residues (in one-letter code).

3.3.6 Analysis of the Interactions between the Glyco-lipohexapeptide (BAL4850C) and SPase I

The inhibitor BAL4850C is non-covalently bound to the substrate-binding sites of SPase Δ2-75, with one associated water molecule at the active site and another at the binding site (Figure 3.6). On average, 521.4 Å² of solvent-accessible surface area on SPase Δ2-75 is buried by the inhibitor.

The interactions between the inhibitor and SPase Δ2-75 are mediated through both the surrounding amino acid side chains and the backbone atoms of SPase Δ2-75, and through the peptide-backbone side of the 3-residue biaryl-bridged ring system (MeHpg-L-Ala-L-Tyr) and the N-terminal peptide body (D-MeSer-D-Ala-Gly) of the inhibitor (Figure 3.5 and Figure 3.6). The inhibitor is positioned with its C-terminus, the 3-residue biaryl-bridged ring system (MeHpg-L-Ala-L-Tyr), pointing toward the active site, where it makes a parallel β-strand interaction with β-strand 142-145, which lines the SPase binding pocket. The main chain of the N-terminus of the inhibitor (D-MeSer-D-Ala-Gly) also makes a parallel β-strand interaction, with β-strand 83-90, which lines the other side of the binding pocket.

Similar to the SPase Δ2-75/Arylomycin A2 complex [40], the C-terminal carboxylate oxygen atom O45 interacts with all three of the enzyme catalytic residues: the nucleophile Ser 90 Oγ, the general base Lys 145 Nζ, and the oxyanion-hole residue Ser 88 Oγ. Its carbonyl oxygen atom O44 is hydrogen bonded to Ile 144 N and to the general base Lys 145 Nζ. The N-terminal end of the 3-residue biaryl-bridged ring system (MeHpg-L-Ala-L-Tyr) houses the most interactions between the inhibitor and the protein, including hydrogen bonding interactions through N33, N28, O27, O15, N7, and O6 to protein residues Asp 142, Ser 88, Gln 85, Phe 84, Pro 83, and Glu 82. Several van der Waals interactions are also observed that assist recognition between protein and inhibitor. These include contacts from the C30 methyl group to protein residues Ile 144 and Phe 84. The C30 methyl group from the Ala points approximately into the S3 substrate-binding pocket. Interestingly, only three hydrogen bonds were observed between the N-terminal peptide (D-MeSer-D-Ala-Gly) atoms N7, O6, and O15 of the inhibitor and protein atoms Gln 85 N, Pro 83 O, and Glu 82 Oε2. Of these three hydrogen bonds, two are direct, from O15 to Gln 85 N and from N7 to Pro 83 O, whereas the hydrogen bond between O6 and Glu 82 Oε2 is mediated by a water molecule.

The relatively large number of interactions made by the peptide side of the 3-residue biaryl-bridged ring system (MeHpg-L-Ala-L-Tyr) suggests that this ring moiety provides the main driving force for the binding of inhibitor BAL4850C.

It is worth noting that the biaryl-bridged side, with the attached sugar, of the 3-residue biaryl-bridged ring system (MeHpg-L-Ala-L-Tyr) is almost entirely exposed to the solvent (Figure 3.5). The molecular environment of the sugar is described in Table 3.4 and Figure 3.6.

Table 3.4: Inhibitor-SPase Δ2-75 contact distances (glyco-lipohexapeptide BAL4850C)

| Inhibitor atom | Protein atom | Distance (Å), Molecule A | Distance (Å), Molecule B |
|---|---|---|---|
| C48 | Glu82 CG | 4.5 | n.a. |
| O47 | Pro83 N | 6.2 | 4.1 |
| O6 | Glu82 OE2 (via WAT 158 in molecule A and WAT 75 in molecule B) | 2.5/3.4 (b) | 3.6/3.0 (b) |
| N7 | Pro83 O | 4.5 | 3.6 |
| N33 | Asp142 O | 2.8 | 3.1 |
| O44 | Ile144 N / Lys145 NZ | 3.0/3.4 | 2.6/3.4 |
| O45 | Lys145 NZ / Ser90 OG / Ser88 OG / H2O 16 in molecule A and H2O 74 in molecule B | 3.7/3.0/3.0/4.2 | 3.4/3.0/3.0/3.4 |
| C30 | Phe84 CD2 / Asp142 O / Ile144 CG2, CB | 4.2/3.9/3.6/4.0 | 4.1/4.5/3.8/3.8 |
| Sugar environment: O70 | Glu 307 O | n.a. | 6.1 |

n.a. denotes that the interaction was not observed.
(b) Inhibitor atom to water distance / water to protein atom distance.

Figure 3.6: Diagram of the interactions of BAL4850C with SPase Δ2-75 at the active and substrate-binding sites. Dashed lines indicate hydrogen bonds and van der Waals interactions, and solid lines show the environment surrounding the sugar group. The interaction distances (Å) are indicated for the inhibitor of molecule A.

3.3.7 Comparison of the Substrate Binding Site of SPase Δ2-75/BAL4850C with Apo-enzyme, Acyl-enzyme, and Lipopeptide-enzyme Complex

Superposition of the active and binding site residues of the present structure, SPase Δ2-75/BAL4850C, onto the previously solved crystal structures of the β-lactam acyl-enzyme (PDB code: 1B12, space group P2₁2₁2), the apo-enzyme (PDB code: 1KN9, space group P4₁2₁2), and the lipopeptide arylomycin A2-enzyme complex (PDB code: 1T7D, space group P4₃2₁2) shows an overall similarity of the active site and binding sites among all structures (Figure 3.7). In particular, these sites are essentially identical in SPase Δ2-75/BAL4850C and SPase Δ2-75/lipopeptide arylomycin A2 (Figure 3.7C), except for two subtle differences in the side chain conformations of Asp142 and Ile144. The first is that the average χ2 angle of Asp142 (averaged over the two molecules in the asymmetric unit) changes from 45.6° in the SPase Δ2-75/BAL4850C structure to 74.8° in the SPase Δ2-75/arylomycin A2 structure. The second is that the average side-chain χ angle of Ile144 changes from 166.3° in the SPase Δ2-75/BAL4850C structure to 121.6° in the SPase Δ2-75/arylomycin A2 structure.
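The side-chain χ angles compared here are ordinary dihedral (torsion) angles over four consecutive atoms; for χ1 these are conventionally N, Cα, Cβ, and the first γ atom. The sketch below shows one way such an angle can be computed from atomic coordinates. It is an illustration only: the function name `dihedral_deg` and its helpers are hypothetical, and it is not the procedure or software used in this work.

```python
import math

def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral_deg(p1, p2, p3, p4):
    """Dihedral angle in degrees, in (-180, 180], defined by four points.

    For a chi1 angle, p1..p4 would be the N, CA, CB and gamma-atom
    coordinates of one residue (hypothetical usage, not from the thesis).
    """
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)   # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)   # normal of the plane through p2, p3, p4
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Averaging the value over the two molecules in the asymmetric unit, as done in the text, is then a plain arithmetic mean provided the two angles do not straddle the ±180° boundary.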

At the active site, defined by the three catalytic residues (Ser90, Ser88, and Lys145), the geometry of the catalytic residue Ser 90 Oγ is the same in all structures, with a similar average χ1 of around 70°; however, the geometries of the other catalytic atoms, Ser 88 Oγ and Lys 145 Nζ, are slightly different. The average χ1 angle for Ser 88 is similar in the crystal structures of the BAL4850C-enzyme, apo-enzyme, and lipopeptide arylomycin A2-enzyme, with values of 70.4°, 78.95°, and 79.6°, respectively. This is consistent with the role of residue Ser88 in the formation of the SPase oxyanion hole and stabilization of the tetrahedral oxyanion intermediate. However, the Ser 88 side chain in the penem-enzyme structure (average χ1 of 52.8°) is forced out of its common position (average χ1 of 76.3° in the other known structures of SPase Δ2-75) to avoid a steric clash with the thiazolidine ring of the inhibitor (Figure 3.7B and Figure 1.8). Therefore, not only is the nucleophile Ser90 disabled by covalent bond formation between Ser90 and penem, but the oxyanion hole is also disrupted by the displacement of the Ser88 side chain. This may explain why penem is the most effective inhibitor of SPase I [117-119]. Interestingly, the average χ4 angle of Lys 145 is similar in the apo-enzyme (178.3°) and penem-enzyme (168.8°) structures because they share similar H-bonding. The Nζ atom of Lys 145 forms an H-bond with Ser 90 Oγ in both structures to fulfil its role as a general base. However, besides the H-bond between Nζ of Lys 145 and Ser 90 Oγ, there is an additional H-bond between the Nζ of Lys 145 of the protein and O45 of the inhibitor in both the BAL4850C-enzyme and lipopeptide-enzyme structures. This forces the average χ4 angle from the common position (178.3° in the apo-enzyme or 168.8° in the penem-enzyme) to the inhibitor-induced position (72.0° in the BAL4850C-enzyme and 69.2° in the lipopeptide-enzyme). The displacement of the N

The binding site S1, formed by non-polar residues 86-88, 90-91, 95, and 143-145, is similar in size among all of the structures (Figure 3.7D). Surface calculation shows that S1 has an area of 140.7 Å² and a volume of 142.3 Å³ for the BAL4850C-enzyme, an area of 133.4 Å² and a volume of 143.9 Å³ for the lipopeptide-enzyme, an area of 145.5 Å² and a volume of 129.3 Å³ for the apo-enzyme, and an area of 156.8 Å² and a volume of 224.1 Å³ for the penem-enzyme. The size of S1 is similar for the non-covalent complexes (lipopeptide-enzyme, BAL4850C-enzyme) and the native protein (apo-enzyme). S1 in the penem-enzyme structure is larger than in the other crystal structures, most noticeably in volume. This may imply that covalent binding of the inhibitor penem has a greater effect on protein conformation than non-covalent binding, as seen for Arylomycin A2 and BAL4850C complexed with SPase Δ2-75.

The binding site S3, formed by non-polar residues 84, 86, 101, 132, 142, and 144, differs among these crystal structures, mainly due to the presence or absence of bound inhibitor. For example, the Phe 84 side chain adopts a similar position in all of the inhibitor-bound structures (BAL4850C-enzyme, lipopeptide-enzyme, and penem-enzyme), which differs from its position in the apo-enzyme structure (Figure 3.7D). This suggests that upon inhibitor binding, Phe 84 flips its phenyl ring (80°) to lock the inhibitor into position, serving as a supporting base. The van der Waals interaction between Phe 84 and the inhibitors (the C30 of BAL4850C, 4.1 Å; the C30 of Arylomycin A2, 4.1 Å; and the atom of penem) in the three inhibitor-bound complexes (BAL4850C-enzyme, lipopeptide-enzyme, and penem-enzyme) may be responsible for the Phe84 phenyl ring flip [18, 40] (Figure 3.6). This ligand-induced conformational change of Phe84 may reflect the substrate-binding state.

Figure 3.7: The active site superposition of the SPase Δ2-75/BAL4850C complex crystal structure with the apo-enzyme, β-lactam inhibitor acyl-enzyme, and lipopeptide-enzyme crystal structures. The active site of the SPase Δ2-75/BAL4850C complex crystal structure is shown as a stick representation in magenta, and those of the apo-enzyme, β-lactam inhibitor acyl-enzyme, and lipopeptide-enzyme are shown as stick representations in green, cyan, and blue, respectively. A. Superposition of the BAL4850C/SPase complex structure onto the apo-enzyme structure. B. Superposition of the BAL4850C/SPase complex structure onto the β-lactam inhibitor acyl-enzyme complex structure. C. Superposition of the BAL4850C/SPase complex structure onto the Arylomycin A2/SPase complex structure. D. Superposition of the BAL4850C-enzyme structure onto all three structures: apo-enzyme (PDB code: 1KN9 [41]), acyl-enzyme (PDB code: 1B12 [18]), and lipopeptide-enzyme (PDB code: 1T7D [40]). The inhibitor has been removed for clarity. All molecules shown are molecule A of the corresponding structure.

According to a previous study [41], three conserved waters near the active site were observed in the apo-enzyme structure and designated water 1, water 2, and water 3 (Figure 3.8). Water 1 is coordinated by the Met 91 backbone NH, Ser 88 O, and Leu 95 O. Water 2 is coordinated by Ile 144 NH. Water 3 is coordinated by Lys 145 Nζ [41]. For convenience, the following description of conserved waters keeps the apo-enzyme numbering.

Figure 3.8: Three conserved waters at the active site of SPase I. The active site of the apo-enzyme structure is shown as sticks and coloured by element: carbon in green, oxygen in red, and nitrogen in blue. The three conserved waters are shown as spheres in cyan. The hydrogen bonding interactions between these waters and the residues of SPase I are indicated as red dashed lines. Protein residues are labelled in one-letter code in black. The protein molecule shown is molecule A from the apo-enzyme crystal structure (PDB code: 1KN9) [41].

In the penem-enzyme structure, water 1 was present, but waters 2 and 3 were displaced by the inhibitor [18]. In the lipopeptide-enzyme structure, water 1 and water 3 were present, but water 2 was displaced by Arylomycin A2 [40]. Not surprisingly, in the current glycolipopeptide-enzyme structure, water 1 (water 35 in molecule A and water 87 in molecule B, in the numbering of this structure) and water 3 (water 16 in molecule A and water 19 in molecule B, in the numbering of this structure) were present, but water 2 was displaced by the inhibitor BAL4850C (Figure 3.8). As in the lipopeptide-enzyme structure, the hydrogen bond between water 2 and Ile 144 N is replaced by an H-bond (3.4 Å in both molecule A and B) from O44 of the inhibitor BAL4850C to Ile 144 N (Figure 3.8 and Table 3.4).

In summary, water 1 is a structurally conserved water. It is always present in the crystal structures, regardless of whether an inhibitor is bound. Water 2 may help to stabilize the active site when no ligand is present. When an inhibitor binds, it is excluded (in the penem-enzyme, lipopeptide-enzyme, and glycolipopeptide-enzyme complex structures). Water 3 has been proposed by Paetzel et al. to serve as the deacylating water, based on the appropriate distance and angle between water 3 and the Nζ of Lys 145 [41]. It seems that when a non-covalent inhibitor binds (glyco-lipopeptide and lipopeptide), water 3 is still present, as in the apo-enzyme structure. However, when a covalent inhibitor (penem) binds, water 3 is also excluded. This feature may be due to a larger conformational change upon covalent inhibitor binding, as supported by the relatively larger binding pocket S1 in the penem-enzyme crystal structure. Displacement of the deacylating water 3 by penem is consistent with penem being the strongest SPase I inhibitor known so far.

The superposition of the inhibitors BAL4850C and Arylomycin A2 at the active and binding sites demonstrates that the C-terminal ring moieties of the two inhibitors overlap. The N-termini of the two inhibitors adopt a similar geometry but follow slightly shifted paths, beginning with atom N7 and continuing down to atom C48 (Figure 3.9).

This similarity can be explained in terms of substrate specificity: SPase I retains a common recognition mode for similar ligands. In turn, it also indicates that two similar inhibitor molecules adopt a nearly identical geometry when binding to the same target.

Figure 3.9: A superposition of the SPase inhibitors (glyco-lipohexapeptide BAL4850C and Arylomycin A2) at the active site of E. coli SPase I. The structures of inhibitor/SPase Δ2-75 are shown as line representations, with BAL4850C/SPase in magenta and Arylomycin A2/SPase Δ2-75 in blue. The inhibitors BAL4850C and Arylomycin A2 are shown as stick representations and coloured by element, with carbon atoms in yellow, oxygen atoms in red, and nitrogen atoms in blue for BAL4850C, and with carbon atoms in cyan, oxygen atoms in red, and nitrogen atoms in blue for Arylomycin A2. Conserved water 1 and water 3 (in the apo-enzyme numbering) at the active site are represented as spheres with the same colour code as the corresponding residues at the active site. The residues of SPase Δ2-75 are labelled in single-letter code in black. The atoms of the inhibitors are labelled in red. The coordinates for Arylomycin A2 were taken from the lipopeptide-enzyme crystal structure (PDB code: 1T7D) [40].

CHAPTER 4: CRYSTAL STRUCTURE OF SPASE Δ2-75 IN A TERNARY COMPLEX WITH ARYLOMYCIN A2 AND A SULTAM/MORPHOLINO DERIVATIVE BAL0019193

4.1 Introduction

In this work, two inhibitors were used to form a complex with SPase Δ2-75. One is Arylomycin A2 (Figure 4.1A). The other is a sultam/morpholino derivative (BAL0019193) (Figure 4.1B). The X-ray crystal structure of SPase Δ2-75 in complex with the inhibitor Arylomycin A2 was solved previously by Paetzel et al. [18, 40], and showed the importance of Arylomycin A2 recognition through the active and substrate-binding sites. Interestingly, the co-inhibition effect of these two inhibitors was found to be 1,000 times greater than that of Arylomycin A2 alone (communication from Basilea Pharmaceutica Ltd). In order to understand the co-inhibition mechanism of Arylomycin A2 and the sultam/morpholino derivative against SPase I at the structural level, it is necessary to find out how the sultam/morpholino derivative interacts with SPase Δ2-75 and how the two inhibitors relate to one another in the ternary complex structure.

Arylomycin A2 consists of two structural features: a hexapeptide and an iso 12-carbon fatty acyl chain (Figure 4.1A). The first two residues of the peptide have D-stereochemistry. The amino acid residue MeHpg is N-methyl-4-hydroxyphenylglycine. Three residues (MeHpg-L-Ala-L-Tyr) of the hexapeptide form a single ring via a (3,3)-biaryl cross-linkage between the ortho-carbon atom of the MeHpg phenol ring and the ortho-carbon atom of the Tyr phenol ring. The iso 12-carbon saturated fatty acyl chain is attached via an amide bond to the amino terminus (D-MeSer).

[Chemical structure diagrams. A: Arylomycin A2, labelled from the N-terminus as iso-C12 fatty acid, D-MeSer, D-Ala, Gly, MeHpg, L-Ala, L-Tyr. B: BAL0019193.]

Figure 4.1: Inhibitor structure of bacterial type I signal peptidase. A. Arylomycin A2. B. A sultam/morpholino derivative, BAL0019193.

The sultam/morpholino derivative, BAL0019193, consists of two small rings (Figure 4.1B). One is a four-membered hetero-ring, a sultam ring, made of two carbon atoms (sp3-hybridized), a sulphur atom, and a nitrogen atom. The other is a six-membered hetero-ring, a morpholino ring, made of four carbon atoms (sp3-hybridized), a nitrogen atom, and an oxygen atom. A linker (a urea derivative) holds these two rings together via the two ring nitrogen atoms.

Both inhibitors are colourless; Arylomycin A2 and BAL0019193 have molecular weights of 824 Da and 220 Da, respectively.

We have solved the crystal structure of the ternary complex of SPase Δ2-75 with the inhibitors Arylomycin A2 and the sultam/morpholino derivative BAL0019193 at 2.04 Å resolution. We determined the binding site for the sultam/morpholino derivative and propose a mechanism by which BAL0019193 assists the inhibitor Arylomycin A2 in inhibiting SPase I.

4.2 Material and Methods

4.2.1 The Ternary Complex of SPase Δ2-75 with Arylomycin A2 and a Sultam/Morpholino Derivative BAL0019193

SPase Δ2-75 was expressed and purified using the methods described in sections 2.2.2-2.2.5 of Chapter 2, which are modified from a previous study [38]. The concentration of SPase Δ2-75 protein was adjusted to 15 mg/ml (0.5366 mM) in a buffer of 20 mM Tris-HCl and 8.5 mM Triton X-100. The inhibitor BAL0019193 stock solution was prepared at 75 mM in 100% acetonitrile, while the arylomycin A2 stock solution was made at 20 mM in 100% DMSO. The inhibitor stocks were stored at -80 °C until use. Before setting up crystallization screens, SPase Δ2-75 was mixed with BAL0019193 at a 1:5 molar ratio and the solution was incubated on ice for one hour. Arylomycin A2 solution was then added to the BAL0019193/SPase Δ2-75 mixture at a 1:1 molar ratio (arylomycin A2:SPase Δ2-75). The three-component mixture of SPase Δ2-75/Arylomycin A2/BAL0019193 was kept on ice for one additional hour.
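The mixing scheme above can be turned into pipetting volumes with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: the molecular weight of about 27,954 Da is back-calculated from the stated equivalence of 15 mg/ml and 0.5366 mM rather than taken from the text, the 1:5 ratio is read as protein:BAL0019193, the 100 µl protein volume is hypothetical, and the function names are invented for this example.

```python
def mg_per_ml_to_mM(conc_mg_per_ml, mol_weight_da):
    """Convert a mass concentration (mg/ml, i.e. g/l) to millimolar."""
    return conc_mg_per_ml / mol_weight_da * 1000.0

def stock_volume_ul(protein_mM, protein_volume_ul, inhibitor_per_protein, stock_mM):
    """Volume of inhibitor stock (µl) that gives the requested molar ratio.

    mM x µl = nmol, so the units cancel without further conversion.
    """
    protein_nmol = protein_mM * protein_volume_ul
    return protein_nmol * inhibitor_per_protein / stock_mM

# Assumed molecular weight (Da), implied by 15 mg/ml = 0.5366 mM.
ASSUMED_MW_DA = 27954.0

protein_mM = mg_per_ml_to_mM(15.0, ASSUMED_MW_DA)
# Hypothetical 100 µl protein sample.
bal_ul = stock_volume_ul(protein_mM, 100.0, 5.0, 75.0)    # 75 mM BAL0019193 stock
aryl_ul = stock_volume_ul(protein_mM, 100.0, 1.0, 20.0)   # 20 mM arylomycin A2 stock
print(f"protein {protein_mM:.4f} mM; add {bal_ul:.2f} µl BAL0019193, {aryl_ul:.2f} µl arylomycin A2")
```

For a 100 µl sample this works out to a few microlitres of each stock, so the organic solvent carried over from the acetonitrile and DMSO stocks stays at a small percentage of the final volume.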

4.2.2 Co-crystallization of the Ternary Complex of SPase Δ2-75 with Arylomycin A2 and BAL0019193

Initial screening for co-crystallization was performed by the sitting-drop vapour-diffusion method at 18 °C using commercial Hampton Research crystallization screens (see also section 2.2.6 in Chapter 2). Initial co-crystallization screens gave either disc-shaped or needle-like microcrystals. There were three crystallization hits from MembFac and six hits from PEG/Ion (Table 4.1). Starting from a base condition of 0.1 M Mg(OAc)2 and 20% PEG 3350 (PEG/Ion #25), we first used a cross-grid screen of pH 4.6-8.5, raising the pH by about 1 unit at a time (4.6 NaOAc, 5.6 Na citrate, 6.5 Na cacodylate, 6.5 ADA, 7.5 HEPES, 8.5 Tris-HCl), against 10% PEG 2000 MME, 2000, 3350, 4000, 8000, and 10000. Crystals in the condition of 10% PEG 2000, Na cacodylate pH 6.5, and 0.1 M Mg(OAc)2 were found to be improved in size. This was followed by a replacement screen of PEG 2000 (5%, 10%, 15%, 20%, 22%, and 25%) against 0.2 M tri-potassium citrate, NH4 formate, MgSO4, and Li3 citrate. The crystals grown from the condition 0.2 M NH4 formate, 25% PEG 2000, 0.1 M Na cacodylate pH 6.5 were large and appeared more regular in shape. Various additives were then screened, including alcohols and different cation salts. The addition of 5% tert-amyl alcohol further increased crystal size.

The final optimized reservoir condition that produced high diffraction-quality crystals was 0.2 M NH4 formate, 25% PEG 2000, 0.1 M Na cacodylate pH 6.5, and 5% tert-amyl alcohol. The drop consisted of 2 µl of protein and inhibitor mixture and 2 µl of reservoir solution. The drop was equilibrated over 1 ml of reservoir solution.

Table 4.1: The initial co-crystallization conditions of SPase Δ2-75 in ternary complex with Arylomycin A2 and BAL0019193

| Screen kit | # | Salt | Buffer | Precipitant |
|---|---|---|---|---|
| MembFac | 17 | 0.1 M NaCl | 0.1 M Na3 citrate pH 5.6 | 12% w/v PEG 4000 |
| MembFac | 18 | 0.1 M Li2SO4·H2O | 0.1 M Na3 citrate pH 5.6 | 12% w/v PEG 6000 |
| MembFac | 23 | 0.1 M Li2SO4·H2O | 0.1 M N-(2-acetamido)iminodiacetic acid pH 6.5 | 12% w/v PEG 4000, 2% v/v iso-propanol |
| PEG/Ion | 21 | 0.1 M Na formate | none | 20% w/v PEG 3350 |
| PEG/Ion | 23 | 0.2 M NH4 formate | none | 20% w/v PEG 3350 |
| PEG/Ion | 25 | 0.2 M Mg(OAc)2·4H2O | none | 20% w/v PEG 3350 |
| PEG/Ion | 32 | 0.2 M MgSO4·7H2O | none | 20% w/v PEG 3350 |
| PEG/Ion | 45 | 0.2 M Li3 citrate·4H2O | none | 20% w/v PEG 3350 |
| PEG/Ion | 47 | 0.2 M K3 citrate·H2O | none | 20% w/v PEG 3350 |

4.2.3 X-ray Diffraction Data Collection

The ternary complex crystals of SPase Δ2-75/Arylomycin A2/BAL0019193 were used to collect diffraction data both at the SFU Macromolecular X-ray Diffraction Data Collection Facility (Rigaku-MSC, USA) (this data set is summarized in Chapter 2, Table 2.2) and at the Advanced Light Source of Lawrence Berkeley National Laboratory (this data set is presented in this chapter, Table 4.2). The method described here is for the data set from Lawrence Berkeley National Laboratory. Before data collection, the crystal was transferred by pipet from the growth drop to a cryoprotectant (0.2 M NH4 formate, 25% PEG 2000, 0.1 M Na cacodylate pH 6.5, 5% tert-amyl alcohol, and 20% v/v glycerol) for 2 days. The diffraction data set used to determine this ternary complex structure was collected on beamline 8.2.2 at the Advanced Light Source of Lawrence Berkeley National Laboratory, using a Quantum-315 ADSC detector. The wavelength of the synchrotron X-rays was 1.1217 Å. The crystal was rotated through a total of 100° with 1.0° oscillation per frame. The exposure time for each image was 10 seconds. The distance between the crystal and the detector was set to 300 mm. The data showed significant diffraction to a resolution of 2.04 Å. Data were indexed and integrated using DENZO and scaled using SCALEPACK, from the HKL2000 program suite [147]. The crystals belonged to the tetragonal space group P4₃2₁2. The unit cell dimensions were determined to be a = 70.01 Å, b = 70.01 Å, and c = 259.89 Å, with α, β, γ = 90°. The Matthews coefficient (or specific volume, Vm) is 2.88 Å³/Da for two molecules in the asymmetric unit. The fraction of the crystal volume occupied by solvent was 56.5%, calculated by the program Matthews in the CCP4i suite of programs [130, 137]. For crystal and data collection statistics see Table 4.2.
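The Matthews coefficient and solvent fraction quoted above follow from the unit cell and the contents of the asymmetric unit. The sketch below illustrates the arithmetic; it is not the CCP4 Matthews program. The names are hypothetical, the value of 8 symmetry operators is the count of general positions in P4₃2₁2, the molecular weight of about 27,954 Da is an assumption back-calculated from the 15 mg/ml = 0.5366 mM equivalence in section 4.2.1, and the factor 1.23 is the commonly used approximation for protein partial specific volume.

```python
def tetragonal_cell_volume(a, c):
    """Unit cell volume (Å^3) for a tetragonal cell (a = b, all angles 90°)."""
    return a * a * c

def matthews_coefficient(cell_volume, n_symops, n_mol_asu, mol_weight_da):
    """Vm (Å^3/Da): cell volume per dalton of protein in the unit cell."""
    return cell_volume / (n_symops * n_mol_asu * mol_weight_da)

def solvent_fraction(vm, protein_factor=1.23):
    """Approximate solvent fraction, 1 - 1.23/Vm (1.23 is an assumed constant)."""
    return 1.0 - protein_factor / vm

# Cell dimensions from the text; molecular weight is an assumption (see lead-in).
volume = tetragonal_cell_volume(70.01, 259.89)
vm = matthews_coefficient(volume, n_symops=8, n_mol_asu=2, mol_weight_da=27954.0)
print(f"Vm = {vm:.2f} Å^3/Da, solvent = {100 * solvent_fraction(vm):.1f} %")
```

With these inputs the result lands close to, but not exactly on, the reported values; the small difference depends on the molecular weight and the protein density constant actually used by the program.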

4.2.4 Phasing, Model Building, and Refinement

A molecular replacement solution was found using the CaspR web server [148]. The atomic coordinates used as the search model in CaspR were those of molecule A from a 2.5 Å crystal structure of SPase Δ2-75 (PDB code: 1T7D). The resolution range of the reflections used in the rotation/translation search was 42.98 to 2.04 Å, and the number of copies in the asymmetric unit was two.

The topology and parameter files for the inhibitor Arylomycin A2 were obtained from the PDB file 1T7D at the RCSB. The topology and parameter files for the sultam/morpholino derivative BAL0019193 were generated using the program PRODRG [138]. Coordinates of the inhibitors Arylomycin A2 and BAL0019193 were manually docked into clear difference electron density (Fo-Fc) near the active site. In addition, the main chain trace and the side chain assignments for the dynamic regions corresponding to residues Ser197-Asn200 in molecule B and residues 304-314 in both molecule A and B were built manually. Water molecules were added to well-defined peaks (2.0σ and greater in Fo-Fc maps) found between 2 and 3.8 Å from O and N atoms in the protein. Model building and analysis were performed with the program XFIT in the XTALVIEW suite and with Coot [139, 140]. Refinement of the structure was carried out using the program Refmac 5 in the CCP4i suite as well as CNS [137, 141]. The cycles of refinement were carried out for both the protein model and the inhibitor model using rigid body refinement followed by restrained NCS refinement in Refmac 5, and simulated annealing, energy minimization, and B-factor refinement in CNS [137, 141]. In addition, a cycle of TLS refinement was carried out using the TLS Motion Determination Server and the TLS refinement program in CCP4i [137, 142]. In all cycles of refinement, 5% of the reflections were set aside for cross-validation. All data were included in the final round of refinement. Final refinement and analysis statistics for this ternary complex are provided in Table 4.2.
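Cross-validation here means that a small fraction of reflections is withheld from refinement and used only to monitor agreement between model and data. The sketch below illustrates the idea with the conventional linear R-factor, Σ||Fobs| - |Fcalc|| / Σ|Fobs|, evaluated separately on the working and test sets. It is a toy illustration, not how Refmac 5 or CNS implement it: the function names, the random selection scheme, and the seed are assumptions made for this example.

```python
import random

def r_factor(f_obs, f_calc):
    """Conventional linear R-factor over paired structure factor amplitudes."""
    if not f_obs or len(f_obs) != len(f_calc):
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

def split_test_set(n_reflections, test_fraction=0.05, seed=0):
    """Randomly flag a fraction of reflection indices as the test (free) set."""
    rng = random.Random(seed)
    n_test = round(n_reflections * test_fraction)
    test = set(rng.sample(range(n_reflections), n_test))
    work = [i for i in range(n_reflections) if i not in test]
    return work, sorted(test)

def r_work_and_r_free(f_obs, f_calc, test_fraction=0.05, seed=0):
    """Return (R_work, R_free) for one random working/test partition."""
    work, test = split_test_set(len(f_obs), test_fraction, seed)
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in test], [f_calc[i] for i in test])
    return r_work, r_free
```

In practice the test-set flags are assigned once and kept fixed for the whole refinement, so that the free R-factor remains an unbiased indicator; the `seed` argument stands in for that fixed assignment.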

4.2.5 Structural Analysis

The stereochemistry of the structural model was analyzed with the program PROCHECK [143]. Superpositions of the structures were calculated with the program Superpose [144]. The dimensions of the substrate binding site were calculated using the program CASTp, and accessible surface areas (ASA) were calculated with CCP4i [137, 145]. In CASTp and CCP4i, a solvent probe radius of 1.4 Å was used.

4.2.6 Figure Preparation

Figures were prepared using the programs ISIS Draw version 2.5 (MDL Information System, Inc.), XFIT [139], Raster 3D [146], and PyMol [56].

4.3 Results and Discussion

4.3.1 Cocrystallization, Structure Solution, and Refinement

Co-crystallization conditions for the ternary complex of SPase Δ2-75/Arylomycin A2/BAL0019193 were screened, and crystals were obtained in a novel condition: 0.2 M NH4 formate, 25% PEG 2000, 0.1 M Na cacodylate pH 6.5, and 5% tert-amyl alcohol. Because these crystals grew at pH 6.5, which is closer to physiological pH (7.0) than the more acidic conditions used for other structures, they may better reflect the physiological conformation of both the protein and the inhibitors in this ternary complex. The crystals formed approximately 5 days after set-up, and only one or two crystals grew in each drop. Interestingly, two crystal morphologies co-exist in the same drop (Figure 4.2). One is football shaped and the other is rectangular. The two forms belong to different space groups but share similar unit cell dimensions. The football-shaped crystal has the space group P4₂2₁2 and unit cell dimensions of a = 69.0 Å, b = 69.0 Å, and c = 258.3 Å (Table 2.2 in Chapter 2). The rectangular crystal belongs to the space group P4₃2₁2 and has unit cell dimensions of a = 70.0 Å, b = 70.0 Å, and c = 259.9 Å (Table 4.2). Typically, the football-shaped crystals are approximately 0.2 x 0.2 x 0.4 mm in size and the rectangular crystals are approximately 0.3 x 0.3 x 0.4 mm in size.

Figure 4.2: Two crystal morphologies of the ternary complex of SPase Δ2-75/Arylomycin A2/BAL0019193 co-existing in one drop. A typical football-shaped crystal is 0.2 x 0.2 x 0.4 mm in size and a typical rectangular crystal is 0.3 x 0.3 x 0.4 mm in size.

The rectangular crystal, 0.3 x 0.3 x 0.4 mm in size, gave ordered diffraction to 2.04 Å and was used for data collection and structure solution. The crystal structure of SPase Δ2-75 in ternary complex with the two inhibitors Arylomycin A2 and BAL0019193 was successfully determined by molecular replacement. The final refinement yielded a model that fits the experimental electron density with Rcryst = 20.7% and Rfree = 25.0% at 2.04 Å resolution. The final model contains 2 molecules in the asymmetric unit, with a total of 469 amino acid residues and 257 water molecules. Each SPase Δ2-75 molecule is bound to one molecule of Arylomycin A2 and one molecule of BAL0019193. In the Ramachandran plot, 89.4% of the residues lie in the most favoured regions, with no residues in the disallowed regions (Figure 4.3). The B-factors of SPase Δ2-75 are 36.3 Å² for molecule A and 39.2 Å² for molecule B, those of the inhibitors are 47.0 Å² in molecule A and 43.1 Å² in molecule B (Table 4.2), and the average for all water molecules is 49.5 Å². The RMS deviations of bonds and angles are 0.030 Å and 2.348°, respectively. The data collection and final refinement statistics are summarized in Table 4.2.

Due to the lack of electron density for their side chains, three residues (198, 200, and 309) in molecule B have been modeled as alanines rather than as the native residues in the SPase Δ2-75 amino acid sequence. In the final model, the dynamic β-hairpin loop of residues 106-125 is missing, the dynamic surface loops of residues 171-177 and 196-205 are incomplete, and the dynamic loop of residues 304-314 is present, in both molecule A and B (Table 4.3). A comparison of the SPase Δ2-75 crystals and crystal structures solved so far is summarized in Table 4.3.

[Ramachandran plot; horizontal axis: Phi (degrees). Legend entries: residues in most favoured regions [A,B,L]; residues in additional allowed regions [a,b,l,p]; residues in generously allowed regions [~a,~b,~l,~p]; residues in disallowed regions; number of non-glycine and non-proline residues; number of end-residues (excl. Gly and Pro); number of glycine residues (shown as triangles); number of proline residues.]

Figure 4.3: Ramachandran plot of the model SPase Δ2-75/Arylomycin A2/BAL0019193. The red area indicates the most favoured region, and the yellow and light yellow areas indicate allowed regions. Black solid triangles indicate glycine residues and black solid squares indicate residues other than glycine.

Table 4.2: Crystallographic data collection and refinement statistics of the ternary SPase Δ2-75/Arylomycin A2/BAL0019193 complex

Data Collection
Wavelength (Å)
Space group
Unit-cell dimensions (Å)
Molecules in a.s.u.
Resolution range (Å)
Vm (Å³/Da)
% Solvent
Total observed reflections
Unique reflections
Average redundancy
Completeness (%)
Mosaicity (°)
Rmerge (%)
Mean I/σ(I)

Refinement
Protein residues
Atoms
Waters
Rwork
Rfree
r.m.s. deviations: Bonds (Å), Angles (°)
Estimated overall coordinate error (Å)
Overall B (Å²) for protein atoms: Chain A: 36.3; Chain B: 39.2
Overall B (Å²) for water atoms: 49.5
Overall B (Å²) for inhibitor: in Chain A: 47.0; in Chain B: 43.1

Ramachandran analysis
Most favoured
Additional allowed
Generously allowed
Disallowed

$R_{merge} = \sum \left| |I_{o,i}| - |I_{ave,i}| \right| / \sum |I_{ave,i}|$, where $I_{ave,i}$ is the average structure factor amplitude of reflection I, and $I_{o,i}$ represents the individual measurements of reflection I and its symmetry-equivalent reflections.
$R_{work} = \sum |F_{obs} - F_{calc}| / \sum F_{obs}$
$R_{free} = \sum_{hkl \in T} (|F_{obs}| - |F_{calc}|)^2 / \sum_{hkl \in T} |F_{obs}|^2$, where $hkl \in T$ are reflections belonging to a test set of 10% of the data, and $F_{obs}$ and $F_{calc}$ are the observed and calculated structure factors, respectively.
The data collection statistics in brackets are the values for the highest resolution shell (2.14 - 2.04 Å).

Table 4.3: A comparison of SPase Δ2-75 crystals and crystal structures

Structure Lipopeptide Glyco- Lipopeptide Penem-enzyme & lipopeptide- -enzyme Covalently sultamlmorpholino enzyme Non- Bound derivative Non- covalently inhibitor-enzyme covalently Bound Non-covalently Bound Bound PDB Code NIA NIA Space Group Tetragonal Orthorhombic Tetragonal P43212 P21212 P41212 Unit-cell 112.4 dimensions 112.4 a, b, c (A) 198.7 Resolution

Oh Solvent #Waters 257 Other Ions none none none none Ligand Arylomycin A2 glycolipo- Arylomycin none BAL0019193 hexapeptide A2

Dynamic Loop Mole. A El Mole. A B ( Mole. A B Mole. A B C D Mole. A B C D 107-124 a a P a aP Pa P a 170-178 a a P a Pa aP a P 197-204 a a p a aa PP P P 305-313 P P a a pa aa a a RMSD of Ca 0.96 0.51 Overall B (A) for protein atoms

Overall B (A) for water atoms Overall B (A) for inhibitor Overall B (A) for all of atoms Crystallization 0.2M NH4 Formate, 22% \N/V 15% wlv 1.OM NH4HzP04, 0.7M NH4HzP04 Growth 25% PEG2000, PEG 4000, PEG 4000, 0.1M Na Citrate 0.lM Na Citrate Conditions 0.lM Na 0.2M KCI, 20% pH3.6, pH56 Cacodylate 0.025 M Propanol, 5% Tert-arnyl 5% 2- pH 6.5, DDM 0.1M Na alcohol methylpen tane 5% tert-Amyalcohol Citrate -2.4-diol pH6.0 0.5% Triton X-100 Structure Lipopeptide Glyco- Lipopeptide Penem-enzyme Apo-enzyme 8 lipopeptide- -enzyme Covalently sultam/morpholino enzyme Non- Bound derivative Non- covalently inhibitor-enzyme covalently Bound Non-covalently Bound Bound Buffer 20 mM Tris-HCI, pH 20 mM Tris- 20 mM Tris- 7.4, 0.5% Triton X- HCI, pH 7.4, HCI, pH 7.4, 100 0.5% Triton 0.5% Triton Triton X-100 X-100 X-100 Method Sitting-drop Sitting-drop Sitting-drop Sitting-drop Sitting-drop vapor diffusion vapor vapor vapor diffusion vapor diffusion diffusion diffusion Temperature 20•‹C Room temperature temperature Protein 15.0 rnglml 18.0 mglml 10 mglml 10 mglml 10 mglml concentration

Vm: The crystal volume per unit of protein molecular weight (volume of asymmetric unit / m.w.) (Å^3/Da).

# Mole./AU: Number of molecules in the asymmetric unit (AU). The AU is the smallest unit that can be rotated and translated to generate one unit cell using only the crystallographic symmetry operators. The asymmetric unit may contain one molecule, more than one molecule, or a half or a quarter of one molecule.

Rwork: The R factor, defined as $R_{work} = \sum |F_{obs} - F_{calc}| / \sum F_{obs}$. It is a numerical indicator used to follow the progress of refinement; it measures the agreement between the data and the model and so indicates the accuracy of the model.

Rfree: The free R factor is also a numerical indicator of the agreement between data and model, but it is calculated using 10% of the native data, which were randomly chosen and excluded from the refinement. Because these reflections are not used in refinement, Rfree is a less biased measure of model accuracy than Rwork.

($R_{free} = \sum_{hkl \subset T} (|F_{obs}| - |F_{calc}|)^2 / \sum_{hkl \subset T} |F_{obs}|^2$, where $\sum_{hkl \subset T}$ runs over the reflections belonging to a test set of 10% of the data, and $F_{obs}$ and $F_{calc}$ are the observed and calculated structure factors, respectively.)

RMSD: Root Mean Square Deviation
Mole.: Molecule
p: Present
a: Absent

4.3.2 Structural Comparison of SPase Δ2-75 in Complex with Arylomycin A2/BAL0019193 and Previously Solved SPase Δ2-75 Structures

The asymmetric unit contains two copies of the SPase Δ2-75/Arylomycin A2/BAL0019193 complex, which display nearly identical conformations. Superposition of the two molecules in the asymmetric unit yields root mean square deviation (RMSD) values of 0.96 Å for the α-carbons (Cα) and 1.02 Å for the backbone atoms.
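For reference, the RMSD quoted above is the square root of the mean squared distance between equivalent atoms after superposition. The following is a minimal sketch only, assuming the two coordinate sets have already been optimally superposed by a separate least-squares fitting step (not shown); the function name `rmsd` and the coordinates are hypothetical and are not taken from the structures described here.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD (in the input units) between two equal-length lists of
    (x, y, z) tuples that are assumed to be superposed already."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between one pair of equivalent atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Made-up C-alpha positions for illustration only (not real data)
mol_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
mol_b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(mol_a, mol_b))
```

Restricting the input to Cα atoms or to all backbone atoms gives the two kinds of values reported in the text.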

The binding of Arylomycin A2 and BAL0019193 does not alter the overall conformation of SPase Δ2-75: the overall structure of SPase Δ2-75 in complex with these two inhibitors is quite similar to those of the apo-enzyme, penem-enzyme, lipopeptide-enzyme, and glycolipopeptide-enzyme, containing two anti-parallel β-sheet domains and an extended β-hairpin protruding from domain 1 (Figure 1.5, Chapter 1). The first three residues (residues 76-78) in molecule A and the first four residues (residues 76-79) in molecule B are flexible and were not visible in the electron density. Superposition of each chain onto the molecules of the apo-enzyme, penem-enzyme, lipopeptide-enzyme, and glycolipopeptide-enzyme structures yields Cα RMSD values of 0.74 and 0.77 Å (vs. apo-enzyme), 0.70 and 0.72 Å (vs. penem-enzyme), 0.44 and 0.47 Å (vs. lipopeptide-enzyme), and 0.60 and 0.62 Å (vs. glycolipopeptide-enzyme), respectively.

4.3.3 Inhibitor Arylomycin A2 and BAL0019193 at the Active Site

The initial Fo - Fc difference map showed well-defined electron density corresponding to Arylomycin A2. The electron density for Arylomycin A2 clearly shows a large ring-shaped density corresponding to the ring formed by three residues (MeHpg - L-Ala - L-Tyr) of the hexapeptide, and a rod-shaped density corresponding to the N-terminal tripeptide (D-MeSer-D-Ala-Gly) (Figure 4.4 A and B).

Interestingly, the electron density for the iso 12-carbon fatty acid tail of Arylomycin A2 became obvious after one cycle of simulated annealing refinement with CNS, and the lipid tail was fitted into the density over additional cycles of refinement. This fatty acid tail makes its main molecular interactions with part of a large hydrophobic surface (residues Glu 82, Pro 83, Gln 85, Phe 100 and Trp 300) of SPase Δ2-75 (Figure 4.5 A and B), which is predicted to bind to the lipids of the membrane [18]. The electron density of the tail has a curved shape, indicating a curved conformation. It is observed for only one inhibitor molecule of the asymmetric unit (contoured at 1σ, Figure 4.4). No electron density was seen for the fatty acid tail (iso 12-carbon saturated fatty acyl chain) in the previously solved Arylomycin A2/SPase Δ2-75 crystal structure [40]. In addition, in the recently solved glycolipopeptide-enzyme crystal structure (Figures 3.4 and 3.5, Chapter 3), the 17-carbon unsaturated fatty acid tail assumes a different conformation: it extends out and runs parallel to the N-terminus of SPase Δ2-75, interacting with the N-terminal residues 80-84 (Figures 3.5 and 3.6). Taken together, these data suggest that the fatty acid tails of these lipopeptide inhibitors are dynamic.

Moreover, the electron density for the sultam/morpholino derivative BAL0019193 became progressively stronger as refinement progressed, particularly after a cycle of simulated annealing refinement in CNS. This small inhibitor is located above ("capping") the active site, in a parallel orientation relative to the ring moiety of Arylomycin A2 (Figure 4.4B).

Figure 4.4: Electron density for inhibitors Arylomycin A2 and BAL0019193 bound at the active site of E. coli SPase I. A. A cross-validated 2Fo-Fc electron density map contoured at 1σ surrounding the inhibitor lipohexapeptide Arylomycin A2. Arylomycin A2 is shown in stick representation and coloured by element, with carbon in yellow, oxygen in red, and nitrogen in blue. B. A cross-validated 2Fo-Fc electron density map contoured at 0.5σ surrounding the inhibitors Arylomycin A2 (in magenta) and BAL0019193 (in orange). A cross-validated 2Fo-Fc electron density map contoured at 0.5σ surrounding the protein residues is shown in blue. Arylomycin A2 is shown in stick representation and coloured by element, with carbon in yellow, oxygen in red, and nitrogen in blue. BAL0019193 is shown in stick representation and coloured by element, with carbon in cyan, oxygen in red, and nitrogen in blue. The protein residues are shown in stick representation and coloured by element, with carbon in green, oxygen in red, and nitrogen in blue. Red labels mark inhibitor atoms and black labels mark protein residues (one-letter code). The inhibitors shown are from molecule A.

Figure 4.5: The overall binding theme of Arylomycin A2 and BAL0019193. A. The protein is shown in ribbon representation and coloured green. Arylomycin A2 is shown as sticks coloured by element, with carbon in yellow, nitrogen in blue, and oxygen in red. BAL0019193 is shown as sticks coloured by element, with carbon in cyan, nitrogen in blue, and oxygen in red. B. SPase Δ2-75 is shown in molecular surface representation with the following colour scheme: basic residues in blue, acidic residues in red, Cys residues in orange, and all others in grey. The inhibitors are represented and coloured as in panel A. Red labels mark inhibitor atoms and black labels mark protein residues (one-letter code).

4.3.4 Analysis of Interactions among Arylomycin A2 and BAL0019193 as well as SPase I

The inhibitor Arylomycin A2 is non-covalently bound to the active and substrate-binding sites of SPase Δ2-75, with one associated water molecule at the binding site (Figure 4.6 and Table 4.4). On average, 487.8 Å² and 187.4 Å² of solvent-accessible surface area on SPase Δ2-75 are buried by Arylomycin A2 and BAL0019193, respectively. The interactions between Arylomycin A2 and SPase Δ2-75 are mediated through both the side chains and the backbone atoms of the surrounding residues of SPase Δ2-75, and through the peptide backbone of the 3-residue biaryl-bridged ring system (MeHpg-L-Ala-L-Tyr) and the N-terminal peptide body (D-MeSer-D-Ala-Gly) of the inhibitor (Figures 4.5 and 4.6). The inhibitor is positioned with its C-terminus (the 3-residue biaryl-bridged ring system, MeHpg-L-Ala-L-Tyr) pointing toward the active site, and makes a parallel β-strand interaction with β-strand 142-145, which lines the SPase binding pocket. The main chain of the N-terminus of the inhibitor (D-MeSer-D-Ala-Gly) also makes a parallel β-strand interaction with β-strand 83-90, which lines the other side of the binding pocket.

Similar to the previous crystal structure of the binary SPase Δ2-75/Arylomycin A2 complex [40], the C-terminal carboxylate oxygen atom O45 interacts with all three of the enzyme catalytic residues: the nucleophile Ser 90 Oγ, the general base Lys 145 Nζ, and the oxyanion hole residue Ser 88 Oγ. Its carbonyl oxygen atom O44 is hydrogen bonded to Ile 144 N and the general base Lys 145 Nζ. The N-terminal end of the 3-residue biaryl-bridged ring system (MeHpg-L-Ala-L-Tyr) houses the most interactions between the inhibitor and the protein, including hydrogen bonding interactions through N33, N28, O27, O15, O11, N7, and O4 to protein residues Asp 142, Ser 88, Gln 85, Phe 84, Pro 83, and Glu 82. Among these hydrogen bonds, the one from O27 of the inhibitor to Asp 142 N is mediated by a water molecule (water 144 in molecule A and water 137 in molecule B). Several van der Waals interactions also assist the recognition between the protein and Arylomycin A2, including contacts from the C30 methyl group to protein residues Ile 144 and Phe 84. Notably, the C30 methyl group from the Ala side chain of Arylomycin A2 points approximately into the S3 substrate-binding pocket.

In addition, the iso 12-carbon fatty acid tail of Arylomycin A2 helps to stabilize Arylomycin A2 binding through six van der Waals contacts, including interactions from atoms C50-C52, C53, C58, and C59 to Pro 83, Trp 300, Gln 85, and Phe 100, respectively (Figures 4.5 and 4.6).
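The distinction drawn above between hydrogen bonds and van der Waals contacts can be illustrated with a simple distance-based classification. This is a rough sketch only: the cutoffs `HBOND_MAX` and `VDW_MAX` and the function `classify` are assumptions made for illustration and are not the criteria used in this work. The distances are the molecule A values listed in Table 4.4.

```python
# Selected inhibitor-protein distances (Å) for molecule A, from Table 4.4
contacts = [
    ("C59", "Phe100 CE2", 4.2),
    ("C58", "Gln85 NE2", 4.5),
    ("C53", "Trp300 CH2", 4.7),
    ("O11", "Gln85 NE2", 2.8),
    ("O15", "Gln85 N", 2.9),
]

HBOND_MAX = 3.5  # assumed donor-acceptor distance cutoff (Å)
VDW_MAX = 5.0    # assumed van der Waals contact cutoff (Å)

def classify(inhibitor_atom, protein_atom, distance):
    """Label a contact from the element types of both atoms and their distance."""
    # The element is taken as the first letter of each atom name
    inhibitor_polar = inhibitor_atom[0] in ("N", "O")
    protein_polar = protein_atom.split()[-1][0] in ("N", "O")
    if inhibitor_polar and protein_polar and distance <= HBOND_MAX:
        return "hydrogen bond"
    if distance <= VDW_MAX:
        return "van der Waals"
    return "none"

for inhibitor_atom, protein_atom, distance in contacts:
    print(inhibitor_atom, protein_atom, distance,
          classify(inhibitor_atom, protein_atom, distance))
```

A purely geometric rule of this kind ignores bond angles and hydrogen positions, so it should be read as a first-pass filter rather than a definitive assignment.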

The sultam/morpholino derivative BAL0019193, residing above the active site, interacts with both SPase and Arylomycin A2 via hydrogen bonding and van der Waals interactions from atoms S1, O2, N6, N9, and O8 of BAL0019193 to residues Ser 88, Ser 90, Lys 145, Asn 277, Ala 279, and Glu 307 of SPase, as well as O45 of Arylomycin A2 (Figure 4.6 and Table 4.4). In particular, the atoms S1, O2, N6, N9, and O8 of BAL0019193 make hydrogen bonds with the SPase catalytic residues Ser 88 Oγ and Ser 90 N. Ser 88 Oγ and Ser 90 N form the oxyanion hole, which stabilizes the negatively charged tetrahedral intermediate during SPase substrate cleavage. Thus, BAL0019193 binding may disturb the formation of the oxyanion hole. In addition, atom C5 of the four-membered sultam ring of BAL0019193 replaces water 3 (in apo-enzyme numbering), which is proposed to be the deacylation water that attacks the acyl-enzyme intermediate in the SPase catalytic mechanism (Figure 4.8) [41]. Therefore, working together with Arylomycin A2, BAL0019193 further inhibits SPase I. This may explain why inhibition increases 1,000-fold when both inhibitors are present, compared with Arylomycin A2 alone, during SPase catalysis. The molecular interactions between SPase and the inhibitors Arylomycin A2 and BAL0019193 are described in Table 4.4 and Figure 4.6.

Figure 4.6: Diagram of the interactions of the inhibitors Arylomycin A2 and BAL0019193 with SPase Δ2-75 at the active and substrate-binding sites. Dashed lines indicate hydrogen bonds and van der Waals interactions. The interaction distances (Å) are indicated for the inhibitors of molecule A.

Table 4.4: Inhibitor-SPase Δ2-75 contact distances (Arylomycin A2 and BAL0019193)

Inhibitor atom | Protein atom | Distance (Å), Molecule A | Distance (Å), Molecule B
Arylomycin A2
C59 | Phe100 CE2 | 4.2 | n.a.
C58 | Gln85 NE2 | 4.5 | n.a.
C53 | Trp300 CH2 | 4.7 | n.a.
C52 | Pro83 CG | 4.4 | n.a.
C50 | Pro83 CG | 4.6 | n.a.
C48 | Pro83 CG | 5.0 | n.a.
O47 | Pro83 N | 4.3 | 4.5
O4 | Glu82 OE2 | 3.0 | 3.3
N7 | Pro83 O | 3.0 | 3.2
O11 | Gln85 NE2 | 2.8 | 2.8
O15 | Gln85 N | 2.9 | 2.9
O27 | Asp142 N (via WAT 144 in molecule A and WAT 137 in molecule B) | 2.9/2.9 b | 3.0/2.8 b
N28 | Gln85 O | 3.0 | 3.2
N33 | Asp142 O | 3.0 | 2.9
O44 | Ile144 N / Lys145 NZ | 2.8/3.2 | 2.6/4.2
O45 | Lys145 NZ / Ser90 OG / Ser88 OG / O8 in MOL / H2O 194 in molecule A and H2O 197 in molecule B | 3.6/3.0/3.0/3.4/3.2 | 3.6/3.8/3.8/3.5/4.0
C30 | Phe84 CD2 / Asp142 O / Ile144 CG2, CB | 4.0/4.2/3.9, 4.3 | 4.2/4.3/3.7, 4.0
BAL0019193
O14 | Glu307 Oε / Ser88 N / O25 in A2 | 3.2/4.8/5.2 | 3.4/5.7/4.5
N9 | Ser88 OG | 3.7 | 4.3
O8 | O45 in A2 / Ser88 OG / Lys145 NZ / Tyr143 OH | 3.4/4.4/5.6/4.3 | 3.5/5.3/4.6/4.5
N6 | Ser88 OG / Ser90 OG / O45 in A2 | 3.3/4.2/3.3 | 3.5/4.5/3.3
C5 | Lys145 NZ | 4.0 | 3.5
C4 | Ala279 N / CB | 3.1/4.0 | 3.0/4.0
O2 | Asn277 O / Ser88 OG, N / Ser90 OG, N | 3.0/3.6, 4.0/4.3, 3.2 | 3.0/3.4, 4.0/4.1, 3.0
S1 | Ser88 OG / Ser90 OG | 4.0/4.6 | 3.8/4.9

n.a. denotes that the interaction was not observed.
b Inhibitor atom to water distance / water to protein atom distance.
MOL denotes the sultam/morpholino derivative BAL0019193.
A2 denotes the inhibitor Arylomycin A2.

4.3.5 Comparison of the Substrate Binding Site of SPase Δ2-75/Arylomycin A2/BAL0019193 with Apo-enzyme, Acyl-enzyme, Lipopeptide-enzyme, and Glycolipopeptide-enzyme Complex

Superposition of the active and binding site residues of the current ternary structure SPase Δ2-75/Arylomycin A2/BAL0019193 onto previously solved crystal structures, namely the β-lactam acyl-enzyme (PDB code: 1B12, space group P21212), the apo-enzyme (PDB code: 1KN9, space group P41212), the lipopeptide Arylomycin A2-enzyme complex (PDB code: 1T7D, space group P43212), and the glycolipopeptide-enzyme (space group P43212, Chapter 3), shows an overall similarity of the active site and binding sites among all structures (Figure 4.7B). In particular, the sites are almost identical among the ternary complex SPase Δ2-75/Arylomycin A2/BAL0019193, the binary complex SPase Δ2-75/lipopeptide Arylomycin A2, and the SPase Δ2-75/glycolipopeptide BAL4850C structure (Figure 4.7A), except for two subtle differences in the side chain conformations of Asp 142 and Ile 144. One is that the average χ angle of Asp 142 (based on the two molecules in the asymmetric unit) changes from 45.6° in the glycolipopeptide-enzyme (SPase Δ2-75/BAL4850C) structure to 74.8° in the lipopeptide-enzyme (SPase Δ2-75/Arylomycin A2) binary complex structure, and further to 112.4° in the ternary complex structure (SPase Δ2-75/Arylomycin A2/BAL0019193). The other is the average χ2 angle of Ile 144: the value of 168.2° in the SPase Δ2-75/Arylomycin A2/BAL0019193 structure is similar to that in the SPase Δ2-75/BAL4850C structure (166.3°), but the average χ2 angle is smaller (121.6°) in the SPase Δ2-75/Arylomycin A2 structure.

At the active site, defined by the three catalytic residues, Ser 90 Oγ adopts the same conformation in all structures, with a similar average χ1 (-70°), but the geometries of the other catalytic residues, Ser 88 Oγ and Lys 145 Nζ, differ slightly. The average χ1 angle for Ser 88 is similar among the crystal structures of the glycolipopeptide BAL4850C-enzyme, apo-enzyme, lipopeptide Arylomycin A2-enzyme, and Arylomycin A2/BAL0019193-enzyme, with values of 83.7°, 70.4°, 79.0°, and 79.6°, respectively, consistent with its role in the formation of the SPase oxyanion hole and stabilization of the tetrahedral oxyanion intermediate. However, in the penem-enzyme the average χ1 angle of Ser 88 (52.8°) is forced out of its common position (an average of 78.2° for all known structures of SPase Δ2-75) to avoid a steric clash with the thiazolidine ring of the inhibitor (Figure 4.7B). Interestingly, the average χ4 angle of Lys 145 is similar in the apo-enzyme (178.3°) and penem-enzyme (168.8°) because they share a similar hydrogen-bonding pattern: Nζ of Lys 145 makes an H-bond with Ser 90 Oγ in both structures to fulfil its general base role. However, in addition to the H-bond between Nζ of Lys 145 and Ser 90 Oγ, a further H-bond is formed between Nζ of Lys 145 and O45 of the inhibitor in the Arylomycin A2/BAL0019193-enzyme, BAL4850C-enzyme, and lipopeptide Arylomycin A2-enzyme structures. This forces the average χ4 angle from the common position (178.3° in the apo-enzyme or 168.8° in the penem-enzyme) to an inhibitor-induced position (69.9° in Arylomycin A2/BAL0019193-enzyme, 72.0° in BAL4850C-enzyme, and 69.2° in lipopeptide Arylomycin A2-enzyme). The displacement of Nζ of Lys 145 weakens its attraction for the Ser 90 proton; in other words, it decreases the ability of Nζ of Lys 145 to pull the proton away from the nucleophile Ser 90. This suggests that the lipopeptide-based inhibitor usurps the role of Lys 145 as a general base.
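The side chain χ angles compared above are dihedral (torsion) angles defined by four consecutive atoms; for Ser, χ1 is the torsion N-Cα-Cβ-Oγ. As an illustration only, a dihedral angle can be computed from four coordinates as sketched below. The function name `dihedral` and the test coordinates are hypothetical and are not taken from the structures discussed here.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180 to 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Idealized geometry for illustration: first and last atoms at right angles
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applied to the N, Cα, Cβ, and Oγ coordinates of a Ser residue, this returns χ1; the same routine gives χ4 of Lys from Cγ, Cδ, Cε, and Nζ.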


Figure 4.7: The active site superposition of the ternary SPase Δ2-75/Arylomycin A2/BAL0019193 complex crystal structure with the apo-enzyme, β-lactam inhibitor acyl-enzyme, lipopeptide-enzyme, and glycolipopeptide-enzyme crystal structures. The active site of the ternary SPase Δ2-75/Arylomycin A2/BAL0019193 complex crystal structure is shown in stick representation in red, and those of the apo-enzyme, β-lactam inhibitor acyl-enzyme, lipopeptide-enzyme, and glycolipopeptide-enzyme are shown in stick representation in green, cyan, blue, and magenta, respectively. A. Superposition of the ternary SPase Δ2-75/Arylomycin A2/BAL0019193 structure onto the lipopeptide-enzyme and glycolipopeptide-enzyme structures. B. Superposition of the SPase Δ2-75/Arylomycin A2/BAL0019193 structure onto all four structures: apo-enzyme (PDB code: 1KN9 [41]), acyl-enzyme (PDB code: 1B12 [18]), lipopeptide-enzyme (PDB code: 1T7D [40]), and glycolipopeptide-enzyme (Chapter 3). The inhibitors have been removed for clarity. All molecules shown are molecule A of the corresponding structure.

According to the previous study, three conserved waters near the active site were observed in the apo-enzyme structure and designated water 1, water 2, and water 3 (Figure 3.8). Water 1 is coordinated by the Met 91 backbone NH, Ser 88 O, and Leu 95 O. Water 2 is coordinated by Ile 144 NH. Water 3 is coordinated by Lys 145 Nζ and serves as the deacylation water in SPase catalysis [41]. For convenience, the following description of conserved waters uses the apo-enzyme numbering. In the penem-enzyme structure, water 1 was present, but waters 2 and 3 were displaced by the inhibitor [18]. In the lipopeptide Arylomycin A2-enzyme structure [40] and in the glycolipopeptide BAL4850C-enzyme structure, water 1 and water 3 were present, whereas water 2 was displaced by the inhibitor (Figure 3.9): instead of water 2, it is O44 of the inhibitor (either Arylomycin A2 or BAL4850C) that makes the H-bond with Ile 144 N. Interestingly, in the current ternary Arylomycin A2/BAL0019193-enzyme structure, only water 1 is present. This feature makes the inhibition model of the ternary complex SPase/Arylomycin A2/BAL0019193 similar to that of the penem-enzyme rather than that of the binary SPase/Arylomycin A2 complex. Note that in the ternary complex structure water 2 is replaced by atom O44 of Arylomycin A2, and water 3 is replaced by atom C5 of BAL0019193 (Figure 4.8).

Comparison with the inhibition mechanisms in the previously solved SPase I/inhibitor complex structures reveals the following differences. The covalent inhibitor penem inhibits SPase I through three primary mechanisms: (i) covalent binding to Ser 90, which prevents it from acting as a nucleophile; (ii) replacement of the deacylation water (water 3 in apo-enzyme numbering), which blocks deacylation; and (iii) expansion of the substrate-binding pocket S1 (supporting data on page 111) and a change in the shape of the substrate-binding pocket S3. Non-covalent inhibitors such as the glycolipopeptide (BAL4850C) and the lipopeptide (Arylomycin A2) inhibit SPase I through four major mechanisms: (i) the COOH terminus makes hydrogen bonds with all three catalytic residues, Ser 90, Ser 88, and Lys 145, interfering with the roles of the nucleophile, the oxyanion hole, and the general base; (ii) the hexapeptide backbone makes parallel β-sheet type hydrogen bonds to main chain and side chain atoms of the protein residues that line the SPase I binding site, from β-strand 82-85 and β-strand 142-145; (iii) the C30 methyl group mimics the substrate Ala side chain and points into the S3 substrate-binding site of SPase I, competing with the substrate for this site; and (iv) the fatty acid tail makes van der Waals contacts with the cytoplasmic membrane association surface of SPase I, which may increase the affinity of the inhibitor for SPase I. BAL0019193 increases the inhibition by Arylomycin A2 by three means: (i) it coordinates with the nucleophile and the oxyanion hole; (ii) it displaces the deacylation water; and (iii) it helps to block the active site, which may slow the off rate of the inhibitor/SPase complex. Displacement of the deacylation water and capping of the active site by BAL0019193 thus appear to enhance the non-covalent inhibition by Arylomycin A2. This could explain the 1,000-fold co-inhibition observed when both BAL0019193 and Arylomycin A2 are present during SPase I substrate processing.

The superposition of Arylomycin A2 from the ternary Arylomycin A2/BAL0019193-enzyme complex structure onto Arylomycin A2 from the binary Arylomycin A2-enzyme complex structure at the active and binding sites demonstrates that the C-terminal ring moieties of the two inhibitor molecules are almost superimposable. Nevertheless, Arylomycin A2 sits slightly lower in the ternary complex structure (on average 0.2 Å, from molecules A and B) than in the binary Arylomycin A2-enzyme structure. We hypothesize that this is caused by BAL0019193 binding: Arylomycin A2 slides down and leaves more space for BAL0019193. The N-termini of the two inhibitor molecules adopt similar geometries but shift slightly in path, from atom C13 down to atom C48 (Figure 4.8). The similar geometry of Arylomycin A2 in the different complexes implies that SPase I recognizes this inhibitor in the same way even in a different environment, binary versus ternary complex (BAL0019193 was bound to SPase I first in the experimental procedure). The subtle difference at the N-termini of the two inhibitor molecules is attributed to the dynamic nature of the iso 12-carbon fatty acid tail.

Figure 4.8: Structural superposition of the SPase inhibitor Arylomycin A2 that is co-present with BAL0019193 in the ternary complex and Arylomycin A2 that is present alone in the lipopeptide-enzyme complex at the active site of E. coli SPase. The structures of the inhibitor/SPase Δ2-75 complexes are shown in line representation, with Arylomycin A2/BAL0019193/SPase in red and Arylomycin A2/SPase in blue. The inhibitors Arylomycin A2/BAL0019193 in the ternary complex and Arylomycin A2 in the lipopeptide-enzyme complex are shown in stick representation and coloured by element: carbon in yellow, oxygen in red, and nitrogen in blue for Arylomycin A2 co-present with BAL0019193 in the ternary complex structure; carbon in cyan, oxygen in red, and nitrogen in blue for BAL0019193; and carbon in grey, oxygen in red, and nitrogen in blue for Arylomycin A2 present alone in the lipopeptide-enzyme structure. Conserved water 1 and water 3 (in the apo-enzyme numbering) at the active site are represented as spheres with the same colour code as the corresponding residues at the active site. Water 2 (red and blue spheres) from both structures is highly overlapping. The residues of SPase Δ2-75 are labelled in single-letter code in black. The atoms of the inhibitors are labelled in red. The coordinates for Arylomycin A2 were from the lipopeptide-enzyme crystal structure (PDB code: 1T7D) [40].

CHAPTER 5: THEORETICAL EXAMINATION OF THE CRYSTAL-PACKING CONTACTS AND STRUCTURAL VARIATION IN E. COLI SPASE Δ2-75 CRYSTALS

5.1 Introduction

In solution, the polypeptide chain of a protein is folded into a stable globular structure, with hydrophobic side chains mainly in the core of the structure and hydrophilic side chains on the surface. The surface of a globular protein is irregular and full of cavities and bulges. During the crystallization process, individual globular protein molecules pack themselves into crystals.

Crystals are precisely ordered, three-dimensional, repeating arrays of molecules. The characteristics of distinct crystals are described using crystallographic terms including asymmetric unit (AU), unit cell, space group, and crystal lattice [149]. The asymmetric unit is the basic building block of a crystal: the smallest unit that has no internal symmetry elements and can be rotated and translated to generate one unit cell using only crystallographic symmetry operations. An asymmetric unit may contain one or more molecules, or even a half or a quarter of one molecule if the molecule itself is symmetric. Central to understanding the concept of a crystal is the space group, the complete set of crystallographic symmetry operations (rotations and translations) that mathematically describes the symmetry of the unit cell. Applying the space group to an asymmetric unit produces one unit cell, and translating the unit cell indefinitely constructs a crystal (Figure 5.1). The crystal lattice is the regularly spaced grid (defined by lengths and angles) formed by the origins of the individual unit cells in three-dimensional space.
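To make the relationship between asymmetric unit, space group, and unit cell concrete, the following sketch applies the four general equivalent positions of space group P21212 (the space group of the acyl-enzyme crystal, PDB code 1B12) to one point in fractional coordinates. It is an illustration under stated assumptions: the function `equivalent_positions` and the example point are hypothetical, and a real calculation would apply the operators to every atom of the asymmetric unit.

```python
from fractions import Fraction as F

HALF = F(1, 2)

# General equivalent positions of space group P2(1)2(1)2 (No. 18),
# expressed in fractional coordinates
P21212_OPS = [
    lambda x, y, z: (x, y, z),
    lambda x, y, z: (-x, -y, z),
    lambda x, y, z: (-x + HALF, y + HALF, -z),
    lambda x, y, z: (x + HALF, -y + HALF, -z),
]

def equivalent_positions(frac_xyz, ops=P21212_OPS):
    """Apply each symmetry operator and wrap the result into the unit cell [0, 1)."""
    x, y, z = frac_xyz
    return [tuple(c % 1 for c in op(x, y, z)) for op in ops]

# Hypothetical atom position in the asymmetric unit (fractional coordinates)
point = (F(1, 10), F(1, 5), F(3, 10))
for position in equivalent_positions(point):
    print(tuple(str(c) for c in position))
```

One asymmetric unit therefore gives four symmetry-related copies per unit cell in this space group; adding integer translations along a, b, and c to these copies then builds the crystal.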


Figure 5.1: Creation of a crystal. A. A representation of an asymmetric unit, the basic building block of a crystal. B. Space group, a set of crystallographic symmetry operations (here three mutually perpendicular two-fold axes). The space group is applied to the asymmetric unit to generate a closed set of asymmetric units. C. The unit cell is chosen such that it contains a complete set of the asymmetric units and reflects the symmetry properties of the group. D. The crystal is built up by indefinitely translating the unit cell through three-dimensional space, and the crystal lattice is formed by the origins of the unit cells.

Individual globular protein molecules pack into a crystal lattice by contacting each other at only a few small regions, leaving large holes or channels that accommodate their irregular surfaces. Such contacts are governed by surface interactions known as crystal-packing contacts. The term crystal-packing contacts refers to non-specific intermolecular interactions that occur solely as the result of protein crystallization. In contrast, specific intermolecular interactions include recognition between antibody and antigen or enzyme and inhibitor, the functional association of protein-protein complexes, and physiological interactions between the subunits of a multimeric protein [150-153]. Both non-specific and specific interactions exist in crystals, but it is the former that are involved in crystallization.

Although non-specific and specific interactions share the same physico-chemical basis, such as hydrogen bonds and hydrophobic, electrostatic, and van der Waals interactions, they are distinct from each other. Specific interactions are invariant and occupy a specific location on a given protein. Non-specific interactions, however, vary in combination and strength, and reside in different areas depending on the crystal arrangement of the same protein. This is also the reason why crystals of the same protein can be grown in a variety of space groups.

Unlike specific interactions, non-specific interactions in crystal-packing contacts usually receive little attention. The Carugo and Dasgupta groups have found that the contact patches in crystal packing are smaller than those in oligomeric interfaces [154, 155]. Commonly, only 10 to 100 atoms are involved in one patch in crystal contacts, but 100 to 1000 atoms are engaged per patch in oligomer interfaces. Beyond the distinction in contact size, Bahadur and co-workers compared the specific interactions from 77 monomeric protein-protein complexes and 122 homodimeric proteins with the non-specific interactions from 188 pairs of "crystal dimers" of monomeric proteins. They concluded that non-specific interactions (i) are less hydrophobic than those in homodimers, (ii) occur with lower affinity and looser packing, and (iii) contain fewer fully buried atoms [156]. These authors further reported that non-specific interactions (from 173 large monomeric protein crystal-packing contacts) are more hydrated, or "wet", with 15 water molecules per 1000 Å² of interface area, and that these waters can penetrate into the protein-protein interface. In contrast, specific interactions from 46 monomeric protein-protein complexes and 115 homodimeric proteins are more hydrophobic, or "dry", with 10-11 water molecules per 1000 Å² of interface area, and the water molecules form a ring around the interface atoms [157].

Irregular protein surfaces are not optimal for crystal packing. Lys (68% exposed, 26% partially exposed, and 6% buried) and Glu (only 12% buried) are predominantly located on the surface, and their large, extended side chains contribute substantially to the irregularity of a globular protein surface [158]. Arg also has a very long, flexible side chain but is not often exposed on the protein surface [159]. Therefore, surface Lys and Glu are not desirable for protein crystal contacts.

Protein crystallization is a thermodynamic process. Entropic effects have to be considered because $\Delta G^{\circ}_{crystal} = \Delta H^{\circ}_{crystal} - T\Delta S^{\circ}_{crystal}$, and $\Delta H^{\circ}_{crystal}$ makes no significant negative contribution to $\Delta G^{\circ}_{crystal}$; for instance, $\Delta H^{\circ}_{crystal}$ is -70 kJ/mol for lysozyme, 0 kJ/mol for ferritin, and 155 kJ/mol for haemoglobin C [159]. In contrast, $\Delta S^{\circ}_{crystal}$ has a significant effect. The driving force of protein crystallization is the release of water molecules from the surface of free protein molecules as the crystal contacts are formed, which increases the entropy. However, this entropy gain is offset by the entropic cost of ordering protein molecules in the crystal lattice and by the loss of conformational freedom of the side chains involved in the crystal contacts. Removing surface residues with high conformational entropy (surface-entropy reduction) lowers the entropic cost of constraining such side chains in crystal contacts and thereby enhances protein crystallization. This leaves significant room for improving crystallization by modifying surface residues, known as surface mutation or surface substitution, which alters the non-specific crystal-packing contacts and opens a new route in protein crystal engineering.
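The role of the entropy term can be made explicit with a short calculation. Since crystallization requires a negative free energy change, the relation ΔG° = ΔH° - TΔS° implies that ΔS° must exceed ΔH°/T. The sketch below evaluates this threshold for the three enthalpy values quoted above. The temperature of 293.15 K (20 °C) and the function names are assumptions made for illustration; no entropy values are taken from the cited work.

```python
T_KELVIN = 293.15  # assumed crystallization temperature (20 °C)

# Crystallization enthalpies (kJ/mol) as quoted in the text [159]
DELTA_H_KJ_PER_MOL = {
    "lysozyme": -70.0,
    "ferritin": 0.0,
    "haemoglobin C": 155.0,
}

def delta_g(delta_h_kj, delta_s_j_per_k, temperature=T_KELVIN):
    """Free energy change (kJ/mol) from enthalpy (kJ/mol) and entropy (J/mol/K)."""
    return delta_h_kj - temperature * delta_s_j_per_k / 1000.0

def min_entropy_gain(delta_h_kj, temperature=T_KELVIN):
    """Entropy change (J/mol/K) at which the free energy change is exactly zero."""
    return 1000.0 * delta_h_kj / temperature

for protein, delta_h in DELTA_H_KJ_PER_MOL.items():
    print(protein, round(min_entropy_gain(delta_h), 1))
```

For haemoglobin C the enthalpy term opposes crystallization, so under this assumed temperature the net entropy gain would have to exceed roughly 530 J/mol/K, which illustrates why the entropic contribution is decisive.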

It is believed that surface residues with high conformational entropy, in particular Lys (approximately 2 kcal/mol) and Glu (1.55-1.75 kcal/mol), impede protein crystallization [160, 161]. Lys and Glu are therefore good targets for protein crystal engineering. Mutation of Lys or Glu to Ala is the most productive strategy for obtaining well-ordered crystals of high diffraction quality [154, 162]. There are further examples in which surface substitutions (Lys to Ala, Glu to Ala, Trp to Ala) markedly decreased the surface entropy, changed the crystal-packing pattern, and improved crystal quality [163-168]. Arg and Gln are the two most favoured residues in crystal contacts; thus, Lys or Glu can be mutated not only to Ala but also to Arg or Gln [169, 170]. The side chain entropy of Arg is not high because Arg is not frequently exposed on the surface of proteins [159]. The reason is that non-specific protein crystal contacts tend to involve polar interactions (e.g., Arg provides electrostatic interactions) rather than hydrophobic residues [154].

In order to predict precise surface mutant sites in protein crystal engineering, one must have prior knowledge of the crystal-packing contacts.

Many laboratories have reported that knowledge of crystal-packing contacts directed their experiments, in which only a few key crystal-packing contacts on the protein surface were created or omitted [168, 171], or strengthened or weakened [172].

Information on the crystal-packing contacts of a protein can be found in the PDB. The PDB is a treasure trove of structural information that is expanding rapidly with the increasing number of crystal structures being solved and submitted. There were over 41,527 structures (Feb. 6, 2007) stored in the PDB, and according to the daily report by the RCSB at the time of this writing, there have been more than 50 new depositions per week since 1999 [55]. A PDB file contains not only the atomic coordinates of a protein for structural visualization, but also a large amount of underlying information on protein-protein crystal-packing contacts.

Paetzel and co-workers have deposited three crystal structures of E. coli SPase Δ2-75 in the PDB (accession codes: 1B12, 1KN9, and 1T7D) [18, 40, 41]. These crystals were grown under different conditions and have different unit cells (space group and unit cell dimensions), making them valuable for examining protein-protein crystal contacts. E. coli SPase Δ2-75 is monomeric in solution but is found as either a "crystal dimer" or a "crystal tetramer" in the asymmetric unit of the crystals. Analysis of the protein-protein crystal-packing contacts would help to address the following questions. How are the crystal-packing contact patches distributed on each single chain of SPase Δ2-75? Are these contacts common to the various space groups? Which surface residues should be mutated in order to improve the diffraction quality of the crystals?

This information would provide clues for altering the protein contacts by genetic manipulation (i.e. site-directed mutagenesis) and thereby improving the crystallographic resolution of E. coli SPase I. We have found that even a slight change to the sample (e.g. a single amino acid substitution such as SPase Δ2-75 S90A, or mixing SPase Δ2-75 with a new inhibitor) requires a significant effort to screen for crystallization conditions. In addition, most of the diffraction data extended to only about 2.5 Å resolution and could not be improved because of the imperfect crystal lattice, which complicated data processing and structure determination. Furthermore, most crystals had a very large unit cell, leading to diffraction spots that were very close to one another or overlapping in reciprocal space. This is a problem during data indexing and scaling because it introduces a non-negligible degree of uncertainty. The best solution to this problem is to collect data using short-wavelength synchrotron radiation, which is very expensive for the routine data collection required in inhibitor screening. Taken together, the crystallization of SPase Δ2-75 is a rate-limiting step that hinders progress in using SPase Δ2-75 as a drug target for inhibitor screening and the further development of a novel antibiotic.

We have investigated the crystal-packing contacts and structural variation of SPase Δ2-75. Analysis of the crystal-packing contacts revealed the origins of disorder, and we suggest strategies for designing new constructs. We hope that this knowledge will have practical value in SPase Δ2-75 crystal engineering.

5.2 Materials and Methods

5.2.1 The Structure Data and the Analysis of Crystal-Packing Contacts

The atomic coordinates of the SPase Δ2-75 crystal structures were taken from the PDB: orthorhombic crystals (penem-enzyme, PDB code: 1B12, space group P2₁2₁2) and tetragonal crystals (apo-enzyme, PDB code: 1KN9, space group P4₁2₁2; lipopeptide-enzyme, PDB code: 1T7D, space group P4₃2₁2). The intermolecular crystal-packing contacts were analyzed using the program CONTACT in CCP4i [137] and the program XFIT [139].

Atoms of two protein molecules were considered to be interacting if they were within a threshold distance of < 5 Å [156]. Water molecules found in the crystal structure coordinates were not considered in the analysis. Hydrogen bonds were defined by the distance between potential hydrogen-bonding O and N atoms, with a cut-off of 3.3 Å. Possible electrostatic interactions (salt bridges) are included in the category of hydrogen bonds since a PDB file does not describe the positions of H atoms. All other interactions within 5 Å were considered van der Waals (VDW) interactions.
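The distance criteria above can be illustrated with a minimal sketch. It is not the CONTACT program itself: the `Atom` record, the function names, and the choice to treat any N/O pair as a potential hydrogen bond are assumptions made for illustration, and symmetry-related copies would have to be generated separately before calling it.

```python
import math
from typing import NamedTuple, Optional

HBOND_CUTOFF = 3.3    # Å, cut-off between potential hydrogen-bonding N/O atoms
CONTACT_CUTOFF = 5.0  # Å, general crystal-contact threshold
POLAR = {"N", "O"}    # assumption: only N and O are treated as H-bonding atoms


class Atom(NamedTuple):
    """Hypothetical minimal atom record (not a real PDB parser type)."""
    chain: str
    residue: str
    name: str
    element: str
    xyz: tuple


def classify_contact(a: Atom, b: Atom) -> Optional[str]:
    """Return 'hbond', 'vdw', or None for a pair of atoms from two molecules."""
    if a.chain == b.chain:
        return None  # only intermolecular contacts are of interest
    if a.element == "H" or b.element == "H":
        return None  # H positions are not described in the PDB files used
    distance = math.dist(a.xyz, b.xyz)
    if distance >= CONTACT_CUTOFF:
        return None
    if a.element in POLAR and b.element in POLAR and distance < HBOND_CUTOFF:
        return "hbond"  # salt bridges fall into this category as well
    return "vdw"


def find_contacts(atoms):
    """List (atom_a, atom_b, kind) for all contacting pairs; waters excluded."""
    protein = [at for at in atoms if at.residue != "HOH"]
    contacts = []
    for i, a in enumerate(protein):
        for b in protein[i + 1:]:
            kind = classify_contact(a, b)
            if kind is not None:
                contacts.append((a, b, kind))
    return contacts
```

The pairwise loop is quadratic in the number of atoms, which is acceptable for a sketch; a production tool would use a spatial grid or neighbour search.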

5.2.2 The Structural Variation Analysis

The molecules in each AU were superimposed using the program Superpose [144].

5.2.3 Surface Area of Contact

The solvent-accessible surface area (ASA) of the atoms in the free molecule was calculated using the program AREAIMOL in CCP4i [137]. The accessible surface area of an atom is defined as the area traced out by the centre of a probe sphere (radius 1.4 Å, representing a water molecule) as it rolls over the VDW surface of the atom.
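The probe-sphere definition can be made concrete with a small numerical sketch. It uses a Shrake-Rupley-style point-sampling scheme, which is an assumption chosen for brevity and is not claimed to be the algorithm used by AREAIMOL; the function names and the number of sample points are likewise illustrative.

```python
import math

PROBE_RADIUS = 1.4  # Å, radius of the water-sized probe sphere


def sphere_points(n):
    """Return n roughly evenly spread points on the unit sphere (golden spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        phi = golden_angle * i
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def accessible_area(atoms, index, n_points=960):
    """Estimate the accessible surface area (Å^2) of atoms[index].

    atoms is a list of ((x, y, z), vdw_radius) tuples. Sample points are placed
    on the sphere of radius (vdw_radius + probe radius) around the atom, i.e.
    on the surface traced by the probe centre; a point is buried when it lies
    inside the equivalent expanded sphere of any other atom.
    """
    (cx, cy, cz), radius = atoms[index]
    expanded = radius + PROBE_RADIUS
    exposed = 0
    for ux, uy, uz in sphere_points(n_points):
        point = (cx + expanded * ux, cy + expanded * uy, cz + expanded * uz)
        buried = any(
            math.dist(point, centre) < other_radius + PROBE_RADIUS
            for j, (centre, other_radius) in enumerate(atoms)
            if j != index
        )
        if not buried:
            exposed += 1
    return 4.0 * math.pi * expanded ** 2 * exposed / n_points
```

For an isolated atom every sample point is exposed, so the estimate reduces to the full area of the expanded sphere; a neighbouring atom lowers it by burying part of that sphere.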

5.2.4 Figure Preparation

Figures were prepared using the programs XFIT [139] and PyMol [56].

5.3 Results and Discussion

5.3.1 Crystallographic Characteristics of E. coli SPase Δ2-75 Crystals

Crystallographic characteristics of the E. coli SPase Δ2-75 crystals, including the space group, unit cell parameters, refinement parameters, and crystallization conditions, are summarized in Table 5.1.

Table 5.1: Crystallographic characteristics of SPase Δ2-75 crystals

                             Orthorhombic           Tetragonal             Tetragonal
Structure                    Penem-enzyme           Apo-enzyme             Lipopeptide-enzyme
                             (covalently bound)                            (noncovalently bound)
PDB code                     1B12                   1KN9                   1T7D
Space group                  P2₁2₁2                 P4₁2₁2                 P4₃2₁2
Unit-cell a, b, c (Å)        110.7, 113.2, 99.2     112.4, 112.4, 198.7    69.6, 69.6, 258.5
Unit-cell α, β, γ (°)        90.0, 90.0, 90.0       90.0, 90.0, 90.0       90.0, 90.0, 90.0
Resolution (Å)               1.9                    2.4                    2.5
Rwork/Rfree (%)              22.5/27.5              23.9/27.9              23.1/28.5
Vm (Å³/Da)                   2.78                   2.78                   2.8
# Mole./AU                   4                      4                      2
% Solvent                    56.0                   56.2                   56.0
# Waters                     448                    258                    304
Other ions                   PO₄³⁻                  none                   none
Missing loops (Mole.)        A  B  C  D             A  B  C  D             A  B
  107-124                    p  a  a  p             p  a  p  a             a  a
  170-178                    p  a  p  a             a  p  a  p             a  p
  197-204                    p  a  a  a             p  p  p  p             a  a
  305-313                    a  a  p  a             a  a  a  a             a  p
RMSD of Cα (Å)               0.51                   0.75                   0.62
Crystallization conditions   1.0 M NH4H2PO4         0.7 M NH4H2PO4         15% w/v PEG 4000
                             0.1 M Na citrate       0.1 M Na citrate       20% propanol
                             pH 5.6                 pH 5.6                 0.1 M Na citrate pH 6.0
                             5% tert-amyl alcohol   5% 2-methylpentane-    0.5% Triton X-100
                                                    2,4-diol

Vm: The crystal volume per unit of protein molecular weight (volume of the asymmetric unit / m.w.) (Å³/Da).
# Mole./AU: Number of molecules in one asymmetric unit (AU). The AU is the smallest unit that can be rotated and translated to generate one unit cell using only the crystallographic symmetry operators. The asymmetric unit may contain one molecule, more than one molecule, or a half or a quarter of one molecule.
Rwork: The R factor, defined as $R_{work} = \sum |F_{obs} - F_{calc}| / \sum |F_{obs}|$. It is a numerical indicator used to follow the progress of refinement; it measures the agreement between the data and the model and indicates the accuracy of the model.
Rfree: The free R factor is also a numerical indicator used to follow the agreement between the data and the model during refinement, calculated using 10% of the native data that were randomly chosen and excluded from the refinement. It is more reliable than Rwork in evaluating the accuracy of the model. $R_{free} = \sum_{hkl \subset T} (|F_{obs}| - |F_{calc}|)^2 / \sum_{hkl \subset T} |F_{obs}|^2$, where $hkl \subset T$ are the reflections belonging to a test set T of 10% of the data, and $F_{obs}$ and $F_{calc}$ are the observed and calculated structure factors, respectively.
RMSD: Root Mean Square Deviation. Mole.: Molecule. p: Present. a: Absent.

5.3.2 Overall Crystal Packing in the Three Crystal Lattices

The three crystals (PDB codes: 1B12, 1KN9, and 1T7D) of E. coli SPase Δ2-75 have distinct asymmetric units and unit cells (defined by the space group and the cell lengths and angles) (Table 5.1).
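As a rough cross-check of the Vm and solvent values in Table 5.1, the sketch below recomputes them from the unit cell dimensions. Several inputs are assumptions not stated in this chapter: a molecular weight of about 27,950 Da for SPase Δ2-75, the general-position multiplicities of the space groups (4 for P2₁2₁2, 8 for P4₁2₁2 and P4₃2₁2), and the commonly used approximation that the solvent fraction is 1 - 1.23/Vm. The function names are illustrative.

```python
ASSUMED_MW_DA = 27950.0  # assumption: approximate molecular weight of SPase Δ2-75

# (a, b, c in Å, asymmetric units per cell, molecules per asymmetric unit)
CRYSTALS = {
    "1B12": (110.7, 113.2, 99.2, 4, 4),   # P2₁2₁2
    "1KN9": (112.4, 112.4, 198.7, 8, 4),  # P4₁2₁2
    "1T7D": (69.6, 69.6, 258.5, 8, 2),    # P4₃2₁2
}


def matthews_coefficient(a, b, c, n_asu, n_mol, mw=ASSUMED_MW_DA):
    """Vm in Å^3/Da for a cell with all angles equal to 90 degrees."""
    cell_volume = a * b * c
    return cell_volume / (n_asu * n_mol * mw)


def solvent_fraction(vm):
    """Approximate solvent fraction from Vm (assumes typical protein density)."""
    return 1.0 - 1.23 / vm


if __name__ == "__main__":
    for code, params in CRYSTALS.items():
        vm = matthews_coefficient(*params)
        print(f"{code}: Vm = {vm:.2f} Å^3/Da, solvent = {100 * solvent_fraction(vm):.1f}%")
```

Under these assumptions the computed values fall close to the tabulated ones; small differences are expected because the molecular weight used here is only approximate.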

The primitive orthorhombic P2₁2₁2 crystals were obtained from SPase Δ2-75 in complex with the penem inhibitor allyl (5S,6S)-6-[(R)-acetoxyethyl]-penem-3-carboxylate, with four molecules in the asymmetric unit. A primitive orthorhombic unit cell has unit cell lengths a ≠ b ≠ c and unit cell angles α = β = γ = 90°. Angle α of a unit cell is defined as the angle between the vectors b and c, β is the angle between the vectors a and c, and γ is the angle between the vectors a and b (Figure 5.2 A).

In the space group symbol P2₁2₁2, P means primitive, and 2₁2₁2 indicates that the unit cell has two-fold screw axes (2₁) along the x- and y-axes and a normal two-fold axis (2) along the z-axis. A two-fold screw axis relates two identical asymmetric units in such a way that one may be generated from the other by rotating it by 180° about the axis (here either x or y) and then translating it by 1/2 of the unit cell length parallel to the axis (Figure 5.3 A). A normal two-fold axis relates two identical asymmetric units by a 180° rotation only, with no translation.

The primitive tetragonal P4₁2₁2 crystals were obtained from the apo-enzyme of SPase Δ2-75. A primitive tetragonal unit cell has unit cell lengths a = b ≠ c and unit cell angles α = β = γ = 90° (Figure 5.2 B). Space group P4₁2₁2 contains a four-fold screw axis (4₁) along the z-axis, two-fold screw axes (2₁) along the x- and y-axes, and normal two-fold axes (2) along the diagonals of the ab plane. A four-fold screw axis 4₁ relates identical asymmetric units in such a way that one may be generated from the other by rotating it by 360°/4 (90°) about the axis (here z) and then translating it by 1/4 of the unit cell length parallel to the axis (Figure 5.3 B).

Figure 5.2: Primitive orthorhombic and tetragonal unit cells. A. A primitive orthorhombic unit cell has unit cell lengths a ≠ b ≠ c and angles α = β = γ = 90°. B. A primitive tetragonal unit cell has unit cell lengths a = b ≠ c and angles α = β = γ = 90°. Angle α of a unit cell is defined as the angle between the vectors b and c. Similarly, angle β is the angle between the vectors a and c, and γ is the angle between the vectors a and b.

The primitive tetragonal P4₃2₁2 crystals were obtained from SPase Δ2-75 in complex with the lipopeptide inhibitor arylomycin A2. Space group P4₃2₁2 contains a four-fold screw axis (4₃) along the z-axis, two-fold screw axes (2₁) along the x- and y-axes, and normal two-fold axes (2) along the diagonals of the ab plane. Like 4₁, the four-fold screw axis 4₃ relates identical asymmetric units by a 360°/4 (90°) rotation about the axis combined with a translation parallel to it; however, the translation is 3/4 of the unit cell length, which is equivalent to a translation of 1/4 in the direction opposite to that of 4₁ (Figure 5.3 C).
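The action of the screw axes described above can be sketched on fractional coordinates. The helper names are illustrative, the axis is taken to lie along z through the origin, and only the 2₁, 4₁, and 4₃ cases discussed in this section are covered.

```python
from fractions import Fraction


def screw_along_z(n, p):
    """Return the n_p screw operation along z acting on fractional (x, y, z).

    The operation rotates by 360/n degrees about z and translates by p/n of
    the unit cell length along z. Only n = 2 and n = 4 are handled here.
    """
    shift = Fraction(p, n)

    def operate(position):
        x, y, z = position
        if n == 2:
            return (-x, -y, z + shift)  # 180 degree rotation about z
        if n == 4:
            return (-y, x, z + shift)   # 90 degree rotation about z
        raise ValueError("only two-fold and four-fold screws are sketched")

    return operate


def wrap(position):
    """Map fractional coordinates back into the unit cell (0 <= value < 1)."""
    return tuple(coordinate % 1 for coordinate in position)


def orbit(operate, start, steps):
    """Apply the operation repeatedly and collect the generated positions."""
    positions = [start]
    for _ in range(steps):
        positions.append(operate(positions[-1]))
    return positions


if __name__ == "__main__":
    start = (Fraction(1, 10), Fraction(1, 5), Fraction(3, 10))
    for label, (n, p) in {"2_1": (2, 1), "4_1": (4, 1), "4_3": (4, 3)}.items():
        generated = orbit(screw_along_z(n, p), start, n)
        print(label, [tuple(str(c) for c in pos) for pos in generated])
```

Applying an n-fold screw n times returns the starting position shifted by a whole number of unit cells, which is why the last generated position is equivalent to the first once it is wrapped back into the cell.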

Figure 5.3: Screw axes. A. 2₁ screw axis along z. Applying the 2₁ screw axis (rotation by 180° and translation by 1/2 of the unit cell length) along z to the asymmetric unit at x, y, z creates an equivalent asymmetric unit at -x, -y, 1/2+z. A second application creates the asymmetric unit at x, y, 1+z, which is the same as x, y, z owing to the periodicity of the unit cell. B. 4₁ screw axis along z. Applying the 4₁ screw axis (rotation by 90° and translation by 1/4 of the unit cell length) along z to the asymmetric unit at x, y, z creates an equivalent asymmetric unit at -y, x, 1/4+z. A second application creates another equivalent asymmetric unit at -x, -y, 1/2+z, and a third creates one at y, -x, 3/4+z. A fourth application creates the asymmetric unit at x, y, 1+z, which is the same as x, y, z owing to the periodicity of the unit cell. C. 4₃ screw axis along z. The 4₃ screw axis performs a symmetry operation similar to that of 4₁ but runs in the opposite direction.

The overall crystal-packing contacts are shown in crystal lattice fashion for the different crystal forms: penem-enzyme (orthorhombic, space group P2₁2₁2, PDB code: 1B12), apo-enzyme (tetragonal, space group P4₁2₁2, PDB code: 1KN9), and lipopeptide-enzyme (tetragonal, space group P4₃2₁2, PDB code: 1T7D) (Figure 5.4).

5.3.3 The Crystal-packing Contacts

The crystal contact analysis using the program CONTACT in CCP4i showed that each molecule in the three crystal forms (orthorhombic, penem-enzyme, PDB code: 1B12; tetragonal, apo-enzyme, PDB code: 1KN9; tetragonal, lipopeptide-enzyme, PDB code: 1T7D) is in contact (< 5 Å) with 2 to 6 molecules from the asymmetric unit and the surrounding symmetry-related molecules (the symmetry equivalents of the asymmetric unit) (Table 5.2).

The individual crystal-packing contacts made through hydrogen bonds and VDW interactions are summarized in Tables 5.2 and 5.3, respectively. In Table 5.2, each hydrogen bond is listed as a hydrogen-bonded atom pair (< 3.3 Å). The first residue belongs to a molecule in the primed AU (x, y, z), and the second residue belongs to either another molecule in the primed AU (x, y, z) or a surrounding symmetry-related molecule (< 5 Å). For example, in the first column and fifth row of Table 5.2, molecule A of the primed AU (x, y, z) forms hydrogen-bonding contacts with molecule C in the primed AU (x, y, z), and it also forms hydrogen-bonding contacts with the surrounding symmetry-related molecules C at (-x, -y, z), B at (x+1/2, -y+1/2, -z), and C at (x+1/2, -y+1/2, -z).

Figure 5.4: Overall crystal-packing contacts (packing in the crystal lattice) in the different crystal forms. A. Unit cell of the penem-enzyme crystal (orthorhombic, space group P2₁2₁2, PDB code: 1B12), with the asymmetric unit of 4 molecules shown within the unit cell. B. View of the ab plane of the P2₁2₁2 lattice. C. Unit cell of the apo-enzyme crystal (tetragonal, space group P4₁2₁2, PDB code: 1KN9), with the asymmetric unit of 4 molecules shown within the unit cell. D. View of the ab plane of the P4₁2₁2 lattice. E. Unit cell of the lipopeptide-enzyme crystal (tetragonal, space group P4₃2₁2, PDB code: 1T7D), with the asymmetric unit of 2 molecules shown within the unit cell. F. View of the ab plane of the P4₃2₁2 lattice. In B, D, and F, the corresponding unit cell, which contains a complete set of symmetry-related asymmetric units, is outlined on the ab plane and the primed asymmetric unit is highlighted in red.

Table 5.3 includes the VDW interactions between molecules in the primed AU (x, y, z) but omits the VDW interactions with surrounding symmetry-related molecules, since there are too many VDW interactions (< 5 Å) to list.

To supplement Tables 5.2 and 5.3, Figures 5.5, 5.6, and 5.7 show, for each of the three crystal-packing patterns, how each molecule of the AU makes hydrogen-bonding and VDW contacts with other molecules in the AU and with surrounding symmetry-related molecules. The distribution of contact patches on each molecule (shown as a ribbon) was mapped through the surface contact residues (shown as spheres), with residue labels.

Table 5.2: Hydrogen-bonded atom pairs (distance < 3.3 Å) between chains of the asymmetric unit and symmetry-related molecules of SPase Δ2-75 crystals

                         SPase-penem               Apo-SPase                 SPase-Arylomycin A2
                         Orthorhombic (1B12)       Tetragonal (1KN9)         Tetragonal (1T7D)
# Molecules per AU       4 (A, B, C, D)            4 (A, B, C, D)            2 (A, B)
Hydrogen-bonded atom pairs (Å)
                         Molecule A (x, y, z) to   Molecule A (x, y, z) to   Molecule A (x, y, z) to
                         C (x, y, z)               B (x, y, z)               B (x, y, z)

B (-X, -y, -~+1/2) T216N,0-A179~lN(3.01, 2.97) K217d-~1780 (2.53) Molecule B (x, y, z) to Molecule B (x, y, z) A (x, Y, -4 K140 fd-~la2900(3.07) A (~+1/2,-y+1/2, - z+1/4) K126 N'-Val1 870 (2.90) K227C '-Dl 580 62 (2.95) B (~+112,-y+1/2, - z+114) ~1650''-~2140(3.13) (2167 N E2-~2130(3.24) ~1820-~2160'~(3.28) Table 5.2: Hydrogen-bonded atom pairs (distance < 3.3 A) from the inter-chains of the asymmetric unit and symmetry related molecules of SPaseA2-75 crystals (continued) SPase-penem Apo-SPase SPase-Arylomycin A2 Orthorhombic (1612) Tetragonal (1 KN9) Tetragonal (1T7D)-- # Molecules in per AU 6. 4 (A.I,, C. , Dl, 2 (A. B)

Hvdroaen-bonded atom- - airs (Al ,~~"-- 0- ,v, Molecule C (x, y, z) to Molecule C (x, y, z) to

~216~,0-~1790,~(3.26, 3.05) Molecule D (x, y, z) to Molecule D (x, y, z) to C (x+1/2, -y+1/2, -2) Y1150H-G266N, 0 (1.99, 3.03) Y1150H-Q267N (2.70) ~1600-~219~~~(2.90) E2180 "-~244cZ (2.70) N2190 6',~62-~244~,~ (2.97, 3.11) P2410-G220N (3.00) 12420-F196N (3.09) Q244N,O-F1960 c3.26, 3.25) ~2440"-N219 N (2.75) Q2460 "-~1970(2.72) ~2460E'-~2~5~Y (3.26) B (~+112,-y+1/2, -2) ~2150~'-~178N (3.21) T216N,O-A1790, N (2.96, 2.81) K217~~-~1770(2.64) Table 5.3: Residues involved van der Waal interactions (the distance < 5A) from the inter- chains of the asymmetric unit of SPase 82-75 crystals SPase- Arylomycin A2 1B12 1KN9 1T7D Molecule A-D Molecule A-C Molecule Y81-L316,S317 A-B L102-L316,1319 Q257-GI 73, E104-S317 (2174, T297,A298-1319, G320 A1 75 W300-M301, L316,1319 L316- Y81, L102, T297 S317-Y81,Gl~104 131 9-L102,T297, A298, W300 Molecule B-C Molecule A-B D158-V160,S161 PI59-N2 19 V160-D158, T165 V160,S161-G218,N219 S161-D158, Vl6O K162-N219 E163-V160, Sl6l Q194,F196,S197-E203,1242,E246 T165-V160 R198-A243,Q244,D245,E246, M249,D276,Y283 R199,N200-(2246, M249, S278,Ala279,Y283 G201-D276,N277,Y283 Molecule B-C E203-Q194,F196,D276,N277 N186-N186, E188, R226 A204.T205-G202, E203 Vl87, V187-N186 E218,N219,G220,1221-1242,A243,Q244,Q246 N219-PI59, V160, K162 E218-V160,S161 P241,1242,A243-T195,F196, S197,R198,K217,E218 ,N219,G220,1221 (2244-T195,F196,S197, R198,N219 D245,Q246,M249-S197,R198, N200,G202,N219,T245 D276.N277-R198,G201,E203,G207 Molecule A-C Molecule B-D S171,S172,G173-(2252, P254, L316-S317.1319 G255, Q256, (2257 1262, V263-P254, G255

The residues before a dash correspond to residues from one molecule in the AU (x, y, z), and the residues after the dash correspond to residues from another molecule in the AU (x, y, z).

Figure 5.5: Ribbon representations of the crystal-packing contact patches on each molecule of the orthorhombic crystals (PDB code 1B12, space group P2₁2₁2). Panels A, B, C, and D show molecules A, B, C, and D, respectively, mapped with the crystal-packing contact patches. Molecules A, B, C, and D are shown as black ribbons and the contact residues are shown as spheres. The contact residues that are H-bonded (< 3.3 Å) to other molecules in the asymmetric unit (AU) are in red, and those H-bonded to surrounding symmetry-related molecules are in blue (see Table 5.2). The contact residues that make van der Waals interactions (< 5.0 Å) with other molecules in the AU are in light grey (see Table 5.3). The contact residues are labelled alongside the contact patches using the same colour code. The catalytic residues Ser90 and Lys145 are shown as sticks, coloured red and green, respectively, as a positional guide.

Figure 5.6: Ribbon representations of the crystal-packing contact patches on each molecule of the tetragonal crystals (PDB code 1KN9, space group P4₁2₁2). Panels A, B, C, and D show molecules A, B, C, and D, respectively, mapped with the crystal-packing contact patches. Molecules A, B, C, and D are shown as black ribbons and the contact residues are shown as spheres. The contact residues that are H-bonded (< 3.3 Å) to other molecules in the asymmetric unit (AU) are in red, and those H-bonded to surrounding symmetry-related molecules are in blue (see Table 5.2). The contact residues that make van der Waals interactions (< 5.0 Å) with other molecules in the AU are in light grey (see Table 5.3). The contact residues are labelled alongside the contact patches using the same colour code. The catalytic residues Ser90 and Lys145 are shown as sticks, coloured red and green, respectively, as a positional guide.

Figure 5.7: Ribbon representations of the crystal-packing contact patches on each molecule of the tetragonal crystals (PDB code 1T7D, space group P4₃2₁2). Panels A and B show molecules A and B, respectively, mapped with the crystal-packing contact patches. Molecules A and B are shown as black ribbons and the contact residues are shown as spheres. The contact residues that are H-bonded (< 3.3 Å) to the other molecule in the asymmetric unit (AU) are in red, and those H-bonded to surrounding symmetry-related molecules are in blue (see Table 5.2). The contact residues that make van der Waals interactions (< 5.0 Å) with the other molecule in the AU are in light grey (see Table 5.3). The contact residues are labelled alongside the contact patches using the same colour code. The catalytic residues Ser90 and Lys145 are shown as sticks, coloured red and green, respectively, as a positional guide.

At first glance, it is remarkable that the overall crystal-packing contacts share up to 80% identity between two morphologically distinct crystals, the orthorhombic crystal (penem-enzyme, space group P2₁2₁2, PDB code: 1B12) and the tetragonal crystal (apo-enzyme, space group P4₁2₁2, PDB code: 1KN9) (Table 5.2, highlighted in bold italic). However, after carefully comparing the two crystal growth conditions (1.0 M NH4H2PO4, 0.1 M Na citrate pH 5.6, 5% tert-amyl alcohol for penem-enzyme and 0.7 M NH4H2PO4, 0.1 M Na citrate pH 5.6, 5% 2-methylpentane-2,4-diol for apo-enzyme), this is perhaps not surprising, since the two conditions are very similar apart from a subtle difference in additives: 5% tert-amyl alcohol versus 5% 2-methylpentane-2,4-diol.

Based on these observations, we hypothesize that, for the same protein molecule, similar crystallization conditions generate similar crystal-packing contacts. The probable reason is that almost any area on the surface of a protein molecule, except the concave active site and binding site, is capable of forming intermolecular crystal-packing contacts, but the specific contacts depend on the crystallization conditions, such as precipitant salts, PEG, pH, additives, detergent, and temperature. Changing the composition of the crystallization reagents could provide an appropriate pattern of contact patches on each molecule, so that the molecules acquire sufficient affinity to interact with one another and form a crystal. This is the basis of the standard procedure of screening for crystallization hits with a multitude of commercially prepared solutions. On the other hand, a subtle difference in additives was significant enough to alter several contacts (see Table 5.2, in plain regular font), which produced polymorphism with distinct unit cells (orthorhombic, space group P2₁2₁2, versus tetragonal, space group P4₁2₁2). It is also worth noting that the up to 80% identity in crystal-packing contacts between the inhibitor-bound (penem) and inhibitor-free (apo-enzyme) crystals implies that penem binding did not substantially influence the overall packing of E. coli SPase Δ2-75 molecules.

Interestingly, the crystal-packing contact patterns (Tables 5.2 and 5.3) of the lipopeptide-enzyme crystal are very different from those of the penem-enzyme and apo-enzyme crystals. Far fewer crystal-packing contacts are involved in forming this crystal. The crystallization condition was 15% w/v PEG 4000, 20% propanol, 0.1 M Na citrate pH 6.0, and 0.5% Triton X-100. This is a low ionic strength condition, whereas the other two crystals were grown at high ionic strength in the presence of phosphate salt (0.7-1.0 M NH4H2PO4). The low ionic strength of the growth condition may be responsible for the difference in crystal-packing contacts.

The analysis of crystal-packing contacts revealed that 7/12 (58%) of the Lys residues and 7/13 (54%) of the Glu residues in the sequence of the SPase Δ2-75 molecule (i) are exposed on the protein surface (Table 5.4, Figure 5.8, and Figure 5.9) and (ii) participate in protein crystal-packing contacts (Figures 5.8 and 5.9; Tables 5.2 and 5.3; Figures 5.5, 5.6, and 5.7). Therefore, for each E. coli SPase Δ2-75 molecule, a side chain conformational entropy cost of at least 14 kcal/mol (2 kcal/mol x 7 for the lysines) plus about 12 kcal/mol (1.75 kcal/mol x 7 for the glutamic acids) must be paid to constrain these side chains during crystal packing, according to the values calculated by other research groups [160, 161]. Thus, these Lys and Glu residues (E104, K105, K126, K140, K162, E163, E177, E188, E203, K213, E215, K217, E218, and K227) are potential mutation sites for improving crystal quality (Figures 5.8 and 5.9). Single mutants of Lys to Ala or Glu to Ala could be tested to investigate the surface-entropy reduction approach, and single mutants of Lys or Glu to Arg or Gln could be chosen to generate favoured crystal contacts [169, 170].

Table 5.4: The solvent accessible area of lysine, glutamic acid, arginine, and glutamine residues of the SPase Δ2-75 molecule

Residue type   Total residues   Solvent accessible area of specific residues
Lys            12               62.60, 115.7, 133.3, 150.6, 69.50, 185.20, 5.90, 50.70, 18.10 (the salt bridge K162-N219), 127.00, 78.00, K227 108.50
Glu            13               E82 62.50, E104 32.30, E121 81.20, E163 58.40, E177 179.00, E188 109.50, E203 119.20, E210 129.20, E215 95.90, E218 160.10, E225 44.00, E228 0.00, E289 91.80
Arg            13               R77 196.80, R127 46.00, R146 9.30, R198 107.30, R199 170.80, R222 46.30, R226 37.90, R236 46.90, R275 1.20, R282 0.10, R295 79.40, R315 46.80, R318 -
Gln            12               Q85, Q116, Q167, Q174, Q194, Q244, Q246, Q252, Q253, Q256, Q257, Q267

The residues that are involved in H-bonding (< 3 Å) and VDW interactions (< 5 Å) with other molecules in the AU and symmetry-related molecules are in bold.

Figure 5.8: A schematic representation of the potential surface mutation sites on the primary structure of E. coli SPase Δ2-75. The surface lysine and glutamic acid residues that participate in crystal-packing contacts are labelled (based on the crystal contact analysis, see Tables 5.2 and 5.3). The primary structure of SPase Δ2-75 is shown as a bar. The three Lys and Glu (EK) clusters are indicated in bold italic.
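The entropy estimate for the Lys and Glu contact residues discussed in this section can be reproduced with a few lines of arithmetic. The per-residue costs are the upper values quoted in the text (2 kcal/mol for Lys and 1.75 kcal/mol for Glu); the residue list reads the repeated K126 in the original list as K162, which is an assumption based on the KE cluster at residues 162 and 163, and the function name is illustrative.

```python
LYS_COST_KCAL = 2.0   # kcal/mol per Lys side chain, value quoted in the text
GLU_COST_KCAL = 1.75  # kcal/mol per Glu side chain, upper value quoted in the text

# Lys and Glu residues reported to take part in crystal-packing contacts
# (K162 is assumed where the original list repeats K126).
CONTACT_RESIDUES = [
    "E104", "K105", "K126", "K140", "K162", "E163", "E177",
    "E188", "E203", "K213", "E215", "K217", "E218", "K227",
]


def entropy_cost(residues):
    """Return (n_lys, n_glu, total cost in kcal/mol) for one-letter residue labels."""
    n_lys = sum(1 for label in residues if label.startswith("K"))
    n_glu = sum(1 for label in residues if label.startswith("E"))
    total = n_lys * LYS_COST_KCAL + n_glu * GLU_COST_KCAL
    return n_lys, n_glu, total


if __name__ == "__main__":
    n_lys, n_glu, total = entropy_cost(CONTACT_RESIDUES)
    print(f"{n_lys} Lys + {n_glu} Glu: about {total:.2f} kcal/mol per molecule")
```

With seven residues of each type, the lysines contribute 14 kcal/mol and the glutamic acids 12.25 kcal/mol, in line with the "14 plus 12" estimate given in the text.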

Proteins frequently contain surface sites where Lys and Glu residues cluster together, which provides alternative strategies of double, triple, or multiple mutations [159]. In our case, three surface-exposed Lys/Glu clusters are found: EK (residues 104 and 105), KE (residues 162 and 163), and EKE (residues 215, 217, and 218) (Figures 5.8 and 5.9). Among the three Lys/Glu clusters, EKE (residues 215, 217, and 218) is the patch that most commonly participates in crystal-packing contacts in both the penem-enzyme and apo-enzyme crystals (Table 5.2, Figure 5.5, and Figure 5.6). These three residues in particular could be considered as sites for double or triple mutations in crystal engineering.

Figure 5.9: The surface Lys and Glu residues of the SPase Δ2-75 molecule. The SPase Δ2-75 molecule is represented as a grey mesh. The surface Lys and Glu residues that are involved in H-bonding (< 3 Å) and VDW interactions (< 5 Å) with other molecules in the AU and the symmetry-related molecules are shown as spheres coloured red (Glu) and green (Lys). A. Front view. B. Back view. C. Top view from domain II.

It is also worth noting that common VDW contact features (highlighted in bold, Table 5.3) exist between the penem-enzyme and apo-enzyme crystals (PDB codes: 1B12 and 1KN9, respectively). First, the large hydrophobic surface (see also Figures 1.5 and 1.6) is involved in crystal packing via Tyr81, Leu102, Thr297, Ala298, Trp300, Leu316, Ser317, and Ile319 (Figures 5.5 and 5.6). Within this large hydrophobic surface, the enzymatically important residue Trp300 seems to be the unique residue participating in crystal contacts (Table 5.3). Other research groups have shown that an exposed Trp is a favoured residue for interaction and that mutating other surface residues to Trp has a high potential for forming crystal contacts [154, 171]. Interestingly, replacing Trp with Ala to weaken a dominant crystal contact also yields a better crystal in some cases [172].

Domain II is extensively involved in crystal-packing contacts through both hydrogen-bonding and VDW contacts, but with distinct contact patterns. The contact patches cover more residues in the tetragonal apo-enzyme crystal (space group P4₁2₁2, PDB code: 1KN9) than in the other tetragonal crystal (lipopeptide-enzyme, space group P4₃2₁2, PDB code: 1T7D) or the orthorhombic crystal (penem-enzyme, space group P2₁2₁2, PDB code: 1B12). Domain II contains many more charged and polar residues, whereas domain I is occupied by a number of hydrophobic residues (i.e. a large hydrophobic surface), which may account for this phenomenon.

Arg and Gln quite often participate in crystal-packing contacts (Tables 5.2, 5.3, and 5.4), as expected for the two residues most favoured in crystal packing [169, 170]. Arg198 and Arg199 were involved in contacts multiple times in the SPase Δ2-75 crystal packing. Six other Arg residues in the sequence of SPase Δ2-75 do not make contacts, since most of them are not fully exposed on the protein surface (Table 5.4). Up to 75% (9/12) of the Gln residues in the sequence took part in crystal-packing contacts. Our evidence once again supports the general hypothesis that Arg and Gln are the two most favoured residues in crystal contacts [169, 170]. In addition, the "self-contact" Leu102-Leu316 was observed; this is a structural motif found in specific interactions such as those in leucine zippers and leucine-rich repeats. This kind of interaction may support a high-affinity crystal-packing contact.

5.3.4 Variability in Loop Structure of SPase Δ2-75

Four loops (residues 107-124 (β-hairpin), 170-178, 197-204, and 305-313; Figure 5.9) in the SPase Δ2-75 molecule are variably present in the experimental electron density. Where these loops could be modelled into the electron density, they have high B-factors, suggesting that they are very dynamic in nature. The B-factor, also called the temperature factor, is a numerical indicator of the thermal motion and disorder of an atom [149]. It can also point out where there are errors in model building. B-factors are largely determined by local crystal packing: the lower the B-factor, the more ordered the packed region, and vice versa.
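To give the B-factor a physical scale, the sketch below converts an isotropic B-factor into a root-mean-square atomic displacement using the standard relation B = 8π²⟨u²⟩. This relation is general crystallographic background rather than something derived in this chapter, and the function name and example values are illustrative.

```python
import math


def rms_displacement(b_factor):
    """Return the RMS displacement (Å) implied by an isotropic B-factor (Å^2).

    Uses B = 8 * pi^2 * <u^2>, so sqrt(<u^2>) = sqrt(B / (8 * pi^2)).
    """
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


if __name__ == "__main__":
    # Illustrative values only: a well-ordered core atom versus a mobile loop atom.
    for b in (20.0, 80.0):
        print(f"B = {b:5.1f} Å^2 -> RMS displacement = {rms_displacement(b):.2f} Å")
```

Because the displacement scales with the square root of B, a four-fold higher B-factor corresponds to a two-fold larger RMS displacement.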

Experimentally, these four dynamic loops were either present or absent depending on the unit cell (Table 5.1) of the different crystals. This implies that the loops were highly disordered and sometimes too flexible to be seen. They could serve as potential targets for crystal engineering: deleting them might make the SPase Δ2-75 molecule more compact and better ordered in the crystal lattice.

Figure 5.10: Representations of the four dynamic loops of the SPase Δ2-75 molecule. A. A ribbon representation of the four dynamic loops of the SPase Δ2-75 molecule; domain I and domain II are shown in blue and grey, respectively. The loop 107-124 (β-hairpin) is in magenta, the loop 170-178 is in yellow, the loop 197-204 is in green, and the putative loop 305-313 of this molecule is in red. The loop 305-313 is experimentally missing in this molecule. The coordinates of the SPase Δ2-75 molecule were taken from the PDB (accession code 1B12, molecule A), and the coordinates of the putative loop 305-313 were taken from 1B12, molecule C, by superimposition. B. B-factor representation of the four dynamic loops of SPase Δ2-75; the ten molecules of SPase Δ2-75 in the three different crystals are superimposed. Some of these molecules are missing certain loops (see Table 5.1), and the loops that are present have different conformations. The colour code from blue to red corresponds to increasing B-factor, and the thickness of the line also increases with the B-factor. Therefore, a warmer colour and a thicker line indicate a higher B-factor and hence a more dynamic loop.

5.3.5 Comparison of Dynamic Regions of SPase Δ2-75 Molecules in Different Crystals

In the orthorhombic crystal of penem-enzyme (PDB code: 1B12), the four molecules in the asymmetric unit were superimposed (Figure 5.11 D for an overall view). The four molecules are almost identical, as reflected in a main-chain backbone root mean square deviation (RMSD) of 0.64 Å. Most of the differences occur in the four loop regions, namely residues 194-222, residues 168-180, residues 303-316, and residues 105-124 (β-hairpin), and in the β-strand 83-86 together with the C-terminal region.
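For reference, the RMSD quoted above can be sketched as follows. This is a minimal illustration that assumes the two coordinate lists are already superimposed and contain the same atoms in the same order; it does not perform the least-squares fit itself, and it is not the software used for the thesis. The function name and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched, pre-superimposed atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two invented backbone fragments displaced by 1 Å along z.
fragment_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
fragment_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
shift = rmsd(fragment_1, fragment_2)
```

Because the sum runs over squared distances, a few strongly displaced loop atoms can dominate the value even when the core superimposes closely.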

Region 194-222 includes the loop 197-204 (Figure 5.11 A). Only molecule A (green) has the full loop; the other three molecules have only part of the loop 197-204. Outside the loop 197-204, the rest of this region shows different conformations. In the region 168-180 (Figure 5.11 B), molecule A (green) and molecule C (magenta) have full loops with different shapes. Molecule B (blue) has part of this loop and molecule D (yellow) has no loop. The H-bond from Ser172 OH to Q252 N seems to help stabilize this loop, as it is found in molecules A and C but not in molecules B and D (Table 5.2, Figure 5.5). In region 303-316 (Figure 5.11 C), only molecule C has the loop; it is not seen in the other three molecules.
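The presence or absence of a loop in a given molecule can be tabulated from the residue numbers that were actually modelled. The sketch below is one hypothetical way to list unmodelled segments of a chain; the function name is an assumption, and the example numbering is constructed for illustration rather than read from the deposited coordinates.

```python
def missing_segments(observed, first, last):
    """Return (start, end) residue ranges within [first, last] absent from `observed`."""
    present = set(observed)
    gaps = []
    start = None
    for res in range(first, last + 1):
        if res not in present:
            if start is None:
                start = res          # a new gap opens here
        elif start is not None:
            gaps.append((start, res - 1))
            start = None
    if start is not None:            # gap running to the end of the range
        gaps.append((start, last))
    return gaps


# Illustrative chain spanning residues 76-323 with one unmodelled loop.
modelled = [r for r in range(76, 324) if not 305 <= r <= 313]
gaps = missing_segments(modelled, 76, 323)
```

Running this over every chain of each crystal form would give a presence/absence summary of the dynamic loops for each molecule.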

Interestingly, in the region of β-strand 81-86 and the C-terminus 320-323 (Figure 5.11 E), both the N-terminus and the C-terminus take two paths. The β-strand 81-86 of molecules B (blue) and C (magenta) forms a bulge, whereas that of molecules A (green) and D (yellow) takes a regular β-strand conformation. For the N-terminus, a bulge implies a less ordered conformation, whereas a regular β-strand implies a relatively well-ordered conformation. Additionally, the C-termini run in different directions: those of molecules A and D go one way and those of molecules B and C go the other.

Lastly, in the region of the β-hairpin (105-124), molecules A and D have the hairpin loop but molecules B and C do not. The middle region of the β-hairpin loop in molecule D is flatter than that in molecule A owing to H-bonding contacts formed from Y115 OH to G266 N and G266 O.

[Figure 5.11 panel labels: A, region 194-222; E, β-strand 81-86 and C-terminus 320-323; F, region 105-124 (β-hairpin)]

Figure 5.11: Structural superposition of the four molecules in the asymmetric unit of the orthorhombic, covalent β-lactam-containing crystal structure (penem-enzyme, space group P2₁2₁2, PDB code: 1B12). D. An overall view of the superposition of the four molecules in the AU. Molecules A (green), B (blue), C (magenta), and D (yellow) are shown as lines. The grey region is the overlapping ordered region. A, B, C, E, and F are close-up views of the flexible regions. The colour code corresponds to the molecules.

Why do the N- and C-termini take two paths, and why is the β-hairpin loop dynamic? We hypothesize that the large hydrophobic area (Figure 1.5, Figure 5.10) involved in the VDW crystal-packing contacts is responsible for the two paths of the N- and C-termini and for the presence or absence of the β-hairpin.

When the large hydrophobic area is involved in the VDW crystal-packing contacts, as in molecules A and D (Figure 5.11), the β-hairpin seems to be stabilized and is visible. The N-terminus takes a regular β-strand conformation and the C-terminus points away from the β-hairpin loop. In contrast, when the large hydrophobic area is not involved in the VDW crystal-packing contacts, as in molecules B and C (Figure 5.5), the β-hairpin is not visible. In this case, the N-terminus adopts a bulge conformation and the C-terminus tilts toward the hairpin loop. The bulge conformation of the N-terminus reflects a less ordered, more flexible region that has no interaction to support it.

Similarly, in the tetragonal crystal of apo-enzyme (PDB code: 1KN9), the four molecules in the asymmetric unit were superimposed (Figure 5.12 C). These molecules are almost identical, as reflected in a main-chain backbone root mean square deviation (RMSD) of 0.78 Å. Five dynamic regions contribute to the deviation: residues 191-222, residues 168-180, residues 104-125 (β-hairpin), the β-strand 83-86, and the C-terminal region 320-323. Note that none of the molecules in this crystal displays the loop 305-313, which implies that this loop is highly dynamic in this crystal-packing environment. Region 194-222 includes the loop 197-204 (Figure 5.12 A). Three molecules, A (green), B (blue), and C (magenta), have the full loop.

The conformations of these loops vary widely. Molecule D (yellow) has only part of the loop 197-204. As in the penem-enzyme crystal, the rest of this region, outside the loop 197-204, shows different conformations. In the region 168-180 (Figure 5.12 B), only molecule B has the full loop. The other three molecules, A, C, and D, have only part of this loop.

As in the penem-enzyme crystal, there are two conformations for the region of β-strand 83-86 and the C-terminus 320-323 (Figure 5.12 D). The β-strand 83-86 of molecules B and D forms a bulge, whereas that of molecules A and C takes a regular β-strand conformation. In the region of the β-hairpin (105-124), molecules A and C have the hairpin loop but molecules B and D do not. The middle region of the hairpin loop shows an expanded, flat shape due to H-bonding from two residues, Q116 O and P151 O (Figure 5.6 and Table 5.2).

Again, the large hydrophobic area (Figures 1.5, 1.6, and 5.6, and Tables 5.2 and 5.3) involved in the VDW crystal-packing contacts is responsible for the two paths of the N- and C-termini and for the presence or absence of the β-hairpin. When the large hydrophobic area is involved in the VDW crystal-packing contacts, as in molecules A and C (Figure 5.12), the β-hairpin seems to be stabilized and is visible, the N-terminus takes a regular β-strand conformation, and the C-terminus points away from the β-hairpin loop. In contrast, when the large hydrophobic area is not involved in the VDW crystal-packing contacts, as in molecules B and D (Figure 5.6), the β-hairpin is not visible, the N-terminus takes a bulge conformation, and the C-terminus tilts toward the hairpin loop.

Figure 5.12: Structural superposition of the four molecules in the asymmetric unit of the tetragonal apo-crystal (apo-enzyme, space group P4₁2₁2, PDB code: 1KN9). C. An overall view of the superposition of the four molecules in the AU of 1KN9. Molecules A (green), B (blue), C (magenta), and D (yellow) are shown as lines. The grey region is the overlapping ordered region. A, B, D, and E are close-up views of the flexible regions. The colour code corresponds to the molecules.

In the tetragonal crystal of lipopeptide-enzyme (PDB code: 1T7D), the two molecules in the AU were superimposed (Figure 5.13 A). These molecules are almost identical, as reflected in a main-chain backbone root mean square deviation (RMSD) of 0.65 Å. Still, three dynamic regions contribute to the differences: residues 168-180, residues 191-222, and residues 302-315.

Importantly, because no large hydrophobic area (Figures 1.5, 1.6, and 5.7) is involved in the VDW crystal-packing contacts, the β-hairpin loop is not visible in either molecule A or molecule B. This is consistent with our proposal. However, why does the N-terminus still take a regular β-strand conformation without the large hydrophobic area being involved in VDW crystal-packing contacts to support it? It is possible that the lipopeptide inhibitor (arylomycin A2), which has a long fatty acid tail, interacts with the N-terminus at residues 81-85. Specifically, the H atom at N7 of arylomycin A2 forms an H-bond with P83 O (3.5 Å), thereby supporting the N-terminus in its regular β-strand conformation (Figure 5.14). In fact, in the glycolipopeptide-enzyme crystal structure solved in Chapter 3 of this thesis, the fatty acid tail of the glycolipopeptide interacts with the N-terminus at residues 80-85 in a parallel β-strand manner.

Figure 5.13: Structural superposition of the two molecules in the asymmetric unit of the tetragonal lipopeptide-containing crystal (lipopeptide-enzyme, space group P4₃2₁2, PDB code: 1T7D). A. An overall view of the superposition of the two molecules per AU of 1T7D. Molecules A (green) and B (blue) are shown as lines. The grey region is the overlapping ordered region. B, C, and D are close-up views of the flexible regions. The colour code corresponds to the molecules.

In the region 191-222 (Figure 5.13 B), both molecule A (green) and molecule B (blue) display only part of the loop. In the region 302-315 (Figure 5.13 C), only molecule B, and not molecule A, has the full loop. In the region 168-180 (Figure 5.13 D), again molecule B, and not molecule A, has the full loop.

Figure 5.14: The active site of SPase Δ2-75 in complex with arylomycin A2. The active site of SPase Δ2-75 is shown as a surface, and the important residues involved in catalysis and binding are shown as sticks. Both the N-terminus (residues 81-85) and the region of residues 142-145, which take a β-strand conformation, are shown. The inhibitor arylomycin A2 is shown as sticks coloured by element (yellow for carbon, red for oxygen, blue for nitrogen). O45 of arylomycin A2 is hydrogen bonded to residues S90, S88, and K145. O44 is hydrogen bonded to residue I144. N33 is hydrogen bonded to D142. Importantly, N7 is hydrogen bonded to P83, which is the key interaction supporting the regular β-strand conformation of the N-terminus of SPase Δ2-75 in this particular crystal-packing environment.

Figure 5.15 shows the superposition of the ten molecules in the AUs of penem-enzyme (PDB code: 1B12, space group P2₁2₁2), apo-enzyme (PDB code: 1KN9, space group P4₁2₁2), and lipopeptide-enzyme (PDB code: 1T7D, space group P4₃2₁2), together with a close-up view of each dynamic region. Interestingly, the loop 302-315 (Figure 5.15 D) has two conformations that differ in direction (an "up" conformation in lipopeptide-enzyme and a "down" conformation in penem-enzyme, relative to domain II), possibly due to different crystal-packing contacts; however, the details remain unclear.

Figure 5.15: Structural superposition of the ten molecules in the asymmetric units of three distinct crystal forms (penem-enzyme, space group P2₁2₁2, PDB code: 1B12; apo-enzyme, space group P4₁2₁2, PDB code: 1KN9; lipopeptide-enzyme, space group P4₃2₁2, PDB code: 1T7D). C. An overall view of the superposition of the ten molecules in the AUs of the three crystal forms. The molecules are represented as lines. Molecules A (green), B (blue), C (magenta), and D (yellow) are from the penem-enzyme crystal. Molecules A (cyan), B (purple), C (red), and D (orange) are from the apo-enzyme crystal. Molecules A (forest) and B (deep blue) are from the lipopeptide-enzyme crystal. The grey region is the overlapping ordered region. A, B, D, E, and F are close-up views of the flexible regions; the colour code corresponds to the molecules.

5.3.6 The Dynamic Side Chains of the Active Site and Binding Site

Comparing the active and binding sites in the superposition of the ten molecules from the AUs of the three crystal forms (penem-enzyme, space group P2₁2₁2, PDB code: 1B12; apo-enzyme, space group P4₁2₁2, PDB code: 1KN9; lipopeptide-enzyme, space group P4₃2₁2, PDB code: 1T7D) reveals that the active and binding sites are nearly identical, as reflected by the RMSD of the Cα atoms (0.52) and of the backbone (0.55). The RMSD arises mainly from the bulge formed in the β-strand 81-86 of four molecules (two from the penem-enzyme crystal and two from the apo-enzyme crystal). Similarly, there is little change in the active site formed by the catalytic residues Ser90 and Lys145, except that the two molecules from the lipopeptide-enzyme crystal show a very different χ4 value for the -NH2 group of Lys145 (Table 5.4). The -NH2 group of Lys145 points in a different direction in the molecules of the lipopeptide-enzyme crystal (Figure 5.16 C and 5.16 D).

However, there is a significant difference in the substrate-binding sites. The S1 binding pocket is made of non-polar atoms from residues I86, P87, S88, S90, M91, Y143, I144, and K145. Side-chain rotation of these residues varies the size and shape of the S1 pocket. The largest side-chain rotations among the 10 molecules are listed in Table 5.5. The side chain χ2 of I86 rotates by up to 236°. The side chain χ1 of S88 rotates by up to 143°. The side chain χ1 and χ2 of Y143 rotate by up to 28° and 37°, respectively. The side chain χ1 and χ2 of I144 rotate by up to 357° and 107°, respectively. The S3 binding pocket is made of non-polar atoms from residues F84, I86, I101, V132, D142, Y143, and I144. Similarly, side-chain rotation of these residues changes the size and shape of the S3 pocket. The side chain χ1 of I101 rotates only slightly, by 10°, but its χ2 rotates by up to 348°. The side chain χ1 of V132 rotates by up to 355°. In particular, F84 contributes through both the main-chain bulge conformation and its own side chain: the χ1 and χ2 of F84 rotate by up to 212° and 164°, respectively.
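The χ angles discussed above are torsion angles defined by four consecutive atoms; χ1, for example, is conventionally the N-Cα-Cβ-Xγ torsion. The following is a minimal sketch of a torsion-angle calculation from four points, using the atan2 form of the dihedral formula. The function names and test coordinates are hypothetical, and the snippet is not the program used to generate Table 5.5.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p1, p2, p3, p4):
    """Signed torsion angle p1-p2-p3-p4 in degrees, in the range (-180, 180]."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)              # normal of the plane p1-p2-p3
    n2 = _cross(b2, b3)              # normal of the plane p2-p3-p4
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Idealized geometries: gauche-like (+90°), trans (180°), and cis (0°).
angle_90 = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
angle_trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
angle_cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
```

Applied to the N, Cα, Cβ, and γ-atom coordinates of a residue in each superimposed molecule, a function of this kind would yield per-molecule χ1 values that can then be compared across crystal forms.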

Table 5.5: Selected χ values of the side chains of the residues that construct the active and binding sites

Residue   χ     1B12 (A, B, C, D)        1KN9 (A, B, C, D)     1T7D (A, B)   Largest rotation (°)
K145      χ4    -164, 178, -169, -164    179, 179, 177, 179    -74, -65      348
I144
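As a worked check of the arithmetic in the K145 row of Table 5.5, the sketch below reproduces the listed largest rotation as the spread between the maximum and minimum tabulated angles. The second function is an assumption added for comparison only: it measures the largest pairwise difference along the shorter arc of the circle, which is an alternative, periodicity-aware measure and not the one used in the thesis.

```python
# χ values (degrees) transcribed from the K145 row of Table 5.5.
chi_k145 = {
    "1B12": [-164, 178, -169, -164],
    "1KN9": [179, 179, 177, 179],
    "1T7D": [-74, -65],
}
all_values = [v for values in chi_k145.values() for v in values]


def raw_spread(angles):
    """Largest minus smallest tabulated angle, as listed in the table."""
    return max(angles) - min(angles)


def largest_circular_difference(angles):
    """Largest pairwise difference measured along the shorter arc (0-180°)."""
    largest = 0
    for i, a in enumerate(angles):
        for b in angles[i + 1:]:
            diff = abs(a - b) % 360
            largest = max(largest, min(diff, 360 - diff))
    return largest


spread = raw_spread(all_values)
circular = largest_circular_difference(all_values)
```

The raw spread reproduces the tabulated 348°, while the shorter-arc measure evaluates to 118° for the same ten values; which measure is more appropriate depends on how the rotation is to be interpreted.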

Figure 5.16: A close-up view of the active site and binding sites in the superposition of all chains of SPase I. A. The active and binding sites in the superposition of the 4 molecules in the AU of the penem-enzyme crystal form (space group P2₁2₁2, PDB code: 1B12). The molecules are represented as lines. Molecules A (green), B (blue), C (magenta), and D (yellow) are highlighted. B. The active and binding sites in the superposition of the 4 molecules in the AU of the apo-enzyme crystal form (space group P4₁2₁2, PDB code: 1KN9). Molecules A (cyan), B (purple), C (red), and D (orange) are highlighted. C. The active and binding sites in the superposition of the 2 molecules in the AU of the lipopeptide-enzyme crystal form (space group P4₃2₁2, PDB code: 1T7D). Molecules A (forest) and B (deep blue) are highlighted. D. The active and binding sites in the superposition of the 10 molecules in the AUs of the three crystal forms. Molecules are coloured as in the individual crystal forms. The grey region is the overlapping ordered region.

F84 in four molecules (two from the penem-enzyme crystal and two from the apo-enzyme crystal) shapes a relatively wide S3 pocket, whereas F84 in the remaining molecules, including the two molecules from the lipopeptide-enzyme crystal, forms a narrow S3 pocket.

If we define the active site of the apo-enzyme as a relaxed state, it is interesting to compare the active and binding sites of the penem-bound enzyme with those of the lipopeptide-bound enzyme. From the penem-bound state to the lipopeptide-bound state, the side chains of residues K145, I144, L95, I86, I101, V132, and F84 rotate substantially due to the difference in binding between the two complexes. The χ4 of the K145 ε-amino group rotates by 348°, from H-bonding with residue G272 in the penem-bound state to H-bonding with O45 of the lipopeptide arylomycin A2. The χ1 of S88 rotates by 143°, from H-bonding with O8 of the penem to H-bonding with O45 of arylomycin A2 (Figure 5.14). The side chain of I144 makes no contact with the inhibitor in the penem-enzyme complex, but its χ1 rotates by 243° in the lipopeptide arylomycin A2-enzyme complex because of the H-bond between its main-chain -NH and O44 of arylomycin A2.

5.3.7 Future Directions of Crystal Engineering

In summary, we have carried out a comprehensive analysis of the crystal-packing contacts and structural variation of all E. coli SPase molecules found in various crystallization environments. These environments comprise three crystal forms: penem-enzyme (orthorhombic, space group P2₁2₁2, PDB code: 1B12), apo-enzyme (tetragonal, space group P4₁2₁2, PDB code: 1KN9), and lipopeptide-enzyme (tetragonal, space group P4₃2₁2, PDB code: 1T7D). Based on the detailed interface analysis of the hydrogen bonding and VDW interactions involved in crystal-packing contacts, we propose four major approaches to the crystal engineering of E. coli SPase I. First, single, double, or triple mutations of surface Lys to Ala, Glu to Ala, or Lys and Glu to Arg or Gln may provide new crystal forms with superior diffraction quality. Surface mutagenesis could be applied to E104, K105, K126, K140, K126, E163, E177, E188, E203, K213, E215, K217, E218, and K227. Alternatively, to vary the contact made by the large hydrophobic surface (Figures 1.5, 5.5, and 5.6), Trp300 could be substituted with Ala. In addition, removing one, two, three, or all four of the dynamic loops (residues 107-124 (β-hairpin), 170-178, 197-204, and 305-313; Figure 5.10) may help to generate highly ordered crystals, since their intrinsic flexibility may interfere with formation of the crystal-packing lattice.
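To give a sense of the size of the loop-deletion design space, the single, double, triple, and tetra removals of the four dynamic loops can be enumerated directly. This is a minimal sketch; the loop labels are shorthand chosen here for illustration.

```python
from itertools import combinations

# Shorthand labels for the four dynamic loops shown in Figure 5.10.
LOOPS = ["beta-hairpin", "loop 170-178", "loop 197-204", "loop 305-313"]


def deletion_sets(loops):
    """All non-empty combinations of loops, ordered from single to full removal."""
    return [combo
            for size in range(1, len(loops) + 1)
            for combo in combinations(loops, size)]


constructs = deletion_sets(LOOPS)
counts_by_size = {size: sum(1 for c in constructs if len(c) == size)
                  for size in range(1, len(LOOPS) + 1)}
```

The enumeration gives 4 single, 6 double, 4 triple, and 1 tetra deletion, that is, 15 candidate constructs before any surface mutations or terminal truncations are combined with them.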

As an example, to delete the β-hairpin loop (residues 107-124), one has to design a short linker to replace the β-hairpin, since not every arbitrarily chosen linker would be compatible with the correct folding and activity of SPase Δ2-75. One clue comes from the fact that SPase and another Ser/Lys protease, UmuD', share the same protein fold in the catalytic core, based on the structural comparison of SPase Δ2-75 with UmuD' [I 91 (Figure 5.17). The structural linker Ser-Ala-Ile-Thr-Ala of UmuD', which corresponds to the β-hairpin loop of SPase Δ2-75 in the 3D structure, could be used as a reference for designing a linker to replace the β-hairpin loop in a deletion construct of E. coli SPase I.

Figure 5.17: Structural superposition of SPase Δ2-75 with another Ser/Lys protease, UmuD'. The molecule of SPase Δ2-75 (molecule A, PDB code: 1B12) is shown as a ribbon coloured blue and yellow. The molecule of UmuD' (molecule A, PDB code: 1UMU) is shown as a ribbon coloured red and green. The β-hairpin loop of SPase Δ2-75 is highlighted in yellow. The structural linker of UmuD' that corresponds to the β-hairpin loop of SPase Δ2-75 is highlighted in green, and the primary sequence of the UmuD' linker is indicated.

Finally, the flexible N- or C-terminal polypeptides of E. coli SPase could be truncated, for example by removing the first 4 residues (V76-F79, Figure 5.10 B) at the N-terminus and the last 2 residues (I322-H323, Figure 5.15 E) at the C-terminus. It is still very difficult to predict exactly how crystal engineering through surface residue mutation, loop deletion, termini truncation, and variation of the hydrophobic contact will affect the molecular structure of the protein and, in turn, the enzyme activity. The activity of each mutant enzyme should therefore be tested before crystallization trials are set up.

CHAPTER 6: CONCLUSION

Escherichia coli type I signal peptidase (SPase I) is a membrane-bound endopeptidase that cleaves the amino-terminal signal peptide from the majority of secreted proteins and some membrane proteins (Chapter 1, page 1). It is essential for bacterial cell viability. The active site of SPase I resides at the outer leaflet of the bacterial membrane and uses a Ser/Lys dyad catalytic mechanism, which differs significantly from the Ser/His mechanism used by the functionally homologous signal peptidase complex in the endoplasmic reticulum. SPase I is therefore a potential antibiotic target, suited to a structure-based drug design approach for the development of a novel class of antibiotics.

To obtain the diverse information critical for structure-based drug design, solving a single crystal structure of a target is essential but not sufficient. Ideally, a broad range of crystal structures of enzyme-inhibitor complexes should be obtained to characterize the flexibility of the target and inhibitor-induced conformational changes.

This thesis work contributed to the discovery of optimized co-crystallization conditions for up to nine novel SPase I/inhibitor complexes (Chapter 2, page 57). Preliminary X-ray crystallographic analysis (crystals diffracted to 2.6 Å) was completed for five of these complexes. In addition, the crystal structures of two new SPase I/inhibitor complexes were solved. This thesis reported the crystal structure of a soluble catalytic domain of E. coli SPase I (SPase Δ2-75) in complex with a glyco-lipohexapeptide-based natural product SPase I inhibitor (BAL4850C) (Chapter 3, page 85). The 2.4 Å resolution structure revealed that the inhibitor interacts with SPase Δ2-75 through a hydrogen-bond network from its COOH terminus to all catalytic residues, Ser88, Ser90, and Lys145, and from its hexapeptide backbone (N33, N28, O27, O15, N7, and O6) to main-chain and side-chain atoms of the protein residues that line the SPase binding site (Asp142, Gln85, Phe84, Pro83, and Glu82) in a parallel β-sheet type hydrogen-bonding manner. The methyl group (C30) of the inhibitor mimics the substrate Ala side chain and points into the S3 substrate-binding site of SPase I. The attached sugar group (deoxy-α-mannose) of the inhibitor, although clearly visible in the electron density (contoured at 1 sigma), is entirely exposed to the solvent and lies closest to the protein residue Pro87. The structure reveals electron density for the 17-carbon unsaturated fatty acid tail of the inhibitor, which makes van der Waals contacts with the N-terminal β-strand (residues 82-80) of SPase I. A structural comparison is made between this inhibitor complex and the previously solved structures of E. coli SPase I.

We have learnt the inhibition model of SPase I with the inhibitor BAL4850C: (1) the C-terminus mimics the transition state; (2) the hexapeptide competes for the substrate-binding site; (3) the fatty acid tail may increase the affinity of the inhibitor by packing against the cytoplasmic membrane association surface of SPase I, and helps to orient the inhibitor properly toward the binding site; (4) the sugar group may help with solubility and with the proper presentation of the inhibitor to the binding site.

This thesis work also reported the crystal structure of SPase Δ2-75 in a ternary complex with a lipohexapeptide-based natural product SPase I inhibitor (Arylomycin A2) and a sultam/morpholino derivative (BAL0019193) (Chapter 4, page 117). The investigation was inspired by the co-inhibition of Arylomycin A2 and BAL0019193. The 2.0 Å resolution structure revealed that these two inhibitors interact with SPase Δ2-75 through a hydrogen-bond network. For Arylomycin A2, the hydrogen-bond network formed by its COOH terminus and its hexapeptide backbone is similar to the interactions seen for BAL4850C. In addition, the structure reveals electron density (contoured at 1 sigma) for the iso 12-carbon saturated fatty acid tail of Arylomycin A2, which has a curved conformation and makes van der Waals contacts from C50-52, C53, C58, and C59 of Arylomycin A2 to the protein residues Trp300, Phe100, Gln85, and Pro83, respectively. Interestingly, we were able to determine the binding site for BAL0019193 from clear electron density; the compound sits like a lid capping the active site. BAL0019193 is bound through hydrogen bonds and van der Waals contacts from its atoms S1, O2, N6, N9, and O8 to the protein residues Ser88, Ser90, Lys145, Asn277, Ala279, and Glu307, as well as to atom O45 of Arylomycin A2. Additionally, the displacement of a conserved deacylating water by atom C5 of BAL0019193 and the capping of the active site, together with the ring core of Arylomycin A2, help to secure Arylomycin A2 inhibition. Again, a structural comparison is made between this SPase Δ2-75/inhibitor complex and the previously solved structures of E. coli SPase.

We have also learnt the co-inhibition model of SPase I with the inhibitors Arylomycin A2 and BAL0019193: (1) the inhibition mechanism of Arylomycin A2 is similar to that of BAL4850C; (2) BAL0019193 enhances the inhibition by Arylomycin A2 in three ways. First, it coordinates with the nucleophile and the oxyanion hole. Second, it displaces the deacylating water. Third, it helps to block the active site, which may slow the off-rate of the inhibitor/SPase complex. Therefore, working together with Arylomycin A2, BAL0019193 increases the inhibition by Arylomycin A2. This may explain why the inhibition increases 1,000-fold when both inhibitors are present, compared with Arylomycin A2 alone, during SPase catalysis.

To speed up the screening and evaluation of inhibitors (promising drug candidates), and thereby support structure-based drug design for the development of new antibiotics, crystals of enzyme-inhibitor complexes must be grown efficiently and their structures determined accurately. We noticed that, even though we had established crystallization conditions for SPase I and for two SPase I/inhibitor complexes (penem and Arylomycin A2), the previously established crystal growth conditions became useless whenever we switched to a new inhibitor, and a full search for initial crystallization conditions was required. To solve this problem, we plan to modify the protein molecule to make it more rigid and, in turn, easier to crystallize, thereby achieving improved diffraction. We therefore turned our attention to crystal engineering, a more controlled approach to protein crystallization that modifies the crystal-packing contacts by changing the physico-chemical properties of the protein molecule itself. Before applying crystal-engineering strategies such as rational surface mutagenesis, deletion of dynamic loops, and truncation of the N- and C-termini, the crystal-packing contacts and structural variations of SPase I have to be analyzed, which we have done in this thesis.

This thesis contributed a comprehensive analysis of the crystal-packing contacts and structural variations of all E. coli SPase molecules found in various crystallization environments (Chapter 5, page 148). From this analysis, precise sites for single, double, and triple mutations by surface mutagenesis, and for dynamic loop deletions, were identified and proposed. In the future, these proposals could be tested for their ability to improve the diffraction quality of SPase I/inhibitor crystals and to make the crystallization process easier and thus more compatible with the efficiency required by the structure-based drug design approach. Lastly, a new N-terminal deletion construct of SPase I (SPase Δ2-80) was cloned, expressed, and crystallized for the purpose of crystallization improvement (Appendix, page 199).

All of the efforts made in this thesis should contribute to our understanding of SPase inhibitor/substrate recognition and should prove helpful in the development of novel antibiotics based on the inhibition of SPase I.

APPENDIX: CRYSTAL ENGINEERING OF E. COLI SPASE Δ2-80 FOR CRYSTALLIZATION IMPROVEMENT

1. Introduction

Based on our experience with the crystallization and co-crystallization of SPase Δ2-75 with various inhibitors, we found that the traditional strategies for SPase Δ2-75 crystallization, including changing the precipitant salts, PEG, pH, additives (various alcohols, cations, anions, detergents, and sugars), and temperature, do work. However, previously well-established crystallization conditions failed when applied to each new SPase Δ2-75/inhibitor complex. For each new complex, we had to search for initial crystallization hits and then optimize them. Such a long crystallization process for each new inhibitor is not compatible with antibiotic discovery, which needs to be efficient. Believing that a crystal engineering approach might help to construct a more compact protein structure by trimming disordered regions and loops, or by varying surface residues, we decided to generate new constructs of SPase in order to increase the chance of obtaining crystals of better diffraction quality. For example, we hope to generate a new mutant enzyme of SPase I that will allow us to apply an inhibitor-soaking method easily. Soaking different inhibitors into preformed crystals is more compatible with the structure-based drug design approach because it offers speed, convenience, and reproducibility. In this study, we have taken our first step toward the selective crystal engineering of SPase Δ2-75. We have generated a new construct, called SPase Δ2-80, by truncating five amino acids (residues 76-80, Val-Arg-Ser-Phe-Ile) from the N-terminus of SPase Δ2-75. These five N-terminal amino acids are very dynamic, as reflected in their high B-factors and the weak electron density corresponding to residues 76-80 of SPase Δ2-75 (see Chapter 5, Figure 5.10 B).

These residues are not associated with the enzyme active site, nor do they have a role in inhibitor binding (Chapter 5). We hypothesize that these N-terminal residues might interfere with packing within the crystal lattice. We expect that better-diffracting crystals will be obtained from E. coli SPase Δ2-80, which eliminates the disordered residues 76-80 of SPase Δ2-75. Here, we report the cloning, high-level expression, purification, and crystallization of E. coli SPase Δ2-80.

2. Materials and Methods

2.1 Template, Primers and Gene Amplification

The gene of SPase Δ2-80 was subcloned by polymerase chain reaction (PCR) using purified plasmid DNA pET3d/SPase Δ2-75 as a template. Plasmid DNA pET3d/SPase Δ2-75 was purified from the expression host strain BL21(DE3) by the alkaline lysis method, following the instructions of the QIAprep Spin Miniprep Kit (250) (Qiagen Inc.). The sense primer was 5'-CATATGTATGAACCGTTCCAGATCCCG-3', containing an Nde I restriction-endonuclease site (CATATG). The anti-sense primer was 5'-GTCGACTTAATGGATGCCGCCAATGCG-3', containing a Sal I restriction-endonuclease site (GTCGAC). The gene-specific portion of each primer was 21 bp in length and contained around 50% GC pairs. The primers were synthesized by the University of Calgary. The primers were resuspended in 200 µl TE buffer, and their concentrations were determined by measuring the UV absorbance at a wavelength of 260 nm using a UV spectrometer (Agilent).
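The primer properties stated above can be checked with a few lines of code. The sketch below computes the length and GC fraction of the gene-specific portion of each primer and translates the encoded residues using the standard genetic code. The function names are hypothetical; the primer sequences and restriction sites are those given in the text.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG-ordered table.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

SENSE = "CATATGTATGAACCGTTCCAGATCCCG"      # Nde I site: CATATG
ANTISENSE = "GTCGACTTAATGGATGCCGCCAATGCG"  # Sal I site: GTCGAC
SITE_LENGTH = 6


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons of a coding-strand sequence; '*' marks a stop."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


sense_specific = SENSE[SITE_LENGTH:]
antisense_specific = ANTISENSE[SITE_LENGTH:]
n_terminal = translate(SENSE[3:])                           # reading starts at the ATG of CATATG
c_terminal = translate(reverse_complement(antisense_specific))
```

Under these definitions the sense primer reads M-Y-E-P-F-Q-I-P from the start codon embedded in the Nde I site, and the anti-sense primer encodes R-I-G-G-I-H followed by a stop codon, consistent with a construct that begins after residue 80 and ends at H323.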

The PCR reaction was performed using the Mastercycler Gradient (Eppendorf Science), and the target gene of SPase Δ2-80 was generated using a high-fidelity Vent DNA polymerase (New England Biolabs) for subcloning. Amplification was carried out in a 100 µl reaction mixture (details of the PCR reaction are given in Tables A.1 and A.2). The reaction was started at 94 °C (2 min), followed by 20 cycles of denaturation (0.5 min at 94 °C), annealing (1 min at 56.5 °C), and extension (2 min at 72 °C). After the 20th cycle, a 10-min extension at 72 °C was applied. The reaction product was analyzed on a 1% agarose gel. The gel was stained in ethidium bromide (EtBr) dye solution, and dye-bound DNA bands were visualized via UV illumination.

Table A.1: PCR Reaction Components

Component            Name                             Volume (µl)
Template             Plasmid DNA pET3d/SPase Δ2-75    0.5 (25 ng)
Sense primer         Δ2-80 Sense                      1.5 (50 pmol/µl)
Anti-sense primer    Δ2-80 Anti-Sense                 1.5 (50 pmol/µl)
dNTPs                dATP, dTTP, dCTP, dGTP           10 (8 mM; 2 mM for each)
ThermoPol buffer                                      10 (10×)
Polymerase           Vent DNA Polymerase              2
Sterile dH2O                                          74.5
Total volume                                          100

Table A.2: PCR Reaction Thermocycling Conditions Pre-activating 94•‹C 2 min Three-steps cycle: Denature 94•‹C 0.5 min Annealing 56.5"C 1 min Extension 72•‹C 2 min The Final Extension 72•‹C 10 min
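The reaction set-up in Tables A.1 and A.2 can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the stock concentrations listed in Table A.1 (primers at 50 pmol/µl, dNTPs at 2 mM each) and ignores ramping time between temperatures; the variable names are not from the original work.

```python
# Volumes in microlitres, taken from Table A.1.
components = {
    "template": 0.5,
    "sense primer": 1.5,
    "antisense primer": 1.5,
    "dNTP mix": 10.0,
    "10x ThermoPol buffer": 10.0,
    "Vent DNA polymerase": 2.0,
    "sterile dH2O": 74.5,
}
total_ul = sum(components.values())

# Final primer concentration: the stock is 50 pmol/ul, and pmol/ul equals micromolar.
primer_final_uM = components["sense primer"] * 50.0 / total_ul

# Final concentration of each dNTP: the stock is 2 mM for each nucleotide.
dntp_final_mM = components["dNTP mix"] * 2.0 / total_ul

# Programmed time from Table A.2 (ramping between temperatures is ignored).
cycles = 20
run_min = 2.0 + cycles * (0.5 + 1.0 + 2.0) + 10.0

print(f"total volume: {total_ul} ul")
print(f"each primer: {primer_final_uM} uM, each dNTP: {dntp_final_mM} mM")
print(f"programmed run time: {run_min} min")
```

With these inputs the volumes add up to the stated 100 µl, and the programmed time for 20 cycles is 82 min.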

2.2 Generating the Sticky Ends of the Insert DNA of SPase Δ2-80

2.2.1 Ligation of the PCR-Amplified Insert DNA of SPase Δ2-80 with the TOPO Vector

The TOPO TA Cloning® (Invitrogen) vector was used as a tool to generate sticky ends on the insert DNA at the designed restriction-endonuclease sites. The two primers were designed with restriction-endonuclease sites that are not suitable for direct digestion of the PCR product by restriction endonucleases, but that allow fast and efficient cloning with the commercial TOPO TA Cloning® kit. The insert DNA of SPase Δ2-80 was amplified with blunt ends at both termini by the Vent DNA polymerase PCR described above. To add 3'-A overhangs to each end of the PCR product, an extra 20-min incubation at 72 °C was needed, for which 1 µl rTaq DNA polymerase (GE Healthcare) was added to the unpurified PCR product right after the 20th cycle. Taq polymerase has a nontemplate-dependent terminal transferase activity and adds a single 3'-A overhang to both ends of the PCR product. In the TOPO TA Cloning® kit, the TOPO vector ([email protected]®) is provided linearized, with two compatible 3'-T overhangs covalently bound to DNA topoisomerase I. DNA topoisomerase I recognizes its compatible ends and functions as a ligase. The ligation reaction is complete within 5 min at room temperature (the ligation reaction was set up according to the Invitrogen instructions).

Two microliters of the ligated TOPO/SPase Δ2-80 plasmid were transformed into 100 µl One Shot® TOP10 E. coli competent cells (Invitrogen). Briefly, the mixture was placed on ice for 30 min. The cells were then heat-shocked at 42 °C for 45 s and placed on ice for 3 min. 250 µl LB medium was added to the transformed cells, which were incubated at 37 °C, 250 rpm for 1.5 hr. Positive white colonies carrying successfully ligated TOPO/SPase Δ2-80 plasmids were selected by plating the transformed cells onto an LB agar plate supplemented with 100 µl X-gal (20 mg/ml) and kanamycin (50 µg/ml).

2.2.2 Releasing the Target Gene with the Sticky Ends of Restriction-Endonuclease Sites from the Plasmid TOPO/SPase Δ2-80

The sticky Nde I and Sal I ends were generated by restriction digestion. Successfully ligated TOPO/SPase Δ2-80 plasmids were purified using the QIAprep Spin Miniprep Kit (Qiagen). The purified plasmid DNA was double digested by incubation with Nde I and Sal I (Invitrogen) in the corresponding buffer (details in Table A.3) at 37 °C for 2 hr. Release of the SPase Δ2-80 insert DNA with sticky ends at the restriction-endonuclease sites was confirmed by analysis on a 1% agarose gel.

Table A.3: Restriction Digestion

Component        Volume (µl)
Plasmid DNA      10
10x Buffer H     1.5
Nde I            1
Sal I            1
Sterile dH2O     1.5
Total Volume     15

2.3 Construction of Recombinant Expression Plasmids

The desired band of SPase Δ2-80 insert DNA with sticky ends at the restriction-endonuclease sites was excised and gel-purified according to the manual provided with the QIAquick Gel Extraction Kit (Qiagen), and ligated into the expression vector pET24a+, which had been similarly digested and treated with shrimp alkaline phosphatase (SAP) for 30 min at 37 °C. The band of the expression vector pET24a+ had also been excised and gel-purified. The ligation reaction (Table A.4) was set up using T4 DNA ligase (Invitrogen) and maintained at 16 °C for 16 hr.

Table A.4: DNA Ligation

Component              Volume (µl)
Vector                 X
Insert DNA             Y
10x Ligation Buffer    2
50 mM ATP              1
T4 DNA Ligase          1
Sterile dH2O           Z
Total Volume           20

X: the volume that provides 100 ng (0.03 pmol) of vector per reaction
Y: the volume that provides a molar ratio of insert DNA to vector of 3:1
Z: the volume that brings the reaction to a total of 20 µl
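The volumes X, Y and Z follow from the molar amounts given above. The sketch below is one way to work them out, not the procedure used in the original work: it assumes an average mass of 650 g/mol per base pair (a common approximation), a 735 bp insert, 2 µl of 10x buffer plus 1 µl each of ATP and ligase, and purely hypothetical stock concentrations for the vector and insert.

```python
AVG_BP_MASS = 650.0  # assumed average mass of one base pair in g/mol (not from the source)


def pmol_to_ng(pmol: float, length_bp: int) -> float:
    """Convert picomoles of double-stranded DNA to nanograms."""
    # pmol * (g/mol) gives picograms; divide by 1000 for nanograms.
    return pmol * length_bp * AVG_BP_MASS / 1000.0


def ligation_volumes(vector_ng, vector_conc, insert_ng, insert_conc,
                     fixed_ul=4.0, total_ul=20.0):
    """Return (X, Y, Z) in microlitres for a reaction laid out as in Table A.4.

    fixed_ul is buffer + ATP + ligase (assumed 2 + 1 + 1 ul).
    Concentrations are in ng/ul.
    """
    x = vector_ng / vector_conc
    y = insert_ng / insert_conc
    z = total_ul - fixed_ul - x - y
    if z < 0:
        raise ValueError("DNA stocks are too dilute for a 20 ul reaction")
    return x, y, z


vector_pmol = 0.03                         # from the text (100 ng of vector)
insert_pmol = 3 * vector_pmol              # 3:1 insert-to-vector molar ratio
insert_ng = pmol_to_ng(insert_pmol, 735)   # insert is about 735 bp

# Hypothetical stock concentrations (ng/ul), for illustration only.
x, y, z = ligation_volumes(100.0, 50.0, insert_ng, 20.0)
print(f"insert needed: {insert_ng:.1f} ng; X={x:.2f}, Y={y:.2f}, Z={z:.2f} ul")
```

Under these assumptions a 3:1 molar ratio corresponds to roughly 43 ng of insert for 100 ng of vector; the actual volumes depend on the measured concentrations of the gel-purified DNA.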

Two microliters of the ligation mixture were transformed into E. coli NovaBlue cells, a cell line with high transformation efficiency. Kanamycin-resistant colonies were selected on LB plates at 37 °C. Finally, the plasmid pET24a+/SPase Δ2-80 was isolated from the NovaBlue cells and retransformed into the expression host strain BL21 (DE3). Positive clones were confirmed by double restriction-endonuclease digestion to release the insert DNA.

2.4 SPase Δ2-80 DNA Sequencing

A pure and concentrated (~300 ng/µl) sample of the recombinant plasmid DNA pET24a+/SPase Δ2-80 was sent to UBC CMMT (Center for Molecular Medicine and Therapeutics) to confirm the correct open reading frame of SPase Δ2-80 by DNA sequencing. Stocks (1.5 ml) of E. coli strain BL21 (DE3) containing the plasmid pET24a+/SPase Δ2-80, adjusted to 25% glycerol, were prepared and stored at -80 °C.

2.5 Overexpression, Purification and Crystallization of Protein SPase Δ2-80

The overexpression, purification and crystallization of SPase Δ2-80 were carried out following the methods described in chapter 2.

2.6 In Vitro SPase Activity of SPase Δ2-80

The in vitro signal peptidase activity of SPase Δ2-80 was determined and compared with that of SPase Δ2-75. Cleavage of the substrate pro-OmpA nuclease A (PONA) by signal peptidase was performed at different dilutions of SPase [60]. Briefly, each cleavage reaction was set up by adding 1 µl of SPase starting solution (0.1 mg/ml), or of a corresponding serial dilution of the starting solution (dilution factors of 10¹, 10², 10³, and 10⁴), to 10 µl of PONA solution (0.04 mg/ml in 20 mM Tris-HCl, pH 7.4, and 0.1% Triton X-100). Reactions were incubated at 37 °C for 1 hr and analyzed on 17% SDS-PAGE (Figure A.5).
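The amounts of enzyme and substrate in each reaction follow from the stated concentrations and volumes. The sketch below simply tabulates them; the constant and function names are illustrative, and the numbers are derived only from the values quoted in the assay description.

```python
STOCK_MG_PER_ML = 0.1      # SPase starting solution
ENZYME_UL = 1.0            # volume of enzyme added per reaction
PONA_MG_PER_ML = 0.04      # substrate solution
PONA_UL = 10.0             # volume of substrate per reaction
DILUTION_FACTORS = [1, 10, 100, 1000, 10000]


def mass_ng(conc_mg_per_ml: float, volume_ul: float) -> float:
    """mg/ml is the same as ug/ul, so conc * volume is in ug; convert to ng."""
    return conc_mg_per_ml * volume_ul * 1000.0


substrate_ng = mass_ng(PONA_MG_PER_ML, PONA_UL)
enzyme_ng = [mass_ng(STOCK_MG_PER_ML / f, ENZYME_UL) for f in DILUTION_FACTORS]

for factor, ng in zip(DILUTION_FACTORS, enzyme_ng):
    print(f"dilution {factor:>5}: {ng:g} ng SPase, "
          f"substrate:enzyme mass ratio {substrate_ng / ng:g}:1")
```

Each reaction therefore contains 400 ng of PONA, and the enzyme amount falls from 100 ng in the undiluted reaction to 0.01 ng at the highest dilution.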

3. Results and Discussion

3.1 PCR Amplification of SPase Δ2-80 Gene

Using suitable primers and PCR conditions, the gene of SPase Δ2-80 was successfully amplified from the template plasmid pET3d/SPase Δ2-75. Agarose gel analysis confirmed the successful amplification of the SPase Δ2-80 gene based on its size (Figure A.1).

Figure A.1: Agarose gel analysis of SPase Δ2-80 PCR products
Lane 1, λ Pst ladder; Lane 2, 100 bp ladder; Lanes 3-5, 6-8, 9-11, 12-13, and 15-17 were SPase Δ2-80 PCR products from PCR cycles 10, 15, 20, 25, and 30, respectively. The product from PCR cycle 20 was used for further subcloning. The arrow indicates the PCR product of the SPase Δ2-80 gene (~735 bp).

3.2 Cloning of SPase Δ2-80 pET Constructs

Using the TOPO cloning vector as an intermediate vector to generate the correct sticky ends at the designed restriction-endonuclease sites of the insert DNA proved very efficient. A linearized TOPO vector ([email protected]®) accepts the insert DNA through complementary base pairing between the 3'-T overhangs of the TOPO vector and the 3'-A overhangs added to the insert DNA by Taq polymerase. This process is completed by the ligation activity of topoisomerase I. The insert DNA fragment cut back out of the TOPO vector by double restriction-endonuclease digestion therefore carries the correct sticky ends at the restriction-endonuclease sites.

Recombinant expression constructs pET24a+/SPase Δ2-80 were obtained using T4 DNA ligase. The TOPO-cutback SPase Δ2-80 insert DNA and the similarly cut pET24a+ (both pre-treated by double digestion with Nde I and Sal I) were ligated at a 3:1 molar ratio (insert to vector). Successful recombinant constructs were confirmed by the release of insert DNA of the correct size for SPase Δ2-80 from the expression vector pET24a+ (Figure A.2).

Figure A.2: Double digestion of the recombinant plasmid SPase Δ2-80/pET24a+
Lane 1, λ Pst ladder; Lane 2, Mass ruler; Lane 3, 100 bp ladder; Lane 16, 1 kb ladder; Lanes 4 and 6-15 were several positive clones carrying the recombinant plasmid SPase Δ2-80/pET24a+; Lane 5 was the plasmid TOPO/SPase Δ2-80 as a positive control. The arrow indicates the SPase Δ2-80 DNA insert (~735 bp) released by Nde I and Sal I double digestion.

3.3 The Result of SPase Δ2-80 DNA Sequencing

A positive clone (pET24a+/SPase Δ2-80) containing the SPase Δ2-80 insert gene was isolated, and its open reading frame was confirmed by nucleotide sequencing. The amino acid sequence of SPase Δ2-80 was deduced using the program T-COFFEE on the ExPASy server [173]. The amino acid sequence of SPase Δ2-80 is shown in Figure A.3, aligned with the amino acid sequence of SPase Δ2-75. The alignment indicates that SPase Δ2-80 lacks the five residues (VRSFI) found at the N-terminus of SPase Δ2-75.

Sequence 1: 2-75 (249 residues)
Sequence 2: 2-80 (244 residues)

CLUSTAL FORMAT for T-COFFEE Version-1.41, CPU=0.94 sec, SCORE=100, Nseq=2, Len=249

2-75    VRSFIYEPFQIPSGSMMPTLLIGDFILVEKFAYGIKDPIYQKTLIETGHPKRGDIWFK
2-80    -----YEPFQIPSGSMMPTLLIGDFILVEKFAYGIKDPIYQKTLIETGHPKRGDIWFK
             *****************************************************

2-75    YPEDPKLDYIKRAVGLPGDKVTYDPVSKELTIQPGCSSGQACENALPVTYSNVEPSDFVQ
2-80    YPEDPKLDYIKRAVGLPGDKVTYDPVSKELTIQPGCSSGQACENALPVTYSNVEPSDFVQ
        ************************************************************

2-75    TFSRRNGGEATSGFFEVPKNETKENGIRLSERKETLGDVTVPIAQDQVGMYYQQP
2-80    TFSRRNGGEATSGFFEVPKNETKENGIRLSERKETLGDVTVPIAQDQVGMYYQQP
        *******************************************************

2-75    GQQLATWIVPPGQYFMMGDNRDNSADSRYWGFVPEANLVGRATAIWMSFDKQEGEWPTGL
2-80    GQQLATWIVPPGQYFMMGDNRDNSADSRYWGFVPEANLVGRATAIWMSFDKQEGEWPTGL
        ************************************************************

2-75    RLSRIGGIH 323
2-80    RLSRIGGIH
        *********

Figure A.3: The amino acid sequence alignment of SPase Δ2-75 and SPase Δ2-80
SPase Δ2-75 has 249 amino acids. SPase Δ2-80, an N-terminally truncated version of SPase Δ2-75, has only 244 amino acids; it lacks the five residues VRSFI found at the N-terminus of SPase Δ2-75.

3.4 Expression, Purification and Crystallization Conditions of Protein SPase Δ2-80

SPase Δ2-80 was expressed at a high level as inclusion bodies and was purified by the same procedure as described in chapter 2. The pure SPase Δ2-80 sample used for crystallization is shown in Figure A.4A. Interestingly, SPase Δ2-80 could be concentrated up to 194 mg/ml without any aggregation.

Figure A.4: Purification and crystallization of SPase Δ2-80
A. Lane 1, low range molecular markers (113, 91, 50, 35, 28, and 21 kDa); Lane 2, SPase Δ2-80 purified to homogeneity for crystallization. The arrow indicates SPase Δ2-80 (~28 kDa). B. Photomicrograph of a whole drop containing crystals of SPase Δ2-80.

Crystallization of SPase Δ2-80 still required the initial screening and grid optimization described in chapter 2. The best crystallization condition so far is 1.7 M ammonium formate, 0.1 M sodium acetate pH 5.2. The drop consisted of 1 µl protein plus 1 µl reservoir solution. The crystals averaged 0.04 x 0.04 x 0.16 mm and were too small for data collection (Figure A.4B). Further modification of this growth condition is ongoing.

At this stage, no conclusion can yet be drawn on whether the engineered SPase Δ2-80 improves on the crystal quality of SPase Δ2-75.

3.5 Signal Peptidase Activity of SPase Δ2-80

The full-length pre-protein substrate PONA, with a calculated molecular weight of 18,839.7 Da, consists of a mature protein portion (Staphylococcus aureus nuclease A) linked to the signal peptide of E. coli outer membrane protein A (OmpA) [60]. The predicted SPase cleavage site is after the sequence -FANAQA- at the C-terminus of the signal peptide [48]. After cleavage, the mature protein of PONA has a molecular weight of 16,811.2 Da, as estimated by mass spectrometry [63].
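As a quick consistency check on the two masses quoted above, the size of the shift expected on the gel can be computed directly. This sketch assumes that the 18,839.7 Da figure refers to the full-length precursor and the 16,811.2 Da figure to the processed mature protein; the variable names are illustrative.

```python
PRECURSOR_DA = 18839.7   # calculated mass quoted in the text
MATURE_DA = 16811.2      # mass-spectrometry estimate quoted in the text

# Mass lost from the precursor when the signal peptide is removed.
shift_da = PRECURSOR_DA - MATURE_DA
shift_percent = 100.0 * shift_da / PRECURSOR_DA

print(f"mass removed on cleavage: {shift_da:.1f} Da "
      f"({shift_percent:.1f}% of the precursor)")
```

Under that assumption, cleavage removes about 2 kDa, which is the difference between the unprocessed and processed bands that the SDS-PAGE analysis has to resolve.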

Figure A.5: In vitro signal peptidase activity of SPase Δ2-80 and SPase Δ2-75
Panel A (SPase Δ2-80 activity): lanes 2 to 7 correspond to dilutions 0, 1, 10¹, 10², 10³, and 10⁴.
A. Lane 1, low range molecular markers (113, 91, 50, 35, 28, and 21 kDa); Lane 2, the 0 dilution, a negative control to which no SPase was added, indicating the position of the unprocessed pro-protein PONA (designated as p). Lanes 3, 4, 5, 6, and 7 show the cleavage processing of PONA. Processing of PONA was initiated by adding 1 µl of SPase starting solution (0.1 mg/ml), or of a corresponding serial dilution of the starting solution (dilution factors of 10¹, 10², 10³, and 10⁴), to 10 µl of PONA (0.04 mg/ml in 20 mM Tris-HCl, pH 7.4, and 0.1% Triton X-100). The processed mature protein is designated as m. Reactions were incubated at 37 °C for 1 hr, analyzed on 17% SDS-PAGE, and visualized by Coomassie Blue staining. B. Lanes 1 to 6 show the processing of PONA by active SPase Δ2-75 under the same procedure as for SPase Δ2-80, for comparison. Lane 7 is low range molecular markers (113, 91, 50, 35, 28, and 21 kDa), as in lane 1 of A.

The processing of PONA by SPase Δ2-80 (Figure A.5A) demonstrates that SPase Δ2-80 is active and cleaves the PONA signal peptide. As can be seen in Figure A.5, SPase Δ2-80 has about 10-fold lower activity than SPase Δ2-75 in the same cleavage reaction (Figure A.5A and A.5B).

REFERENCE LIST

Wiech H., K.P., and Zimmerman R., Protein export in prokaryotes and eukayotes Theme with variations. FEBS Lett, 1991. 285(2): p. 182-188. Lodish H., B.A., Matsudaira P., Krieger C. A., Krieger M., Scott M. P., Zipursky S. I., Darnell J.. Molecular Cell Biology. 5th ed. 2003, New York: W.H. Freeman and Company. p. 3. Pugsley, A. P., The complete general secretory pathway in gram-negative bacterial. Microbiological Reviews, 1993. 57(1): p. 50-108. von Heijne, G., The signal peptide. J. Membr Biol, 1990. 115: p. 195-201 Paetzel, M., Dalbey, R. E., and Strynadka, N. C., The structure and mechanism of bacterial type I signal peptidases. A novel antibiotic target. Pharmacol Ther, 2000. 87(1): p. 27-49. Collinson, I., The structure of the bacterial protein translocation complex SecYEG. Biochem Soc Trans, 2005. 33(6): p. 1225-30. Fekkes, P., and Driessen, A. J., Protein Targeting to the Bacterial Cytoplasmic Membrane. Microbiol. Mol. Biol. Rev., 1999. 63(1): p. 161- 173. Economou, A., Following the leader: bacterial protein export through the Sec pathway. Trends in Microbiology, 1999. 7(8): p. 31 5-320. Driessen, A.J., Fekkes, P., and van der Wolk, J. P., The Sec system. Curr Opin Microbiol, 1998. l(2): p. 216-22. Paetzel, M., Karla, A., Strynadka, N. C., and Dalbey, R. E., Signal peptidases. Chem Rev, 2002. 102(12): p. 4549-80. Chen, R., and Henning, U., A periplasmic protein (Skp) of Escherichia coli selectively binds a class of outer membrane proteins. Mol Micro biol., 1996. 19(6): p. 1287-94. Hennecke, G., Nolte, J., Volkmer-Engert, R., Schneider-Mergener, J., and Behrens, S., The periplasmic chaperone SurA exploits two features characteristic of integral outer membrane proteins for selective substrate recognition. J Biol Chem., 2005. 280(25): p. 23540-8. Muller, M., Koch, H. G., Beck, K., and Schafer, U., Protein trafic in bacteria: multiple routes from the ribosome to and across the membrane. Prog Nucleic Acid Res Mol Biol, 2001. 66: p. 107-57. 
Duong, F., Wickner, W., The SecDFyajC domain of preprotein translocase controls preprotein movement by regulating SecA membrane cycling. EMBO J., 1997. 16: p. 4871-4879. Samuelson, J.C., YidC mediates membrane protein insertion in bacterial. Nature, 2000. 406: p. 637-641. Innis, M.A., Tokunaga, M., Williams, M. E., Loranger, J. M., Chang, S. Y., Chang, S.! and Wu, H. C., Nucleotide sequence of the Escherichia coli prolipoprotein signalpeptidase (lsp) gene. Proc Natl Acad Sci U S A, 1984. 81(12): p. 3708-12. Barrett, A. J., and Rawlings, N. D., Families and clans of serine peptidases. Arch Biochem Biophys, 1995. 318(2): p. 247-50. Paetzel, M., Dalbey, R. E., and Strynadka, N. C., Crystal structure of a bacterial signal peptidase in complex with a beta-lactam inhibitor. Nature, 1998. 396(6707): p. 186-90. Paetzel, M., and Strynadka, N. C., Common protein architecture and binding sites in proteases utilizing a Ser/Lys dyad mechanism. Protein Sci, 1999. 8(11): p. 2533-6. Dalbey, R.E. and W. Wickner, Leaderpeptidase catalyzes the release of exported proteins from the outer surface of the Escherichia coli plasma membrane. J Biol Chem, 1985. 260(29): p. 15925-31. Martoglio, B., and Dobberstein, B., Signal sequences: more than just greasy peptides. Trends in Cell Biology, 1998. 8(l0): p. 41 0-415. Nielsen, H., Engelbrecht, J., Brunak, S., and von Heijne, G., Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng., 1997. lO(1): p. 1-6. von Heijne, G., and Abrahmsen, L., Species-specific variation in signal peptide design implications for protein secretion in foreign hosts. FEBS Letters, 1989. 244(2): p. 439-446. von Heijne, G., Signal sequences. The limits of variation. J Mol Biol, 1985. 184(1): p. 99-1 05. von Heijne, G., Net N-C charge imbalance may be important for signal sequence function in bacteria. J Mol Biol, 1986. 192(2): p. 287-90. Emr, S.D., and Silhavy, T. 
J., Importance of secondary structure in the signal sequence for protein secretion. Proc Natl Acad Sci U S A., 1983. 80(15): p. 4599-603. Nesmeyanova, M.A., Karamyshev, A. L., Karamysheva, Z. N., Kalinin, A. E., Ksenzenko, V. N., and Kajava, A. V., Positively charged lysine at the N-terminus of the signal peptide of the Escherichia coli alkaline phosphatase provides the secretion efficiency and is involved in the interaction with anionic phospholipids. FEBS Letters, 1997. 403(2): p. 203- 207. Inouye, S., Soberon, X., Franceschini, T., Nakamura, K., Itakura, K., and Inouye, M., Role of positive charge on the amino-terminal region of the signal peptide in protein secretion across the membrane. Proc Natl Acad Sci U S A., 1982. 79(11): p. 3438-41. Briggs M.S., C.D.G., Dluhy R.A., Gierasch L.M., Conformation of signal peptides induced by lipids suggest initial steps in protein export. Science, 1986. 233(4760): p. 206-8. Schechter, I., and Berger, A,, On the size of the active site in proteases. I. Papain. Biochem Biophys Res Commun., 1967. 27(2): p. 157-62. Date, T. and W. Wickner, Isolation of the Escherichia coli leaderpeptidase gene and effects of leader peptidase overproduction in vivo. Proc Natl Acad Sci U S A, 1981. 78(lO): p. 6106-1 0. van Klompenburg W, W.P., Diemel R, von Heijne GI de Kruijff B., A quantitative assay to determine the amount of signal peptidase I in E. coli and the orientation of membrane vesicles. Mol Membr Biol., 1995. l2(4): p. 349-53. Chang, C.N., Blobel, G., and Model, P., Detection ofprokawotic signal peptidase in an Escherichia coli membrane fraction: endoproteolytic cleavage of nascent fl pre-coat protein. Proc Natl Acad Sci U S A., 1978. 75(1): p. 361-5. Zwizinski, C., and Wickner, W., Purification and characterization of leader (signal) peptidase from Escherichia coli. J. Biol. Chem., 1980. 255(16): p. 7973-7977. Wolfe, P.B., Wickner, W., and Goodman, J. 
M., Sequence of the leader peptidase gene of Escherichia coli and the orientation of leader peptidase in the bacterial envelope. J Biol Chem, 1983. 258(l9): p. 12073-80. Sung, M., and Dalbey, R. E., Identification of potential active-site residues in the Escherichia coli leader peptidase. J Biol Chem, 1992. 267(l9): p. 13154-9. Tschantz, W.R., Paetzel, M., Cao, G., Suciu, D., Inouye, M., and Dalbey, R. E., Characterization of a soluble, catalytically active form of Escherichia coli leader peptidase: requirement of detergent or phospholipid for optimal activity. Biochemistry, 1995. 34(12): p. 3935-41. Paetzel, M., Chernaia, M., Strynadka, N. C., Tschantz, W., Cao, G., Dalbey, R. E., and James, M. N., Crystallization of a soluble, catalyfically active form of Escherichia coli leaderpeptidase. Proteins, 1995. 23(1): p. 122-5. Kuo, D.W., Chan, H. K., Wilson, C. J., Griffin, P. R., Williams, H., and Knight, W. B., Escherichia coli leader peptidase: production of an active form lacking a requirement for detergent and development of peptide substrates. Arch Biochem Biophys, 1993. 303(2): p. 274-80. 40. Paetzel, M., Goodall, J. J., Kania, M., Dalbey, R. E., and Page, M. G., Crystallographic and biophysical analysis of a bacterial signal peptidase in complex with a lipopeptide-based inhibitor. J Biol Chern, 2004. 279(29):p. 30781 -90. Paetzel, M., Dalbey, R. E., and Strynadka, N. C., Crystal structure ofa bacterial signal peptidase apoenzyme: implications for signal peptide binding and the Ser-Lys dyad mechanism. J Biol Chem, 2002. 277(11):p. 9512-9. Paetzel, M., Strynadka, N. C., Tschantz, W. R., Casareno, R., Bullinger, P. R., and Dalbey, R. E., Use of site-directed chemical modification to study an essential lysine in Escherichia coli leader peptidase. J Biol Chem, 1997. 272(15): p. 9994-1 0003. Moore, K.E., and Miura, S., A small hydrophobic domain anchors leader peptidase to the cytoplasmic membrane of Escherichia coli. J. Biol. Chem., 1987. 262(l8):p. 8806-881 3. 
Bilgin, N., Lee, J. I., Zhu, H. Y., Dalbey, R., and Heijne, V., Mapping of catalytically important domains in Escherichia coli leader peptidase. The EMBO Journal, 1990. 9(9):p. 271 1-2722. San Millan J. L., B.D., Dalbey, R., Wickner, W., and Beckwith, J., Use of phoA fusion to study the topology of the Escherichia coli inner membrane protein leader peptidase. J Bacteriol, 1989. I71(1 0): p. 5536-41. Whitley, P., Nilsson, L., and von Heijne, G., Three-dimensional model for the membrane domain of Escherichia coli leader peptidase based on disulfide mapping. Biochemistry, 1993. 32(33): p. 8534-9. Dalbey RE, W.W., Leaderpeptidase of Escherichia co1i:critical role of a small domain in membrane assembly, Science, 1987. 235(4790): p. 783- 7. Carlos, J.L., Paetzel, M., Brubaker, G., Karla, A., Ashwell, C. M., Lively, M. O., Cao, G., Bullinger, P., and Dalbey, R. E., The role of the membrane- spanning domain of type I signal peptidases in substrate cleavage site selection. J Biol Chem, 2000. 275(49): p. 3881 3-22. Rawling, N.D., and Barret, A,, Families of serine peptidases. Meth. Enzyrnol., 1994. 244: p. 78-61. Rawling, N.D., and Barret, A,, Families of cysteine peptidases. Meth. Enzymol., 1994. 244: p. 461-486. Rawling, N.D., and Barret, A,, Families of metallopeptidases. Meth. Enzyrnol., 1994. 248: p. 183-228. Rawling, N.D., and Barret, A., Families of aspartic peptidases, and those of unknown mechanism. Meth. Enzyrnol., 1994. 248: p. 105-120. Black, M.T., Evidence that the catalytic activity of prokaryote leader peptidase depends upon the operation of a serine-lysine catalytic dyad. J Bacteriol, 1993. 175(16): p. 4957-61. Tschantz, W.R., Sung, M., Delgado-Partin, V. M., and Dalbey, R. E., A serine and a lysine residue implicated in the catalytic mechanism of the Escherichia coli leader peptidase. J. Biol. Chem., 1993. 268(36): p. 27349- 27354. Berman, H.M., Westbrook, J., Feng, Z., Gilliland! G., and T.N. Bhat, Weissig, H., Shindyalov, 1. N., and Bourne, P. 
E., The Protein Data Bank. Nucleic Acids Res, 2000. 28(1): p. 235-42. DeLana, W.L., The PyMoL molecular graphics System. DeLano Scientific, San Carlos, CA, USA, 2002. Deol, S.S., Bond, P. J., Domene, C. , and Sansom, M. S. P., Lipid-Protein Interactions of Integral Membrane Proteins: A Comparative Simulation Study. Biophys J., 2004. 87(6): p. 3737-3749. Kim, Y.T., Muramatsu, T., and Takahashi, K., Leaderpeptidase from Escherichia coli: overexpression, characterization, and inactivation by modification of tryptophan residues 300 and 31 0 with N- bromosuccinimide. J Biochem (Tokyo), 1995. 117(3): p. 535-44. van Klompenburg, W., Paetzel, M., de Jong, J. M., Dalbey, R. E., Demel, R. A,, von Heijne, G., and de Kruijff, B., Phosphatidylethanolamine mediates insertion of the catalytic domain of leader peptidase in membranes. FEBS Lett, 1998. 431(1): p. 75-9. Chatterjee, S., Suciu, D., Dalbey, R. E., Kahn, P. C., and Inouye, M., Determination of Km and kcat for signal peptidase I using a full length secretory precursor, pro-OmpA-nuclease A. J Mol Biol, 1995. 245(4): p. 31 1-4. Carlos, J.L., Klenotic, P. A., Paetzel, M., Strynadka, N. C., and Dalbey, R. E., Mutational evidence of transition state stabilization by serine 88 in Escherichia coli type I signal peptidase. Biochemistry, 2000. 39(24): p. 7276-83. Inada, T., Court, D. L., Ito, K., and Nakamura, Y., Conditionallethalamber mutations in the leader peptidase gene of Escherichia coli. J Bacteriol., 1989. 171 p. 585-587. Karla, A,, Lively, M. O., Paetzel, M., and Dalbey, R., The identification of residues that control signal peptidase cleavage fidelity and substrate specificity. J Biol Chem, 2005. 280(8): p. 6731-41. Klenotic, P.A., Carlos, J. L., Samuelson, J. C., Schuenemann, T. A., Tschantz, W. R., Paetzel, M., Strynadka, N. C., and Dalbey, R. E., The role of the conserved box E residues in the active site of the Escherichia coli type I signal peptidase. J Biol Chem, 2000. 275(9): p. 6490-8. 
Kim, Y.T., Kurita, R., Kojima, M., Nishii, W., Tanokura, M., Muramatsu, T., Ito, H., and Takahashi, K., Identification of arginine residues important for the activity of Escherichia coli signal peptidase I. Biol Chem., 2004. 385(5): p. 381-8. Conn, E.E., Stumpf, P. K., Bruening, G., and Doi, R. H., Outlines Of Biochemistry 5/E, ed. P. Brecht, and Petry, G. 1987, New York: John Wiley & Sons, Inc. p149-163. Neurath, H., Proteolytic processing and physiological regulation. Trends Biochem Sci., 1989. 14(7): p. 268-71 Miranda E. and Lomas, D.A., Neuroserpin: a serpin to think about. Cell. Mol. Life Sci., 2006. 63: p. 709-722. Krowarsch, D., Cierpicki, T., Jelen, F., and Otlewski, J., Canonical protein inhibitors of serine proteases. Cell Mol Life Sci, 2003. 60(11): p. 2427-44. Bode, W., and Huber, R., Structural basis of the endoproteinase-protein inhibitor interaction. Biochim Biophys Acta, 2000. 1477(1-2): p. 241 -52. Lu, W., Zhang, W., Molloy, S. S., Thomas, G., Ryan, K., Chiang, Y., Anderson, S., and Laskowski, M., Argl5-Lysl7-Argl8 turkey ovomucoid third domain inhibits human . J Biol Chem, 1993. 268(20): p. 14583-5. Ardelt, W., and Laskowski, M., Effect of single amino acid replacements on the thermodynamics of the reactive site peptide bond hydrolysis in ovomucoid third domain. J Mol Biol, 1991. 220(4): p. 1041-53. Huntington, J.A., and Carrell, R. W., The serpins: nature's molecular mousetraps. Sci Prog, 2001. 84(Pt 2): p. 125-36. Stubbs, M.T., Huber, R., and Bode, W., Crystal structures of factor Xa specific inhibitors in complex with trypsin: structural grounds for inhibition of factor Xa and selectivity against thrombin. FEBS Lett, 1995. 375(1-2): p. 103-7. Powers, J.C., Asgian, J. L., Ekici, 0. D. and James, K. E., Irreversible inhibitors of serine, cysteine, and threonine proteases. Chem Rev, 2002. 102: p. 4639-4750. Rawlings, N.D., Morton, F.R. and Barrett, A. J., MEROPS: the peptidase database. Nucleic Acids Res., 2006. 34: p. D270-D272. 
Vijayalakshmi, J., Padmanabhan, K. P., Mann, K. G., and Tulinsky, A,, The isomorphous structures of prethrombin2 hirugen-, and PPACK- thrombin: changes accompanying activation and exosite binding to thrombin. Protein Sci., 1994. 3: p. 2254-2271. Poulos, T.L., Alden, R. A., Freer, S. T., Birktoft, J. J., and Kraut, J., Polypeptide Halometry Ketones Bind to serine Proteases as Analogs of the Tetrahedral Intermediate. The Journal of Biological Chemistry, 1976. 251(4): p. 1097-1 103. Wei A. Z., M.I., and Bode W., The refined 2.3 A crystal structure of human leukocyte elastase in a complex with a valine chloromethyl ketone inhibitor. FEBS Lett., 1988. 234(2): p. 367-73. Tsunasawa, S., Masaki, T., Hirose, M., Soejima, M., and Sakiyama, F., The primary structure and structural characteristics of Achromobacter lyticus protease I, a lysine-specific serine protease. J. Biol. Chem., 1989. 264: p. 3832-3839. Kernball-Cook, G., Johnson, D. J., Tuddenham, E. G., and Harlos, K., Crystal structure of active site-inhibited human factor Vlla (des-Gla). J.Struct.Biol., 1999. 127: p. 21 3-223. Gupton, B.F.,Carroll, D. L., Tuhy, P. M., Kam, C. M., and Powers, J. C., Reaction of azapeptides with chymotrypsin-like enzymes. New inhibitors and active site titrants for chymotrypsin A alpha, BPN', subtilisin Carlsberg, and human leukocyte cafhepsin G. J. Biol. Chem., 1984. 259(7):p. 4279-4287. Steinmetz, A.C., Demuth, H. U., and Ringe, D., Inactivation of subtilisin Carlsberg by N-((tert-butoxycarbonyl)alanylprolylphenylalanyl)-O- benzoylhydroxyl- amine: formation of a covalent enzyme-inhibitor linkage in the form of a carbamate derivative. Biochemistry, 1994. 33: p. 10535- 1 0544. Ding, X., Rasmussen, B. F., Demuth, H. U., Ringe, Dl and Steinmetz, A. C. U., Nature of the inactivation of elastase by N-peptidyl-0-aroyl hydroxylamine as a function of pH. Biochemistry, 1995. 34(23):p. 7749 - 7756. Wilmouth, R.C., Westwood, N. J., Anderson, K., Brownlee, W., Claridge, T. D., Clifton, I. 
J., Pritchard, G. J., Aplin, R. T., and Schofield, C. J., Inhibition of elastase by N-sulfonylaryl beta-lactams: anatomy of a stable acyl-enzyme complex. Biochemistry, 1998. 37(50): p. 17506-1 3. Wright, P.A., Wilmouth, R. C., Clifton, I. J., and Schofield, C. J., 'pH-jump' crystallographic analyses of gamma-lactam-porcine complexes. Biochem.J., 2000. 351: p. 335-340. Supuran, C.T., Scozzafava, A,, and Clare, B. W., Bacterial Protease Inhibitors. Med. Res. Rev., 2002. 22(4):p. 329-372. Harper, J.W., Hemmi, K., and Powers, J. C., Reaction of Serine Proteases with Substituted Isocoumarins: Discovery of 3,4-Dichloroisocoumarin, a New General Mechanism Based Serine Protease Inhibitor. Biochemistry, 1985. 24: p. 1831-1841. Meyer Junior, E.F., Presta, L.G., and Radhakrishnan, R., Stereospecific Reaction of 3-Methoxy-4-Chloro-7-AminoisocoumarIn with Crystalline Porcine Pancreatic Elastase. J.Am.Chem.Soc., 1 985. 107: p. 4091. Vijayalakshmi, J., Meyer Jr., E. F., Kam, C. M., and Powers, J. C., Structural study of porcine pancreatic elastase complexed with 7-amino-3- (2-bromoethoxy)-4-chloroisocoumarin as a nonreactivatable doubly covalent enzyme-inhibitor complex. Biochemistry, 1991. 30: p. 21 75-21 83. Pwers, J.C., Oleksyszyn, J., Naeasimhan, S. L., and Kam, C. M., Reaction of Porcine Pancreatic Elastase with 7-Substituted 3-Alkoxy-4- chloroisocoumarins: Design of Potent lnhibitors Using the Crystal Structure of the Complex Formed with 4-Chloro-3-ethoxy-7- guanidinoisocoumarin. Biochemistry, 1990. 29: p. 31 08-31 18. Radhakrishnan, R., Presta, L. G., Meyer Jr., E. F., and Wildonger, R., Crystal structures of the complex of porcine pancreatic elastase with two valine-derived benzoxazinone inhibitors. J.Mol.Biol., 1987. 198: p. 41 7- 424. Skordalakes, E., Dodson, G. G., Green, D. St. C., Goodwin, C. A., Scully, M. F., Hudson, H. R., Kakkar, V. V., and Deadman, J. 
J., Inhibition of human [alpha]-thrombin by a phosphonate tripeptide proceeds via a metastable pentacoordinated phosphorus intermediate. Journal of Molecular Biology, 2001. 31 l(3): p. 549-555. Bertrand, J.A., Oleksyszyn, J., Kam, C. M., Boduszek, B., Presnell, S., Plaskon, R. R., Suddath, F. L., Powers, J. C., and Williams, L. D., lnhibition of trypsin and thrombin by amino(#- amidinophenyl)methanephosphonate diphenyl ester derivatives: X-ray structures and molecular models. Biochemistry, 1996. 35: p. 3147-31 55. Bone, R., Sampson, N. S., Bartlett, P. A., and Agard, D. A., Crystal structures of alpha-lytic protease complexes with irreversibly bound phosphonate esters. Biochemistry, 1991. 30: p. 2263-2272. Hof, P., Mayr, I., Huber, R., Korzus, E., Potempa, J., Travis, J., Powers, J.C., and Bode, W., The 1.8A crystal structure of human G in complex with Suc-Val-Pro-PheP-(OPh)2: a Janus-faced proteinase with two opposite specificities. EMBO J, 1996. 15: p. 5481 - 5491. Hoog, S.S., Smith, W. W., Qiu, X., Janson, C. A., Hellmig, B., McQueney, M. S., O'Donnell, K., O'Shannessy, D., DiLella, A. G., Debouck, C., and Abdel-Meguid, S. S., Active site cavity of herpesvirus proteases revealed by the crystal structure of herpes simplex virus proteaseLnhibitor complex. Biochemistry, 1997. 36: p. 14023-14029. Cole, L.B., Chu, N., Kilpatrick, J. M., Volanakis, J. E., Narayana, S. V., and Ba bu, Y. S., Structure of Diisopropyl Fluorophosphate-Inhibited . Acta Crystallogr.,Sect.D, 1997. 53: p. 143. Longhi, S., Nicolas, A., Creveld, L., Egmond, M., Verrips, C.T., de Vlieg, J., Martinez, C., and Cambillau, C., Dynamics of Fusarium solani cutinase investigated through structural comparison among different crystal forms of its variants. Proteins, 1996. 26: p. 442-458. McGrath, M.E., Mirzadegan, T., and Schmidt, B. F., Crystal structure of phenylmethanesulfonyl fluoride-treated human at 1.9 A. Biochemistry, 1997. 36: p. 1431 8-14324. Radisky, E.S., Lee, J. M.! Lu, C. 
J., and Koshland Jr., D. E., Insights into the serine protease mechanism from atomic resolution structures of trypsin reaction intermediates. Proc. Natl. Acad. Sci. USA, 2006. 103: p. 6835-6840.
Kurinov, I.V., and Harrison, R. W., Two Crystal Structures of the leupeptin-trypsin complex. Protein Science, 1996. 5: p. 752-758.
Bullock, T.L., Breddam, K., and Remington, S. J., Peptide Aldehyde Complexes with Wheat Serine Carboxypeptidase II: Implications for the Catalytic Mechanism and Substrate Specificity. Journal of Molecular Biology, 1996. 255(5): p. 714-725.
Telang, M.A., Giri, A. P., Sainani, M. N., and Gupta, V. S., Characterization of two midgut proteinases of Helicoverpa armigera and their interaction with proteinase inhibitors. Journal of Insect Physiology, 2005. 51(5): p. 513-522.
Delbaere, L.T., and Brayer, G. D., The 1.8 Å structure of the complex between chymostatin and Streptomyces griseus protease A. A model for serine protease catalytic tetrahedral intermediates. J. Mol. Biol., 1985. 183: p. 89-103.
Moult, J., Sussman, F., and James, M. N., Electron density calculations as an extension of protein structure refinement. Streptomyces griseus protease A at 1.5 Å resolution. J. Mol. Biol., 1985. 182: p. 555-566.
James, M.N., Sielecki, A. R., Brayer, G. D., Delbaere, L. T., and Bauer, C. A., Structures of product and inhibitor complexes of Streptomyces griseus protease A at 1.8 Å resolution. A model for serine protease catalysis. J. Mol. Biol., 1980. 144: p. 43-88.
Bullock, T.L., Breddam, K., and Remington, S. J., Peptide aldehyde complexes with wheat serine carboxypeptidase II: implications for the catalytic mechanism and substrate specificity. J. Mol. Biol., 1996. 255: p. 714-725.
Bone, R., Shenvi, A. B., Kettner, C. A., and Agard, D. A., Serine protease mechanism: structure of an inhibitory complex of alpha-lytic protease and a tightly bound peptide boronic acid. Biochemistry, 1987. 26: p. 7609-7614.
Marquart, M., Walter, J., Deisenhofer, J., Bode, W., and Huber, R., The Geometry of the Reactive Site and of the Peptide Groups in Trypsin, Trypsinogen and its Complexes with Inhibitors. Acta Crystallogr., Sect. B, 1983. 39: p. 480.
Reyda, S., Sohn, C., Klebe, G., Rall, K., Ullmann, D., Jakubke, H. D., and Stubbs, M. T., Reconstructing the Binding Site of Factor Xa in Trypsin Reveals Ligand-induced Structural Plasticity. J. Mol. Biol., 2003. 325: p. 963-977.
Jin, L., Pandey, P., Babine, R. E., Weaver, D. T., Abdel-Meguid, S. S., and Strickler, J. E., Mutation of surface residues to promote crystallization of activated factor XI as a complex with benzamidine: an essential step for the iterative structure-based design of factor XI inhibitors. Acta Crystallogr., Sect. D, 2005. 61: p. 1418-1425.
Banner, D.W., and Hadvary, P., Crystallographic Analysis at 3.0-Angstroms Resolution of the Binding to Human Thrombin of Four Active Site-Directed Inhibitors. J. Biol. Chem., 1991. 266: p. 20085-20093.
Sanderson, P.E.J., Small, Noncovalent Serine Protease Inhibitors. Med. Res. Rev., 1999. 19(2): p. 179-197.
Wickner, W., Moore, K., Dibb, N., Geissert, D., and Rice, M., Inhibition of purified Escherichia coli leader peptidase by the leader (signal) peptide of bacteriophage M13 procoat. J Bacteriol, 1987. 169(8): p. 3821-2.
Barkocy-Gallagher, G.A., and Bassford, P. J., Jr., Synthesis of precursor maltose-binding protein with proline in the +1 position of the cleavage site interferes with the activity of Escherichia coli signal peptidase I in vivo. J Biol Chem, 1992. 267(2): p. 1231-8.
Kuo, D., Weidner, J., Griffin, P., Shah, S. K., and Knight, W. B., Determination of the kinetic parameters of Escherichia coli leader peptidase activity using a continuous assay: the pH dependence and time-dependent inhibition by beta-lactams are consistent with a novel serine protease mechanism. Biochemistry, 1994. 33(27): p. 8347-54.
Black, M.T., and Bruton, G., Inhibitors of bacterial signal peptidases. Current Pharmaceutical Design, 1998. 4: p. 133-154.
Allsop, A., Brooks, G., Edwards, P. D., Kaura, A. C., and Southgate, R., Inhibitors of bacterial signal peptidase: a series of 6-(substituted oxyethyl)penems. J Antibiot (Tokyo), 1996. 49(9): p. 921-8.
Schimana, J., Gebhardt, K., Holtzel, A., Schmid, D. G., Sussmuth, R., Muller, J., Pukall, R., and Fiedler, H. P., Arylomycins A and B, new biaryl-bridged lipopeptide antibiotics produced by Streptomyces sp. Tu 6075. I. Taxonomy, fermentation, isolation and biological activities. J Antibiot (Tokyo), 2002. 55(6): p. 565-70.
Kulanthaivel, P., Kreuzman, A. J., Strege, M. A., Belvo, M. D., Smitka, T. A., Clemens, M., Swartling, J. R., Minton, K. L., Zheng, F., Angleton, E. L., Mullen, D., Jungheim, L. N., Klimkowski, V. J., Nicas, T. I., Thompson, R. C., and Peng, S. B., Novel lipoglycopeptides as inhibitors of bacterial signal peptidase I. J Biol Chem, 2004. 279(35): p. 36250-8.
Marahiel, M.A., Stachelhaus, T., and Mootz, H. D., Modular Peptide Synthetases Involved in Nonribosomal Peptide Synthesis. Chem. Rev., 1997. 97(7): p. 2651-2673.
Danley, D.E., Crystallization to obtain protein-ligand complexes for structure-aided drug design. Acta Cryst., 2006. D62: p. 569-575.
Hassell, A.M., An, G., Bledsoe, R. K., Bynum, J. M., Carter, H. L. III, Deng, S. J., Gampe, R. T., Grisard, T. E., Madauss, K. P., Nolte, R. T., Rocque, W. J., Wang, L., Weaver, K. L., Williams, S. P., Wisely, G. B., Xu, R., and Shewchuk, L. M., Crystallization of protein-ligand complexes. Acta Crystallogr D Biol Crystallogr, 2007. 63(Pt 1): p. 72-9.
Skarzynski, T., and Thorpe, J., Industrial perspective on X-ray data collection and analysis. Acta Cryst., 2006. D62: p. 102-107.
Avdeef, A., Physicochemical profiling (solubility, permeability and charge state). Curr. Top. Med. Chem., 2001. 1(4): p. 277-351.
Balakin, K.V., Ivanenkov, Y. A., Skorenko, A. V., Nikolsky, Y. V., Savchuk, N.
P., and Ivashchenko, A. A., In Silico Estimation of DMSO Solubility of Organic Compounds for Bioscreening. Journal of Biomolecular Screening, 2004. 9(1): p. 22-31.
Aminabhavi, T.M., Desai, K. H., and Kulkarni, A. R., Polymers in drug delivery: Methods to enhance solubility of drugs using polymeric dispersion technique. Polymer News, 2003. 28: p. 315-320.
Vilenchik, L.Z., Griffith, J. P., St Clair, N., Navia, M. A., and Margolin, A. L., Protein Crystals As Novel Microporous Materials. J. Am. Chem. Soc., 1998. 120: p. 4290-4294.
Matthews, B.W., Solvent content of protein crystals. J Mol Biol, 1968. 33(2): p. 491-7.
Wu, S. S., D., J., Kontopidis, G., Taylor, P., and Walkinshaw, M. D., The First Direct Determination of a Ligand Binding Constant in Protein Crystals. Angew Chem Int Ed Engl., 2001. 40(3): p. 582-586.
McNae, I.W., Kan, D., Kontopidis, G., Patterson, A., Taylor, P., Worrall, L., and Walkinshaw, M. D., Studying protein-ligand interactions using protein crystallography. Crystallography Reviews, 2005. 11(1): p. 61-71.
Seidler, J., McGovern, S. L., Doman, T. N., and Shoichet, B. K., Identification and Prediction of Promiscuous Aggregating Inhibitors among Known Drugs. J. Med. Chem., 2003. 46: p. 4477-4486.
Pflugrath, J.W., The finer things in X-ray diffraction data collection. Acta Crystallogr D Biol Crystallogr, 1999. 55(Pt 10): p. 1718-25.
Middelberg, A.P.J., Preparative protein refolding. Trends in Biotechnology, 2002. 20(10): p. 437-443.
Vincentelli, R., Canaan, S., Campanacci, V., Valencia, C., Maurin, D., Frassinetti, F., Scappucini-C., L., Bourne, Y., Cambillau, C., and Bignon, C., High-throughput automated refolding screening of inclusion bodies. Protein Sci, 2004. 13(10): p. 2782-2792.
Collaborative Computational Project, Number 4, The CCP4 suite: programs for protein crystallography. Acta Crystallogr D Biol Crystallogr, 1994. D50: p. 760-763.
van Aalten, D.M., Bywater, R., Findlay, J. B., Hendlich, M., Hooft, R.
W., and Vriend, G., PRODRG, a program for generating molecular topologies and unique molecular descriptors from coordinates of small molecules. J Comput Aided Mol Des, 1996. 10(3): p. 255-62.
McRee, D.E., XtalView/Xfit - A versatile program for manipulating atomic coordinates and electron density. J. Structural Biology, 1999. 125: p. 156-165.
Emsley, P., and Cowtan, K., Coot: Model-Building Tools for Molecular Graphics. Acta Crystallographica Section D - Biological Crystallography, 2004. 60: p. 2126-2132.
Brunger, A.T., Adams, P. D., Clore, G. M., Delano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J. S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T., and Warren, G. L., Crystallography and NMR System (CNS): A New Software System For Macromolecular Structure Determination. Acta Cryst., 1998. D54: p. 905-921.
Painter, J., and Merritt, E. A., Optimal description of a protein structure in terms of multiple groups undergoing TLS motion. Acta Cryst., 2006. D62: p. 439-450.
Laskowski, R.A., MacArthur, M. W., Moss, D. S., and Thornton, J. M., PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallogr., 1993. 26: p. 283-291.
Maiti, R., Van Domselaar, G. H., Zhang, H., and Wishart, D. S., SuperPose: a simple server for sophisticated structure superposition. Nucleic Acids Res, 2004. 32(Web Server Issue): p. W590-W594.
Binkowski, T.A., Naghibzadeh, S., and Liang, J., CASTp: computed atlas of surface topography of proteins. Nucleic Acids Research, 2003. 31: p. 3352-3355.
Merritt, E.A., and Bacon, D. J., Raster3D: Photorealistic molecular graphics. Methods Enzymol, 1997. 277: p. 505-524.
Otwinowski, Z., and Minor, W., Processing of X-ray Diffraction Data Collected in Oscillation Mode. Methods in Enzymology, 1997. 276(Macromolecular Crystallography part A): p. 307-326.
Claude, J.-B., Suhre, K., Notredame, C., Claverie, J.-M., and Abergel, C., CaspR: a web-server for automated molecular replacement using homology modelling. Nucleic Acids Research, 2004. 32(Web Server Issue): p. W606-W609.
McPherson, A., Introduction to Macromolecular Crystallography. 2003, Hoboken, New Jersey: John Wiley & Sons, Inc. p. 25-36.
Glaser, F., Steinberg, D. M., Vakser, I. A., and Ben-Tal, N., Residue frequencies and pairing preferences at protein-protein interfaces. Proteins, 2001. 43(2): p. 89-102.
Chakrabarti, P., and Janin, J., Dissecting protein-protein recognition sites. Proteins, 2002. 47(3): p. 334-43.
Mintseris, J., and Weng, Z., Atomic contact vectors in protein-protein recognition. Proteins, 2003. 53(3): p. 629-39.
Ofran, Y., and Rost, B., Analysing Six Types of Protein-Protein Interfaces. Journal of Molecular Biology, 2003. 325(2): p. 377-387.
Dasgupta, S., Iyer, G. H., Bryant, S. H., Lawrence, C. E., and Bell, J. A., Extent and nature of contacts between protein molecules in crystal lattices and between subunits of protein oligomers. Proteins, 1997. 28(4): p. 494-514.
Carugo, O., and Argos, P., Protein-protein crystal-packing contacts. Protein Sci, 1997. 6(10): p. 2261-2263.
Bahadur, R.P., Chakrabarti, P., Rodier, F., and Janin, J., A dissection of specific and non-specific protein-protein interfaces. J Mol Biol, 2004. 336(4): p. 943-55.
Rodier, F., Bahadur, R. P., Chakrabarti, P., and Janin, J., Hydration of protein-protein interfaces. Proteins, 2005. 60(1): p. 36-45.
Baud, F., and Karlin, S., Measures of residue density in protein structures. Proc Natl Acad Sci U S A, 1999. 96(22): p. 12494-9.
Derewenda, Z.S., and Vekilov, P. G., Entropy and surface engineering in protein crystallization. Acta Crystallogr D Biol Crystallogr, 2006. 62(Pt 1): p. 116-24.
Derewenda, Z.S., The use of recombinant methods and molecular engineering in protein crystallization. Methods, 2004. 34(3): p. 354-63.
Longenecker, K.L., Garrard, S. M., Sheffield, P. J., and Derewenda, Z.
S., Protein crystallization by rational mutagenesis of surface residues: Lys to Ala mutations promote crystallization of RhoGDI. Acta Crystallogr D Biol Crystallogr, 2001. 57(Pt 5): p. 679-88.
Mateja, A., Devedjiev, Y., Krowarsch, D., Longenecker, K., Dauter, Z., Otlewski, J., and Derewenda, Z. S., The impact of Glu-->Ala and Glu-->Asp mutations on the crystallization properties of RhoGDI: the structure of RhoGDI at 1.3 Å resolution. Acta Crystallogr D Biol Crystallogr, 2002. 58(Pt 12): p. 1983-91.
Lawson, D.M., Artymiuk, P. J., Yewdall, S. J., Smith, J. M., Livingstone, J. C., Treffry, A., Luzzago, A., Levi, S., Arosio, P., and Cesareni, G., Solving the structure of human H ferritin by genetically engineering intermolecular crystal contacts. Nature, 1991. 349(6309): p. 541-4.
Janin, J., and Rodier, F., Protein-protein interaction at crystal contacts. Proteins, 1995. 23(4): p. 580-7.
Borchert, T.V., Abagyan, R., Kishan, K. V., Zeelen, J. P., and Wierenga, R. K., The crystal structure of an engineered monomeric triosephosphate isomerase, monoTIM: the correct modelling of an eight-residue loop. Structure, 1993. 1(3): p. 205-13.
Kuhlman, B., O'Neill, J. W., Kim, D. E., Zhang, K. Y. J., and Baker, D., Conversion of monomeric protein L to an obligate dimer by computational protein design. PNAS, 2001. 98(19): p. 10687-10691.
Thoma, R., Hennig, M., Sterner, R., and Kirschner, K., Structure and function of mutationally generated monomers of dimeric phosphoribosylanthranilate isomerase from Thermotoga maritima. Structure, 2000. 8(3): p. 265-76.
Kang, Y.N., Adachi, M., Mikami, B., and Utsumi, S., Change in the crystal packing of soybean beta-amylase mutants substituted at a few surface amino acid residues. Protein Eng., 2003. 16(11): p. 809-817.
Anstrom, D.M., Colip, L., Moshofsky, B., Hatcher, E., and Remington, S. J., Systematic replacement of lysine with glutamine and alanine in Escherichia coli malate synthase G: effect on crystallization.
Acta Crystallogr Sect F Struct Biol Cryst Commun, 2005. 61(Pt 12): p. 1069-74.
Czepas, J., Devedjiev, Y., Krowarsch, D., Derewenda, U., Otlewski, J., and Derewenda, Z. S., The impact of Lys-->Arg surface mutations on the crystallization of the globular domain of RhoGDI. Acta Crystallogr D Biol Crystallogr, 2004. 60(Pt 2): p. 275-80.
Granata, V., Housden, N. G., Harrison, S., Jolivet-Reynaud, C., Gore, M. G., and Stura, E. A., Comparison of the crystallization and crystal packing of two Fab single-site mutant protein L complexes. Acta Crystallogr D Biol Crystallogr, 2005. 61(Pt 6): p. 750-4.
Honegger, A., Spinelli, S., Cambillau, C., and Pluckthun, A., A mutation designed to alter crystal packing permits structural analysis of a tight-binding fluorescein-scFv complex. Protein Sci, 2005. 14(10): p. 2537-2549.
Notredame, C., Higgins, D. G., and Heringa, J., T-Coffee: A novel method for multiple sequence alignments. Journal of Molecular Biology, 2000. 302: p. 205-217.