Crystal structure analysis of a snake venom metalloproteinase in complex with an inhibitor as basis for considerations on the proteolytic activity and the hemorrhagic mode of action

INAUGURAL DISSERTATION for the attainment of the doctoral degree of the Faculty of Chemistry, Pharmacy and Earth Sciences, Albert-Ludwigs-Universität Freiburg im Breisgau

submitted by Torsten Jens Lingott from Bayreuth

Freiburg im Breisgau November 2010

Date of announcement of the examination result: 16.12.2010

Dean: Prof. Dr. H. Hillebrecht
Referee: Prof. Dr. I. Merfort
Co-referee: Prof. Dr. J. M. Gutiérrez
Third examiner: Prof. Dr. A. Bechthold

Parts of this thesis have been published or are being prepared for publication in the following articles:

Lingott, T., Schleberger, C., Gutiérrez, J. M., and Merfort, I. (2009). High-resolution crystal structure of the snake venom metalloproteinase BaP1 complexed with a peptidomimetic: insight into inhibitor binding. Biochemistry 48, 6166-6174.

Wallnoefer, H. G., Lingott, T., Gutiérrez, J. M., Merfort, I., and Liedl, K. R. (2010). Backbone flexibility controls the activity and specificity of a protein-protein interface: Specificity in snake venom metalloproteases. J Am Chem Soc 132, 10330-10337.

Lingott, T. and Merfort, I. (xxxx). The catalytic domain of snake venom metalloproteinases - Sequential and structural considerations. in preparation.

Wallnoefer, H. G.*, Lingott, T.*, Escalante, T., Ferreira, R. N., Nagem, R. A. P., Gutiérrez, J. M., Merfort, I., and Liedl, K. R. (xxxx). The hemorrhagic activity of P-I snake venom metalloproteinases is controlled by loop dynamics. in preparation. * Equally contributed authors.

Steinbrecher, T., Lingott, T., and Merfort, I. (xxxx). Free energy calculations on snake venom metalloproteinase BaP1. in preparation.

Relevant coordinates and structure factors have been deposited in the RCSB Protein Data Bank under the following accession codes:

2W12 High-resolution crystal structure of snake venom metalloproteinase BaP1 complexed with a peptidomimetic, 1.46 Å, pH 6.5
2W13 High-resolution crystal structure of snake venom metalloproteinase BaP1 complexed with a peptidomimetic, 1.14 Å, pH 4.6
2W14 High-resolution crystal structure of snake venom metalloproteinase BaP1 complexed with a peptidomimetic, 1.08 Å, pH 8.0
2W15 High-resolution crystal structure of snake venom metalloproteinase BaP1 complexed with a peptidomimetic, 1.05 Å, pH 7.5

Parts of this thesis have been presented at the following conferences:

Talk:

Lingott, T., Wallnoefer, H. G., Liedl, K. R., Gutiérrez, J. M., and Merfort, I. (2010). In silico tool to predict hemorrhagic activity of snake venom metalloproteinases. 10th Meeting of the Pan American Section of the International Society on Toxinology (IST), San José, Costa Rica, April 18th-20th.

Posters:

Lingott, T., Gutiérrez, J. M., and Merfort, I. (2008). High-resolution crystal structure of the P-I snake venom metalloproteinase BaP1 in complex with a peptidomimetic: Insight into inhibitor binding. ChemBioNet - 5th Status Seminar Chemical Biology (DECHEMA e.V.), Frankfurt, Germany, December 8th-9th.

Lingott, T., Gutiérrez, J. M., Wolber, G., and Merfort, I. (2009). High-resolution crystal structure of a SVMP*inhibitor complex as model for the design of metalloproteinase inhibitors. Drug Discovery and Delivery: Membrane Proteins and Natural Product Research, Freiburg, Germany, April 16th-17th.

Lingott, T., Gutiérrez, J. M., Wolber, G., and Merfort, I. (2009). X-ray analysis of a snake venom metalloproteinase inhibitor complex as basis for drug design using pharmacophore-based virtual screening. Fakultätsfest der Fakultät für Chemie, Pharmazie und Geowissenschaften der Albert-Ludwigs-Universität, Freiburg, Germany, July 9th.

Lingott, T., Gutiérrez, J. M., and Merfort, I. (2010). High-resolution crystal structure of the P-I SVMP BaP1 in complex with a peptidomimetic: Insight into inhibitor binding and importance of a flexible loop region correlated to hemorrhagic activity. 10th Meeting of the Pan American Section of the International Society on Toxinology (IST), San José, Costa Rica, April 18th-20th.

Lingott, T., Wallnoefer, H. G., Liedl, K. R., Gutiérrez, J. M., and Merfort, I. (2010). Sequential and structural comparison of hemorrhagic and non-hemorrhagic P-I SVMPs and specific MD simulations lead to new insight into hemorrhagic activity. 10th Meeting of the Pan American Section of the International Society on Toxinology (IST), San José, Costa Rica, April 18th-20th.

RESEARCH

to see what everybody else has seen

and

to think what nobody else has thought

Albert von Szent-Györgyi Nagyrápolt

Table of contents

1 INTRODUCTION
1.1 Metalloendopeptidases
1.1.1 Classification of zinc-dependent metalloendopeptidases
1.1.2 The metzincin clan of metalloendopeptidases
1.1.3 Reaction mechanism of metzincins
1.2 Metalloproteinases from snake venoms
1.2.1 Classification and biosynthesis of snake venom metalloproteinases
1.2.2 Hemorrhagic activity of snake venom metalloproteinases
1.3 Snake venom metalloproteinases as models for drug design
1.4 In silico approaches in drug discovery
1.4.1 Pharmacophore modeling and virtual screening
1.4.2 Protein-ligand docking
1.4.3 Molecular dynamics simulations
1.4.4 Sequence alignments
1.5 Aims of this work

2 EXPERIMENTAL PROCEDURES
2.1 Materials
2.1.1 Appliances
2.1.2 Chemicals and kits
2.1.3 Buffers and solutions
2.2 Protein purification
2.2.1 Venom of Bothrops asper snakes
2.2.2 Ion exchange chromatography
2.2.3 Concentration of protein solutions
2.2.4 Affinity chromatography
2.2.5 Gel permeation chromatography
2.3 Protein characterization
2.3.1 Discontinuous sodium dodecylsulfate polyacrylamide gel electrophoresis
2.3.2 Photometric determination of protein concentrations
2.3.3 Proteolytic activity of BaP1 and inhibition assay
2.4 Protein crystallization
2.4.1 Crystal lattices and symmetry
2.4.2 Solvent content in protein crystals
2.4.3 Protein crystallization
2.4.4 Crystal mounting
2.5 X-ray structure analysis
2.5.1 Theory of X-ray diffraction
2.5.2 Reciprocal space and Ewald construction
2.5.3 Temperature factors
2.5.4 The Patterson function
2.5.5 Data collection
2.5.6 Improvement of the scattering performance of crystals
2.5.7 Indexing, scaling, and data reduction
2.5.8 Phase determination by molecular replacement
2.5.9 Non-crystallographic symmetry
2.5.10 Model building and refinement
2.5.11 Macromolecular refinement
2.6 Protein structures
2.6.1 Validation of protein structures
2.6.2 Presentation and analysis of protein structures
2.6.3 Structure comparison
2.7 In silico analyses using protein structures and sequences
2.7.1 Pharmacophore modeling
2.7.2 Protein-ligand docking
2.7.3 Molecular dynamics simulations
2.7.4 Conformational analysis of small molecules
2.7.5 Virtual screening
2.7.6 Sequence alignments
2.7.7 Computational phylogenetic analysis

3 RESULTS
3.1 Crystal structure of the BaP1*inhibitor complex
3.1.1 BaP1 purification
3.1.2 Proteolytic activity of BaP1 and protease inhibition assay
3.1.3 Initial crystallization of native BaP1 and with the inhibitor
3.1.4 Refinement of crystallization experiments by the use of seeding methods
3.1.5 Structure determination and refinement of the BaP1*inhibitor complex
3.1.6 Validation of the BaP1*inhibitor complex models
3.1.7 Structure description of the BaP1*inhibitor complex
3.1.8 Structural comparison of BaP1 with of the reprolysin family
3.2 Multiple sequence alignments of the metalloproteinase domain of SVMPs
3.2.1 Metalloproteinase domain of P-I snake venom metalloproteinases
3.2.2 Metalloproteinase domain of P-II snake venom metalloproteinases
3.2.3 Metalloproteinase domain of P-III snake venom metalloproteinases
3.2.4 Comparison of the different P-classes of snake venom metalloproteinases
3.2.5 Conserved amino acid residues - structural importance
3.3 Proteolytic reaction mechanism of metzincins
3.3.1 Insight derived from the X-ray structures of the BaP1*inhibitor complexes
3.3.2 Insight gained by MD simulations of different metzincin structures
3.4 Elucidation of the hemorrhagic mode of action of SVMPs
3.4.1 Insight derived from the X-ray structures of BaP1*inhibitor complexes
3.4.2 Insight gained by sequential and structural analyses of metalloproteinase domains
3.4.3 Insight gained by MD simulations of P-I SVMP structures
3.5 Search for new BaP1 inhibiting compounds
3.5.1 Pharmacophore modeling and virtual screening
3.5.2 Calculation of binding affinities with MD simulations

4 SUMMARY

5 DISCUSSION
5.1 Crystal structure of the BaP1*inhibitor complex
5.2 Proteolytic reaction mechanism of metzincins
5.3 Elucidation of the hemorrhagic mode of action of SVMPs
5.4 Multiple sequence alignments of the metalloproteinase domain of SVMPs
5.5 Search for new BaP1 inhibiting compounds

6 BIBLIOGRAPHY

7 APPENDIX
7.1 Calibration of gel permeation column
7.2 Successful crystallization conditions with streak seeding
7.3 Crystal contacts in BaP1 models of dataset I, II, III, and IV
7.4 Refinement statistics of dataset I, II, III, and IV
7.5 SVMP sequences deposited in the UniProtKB/SwissProt database
7.5.1 P-I snake venom metalloproteinases
7.5.2 P-II snake venom metalloproteinases
7.5.3 P-III snake venom metalloproteinases
7.6 Multiple sequence alignments of the metalloproteinase domain of SVMPs
7.6.1 Metalloproteinase domain of P-I snake venom metalloproteinases
7.6.2 Metalloproteinase domain of P-II snake venom metalloproteinases
7.6.3 Metalloproteinase domain of P-III snake venom metalloproteinases
7.6.4 Statistical distribution of amino acid residues at the of SVMPs
7.6.5 Hemorrhagic and non-hemorrhagic P-I snake venom metalloproteinases
7.7 Biological testing of NCI database screening hits
7.8 Acknowledgements

Glossary

Acronyms

ADAM    a disintegrin and metalloproteinase domain containing enzyme
ADAMTS    a disintegrin and metalloproteinase domain containing enzyme with thrombospondin motifs
Ax    absorption at a wavelength of x nm
BESSY    Berliner Elektronenspeicherring-Gesellschaft für Synchrotronstrahlung
CCD    charge coupled device
CLP    C-type lectin-like protein
DSSP    dictionary of secondary structure of proteins
EGF    epidermal growth factor-like
ER    endoplasmic reticulum
FACIT    fibril-associated collagens with interrupted triple helices
GB    generalized Born method
GPC    gel permeation chromatography (gel filtration)
GPSA    Generalized-Born surface area
HEPES    N-2-hydroxyethylpiperazine-N’-2-ethanesulfonic acid
HIC-Up    hetero-compound information center Uppsala
HIV    human immunodeficiency virus
HVR    hypervariable region
IEC    ion exchange chromatography
β-ME    β-mercaptoethanol
MEP    metalloendopeptidase
MES    N-morpholinoethanesulfonic acid
MCA    macromolecular annealing
MD    molecular dynamics
MM    molecular mechanics
MMP    matrix metalloproteinase
MR    molecular replacement
MSA    multiple sequence alignment
MWCO    molecular mass (weight) cutoff
NCS    non-crystallographic symmetry
NMR    nuclear magnetic resonance
PB    Poisson-Boltzmann equation
PBSA    Poisson-Boltzmann surface area
PDB    Protein Data Bank (www.pdb.org)
PEG-xk    polyethylene glycol with a mean molecular mass of x × 10^3 g/mol
pI    isoelectric point
QM    quantum mechanics
RCSB    research collaboratory for structural bioinformatics
Rcryst    crystallographic R factor
Rfree    'free' crystallographic R factor
Rmeas    redundancy-weighted R factor for symmetry-related intensities
Rsym    R factor for symmetry-related reflection intensities
RMS    root mean square
RT    room temperature
SA    surface area
SDS-PAGE    sodium dodecyl sulfate polyacrylamide gel electrophoresis
SVMP    snake venom metalloproteinase
TACE    tumor necrosis factor α converting enzyme
TI    thermodynamic integration
TIMP    tissue inhibitor of metalloproteinases
TNF-α    tumor necrosis factor α
v/v    volume per volume
w/v    weight per volume
XANES    X-ray absorption near-edge structure spectroscopy
ZBG    zinc-binding group

Symbols

a, b, c, α, β, γ    real space cell parameters
a*, b*, c*, α*, β*, γ*    reciprocal space cell parameters
α(hkl)    phase angle of structure factor F(hkl)
Å    Ångström (10^-10 m = 0.1 nm)
B    temperature factor (B factor)
Cα    alpha C-atom of amino acid
χ    solvent content of the crystal
d    lattice spacing
dj    conformation being poled
D    displacement from feature center
Di    RMS deviation between poling distances
ΔG0    free binding energy
ΔS°config    change in configurational entropy
E    normalized structure factor
|E|    normalized structure factor amplitude
f    atomic form factor
Fcalc, Fobs    calculated and observed structure factor amplitudes
g    gravitational constant
h    Planck’s constant
Ĥ    Hamiltonian
hkl (h̄k̄l̄)    Miller index (Miller index of the Friedel mate)
IC50    half maximal inhibitory concentration
i    imaginary unit (i^2 = -1)
Iobs    observed diffraction intensities
k    wave vector of diffracted X-ray radiation
KD    dissociation constant
λ    wavelength
λ (snapshot)    instant moment of the MD simulation
m    figure of merit
Nd    number of poling distances
OA,B    volume overlap between conformer A and B
Ψ    wave function of the system
P(u,v,w)    Patterson function
φ, ψ    main-chain torsion angles
ρel    electron density
r    positional vector in real space
s    positional vector in reciprocal space
σ    standard deviation
T    absolute temperature
Tf    feature tolerance
θ    glancing angle
u    vector in Patterson space
U    Boltzmann-averaged potential energy
u, v, w    coordinates in Patterson space
W    Boltzmann-averaged solvation energy
Wf    feature weight
Wpole    scaling factor in poling algorithm
V0    void volume
VEZ    unit cell volume
VM    Matthews parameter
x, y, z    real space coordinates

1 Introduction

1.1 Metalloendopeptidases

The main function of metalloproteinases is the degradation of other peptides or proteins; their catalytic activity depends on the presence of metal ions. They are classified as either exopeptidases or endopeptidases, depending on whether terminal or internal peptide bonds are cleaved. Metalloendopeptidases (MEPs; EC subclass 3.4.24) are ubiquitous and widely involved in the regulation of metabolism through their ability to extensively degrade proteins as well as to selectively hydrolyze specific peptide bonds (Sternlicht and Werb, 2001). In most cases, MEPs are zinc-dependent, and they are present across all kingdoms of living organisms. They catalyze extensive processing events such as the digestion or degradation of ingested proteins as well as tissue development, maintenance, and remodeling (Gomis-Ruth, 2003). Furthermore, they are involved in activation mechanisms: through specific and limited cleavage of peptide bonds, other enzymes, proenzymes, bioactive peptides, DNA repressors, or the MEPs themselves can be activated or inactivated. Some MEPs are also able to shed soluble forms of cytokines and other bioactive peptides from membrane-anchored precursors. Hence, they are involved in downregulating protein concentrations at cell surfaces and increasing the levels of circulating forms (Neurath and Walsh, 1976; Yong, 2005). Nearly 40 years ago, the first atomic structure of an MEP, thermolysin (Matthews et al., 1972), was reported. Since then, it has been known that the active site of MEPs is characterized by a cleft that accommodates the substrate to be hydrolyzed. Usually, this cleft is subdivided and named according to the peptide side chains interacting with specific pockets of the protein (Schechter and Berger, 1967).
In this scheme, the scissile bond of the substrate, positioned at the catalytic zinc ion, serves as the reference point. The interacting subsites of the protein located on the N-terminal side (unprimed side) are designated S1, S2, and S3 (counted from the scissile bond towards the N-terminus). Subsites interacting with side chains on the C-terminal side (primed side) are named S1', S2', S3', etc. (counted from the scissile bond towards the C-terminus). The corresponding side chains of putative substrates are, consequently, termed P3, P2, P1, (scissile bond), P1', P2', P3' (Figure 1). Any interaction between substrates, inhibitors, small molecules, or other compounds and the binding site of metalloproteinases can thereby be described in detail.
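The Schechter and Berger numbering described above can be illustrated with a short sketch. The function name, the example peptide, and the position of the scissile bond are hypothetical and chosen only for illustration; they do not describe a real BaP1 substrate.

```python
def schechter_berger_labels(peptide, scissile_index):
    """Label substrate residues relative to the scissile bond.

    scissile_index is the number of residues on the N-terminal side of the
    cleaved bond, i.e. the bond lies between peptide[scissile_index - 1]
    and peptide[scissile_index].
    Returns a list of (residue, substrate position, enzyme subsite) tuples.
    """
    if not 0 < scissile_index < len(peptide):
        raise ValueError("the scissile bond must lie between two residues")
    labels = []
    for i, residue in enumerate(peptide):
        if i < scissile_index:
            # Unprimed side: counted from the bond towards the N-terminus.
            n = scissile_index - i
            labels.append((residue, f"P{n}", f"S{n}"))
        else:
            # Primed side: counted from the bond towards the C-terminus.
            n = i - scissile_index + 1
            labels.append((residue, f"P{n}'", f"S{n}'"))
    return labels


# Hypothetical hexapeptide cleaved between the third and fourth residue.
example = schechter_berger_labels("AGLVKT", 3)
for residue, position, subsite in example:
    print(residue, position, subsite)
```

In this toy case the leucine occupies P1 and the valine P1', so the valine side chain would point into the S1' pocket of the enzyme.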

Figure 1 Schematic representation of the interactions of a substrate with a proteolytic enzyme within the active site cleft (Gomis-Ruth, 2003). Enzymatic subsites (S) and side chains of the substrate (P) are termed according to common nomenclature (Schechter and Berger, 1967).

1.1.1 Classification of zinc-dependent metalloendopeptidases

The catalytically important zinc ion is positioned at the bottom of the binding cleft. Commonly, it is coordinated by three protein side chains, mostly histidine, although glutamate, aspartate, lysine, and arginine residues can also fulfil this task. In the majority of cases, MEPs use two histidine residues which occur within a short consensus sequence embedded in an active site α-helix, HEXXH (one-letter code for amino acids; X: any residue). Additionally, a remote side chain (histidine, glutamate, or aspartate) acts as third ligand of the zinc ion (Figure 2). The fourth zinc ligand is a solvent molecule which is activated by an acidic residue, usually glutamate, acting as general base during the catalyzed reaction (Section 1.1.3). With respect to the presence of the zinc ion, these MEPs have been termed zincins.

The other zinc-dependent group of MEPs, termed inverzincins, exhibits an inverted zinc-binding motif, HXXEH (Figure 2). In this case, the third zinc ligand is a remote glutamate residue located in a second α-helix. The overall arrangement is very similar, but the direction of the active site α-helix which comprises the zinc-binding motif is inverted with respect to zincins. Inverzincins are mainly mammalian or insect MEPs, and their structural prototypes are pitrilysin from Escherichia coli (PDB code: 1Q2L) and yeast mitochondrial peptidase (Taylor et al., 2001).

Most of the zinc-dependent MEPs are zincins, which can be further divided into three structurally characterized clans. This classification depends on the nature and position of the third zinc ligand or on a highly conserved loop region. Consequently, the groups are termed gluzincins (glutamate as third ligand), aspzincins (aspartate as third ligand), and metzincins (conserved Met-turn). The most prominent examples of the gluzincins are thermolysins, which possess a second consensus motif, NEXXSD, about 20 residues downstream of the HEXXH motif (Matthews et al., 1972). These residues are located in a second α-helix close to the active site and thereby provide the third zinc ligand, the glutamate residue. Both α-helices and a structurally conserved β-sheet form the binding pocket of this peptidase class. Other examples are neurolysin (Brown et al., 2001), (Oefner et al., 2000), and neurotoxin A and B from Clostridium botulinum (Lacy et al., 1998; Swaminathan and Eswaramoorthy, 2000). The aspzincins show an identical structural arrangement of the two α-helices and the β-sheet, with the sole exception of an aspartate residue as third ligand instead of the glutamate residue. This was confirmed by the structures of deuterolysin from Aspergillus oryzae (Fushimi et al., 1999) and an MEP from Grifola frondosa (Hori et al., 2001). Finally, the third and most abundant subclass of the zincins is that of the metzincins.
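The distinction between the HEXXH motif of zincins and the inverted HXXEH motif of inverzincins can be sketched as a simple pattern search on a one-letter amino acid sequence. This is a minimal illustration only: the function name and the two toy sequences are hypothetical, and because such short motifs also occur by chance, a hit in a real sequence would not by itself prove that a protein is a zinc-dependent MEP.

```python
import re

# "." stands for X (any residue) in the one-letter consensus notation.
ZINCIN = re.compile(r"HE..H")       # HEXXH
INVERZINCIN = re.compile(r"H..EH")  # HXXEH


def classify_motifs(sequence):
    """Return (motif type, 1-based start, matched text) for every hit."""
    hits = []
    for name, pattern in (("zincin", ZINCIN), ("inverzincin", INVERZINCIN)):
        # A lookahead is used so that overlapping matches are not lost.
        for match in re.finditer(f"(?=({pattern.pattern}))", sequence):
            hits.append((name, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy fragments, not taken from any database entry.
print(classify_motifs("AAHEIGHNLGAA"))
print(classify_motifs("AAHFLEHAA"))
```

Assigning a zincin to the gluzincins, aspzincins, or metzincins would additionally require the third zinc ligand or the Met-turn, which this sketch does not attempt to locate.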

Figure 2 Schematic classification of zinc-dependent metalloendopeptidases after Gomis-Ruth (2003). MEPs of the zincin type have the minimal zinc-binding motif, HEXXH, whereas the inverzincins possess the inverted form, HXXEH. Zincins comprise three enzyme families which are termed according to the third zinc ligand or a highly conserved loop: gluzincins (glutamate), metzincins (Met-turn), and aspzincins (aspartate). Within the metzincins, the MMPs, adamalysins, astacins, serralysins, and pappalysins are the major components.

1.1.2 The metzincin clan of metalloendopeptidases

Proteins of the metzincin clan of metalloendopeptidases share an extended zinc-binding motif, HEXXHXXGXXH/D, which comprises the first two zinc-coordinating histidine residues, the third ligand (either histidine or aspartate), and the catalytically essential glutamate residue. Furthermore, these proteins possess a conserved methionine-containing 1,4-β-turn which is located about 20 residues downstream of the first zinc-coordinating histidine residue and gave rise to the name of the subclass (Bode et al., 1993; Stocker et al., 1995; Stocker and Bode, 1995). Enzyme families included in the metzincins are the astacins, the serralysins, and the more recently discovered fragilysins, pappalysins, and gametolysins (Gomis-Ruth, 2003). However, adamalysins and matrix metalloproteinases (MMPs), also called matrixins, are the most important families of MEPs (Figure 2).
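How the extended motif HEXXHXXGXXH/D maps onto the functional residues can be sketched as follows. The function name, the dictionary keys, and the toy sequences are assumptions made for illustration; the residue roles follow the description above (first and fifth position: zinc-binding histidines; second position: catalytic glutamate; last position: third zinc ligand).

```python
import re

# Extended metzincin consensus HEXXHXXGXXH/D; "." stands for any residue.
METZINCIN_MOTIF = re.compile(r"HE..H..G..[HD]")


def find_metzincin_motif(sequence):
    """Locate the first extended zinc-binding motif in a sequence.

    Returns a dict with the matched motif and the (residue, 1-based position)
    of each conserved residue, or None if the motif is absent.
    """
    match = METZINCIN_MOTIF.search(sequence)
    if match is None:
        return None
    start = match.start() + 1  # convert to 1-based numbering
    motif = match.group(0)
    return {
        "motif": motif,
        "zinc_ligand_1": ("H", start),
        "catalytic_glutamate": ("E", start + 1),
        "zinc_ligand_2": ("H", start + 4),
        "conserved_glycine": ("G", start + 7),
        "zinc_ligand_3": (motif[-1], start + 10),  # histidine or aspartate
    }


# Hypothetical toy fragments used only to exercise the pattern.
histidine_variant = find_metzincin_motif("TTHELGHNLGMEHDG")
aspartate_variant = find_metzincin_motif("AAHEAAHAAGAADAA")
print(histidine_variant)
print(aspartate_variant)
```

The sketch reports only the first hit and ignores the Met-turn; a real analysis would normally work on aligned sequences rather than on a bare pattern match.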

Matrix metalloproteinases

The MMP family comprises nearly 30 related zinc-dependent vertebrate MEPs, all of which are involved in the turnover of extracellular matrix and other basement-membrane components. Their substrates include collagens, laminins, fibronectins, vitronectins, aggrecans, entactins, tenascins, elastins, and proteoglycans (Nagase and Woessner, Jr., 1999). Through degradation of these components, matrixins play an important role in many physiological processes, such as embryonic development, morphogenesis, reproduction, and bone remodeling (Hu et al., 2007). Additionally, they are involved in the reorganization of tissues during pathological processes, such as inflammation, wound healing, and invasion of cancer cells (Egeblad and Werb, 2002).

As MMPs are implicated in so many physiological processes, their proteolytic activity has to be controlled precisely during activation from their inactive precursors. The inactive state is maintained by endogenous inhibitors, such as α-macroglobulins and tissue inhibitors of metalloproteinases (TIMPs). Like other proteolytic enzymes, MMPs are first synthesized as inactive proenzymes or zymogens. Usually, self-inhibition is provided by an unpaired cysteine sulfhydryl group of the pro-peptide domain that acts as fourth zinc ligand (Springman et al., 1990). The activation upon removal of the pro-peptide is called the cysteine-to-zinc switch or simply the cysteine switch (Sternlicht and Werb, 2001). As mentioned before, MMPs are also involved in ectodomain shedding of growth factors, growth factor binding proteins, hormones and hormone receptors, and cytokines and cytokine receptors from the cell surface, in the conversion of receptor agonists and antagonists, and in the exposure of cryptic neoproteins (Klein and Bischoff, 2010). Besides, it has been recognized that MMPs cleave many other types of peptides and proteins and have a myriad of other important functions that may be independent of proteolytic activity (Overall and Lopez-Otin, 2002).

Common names of MMPs are generally based on a preferred substrate, complemented by an MMP numbering system that follows the order of discovery. They can be classified into (i) true collagenases, capable of cutting the collagen triple helix at a single site (usually on the N-terminal side of a leucine residue), (ii) gelatinases, targeting denatured collagens and gelatins, and (iii) stromelysins, which have a broad proteolytic activity and may degrade proteoglycans (Yong, 2005). According to domain structure, MMPs can also be categorized into eight different groups (A to H) (Figure 3) (Sternlicht and Werb, 2001). All matrixins comprise a signal peptide (approx. 20 residues) which is cleaved off after directing the synthesis to the endoplasmic reticulum (ER). Most MMPs are secretory proteins, but six are expressed as cell surface enzymes that contain transmembrane domains (Sternlicht and Werb, 2001). Directly connected to the signal peptide are the pro-peptide (approx. 80 residues) and the catalytically active metalloproteinase domain (approx. 160 to 170 residues). The former maintains enzyme latency until it is removed or disrupted, while the latter is responsible for proteolytic activity and contains the consensus motif of the zinc-binding region. Connected via a linker, the so-called hinge region, a hemopexin-like C-terminal domain follows, which is required for cleavage of triple helical interstitial collagens and for pro-MMP activation on the cell surface (Bode and Maskos, 2001). Further insertions may comprise furin-susceptible sites, fibronectin type-II-related domains, transmembrane domains, and glycophosphatidyl inositol-anchoring domains (Figure 3).

Figure 3 Domain structure of MMPs and examples (Sternlicht and Werb, 2001). Pre, signal sequence; Pro, pro-peptide with a free zinc-ligating thiol (SH) group; F, furin-susceptible site; Zn, zinc-binding site; II, collagen-binding fibronectin type II inserts; H, hinge region; TM, transmembrane domain; C, cytoplasmic tail; GPI, glycophosphatidyl inositol-anchoring domain; C/P, cysteine/proline; IL-1R, interleukin-1 receptor. The hemopexin/vitronectin-like domain contains four repeats with the first and last linked by a disulfide bond.

The adamalysin/reprolysin family of metalloendopeptidases

This family of MEPs was originally named after proteins characterized from hemorrhagic and non-hemorrhagic reptilian venoms (reprolysins). The more commonly used name, adamalysins, is based on its structural prototype, the snake venom metalloproteinase (SVMP) adamalysin II from Crotalus adamanteus (Gomis-Ruth et al., 1993). Besides SVMPs (Section 1.2), ADAMs (a disintegrin and metalloproteinase domain containing enzymes) and ADAMTSs (ADAMs with thrombospondin motifs) are prominent mammalian members of the adamalysins. Like the MMPs, adamalysins are able to digest extracellular matrix components, such as type-IV collagen, nidogen, fibronectin, laminin, and gelatin (Gomis-Ruth, 2003). With MMPs, adamalysins also share the structural arrangement of the binding site cleft and a clear preference for substrates with bulky hydrophobic residues at P1' (Figure 1). In contrast to MMPs, they do not contain the hemopexin domain and mostly lack the abovementioned insertions (Figure 3), but are synthesized with different additional domains, such as the disintegrin and the cysteine-rich domain. Interestingly, eight of the known ADAMs lack proteolytic activity of their metalloproteinase domain, among them ADAM22, whose atomic structure was recently published (Edwards et al., 2008; Liu et al., 2009).

The membrane-anchored ADAMs mediate cell-cell fusion and are engaged in adhesion and intracellular signalling, mainly through interactions of their disintegrin and cysteine-rich domains (Murphy, 2008; White, 2003). The presence of the disintegrin domain is unique among all known cell-surface proteins. ADAMs also play a major role in protein ectodomain shedding (Edwards et al., 2008). To date, 22 human ADAMs have been identified, which were numbered in order of their discovery (Klein and Bischoff, 2010). After the first ADAM was discovered in guinea-pig sperm, it became clear that these enzymes possess important biological properties (van Goor et al., 2009; Wolfsberg et al., 1993). Comparison with other metzincins revealed that the catalytic domains of ADAMs and SVMPs are structurally very similar, particularly in the fold and the arrangement of the active site cleft (Gomis-Ruth, 2003; Takeda, 2009). Like MMPs, ADAMs comprise a pro-peptide domain which is generally removed intracellularly during transit through the Golgi system. This pro-peptide regulates the proteolytic activity through a self-inhibitory process, the aforementioned cysteine switch. The pro-domain is linked to the metalloproteinase domain, which contains the metzincin-typical consensus zinc-binding motif. All following domains differ from the domain structure in MMPs (Figure 3). Thus, ADAMs usually comprise the disintegrin domain, the cysteine-rich domain with an epidermal growth factor-like (EGF) repeat, the transmembrane domain, and, finally, the cytoplasmic tail (Figure 4). The disintegrin domain contains a 14-residue stretch that is implicated in interactions between ADAMs and integrins (White, 2003).

Figure 4 Schematic representation of the domain structure of ADAMs (Murphy, 2008). PRO, large amino-terminal pro-peptide; MP, metalloproteinase domain with the consensus sequence of the zinc-binding motif; DIS, disintegrin domain which is disulphide bonded to a cysteine-rich region (CR); the latter region often contains an epidermal growth factor-like repeat (EGF); structural studies revealed a hypervariable region (HVR) in the cysteine-rich domain that is likely to be a major interaction site with other molecules. The ADAM ectodomain is inversely C-shaped, whereas the transmembrane domain (TM) anchors the protein in the membrane and represents the connection to the cytoplasmic domain (CD).

ADAMs have been implicated in a range of diseases, such as diabetes, Alzheimer’s disease, cardiac hypertrophy, cancer, rheumatoid arthritis, Crohn’s disease, and microbial infections (Moss et al., 2008; Murphy, 2008; Murphy, 2009; Seals and Courtneidge, 2003). ADAM17, also known as TACE (tumor necrosis factor alpha converting enzyme), is of particular interest. It is a promising medicinal target, as it cleaves the membrane-bound precursor of tumor necrosis factor alpha (TNF-α) to its soluble form, which is involved in a wide set of diseases ranging from rheumatoid arthritis to tumors (Kenny, 2007). A further subgroup of the adamalysins is formed by the ADAMTSs. Compared to ADAMs, these metzincins lack the transmembrane domain and instead contain multiple C-terminal copies of a thrombospondin 1-like repeat (Apte, 2009). The main functions of the ADAMTSs are the prevention of cell adhesion by binding to integrins, local anchoring to the extracellular matrix, and cleavage of several extracellular matrix components (Gomis-Ruth, 2003). These enzymes are considered to be largely responsible for the cartilage aggrecan catabolism observed during the development of rheumatoid arthritis and osteoarthritis (Apte, 2009).

The metzincin’s topology, fold, and zinc-binding site

All metzincins share an identical topology and fold of the metalloproteinase domain (Figure 5). Usually, these domains are not larger than 220 amino acids (mostly about 200 residues) and are arranged in a globular shape. They can be divided into a large N-terminal and a small C-terminal subdomain (approximately 3:1 ratio according to the amount of residues) which are virtually forming the substrate binding site. As mentioned before, the catalytically essential zinc ion is positioned at the bottom of the binding site cleft and most metzincins require elongated substrates for optimum cleavage efficiency (Stocker et al., 1995). The N-terminal subdomain, also called upper domain according to its position in standard orientation (Gomis-Ruth, 2003), consists of a five-stranded twisted β-sheet and four α-helices. All the β-strands are parallel to each other and to a putative substrate bound in the cleft with the exception of strand IV. The latter strand is crucial for interaction with the substrate, mainly through backbone hydrogen bonds, and simultaneously is forming the upper wall of the active site crevasse. The opposite wall is mainly featured by flexible loop regions in between the invariant Met-turn and the only α-helix of the C-terminal subdomain (Figure 5A). Seven residues of the consensus zinc-binding sequence are part of the fourth α-helix of the N- terminal subdomain (Figure 5B). Thereby, the helix encompasses more than half of the conserved residues including the first two zinc-binding histidine residues. The two zinc-binders are separated by a single α-helical turn which allows a concerted approach to the metal as well as to the catalytic water molecule for the essential general base. The next invariant residue is a glycine at the end of

the helix. Only this type of amino acid can support the main-chain angles required for the observed sharp turn of the backbone at this position. As any other residue would be in a high-energy conformation, only slight variation is known among metzincin structures, such as an elongated loop avoiding the sharp turn (Gomis-Ruth, 2009). Directly after the glycine residue, the small C-terminal subdomain begins and the last four residues of the consensus sequence provide the third zinc-binding histidine residue. The most variable part follows, which differs in amino acid composition and length as well as in conformation. Interestingly, this region contains a sequentially and structurally invariant 1,4-β-turn, the so-called Met-turn. It is composed of three residues, the central one being a methionine residue, which gave rise to the name metzincins for this kind of MEPs. Its exact function is still not completely known, but its position directly underneath the zinc ion suggests that it provides an energetically favorable environment for the zinc ion. Mutagenesis studies have already shown that it is essential for function and for the structural integrity of the active site arrangement. However, a role during folding is also conceivable (Hege and Baumann, 2001; Tallant et al., 2010).

Figure 5 Schematic representation of the topology (A) and the fold (B) of metzincins (Gomis-Ruth, 2003; Murphy, 2008). In the topology plot (represented by the SVMP adamalysin II) β-sheets are indicated as gray arrows and α-helices as light gray cylinders. In the representation of the fold the twelve amino acid residues of the zinc-binding motif HEXXHXXGXXH are depicted as gray dots and are partly indicated with the one-letter code for amino acids. The zinc ion and the highly conserved Met-turn are also shown.

1.1.3 Reaction mechanism of metzincins

As mentioned before, the catalytic activity of metzincins is based on the zinc ion and the highly conserved glutamate residue which directly follows the first zinc-binding histidine residue. For the hydrolysis of peptide bonds, a water molecule is also needed. Usually, in zinc-containing metalloenzymes, this water molecule is the fourth ligand and tetrahedral metal ion coordination is the most common arrangement (Alberts et al., 1998; Auld, 2001). The proposed catalytic mechanism of metzincins and other MEPs can be divided into two steps: the addition and the elimination (Figure 6) (Lovejoy et al., 1994; Stocker and Bode, 1995). In the

first step, the water molecule is added to the peptide bond by nucleophilic attack. This leads to an energetically unfavorable tetrahedral intermediate and requires a considerable energetic driving force. It was proposed that this can be achieved by activation of the water molecule through the catalytically essential general base, which in the case of metzincins is an invariant glutamate residue. The partial acceptance of the water’s proton gives the attacking species a more nucleophilic, nearly hydroxide-like, character, so that the addition step can take place. Additionally, the intermediate state is stabilized by the Lewis-acidic character of the zinc ion, as it lowers the unfavorable negative charge of the peptide’s former carbonyl oxygen atom. In the elimination step the peptide bond is cleaved and the proton of the general base is transferred to the N-terminal end of the cleaved substrate (Ramos and Selistre-de-Araujo, 2006).

Figure 6 Schematic representation of the catalyzed reaction of metalloproteinases after Lovejoy et al. (1994). The reaction can be divided into an addition and an elimination step, whereby the scissile peptide bond passes through a tetrahedral reaction intermediate. Essential for the reaction are the zinc ion, the catalytic water molecule, and the general base.

Dependency of the proteolytic activity on the pH value

Many activity studies on human MMPs and ADAMs, but also on SVMPs, have revealed a dependency of the proteolytic activity of metzincins on the pH value, either in in vitro experiments or in their common physiological environment (Fasciglione et al., 2000; Ramos and Selistre-de-Araujo, 2006; Vuotila et al., 2002; Xu et al., 2004; Zhu et al., 1999). Usually, the physiological pH value at the site of action for SVMPs lies around 7.5 and the pH optimum of the proteolytic activity is shifted towards slightly basic values (Gutierrez et al., 2005; Zhu et al., 1997). Towards acidic pH values, the proteolytic activity decreases rapidly, as was shown in in vitro experiments (Zhu et al., 1997). Studies concerning biosynthesis of SVMPs (Section 1.2.1) have also suggested that the acidic pH of the venom gland lumen, together with the presence of tripeptide endogenous

inhibitors, contributes to the lack of proteolytic activity of SVMPs until injection into external tissue (Robeva et al., 1991). Possible explanations for the pH dependency of the proteolytic activity have been proposed by Ramos and Selistre-de-Araujo (2006) and others (Guan et al., 2010; Wu et al., 2009; Xu et al., 2004; Zhao et al., 2007). In a study on Acutolysin C using X-ray absorption near-edge structure spectroscopy (XANES) at different pH values, it was shown that the overall arrangement of the catalytic zinc ion remained tetrahedral going from pH 8.0 to 3.0, but the zinc coordination distances and the distance between the glutamate and the active site water molecule increased (Zhao et al., 2007). Another possible mechanism for pH-dependent modulation of proteolytic activity could be a more widespread change in enzyme structure (Xu et al., 2004). In a recent work, it was suggested that the lower pH influences the conformation of the second and third zinc-binding histidine residues and hence indirectly affects the coordination state of the metal ion (Guan et al., 2010). A theoretical work by Wu et al. based on MD simulations of Acutolysin A (Wu et al., 2009) has confirmed the proposition that in the native state of SVMPs a water molecule is present which is then activated by the glutamate residue. It could be shown that only the distance between the two hydrogen atoms of the water molecule changes, which indicates the conversion of the water molecule into a hydroxide ion. However, the distance between the glutamate’s oxygen atom and the water’s oxygen atom remains unchanged. Hence, the dependency of the proteolytic activity on the pH value is explained by the probability of the water molecule being present as a hydroxide ion, which is less favorable in acidic than in basic conditions.
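
The last point can be made quantitative with a minimal sketch. It assumes that the zinc-bound water behaves as a simple monoprotic acid described by the Henderson-Hasselbalch relation, which is a simplification and not the model used in the cited studies. The pKa value below is purely illustrative and was not taken from this work or the cited literature; the pH values are those of the deposited crystal structures.

```python
def deprotonated_fraction(ph: float, pka: float) -> float:
    """Fraction of a monoprotic acid present in its deprotonated form
    at a given pH (Henderson-Hasselbalch relation)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# ASSUMPTION: illustrative pKa for the zinc-bound water molecule;
# not a measured or calculated value from this work.
ASSUMED_PKA = 7.0

if __name__ == "__main__":
    # pH values of the four deposited BaP1 complex structures
    for ph in (4.6, 6.5, 7.5, 8.0):
        fraction = deprotonated_fraction(ph, ASSUMED_PKA)
        print(f"pH {ph:.1f}: hydroxide fraction = {fraction:.3f}")
```

Under this assumption the hydroxide fraction rises monotonically with pH, which is in line with the reduced activity reported towards acidic values.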

1.2 Metalloproteinases from snake venoms

Most articles in the field of toxinology are related to snake venoms and their components (Guimaraes and Carlini, 2004), which is not surprising owing to the significant public health relevance of snake-bites. Worldwide about 5.5 million people get bitten by snakes each year (Chippaux, 2008; Kasturiratne et al., 2008). The accumulated knowledge of snake venoms has led to a better understanding of toxins in general and of venom compositions, and to a more successful treatment of envenomations. Snake venoms are complex mixtures of a variety of substances. The dry weight is composed predominantly of proteins; nevertheless, organic low molecular mass compounds and metal ions also account for their complexity. Among the proteins, enzymes such as acetylcholinesterases, ADPases,

L-amino acid oxidases, phospholipases A2, hyaluronidases, metallo-, and serine proteases are included. Additionally, non-enzymatic proteins, like α-neurotoxins, C-type lectin-like proteins (CLPs) and bradykinin-potentiating peptides also contribute to toxicity (Mackessy, 2009).

The envenomation by a snake is, in the majority of cases, a combined effect of local pathological and systemic pathophysiological alterations. While the latter are readily avoided by timely administration of antivenoms, local pathological effects are difficult to neutralize, primarily because of the very rapid development of these lesions (Gutierrez et al., 2009). Although the proteolytic character of the venom of snakes of the genus Bothrops was first mentioned as early as the late 19th century, it took more than half a century until definitive experimental work demonstrated the proteolytic nature of snake venoms (Fox and Serrano, 2009). Subsequently, it was proven that snake venoms contain major quantities of metalloproteinases and, since then, these enzymes have been studied intensively. It is now known that SVMPs are one of the most abundant components of viperid snake venoms. It has been assumed that more than one third of the venom’s content is composed of this kind of proteinases (Fox and Serrano, 2009). Besides myotoxic phospholipases A2, the direct action of SVMPs is the main reason for the complex and severe local pathological manifestations, including edema, blistering, dermonecrosis, myonecrosis and hemorrhage (Gutierrez et al., 2009). In the meantime, many studies have also attributed a significant role to SVMPs in the systemic effects of snake envenomation, predominantly in the haemostatic disturbances leading to systemic bleeding.

1.2.1 Classification and biosynthesis of snake venom metalloproteinases

Twenty-five years ago, the first basis for SVMP classification was given by Bjarnason and Fox with a simple differentiation between small and large hemorrhagic toxins (Bjarnason and Fox, 1983). A few years later, in a more complete assessment, SVMPs were categorized into four main classes (P-I, P-II, P-III, and P-IV) according to the presence or absence of different non-proteinase domains as observed in isolated proteins and their mRNA transcripts (Bjarnason and Fox, 1994). The assignments were as follows (referring to the mature proteins): P-I SVMPs composed of a metalloproteinase domain only; P-II SVMPs being synthesized with the metalloproteinase and a disintegrin domain; P-III SVMPs possessing the metalloproteinase, the disintegrin-like, and a cysteine-rich domain; and finally P-IV SVMPs, in addition to the P-III structure, being synthesized with C-type lectin-like domains connected via disulfide bonds. However, the current categorization consists of only three main classes (P-I, P-II, and P-III) (Fox and Serrano, 2008) (Figure 7). The former P-IV class has been abolished because to date no P-IV mRNA transcript has been observed, suggesting that the P-IV structure could only be a post-translational modification of the P-III proteins. The P-II and P-III classes are further divided into several subclasses (Figure 7),

which became relevant once variable processing as well as dimerization of the proteins was observed (Tallant et al., 2010).
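
The class assignment described above can be summarized as a simple lookup on the domain composition of the mature protein. The following is only an illustrative sketch: the domain labels and the function name are chosen here for clarity, and the subclasses shown in Figure 7 are deliberately not modeled.

```python
# Domain compositions of the mature proteins as summarized in the text.
# The string labels are illustrative names, not an established vocabulary.
P_I = frozenset({"proteinase"})
P_II = frozenset({"proteinase", "disintegrin"})
P_III = frozenset({"proteinase", "disintegrin-like", "cysteine-rich"})
FORMER_P_IV = P_III | {"lectin-like"}


def classify_svmp(domains) -> str:
    """Return the SVMP main class for a collection of domain labels."""
    composition = frozenset(domains)
    if composition == P_I:
        return "P-I"
    if composition == P_II:
        return "P-II"
    if composition == P_III:
        return "P-III"
    if composition == FORMER_P_IV:
        # Treated as a post-translationally modified P-III protein
        # in the current three-class scheme.
        return "P-III (former P-IV)"
    raise ValueError(f"unknown domain composition: {sorted(composition)}")


if __name__ == "__main__":
    print(classify_svmp(["proteinase"]))
    print(classify_svmp(["proteinase", "disintegrin-like", "cysteine-rich"]))
```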

Figure 7 Schematic representation of the three different SVMP main classes with their corresponding subclasses and examples (Fox and Serrano, 2008). (?), processed product has not been identified in venom; P, signal peptide; Pro, pro-peptide domain; Proteinase, metalloproteinase domain; S, spacer; Dis, disintegrin domain; Dis-Like, disintegrin-like domain; Cys-Rich, cysteine-rich domain; Lec, lectin-like domain.

Like most secretory proteins, metalloproteinases from snake venoms begin to be synthesized in the cytoplasm of secretory cells and are then cotranslationally transferred to the ER. Afterwards, they continue their way via the Golgi apparatus to the lumen of the venom gland with the aid of secretory granules (Figure 8). All nascent proteins contain a signal sequence which is essential for guidance to the ER by signal recognition particles. It is removed on the way to the ER and is usually composed of about 18, mostly hydrophobic, amino acid residues (Ramos and Selistre-de-Araujo, 2006). In the ER the proteins undergo folding, disulfide bond formation, N-glycosylation, and in a few cases multimerization processes (Fox and Serrano, 2008). In the following, the different domains of SVMPs are described in detail concerning their function and structure.

Pro-peptide domain

Like MMPs and ADAMs (Section 1.1.2), SVMPs are synthesized as inactive zymogens (Bjarnason and Fox, 1994; Selistre de Araujo and Ownby, 1995). Enzyme latency is maintained by the pro-peptide domain, which is composed of about 200 amino acid residues. A conserved seven-residue motif of the pro-peptide, PKMCGVT, is supposed to interact with the zinc ion as fourth ligand and thereby prevents any proteolytic action (Ramos and Selistre-de-Araujo,

2006). As soon as it is cleaved off, either in an autocatalytic fashion or by the action of other venom proteases (Ramos and Selistre-de-Araujo, 2006; Shimokawa et al., 1996), the metalloproteinase domain becomes proteolytically active. Similar to the mode of action in MMPs (Springman et al., 1990), the mechanism of self-inhibition is called cysteine-switch (Grams et al., 1993). As pro-peptide sequences of SVMPs have been detected in the actual venom in only a few instances (Cominetti et al., 2003), the proteolytic cleavage of the pro-peptide is supposed to occur mainly before the mature form is released into the venom gland (Figure 8). The amino acid sequences of SVMP pro-peptides are highly conserved.

Metalloproteinase domain

The catalytic domain of SVMPs contains about 200, and usually not more than 215, amino acid residues and adopts the known metzincin fold. In detail, there is a major N-terminal (residues 1-149; numbering according to BaP1, PDB code: 2W15) and a minor C-terminal subdomain (residues 150-202) which flank the shallow active site cleft. The metalloproteinase domain is less conserved than the pro-peptide domain. This led to the hypothesis that SVMPs have evolved by gene duplication from a common ancestor and that the metalloproteinase domain has passed through an accelerated evolution process (Paine et al., 1994; Selistre de Araujo and Ownby, 1995). As is known for metzincins, the major subdomain adopts an α/β-fold containing four α-helices and a five-stranded β-sheet, in which only the penultimate strand is antiparallel. The minor subdomain starts immediately after the highly conserved zinc-binding motif and comprises only one α-helix and several loops. However, this subdomain includes the highly conserved Met-turn. The zinc-binding motif can be represented by the following conserved sequence: HEbxHxbGbxHD, comprising the three zinc-binding histidine residues, the catalytically essential glutamic acid, a structurally important aspartic acid, and a conserved glycine residue. Additionally, there are three positions (b) preferentially occupied by bulky hydrophobic residues, while the remaining three residues (x) are not yet reported to be conserved in SVMPs (Fox and Serrano, 2005; Fox and Serrano, 2008). As mentioned before, all members of the metzincins contain the highly conserved Met-turn, which is a 1,4-β-turn. It is positioned in close proximity to the catalytic zinc ion, protruding towards it, but is nevertheless too far away for direct interactions.
The detailed function of this turn is still not completely known, but it was proposed that it forms a hydrophobic patch to ensure an energetically favorable environment for the zinc ion. Mutagenesis studies have already shown that it is essential for function and for the structural integrity of the active site arrangement of different metzincins (Tallant et al., 2010).
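
The consensus HEbxHxbGbxHD translates directly into a regular expression, which may help when scanning sequences for the zinc-binding site. The sketch below is illustrative only: the residue set used for the bulky hydrophobic positions (b) is an assumption made here, since the text does not enumerate the permitted residues, and the example sequence is a hypothetical fragment, not a database entry.

```python
import re

# ASSUMPTION: residues accepted at the "b" (bulky hydrophobic) positions.
BULKY_HYDROPHOBIC = "AVLIMFWY"

# H E b x H x b G b x H D, with "x" matching any residue.
ZINC_MOTIF = re.compile(
    f"HE[{BULKY_HYDROPHOBIC}].H.[{BULKY_HYDROPHOBIC}]G[{BULKY_HYDROPHOBIC}].HD"
)

# Offsets of the three zinc-binding histidines within the 12-residue motif.
HIS_OFFSETS = (0, 4, 10)


def find_zinc_motifs(sequence: str):
    """Return (0-based start index, matched motif) for each motif occurrence."""
    return [(m.start(), m.group()) for m in ZINC_MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical sequence fragment used for illustration only.
    fragment = "QNTVHEMGHNLGIHHDTGS"
    for start, motif in find_zinc_motifs(fragment):
        histidines = [start + offset + 1 for offset in HIS_OFFSETS]
        print(f"motif {motif} at position {start + 1}, "
              f"zinc-binding His at positions {histidines}")
```

Positions are printed 1-based, as is customary for residue numbering, whereas the function itself returns 0-based indices.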

Figure 8 Schematic representation of the hypothetical biosynthetic pathways for the production of the three SVMP classes (Fox and Serrano, 2008). From transcription at the ER surface (I.), through the ER (II.-III.) to the Golgi (IV.). Parentheses indicate a product which has not yet been observed in the venom. P, pro-peptide domain; M, metalloproteinase domain; S, spacer; D, disintegrin domain; DL, disintegrin-like domain; Cys, cysteine-rich domain; L, lectin-like domain.

Disintegrin and disintegrin-like domain

The disintegrin (P-II class) and disintegrin-like domains (P-III class) vary greatly in length (41 to about 100 residues) and in their content of disulfide bonds. They are attached to the metalloproteinase domains via a spacer peptide (Figure 7). This spacer sequence contains about 10 to 15 amino acid residues and provides structural space between the proteinase and the following domains that could be involved in post-translational modification of the nascent SVMP (Fox and Serrano, 2009; Hite et al., 1992). The main structural difference between the disintegrin domains of P-II SVMPs and the disintegrin-like domains of P-III SVMPs is the presence of an additional cysteinyl residue in the latter domains. It is located in the so-called RGD loop and influences interaction with integrins (Fox and Serrano, 2009). Usually, the RGD motif of this loop acts as integrin-binder and it is supposed that the loss of the cysteinyl residue in evolution resulted in the potent integrin-binding inhibitor activity associated with the disintegrins (Calvete, 2005). The presence of disintegrins in venoms has long been known (Huang et al., 1987). However, only when cDNA sequences of SVMPs began to be solved did it become apparent that the active disintegrins commonly found in viperid venoms generally result from proteolytic processing of a precursor bound to the metalloproteinase domain (Huang et al., 1987; Shimokawa et al., 1996).

The disintegrin-like domains are usually monomeric, whereas the disintegrin domains can also be dimeric, either homodimeric or heterodimeric (Calvete et al., 2003). As implied by the domain name, the main function is the interaction with integrins. In 2005, a functional classification for disintegrins based on integrin selectivity was suggested (Marcinkiewicz, 2005). According to their conserved three-residue interaction motif, disintegrins are subdivided into RGD-, MLD-, KTS-, and D/ECD-disintegrins. Structure-function studies have revealed that the disulfide pattern of the disintegrin domain is essential for integrin binding. Additionally, it could be shown that binding affinity and selectivity are influenced by residues flanking the three-residue binding motif and by the C-terminus (Ramos and Selistre-de-Araujo, 2006). To date, not much is known about the consensus elements of snake venom disintegrins, as only a few NMR and X-ray structures are available for analysis. However, comparison of the available disintegrin domain structures revealed significant variance and highly flexible parts (Ramos and Selistre-de-Araujo, 2006). The secondary structure elements are primarily β-turns and short antiparallel β-sheets which are held together via several disulfide bonds. Usually, the functional interaction motif is structurally fixed by two such bonds. Both heterodimeric and homodimeric interfaces seem to be dependent on two N-terminal interchain disulfide bonds and a hydrogen bonding system (Bilgrami et al., 2004; Calvete et al., 2003). Disintegrin-like domain structures of SVMPs are not yet available and it is not known to what extent they differ structurally from disintegrin domains. As mentioned before, the biological effects of the disintegrins are closely related to the binding of integrins.
Thus, depending on the type of bound integrin, disintegrins may regulate a variety of processes, such as platelet aggregation, angiogenesis, metastasis and tumor growth (Ramos and Selistre-de-Araujo, 2006).

Cysteine-rich domain

The name of this domain originates from its large number of cysteinyl residues. Usually, about 13 of these residues are located in the cysteine-rich domain of P-III SVMPs, which is composed of about 112 residues in total. Several studies have demonstrated the functional importance of this domain, which interacts with other proteins such as fibril associated collagens with interrupted triple helices (FACIT collagens), von Willebrand factor, and integrins (Pinto et al., 2007; Serrano et al., 2005; Serrano et al., 2006; Serrano et al., 2007). As is the case for ADAMs (Figure 4), the cysteine-rich domain of SVMPs also comprises a hypervariable region (HVR). It was shown that the HVR interacts with the first lectin-like subunit (Takeda et al., 2007). Furthermore, in a few cases the integrin-binding motifs RGD or KGD have also been found in the cysteine-rich domain of P-III SVMPs (Ito et al.,

2001; Kishimoto and Takahashi, 2002; Mazzi et al., 2007). It was also shown that the cysteine-rich domain is important in collagen binding (Jia et al., 2000).

C-type lectin-like domain

To date, only two P-III SVMPs containing C-type lectin-like domains are known, whereas several C-type lectin-like proteins (CLPs) have already been isolated in pure form from snake venoms. The two mature SVMPs containing C-type lectin-like domains are RVV-X from Daboia russelli siamensis (Takeda et al., 2007) and VLFXA from Vipera lebetina (Siigur et al., 2004). The analysis of VLFXA’s cDNA revealed that this SVMP is encoded in three different RNAs. Accordingly, the protein can be divided into the P-III metalloproteinase (heavy chain) and two C-type lectin-like domains (light chains). This was later corroborated by the sequence of RVV-X. It was suggested that the first light chain is connected to the metalloproteinase domain by a unique disulfide bond. The second C-type lectin-like domain is joined via another disulfide bond between both light chains (Fox and Serrano, 2005). Interestingly, despite the homology to Ca2+-dependent galactose-binding C-type lectins, snake venom C-type lectin-like domains do not bind to galactose (Takeya et al., 1992).

1.2.2 Hemorrhagic activity of snake venom metalloproteinases

The main function of SVMPs is to hydrolyze a variety of extracellular matrix proteins, including those of the basement membranes (BM) surrounding endothelial cells in the microvasculature, thus inducing hemorrhage (Bjarnason and Fox, 1994; Hati et al., 1999). This hemorrhagic activity can occur either locally at the site of venom injection or systemically, affecting multiple organs. The mechanism by which viperid venoms induce hemorrhage depends on the proteolytic degradation of components of the BM in the wall of microvessels, especially capillaries (Markland, 1998a). It has been hypothesized that in vivo hemorrhage occurs by a two-step mechanism (Gutierrez et al., 2005): first, proteins of the BM and of the endothelial cell membrane, involved in cell-BM adhesion, are cleaved, which weakens the mechanical stability of capillaries; thereafter, in a second step, the hemodynamic forces normally operating in the microcirculation, like hydrostatic pressure and shear stress, contribute to the distension and rupture of the capillary wall, with the ensuing extravasation. Thus, hydrolysis of BM components and of the proteins mediating binding to the BM is a key step in the pathogenesis of venom-induced hemorrhage (Gutierrez et al., 2005; Markland, 1998a). The potential of SVMPs to induce hemorrhage varies widely between the different SVMP classes. Usually, the P-III proteins show a higher activity than the P-I or P-II SVMPs. It has been proposed that this is a consequence of the additional disintegrin-like/disintegrin and cysteine-rich

domains of P-III SVMPs. These domains improve the targeting in microvessels and confer the ability to inhibit collagen-induced platelet aggregation. In addition, the susceptibility of SVMPs to plasma proteinase inhibitors, like α2-macroglobulin, is reduced (Gutierrez et al., 2005). Notably, the potential to induce hemorrhagic effects also differs within the P-I class, whose members are very similar in sequence and structure. These enzymes contain only the catalytic metalloproteinase domain, which is responsible for the proteolytic activity on which hemorrhagic activity is based. Interestingly, there are P-I SVMPs which are hemorrhagic whereas others are not, despite having similar proteolytic activity towards diverse substrates (T. Escalante, personal communication). Clearly, there have to be structural determinants in the metalloproteinase domain that enable SVMPs to induce hemorrhage. The identity of such structural features remains, nevertheless, unknown. Many efforts have been made to identify them. A variety of sequence and structural alignments of hemorrhagic and non-hemorrhagic SVMPs have been carried out, which drew attention to a variable loop region surrounding the active site (Watanabe et al., 2003). This part of the metalloproteinase domain contains about 25 amino acid residues and is only interrupted by the highly conserved Met-turn.

1.3 Snake venom metalloproteinases as models for drug design

The use of snake venoms and their components as a basis for drug design and novel therapies has a long tradition and is even more relevant today (Koh et al., 2006; Stocker, 1999). One of the first examples was the development of synthetic inhibitors for angiotensin converting enzyme using the structure of an oligopeptide from Bothrops jararaca venom (Ondetti et al., 1971). Structural similarities between snake venom proteins and human enzymes offer intriguing possibilities to develop inhibiting compounds. Accordingly, several SVMP-inhibitor/substrate complexes have already been used to investigate structure-activity relationships and to design inhibitors for metalloproteinases (Botos et al., 1996; Cirilli et al., 1997; Gomis-Ruth et al., 1998; Huang et al., 2002; Lingott et al., 2009; Lou et al., 2005; Ramos and Selistre-de-Araujo, 2006; Zhu et al., 1999). Different zinc-binding moieties are known that can be used to guide these inhibitors to the active site of metzincins. Among these, hydroxamate derivatives with peptidomimetic scaffolds exhibit a high potential and are therefore of great interest for the development of small molecule inhibitors for SVMPs, ADAMs, and MMPs (Jacobsen et al., 2007). In addition, several examples already exist of SVMPs being employed in pre-clinical and clinical investigations. One of them is the use of fibrinogenolytic non-hemorrhagic SVMPs as thrombolytic agents, such as Fibrolase from Agkistrodon contortrix contortrix (Markland, 1998b; Swenson and Markland, Jr., 2005). An advantage of fibrinolytic non-hemorrhagic P-I SVMPs in comparison to

commonly used thrombolytic agents, such as streptokinase, is their smaller size and the fact that they are not affected by blood serine proteinase inhibitors (Ramos and Selistre-de-Araujo, 2006). Besides the metalloproteinase domains of small P-I SVMPs, disintegrins derived from snake venoms have already been shown to be promising as possible therapeutics against several pathologies including Alzheimer’s disease, inflammation, autoimmune diseases, virus infections, asthma, osteoporosis, thrombosis and cancer (Marcinkiewicz, 2005). Some of them are already in use as pharmaceuticals, such as Aggrastat (Merck®) and Ancrod, also called Viprinex (Knoll®), which have been derived from the snake venom disintegrins Echistatin and Kistrin. For several years, peptidomimetics with RGD repeats have been investigated concerning their potential as cancer treatment agents that inhibit tumor metastasis and angiogenesis (Sheu et al., 1997). Based on the recognition of specific integrins expressed on tumor cells, it was already shown that these synthetic peptide-like molecules can be used to target cells for drug and gene delivery (Chen et al., 2005). Furthermore, blocking of virus infection, such as by HIV, has been analyzed, since many viruses use integrins to enter the cell (Lafrenie et al., 2002; Maher et al., 2005). The potential applications of disintegrins are already well known, but a structure of a disintegrin-integrin complex, which is still not available, would provide relevant information for further drugs derived from SVMPs.

1.4 In silico approaches in drug discovery

The number of protein and protein-ligand complex structures available in the RCSB Protein Data Bank (PDB) (Berman et al., 2000) is constantly growing. As a consequence, structure-based approaches for drug design and screening have become increasingly important, and methods of drug discovery have changed strongly over the past few decades. Searching for and designing drug candidates with the aid of computer programs (in silico) prior to studying them in vitro or in vivo is now a well accepted and generally used approach (Drews, 2000; Jorgensen, 2004). Accordingly, the number of programs and methods working with protein structures, sequences or small molecules has risen significantly, especially over the past few years. In the following, the most important approaches are briefly introduced.

1.4.1 Pharmacophore modeling and virtual screening

Interactions of small molecules with a target protein can be described using a so-called pharmacophore model. A pharmacophore defines a set of structural features which is recognized at a receptor binding site and responsible for the biological activity of the substance (Gund, 2000). A pharmacophore model is the description of chemical features of several biologically active molecules with respect to their conformations. Small molecules can be matched with the

pharmacophore model if they possess identical or bioisosteric functional groups for an interaction with the target. Pharmacophore modeling has evolved into an important and successful method for drug discovery (Guner, 2002; Leach et al., 2010). Virtual screening of large compound libraries is a modern alternative to high-throughput screening of potential inhibitors using biological assays, and is being rapidly adopted due to financial pressures and immensely growing computational power (Koppen, 2009).
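
The matching step can be illustrated with a deliberately simplified sketch, which is not the procedure of any particular pharmacophore program. It assumes that both the model and a single conformer of a candidate molecule are given as lists of typed feature points, and it accepts the molecule if some assignment of its features reproduces all pairwise distances of the model within a tolerance. The feature names, coordinates, and the tolerance are hypothetical. Because only distances are compared, mirror-image arrangements are not distinguished.

```python
from itertools import permutations
from math import dist


def matches_pharmacophore(model, molecule, tolerance=1.0):
    """Check whether `molecule` can satisfy `model`.

    Both arguments are lists of (feature_type, (x, y, z)) tuples.
    A match requires an assignment of molecule features to model features
    with identical types and pairwise distances agreeing within `tolerance`.
    """
    n = len(model)
    for assignment in permutations(range(len(molecule)), n):
        # Feature types must agree position by position.
        if any(molecule[j][0] != model[i][0] for i, j in enumerate(assignment)):
            continue
        # All pairwise distances must agree within the tolerance.
        if all(
            abs(
                dist(model[a][1], model[b][1])
                - dist(molecule[assignment[a]][1], molecule[assignment[b]][1])
            )
            <= tolerance
            for a in range(n)
            for b in range(a + 1, n)
        ):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical three-point model (coordinates in Å).
    model = [
        ("zinc_binder", (0.0, 0.0, 0.0)),
        ("hbond_acceptor", (3.0, 0.0, 0.0)),
        ("hydrophobic", (0.0, 4.0, 0.0)),
    ]
    candidate = [
        ("hydrophobic", (10.0, 14.0, 10.0)),
        ("zinc_binder", (10.0, 10.0, 10.0)),
        ("hbond_acceptor", (13.0, 10.0, 10.0)),
    ]
    print(matches_pharmacophore(model, candidate))
```

The exhaustive search over assignments is only practical for the handful of features typical of such toy examples; screening real libraries requires more efficient matching and conformer handling.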

1.4.2 Protein-ligand docking

One of the most frequently used structure-based in silico techniques is protein-ligand docking (Fradera and Mestres, 2004). The starting point is the three-dimensional structure of the target protein. The method tries to predict the correct binding mode of small molecules at the target binding site. The docking procedure consists of two steps. The first is the calculation of the binding pose and the second the calculation of the binding affinity, the so-called docking score. While the former now typically yields quite reliable results, the latter is still controversially discussed (Warren et al., 2006). Modern docking programs employ different techniques. The most commonly used are heuristic methods or methods based on genetic algorithms, surface similarity, ligand shape, simulated annealing, and molecular dynamics. Generally, they can be separated into flexible and rigid docking approaches, in which the target protein is treated as a flexible object or as one without any degrees of freedom, respectively. The flexible approach is more accurate, as protein flexibility seems to play an important role in ligand docking (Brooijmans and Kuntz, 2003), but it requires considerable computational power. For these reasons, semi-flexible methods have evolved and are the current solution for docking experiments with a sensible cost-to-result ratio. Several techniques exist to decrease the degrees of freedom of the target protein without losing too much of its flexibility.
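
The two steps, pose generation and scoring, can be made concrete with a toy example. The sketch below is not how any of the docking programs mentioned above work: it performs an exhaustive translational search of a rigid ligand on a coarse grid and scores each pose with a plain 12-6 Lennard-Jones term. Rotations, ligand and protein flexibility, electrostatics, and solvation are all omitted, and the parameter values are illustrative assumptions.

```python
from itertools import product
from math import dist


def lennard_jones_score(ligand, receptor, r_min=3.5, epsilon=1.0):
    """Sum of 12-6 Lennard-Jones terms over all ligand-receptor atom pairs.

    Each pair contributes -epsilon at the optimal distance r_min;
    lower (more negative) scores are better.
    """
    total = 0.0
    for atom_l in ligand:
        for atom_r in receptor:
            r = max(dist(atom_l, atom_r), 0.5)  # clamp to avoid division by zero
            q = (r_min / r) ** 6
            total += epsilon * (q * q - 2.0 * q)
    return total


def dock_rigid(ligand, receptor, grid_values):
    """Step 1: enumerate translated poses; step 2: score and keep the best.

    Returns (best_score, best_shift, best_pose).
    """
    best = None
    for shift in product(grid_values, repeat=3):
        pose = [tuple(c + s for c, s in zip(atom, shift)) for atom in ligand]
        score = lennard_jones_score(pose, receptor)
        if best is None or score < best[0]:
            best = (score, shift, pose)
    return best


if __name__ == "__main__":
    # Hypothetical one-atom receptor and one-atom ligand (coordinates in Å).
    receptor = [(0.0, 0.0, 0.0)]
    ligand = [(0.0, 0.0, 0.0)]
    score, shift, pose = dock_rigid(ligand, receptor, [-7.0, -3.5, 0.0, 3.5, 7.0])
    print(score, shift, pose)
```

Even in this minimal form the separation between sampling and scoring is visible: the search loop can be replaced by a genetic algorithm or simulated annealing without touching the scoring function, and vice versa.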

1.4.3 Molecular dynamics simulations

Generally, three-dimensional protein structures gained by X-ray analysis only represent a static picture of a system that is in most cases actually dynamic. Indeed, there are some features in X-ray structures that give an indication of mobility, such as high B factors or unresolved electron density, but the only way to prove actual movement in a given molecule is to capture different conformational states in separate crystal structures. It is known that the function of proteins frequently depends on conformational dynamics. With molecular dynamics (MD) simulations, the gap in understanding the dynamic behaviour of proteins can at least partly be closed (Karplus and McCammon, 2002). MD simulations of biomacromolecules have become a generally used method and, nowadays, lead to reliable results. The detailed motion of how proteins behave over time

helped in many cases to understand the biological function of a target (Lee et al., 2009). Solely regarded, results of MD simulations may be discussed controversely but in combination with experimental data they represent a strong approach to resolve complex biological questions (Sotomayor and Schulten, 2007). However, the creation of protein movies through MD simulations is still limited, due to the demanding computer power, the size of biomacromolecules, and the time scale of the simulation. Nevertheless, it was already possible to perform MD simulations on large macromolecular systems such as the ribosome or entire viral capsids and smaller systems with simulation times up to 100 µs (Freddolino and Schulten, 2009; Sanbonmatsu and Tung, 2007; Zink and Grubmuller, 2009). In the future, the topic of MD simulations will be of even higher significance as many important research problems in modern cell biology require a combined experimental-computational approach and computing has to be a reliable partner (Lee et al., 2009).

1.4.4 Sequence alignments

Compared to the number of protein structures, the number of protein sequences available for computational analyses is rising significantly faster. A fundamental principle in biology states that functionally important residues in proteins are less likely to mutate during evolution than unimportant ones. Thus, a high degree of local sequence conservation indicates functionally or structurally important regions. A reference sequence with known function/structure combined with sequence similarity analysis can be used to annotate, classify or find similar sequences by alignments. Hence, the aim of sequence alignments is to group conserved residues that have the highest sequence similarity.
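The idea that local sequence conservation points to important regions can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length with "-" as the gap character; the toy sequences are hypothetical and are not real SVMP sequences, and the simple identity score stands in for the more elaborate scoring schemes used by alignment programs.

```python
from collections import Counter

def column_conservation(aligned):
    """Per alignment column, the fraction of sequences sharing the most common residue."""
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for column in zip(*aligned):
        residues = [r for r in column if r != "-"]  # gaps never count as conserved
        if not residues:
            scores.append(0.0)
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores

# Hypothetical toy alignment, for illustration only
toy_alignment = [
    "HELGHNLG",
    "HEIGHNFG",
    "HEMGHN-G",
]
scores = column_conservation(toy_alignment)
# Indices of columns that are identical in all sequences
conserved = [i for i, s in enumerate(scores) if s == 1.0]
print(conserved)
```

Columns with a score of 1.0 are fully conserved and would be the first candidates for functionally or structurally important positions.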

1.5 Aims of this work

As mentioned above, SVMPs are interesting targets in drug research, either for the elucidation of their intrinsic biological effects or as models in metalloproteinase inhibitor development. Structural studies on BaP1 from Bothrops asper in complex with a peptidomimetic inhibitor were aimed at gaining further insight into the structure-activity relationship of small molecules inhibiting SVMPs in general and BaP1 in particular. To achieve this goal, the existing BaP1 purification protocol was to be modified with the final aim of obtaining cocrystals and solving the crystal structure of a BaP1*inhibitor complex. Based on this complex structure, different in silico approaches were to be performed to find more effective BaP1 inhibitors. In particular, pharmacophore modeling and virtual screening methods were expected to generate new lead structures of metalloproteinase inhibitors. These procedures were to be established in collaboration with the group of Prof. Thierry Langer and Dr. Gerhard Wolber (now at the University of Berlin) at the Computer-Aided Molecular Design Group, University of Innsbruck. The structural behavior of the active site was to be analyzed with respect to the zinc ion coordination sphere and the role of the catalytically active general base during the proteolytic reaction mechanism. To this end, results from the BaP1*inhibitor complex structures as well as from MD simulations were to be gathered in order to analyze pKa values of the catalytic glutamate residue and the dynamic behavior of the active site. These MD simulations were performed in cooperation with Thomas Steinbrecher of the Department for Theoretical Chemical Biology, University of Karlsruhe. The motivation for the theoretical elucidation of the hemorrhagic mode of action of SVMPs was the lack of an explanation for the connection between proteolytic and hemorrhagic activity. Here, one of the questions to be addressed was how P-I snake venom metalloproteinases that are very similar in sequence and structure can show comparable proteolytic activity towards the same substrates but differ widely in their potential to induce hemorrhagic effects. In a first attempt, thorough sequence and structure alignments of the metalloproteinase domains of SVMPs were to be carried out to gain insight into determinants of hemorrhagic activity. Furthermore, results of MD simulations of hemorrhagic and non-hemorrhagic P-I SVMPs were to clarify whether flexibility might be one of these determinants. The corresponding MD simulations were carried out in collaboration with Hannes Wallnoefer of the Center for Molecular Biosciences, University of Innsbruck.

2 Experimental procedures

2.1 Materials

2.1.1 Appliances

General
Pipettes: 2 µL, 20 µL, 200 µL, 1000 µL and 5000 µL (Gilson); Multipipette Research Pro 10/100 (Eppendorf)
Water purifier: Milli-Q Gradient (Millipore)
Water bath: K4 Electronic (MGW Lauda)
Centrifuges: 5415 C, 5804 R, and 5417 R (Eppendorf); Rotina 35 R (Hettich Zentrifugen)
Vortexer: REAX 2000 (Heidolph); Vortex Genie 2 (Scientific Industries)
Scales: AE 240, PE 1600 and PE 3600 (Mettler); AB184-A3 (Mettler Toledo); Analytical Plus (Ohaus)
Photometer: PR 2210 (Eppendorf)
pH electrodes: IJ44/Meteo (GAT)
Cuvettes: K-cuvets 407 1/1 and UVette 220-1600 nm (Eppendorf)

Protein production and purification
Incubator: Innova 4230 (New Brunswick Scientific)
Autoclave: Varioklav 500 EV (H & P Labortechnik)
Ultrasonication bath: Sonorex RK 106 Super (Bandelin)
Stir cells: 8050 and 8010 (Amicon)
Ultrafiltration membranes: PM 10, PM 30, PM 50 (Amicon)
Centrifugal concentrators: Vivaspin 6 mL, 15 mL (Vivascience)

Proteolytic activity assays
Incubator: HERAcell (Kendro Laboratory Products)
Photometer: Microplate Reader Model 680 (Bio-Rad)

Chromatography
FPLC systems: Äkta Purifier 100 (Amersham Biosciences); Äkta Explorer 100 (Amersham Biosciences); Äkta prime (Amersham Biosciences)
Software: Unicorn Version 3.00 and 3.10.11 (Amersham Biosciences); PrimeView 1.0 (Amersham Biosciences)
Peristaltic pump: ISM321A (Imatec)
Columns/column material: CM Sephadex C-25 (Pharmacia); Affi-gel blue (Bio-Rad); Superdex-200 26/60 (Amersham Biosciences)

Electrophoresis
Power supply: ECPS 3000/150 (Amersham Biosciences)

Crystallization
Coverslips: Ø 21 mm, thickness 2 mm (Hecht-Assistent)
Crystallization trays: Linbro cell culture plates, 24-well (Flow Laboratories); Crystal Clear Strips, 96-well (Hampton Research)
Microscopes: Stemi 2000 (Zeiss); Laborlux-12 POL S (Leitz)
Horse-tail hair for streak seeding: Mare 'Farina', 10 years (M. Uhlmann)
Refrigerators: Wine-cooler (Bosch)
Photo documentation: Camedia C-3030 Zoom (Olympus)

X-ray analysis
Cryo loops: 0.1-1.0 mm (Hampton Research)
Goniometer head: Huber (Hampton Research)
X-ray generators: RU-200B and RU-H2C (Rigaku)
Detector: X 1000 (Siemens); marCCD165 (Marresearch)
Cryo stream controller: 600 series (Oxford Cryosystems)

2.1.2 Chemicals and kits

Chemicals
Common laboratory chemicals were obtained from Fluka, Merck, Roth, Serva or Amersham Biosciences and are not listed separately. Unless noted otherwise, all chemicals were p.a. grade.

Acrylamide, research grade: Fluka
Ammonium sulfate (for crystallization): Fluka
Ammonium peroxodisulfate (APS): Merck
Azocasein: Fluka
Batimastat: Tocris Bioscience
Bromophenol blue: Serva
Coomassie Brilliant Blue R-250: Serva
Dithiothreitol (DTT): Gerbu
EDTA, disodium salt: Gerbu
Glycerol: Serva
HEPES: Gerbu
β-Mercaptoethanol (β-ME): Serva
MES: Fluka
N,N’-Methylene bisacrylamide, research grade: Serva
Norvaline hydroxamate: Sigma
Polyethylene glycol (PEG-xk), various sizes: Fluka
RO-0325337 inhibitor: Roche Pharmaceutics
SDS: Fluka
Silanization solution: Serva
TEMED: Merck
Tosedostat: Tocris Bioscience
Tris: Fluka; Merck
Tween 80: Fluka

Kits for protein crystallography
Crystal Screen I: Hampton Research
Crystal Screen II: Hampton Research
Crystal Screen lite: Hampton Research
Wizard Screen I: Emerald Biostructures
Wizard Screen II: Emerald Biostructures
Wizard Cryo I: Emerald Biostructures
Wizard Cryo II: Emerald Biostructures
JB Screen 1-10: Jena Biosciences
Additive Screen I: Hampton Research
Additive Screen II: Hampton Research
Additive Screen III: Hampton Research

2.1.3 Buffers and solutions

Protein preparation
IEC-A: Tris 50 mM; KCl 100 mM; pH 7.0 (RT, 6 M HCl)
IEC-B: Tris 50 mM; KCl 750 mM; pH 7.0 (RT, 6 M HCl)
AfC-A: Tris 15 mM; pH 8.0 (RT, 6 M HCl)
AfC-B: Tris 15 mM; NaCl 1.5 M; pH 8.0 (RT, 6 M HCl)
GPC: Tris 15 mM; NaCl 200 mM; pH 8.0 (RT, 6 M HCl)
Cryo-buffers (crystals of native BaP1 and crystals of BaP1*inhibitor complex): corresponding crystallization buffers + glycerol 10% (v/v)

Proteolytic activity assay
Reaction buffer: Tris 25 mM; NaCl 150 mM; CaCl2 5 mM; pH 7.4 (RT, 6 M HCl)
Azocasein solution: Tris 25 mM; NaCl 150 mM; CaCl2 5 mM; Azocasein 10 mg/mL; pH 7.4 (RT, 6 M HCl)
Termination solution: Trichloroacetic acid 5% (w/v) in dH2O

SDS-PAGE
Acrylamide stock solution: Acrylamide 39% (w/v); Bisacrylamide 1% (w/v)
Electrophoresis buffer: Tris 50 mM; Glycine 380 mM; SDS 0.1% (w/v); pH 8.3
Stacking gel buffer: Tris 0.5 M; SDS 0.4% (w/v); pH 6.8 (RT, 6 M HCl)
Stacking gel, 6% (w/v): Acrylamide stock solution 0.375 mL; Stacking gel buffer 0.625 mL; dH2O 1.5 mL; APS 10% (w/v) 25 µL; TEMED 2.5 µL
Separating gel buffer: Tris 1.5 M; SDS 0.4% (w/v); pH 8.8 (RT, 6 M HCl)
Separating gel, 12% (w/v): Acrylamide stock solution 1.5 mL; Separating gel buffer 1.25 mL; dH2O 2.25 mL; APS 10% (w/v) 50 µL; TEMED 5 µL
SDS sample buffer: Stacking gel buffer 2 mL; SDS 16% (w/v) 2 mL; Glycerol 4 mL; Bromophenol blue 0.2% (w/v) 1 mL; pH 6.8 (RT, 6 M HCl)
Staining solution: Coomassie Brilliant Blue R-250 0.25% (w/v); Ethanol 30% (v/v); Acetic acid 10% (v/v)
Destaining solution: Ethanol 30% (v/v); Acetic acid 10% (v/v)
Low molecular weight standard (LMW): Phosphorylase B (94 kDa) 64 µg/µL; Bovine serum albumin (67 kDa) 83 µg/µL; Ovalbumin (43 kDa) 147 µg/µL; Carboanhydrase (30 kDa) 83 µg/µL; Trypsin inhibitor (20 kDa) 88 µg/µL; α-Lactalbumin (14 kDa) 121 µg/µL
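The nominal gel percentages listed for the stacking and separating gels can be checked against the recipe volumes with a short calculation. This is a minimal sketch, assuming that the gel percentage refers to total monomer (39% acrylamide plus 1% bisacrylamide in the stock solution) and that the microliter volumes of APS and TEMED can be neglected; the function name is illustrative.

```python
def final_acrylamide_percent(stock_ml, other_ml, stock_percent=40.0):
    """Final total monomer concentration in % (w/v) after diluting the stock solution.

    stock_percent: 39% acrylamide + 1% bisacrylamide (assumption: both count as monomer).
    other_ml: volumes of all other components in mL; APS and TEMED are neglected.
    """
    total_ml = stock_ml + sum(other_ml)
    return stock_percent * stock_ml / total_ml

# Stacking gel: 0.375 mL stock, 0.625 mL stacking gel buffer, 1.5 mL dH2O
stacking = final_acrylamide_percent(0.375, [0.625, 1.5])
# Separating gel: 1.5 mL stock, 1.25 mL separating gel buffer, 2.25 mL dH2O
separating = final_acrylamide_percent(1.5, [1.25, 2.25])
print(stacking, separating)
```

Under these assumptions the recipes reproduce the stated 6% and 12% gels.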

2.2 Protein purification

2.2.1 Venom of Bothrops asper snakes

BaP1 was isolated from a venom pool collected from about 50 adult specimens of Bothrops asper from the Pacific regions of Costa Rica. The first two protein purification steps were carried out at the Instituto Clodomiro Picado in San José (Facultad de Microbiología, Universidad de Costa Rica). After extraction, the venom was lyophilized and stored at -20°C. Before ion exchange chromatography (see Section 2.2.2), between 0.5 and 0.6 g of the lyophilized venom was dissolved in 5 mL of IEC-buffer A and centrifuged (5 min, 1'000*g, 20°C).

2.2.2 Ion exchange chromatography

Purification of the venom was performed by ion exchange chromatography (IEC). The principle is based on attractive ionic interactions between the stationary phase (matrix) and solutes of opposite charge in the mobile phase. Because of the charged groups on their surface, all proteins are polyelectrolytes in solution. Their net charge is variable and pH dependent. The pH value at which the protein has no net charge is called the isoelectric point (pI). Depending on whether the pH of the solution is above or below the pI, the protein carries a negative or positive net charge and binds to an anion or cation exchange resin, respectively. This interaction is sensitive to the concentration of salt ions in solution, because they compete with the protein for the charged groups of the column matrix. Depending on their affinity, proteins elute at different salt concentrations and are thereby separated. With the online tool ProtParam (ExPASy, www.expasy.ch), a pI of 7.4 was calculated for BaP1, which means that the protein carries a positive net charge at pH 7.0 and binds to a negatively charged cation exchange resin. The parameters of the cation exchange column used are summarized in Table 1. The obtained fractions were analyzed for their protein content by SDS-PAGE (Section 2.3.1), pooled and concentrated to a volume of about 5 mL using ultrafiltration membranes (see Section 2.2.3). Subsequently, the protein solution was washed three times with AfC-buffer A in an Amicon cell.

Table 1 Parameters of the cation exchange column.
CM Sephadex C-25
Column material: cross-linked dextran
Expulsion factor: 30
Column diameter [mm]: 30
Column height [mm]: 50
Column volume [mL]: 35
Load volume [mL]: 5
Flow rate [mL/min]: 0.5
Fraction size [mL]: 4
Detection [nm]: 254, 280
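The column volume in Table 1 follows from the column geometry. The sketch below assumes a cylindrical column bed and that the column diameter, like the column height, is given in millimeters; the function name is illustrative.

```python
import math

def column_volume_ml(diameter_mm, height_mm):
    """Volume of a cylindrical column bed in mL (1 cm^3 = 1 mL)."""
    radius_cm = diameter_mm / 20.0   # mm -> cm and diameter -> radius
    height_cm = height_mm / 10.0     # mm -> cm
    return math.pi * radius_cm ** 2 * height_cm

# CM Sephadex C-25 column from Table 1: diameter 30, height 50 mm
cm_sephadex_ml = column_volume_ml(30, 50)
print(round(cm_sephadex_ml, 1))
```

The result of roughly 35 mL agrees with the tabulated column volume, which supports reading the diameter in millimeters.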

2.2.3 Concentration of protein solutions

Large volumes of protein solutions were concentrated in Amicon ultrafiltration cells using ultrafiltration membranes with a molecular weight cut-off (MWCO) of 10 kDa. Smaller volumes were concentrated with VivaSpin concentrators (MWCO 3 kDa). In both cases, overpressure or centrifugal force pushes the solution against a semipermeable membrane, and all components with a molecular weight below the cut-off, including water molecules, pass through the membrane. Larger molecules, like BaP1 (23 kDa), are retained by the membrane and accumulate above it. Concentration in the Amicon cell was performed at 3 to 4 bar and in the VivaSpin concentrators at 3'000*g and 4°C.

2.2.4 Affinity chromatography

In affinity chromatography, a ligand that specifically binds the protein is covalently attached to the insoluble stationary phase (matrix). In the best case, only the protein of interest binds, and all other components and impurities of the solution are washed out directly in the void volume. As mentioned before (Section 2.2.2), salt ions can affect the interaction between ligand and protein, so that the protein can be eluted from the matrix at a certain salt concentration. In this work, Affi-gel blue was used as column material; BaP1 binds specifically to the covalently bound ligand (Cibacron Blue F3GA) and can be eluted selectively with the salt gradient of the AfC-buffer. The parameters of the Affi-gel blue column are given in Table 2. After affinity chromatography, the relevant fractions were analyzed by SDS-PAGE (Section 2.3.1) and the fractions corresponding to BaP1 were pooled. The pooled fractions were washed three times with 50 mL of deionized water and concentrated by ultrafiltration in the Amicon cell (Section 2.2.3). Subsequently, the desalted protein solutions were lyophilized for transportation.

Table 2 Parameters of the affinity chromatography.
Affi-gel blue
Column material: Cibacron Blue F3GA
Column diameter [mm]: 20
Column height [mm]: 100
Column volume [mL]: 30
Load volume [mL]: 5
Flow rate [mL/min]: 0.4
Fraction size [mL]: 5
Detection [nm]: 280

2.2.5 Gel permeation chromatography

In gel permeation chromatography (GPC, also called gel filtration) proteins are separated according to their size. This method can be used to determine the native molecular mass of proteins, because a roughly globular shape of the proteins can be assumed and the size correlates well with the molecular mass. The gel used in GPC consists of small beads with defined pores. Solutes larger than the average pore diameter elute in the void volume (V0) of the column. Molecules smaller than the average pore diameter can enter the beads and are thus delayed in their elution. The elution volume is directly correlated to the hydrodynamic radius and, hence, to the molecular mass of the protein. The molecular mass can be determined by comparing the elution volume to those of protein standards of known size. GPC was used as the final purification step and was added to the established purification procedure of Gutiérrez et al. (1995). To prevent unspecific interactions between protein and column material, the GPC-buffer contained 200 mM NaCl. The parameters of the GPC column used are summarized in Table 3.

Table 3 Parameters of the GPC chromatography.
Superdex-200 26/60 prep grade
Column material: agarose cross-linked with dextran
Isolation range [kDa]: 10-600
Column diameter [mm]: 26
Column height [mm]: 600
Column volume [mL]: 320
Load volume [mL]: 4-5
Flow rate [mL/min]: 1.0
Fraction size [mL]: 4
Detection [nm]: 280

Prior to GPC, the lyophilized protein was dissolved in 4 to 5 mL of GPC-buffer. To remove suspended particles, the solutions were centrifuged (12'000*g, 10 min, 4°C) and subsequently sterile-filtered using 0.2 µm filters. After GPC, BaP1-containing fractions were analyzed by SDS-PAGE (Section 2.3.1) and selected fractions were pooled. Using VivaSpin concentrators (Section 2.2.3), the samples were washed three times with deionized water and concentrated to a volume of about 1 mL.

2.3 Protein characterization

2.3.1 Discontinuous sodium dodecylsulfate polyacrylamide gel electrophoresis

Discontinuous sodium dodecylsulfate polyacrylamide gel electrophoresis (SDS-PAGE) is used to assess the purity and amount of proteins in solution and to determine the molecular weight of unknown proteins (Laemmli, 1970). The gel is prepared by copolymerization of acrylamide with the bifunctional crosslinker N,N’-methylenebisacrylamide. The degree of crosslinking can be altered by changing the ratio of the two monomers. During electrophoresis, the proteins are first concentrated in a large-pored stacking gel at a pH value of 6.8 and then separated in a narrow-pored separating gel (pH value 8.8) (Righetti, 1990). The principle of SDS-PAGE is based on the masking of the intrinsic charge of proteins by the anionic detergent sodium dodecylsulfate (SDS). Because SDS binds in a fairly constant mass ratio (protein:SDS 1:1.4), the electrophoretic mobility of the proteins becomes linearly correlated to the logarithm of their molecular weight. Owing to the molecular sieve effect of the gel, the proteins migrate as bands through the gel. By comparison with the mobility of marker proteins of known mass, the molecular weight of the protein can be calculated. In this work, protein samples (1 to 30 µg) were mixed with 10 µL SDS sample buffer and denatured at 95°C. Electrophoresis was carried out under reducing and non-reducing conditions (with and without β-ME) until the bromophenol blue band had reached the lower edge of the gel (30 to 40 min). After electrophoresis, gels were stained with the dye Coomassie Brilliant Blue R-250 by brief boiling in staining solution. Excess dye was removed by incubating the gel in destaining solution. Subsequently, gels were placed between two sheets of cellophane and dried at RT.
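The comparison with marker proteins can be sketched as a least-squares fit of log10(mass) against relative mobility. The marker masses are those of the LMW standard listed in Section 2.1.3; the relative mobilities (Rf values) are hypothetical and serve only to illustrate the calculation, and the function names are illustrative.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Marker masses of the LMW standard in kDa
marker_kda = [94, 67, 43, 30, 20, 14]
# Hypothetical relative mobilities (band distance / dye front distance)
marker_rf = [0.10, 0.22, 0.38, 0.52, 0.68, 0.82]

slope, intercept = fit_line(marker_rf, [math.log10(m) for m in marker_kda])

def estimate_mass_kda(rf):
    """Estimate the molecular mass of a band from its relative mobility."""
    return 10 ** (intercept + slope * rf)

print(round(estimate_mass_kda(0.60), 1))
```

The fit is only meaningful within the range spanned by the markers; bands outside that range should not be extrapolated.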

2.3.2 Photometric determination of protein concentrations

Lambert-Beer’s law allows the calculation of the concentration of a protein solution from its absorption at 280 nm (Equation 1). The molar decadic extinction coefficient ε280 can be calculated from the amino acid sequence by summing the extinction coefficients of tryptophan, tyrosine, and cysteine residues, the only residues with appreciable absorption at 280 nm (Gill and von Hippel, 1989) (Equation 2).

Equation 1: $c\,[\mathrm{mg/mL}] = \dfrac{A_{280} \cdot M_r\,[\mathrm{Da}]}{\varepsilon_{280}\,[\mathrm{M^{-1}\,cm^{-1}}] \cdot d\,[\mathrm{cm}]}$

with A280 absorption at 280 nm, Mr molecular mass, d path length

Equation 2: $\varepsilon_{280} = (n_{\mathrm{Trp}} \cdot 5690 + n_{\mathrm{Tyr}} \cdot 1280 + n_{\mathrm{Cys}} \cdot 120)\ \mathrm{M^{-1}\,cm^{-1}}$

with ε280 molar decadic extinction coefficient, nx number of residues of type x

For BaP1 with three tryptophan, seven tyrosine, and six cysteine residues, the molar decadic extinction coefficient was calculated as ε280 = 26'930 M⁻¹ cm⁻¹. The calculated relative molecular mass is 22'733 Da. Both values were calculated with the online tool ProtParam (ExPASy, www.expasy.ch).
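Equation 1 can be applied directly with the values reported for BaP1. The sketch below uses the extinction coefficient and molecular mass stated above; the absorbance reading of 0.5 and the 1 cm path length are hypothetical example inputs, and the function name is illustrative.

```python
def protein_concentration_mg_per_ml(a280, molar_mass_da, epsilon_280, path_cm=1.0):
    """Equation 1: c [mg/mL] = A280 * Mr / (epsilon280 * d).

    A280 / (epsilon280 * d) is the molar concentration in mol/L;
    multiplying by Mr in g/mol gives g/L, which equals mg/mL.
    """
    return a280 * molar_mass_da / (epsilon_280 * path_cm)

BAP1_MASS_DA = 22733.0       # reported molecular mass of BaP1
BAP1_EPSILON_280 = 26930.0   # reported extinction coefficient in M^-1 cm^-1

# Hypothetical reading: A280 = 0.5 in a 1 cm cuvette
conc = protein_concentration_mg_per_ml(0.5, BAP1_MASS_DA, BAP1_EPSILON_280)
print(round(conc, 3))
```

With these inputs, an absorbance of 1.0 corresponds to roughly 0.84 mg/mL BaP1.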

2.3.3 Proteolytic activity of BaP1 and protease inhibition assay

The proteolytic activity of BaP1 was assayed by nonspecific cleavage of azocasein (Wang et al., 2004). Purified BaP1 at different concentrations in 20 µL of reaction buffer was mixed with 100 µL of the azocasein solution. After an incubation time of 90 min at 37°C, 200 µL of termination solution were added to quench the reaction and the samples were centrifuged for 10 min at 4'000*g. Then, 100 µL of the supernatant was mixed with an equal volume of 0.5 M NaOH in dH2O and the absorbance was recorded at 450 nm. For inhibition testing, BaP1 (0.7 mg/mL) was preincubated for 30 min at 37°C with different concentrations of the inhibitor in reaction buffer before adding the azocasein solution. Assays were performed at least in triplicate. Statistical analysis and non-linear regression were performed using Prism 5.0 (GraphPad Software, Inc.).
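The evaluation of such inhibition data can be sketched as follows. This is not the non-linear regression performed in Prism; it is a simpler log-linear interpolation between the two measured concentrations that bracket 50% inhibition. The blank correction, all absorbance values, and the concentrations are hypothetical, and the function names are illustrative.

```python
import math

def percent_inhibition(a_inhibited, a_uninhibited, a_blank=0.0):
    """Inhibition in % from absorbances at 450 nm (blank subtraction is an assumption)."""
    return 100.0 * (1.0 - (a_inhibited - a_blank) / (a_uninhibited - a_blank))

def ic50_by_interpolation(concentrations, inhibitions):
    """Interpolate on a log10 concentration scale where inhibition crosses 50%.

    concentrations must be positive and sorted in ascending order.
    """
    points = list(zip(concentrations, inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= 50.0 <= i_hi:
            if i_hi == i_lo:
                return c_lo
            frac = (50.0 - i_lo) / (i_hi - i_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("50% inhibition is not bracketed by the data")

# Hypothetical data: inhibitor concentrations in µM and absorbances at 450 nm
conc_um = [0.01, 0.1, 1.0, 10.0]
absorbance = [0.74, 0.60, 0.30, 0.10]
control, blank = 0.80, 0.05

inhibition = [percent_inhibition(a, control, blank) for a in absorbance]
ic50 = ic50_by_interpolation(conc_um, inhibition)
print(round(ic50, 2))
```

For replicate measurements and confidence intervals, a fit of a full dose-response model, as done by the regression software, is preferable to this interpolation.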

2.4 Protein crystallization

2.4.1 Crystal lattices and symmetry

The identical repeating unit of a crystal is its unit cell. It is periodically arranged in the same way in all three directions in space. By definition, a unit cell is the smallest volume in a crystal that can be related to a neighboring one by a pure translation operation. It preserves all structural information of the crystal, including symmetry. The unit cell is defined by the vectors $\vec{a}$, $\vec{b}$, and $\vec{c}$. The lengths of the three cell axes a, b, and c together with the corresponding angles α, β, and γ are called the unit cell parameters. The whole crystal lattice can be described by the symmetry elements occurring within the crystal together with lattice translations. In total, there are 32 possibilities to combine the symmetry operations allowed in crystals (the crystallographic point groups); together with translational symmetry, they give rise to 230 distinct space groups (International Tables for Crystallography, 1996). Every space group is defined by a specific set of symmetry operations. The asymmetric unit is the smallest part of the unit cell that cannot be further reduced through symmetry operations. The full unit cell can always be generated by applying all symmetry operators of the space group to the asymmetric unit, which means that the aim of structure analysis is to obtain the atomic information of the asymmetric unit. Taking the 32 possible point groups into account, seven crystal systems exist: the triclinic, monoclinic, orthorhombic, tetragonal, trigonal, hexagonal and cubic systems. Besides a primitive cell description, face- and body-centered lattices exist in some crystal systems, giving rise to 14 possible Bravais lattice types. Not all space groups are relevant to protein crystallography. Because biological macromolecules are always chiral, only rotation and screw axes are allowed. The number of allowed point groups is thus reduced to 11 and the number of space groups to 65.
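The unit cell parameters determine the unit cell volume, which is needed for the packing considerations in the next section. The sketch below uses the general triclinic formula, which reduces to a·b·c for orthogonal axes; the cell dimensions in the examples are hypothetical and the function name is illustrative.

```python
import math

def unit_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Unit cell volume in Å^3 from axis lengths in Å and angles in degrees.

    V = a*b*c * sqrt(1 - cos^2(alpha) - cos^2(beta) - cos^2(gamma)
                     + 2*cos(alpha)*cos(beta)*cos(gamma))
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha_deg, beta_deg, gamma_deg))
    return a * b * c * math.sqrt(1.0 - ca ** 2 - cb ** 2 - cg ** 2 + 2.0 * ca * cb * cg)

# Hypothetical orthorhombic cell: all angles 90 degrees
v_ortho = unit_cell_volume(40.0, 60.0, 80.0, 90.0, 90.0, 90.0)
# Hypothetical hexagonal cell: a = b, gamma = 120 degrees
v_hex = unit_cell_volume(50.0, 50.0, 100.0, 90.0, 90.0, 120.0)
print(round(v_ortho), round(v_hex))
```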

2.4.2 Solvent content in protein crystals

Because protein molecules remain hydrated upon crystallization, protein crystals contain a considerable amount of water. While some water molecules are specifically bound to the protein, the majority is present as unstructured bulk solvent. A survey of water content in protein crystals led to the definition of the packing parameter VM (Equation 3) (Matthews, 1968).

Equation 3: $V_M = \dfrac{V_{EZ}}{M \cdot z \cdot n}$

with VEZ unit cell volume [Å³], M molecular mass of the protein [Da], z number of molecules in the asymmetric unit, n number of asymmetric units per unit cell

The packing parameter VM describes the volume [Å³] occupied by each molecular mass unit [Da]. Usually, for soluble proteins, it is in the range of 1.7 to 3.5 Å³/Da, with the most common value being ca. 2.2 Å³/Da (Kantardjieff and Rupp, 2003). With this estimate, it is possible to calculate the number of protomers z per asymmetric unit in an unknown crystal solely from the unit cell parameters and the molecular mass of the protein. Conversely, this relationship can be used to calculate the solvent content χ of a crystal using the most common mean density of 1.35 g/cm³ for proteins.

Equation 4: $\chi = 1 - \dfrac{1.23}{V_M}$

Only larger proteins and membrane proteins tend to crystallize with a higher solvent content; usually, crystals of soluble, globular proteins contain about 20 to 80% solvent (Matthews, 1975).
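The use of the packing parameter to estimate the number of protomers per asymmetric unit can be sketched as follows. The sketch assumes that the solvent content takes the usual form χ = 1 − 1.23/VM for a protein density of 1.35 g/cm³. The unit cell volume and the value n = 4 are hypothetical example inputs; only the molecular mass of BaP1 is taken from the text, and the function names are illustrative.

```python
def matthews_coefficient(cell_volume_a3, mass_da, z, n):
    """Packing parameter V_M in Å^3/Da (Equation 3)."""
    return cell_volume_a3 / (mass_da * z * n)

def solvent_content(v_m):
    """Solvent fraction of the crystal (Equation 4), assuming 1.35 g/cm^3 protein density."""
    return 1.0 - 1.23 / v_m

def plausible_z(cell_volume_a3, mass_da, n, z_max=8, v_min=1.7, v_max=3.5):
    """All z for which V_M falls into the range typical of soluble proteins."""
    return [z for z in range(1, z_max + 1)
            if v_min <= matthews_coefficient(cell_volume_a3, mass_da, z, n) <= v_max]

BAP1_MASS_DA = 22733.0
cell_volume = 192000.0   # hypothetical unit cell volume in Å^3
n_asu = 4                # hypothetical: four asymmetric units per unit cell

z_candidates = plausible_z(cell_volume, BAP1_MASS_DA, n_asu)
v_m = matthews_coefficient(cell_volume, BAP1_MASS_DA, 1, n_asu)
chi = solvent_content(v_m)
print(z_candidates, round(v_m, 2), round(chi, 2))
```

For this hypothetical cell only one molecule per asymmetric unit gives a packing parameter within the usual range, corresponding to a solvent content of about 42%.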

2.4.3 Protein crystallization

Single protein crystals of sufficient size and quality are required to perform X-ray structure analysis. However, protein crystallization depends on a variety of parameters such as pH value, protein purity, additives, temperature, ionic strength of the protein solution, and the concentration of proteins, precipitants, and salts in the solution. Most crystallization methods rely on the principle of slowly raising the concentration of a precipitant that binds solvent molecules, thereby removing them from the hydration sphere of the protein molecules in solution. Ideally, this strengthens protein-protein interactions and leads to the formation of crystals. The process of crystallization can be described in terms of a phase diagram. As solvent molecules are withdrawn, the protein solution becomes supersaturated and a metastable state is reached. In this state, existing crystal nuclei grow, but new ones do not appear. Only in the labile state, at even higher degrees of supersaturation, do new crystal nuclei appear while the existing ones continue growing. The precipitation zone, however, should not be reached, because there the formation of unspecific protein aggregates is favored (McPherson, 1990). As the goal of protein crystallization is the growth of large single crystals, the aim is to reach the labile state soon after setting up the drop. In this way, only a few nuclei are spontaneously formed, and the system should then quickly cross over to the metastable zone. Ideally, most of the protein present in the drop is used for crystal growth and not for the formation of new nuclei (McPherson, 1990). Common precipitants are inorganic salts of the Hofmeister series, water-soluble polymers, and organic solvents.

Figure 9 Phase diagram of the crystallization process. The goal of crystallization experiments is to reach the metastable zone for crystal growth after the nucleation zone (Chayen and Saridakis, 2008).

Initial screening and refinement of crystallization conditions

To date, it is still not possible to rationally predict crystallization conditions for a given protein. Therefore, it is useful to search for suitable conditions with the aid of so-called screens. Mostly, bifactorial grid screens and multifactorial sparse matrix screens are used (Carter, Jr. and Carter, 1979). Many factors are investigated at the same time by the latter method, whereas only two parameters are varied by the former, for example precipitant concentration and pH value. For multifactorial screens it would be impractical to exhaustively test all variations. This method is mostly used for the initial search for potential crystallization conditions. It represents a sparse sampling of parameter space, based on a statistical analysis of conditions that led to successful crystallization of different proteins in the past. Typically, three to four factors such as the buffer ion, pH value, precipitant type and concentration, and an additional salt are varied (Jancarik et al., 1991). As soon as conditions yielding protein crystals are found, bifactorial screens are set up for further refinement of the conditions. The aim is to grow larger single crystals of the protein that are suitable for X-ray analysis. Additionally, the influence of additives and detergents on the crystallization of BaP1 was investigated in this work (Cudney et al., 1994). There are different methods for protein crystallization. Mainly, vapor diffusion and batch crystallization methods are used nowadays (McPherson, 1990; Unge, 1999). In this work, the vapor diffusion method was used exclusively, with hanging and sitting drop setups. Protein solution and crystallization buffer are mixed in a drop, usually in a 1:1 ratio, and placed next to the reservoir solution, either as hanging drop or as sitting drop (Figure 10). Air-tight sealing with coverslip and vacuum grease or with Crystal Clear Tape, respectively, creates the necessary closed system, and the concentration gradient between drop and reservoir can dissipate slowly through vapor diffusion. In most cases, only water vapor passes from drop to reservoir, which raises the precipitant concentration in the drop. However, if volatile precipitants are used, these can also pass through the gas phase, but in the opposite direction. Both effects lead to a slow rise of precipitant concentration in the drop, eventually leading to supersaturation and crystal formation.

Figure 10 Schematic representation of hanging drop (A) and sitting drop (B) methods.

Initial screening was performed in 96-well Crystal Clear trays using the sitting drop method (Figure 10B). The advantages of this method are that less protein and buffer solution is needed and that more conditions can be tested. For each drop, 1.0 µL protein solution was mixed with an equal amount of reservoir buffer. The reservoir volume was 70-100 µL. Further refinement of suitable crystallization conditions was carried out in 24-well Linbro plates using hanging drops on silanized coverslips (Figure 10A). In this case, drop sizes were between 2 and 5 µL for final crystal production, and the wells were sealed with vacuum grease above 500 µL reservoir buffer. All crystallization trays were stored at a constant temperature of 20°C and regularly inspected with an optical microscope. Prior to the crystallization experiments, protein was dissolved at different concentrations (5 to 14 mg/mL) in deionized water or buffer.
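The concentration changes in a vapor diffusion drop can be estimated with a strongly idealized sketch. It assumes that only water is exchanged through the gas phase, that the reservoir is so much larger than the drop that its composition stays constant, that the protein solution itself contains no precipitant, and that no protein is lost to precipitation or crystal growth. The numerical inputs are hypothetical and the function name is illustrative.

```python
def drop_after_equilibration(protein_stock_mg_ml, reservoir_precipitant,
                             protein_ul, reservoir_ul):
    """Idealized start and end state of a vapor diffusion drop.

    Water leaves the drop until its precipitant concentration equals that
    of the reservoir (assumptions: see the text above this block).
    """
    drop_ul = protein_ul + reservoir_ul
    start_protein = protein_stock_mg_ml * protein_ul / drop_ul
    start_precipitant = reservoir_precipitant * reservoir_ul / drop_ul
    shrink = start_precipitant / reservoir_precipitant  # final / initial drop volume
    return {
        "start_protein_mg_ml": start_protein,
        "start_precipitant": start_precipitant,
        "final_volume_ul": drop_ul * shrink,
        "final_protein_mg_ml": start_protein / shrink,
        "final_precipitant": reservoir_precipitant,
    }

# Hypothetical example: 1 µL of 10 mg/mL protein + 1 µL reservoir with 20% precipitant
state = drop_after_equilibration(10.0, 20.0, 1.0, 1.0)
print(state)
```

In this idealized picture a 1:1 drop starts at half the stock protein concentration and half the reservoir precipitant concentration, and both return to the full values once the drop has shrunk to half its volume.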

Seeding methods

As already mentioned, the goal of every crystallization experiment is to obtain large, single, diffraction-quality crystals for subsequent X-ray analysis. However, in many cases only large numbers of small crystals are obtained during crystallization. To grow large single crystals, nucleation and crystal growth have to be uncoupled. Several methods have been applied to achieve this, such as the use of gels to lower convection (Biertumpfel et al., 2002) and hanging the freshly set up coverslip over a well with a lower precipitant concentration than the original one. The most effective method is the introduction of preformed crystal seeds from outside, known as crystal seeding (McPherson, 1990), for which three methods are available.

One has to differentiate between seeding with complete crystals (macroseeding) and seeding with small pieces of crystals (microseeding) (Bergfors, 1999; D'Arcy et al., 2003). In macroseeding, large crystals are washed in order to remove the outermost surface and expose intact growth layers, and are then placed in a freshly set up drop where the old crystal continues to grow. It is important to keep the precipitant concentration at low levels to prevent crystal nucleation and promote growth of the old crystals. However, a precipitant concentration that is too low leads to dissolution of the crystal. The quality of the transferred crystal is also crucial. In microseeding, an existing crystal is finely crushed and the resulting seed stock is used in dilution to inoculate a freshly set up drop, where the microscopic nuclei then give rise to large crystals. With respect to the origin of the seed crystals, homogeneous seeding and cross seeding have to be distinguished. In the latter method, the nuclei for seeding originate from crystals of another protein, of a mutant of the protein, or of the protein with a different ligand. In particular, crystals of mutated proteins or of proteins complexed to a different ligand are used. The success of this method depends on the space group of the seed crystal, which should be the same as that of the new crystal. An alternative to microseeding is the so-called streak seeding, in which an existing crystal is touched with an animal hair (e.g. horse tail hair), thereby picking up nuclei from the crystal surface. The hair is then pulled through a fresh drop, dislodging some of the nuclei, which can then continue to grow into large crystals (Figure 11). Successive pulling of the hair through several fresh drops in series reduces the number of nuclei from drop to drop and thereby creates variability in the number and size of the growing crystals. Another parameter is the shape of the seeding line. In this work, a horse tail hair was touched to an old crystal and then used to inoculate 6 to 12 drops before recharging the hair with nuclei. The streaking was either a circular movement or just a single point contact with the new drop, which contained protein and buffer solutions as mentioned before. Prior to every streak seeding experiment, the horse tail hair was degreased with ethanol and then washed with deionized water.


Figure 11 Schematic representation (A) of the streak seeding technique (Bergfors, 1999) and crystals grown along the seeding line (B) (Lingott, 2005).

Crystal soaking

In general, protein crystals contain a high quantity of solvent (Section 2.4.2), which is organized in discrete networks of solvent channels within the crystal. Molecules smaller than the diameter of the solvent channels can therefore penetrate the crystal and reach specific protein binding sites by diffusion. This property is used to prepare protein-ligand complexes, e.g. with inhibitors or substrates, by soaking crystals in appropriate solutions of the compounds (Petsko, 1985). Binding of the compound can induce conformational changes in the protein or alter the crystal packing. This often leads to deterioration of the crystal lattice and loss of diffraction, but it can also improve the diffraction properties if compound binding leads to stronger lattice contacts.

Soaking experiments to obtain BaP1*inhibitor cocrystals

In this work, BaP1*inhibitor complexes were produced by transferring crystals of native BaP1 (containing only the protein and possibly metal ions) into freshly set up reservoir and cryo buffer drops which contained the inhibitor at a concentration between 1 and 5 mM. The soaking time was varied from a few minutes to several hours, while the morphology of the crystals was checked regularly. The synthetic hydroxamate derivative used in this study inhibited BaP1 significantly, as judged by its half maximal inhibitory concentration (IC50) (Section 2.3.3). This peptidomimetic was originally designed as a TACE inhibitor (RO-0325337, Roche Pharmaceutics, personal communication). Because TACE and BaP1 have structurally very similar binding sites, it was then used to analyze its effects on snake venom metalloproteinases. The inhibitor is depicted in Figure 12 and is hereafter abbreviated as RO-inhibitor.


Figure 12 Structure of the BaP1 inhibiting peptidomimetic hydroxamate derivative RO-0325337.

Cocrystallization experiments for the BaP1*inhibitor complex

In some cases, the binding site of a protein in the crystal is blocked by crystal contacts or is not accessible to the compound without a conformational change of the protein. If soaking experiments are not successful, cocrystallization is an alternative. Here, the protein is incubated in a ligand-containing solution prior to the crystallization experiments. The cocrystallization of BaP1 with the inhibitor RO-0325337 was established during this work and was applied both in the initial screening and in the refinement of the crystallization conditions. First, the inhibitor was dissolved in double distilled water and then added to the protein at a four times higher concentration (1.5 mM). After incubation for 20 min at 4°C, the protein-inhibitor solution was directly used for crystallization trials.

2.4.4 Crystal mounting

Because of the high quantity of solvent in protein crystals (Section 2.4.2), they have to be protected from dehydration during data collection. Otherwise, their integrity would be destroyed and the crystal would no longer be suitable for X-ray analysis. The experiments can be set up at room temperature or at low temperatures. Low temperature measurements strongly reduce the effects of secondary radiation damage, which are time and temperature dependent, in contrast to primary radiation damage, which is solely dose dependent (Garman, 1999; Gonzalez and Nave, 1994). To set up low temperature measurements, crystals are picked up from the mother liquor and placed in a nitrogen stream cooled to 100 K, or directly plunged into liquid nitrogen (Figure 13). This is achieved with small nylon loops (cryo loops) (Figure 13B). Low temperature experiments make it possible to use the intense radiation of synchrotrons for the X-ray analysis of protein crystals. At room temperature, the crystal is mounted in a thin-walled glass capillary. During this work, data were collected exclusively at low temperatures.


Figure 13 Schematic setup of the equipment for X-ray experiments (A) and cryo-pin with cryo-loop and cryo-vial (B) for crystal mounting and storage at low temperatures (Gonzalez and Nave, 1994).

Cryoprotection and mounting of crystals in cryo loops

To prevent the formation of ice crystals while shock freezing the protein crystals, the crystals are transferred to solutions containing cryoprotectants prior to freezing. For this purpose, glycerol, polyethylene glycols, oils, sugars or different low molecular mass organic substances are applied (Garman and Mitchell, 1996). On the other hand, there is a risk that these substances severely lower the quality of the crystals; in particular, the mosaicity can rise significantly. A suitable cryoprotectant, its optimal concentration and a well-planned transfer protocol have to be established by trial and error. The crystal should be transferred step by step into buffer drops with rising cryoprotectant concentrations. Here, glycerol was used as the sole cryoprotectant. Each crystal was placed into reservoir buffer with 5% (v/v) glycerol for a soaking time of 10 sec. Subsequently, it was transferred into a drop containing 10% (v/v) glycerol, in which it was held for another 10 sec, and then shock frozen in liquid nitrogen for storage or immediate data collection.

2.5 X-ray structure analysis

2.5.1 Theory of X-ray diffraction

The basis for structure analysis is X-ray radiation with wavelengths between 1*10^-9 and 1*10^-11 m (approximately 1.24 to 124 keV). In laboratory-scale experiments, X-ray radiation is generated by focusing a beam of electrons onto a metal target in a vacuum. Usually, the target is a rotating copper anode and the electron source is a cathode whose potential is about 45 kV below that of the anode. To generate more intense radiation, the oscillation of high-energy electrons in a magnetic field is utilized (Rosenbaum et al., 1971). Every change of direction of the electrons gives rise to the emission of intense radiation over a broad frequency range from visible light to hard X-ray radiation (1*10^-11 m). The desired wavelength can then be accurately adjusted by double crystal monochromators (Girard et al., 2006). As soon as X-ray radiation hits matter, electrons interact with the changing electromagnetic field and in turn begin to oscillate. These oscillating dipoles themselves emit coherent radiation with the same wavelength as the incident radiation. This effect is called Thomson or elastic scattering, while the effects of inelastic scattering (Compton scattering) are not taken into account in X-ray analysis. As the wavelength of the incident radiation is in the range of the distances between the scattering units in crystals, i.e. the bond distances between atoms, the scattered waves interfere with each other. In certain directions, the scattered waves interfere constructively, causing detectable reflections. In all other directions, they interfere destructively and cancel each other out, so that no reflections can be detected. When hitting an ordered, crystalline system, X-ray radiation creates a characteristic reflection pattern whose symmetry directly reflects the underlying symmetry of the crystal. Its intensity distribution is determined by the arrangement of scattering matter within the repeating units of the crystal, in other words by the electron density. A schematic representation of the diffraction of X-ray radiation by matter is shown in Figure 14.
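The relation between the wavelength limits quoted above and the corresponding photon energies follows from E = hc/λ. The following minimal sketch illustrates the conversion; the function name `photon_energy_kev` is introduced here for illustration only and is not part of the original work.

```python
# Physical constants (exact SI values).
PLANCK_J_S = 6.62607015e-34        # Planck constant h in J s
LIGHT_M_S = 299792458.0            # speed of light c in m/s
ELECTRON_VOLT_J = 1.602176634e-19  # 1 eV expressed in J


def photon_energy_kev(wavelength_m: float) -> float:
    """Return the photon energy in keV for a wavelength given in metres."""
    return PLANCK_J_S * LIGHT_M_S / wavelength_m / ELECTRON_VOLT_J / 1000.0


if __name__ == "__main__":
    # The two wavelength limits named in the text.
    for wavelength in (1e-9, 1e-11):
        print(f"{wavelength:.0e} m -> {photon_energy_kev(wavelength):.2f} keV")
```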

Figure 14 Schematic representation of X-ray diffraction by matter. The incident radiation is treated as a planar wave (Rhodes, 2000).

Mathematically, the incident wave can be described by the wave vector $\vec{k}_0$ with a magnitude of $2\pi/\lambda$, and the diffracted wave, to a good approximation, by a vector $\vec{k}$ with the same magnitude. The diffraction vector $\vec{s}$ represents the difference between the diffracted and the incident wave vectors. The amplitude $E$ of the diffracted wave at the position $\vec{R}$ is obtained by integration over all diffracting volume elements $V$ with the associated electron density $\rho_{el}(\vec{r})$ (Equation 5). The physical properties of the diffracted wave are described by the terms in front of the integral, whereas the integral itself contains the information on the electron density distribution in the diffracting volume and is called the structure factor $F$. Mathematically, the amplitudes and phases of the diffracted wave represent the Fourier transform (Fourier analysis) of the electron density $\rho_{el}(\vec{r})$ (Equation 6). This Fourier analysis transfers the electron density from real space $\vec{r}$ to reciprocal space, which is spanned by the structure factors $F(\vec{s})$.

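Equation 5

$$E = \text{const.} \cdot \exp\left[ i \left( \omega t - \vec{k}\,\vec{R} \right) \right] \int_V \rho_{el}(\vec{r}) \, \exp\left( 2\pi i \, \vec{r}\,\vec{s} \right) d\vec{r}$$

with
$\omega$: angular frequency, $\omega = 2\pi c/\lambda$, with the speed of light $c$ and the wavelength $\lambda$
$t$: time
$\vec{s}$: diffraction vector, $2\pi\vec{s} = \vec{k} - \vec{k}_0$
$\vec{k}$: wave vector of the diffracted beam
$\rho_{el}(\vec{r})$: electron density at location $\vec{r}$

Equation 6

$$F(\vec{s}) = \int_V \rho_{el}(\vec{r}) \, \exp\left( 2\pi i \, \vec{r}\,\vec{s} \right) d\vec{r}$$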

Constructive interference between diffracted X-ray beams only occurs if the phase difference between the incident and the diffracted waves, $2\pi\,\vec{r}\,\vec{s}$, is a multiple of $2\pi$. For a crystal lattice with the basis vectors $\vec{a}$, $\vec{b}$, and $\vec{c}$, this results in the Laue conditions stated in Equation 7.

Equation 7

$$\vec{a} \cdot \vec{s} = h, \qquad \vec{b} \cdot \vec{s} = k, \qquad \vec{c} \cdot \vec{s} = l$$

with h, k, l being integers

The integers h, k, and l of Equation 7, the so-called Miller indices, describe a set of lattice planes. Each set of lattice planes (hkl) corresponds to a specific order of diffraction and passes through a set of lattice points. If the positional vector $\vec{r}$ of real space is described by the fractional unit cell coordinates x, y, and z, the following expression is obtained for the structure factor F(hkl) (Equation 8), in which $V_{EZ}$ denotes the volume of the unit cell.

Equation 8

$$F(hkl) = V_{EZ} \int_{x=0}^{1} \int_{y=0}^{1} \int_{z=0}^{1} \rho_{el}(x, y, z) \, \exp\left[ 2\pi i \, (hx + ky + lz) \right] dx \, dy \, dz$$

Thus, the structure factor can be calculated by Fourier analysis of the electron density $\rho_{el}(x, y, z)$ (Equation 6). In turn, the electron density can be calculated by an inverse Fourier transformation (Fourier synthesis) of the structure factors F(hkl). A further simplification arises because diffraction, and hence fulfilment of the Laue conditions, only occurs along discrete directions. The integration can therefore be replaced by a summation over all (hkl) (Equation 9).

Equation 9

$$\rho_{el}(x, y, z) = \frac{1}{V_{EZ}} \sum_{h} \sum_{k} \sum_{l} F(hkl) \, \exp\left[ -2\pi i \, (hx + ky + lz) \right]$$
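The interplay of Fourier analysis and Fourier synthesis can be illustrated with a one-dimensional toy example. The sketch below is not taken from this work: it assumes a hypothetical cell with two point atoms, replaces the integral of Equation 8 by a sum over these atoms, and truncates the summation of Equation 9 at |h| ≤ 10. All names (`ATOMS`, `structure_factor`, `electron_density`, `strongest_peak`) are illustrative.

```python
import cmath
import math

# Hypothetical one-dimensional "crystal": (fractional coordinate, number of electrons).
ATOMS = [(0.2, 6.0), (0.7, 8.0)]
CELL_LENGTH = 1.0  # plays the role of the unit cell volume in one dimension
H_MAX = 10         # truncation of the Fourier series


def structure_factor(h: int) -> complex:
    """Fourier analysis: sum over point atoms as a discrete analogue of Equation 8."""
    return sum(z * cmath.exp(2j * math.pi * h * x) for x, z in ATOMS)


def electron_density(x: float) -> float:
    """Fourier synthesis (Equation 9), truncated at |h| <= H_MAX."""
    total = sum(structure_factor(h) * cmath.exp(-2j * math.pi * h * x)
                for h in range(-H_MAX, H_MAX + 1))
    return total.real / CELL_LENGTH


def strongest_peak(n_grid: int = 100) -> float:
    """Return the grid position with the highest synthesized density."""
    grid = [i / n_grid for i in range(n_grid)]
    return max(grid, key=electron_density)


if __name__ == "__main__":
    print("F(0) =", structure_factor(0))
    print("highest density at x =", strongest_peak())
```

Because amplitudes and phases are both available here, the synthesis places the highest density at the position of the heavier atom.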

Equation 10 shows how a structure factor F(hkl) can be described as a complex number by its amplitude |F(hkl)| and its phase α(hkl). The structure factor amplitudes can be derived from the observed diffraction intensities (Equation 11), but the phase information is lost in the measurement. Nowadays, different indirect methods are utilized to solve this so-called crystallographic phase problem (Section 2.5.8).

Equation 10

$$F(hkl) = \left| F(hkl) \right| \exp\left[ i \alpha(hkl) \right]$$

Equation 11

$$I_{obs}(hkl) \propto \left| F(hkl) \right|^2 = F(hkl) \cdot F^{*}(hkl)$$
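The loss of phase information can be made explicit with a few lines of code. The following sketch, with illustrative function names and arbitrary example values, builds structure factors of identical amplitude but different phases and shows that they yield the same intensity.

```python
import cmath


def from_amplitude_and_phase(amplitude: float, phase_rad: float) -> complex:
    """Build F(hkl) = |F(hkl)| exp[i alpha(hkl)] as in Equation 10."""
    return cmath.rect(amplitude, phase_rad)


def intensity(structure_factor: complex) -> float:
    """Intensity as the product of F and its complex conjugate (Equation 11)."""
    return (structure_factor * structure_factor.conjugate()).real


if __name__ == "__main__":
    # Same amplitude, three arbitrary phases: the intensity does not change.
    for phase in (0.0, 1.0, 2.5):
        f = from_amplitude_and_phase(5.0, phase)
        print(f"alpha = {phase:.1f} rad -> I = {intensity(f):.3f}")
```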

Bragg’s law gives a simplified description of the diffraction process by crystal lattices (Equation 12). The principle is based on the description of the diffraction process as reflection of incident rays at a set of parallel lattice planes hkl that only occurs at specific glancing angles Θhkl (Figure 15).

Equation 12

$$n \lambda = 2 \, d_{hkl} \sin \theta_{hkl}$$

with
$n$: 1, 2, 3, …
$\lambda$: wavelength [Å]
$d_{hkl}$: spacing of the lattice planes hkl [Å]
$\theta_{hkl}$: glancing angle of the reflection hkl [°]
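As a small worked example of Equation 12, the sketch below converts between lattice plane spacing and glancing angle. The function names and the numerical values are chosen for illustration only.

```python
import math


def bragg_angle_deg(d_spacing: float, wavelength: float, order: int = 1) -> float:
    """Glancing angle in degrees from n * lambda = 2 * d * sin(theta)."""
    sin_theta = order * wavelength / (2.0 * d_spacing)
    if sin_theta > 1.0:
        raise ValueError("no reflection possible: n * lambda exceeds 2 * d")
    return math.degrees(math.asin(sin_theta))


def d_spacing_from_angle(theta_deg: float, wavelength: float, order: int = 1) -> float:
    """Lattice plane spacing (same unit as the wavelength) from the glancing angle."""
    return order * wavelength / (2.0 * math.sin(math.radians(theta_deg)))


if __name__ == "__main__":
    # Illustrative values: lattice planes 1.0 Å apart, wavelength 1.0 Å.
    print(bragg_angle_deg(1.0, 1.0))
```

The `ValueError` branch reflects that planes with a spacing below λ/2 cannot fulfil the Bragg condition at the given wavelength.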


Figure 15 Schematic representation of Bragg’s law. Constructive interference between two diffracted rays only occurs if the difference in their path lengths corresponds to an integer multiple of the wavelength λ (Rhodes, 2000).

Constructive interference occurs if the waves scattered by neighbouring lattice planes show a path difference that corresponds to an integer multiple of the wavelength of the incident radiation. According to the Laue conditions, the diffraction vector $\vec{s}_{hkl}$, which has the length $1/d_{hkl}$, is perpendicular to the corresponding lattice planes (hkl).

2.5.2 Reciprocal space and Ewald construction

In the case of constructive interference, a set of reflections (hkl) emerges through diffraction at the lattice planes (hkl). If the diffraction vectors $\vec{s}_{hkl}$ are placed at a common origin, their end points P(hkl) describe the reciprocal lattice with the basis vectors $\vec{a}^* = \vec{s}_{100}$, $\vec{b}^* = \vec{s}_{010}$, and $\vec{c}^* = \vec{s}_{001}$. Taking the Laue conditions into account, an expression relating real and reciprocal space is obtained (Equation 13).

Equation 13

$$\vec{s} = h \, \frac{\vec{b} \times \vec{c}}{\vec{a} \, (\vec{b} \times \vec{c})} + k \, \frac{\vec{c} \times \vec{a}}{\vec{a} \, (\vec{b} \times \vec{c})} + l \, \frac{\vec{a} \times \vec{b}}{\vec{a} \, (\vec{b} \times \vec{c})} = h \, \vec{a}^* + k \, \vec{b}^* + l \, \vec{c}^*$$
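Equation 13 can be evaluated directly from the real-space basis vectors. The following minimal sketch uses plain tuples; the helper names and the orthorhombic example cell (10, 20, 30 Å) are assumptions made for illustration and do not refer to the BaP1 crystals.

```python
import math


def cross(u, v):
    """Vector product of two three-dimensional vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Scalar product of two vectors."""
    return sum(x * y for x, y in zip(u, v))


def reciprocal_basis(a, b, c):
    """Return (a*, b*, c*) according to Equation 13."""
    volume = dot(a, cross(b, c))
    return tuple(tuple(x / volume for x in cross(p, q))
                 for p, q in ((b, c), (c, a), (a, b)))


def diffraction_vector(hkl, basis):
    """s = h a* + k b* + l c*."""
    h, k, l = hkl
    a_star, b_star, c_star = basis
    return tuple(h * a_star[i] + k * b_star[i] + l * c_star[i] for i in range(3))


def d_spacing(hkl, basis):
    """Lattice plane spacing as the inverse length of the diffraction vector."""
    s = diffraction_vector(hkl, basis)
    return 1.0 / math.sqrt(dot(s, s))


if __name__ == "__main__":
    cell = reciprocal_basis((10.0, 0.0, 0.0), (0.0, 20.0, 0.0), (0.0, 0.0, 30.0))
    print(d_spacing((1, 1, 0), cell))
```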

A graphical representation of Bragg’s law is given by the Ewald construction (Ewald, 1921) (Figure 16). Herein, the crystal lies at the centre of the Ewald sphere with the radius $1/\lambda = |\vec{k}|/2\pi$, and the origin O of the reciprocal lattice is defined at the intersection of the incident beam with the Ewald sphere. Bragg’s law is only fulfilled if a lattice point P(hkl), representing the end point of the diffraction vector $\vec{s}_{hkl}$, comes to lie on the surface of the Ewald sphere. Only the reflections which fulfill this condition can be observed, because the diffraction vector is then perpendicular to the reflecting lattice plane. Its length is then $1/d_{hkl} = 2 \sin\theta_{hkl}/\lambda$. This relationship exactly corresponds to the Bragg condition (Equation 12). Data collection is performed with the rotation method, in which the crystal, together with the reciprocal lattice, is rotated around an axis, bringing different lattice points into reflecting position.
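The geometric condition of the Ewald construction can be checked numerically. The sketch below assumes an incident beam along +z and places the centre of the sphere at -(beam direction)/λ relative to the origin O of the reciprocal lattice; the function names and the tolerance are illustrative choices.

```python
import math


def on_ewald_sphere(s, wavelength, beam_direction=(0.0, 0.0, 1.0), tolerance=1e-6):
    """Return True if the reciprocal lattice point at s lies on the Ewald sphere.

    s is given relative to the origin O of the reciprocal lattice. The centre
    of the sphere (the crystal) sits at -beam_direction / wavelength.
    """
    radius = 1.0 / wavelength
    from_centre = [s_i + b_i * radius for s_i, b_i in zip(s, beam_direction)]
    distance = math.sqrt(sum(x * x for x in from_centre))
    return abs(distance - radius) < tolerance


def glancing_angle_deg(s, wavelength):
    """Glancing angle from |s| = 2 sin(theta) / lambda."""
    length = math.sqrt(sum(x * x for x in s))
    return math.degrees(math.asin(length * wavelength / 2.0))


if __name__ == "__main__":
    # Point reached by a beam scattered through 2 theta = 60 degrees (lambda = 1).
    s = (math.sin(math.radians(60.0)), 0.0, math.cos(math.radians(60.0)) - 1.0)
    print(on_ewald_sphere(s, 1.0), glancing_angle_deg(s, 1.0))
```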

Figure 16 The Ewald construction showing the relationship between real and reciprocal space in a diffraction experiment using monochromatic radiation. The crystal lies at the centre of the Ewald sphere with radius 1/λ. The origin O of reciprocal space is defined as the intersection of the incident (primary) beam with the Ewald sphere. A reflection is recorded if the end point of a diffraction vector $\vec{s}_{hkl}$ (reciprocal lattice point P(hkl)) comes to lie on the surface of the Ewald sphere (Ewald, 1921).

Neglecting anomalous scattering, the intensities of the reflections $hkl$ and $\bar{h}\bar{k}\bar{l}$ are the same, and an inversion center is created at the origin of the reciprocal lattice which is not present in its real space counterpart. The reason for this is that it makes no difference whether an X-ray beam is diffracted from below or from above a lattice plane. This relationship is described by Friedel’s law (Equation 14) and, accordingly, the two reflections $hkl$ and $\bar{h}\bar{k}\bar{l}$ are called Friedel mates. In the case of anomalous scattering, Friedel’s law is broken and the intensities of the two Friedel mates differ. This can be used to solve the phase problem (Section 2.5.8).

Equation 14

$$\left| F(hkl) \right|^2 = \left| F(\bar{h}\bar{k}\bar{l}) \right|^2, \quad \text{from which follows} \quad I(hkl) = I(\bar{h}\bar{k}\bar{l})$$
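Friedel’s law and its breakdown can be reproduced with a small numerical experiment. The sketch below uses three hypothetical point atoms; anomalous scattering is mimicked by giving one of them a complex scattering factor. The coordinates, the scattering factors and the value of the imaginary contribution are assumptions chosen for illustration only.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """Sum over point atoms; each atom is (scattering factor, (x, y, z))."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for f, (x, y, z) in atoms)


def friedel_difference(hkl, atoms):
    """Return I(hkl) - I(-h,-k,-l) for the given atoms."""
    mate = tuple(-index for index in hkl)
    return abs(structure_factor(hkl, atoms)) ** 2 - abs(structure_factor(mate, atoms)) ** 2


# Hypothetical atoms with purely real scattering factors.
NORMAL = [(6.0, (0.10, 0.20, 0.30)),
          (8.0, (0.40, 0.15, 0.70)),
          (16.0, (0.25, 0.60, 0.05))]

# Same atoms, but the last one carries an assumed imaginary (anomalous) part.
ANOMALOUS = NORMAL[:2] + [(16.0 + 0.6j, (0.25, 0.60, 0.05))]

if __name__ == "__main__":
    print("normal:   ", friedel_difference((1, 2, 3), NORMAL))
    print("anomalous:", friedel_difference((1, 2, 3), ANOMALOUS))
```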


2.5.3 Temperature factors

Another way to describe the structure factor is as the sum of the scattering contributions of all atoms in the unit cell (Equation 15). Both the static disorder of the crystal and the actual thermal motion of the atoms weaken the intensity of the diffracted X-rays towards higher glancing angles (Debye, 1914). This is taken into account by the so-called Debye-Waller factor $\exp(-B_i / 4 d_{hkl}^2)$, a correction term which is a Gaussian function of $1/d_{hkl}$. Generally, most of the intensity falloff with resolution is caused by disorder in real crystals. Nevertheless, the quantity B is called the temperature factor. It is proportional to the mean square displacement of an atom from its resting position (Equation 16) and thereby describes the isotropic displacement along one coordinate. When working with high resolution data, it may be necessary to model anisotropic displacements using anisotropic temperature factors, but this adds more parameters to the model, because they characterize the atomic motion in all three directions in space.

Equation 15

$$F(hkl) = \sum_i f_i^0 \, \exp\left( -\frac{B_i}{4 d_{hkl}^2} \right) \exp\left( 2\pi i \, \vec{s} \, \vec{r}_i \right)$$

with
$f_i^0$: atomic scattering factor of atom i
$B_i$: temperature factor (B factor) of atom i

Equation 16

$$B = 8 \pi^2 \, \overline{u^2}$$
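To give a feeling for the magnitudes involved, the following sketch converts a temperature factor into a mean square displacement (Equation 16) and evaluates the Debye-Waller factor of Equation 15. The function names and the example values (B = 20 Å², d = 2 Å) are illustrative assumptions.

```python
import math


def mean_square_displacement(b_factor: float) -> float:
    """Mean square displacement in Å^2 from B = 8 pi^2 <u^2>."""
    return b_factor / (8.0 * math.pi ** 2)


def debye_waller_factor(b_factor: float, d_spacing: float) -> float:
    """Attenuation exp(-B / (4 d^2)) of the scattering factor at resolution d."""
    return math.exp(-b_factor / (4.0 * d_spacing ** 2))


if __name__ == "__main__":
    b = 20.0  # Å^2, illustrative value
    print("rms displacement [Å]:", math.sqrt(mean_square_displacement(b)))
    print("attenuation at d = 2 Å:", debye_waller_factor(b, 2.0))
```

The attenuation applies to the structure factor amplitude, so the corresponding intensity is reduced by the square of this factor.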

Determination of scale and temperature factors

With data of a resolution better than 3 Å, the mean temperature factor B and the scale factor C, which places the structure factor amplitudes on an absolute scale, can be estimated using a Wilson plot (Wilson, 1949). For this purpose, the data are divided into resolution bins, and the natural logarithm of the quotient of the mean measured intensity in the bin, $\langle I(\vec{s}) \rangle$, and the absolute intensity calculated from the sum of all squared scattering factors at rest, $\langle I_{abs}(\vec{s}) \rangle = \sum_i (f_i^0)^2$, is plotted against $(\sin\Theta/\lambda)^2$. By linear regression analysis, the scale factor C and the temperature factor B can then be calculated (Equation 17), assuming that the unit cell parameters are known. As mentioned above, this estimate is only reliable for high resolution data. In any case, the Wilson plot at least allows DNA and protein crystals to be distinguished, because both show different characteristic interatomic distances due to their secondary structure.


Equation 17

$$\ln \frac{\langle I(\vec{s}) \rangle}{\sum_i \left( f_i^0 \right)^2} = \ln C - 2B \, \frac{\sin^2 \Theta}{\lambda^2}$$
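The linear regression behind the Wilson plot can be sketched in a few lines. The example below is an illustration only: it generates synthetic resolution bins that obey Equation 17 exactly for an assumed scale factor and temperature factor and then recovers both values by an ordinary least-squares fit. The function names and the bin layout are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def wilson_fit(bins):
    """Estimate (C, B) from bins of (sin^2(theta)/lambda^2, <I>, sum of f^2)."""
    xs = [x for x, _, _ in bins]
    ys = [math.log(mean_i / sum_f2) for _, mean_i, sum_f2 in bins]
    slope, intercept = linear_fit(xs, ys)
    return math.exp(intercept), -slope / 2.0


def synthetic_bins(scale, b_factor, sum_f2=1000.0):
    """Noise-free test data following Equation 17 (illustrative only)."""
    xs = [0.03 + 0.01 * i for i in range(10)]
    return [(x, scale * sum_f2 * math.exp(-2.0 * b_factor * x), sum_f2) for x in xs]


if __name__ == "__main__":
    print(wilson_fit(synthetic_bins(scale=0.5, b_factor=20.0)))
```

With measured data, the points at low resolution typically deviate from a straight line, which is why only the high resolution bins would be used for the fit.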

2.5.4 The Patterson function

The Patterson function P(u, v, w) is a Fourier transform of the measured reflection intensities and can be calculated directly from the measured data, without any information about the phase angles (Equation 18). To prevent confusion with the real space coordinates x, y, z, the relative coordinates u, v, and w of the vector $\vec{u}$ are used to describe the Patterson cell. It can be shown that the Patterson function represents a convolution of the electron density with its inverse (Equation 19). In other words, it can be interpreted as a summation over all vectors between all atoms of the unit cell, since the product of the electron densities at the locations $\vec{r}_1$ and $\vec{r}_1 + \vec{u}$ is only non-zero if $\vec{u}$ corresponds exactly to the vector between two given atoms.

Equation 18

$$P(u, v, w) = \frac{1}{V_{EZ}} \sum_{hkl} \left| F(hkl) \right|^2 \cos\left[ 2\pi (hu + kv + lw) \right]$$

Equation 19

$$P(\vec{u}) = \int_{\vec{r}_1} \rho(\vec{r}_1) \, \rho(\vec{r}_1 + \vec{u}) \, d\vec{r}_1$$

A structure with N atoms per unit cell shows a Patterson map with N(N-1) non-origin peaks. The peak height is proportional to the product of the numbers of electrons of the two atoms. As there are always two opposed vectors between two atoms, the Patterson function is centrosymmetric, and all screw axes of real space become rotation axes in the Patterson map. In X-ray analysis of small molecules, it is often possible to deconvolute the Patterson map and solve the structure directly by reconstructing the original atom positions from all sets of difference vectors. For proteins, however, both the large number of atoms and the typically low data resolution usually render the direct interpretation of a Patterson map impossible. Nevertheless, the Patterson function can be used to determine heavy atom positions using anomalous or isomorphous differences, as only atoms that give rise to such differences will be visible in the resulting map. The Patterson function is also used in molecular replacement (Section 2.5.8) and can be used to determine proper non-crystallographic symmetry (Section 2.5.9).
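The properties of the Patterson function can be illustrated with a one-dimensional toy cell. The sketch below assumes three hypothetical point atoms, lists their interatomic vectors, and evaluates a one-dimensional analogue of Equation 18 from the intensities alone, truncated at |h| ≤ 20. All names and values are illustrative.

```python
import cmath
import math

# Hypothetical one-dimensional cell: (fractional coordinate, number of electrons).
ATOMS = [(0.10, 6.0), (0.35, 7.0), (0.80, 8.0)]
H_MAX = 20  # truncation of the Fourier series


def interatomic_vectors(atoms):
    """All non-origin vectors between atoms with their weights Z_i * Z_j."""
    return [((xj - xi) % 1.0, zi * zj)
            for i, (xi, zi) in enumerate(atoms)
            for j, (xj, zj) in enumerate(atoms) if i != j]


def intensity(h, atoms):
    """|F(h)|^2 for point atoms; no phase information is kept."""
    f = sum(z * cmath.exp(2j * math.pi * h * x) for x, z in atoms)
    return abs(f) ** 2


def patterson(u, atoms, cell_length=1.0):
    """One-dimensional analogue of Equation 18, computed from intensities only."""
    return sum(intensity(h, atoms) * math.cos(2.0 * math.pi * h * u)
               for h in range(-H_MAX, H_MAX + 1)) / cell_length


if __name__ == "__main__":
    print(sorted(interatomic_vectors(ATOMS)))
    print(patterson(0.25, ATOMS), patterson(0.15, ATOMS))
```

For N = 3 atoms the list contains N(N-1) = 6 vectors, and the Patterson function is clearly higher at u = 0.25, the vector between the first two atoms, than at u = 0.15, where no interatomic vector ends.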


2.5.5 Data collection

For the quality of X-ray analysis, it is important to collect all reflections of a crystal up to a certain resolution. Even small errors or inaccuracies lead to difficulties in space group determination, phasing, and downstream calculations, and may ultimately hamper structure solution. Therefore, accurate measurements of the positions and intensities of all reflections are indispensable. During this work, all intensity measurements were performed using the rotation method (Arndt et al., 1973). Here, the crystal is rotated around a fixed axis in small intervals and a position-sensitive detector accumulates the diffracted X-rays. Each step results in an image (frame) containing the positional coordinates of the individual picture elements, their intensities, and information regarding the geometry of the detector. Prior to measurements using synchrotron radiation, all crystals were tested at the home source, which generated the X-ray beam with a water-cooled rotating copper anode (RU200B, Rigaku) operated at a voltage of 45 kV and a current of 120 mA. This gave rise to CuKα radiation with a wavelength λ of 1.5477 Å. Depending on crystal size and application, the beam was collimated to 0.3 or 0.4 mm. As experimental setup, a typical goniometer system adjustable at three angles (3-circle goniostat) was used (Figure 17).

Figure 17 Schematic measurement setup of a multiwire area detector with a typical goniometer system adjustable at three angles (χ = 45°) (3-circle goniostat). The crystal is rotated about ω and φ, and the detector can be displaced by the angle 2θ (Arndt et al., 1973).

The desired orientation of the crystal was adjusted with the angles ω, φ, and χ, whereas χ was fixed at 45°. For measurements at higher resolution, the detector was displaced by the angle 2θ. During the measurements, the crystal was rotated in steps of 0.1 or 0.2° around ω, and the exposure time was set between 120 and 300 sec according to the scattering performance of the crystal. During all experiments, the distance between crystal and detector was set to 160 mm (Dauter, 1998), and a multiwire area detector (Siemens, X-1000) was used to record the reflections. Because of its inhomogeneous sensitivity and geometrical distortion, the multiwire area detector had to be calibrated prior to each measurement. During the flood-field exposure, the primary beam hit a thin plate made of amorphous iron and the resulting X-ray fluorescence radiation was detected for 5 min. From the measured intensities, a correction table for the position-dependent sensitivity of the detector was created (flood-field correction). For this exposure, the angle 2Θ was 360°. For the correction of the spatial distortion of the detector, a brass-plate exposure was performed, which means that a perforated brass plate was put on the detector and the X-ray fluorescence radiation of the amorphous iron was collected for 10 min.

Measurements using synchrotron radiation were carried out at beamline 14.1 at BESSY (Berlin, Germany), where a charge coupled device (CCD) detector (marCCD165, Marresearch) was used for data collection. The exposure time per frame was 10 sec during a rotation of 1°. The wavelength was selected with a Si(111) monochromator.

2.5.6 Improvement of the scattering performance of crystals

The quality of the measured data is highly dependent on the scattering quality of the crystals. Most commonly, the following approaches are considered for an improvement of data quality: screening of new crystallization conditions, distinct crystal forms, crystallization of the same protein from a different organism, or surface mutations prior to crystallization (Derewenda, 2004; Longenecker et al., 2001). All of these methods change the crystallization step itself. However, instead of rejecting poorly scattering crystals, there are also fast and easy methods to improve their scattering quality. In particular, the following parameters can be varied: order of the crystal packing, solvent content, mosaicity of the crystals, and, most importantly, removal of ice rings during data collection. Several methods have been described with which significant improvements have been achieved, e.g. annealing, dehydration, soaking and cross-linking (Heras and Martin, 2005). During this work, besides the screening for new crystallization conditions, the method of annealing was applied.

During the process of freezing, critical defects can easily occur if not enough cryoprotectant is added. This happens mainly because of volume changes due to the conversion of water from the liquid to the solid state. The method of annealing takes this into account: the crystal is briefly thawed at room temperature and then frozen again. In annealing, a distinction is made between macromolecular crystal annealing (MCA) and flash annealing (Harp et al., 1999). Both were applied in this work (Figure 18). Whereas the cryo stream is only interrupted temporarily during flash annealing, in MCA the crystal is placed into a drop containing cryo buffer for 10 min at 20°C.


Figure 18 Schematic representation of macromolecular crystal annealing (MCA) (A) and flash annealing (B) (Heras and Martin, 2005).

2.5.7 Indexing, scaling, and data reduction

Nowadays, many different computer programs are available for the processing of X-ray data. Here, all reflection data were processed with the XDS package (Kabsch, 2010; Kabsch and Sander, 1983). In data processing, the first step is the determination of the dimensions and orientation of a primitive crystal lattice relative to the laboratory coordinate system. The strongest reflections are used to index the reciprocal lattice, and the Laue symmetry can be determined. As many reflections are only partially recorded on a single frame, so-called batches have to be calculated, i.e. three-dimensional spot profiles from several consecutive frames. These batches usually comprise about 5 to 10° of the rotation. Afterwards, profile fitting is performed. This is the actual integration step, in which all reflections are integrated by fitting them to the best available profile. In the following postrefinement, reflections that lie close enough to their predicted positions are used to refine geometric parameters, such as cell parameters, detector distance and orientation. The last two steps are repeated iteratively until the data converge, yielding the so-called raw reflection intensities (Leslie, 2006). These data, together with their standard deviations, are then corrected for absorption as well as Lorentz and polarization effects. Afterwards, they are brought onto a common scale by a simple scaling. At this stage, it is already possible to determine the correct space group by comparing the intensities of reflections related by the symmetry of a given space group and by analyzing the systematic absences caused by lattice centering or by screw axes. Generally, the experimental conditions vary during data collection. Problems can occur because of fluctuations of the X-ray beam, decreasing scattering quality through radiation damage, or anisotropic effects of the crystal.
Therefore, intensities have to be scaled within the dataset in order to reduce the mean variation in intensity between multiple measurements of equivalent reflections (Evans, 2006). The scale factors that are used have to be adjusted to rotation angle and resolution. Scaling is able to compensate, at least partially, for the effects of radiation damage (Diederichs, 2006). The next step in data processing is the reduction of the intensities of equivalent reflections, which have been merged before, to the asymmetric unit of the space group used for processing. The XDS package contains the modules XSCALE and XDSCONV for these purposes.

There are several statistical indicators for the assessment of data quality, which are generally calculated in resolution bins because the quality of intensity measurements varies largely with the resolution. A so-called R (reliability) factor was introduced to assess the data quality. It measures the accuracy as the ratio between the mean difference of values that should be the same and the mean magnitude of the measured values themselves. The most commonly used version of the R factor is the Rsym factor, which is defined in Equation 20. It describes the variations in the intensity of symmetry-related reflections that should be identical according to crystal symmetry. Similarly, the less often used Rmerge factor assesses the agreement between multiple intensity measurements of the same reflection from the same or different crystals. In all cases, R factors describe the average fractional error with respect to intensities. One problem is that both the Rsym and the Rmerge factor depend on the multiplicity, also called redundancy, and become larger if reflections are measured several times. Collecting highly redundant data actually improves data quality, but leads to high Rsym values and apparently inferior data, especially at high resolution. This was the reason for the introduction of a redundancy-independent R factor, the so-called Rmeas (also called Rr.i.m.), which allows a more realistic assessment of data quality (Diederichs and Karplus, 1997). Its definition is given in Equation 21. Other indicators for data quality are the ratio of the mean intensity and the mean standard deviation of the intensities, ⟨I⟩/⟨σ⟩ (also abbreviated to I/σ(I)), the fractional data completeness and the multiplicity of observations.

Equation 20

$$R_{sym} = \frac{\sum_{hkl} \sum_{i=1}^{n_{hkl}} \left| I(hkl)_i - \overline{I(hkl)} \right|}{\sum_{hkl} \sum_{i=1}^{n_{hkl}} I(hkl)_i}$$

Equation 21

$$R_{meas} = \frac{\sum_{hkl} \sqrt{\frac{n_{hkl}}{n_{hkl} - 1}} \sum_{i=1}^{n_{hkl}} \left| I(hkl)_i - \overline{I(hkl)} \right|}{\sum_{hkl} \sum_{i=1}^{n_{hkl}} I(hkl)_i}$$

with
$\overline{I(hkl)}$: mean intensity of all symmetry-related reflections (hkl)
$I(hkl)_i$: intensity of reflection i(hkl)
$n_{hkl}$: number of independent measurements of the reflection (hkl)
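The two quality indicators defined above can be computed from a table of repeated intensity measurements. The sketch below is an illustration under simple assumptions: observations are stored in a dictionary keyed by the unique reflection, reflections measured only once are skipped, and the example intensities are invented. It does not reproduce the implementation of any processing program.

```python
import math


def r_sym(observations):
    """Rsym (Equation 20) from a mapping hkl -> list of measured intensities."""
    numerator = denominator = 0.0
    for intensities in observations.values():
        if len(intensities) < 2:
            continue  # a single measurement carries no information on the spread
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def r_meas(observations):
    """Redundancy-independent Rmeas (Equation 21)."""
    numerator = denominator = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue
        mean = sum(intensities) / n
        numerator += math.sqrt(n / (n - 1)) * sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


if __name__ == "__main__":
    # Invented example data: two unique reflections with 2 and 3 measurements.
    data = {(1, 0, 0): [100.0, 110.0], (0, 1, 0): [50.0, 50.0, 56.0]}
    print(f"Rsym = {r_sym(data):.4f}, Rmeas = {r_meas(data):.4f}")
```

Since the factor under the square root is always larger than one, Rmeas is never smaller than Rsym for the same data.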


2.5.8 Phase determination by molecular replacement

As described in Section 1.1, the main problem in X-ray analysis is the loss of the phase information during the measurement of the scattered X-ray beams, the so-called crystallographic phase problem. The amplitudes can easily be calculated from the diffraction data, whereas the phase information is not directly available. Nowadays, different indirect methods allow the phase problem to be solved, e.g. phase determination by isomorphous replacement (Green et al., 1954), by anomalous scattering, by the use of heavy atom clusters or by molecular replacement. During this work, solely the latter method was used. This method can only be used if the structure of the protein to be analyzed, or of a close structural homolog, is already known. Mostly, molecular replacement is utilized in cases of a protein crystallizing in multiple crystal forms with different space groups, where only small differences from the known structure are expected. An example is given in the present work, in which a protein-inhibitor complex had to be solved and the protein structure was already known. Structural homologs can only be used if there is a significant degree of structural similarity between the model and the protein to be analyzed. On the level of protein sequence, two proteins with a sequence identity of 30% or more are expected to share a common fold (Sander and Schneider, 1991). Statistical analyses have shown that the number of distinct protein folds in nature is restricted to only 500 to 1000 (Benner et al., 1997; Schulz, 1981). Combined with the continuously increasing number of available protein structures and the new folds which are being discovered, this should lead to better secondary and tertiary structure prediction. Moreover, it could also lead to structure determination using a fold library as models for molecular replacement.
The principle of molecular replacement is to place a search model into the unit cell of a new crystal in such a way that its position and orientation are obtained. With this information, phases can be calculated from the model with the aid of Equation 15. Together with the structure factor amplitudes from the new cell it is then possible to calculate an electron density map and start model building and refinement (Section 2.5.10). The placement of the search model is a six-dimensional problem which can be simplified by splitting it into a rotation and a translation search (Rossmann and Blow, 1962). Both searches can be implemented as Patterson correlation functions using structure factor amplitudes from the search model and the unknown crystal. Nevertheless, a successful search requires both a high degree of structural similarity between search and target molecules and high-quality diffraction data. Furthermore, the number of monomers in the cell and the complexity of the space group play a role. If some external phases are available, such as initial phases from isomorphous replacement, these can be used during the translation search (phased translation function). To assess the quality of the obtained results, each program reports different indicators, e.g. correlation coefficients or so-called Z scores. Especially in the translation search, these indicators give highly different values for correct and false solutions. For all models obtained from molecular replacement, it has to be considered that the electron density is strongly influenced by the search model, as the phases originate completely from it. This so-called model bias becomes particularly relevant in molecular replacement with low-quality data. During this work, molecular replacement searches were performed using MOLREP (Vagin and Teplyakov, 1997) and PHASER (McCoy, 2004). In general, the ten best solutions (orientations) of the rotation search were each used for a translation search. A molecular replacement search was considered successful at a correlation coefficient larger than 0.3 and an Rcryst factor lower than 50%. The most important decision criterion was a clear drop in difference function peak height between correct and incorrect solutions.
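The acceptance thresholds described above can be illustrated with a minimal sketch. It only filters and ranks already scored solutions; the function names and the example values are hypothetical and do not reproduce the scoring performed by MOLREP or PHASER.

```python
def is_successful(cc, r_cryst, min_cc=0.3, max_r=0.50):
    """Thresholds quoted in the text: CC > 0.3 and Rcryst < 50%."""
    return cc > min_cc and r_cryst < max_r


def rank_solutions(solutions):
    """Keep solutions passing both thresholds, highest correlation first."""
    passed = [s for s in solutions if is_successful(s["cc"], s["r_cryst"])]
    return sorted(passed, key=lambda s: s["cc"], reverse=True)


# Hypothetical translation-search results for three rotation solutions.
example = [
    {"cc": 0.25, "r_cryst": 0.55},
    {"cc": 0.62, "r_cryst": 0.44},
    {"cc": 0.35, "r_cryst": 0.52},
]
accepted = rank_solutions(example)
```

In this example only the second entry passes both thresholds. The clear drop between correct and incorrect solutions mentioned in the text still has to be judged separately.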

2.5.9 Non-crystallographic symmetry

If the asymmetric unit of a crystal contains more than one copy of a molecule, the symmetry relation between the copies is called non-crystallographic symmetry (NCS). In protein crystals, both rotation and translation operations are possible as NCS operations. In most oligomeric proteins rotational NCS is of the closed type (proper NCS), with a rotation angle of (360/n)°, forming assemblies with n-fold rotational symmetry in solution (Cn point groups). Otherwise, it belongs to the open type (improper NCS). In addition to the rotational component, a translation operation is necessary to describe the position of the rotated copy relative to the original one. Two copies of a protein related by NCS are usually described by a 3x3 matrix for the rotation and three parameters for the translation in orthogonal coordinates. The superposition of the native Patterson function with a rotated copy of itself is called the self rotation function. Its peaks allow the rotation component R of the NCS operator to be determined (Rossmann and Blow, 1962). The rotation is carried out in discrete steps around the origin by multiplying the vector u with a different rotation matrix C in each step, which results in a rotated vector urot. Importantly, the maximal radius of integration rmax must only include vectors between atoms in the same protomer (self vectors). If rmax is too large, noise is introduced by so-called cross vectors, i.e. vectors between atoms in different protomers.


Equation 22

$$R = \int_{|\vec{u}| < r_{max}} P(\vec{u}) \, P(\vec{u}_{rot}) \, dV$$

with
$r_{max}$: maximal radius of integration
$P(\vec{u})$: value of the Patterson function at $\vec{u}$
$P(\vec{u}_{rot})$: value of the rotated Patterson function at $\vec{u}_{rot}$

Usually, the translational component T of the NCS cannot be determined without additional phase information. At the stage of an initial electron density map, the position of the axis in the unit cell can be determined by translating it throughout the cell and calculating the correlation of electron density values in n equivalent masked regions around the axis (Vonrhein and Schulz, 1999). However, this only works with proper rotational symmetry.
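As an illustration of how an NCS operator acts on coordinates, the following minimal sketch builds the rotation matrix for a proper n-fold axis and applies rotation and translation to a point. Placing the axis along z and the function names are assumptions made for the example only.

```python
from math import cos, sin, radians


def rotation_about_z(n_fold):
    """3x3 matrix for a rotation of (360/n) degrees about the z axis."""
    a = radians(360.0 / n_fold)
    return [
        [cos(a), -sin(a), 0.0],
        [sin(a), cos(a), 0.0],
        [0.0, 0.0, 1.0],
    ]


def apply_ncs(rotation, translation, xyz):
    """Apply x' = R x + t in orthogonal coordinates."""
    return tuple(
        sum(rotation[i][j] * xyz[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# A proper 3-fold operation applied three times returns the starting point.
point = (1.0, 2.0, 3.0)
moved = point
for _ in range(3):
    moved = apply_ncs(rotation_about_z(3), (0.0, 0.0, 0.0), moved)
```

For proper NCS, n successive applications of the pure rotation reproduce the original coordinates, which is what distinguishes the closed from the open type.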

2.5.10 Model building and refinement

Model building

After solving the crystallographic phase problem, the resulting density map must be interpreted by building a model of the molecule. With good-quality data and high resolution of the experimental density this procedure is straightforward and can at least partially be performed in an automated fashion. In this work, experimental electron density was automatically interpreted with the package ARP/wARP (Perrakis et al., 1999). Instead of placing Cα atoms into the density, this program completely fills the electron density with dummy atoms and thereby builds a first model, the so-called free atom model. The atoms are placed according to distance criteria (1.1 Å distance between each other) and a variable density threshold. In the next step, pattern-recognition algorithms search among the dummy atoms for sets of interatomic distances corresponding to dipeptides. The atoms found are joined to form the main chain, and the remaining dummy atoms are defined as side chain atoms and included in the model as long as they can be recognized as such. Both steps use prior knowledge of protein stereochemistry during placement and linking of fragments. This means that the procedure should be based on good model geometry; otherwise the results are not satisfactory. The single peptide fragments are then superimposed on the protein sequence in single-residue steps, and the fit of the placed residues to the electron density is checked. If the fit is good, they are assigned to the corresponding part of the sequence and conformationally refined. During model building ARP/wARP already improves the initial phase information, and usually a significantly higher quality of phases is gained during the procedure. With appropriate weighting, the program combines the model phases with the experimental phases, which increases their quality. Additionally, the model is refined between building cycles using reciprocal-space refinement programs such as REFMAC (Murshudov et al., 1997). Manual model building and evaluation of the automatically built structure were performed with COOT (Emsley and Cowtan, 2004). After each round of rebuilding, the model was refined in reciprocal space and the resulting electron density inspected, whereby the following types of electron density maps were used:

(Fobs – Fcalc ) electron density map

Difference Fourier maps with the coefficients (Fobs - Fcalc)exp(iαcalc) show differences between the model and the true structure. Missing parts of the model appear as positive difference density and wrongly positioned parts as negative difference density. The differences appear with about half of the actual electron density and usually become significant at a contour level of three standard deviations (3 σ) above the mean electron density. (Fobs - Fcalc) density maps are the basis for correction and rebuilding of the protein model as well as for the placement of water molecules or small molecule ligands.

(2 Fobs – Fcalc ) electron density map

An electron density map calculated with the coefficients (2Fobs - Fcalc)exp(iαcalc) can be described as a superposition of an (Fobs)exp(iαcalc) and an (Fobs - Fcalc)exp(iαcalc) electron density map. These maps are usually viewed at a contour level between 1.0 and 1.5 σ and show the modeled protein, but are sometimes strongly influenced by model bias.

σA-weighted (mFobs – DFcalc) and (2mFobs – DFcalc) electron density maps

There will always be errors in both the phase and the amplitude of the calculated structure factor if parts of the model are not yet built or are wrongly placed. To address this problem, the calculation of structure factor amplitudes must be weighted according to the fact that both the observed and the calculated structure factor amplitudes have errors (Figure 19). Therefore, σA-weighting introduces corrections that account for errors in the model, incompleteness and disorder, and errors in scaling (Read, 1986; Read, 1990). In this work, σA-weighted Fourier coefficients were calculated with REFMAC (Murshudov et al., 1997) and the resulting weighted electron density maps were used throughout the refinement.


Figure 19 Argand diagram showing the meaning of weighted coefficients in the calculation of electron density maps. The calculated structure factor amplitude from the most probable model (Fcalc) is not the most probable structure factor amplitude (dashed). The most probable value of the structure factor amplitude has the same phase as Fcalc, but it is reduced by a factor D, which approaches unity if the model is assumed to be exact and zero if the model is deemed unreliable. Uncertainties in the calculated structure factor give rise to a two-dimensional Gaussian distribution around DFcalc (0 ≤ D ≤ 1) that becomes very sharp if coordinate uncertainties are small (and vice versa). A figure of merit m is used to weight the observed structure factor amplitudes. As the quality of the model increases and the mean phase error decreases, m also approaches unity.
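The amplitudes of the map coefficients discussed above can be written down in a few lines. The following is a minimal sketch for a single reflection, assuming that m and D are already known; estimating them is the actual task of σA-weighting and is not shown. The function name is hypothetical.

```python
def map_coefficients(f_obs, f_calc, m=1.0, d=1.0):
    """Return amplitudes for the difference map and the '2Fo-Fc' type map.

    With m = d = 1 these are the unweighted (Fobs - Fcalc) and
    (2Fobs - Fcalc) coefficients; otherwise the weighted
    (mFobs - DFcalc) and (2mFobs - DFcalc) coefficients.
    Both are combined with the model phase exp(i * alpha_calc).
    """
    difference = m * f_obs - d * f_calc
    double = 2.0 * m * f_obs - d * f_calc
    return difference, double


unweighted = map_coefficients(100.0, 80.0)
weighted = map_coefficients(100.0, 80.0, m=0.9, d=0.8)
```

The second coefficient always equals the weighted observed amplitude plus the difference coefficient, which is the superposition described in the text.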

2.5.11 Macromolecular refinement

The initial, experimental protein model is certainly not the best possible interpretation of the measured data. In other words, it is not yet in good agreement with the experimental diffraction data and with chemical and biological properties. As mentioned before, structure factors can be calculated for the model with an inverse Fourier transformation (Equation 15), given atomic positions, occupancies and B factors. The comparison of the calculated structure factors Fcalc with the observed values Fobs still shows significant discrepancies. The aim of structure determination is to lower these differences by variation of the model parameters. To statistically measure the fit of the model to the observed data, the crystallographic R factor Rcryst was introduced (Equation 23). The value for randomly placed atoms in the unit cell would be 59% (Wilson, 1949), and R values around 50% are common for initial and severely incomplete models. In contrast, well-refined models yield values between 10% and 25% depending on the resolution. The Rcryst value decreases with higher resolution.

Equation 23

$$R_{cryst} = \frac{\sum_{hkl} \left| F_{obs}(hkl) - k F_{calc}(hkl) \right|}{\sum_{hkl} F_{obs}(hkl)}$$

with
$k$: resolution-dependent scale factor
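A minimal sketch of the R factor calculation is given below. As a simplifying assumption, a single overall scale factor k = ΣFobs/ΣFcalc replaces the resolution-dependent scale factor of Equation 23; the function names are hypothetical.

```python
def overall_scale(f_obs, f_calc):
    """Single overall scale factor (simplification of the resolution-dependent k)."""
    return sum(f_obs) / sum(f_calc)


def r_cryst(f_obs, f_calc, k=None):
    """R = sum |Fobs - k*Fcalc| / sum Fobs over all reflections."""
    if k is None:
        k = overall_scale(f_obs, f_calc)
    numerator = sum(abs(o - k * c) for o, c in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


# Hypothetical amplitudes: a perfectly scaled model gives R = 0.
perfect = r_cryst([100.0, 200.0, 300.0], [50.0, 100.0, 150.0])
imperfect = r_cryst([100.0, 200.0, 300.0], [60.0, 90.0, 150.0])
```

In the second example the scale factor is 2, the summed deviation is 40 and the summed observed amplitude is 600, so R is about 6.7%.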

The aim of macromolecular refinement is to minimize a residual function of the observed structure factor amplitudes by changing the parameters of the model. The residual function used in standard least squares refinement is given in Equation 24.

Equation 24

$$f(P) = \sum_{hkl} \frac{\left[ F_{obs}(hkl) - F_{calc}(hkl, P) \right]^2}{\sigma_{obs}(hkl)^2}$$

with
$P$: set of model parameters
$F_{calc}(hkl, P)$: calculated structure factor amplitude for the reflection hkl using a set of parameters P

In refinement, the observables correspond to the independent reflections. The more of them are available, the easier it is to determine the optimal set of parameters P that yields the minimum value of the residual function. Theoretically, an equal number of observables and parameters would be sufficient, but in practice the ratio must be shifted strongly towards the observables, because the various minimization programs use approximations to the full-matrix method (Murshudov et al., 1997; Tronrud, 2004). The number of observables is limited by data resolution and completeness as well as by the space group and solvent content of the crystal, with resolution being the most important factor. The refinement process should be started with a high observable-to-parameter ratio, which can be lowered as refinement progresses and errors in the model become smaller (Kleywegt and Jones, 1995). The ratio is significantly influenced by the type of parameterization of the model. As mentioned before, the number of parameters should be lowered to obtain a reasonable observable-to-parameter ratio. On the one hand, this can be achieved with a constraint, which defines a rigid relationship between two parameters and thereby collapses them into a single parameter. In this way, many parameters can effectively be removed from the refinement process. On the other hand, with restraints, parameters are allowed to deviate from a target value, but only at the cost of a penalty. This also decreases the effective number of parameters, because equations are written that correlate parameters with one another. In contrast to constraints, however, it is uncertain to how many parameters the original number is reduced: with restraints, three parameters can effectively remain three, two or one parameter.
Therefore, restraints are not as easy to handle as constraints. A common mistake in the refinement process is so-called overfitting, which occurs if too many parameters that are not supported by experimental data are introduced into the model. To avoid overfitting, 5 to 10% or at least 500 reflections of the experimental data are set aside and are not used for crystallographic refinement or map calculations. This generally accepted method allows immediate assessment of overfitting after the refinement (Brunger, 1993). With the so-called test set of reflections, a 'free' R factor can be calculated in the same fashion as Rcryst (Equation 25). The value of Rfree correlates with the mean error of the model phases and is used as an indicator of refinement progress (Kleywegt and Brunger, 1996).

Equation 25

$$R_{free} = \frac{\sum_{hkl \in T} \left| F_{obs}(hkl) - k F_{calc}(hkl) \right|}{\sum_{hkl \in T} F_{obs}(hkl)}$$

with
$T$: test set of reflections
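The selection of a test set and the calculation of an R value over a subset of reflections can be sketched as follows. Interpreting "5 to 10% or at least 500 reflections" as the larger of a 5% fraction and 500 reflections is an assumption of this example, as are the function names and the random selection.

```python
import random


def split_test_set(n_reflections, fraction=0.05, minimum=500, seed=0):
    """Return (work, test) index lists; the test set is excluded from refinement."""
    size = min(n_reflections, max(minimum, round(fraction * n_reflections)))
    rng = random.Random(seed)
    test = set(rng.sample(range(n_reflections), size))
    work = [i for i in range(n_reflections) if i not in test]
    return work, sorted(test)


def r_value(f_obs, f_calc, indices, k=1.0):
    """R factor over the given reflections (Rfree if indices is the test set)."""
    numerator = sum(abs(f_obs[i] - k * f_calc[i]) for i in indices)
    return numerator / sum(f_obs[i] for i in indices)


work, test = split_test_set(20000)
small_work, small_test = split_test_set(4000)
```

With 20000 reflections the test set holds 1000 of them; with 4000 reflections the 5% fraction would be only 200, so the minimum of 500 applies.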

Model parameterization: Coordinates

In refinement, a model made up of single atoms is defined by the positional coordinates x, y and z, the B factor, and an occupancy value for each atom. Usually, the occupancy is assumed to be unity, apart from cases where double conformations are modeled by two conformers with half occupancy each. Hence, four parameters per atom, the B factor and three positional coordinates, are sufficient to parameterize the model with isotropic B factors (nine with anisotropic B factors; see below). As mentioned before, with a low number of observables (reflections), e.g. in low-resolution data, the number of parameters describing the model has to be lowered to gain a reasonable observable-to-parameter ratio. The simplest way to describe a model is as a rigid body. In rigid body refinement the internal structure of the protein is regarded as rigid, while the position and orientation of the whole protein are refined. Therefore, only six parameters, three rotational and three translational, are sufficient to describe the model, and this method can already be applied to low-resolution data. It has a high radius of convergence and was used in this work for the first refinement of the initial model. Data with higher resolution and a larger number of observables can be refined with an intermediate method, torsion-angle refinement, for cases in which there are too many observables for rigid body refinement but not enough for individual positional refinement. Its principle is the description of a complete protein fold by only two main chain torsion angles per residue plus a variable number of side chain torsion angles. The bond angles and distances underlying the torsion-angle description are taken from standard libraries and are constrained during the refinement (Engh and Huber, 1991). For an average residue with about eight atoms and five torsion angles, only five parameters need to be refined with this method.
Compared to individual positional refinement, where 24 parameters have to be described, this corresponds to an approximately five-fold decrease in the number of parameters. Torsion-angle refinement is used for intermediate resolution ranges and is implemented in the CNS package (Brunger et al., 1998; Rice and Brunger, 1994). The third possibility to decrease the number of parameters is NCS (Section 2.5.9). Here, the number of parameters is reduced because only one protomer is refined and the other copies are generated using the NCS relationship. Only one set of atomic coordinates has to be described, which drops the number of positional parameters to 1/n if n-fold NCS is used. However, this is only true for NCS constraints, not for restraints, because constraints do not allow any variation in structure between NCS copies. Restraints allow variations between the individual protomers, so all atomic coordinates in each NCS copy have to be refined separately again. That is why restraints reduce the number of parameters by an uncertain amount.
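The parameter counts given above can be followed with simple arithmetic. The sketch below uses the numbers from the text (three positional coordinates per atom, four parameters per atom with an isotropic B factor, six for a rigid body, about eight atoms and five torsion angles per residue); the protein size and the number of reflections in the example are hypothetical.

```python
RIGID_BODY_PARAMETERS = 6  # three rotational and three translational


def positional_parameters(n_atoms):
    """x, y and z for every atom."""
    return 3 * n_atoms


def atomic_parameters(n_atoms, anisotropic=False):
    """Coordinates plus one isotropic B factor (4) or six anisotropic terms (9)."""
    return n_atoms * (9 if anisotropic else 4)


def torsion_parameters(n_residues, torsions_per_residue=5):
    """Torsion-angle parameterization with fixed bond lengths and angles."""
    return n_residues * torsions_per_residue


def ncs_constrained(n_parameters, n_fold):
    """NCS constraints leave only one protomer to be refined."""
    return n_parameters // n_fold


# Hypothetical example: 200 residues, 1600 atoms, 20000 unique reflections.
ratio_individual = 20000 / atomic_parameters(1600)
ratio_torsion = 20000 / torsion_parameters(200)
```

For one average residue, 24 positional parameters are replaced by 5 torsion angles, a factor of 4.8, which is the roughly five-fold decrease stated in the text.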

Model parameterization: B factors

As mentioned before, the vibrational movement of an atom around its central position can be described by the temperature or B factor (Section 2.5.3). Depending on the number of observables, different parameterizations of the B factors are used. Analogous to rigid body refinement, the simplest temperature factor model uses just one B factor for the whole molecule. This is utilized in low-resolution refinement, where only a small number of parameters can be refined. The next step is to use group B factors for each residue, or separate B factors for the main chain and side chain atoms of each residue. The number of parameters is thereby reduced, because all atoms of one group share the same value. Data with a resolution higher than 2.0 Å can be refined with individual B factors, starting with one isotropic B factor per atom. At even higher resolution, individual anisotropic B factors can be refined. Depending on other factors, this becomes feasible at resolutions higher than approx. 1.3 Å, as the number of parameters is considerably increased by describing each atom with six parameters. At low to medium resolution, strong motion of atoms in the lattice is very probably the reason for the weak diffracting power of the crystal; nevertheless, the atoms have to be described with isotropic B factors, which cannot depict this three-dimensional motion. For this reason, a different parameterization was introduced that allows modeling of anisotropic motion without the need for many new parameters, the so-called TLS (translation, libration, screw) model (Schomaker and Trueblood, 1968; Winn et al., 2001). In refinement with the TLS model, the molecule is divided into TLS groups (Figure 20), and the rigid-body displacement of the atoms in each group is described by the three 3x3 tensors T, L and S. The symmetric tensors T and L describe the translational component in units of Å² and the rotational component in units of rad², respectively.
With the third, asymmetric tensor S, which describes the correlation between the translational and the rotational movement, 20 new parameters are needed for a single TLS group. As this is considerably fewer than in anisotropic B factor refinement, TLS refinement can be used at quite low resolution and still give good results (Painter and Merritt, 2006; Winn et al., 2003). During this work, the TLS parameterization was done with REFMAC, which automatically refines the B factors after TLS refinement.

Figure 20 Schematic representation of TLS refinement. Each TLS group librates around its center of gravity (Winn et al., 2003).
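The number of 20 parameters per TLS group follows from the tensor shapes: the symmetric tensors T and L contribute six independent elements each, and the asymmetric tensor S contributes eight, because the trace of S cannot be determined from the data. A minimal sketch of this count, with a hypothetical group size for comparison:

```python
SYMMETRIC_3X3 = 6       # independent elements of T and of L
S_TENSOR = 9 - 1        # nine elements, trace not determinable


def tls_parameters(n_groups):
    """Parameters needed to describe n TLS groups."""
    return n_groups * (SYMMETRIC_3X3 + SYMMETRIC_3X3 + S_TENSOR)


def anisotropic_b_parameters(n_atoms):
    """Six anisotropic displacement parameters per atom."""
    return 6 * n_atoms


# Hypothetical protein of 1600 atoms described as one TLS group.
tls_count = tls_parameters(1)
aniso_count = anisotropic_b_parameters(1600)
```

One TLS group thus adds 20 parameters, whereas individual anisotropic B factors for the same 1600 atoms would add 9600.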

Maximum-likelihood refinement

Modern refinement programs minimize a maximum-likelihood residual (Equation 26) instead of the least-squares residual used formerly (Equation 24).

Equation 26

$$f(P) = \sum_{hkl} \frac{\left[ F_{obs}(hkl) - \langle F_{calc}(hkl, P) \rangle \right]^2}{\sigma_{obs}(hkl)^2 + \sigma_{calc}(hkl, P)^2}$$

with
$P$: set of model parameters
$\langle F_{calc}(hkl, P) \rangle$: expectation value of F calculated from all plausible models similar to P
$\sigma_{calc}(hkl, P)$: width of the distribution of values for $F_{calc}(hkl, P)$
$\sigma_{obs}(hkl)$: experimental standard deviation of $F(hkl)$

The advantage of the modern method is that it includes errors in the model and adds them to the uncertainties of the experimental data in the weighting term, whereas least-squares residuals simply neglect these errors. The probability of a model being correct is calculated by Bayesian inference. Stereochemical restraints are included in the likelihood distribution of the observed data, which parallels the treatment of restraints as additional observables in the least-squares methodology. During this work, CNS (Brunger et al., 1998) and REFMAC (Murshudov et al., 1997) were used, which both utilize maximum-likelihood targets.
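The difference between the two residuals of Equation 24 and Equation 26 lies only in the weighting term. The following minimal sketch shows this for given values; it is not the target function as implemented in CNS or REFMAC, the function names are hypothetical, and the estimation of the expectation values and of σcalc is not shown.

```python
def least_squares_residual(f_obs, f_calc, sigma_obs):
    """Residual weighted by the experimental uncertainty only."""
    return sum(
        (o - c) ** 2 / s ** 2 for o, c, s in zip(f_obs, f_calc, sigma_obs)
    )


def likelihood_residual(f_obs, f_calc_expected, sigma_obs, sigma_calc):
    """Residual whose weight also contains the model uncertainty."""
    return sum(
        (o - c) ** 2 / (s ** 2 + sc ** 2)
        for o, c, s, sc in zip(f_obs, f_calc_expected, sigma_obs, sigma_calc)
    )


# Hypothetical values for two reflections.
ls = least_squares_residual([10.0, 20.0], [9.0, 22.0], [1.0, 2.0])
ml_exact_model = likelihood_residual([10.0, 20.0], [9.0, 22.0], [1.0, 2.0], [0.0, 0.0])
ml_uncertain_model = likelihood_residual([10.0, 20.0], [9.0, 22.0], [1.0, 2.0], [1.0, 2.0])
```

Without model uncertainty both residuals coincide; with model uncertainty the same deviations are down-weighted.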


Water placement and ligand fitting

Water molecules were automatically introduced with either ARP/wARP (Perrakis et al., 1999) or COOT (Emsley and Cowtan, 2004). During the automatic search, (mFobs - DFcalc) and (2mFobs - DFcalc) maps were used. Criteria for placement were the presence of spherical difference density of at least 3 σ in (mFobs - DFcalc) maps and the presence of a suitable donor/acceptor nearby. Afterwards, the density of the newly introduced water molecules was inspected at a contour level of 1.0 σ in (2mFobs - DFcalc) maps and, where necessary, the placement was corrected or completed, or wrongly placed water molecules were deleted. The small molecule inhibitor and the modified amino acid residue pyroglutamate were manually fit into (mFobs - DFcalc) difference density maps using COOT (Emsley and Cowtan, 2004). Coordinates were obtained from the HIC-Up server (hetero-compound information center Uppsala) (Kleywegt et al., 2003) in case of the pyroglutamate and from PRODRG2 (Schuttelkopf and van Aalten, 2004) for the inhibitor. Stereochemical restraint libraries for the pyroglutamate were directly obtained from REFMAC (Murshudov et al., 1997) and in case of the inhibitor from PRODRG2 (Schuttelkopf and van Aalten, 2004).
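The two placement criteria for water molecules can be illustrated with a minimal sketch that filters difference density peaks. The distance window of 2.3 to 3.5 Å for a "nearby" donor/acceptor is an assumption of this example and not a value from this work; the shape of the density (sphericity) is not tested, and the function name is hypothetical.

```python
from math import dist


def candidate_waters(peaks, polar_atoms, min_sigma=3.0, min_dist=2.3, max_dist=3.5):
    """Select peak positions that satisfy both placement criteria.

    peaks: list of ((x, y, z), height in sigma) from the difference map
    polar_atoms: coordinates of potential hydrogen bond donors/acceptors
    """
    accepted = []
    for xyz, height in peaks:
        if height < min_sigma:
            continue  # difference density too weak
        if any(min_dist <= dist(xyz, atom) <= max_dist for atom in polar_atoms):
            accepted.append(xyz)
    return accepted


# Hypothetical peaks: only the first is strong enough and close to a polar atom.
peaks = [((0.0, 0.0, 0.0), 4.0), ((10.0, 0.0, 0.0), 5.0), ((0.0, 3.0, 0.0), 2.5)]
waters = candidate_waters(peaks, polar_atoms=[(2.8, 0.0, 0.0)])
```

Such a filter only proposes candidates; the visual inspection in the (2mFobs - DFcalc) map described above remains necessary.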

2.6 Protein structures

2.6.1 Validation of protein structures

Clearly, different kinds of errors are still present in the final model of a protein structure. These can be either global or local. Global errors are, for example, errors in the symmetry operators or in overall scaling. Local errors include misplaced residues, wrong sequence assignment or false stereochemistry, to name a few. For structure quality evaluation several statistical indicators exist which concern both local and global errors (Dodson et al., 1998).

Reliability factors Rcryst and Rfree

As mentioned above, the crystallographic R factor Rcryst is an indicator of the agreement between observed structure factor amplitudes and those calculated from the model (Equation 23). It only describes, in a global way, the ability of the modeled distribution of atoms to explain the measured X-ray data. It cannot evaluate model geometry or chemical plausibility. Furthermore, it depends strongly on the observable-to-parameter ratio. For this reason, it is not meaningful as an indicator of the model quality of protein structures (Kleywegt and Jones, 1995). In contrast, the free R factor, a further development, is a very useful tool for monitoring refinement progress, as it is sensitive to overfitting and offers some measure of model plausibility. In some rare cases, such as the presence of high-order NCS, its value can be artificially lowered (Fabiola et al., 2006). In such cases, the statistical basis of the free R definition is compromised.

Main and side chain torsion angles

In all proteins only two rotational degrees of freedom per residue exist in the main chain, the torsion angles φ and ψ, because of the rigidity of the peptide bond. Furthermore, both are restricted to certain values in all amino acids except glycine, because the presence of a bulky side chain at Cα would otherwise lead to clashes with main chain amide hydrogen or carbonyl oxygen atoms. Graphically, this can be depicted in a so-called Ramachandran diagram, in which φ is plotted against ψ for all residues of the model (Ramachandran and Sasisekharan, 1968). The angles should lie in allowed regions. These regions were originally defined by modeling and have since been updated using libraries of observed main chain torsion angles in high-resolution structures (Lovell et al., 2003). There are special allowed regions for the torsion angles of the flexible glycine residues and of residues that precede proline. In general, main chain torsion angles are not restrained in the refinement of high-resolution data (better than 2.0 Å) and are, therefore, an excellent source of information for structure validation. In a refined model of data at about 2.0 Å, at least 98% of all Ramachandran data should be in allowed regions.

A similar situation occurs with the side chain torsion angles χ1, χ2, χ3 etc., which are restricted to certain combinations because of potential steric clashes. The angles of all side chains are compared to rotamer libraries compiled by analyzing high-resolution structures (Lovell et al., 2000). In proteins, only a small number of different rotamers are observed for a given amino acid side chain. This means that if a set of side chain torsion angles corresponds to a frequently occurring rotamer, it is likely to be correct. Additionally, side chain torsion angles cannot easily be restrained during the refinement process, because in general several ideal values exist for a given side chain; equivalent torsion angles can usually be interconverted by a 120° rotation. This causes minimization problems during refinement, so side chain torsion angles are either weakly restrained or not refined at all. Like the main chain torsion angles, they are therefore very good independent quality indicators in structure validation. In this work, Ramachandran analysis (main chain torsion angles) was done with RAMPAGE (Lovell et al., 2003) and PROCHECK (Laskowski et al., 1993), and rotamer analysis (side chain torsion angles) with COOT (Emsley and Cowtan, 2004).
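Main and side chain torsion angles are all calculated in the same way from the coordinates of four consecutive atoms. The following is a minimal sketch of this calculation; it is not the code used by RAMPAGE, PROCHECK or COOT, and the function names are hypothetical.

```python
from math import atan2, degrees, sqrt


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def torsion_angle(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in the range -180 to 180.

    phi uses C(i-1), N(i), CA(i), C(i); psi uses N(i), CA(i), C(i), N(i+1).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return degrees(atan2(y, x))


# Idealized test geometries with the central bond along z.
cis = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
gauche = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

Plotting the φ and ψ values obtained in this way for every residue gives the Ramachandran diagram described above.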


Stereochemistry

Another commonly used measure of model quality is the comparison of stereochemical parameters of the model, such as bond distances, bond angles, planar groups, and improper torsion angles, with ideal values obtained from the analysis of small molecule crystal structures (Engh and Huber, 1991). The stereochemical parameters of the model are restrained to the ideal values and the RMS (root mean square) deviation is reported. It has to be considered that adherence of the model to the ideal values is not an indicator of model quality, but rather an indicator of how strictly the restraints were enforced during refinement. In this work, RMS deviations were calculated with PROCHECK (Laskowski et al., 1993).

2.6.2 Presentation and analysis of protein structures

The atomic representation of protein structures is only useful for small parts because of the vast size of macromolecules. Hence, several representations have been introduced for the depiction of a whole molecule. The overall fold of the protein can be shown by plotting only the Cα positions, the so-called Cα trace, by showing the fold abstractly (ribbon representation), or by including secondary structure elements (cartoon representation) (Richardson, 1985a; Richardson, 1985b). Additionally, the representation of the protein surface (surface representation) is useful for different analyses, e.g. of binding sites or surface characteristics, but also to obtain an impression of how the protein really looks. In this work, all representations of protein structures were created with POVSCRIPT+ (Fenn et al., 2003) and rendered with POVRAY (http://www.povray.org) or with PYMOL (http://www.pymol.sourceforge.org). The assignment of secondary structure elements was done automatically with DSSP (Dictionary of Secondary Structure of Proteins) (Kabsch and Sander, 1983) by using hydrogen bond, main chain torsion angle or distance criteria. For the determination of polar and hydrophobic interactions between protein and ligands, CONTACT (CCP4, 1994) was used with 3.4 Å and 5.5 Å cut-off distances. Additionally, interactions were analyzed manually with COOT (Emsley and Cowtan, 2004) and LIGPLOT (Wallace et al., 1995).

2.6.3 Structure comparison

Usually, a protein does not possess a unique fold, but rather belongs to a family of related structures. The comparison and superposition of structures of different members can help to understand relationships within the family, while differences in structure can explain functional differences. Typically, in structure comparison the number of structurally equivalent residues, a distance cutoff in Å (Holm and Sander, 1993) and the RMS deviations of structurally equivalent atoms, mostly Cα atoms, in Å (Maiorov and Crippen, 1994) are reported as criteria for the similarity of structures. In this work, structure comparison was performed either with SSM (secondary structure matching) (Krissinel and Henrick, 2004) or with PYMOL (http://www.pymol.sourceforge.org) by overlaying secondary structure elements. Pairwise structure comparison was also performed with DALILITE (Holm and Park, 2000), which computes Z scores indicating the statistical significance of the match. Large-scale searches for structural homologs were performed in a non-redundant subset of the PDB using DALI (Holm and Sander, 1993).
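The similarity criteria named above, the number of structurally equivalent residues within a distance cutoff and the RMS deviation of their Cα atoms, can be sketched as follows. The example assumes that both structures are already superimposed and that the Cα atoms are given as matched pairs; the superposition itself, as performed by SSM, PYMOL or DALI, is not shown. The cutoff of 3.0 Å and the function name are hypothetical.

```python
from math import dist, sqrt


def compare_ca(coords_a, coords_b, cutoff=3.0):
    """Return (number of equivalent residues, RMS deviation in Angstrom).

    coords_a, coords_b: matched lists of C-alpha coordinates after superposition
    """
    distances = [dist(a, b) for a, b in zip(coords_a, coords_b)]
    equivalent = [d for d in distances if d <= cutoff]
    if not equivalent:
        return 0, None
    rmsd = sqrt(sum(d * d for d in equivalent) / len(equivalent))
    return len(equivalent), rmsd


# Hypothetical coordinates: the third pair lies outside the cutoff.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 10.0)]
n_equivalent, rmsd = compare_ca(structure_a, structure_b)
```

Both numbers should be reported together, since a low RMS deviation over only a few equivalent residues says little about overall similarity.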

2.7 In silico analyses using protein structures and sequences

In this work, most of the in silico experiments were performed at the Computer-Aided Molecular Design Group and the Center of Molecular Biosciences at the Leopold-Franzens-University in Innsbruck, Austria.

2.7.1 Pharmacophore modeling

Pharmacophore modeling is generally separated into direct and indirect approaches. In the indirect case, no structure of the target protein is available, and compounds which are known to bind the same target are investigated in terms of their common chemical features. As a result, ligand-based pharmacophore models are derived which represent the key features of these substances. In the direct approach, three-dimensional structures from X-ray crystallography or NMR are used to derive a set of protein-ligand interactions and build a structure-based pharmacophore model. In this work, both methods were used, but mainly the latter, as the BaP1*inhibitor complex with its high-resolution data served as an excellent starting point for pharmacophore modeling. All pharmacophore models were generated with the computer program LIGANDSCOUT (Inte:ligand, version 1.01, 2006) (Wolber and Langer, 2005), and then converted for export and subsequent virtual screening (Section 2.7.5). In the program, an initial pharmacophore model is automatically created from the PDB structure of a complex, with the most important protein-ligand interactions then chosen by hand for the final model (Figure 21).

Figure 21 The different interactions recognized and labeled in LIGANDSCOUT. From the ligand’s point of view, it can be differentiated between hydrogen bond acceptor (green) and donor (red arrow), positive (blue) and negative ionizable (red star), metal binding (cone), excluded volume (black sphere), hydrophobic (yellow sphere) and ring aromatic feature (rings).


2.7.2 Protein-ligand docking

In this work, protein-ligand docking was used to further validate the hit lists of putative BaP1 inhibitors. No docking scores were calculated, but all docking poses were visually inspected for a reasonable position in the binding site. This is a commonly used procedure in virtual drug design approaches (Warren et al., 2006). Protein-ligand docking experiments were mainly performed with GLIDE (Schrödinger 7.5 Software, 2007) (Friesner et al., 2004) and with GOLD (Cambridge Crystallographic Data Centre, version 3.1, 2006) (Verdonk et al., 2003).

2.7.3 Molecular dynamics simulations

Molecular dynamics (MD) simulations mentioned in this work that involved time-dependent motion of proteins were performed in close collaboration with Hannes Wallnoefer of the Center for Molecular Biosciences at the Leopold-Franzens-University in Innsbruck, Austria. Free energy calculations of the binding affinity of potential BaP1 inhibitors were performed by Thomas Steinbrecher of the Department for Theoretical Chemical Biology at the Karlsruhe Institute of Technology, Karlsruhe, Germany.

Time-dependent motion of protein structures

The theoretical basis of MD simulations is the time-dependent Schrödinger equation which is needed for every description of dynamic systems (Equation 27). However, even in systems with just a few atoms, the number of degrees of freedom is immense and the equation is difficult to solve analytically. In order to perform MD simulations of much larger systems, e.g. macromolecules, many assumptions have to be made, including the Born-Oppenheimer approximation, the use of empirical force fields instead of effective potentials, and the use of the Newtonian equation of motion to describe nuclear dynamics.

Equation 27: $\hat{H}\Psi = \frac{i h}{2\pi}\,\frac{\partial \Psi}{\partial t}$

with $\hat{H}$ Hamiltonian; $h$ Planck’s constant; $\Psi$ wave function of the system

Other important factors in MD simulations are maintaining a 'natural' environment (water box), periodic boundary conditions, and interaction cutoffs. With these assumptions and a lot of computational power, the motion of the target protein can be calculated for short time intervals (usually a few ns). Commonly, results of MD simulations are given as plots of RMS deviation or B factor against simulation time. The latter is an indicator of the mobility of single atoms, analogous to the one used in X-ray crystallography (Section 2.5), while the former is the distance of an atom from its point of reference.
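As an illustration of the RMS deviation that is plotted against simulation time, the following minimal Python sketch computes it for a toy trajectory. The coordinates are invented for illustration, the function name `rmsd` is hypothetical, and the superposition of each frame onto the reference that usually precedes the calculation is omitted here.

```python
import math

def rmsd(frame, reference):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) coordinates."""
    if len(frame) != len(reference):
        raise ValueError("frame and reference must contain the same number of atoms")
    squared = 0.0
    for (x, y, z), (xr, yr, zr) in zip(frame, reference):
        squared += (x - xr) ** 2 + (y - yr) ** 2 + (z - zr) ** 2
    return math.sqrt(squared / len(frame))

# Hypothetical two-atom system, coordinates in Angstrom (not taken from the thesis)
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # identical to the reference
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],  # both atoms shifted by 1 Angstrom along y
]

# One RMSD value per frame; plotting these against time gives the usual RMSD curve
rmsd_per_frame = [rmsd(frame, reference) for frame in trajectory]
```

In a real analysis the frames would come from the MD trajectory and would typically be restricted to a subset of atoms, for example the backbone.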

Free energy calculations with the aid of MD simulations

Different approaches were used to analyze the properties of the experimentally determined BaP1*inhibitor complex on the one hand, and to calculate binding affinities of new or rationally designed small molecules to BaP1 on the other. Thermodynamic integration (TI) methods allow the calculation of free energy differences based on MD simulations (Gilson and Zhou, 2007; Jorgensen, 2004). In contrast to protein-ligand docking (Section 2.7.2), realistic free energies can be obtained. These can then be interpreted as binding constants, pKa values, etc., and conclusions concerning the underlying chemical systems can be drawn. TI calculations are based on the conversion of one chemical system into another (Figure 22). The corresponding free energies of transformation ΔG0 cannot be calculated directly, but must be determined through application of the thermodynamic cycle. To this end, an integral has to be solved numerically using different λ values (snapshots) of the MD simulations (Equation 28). Each simulation provides a single data point which can then be mapped onto a free energy diagram and, finally, free energies are obtained for chemical reactions that could not be calculated before.

Figure 22 Schematic representation of the thermodynamic cycle for TI calculations, exemplified by ligands (A, B) binding to a receptor R. Ligand A is converted into ligand B, both in solution and in complex with the receptor. The transformation energies have no direct physical interpretation, but their difference corresponds to the difference between the free energies of binding of the two ligands to the receptor.


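Equation 28: $\Delta G^{0}_{TI} = \int_{0}^{1} \left\langle \frac{\partial V(\lambda)}{\partial \lambda} \right\rangle_{\lambda} d\lambda$ and $V(\lambda) = f(\lambda)\,V_0 + [1 - f(\lambda)] \cdot V_1$

with $V_0$ potential of the initial state; $V_1$ potential of the final state; $\lambda$ instant moment of the MD simulation (snapshot)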
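The numerical solution of the TI integral and the use of the thermodynamic cycle can be sketched as follows. This is a minimal illustration and not the protocol used in this work: it assumes a simple trapezoidal rule over equally treated λ points, and all numbers as well as the names `integrate_ti`, `dv_complex`, and `dv_solution` are invented.

```python
def integrate_ti(lambdas, mean_dv_dlambda):
    """Trapezoidal estimate of the TI integral from averages of dV/dlambda at each lambda."""
    if len(lambdas) != len(mean_dv_dlambda) or len(lambdas) < 2:
        raise ValueError("need at least two matching lambda points")
    total = 0.0
    for k in range(len(lambdas) - 1):
        width = lambdas[k + 1] - lambdas[k]
        total += 0.5 * width * (mean_dv_dlambda[k] + mean_dv_dlambda[k + 1])
    return total

# Hypothetical averages <dV/dlambda> in kcal/mol, one value per simulation
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
dv_complex = [4.0 * lam for lam in lambdas]   # A -> B while bound to the receptor
dv_solution = [2.0 * lam for lam in lambdas]  # A -> B free in solution

dg_complex = integrate_ti(lambdas, dv_complex)
dg_solution = integrate_ti(lambdas, dv_solution)

# Thermodynamic cycle: difference in binding free energy between ligand B and ligand A
ddg_binding = dg_complex - dg_solution
```

Because the invented averages are linear in λ, the trapezoidal rule is exact here; with real simulation data the spacing of the λ points determines how well the integral converges.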

Other ways to obtain free energy values on the basis of MD simulations are the more detailed Molecular Mechanics/Poisson-Boltzmann Surface Area (MM-PBSA) and the faster Molecular Mechanics/Generalized-Born Surface Area (MM-GBSA) calculations. These are mainly used to analyze the affinities of small molecules binding to target proteins (Gilson and Zhou, 2007; Steinbrecher and Labahn, 2010). In detail, the MM term describes the internal energy of the system, the PB or GB term the electrostatic contribution to solvation, and the SA term the nonpolar contribution to solvation (solvent accessible surface area). In effect, MM-PBSA and MM-GBSA calculations combine MD simulations with scoring schemes to calculate the free energy of a query compound binding to a target structure. MD simulations of the free ligand, the free protein, and their complex are used to calculate the average potential and solvation energies with an equation which is used in most physics-based scoring functions (Equation 29). Therein, the change in configurational entropy is defined as the change in the entropy associated with ligand and protein motion. One point that has to be considered, and is still intensively discussed, is the weighting of the entropy-energy correlation, as in theory the binding free energy includes a loss of configurational entropy due to the decrease in freedom of the ligand and the protein upon binding. Usually, this loss is nearly as large as the gain in favorable binding energy.

Equation 29: $\Delta G^{0} = U_{PL} - U_{P} - U_{L} + W_{PL} - W_{P} - W_{L} - T \Delta S^{o}_{config}$

with $U_X$ Boltzmann-averaged potential energy of protein (P), ligand (L) or complex (PL); $W_X$ Boltzmann-averaged solvation energy of protein (P), ligand (L) or complex (PL); $T$ absolute temperature; $\Delta S^{o}_{config}$ change in configurational entropy

Either with the PBSA or GBSA implicit solvent model, snapshots (λ) of MD simulations are used to compute their potential energies Uλ with the empirical force field and their solvation energies Wλ. After averaging over each trajectory of the MD simulations, and with Equation 29, the changes in mean potential and solvation energy can be calculated.
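The averaging over snapshots and the combination of terms in Equation 29 can be sketched in a few lines of Python. This is an illustrative sketch only: the arithmetic mean over snapshots stands in for the Boltzmann average under the assumption that the MD trajectory samples the Boltzmann distribution, the function name `binding_free_energy` is hypothetical, and all energies (kcal/mol) and the entropy value are invented.

```python
from statistics import mean

def binding_free_energy(u_pl, u_p, u_l, w_pl, w_p, w_l, temperature, delta_s_config):
    """Combine per-snapshot potential (u_*) and solvation (w_*) energies as in Equation 29."""
    delta_u = mean(u_pl) - mean(u_p) - mean(u_l)
    delta_w = mean(w_pl) - mean(w_p) - mean(w_l)
    return delta_u + delta_w - temperature * delta_s_config

# Hypothetical snapshot energies for complex (PL), protein (P), and ligand (L)
dg = binding_free_energy(
    u_pl=[-110.0, -90.0], u_p=[-60.0, -60.0], u_l=[-10.0, -10.0],
    w_pl=[-50.0, -50.0], w_p=[-55.0, -55.0], w_l=[-15.0, -15.0],
    temperature=300.0,       # K
    delta_s_config=-0.02,    # kcal/(mol K), loss of configurational entropy on binding
)
```

With these invented numbers the favorable potential energy change of -30 kcal/mol is largely offset by the solvation term (+20 kcal/mol) and the entropy term (+6 kcal/mol), which mirrors the compensation discussed above.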

2.7.4 Conformational analysis of small molecules

In most cases, molecules are not rigid scaffolds of atoms; their structures constantly move from one conformation to another. Therefore, prior to virtual screening experiments (Section 2.7.5), an ensemble of conformations has to be calculated for each compound, and optimized geometries generated. Ideally, this ensemble comprises all low energy conformers which are accessible to the molecule. It has been shown that conformational analysis of compounds as constituents of virtual libraries is a crucial step in all three-dimensional structure based virtual screening experiments (Chen and Foloppe, 2008).

Both quantum mechanics (QM) and molecular mechanics (MM) calculations are used for conformational analysis and geometry optimization. The former method is more time-consuming, but also much more precise. The purpose of both is to provide adequate coverage of the energy landscape and diversity represented among the conformational models of the compound under consideration (Smellie et al., 1995a; Smellie et al., 1995b). Ab initio and semi-empirical quantum mechanics describe molecules in terms of nuclei and electrons and calculate wave functions to predict their geometric, electronic, and energetic properties (Raha et al., 2007). The faster and more often used approach is molecular mechanics, whereby steric energies are calculated based on simple functions. These functions are derived from experimental data or QM calculations. In general, the entire steric energy of a molecule can be described by bond-stretching, angle-bending, torsional, van der Waals, and electrostatic energy terms. Together with empirically derived constants, they represent the so-called force field which determines the strain energy during the search for an optimized geometry or a set of low energy conformations. Several molecular mechanics search techniques exist, the most common being the following: randomly generating conformers (e.g. Monte Carlo algorithm), changing torsion angles systematically, and molecular dynamics that determine new atom positions for defined time intervals during the movement of molecules (Vieth et al., 1998).

During this work, both methods were applied, but mostly MM methods were used for conformational analysis. For this, the CATALYST module ConFirm (Accelrys, version 4.11, 2006) was used, which randomly generates a set of conformers via the Monte Carlo technique (Kirchmair et al., 2005). Two methods are available, called FAST and BEST. The former is generally preferred for generating large compound databases, whereas the latter is used for higher quality analysis and is more suitable for highly complex flexible ring systems. Both methods start with an energy-optimized three-dimensional structure in which torsion angles and Cartesian coordinates are randomly changed. The next step is energy minimization using a modified CHARMm force field (Zaki and Hsiao, 1999), resulting in a conformer which is then compared to the others. The new conformer is only kept if it is structurally distinct from them and if its internal strain energy lies within an energy threshold. Additionally, conformational variation is increased by an internal POLE algorithm (Smellie et al., 1994) which penalizes conformations that are similar to previous ones. Thus, a fast and exhaustive coverage of conformational space is provided (Smellie et al., 1995a; Smellie et al., 1995b). The energy term used in the minimization steps is given in Equation 30:

Equation 30: $F_{pole} = W_{pole} \sum_{i} \frac{1}{(D_i)^{N}}$ and $D_i = \left[ \frac{\sum_{j=1}^{N_d} (d_j - d_{ij})^2}{N_d} \right]^{1/2}$

with $D_i$ RMS deviation between the poling distances $d_j$ and $d_{ij}$; $d_j$ current conformation being poled; $d_{ij}$ the ith previous conformation being poled; $N_d$ number of poling distances; $W_{pole}$ scaling factor
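How such a poling term penalizes similarity to earlier conformers can be sketched as follows. The sketch assumes the commonly cited form of the poling function, in which every previous conformer i contributes W_pole / (D_i)^N; the exponent value, the distances, and the function names are hypothetical and are not taken from CATALYST.

```python
import math

def poling_rms(current, previous):
    """RMS deviation D_i between the poling distances of the current and one previous conformer."""
    if len(current) != len(previous):
        raise ValueError("both conformers need the same number of poling distances")
    n_d = len(current)
    return math.sqrt(sum((dj - dij) ** 2 for dj, dij in zip(current, previous)) / n_d)

def poling_penalty(current, previous_conformers, w_pole=1.0, exponent=2):
    """Assumed poling term: sum over previous conformers of w_pole / D_i**exponent."""
    penalty = 0.0
    for previous in previous_conformers:
        d_i = poling_rms(current, previous)
        if d_i == 0.0:
            return math.inf  # identical to an earlier conformer: maximal penalty
        penalty += w_pole / d_i ** exponent
    return penalty

# Hypothetical poling distances (Angstrom) for one new and two earlier conformers
current = [1.0, 2.0]
previous_conformers = [[1.0, 4.0], [3.0, 2.0]]
penalty = poling_penalty(current, previous_conformers)
```

The penalty grows steeply as the new conformer approaches any earlier one, which is what drives the search towards structurally distinct conformations.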

2.7.5 Virtual screening

In this work, libraries were either screened with a pharmacophore model (Section 2.7.1) or with an algorithm based on three-dimensional shape and electrostatic potential. Before the virtual screening procedure, compounds were prepared by conformational analysis (Section 2.7.4) and libraries were designed.

Virtual library design

Virtual libraries are used for screening with a query model, but also for the evaluation of pharmacophore models (Section 2.7.1) prior to the actual screening procedure. In this work, all large-scale compound libraries were already provided and only the smaller test and training sets had to be generated. The key challenge in test and training set design is to find biologically inactive compounds in the literature that are structurally similar to known active substances. These decoys are needed for an accurate evaluation of virtual screening performance in order to compare the discriminatory power of different virtual screening workflows with respect to a specific problem. In a common approach, decoys are randomly generated by computer programs, as it is in fact quite unlikely to find biologically active compounds in these databases (Edwards et al., 2005; Kirchmair et al., 2007). Another possibility arises in cases where many structurally similar substances fail in activity tests; these data can then be included in a further step of evaluation. The goal of the procedure is then to create a virtual screening workflow which is able to identify active molecules within a database of structurally similar decoys. Random libraries including structurally diverse compounds lead to less reliable enrichments than a focused library of decoys which are structurally related to active molecules (Kirchmair et al., 2008). Further validation of the virtual screening protocol was done by screening the commercial database of the Derwent World Drug Index (WDI), a library of more than 60,000 compounds that includes information on their biological activity. It is the world’s most comprehensive database of enhanced patent documents. Hence, it can be used as a test set for a pharmacophore model, whereby the number of active and inactive compounds actually found defines the quality of the model.
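One common way to quantify how well a screening workflow separates actives from decoys is the enrichment factor. The following Python sketch shows this metric under the assumption that the numbers of actives in the hit list and in the whole library are known; it is a generic illustration with invented counts and hypothetical function names, not the evaluation scheme prescribed by any of the cited programs.

```python
def enrichment_factor(actives_in_hits, hits_total, actives_total, library_total):
    """Ratio of the active fraction in the hit list to the active fraction in the library."""
    if hits_total <= 0 or actives_total <= 0 or library_total <= 0:
        raise ValueError("hit list, active set, and library must be non-empty")
    return (actives_in_hits * library_total) / (hits_total * actives_total)

def yield_of_actives(actives_in_hits, actives_total):
    """Fraction of all known actives that was retrieved in the hit list."""
    return actives_in_hits / actives_total

# Hypothetical test set: 10 actives hidden among 990 decoys, 50 hits retrieved
ef = enrichment_factor(actives_in_hits=5, hits_total=50, actives_total=10, library_total=1000)
recovered = yield_of_actives(actives_in_hits=5, actives_total=10)
```

An enrichment factor of 1 corresponds to random selection, so a value well above 1 indicates that the model retrieves actives preferentially.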

Virtual screening with pharmacophore models

There are different platforms for virtual screening with pharmacophore models. In this work, the screening platform of CATALYST (Accelrys, version 4.11, 2006) was used throughout. As the pharmacophore models (Section 2.7.1) had been generated with LIGANDSCOUT (Inte:ligand, version 1.01, 2006) (Wolber and Langer, 2005), the resulting file had to be converted to a CATALYST-readable file prior to the screening experiments. The screening in CATALYST can be performed either with the Fast Flexible or the Best Flexible search, whereby both methods are based on the HipHop algorithm (Greene et al., 1994). As before, the former technique requires less computational effort and is used for the screening of large-scale databases. The simplification consists of fitting the compounds to the pharmacophore model using a precalculated set of conformational models without additionally minimizing them. In the Best Flexible search the minimization step is included, and more accurate results are attained. As a result, fit values are calculated, taking feature mapping and the corresponding feature weight into account (Equation 31). They can assume values between 0 and 1, the latter corresponding to a compound whose chemical functionality lies exactly on the feature center.

 2   D  Equation 31 Fit = W 1−    ∑ f  ∑ T     f  

with Wf feature weight D displacement from feature center Tf feature tolerance
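The behaviour of the fit value can be made concrete with a short Python sketch of Equation 31. It assumes that the outer sum runs over the mapped features and the inner sum over the location constraints of each feature; the data layout, the function name `fit_value`, and the numbers are hypothetical.

```python
def fit_value(mapped_features):
    """Fit value for a list of (weight, [(displacement, tolerance), ...]) entries."""
    total = 0.0
    for weight, constraints in mapped_features:
        squared_error = sum((d / t) ** 2 for d, t in constraints)
        total += weight * (1.0 - squared_error)
    return total

# Hypothetical single-feature pharmacophore with weight 1 and tolerance 1.5 Angstrom
perfect_match = [(1.0, [(0.0, 1.5)])]    # functional group exactly on the feature center
displaced_match = [(1.0, [(0.75, 1.5)])]  # displaced by half the tolerance radius

fit_perfect = fit_value(perfect_match)
fit_displaced = fit_value(displaced_match)
```

The quadratic term means that small displacements cost little, while a group sitting at the edge of the tolerance sphere contributes nothing to the fit.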


Three-dimensional shape and electrostatic similarity screening

Another common approach besides virtual screening with pharmacophore models is similarity searching using three-dimensional shape and electrostatic potential. Generally, this method is used in the absence of structural data for the corresponding pharmacological target, but these search algorithms are also applied as an additional filter on hit lists obtained from screening with pharmacophore models, as done in this work. The basis is an energetically minimized three-dimensional structure of a small molecule, usually a known inhibitor and in the present case the RO-inhibitor structure. Here, the ROCS algorithm (Rapid Overlay of Chemical Structures, version 2.2, OpenEye Scientific Software, Inc. 2006) was used, which overlays each molecule of a compound database onto a query (reference) molecule by a volume-overlap maximization technique, whereby the molecular volume is represented by Gaussian terms. Using the default settings, the ROCS results are ranked by the so-called ComboScore, which is based on a combination of the Shape Tanimoto coefficient T(A,B) (Equation 32) and a scaled color score. The latter value is based on the chemical force field, which measures the chemical complementarity and refines shape-based superpositions based on chemical similarity.

Equation 32: $T(A,B) = \frac{O_{A,B}}{I_A + I_B - O_{A,B}}$

with $O_{A,B}$ volume overlap between conformer A and B; $I_X$ volume of conformer A or B
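Equation 32 translates directly into code. The sketch below uses invented volumes and a hypothetical function name; it does not reproduce how ROCS computes the Gaussian overlap volumes themselves.

```python
def shape_tanimoto(overlap_ab, volume_a, volume_b):
    """Shape Tanimoto coefficient from the overlap volume and the two conformer volumes."""
    denominator = volume_a + volume_b - overlap_ab
    if denominator <= 0:
        raise ValueError("volumes must be positive and the overlap cannot exceed their sum")
    return overlap_ab / denominator

# Hypothetical volumes in cubic Angstrom
identical = shape_tanimoto(overlap_ab=100.0, volume_a=100.0, volume_b=100.0)
half_overlap = shape_tanimoto(overlap_ab=50.0, volume_a=100.0, volume_b=100.0)
```

A perfect superposition of two identical shapes gives a value of 1, and the coefficient drops quickly as the overlap decreases.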

The combination of virtual screening with a pharmacophore model followed by refinement with a shape-based screening technique is a frequently used method, and it has been shown that results from this procedure are much more reliable (Leach et al., 2010). Not only the presence of key binding elements is used for scoring, but also the degree of overall shape complementarity to the active site, which significantly reduces the rate of false positives. Another common way to introduce spatial information is to implement an ensemble of excluded volumes (Figure 23) (Rella et al., 2006).


Figure 23 Illustration of how a three-dimensional pharmacophore model can be enhanced by the addition of shape and excluded volume information (Rella et al., 2006). From a bound ligand (1.) the structure-activity information is derived and a pharmacophore model is generated (2.). The addition of shape information (3.) and exclusion volume spheres (4.) enhances the quality of the model for the specific target.

2.7.6 Sequence alignments

Alignment procedures are separated into pairwise alignments (comparison of only two sequences) and multiple sequence alignments (MSA), which are now able to deal with hundreds of sequences at one time (Do and Katoh, 2008). Two types of algorithms are used in MSA, optimal and heuristic, with the latter mainly used as it is faster and more accurate. In contrast, using the optimal algorithm is only feasible when dealing with a few sequences, as search space and computing time increase exponentially with the number and length of protein sequences. The predominant strategy for protein MSA is the so-called progressive alignment. It starts with the most similar pair of sequences and consecutively adds the less similar sequences to the growing alignment. Throughout this work, the heuristic algorithm of CLUSTALW2 (Larkin et al., 2007) was used to generate MSA.
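The first step of a progressive alignment, finding the most similar pair of sequences, can be sketched with an optimal pairwise (global) alignment score. The sketch uses a deliberately simplified scoring scheme (match +1, mismatch -1, gap -1) instead of the substitution matrices and gap penalties of CLUSTALW2; the sequences and function names are invented.

```python
from itertools import combinations

def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of two sequences by dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]

def most_similar_pair(sequences):
    """Indices of the sequence pair with the highest pairwise alignment score."""
    return max(
        combinations(range(len(sequences)), 2),
        key=lambda pair: global_alignment_score(sequences[pair[0]], sequences[pair[1]]),
    )

# Hypothetical toy sequences
sequences = ["HELLH", "HELH", "GGGG"]
start_pair = most_similar_pair(sequences)
```

In a progressive scheme the remaining sequences would then be added to this starting pair in order of decreasing similarity.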

2.7.7 Computational phylogenetic analysis

Computational phylogenetic analyses, the results of which are commonly represented by a so-called phylogenetic tree, reconstruct the evolutionary history of a set of protein sequences (Goldman and Yang, 2008). In the simplest case, MSA (Section 2.7.6) produces a set of aligned protein sequences on which computational phylogenetic analysis can be performed. However, the above-mentioned progressive multiple alignment methods automatically produce a phylogenetic tree as part of their function, as they incorporate new sequences into the alignment according to genetic distance. This method, called neighbor-joining, is a relatively simple distance-matrix technique and produces unrooted phylogenetic trees, which means that no root (common ancestor) is specified and the distances to the branch tips need not be equal. In this work, phylogenetic trees were calculated with CLUSTALW2 (Larkin et al., 2007), whereby the analyses were based on the neighbor-joining method of Saitou and Nei (Saitou and Nei, 1987).
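The core step of neighbor-joining, selecting the pair of sequences to be joined first from a distance matrix, can be sketched as follows. The sketch uses the standard Q criterion and branch length formulas of the method; the five-taxon distance matrix is an invented example, the function name is hypothetical, and the implementation details of CLUSTALW2 are not reproduced.

```python
def first_neighbor_join(names, dist):
    """Return the pair joined first, its Q value, and the branch lengths to the new node."""
    n = len(names)
    if n < 3:
        raise ValueError("neighbor-joining needs at least three taxa")
    row_sums = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: pairs that are close to each other but far from the rest score lowest
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    q, i, j = best
    branch_i = 0.5 * dist[i][j] + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    branch_j = dist[i][j] - branch_i
    return (names[i], names[j]), q, (branch_i, branch_j)

# Hypothetical symmetric distance matrix for five sequences
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, q_value, branches = first_neighbor_join(names, dist)
```

Note that the two branch lengths differ, which illustrates that the method does not force equal distances to the branch tips. The full algorithm repeats this step on a reduced matrix until only three nodes remain.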


3 Results

3.1 Crystal structure of the BaP1*inhibitor complex

3.1.1 BaP1 purification

Purification scheme

BaP1 was purified following established methods apart from an additional purification step (Gutierrez et al., 1995; Rucavado et al., 1998). After lyophilisation for transportation, the protein was purified via gel permeation chromatography. The purification procedure is described elsewhere (Section 1.1) and depicted in Figure 24. In this way, 45 to 100 mg of purified BaP1 could be obtained from 1 g of Bothrops asper venom.

Figure 24 Flow chart of BaP1 purification. Steps in order: (1) lyophilized venom of Bothrops asper snakes in IEC-buffer A; (2) cation exchange chromatography with CM Sephadex C-25 column; (3) affinity chromatography with Affi-gel blue column; (4) washing and lyophilization of the purified protein; (5) gel permeation chromatography with Superdex-200 26/60 prep grade column; (6) crystallization and protein-analytical methods.

Cation exchange chromatography

The initial purification step was cation exchange chromatography with the column material CM Sephadex C-25 (Section 2.2.2). A chromatogram with three baseline-separated peaks of the main components was obtained, with maxima at elution volumes of 202, 301, and 463 mL, respectively. BaP1 was assigned to the first peak at 202 mL (Figure 25). The corresponding fractions were examined by SDS-PAGE with regard to quantity and purity of BaP1 (Figure 26) and then pooled and concentrated to the desired protein concentration (Section 2.2.3).

Figure 25 Chromatogram of BaP1 purification by cation exchange chromatography using a CM Sephadex C-25 column. With IEC-buffer A and B a salt gradient was applied. Solid lines, absorption at 254 nm (black) and 280 nm (grey); dotted line, conductivity; dashed line, applied salt gradient. Fractions pooled and used for subsequent purification are marked by a line.

Figure 26 SDS-PAGE of BaP1 purification by cation exchange chromatography using a CM Sephadex C-25 column. 15 µL were loaded in each lane. Molecular mass markers: 94, 67, 43, 30, and 20 kDa; the BaP1 band is labeled.

Lane 1: Fraction 46 (178-182 mL)
Lane 2: Fraction 47 (182-186 mL)
Lane 3: Fraction 48 (186-190 mL)
Lane 4: Fraction 49 (190-194 mL)
Lane 5: Fraction 50 (194-198 mL)
Lane 6: Fraction 51 (198-202 mL)
Lane 7: Fraction 52 (202-206 mL)
Lane 8: Fraction 53 (206-210 mL)
Lane 9: Fraction 54 (210-214 mL)
Lane 10: Fraction 55 (214-218 mL)
Lane 11: Fraction 56 (218-222 mL)
Lane 12: Fraction 57 (222-226 mL)
Lane 13: Fraction 58 (226-230 mL)
Lane 14: Fraction 59 (230-234 mL)
Lane 15: LMW standard, 10 µg


Affinity chromatography

The next step in purifying BaP1 was affinity chromatography with the column material Affi-gel blue (Section 2.2.4). A chromatogram with two peaks was obtained. These were baseline-separated and appeared at elution volumes of approximately 27 and 225 mL (Figure 27). The first peak corresponds to all proteins eluted with the void volume of the column. BaP1 was assigned to the second peak, which was not completely baseline-separated from a third peak (not shown in Figure 27). The corresponding fractions were examined by SDS-PAGE with regard to quantity and purity of BaP1 (Figure 28) and were then pooled and concentrated.

Figure 27 Chromatogram of BaP1 purification by affinity chromatography using an Affi-gel blue column. With AfC-buffer a salt gradient was applied. Solid line, absorption at 280 nm; dashed line, applied salt gradient. Fractions pooled and used for subsequent purification are marked by a line.


Figure 28 SDS-PAGE of BaP1 purification by affinity chromatography using an Affi-gel blue column. 15 µL were loaded in each lane. Molecular mass markers: 94, 67, 43, 30, and 20 kDa; the BaP1 band is labeled.

Lane 1: Fraction 4 (182-187 mL)
Lane 2: Fraction 5 (187-192 mL)
Lane 3: Fraction 6 (192-197 mL)
Lane 4: Fraction 7 (197-202 mL)
Lane 5: Fraction 8 (202-207 mL)
Lane 6: Fraction 9 (207-212 mL)
Lane 7: Fraction 10 (212-217 mL)
Lane 8: Fraction 11 (217-222 mL)
Lane 9: Fraction 12 (222-227 mL)
Lane 10: Fraction 13 (227-232 mL)
Lane 11: Fraction 14 (232-237 mL)
Lane 12: Fraction 15 (237-242 mL)
Lane 13: Fraction 16 (242-247 mL)
Lane 14: Fraction 17 (247-252 mL)
Lane 15: LMW standard, 10 µg

Gel permeation chromatography

An additional purification step was introduced into the established procedure because of unsuccessful crystallization experiments. Gel permeation chromatography was performed with a Superdex-200 26/60 prep grade column under the conditions described above (Section 2.2.5). The corresponding chromatogram is shown in Figure 29. As only one large absorption peak (280 nm) is observed and no unspecific protein aggregates are found, which would appear as a peak in the void volume or at higher molecular masses, the peak can clearly be assigned to the BaP1 monomer. Additionally, this was confirmed by the calibration curve of the column (Appendix 7.1), as a peak at an elution volume of 252 mL corresponds to a molecular mass of approx. 23 kDa. This supports previous results that BaP1 is a monomeric protein in solution (Gutierrez et al., 1995). Afterwards, the corresponding fractions were examined by SDS-PAGE with regard to quantity and purity of BaP1 (Figure 30), pooled, and concentrated for crystallization experiments or proteolytic activity testing.


Figure 29 Chromatogram of BaP1 purification by gel permeation chromatography using a Superdex-200 26/60 prep grade column. GPC-buffer was used. Solid black line, absorption at 280 nm. Fractions pooled are marked by a line.

Figure 30 SDS-PAGE of BaP1 purification by gel permeation chromatography using a Superdex-200 26/60 prep grade column. 15 µL were loaded in each lane. Molecular mass markers: 94, 67, 43, 30, and 20 kDa; the BaP1 band is labeled.

Lane 1: Fraction 59 (232-236 mL)
Lane 2: Fraction 60 (236-240 mL)
Lane 3: Fraction 61 (240-244 mL)
Lane 4: Fraction 62 (244-248 mL)
Lane 5: Fraction 63 (248-252 mL)
Lane 6: Fraction 64 (252-256 mL)
Lane 7: Fraction 65 (256-260 mL)
Lane 8: Fraction 66 (260-264 mL)
Lane 9: Fraction 67 (264-268 mL)
Lane 10: Fraction 68 (268-272 mL)
Lane 11: Fraction 69 (272-276 mL)
Lane 12: Fraction 70 (276-280 mL)
Lane 13: Fraction 71 (280-284 mL)
Lane 14: Fraction 72 (284-288 mL)
Lane 15: LMW standard, 10 µg

3.1.2 Proteolytic activity of BaP1 and protease inhibition assay

Activity testing was performed before and after lyophilisation of the protein. The proteolytic activity assays of BaP1 were set up as described in Section 2.3.3. Every purified sample of BaP1 was tested to evaluate the purity and integrity of the protein. In addition, the effect of the RO-inhibitor (Figure 12) on the proteolytic activity was analyzed in several test series. The hydroxamate derivative showed a significant inhibitory effect on the proteolytic activity of BaP1. However, no conclusion about the Michaelis-Menten kinetics can be drawn, because the assay is based on the unselective cleavage of azocasein. Instead, fitting the data to a sigmoidal dose-response curve (Figure 31) allowed non-linear regression and statistical analysis with PRISM 5.0 (GraphPad Software, Inc.). This led to a calculated IC50 value of 12.9 µM (95% confidence interval: 12.4 to 13.3 µM).

Figure 31 Inhibitory effect of the hydroxamate derivative on the proteolytic activity of BaP1. Sigmoidal dose-response curve as semi-log plot of the inhibitor concentration and the relative inhibition of the proteolytic activity of BaP1. Activity testing was determined on the basis of unselective cleavage of azocasein. Statistical analysis and non-linear regression were performed with PRISM 5.0 (GraphPad Software, Inc.) and led to an IC50 value of 12.9 µM (95% confidence interval: 12.4-13.3 µM). Data are expressed as mean ± standard deviations of the relative inhibition. Axes: concentration [µM] (logarithmic, 1 to 100) against relative activity [%] (0 to 100).
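The shape of such a sigmoidal dose-response curve can be illustrated with a short Python sketch. It assumes a standard logistic (Hill-type) model with a top plateau of 100%, a bottom plateau of 0%, and a Hill slope of 1; only the IC50 of 12.9 µM is taken from the text, the other parameters and the function name are hypothetical, and no fitting to the measured data is performed.

```python
def relative_activity(concentration, ic50, hill_slope=1.0, top=100.0, bottom=0.0):
    """Relative activity (%) at a given inhibitor concentration for a logistic inhibition curve."""
    if concentration < 0 or ic50 <= 0:
        raise ValueError("concentration must be non-negative and ic50 positive")
    return bottom + (top - bottom) / (1.0 + (concentration / ic50) ** hill_slope)

IC50_UM = 12.9  # reported IC50 in micromolar

# Evaluate the assumed model at a few concentrations spanning the plotted range
concentrations_um = [1.0, 12.9, 100.0]
activities = [relative_activity(c, IC50_UM) for c in concentrations_um]
```

By construction, the curve passes through 50% relative activity at the IC50; a non-linear regression would adjust the IC50 and the slope until the curve matches the measured points.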

3.1.3 Initial crystallization of native BaP1 and with the inhibitor

Crystallization of native BaP1

At the beginning of this work, attempts were made to reproduce the established crystallization conditions of native BaP1 without bound inhibitor or substrate (Watanabe et al., 2002). Previously, BaP1 had been crystallized by the hanging-drop vapor-diffusion technique in a 0.1 M bicine buffer solution containing 10% PEG-20k and 2% (v/v) dioxane at pH 9.0. Several crystallization boxes were set up, in which the precipitant concentration (8 to 13% PEG-20k), the protein concentration (6 to 14 mg/mL), and the pH value (8.0 to 9.0) were varied. All experiments were performed as described in Section 2.4.3. Nevertheless, no crystals were obtained in any of the combinations. Therefore, a series of crystallization screens was performed to search for initial crystallization conditions, and the crystallization behaviour of native BaP1 was analyzed. In most cases, a high degree of precipitation was observed during the mixing of reservoir and protein solution (50% presaturation), and crystals of different morphology were obtained under only six conditions. The successful crystallization conditions are listed in Table 4 and the corresponding crystals of native BaP1 are shown in Figure 32 and Figure 33.

Table 4 Successful initial crystallization conditions of native BaP1.

Screen, condition #      Precipitant (w/v)   Buffer [0.1 M]  Additive (v/v)     Salt [0.2 M]  pH value
Wizard™ I, #43           MPD (35%; v/v)      Na/K phosphate  -                  -             7.0
Wizard™ cryo, #7         ethanol (40%; v/v)  Tris            -                  -             7.0
Wizard™ II, #46          PEG-8k (10%)        imidazole       -                  Ca(OAc)2      7.9
JBScreen® 3, #3          PEG-4k (10%)        HEPES           iso-propanol (5%)  -             7.5
Crystal Screen™ II, #43  MPD (50%; v/v)      Tris            -                  (NH4)2HPO4    8.5
JBScreen® 5, #7          PEG-8k (18%)        HEPES           -                  Ca(OAc)2      7.5


Figure 32 A Needle-like crystals of native BaP1, ~80 µm length; reservoir: 100 mM HEPES pH 7.5, 18% (w/v) PEG-8k, 200 mM calcium acetate; drop: 7.8 mg/mL BaP1, 50% presaturation.

B Crystals of native BaP1 (form A), 20 µm x 40 µm x n.d.; reservoir: 100 mM Na/K phosphate pH 7.0, 35% (v/v) MPD; drop: 7.8 mg/mL BaP1, 50% presaturation.

C Crystals of native BaP1 (form A), 20 µm x 20 µm x n.d.; reservoir: 100 mM Tris pH 7.0, 40% (v/v) ethanol; drop: 7.8 mg/mL BaP1, 50% presaturation.



Figure 33 A Crystals of native BaP1 (form B), 10 µm x 10 µm x 10 µm; reservoir: 100 mM HEPES pH 7.5, 10% (w/v) PEG-4k, 5% (v/v) iso-propanol; drop: 7.8 mg/mL BaP1, 50% presaturation.

B Crystal of native BaP1 (form A), 20 µm x 80 µm x 20 µm; reservoir: 100 mM Tris pH 8.5, 50% (v/v) MPD, 200 mM (NH4)2HPO4; drop: 7.8 mg/mL BaP1, 50% presaturation.

C Crystal of native BaP1 (form C), 20 µm x 20 µm x 20 µm; reservoir: 100 mM imidazole pH 8.0, 10% (w/v) PEG-8k, 200 mM calcium acetate; drop: 7.8 mg/mL BaP1, 50% presaturation.

During these initial crystallization experiments, four different crystal forms of BaP1 could be distinguished: needles (Figure 32A), hexagonal crystals (form A, Figure 32B), oval (drop-like) crystals (form B, Figure 33A), and cubic crystals (form C, Figure 33C). The remaining crystals obtained under the successful crystallization conditions (Figure 32C, Figure 33B) could not be assigned clearly because of the crystal size. Most likely, they were of the hexagonal crystal form A. Because of their size and quality (e.g. needle-like crystals, Figure 32A), none of the crystals obtained were suitable for X-ray analysis, and further refinement had to be carried out. Variations of the pH value and of the precipitant and salt concentrations were set up (Table 5). With the hanging-drop method (Section 2.4.3), the reproduction was not successful. Crystals were obtained in only four cases, and these were still not suitable for crystallographic analyses. The resulting crystals are shown in Figure 34.


Table 5 Parameters of the crystallisation refinements.

Box     Precipitant (a)     Salt              Buffer [0.1 M] (b)
VF002   20-45% MPD (c)      -                 Na/K phosphate, pH 6.1, 6.4, 6.6, 6.9
VF003   5-30% PEG-4k (d)    -                 HEPES, pH 6.5-8.0
VF004   20-45% ethanol (c)  -                 Tris, pH 6.5-8.0
VF005   35-60% MPD (c)      0.2 M (NH4)2HPO4  Tris, pH 7.5-9.0
VF006   5-30% PEG-8k (d)    0.2 M Ca(OAc)2    imidazole, pH 7.0-8.0
VF007   5-30% PEG-8k (d)    0.2 M Ca(OAc)2    HEPES, pH 6.5-8.0

(a) Variation of the concentration in steps of 5%. (b) Variation of the pH values in steps of 0.5, if not stated otherwise. (c) Concentration in (v/v). (d) Concentration in (w/v).


Figure 34 A Needle-like crystal clusters of native BaP1, ~50 µm length; reservoir: 100 mM HEPES pH 6.5, 30% (w/v) PEG-8k, 200 mM calcium acetate; drop: 8.1 mg/mL BaP1, 50% presaturation.

B Needle crystal clusters of native BaP1, 20 µm x 200 µm x n.d.; reservoir: 100 mM imidazole pH 7.0, 30% (w/v) PEG-8k, 200 mM calcium acetate; drop: 8.1 mg/mL BaP1, 50% presaturation.

C Needle crystal clusters of native BaP1, 180 µm x n.d. x n.d.; reservoir: 100 mM imidazole pH 7.5, 30% (w/v) PEG-8k, 200 mM calcium acetate; drop: 8.1 mg/mL BaP1, 50% presaturation.

D Needle crystal clusters of native BaP1, ~120 µm length; reservoir: 100 mM Tris pH 8.0, 60% (v/v) MPD, 200 mM (NH4)2HPO4; drop: 8.1 mg/mL BaP1, 50% presaturation.

With further refinement of the conditions, a slight improvement in crystal quality was achieved. The habit of the needle-like crystals could be changed such that a second dimension grew in size. Accordingly, rod-like crystals were obtained. The improved crystals are shown in Figure 35, but these crystals still did not fulfill the requirement for crystallographic analysis of a sufficient size in at least two dimensions.



Figure 35 A Needle- and rod-like crystals of native BaP1, 20 µm x 100 µm x n.d.; reservoir: 100 mM imidazole pH 7.0, 22.5% (w/v) PEG-8k; drop: 12.0 mg/mL BaP1, 50% presaturation.

B Overgrown, rod-like crystals of native BaP1, 15 µm x 200 µm x n.d.; reservoir: 100 mM Tris pH 7.5, 57.5% (v/v) MPD, 200 mM (NH4)2HPO4; drop: 12.0 mg/mL BaP1, 50% presaturation.

None of the crystallization experiments with native BaP1 yielded crystals suitable for X-ray analysis, and the known crystallization conditions could not be reproduced. One possible reason is contaminants that remained in the solution after gel permeation purification. Another common problem in protein crystallization is the dynamic formation of unspecific aggregates in solution, which efficiently inhibit crystal nucleation.

Cocrystallization of native BaP1 with inhibitor

Because proteins in some cases crystallize better in complex with substrates or inhibitors, further initial crystallization experiments were performed. Small molecules can lock the protein in a rigid conformation through the significant energetic contribution of the interaction, which leads to a considerably better arrangement of the protein molecules during crystallization. To take advantage of this effect, initial crystallization screens of BaP1 in the presence of the inhibitor were set up. As in the experiments with native BaP1, only needle-like crystals were obtained. These were larger in two dimensions but still not large enough for data collection or initial X-ray experiments (not shown).

3.1.4 Refinement of crystallization experiments by the use of seeding methods

As described before (Section 2.4.2), seeding is a common method to improve crystal quality. Mainly the streak seeding and cross seeding techniques were used. The protein concentration in the crystallization drop was deliberately kept lower than in the initial crystallization experiments because the protein tends to precipitate in a non-crystalline manner during seeding.


Crystallization of native BaP1

In a first attempt, the potential of streak seeding to influence the crystal habit was analyzed. The crystals of best quality, judged by optical criteria, were chosen and the corresponding crystallization drop was used as stock solution. The horse-tail hair was moved through the old and the new drop as described in Section 2.4.3. In this way, both initial crystallization experiments and refinements of stock solutions were set up. The crystals used for stock solutions are shown in Figure 35 (page 81). Streak seeding had a positive influence on crystal nucleation, especially in initial crystallization experiments: all newly set up screens showed significantly more crystals, and crystals of better quality were already obtained. The effect was most striking with the Wizard™ I+II screen (Emerald BioSystems), in which 24 of the 96 screening conditions (25%) led to crystals, whereas the same screen gave only 3 successful conditions (3%) without seeding (Section 3.1.3, Table 4). In general, despite the lower protein concentration, more precipitate was observed in the drops than in the experiments without seeding. Most of the crystals were needle-like as before, but a few promising crystal forms were also obtained. All of them could be assigned to the three formerly described crystal forms A, B, and C. The corresponding crystallization conditions are listed in Table 23 and Table 22 (Appendix 7.1) and a selection of crystals is shown in Figure 36.
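The success rates quoted for the 96-condition screen are simple proportions. A minimal sketch of the arithmetic follows; only the counts come from the text, and the function name `hit_rate` is illustrative.

```python
def hit_rate(successful: int, total: int) -> float:
    """Return the percentage of screening conditions that yielded crystals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * successful / total


# Counts for the 96-condition screen as reported in the text.
with_seeding = hit_rate(24, 96)
without_seeding = hit_rate(3, 96)
fold_increase = with_seeding / without_seeding

print(f"with seeding:    {with_seeding:.1f}%")
print(f"without seeding: {without_seeding:.1f}%")
print(f"fold increase:   {fold_increase:.0f}x")
```

On these counts, streak seeding raised the number of crystal-yielding conditions eightfold.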


Figure 36 A Crystal of native BaP1 (form A), 30 µm x 180 µm x 200 µm; reservoir: 100 mM Tris pH 8.5, 20% (w/v) PEG-1k; drop: 5.3 mg/mL BaP1, 50% presaturation.

B Crystal of native BaP1 (form A), 40 µm x 200 µm x 300 µm; reservoir: 100 mM imidazole pH 8.0, 10% (w/v) PEG-8k; drop: 5.3 mg/mL BaP1, 50% presaturation.

C Crystal of native BaP1 (form B), different in size; reservoir: 100 mM imidazole pH 8.0, 1.0 M Na/K tartrate, 200 mM NaCl; drop: 5.3 mg/mL BaP1, 50% presaturation.


The crystals obtained (Figure 36) were themselves used as stock solutions for all further crystallization experiments with seeding.

Cocrystallization of native BaP1 with the inhibitor

With the cross seeding method (Section 2.4.3), crystals of the BaP1*inhibitor complex suitable for X-ray analysis were obtained. The first diffraction tests at the home X-ray source (Section 2.5.5) were already promising. Because the scattering quality depends strongly on crystal size, the way the horse-tail hair touched the new drop was varied. Dipping the hair only briefly into a single spot of the new drop shifted crystal growth from many small crystals towards a few larger crystals per drop. Furthermore, a series of refinement experiments was set up to cover a wide range of pH values. Several high-quality crystals were obtained in this way and, finally, three very good datasets were measured from BaP1*inhibitor cocrystals, all belonging to crystal form C. The successful crystallization conditions differed in composition and, most importantly, in pH: one was in the acidic range (pH 4.6) and two (pH 6.5 and 7.5) were near the physiological pH value (7.4) (Gutierrez et al., 1995). The corresponding crystals are shown in Figure 37. The crystals appeared one day after setup and grew to their final size within 4 to 10 days before they were frozen and transferred to BESSY for data collection.


Figure 37 A BaP1*inhibitor cocrystal of form C, 30 µm x 80 µm x 80 µm; reservoir: 100 mM HEPES pH 7.5, 20% (w/v) PEG-3k, 200 mM NaCl; drop: 6.0 mg/mL BaP1, RO- inhibitor, 50% presaturation.

B BaP1*inhibitor cocrystal of form C, different in size; reservoir: 100 mM Na(OAc) pH 4.6, 20% (w/v) PEG-4k, 200 mM (NH4)OAc; drop: 10.0 mg/mL BaP1, RO- Inhibitor, 50% presaturation.

C BaP1*inhibitor cocrystal of form C, 20 µm x 80 µm x 80 µm; reservoir: 100 mM Tris pH 6.5, 30% (w/v) PEG-20k; drop: 5.0 mg/mL BaP1, RO-Inhibitor, 50% presaturation.


Soaking experiments

Besides the cocrystallization experiments, crystals of native BaP1 suitable for crystallography were soaked in a buffer solution containing the inhibitor (Section 2.4.3). During soaking, the crystals did not show any visible disturbances or modifications at their surface, and morphology and habit remained the same. For this experiment, a high-quality crystal of native BaP1 was chosen that had originally been obtained by streak seeding in initial crystallization experiments. Crystallization took place at pH 8.0, slightly more basic than the physiological value (7.4). The corresponding crystal is shown in Figure 36B. The crystal was soaked for 48 h in a drop containing the crystallization solution with added glycerol (Section 2.4.4) and the inhibitor at a concentration of 4 mM.

3.1.5 Structure determination and refinement of the BaP1*inhibitor complex

Data collection

Initial diffraction tests were performed at the home source (Section 2.5.5), with the highest-quality crystals diffracting to 2.8 Å resolution. Sufficient frames were collected to perform an initial space group determination. It became apparent that the crystals, although cubic in appearance, have the orthorhombic symmetry of space group P2₁2₁2₁ (No. 19). This is also the space group in which the native protein was crystallized (PDB code: 1ND1) (Watanabe et al., 2002). For cryoprotection, crystals were transferred into drops containing the corresponding reservoir buffer with an additional 15% (v/v) glycerol and then flash-frozen at 100 K. Complete datasets of five crystals from different crystallization conditions (Table 6) were collected with synchrotron radiation at BESSY; two of these crystals came from the most acidic condition (first line in Table 6), and only the better diffracting one was used for structure solution and subsequent analyses. All diffraction data were processed with XDS and scaled with XSCALE (Section 2.5.7).

Table 6 Crystallization conditions of the measured datasets.

Dataset | Precipitant | Buffer [0.1 M] | Salt [0.2 M] | pH value
I | 20% (w/v) PEG-4k | Na(OAc) | (NH4)OAc | 4.6
II | 30% (w/v) PEG-20k | Tris | - | 6.5
III | 20% (w/v) PEG-3k | HEPES | NaCl | 7.5
IV | 10% (w/v) PEG-8k | Imidazole | - | 8.0

Indexing was straightforward and indicated an orthorhombic primitive Bravais lattice. After integration, comparison of merging R values in different space groups showed the presence of three 2-fold screw axes, leading to space group P2₁2₁2₁ (No. 19) for crystal form C. The value of the packing parameter VM corresponds to one BaP1 molecule per asymmetric unit. All four crystals showed the same symmetry (Table 7).

Table 7 Data collection statistics for datasets I to IV (crystal form C) collected with synchrotron radiation at BESSY.

Dataset | I | II | III | IV
Space group | P2₁2₁2₁ | P2₁2₁2₁ | P2₁2₁2₁ | P2₁2₁2₁
Wavelength [Å] | 0.91841 | 0.91841 | 0.91841 | 0.91841
Resolution range [Å] a | 20.0 - 1.14 (1.21 - 1.14) | 19.3 - 1.46 (1.55 - 1.46) | 20.0 - 1.05 (1.11 - 1.05) | 20.0 - 1.08 (1.11 - 1.08)
Cell parameters a, b, c [Å] | 38.1, 59.6, 82.7 | 38.0, 59.5, 81.8 | 37.9, 59.8, 83.3 | 37.9, 59.8, 83.1
Unique reflections a | 68'715 (10'292) | 32'970 (4'755) | 84'722 (10'797) | 77'288 (4'106)
Completeness [%] a | 98.7 (93.0) | 98.4 (90.1) | 95.0 (75.9) | 94.6 (64.6)
Multiplicity | 6.9 | 7.0 | 7.6 | 7.4
I/σ(I) a | 18.8 (5.4) | 21.9 (6.7) | 25.2 (7.8) | 17.6 (4.2)
Rmeas [%] a | 7.9 (38.0) | 6.7 (28.0) | 5.6 (24.1) | 7.2 (38.1)
Rsym [%] a | 7.3 (34.4) | 6.2 (25.4) | 5.2 (21.7) | 6.7 (33.4)
Wilson-B [Å²] b | 9.0 | 8.7 | 10.1 | 8.7
VM [Å³ Da⁻¹] | 8.3 | 8.1 | 8.3 | 8.3
Solvent content [%] | 40.5 | 40.0 | 40.8 | 40.7
a Data in the highest resolution shell are given in parentheses.
b From XSCALE.
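The packing parameter and solvent content can be checked from the cell dimensions with the usual Matthews relations, VM = V / (Z · M) and solvent fraction ≈ 1 − 1.23 / VM. The sketch below is an illustration, not the calculation used in this work. The molecular mass of about 22.7 kDa is an assumption that is not stated in this section, and Z = 4 assumes one molecule on each of the four general positions of P2₁2₁2₁.

```python
def matthews(a: float, b: float, c: float, z: int, mass_da: float):
    """Return (V_M in Å^3/Da, solvent fraction) for an orthorhombic cell."""
    cell_volume = a * b * c              # orthorhombic: all angles are 90 degrees
    v_m = cell_volume / (z * mass_da)    # Matthews coefficient
    solvent = 1.0 - 1.23 / v_m           # standard approximation for proteins
    return v_m, solvent


MASS_DA = 22_700.0  # ASSUMPTION: approximate mass of BaP1, not from this section
Z = 4               # ASSUMPTION: four asymmetric units per cell, one molecule each

# Cell parameters of dataset I.
v_m, solvent = matthews(38.1, 59.6, 82.7, Z, MASS_DA)
volume_per_dalton = 38.1 * 59.6 * 82.7 / MASS_DA  # same quantity without the factor Z

print(f"V_M = {v_m:.2f} Å^3/Da, solvent = {100 * solvent:.1f}%")
print(f"cell volume per dalton (no Z) = {volume_per_dalton:.1f} Å^3/Da")
```

Under these assumptions, VM comes out near 2.1 Å³/Da, in line with the listed solvent content of about 40%. The tabulated value of about 8.3 appears to correspond to the cell volume per dalton before division by Z.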

For structure solution, the molecular replacement method was chosen because an ideal search model was already available with the crystal structure of native BaP1 (Watanabe et al., 2003).

Phase determination by molecular replacement, model building, and refinement

For dataset II, the phases were determined with MOLREP (Section 2.5.8) using the crystal structure of native BaP1 as search model. For all other datasets, the phases were determined by molecular replacement using the final model of dataset II as search model. Rigid body refinement of the initial models with REFMAC, using one protomer per rigid group, resulted in R factors of 30.8, 33.3, 26.0, and 29.1% for datasets I, II, III, and IV (Rfree 30.9, 34.0, 26.6, and 29.9%), indicating high data quality and a correct molecular replacement solution. All side chains were rebuilt using ARP/wARP. The variable loop region (see below) was rebuilt manually in the final step of refinement with REFMAC, using medium-strength NCS restraints and one TLS group in the case of dataset II. Owing to the high data quality, most of the side chains were already correctly positioned in the initial electron density. Nevertheless, some residues, without exception at the protein surface, had to be corrected manually. In addition, a considerable number of side chains appeared in different rotamers, which led to several side chains being built at half occupancy. The residues with atoms modelled in two distinct conformations are listed in Table 8.


Table 8 Residues in double conformation.

Dataset | Residues
I | Ser4, Tyr7, Leu10, Thr20, Asn27, Arg30, Leu37, Asn38, Lys69, Lys73, Leu87, Ser91, Arg127, Ser130, Cys159, Gly160, Ala161, Lys162, Glu177, Glu185, His193, Asn194, Glu196
II | Leu10, Leu37, Asn38, Ser130, Cys159, Gly160, Ala161, Lys162, Ser175, Glu177
III | Ser4, Leu10, Thr20, Arg30, Leu37, Asn38, Lys69, Asp85, Leu87, Thr99, Arg127, Ser130, His151, Cys159, Gly160, Ala161, Lys162, Glu185
IV | Ser4, Leu10, Thr20, Arg30, Leu37, Asn38, Asp85, Leu87, Thr99, Val125, Arg127, His151, Cys159, Gly160, Ala161, Lys162, Ser175, Glu185
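To see which residues show a double conformation in every dataset, the four lists of Table 8 can be intersected. The sketch below is illustrative only: the residue lists are transcribed from the table, and the variable names are hypothetical.

```python
# Residues in double conformation per dataset, transcribed from Table 8.
double_conformations = {
    "I": ("Ser4 Tyr7 Leu10 Thr20 Asn27 Arg30 Leu37 Asn38 Lys69 Lys73 Leu87 "
          "Ser91 Arg127 Ser130 Cys159 Gly160 Ala161 Lys162 Glu177 Glu185 "
          "His193 Asn194 Glu196").split(),
    "II": ("Leu10 Leu37 Asn38 Ser130 Cys159 Gly160 Ala161 Lys162 Ser175 "
           "Glu177").split(),
    "III": ("Ser4 Leu10 Thr20 Arg30 Leu37 Asn38 Lys69 Asp85 Leu87 Thr99 "
            "Arg127 Ser130 His151 Cys159 Gly160 Ala161 Lys162 Glu185").split(),
    "IV": ("Ser4 Leu10 Thr20 Arg30 Leu37 Asn38 Asp85 Leu87 Thr99 Val125 "
           "Arg127 His151 Cys159 Gly160 Ala161 Lys162 Ser175 Glu185").split(),
}


def residue_number(name: str) -> int:
    """Extract the sequence number from a label such as 'Cys159'."""
    return int(name[3:])


# Residues present in the list of every dataset, sorted by sequence position.
common = set.intersection(*(set(r) for r in double_conformations.values()))
common_sorted = sorted(common, key=residue_number)
print(common_sorted)
```

The intersection contains Leu10, Leu37, Asn38, and the contiguous stretch Cys159 to Lys162.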

The following remarks on model building apply to all four datasets. The presence of a Zn²⁺ ion at the active site was indicated by a 10 to 11 σ peak in the anomalous difference Fourier map, which was calculated with model phases, even though the wavelength used for data collection was distant from the Zn K edge. In contrast to the crystal structure of native BaP1, no structural Zn²⁺ or Ca²⁺ ions were found. The correct placement of all cysteine and methionine residues was also confirmed by significant peaks in the anomalous difference Fourier map. Besides the Zn²⁺ ion, additional difference density was visible in the active site cleft in σA-weighted (Fo-Fc) maps. Already after the first rigid body refinement, the electron density for the inhibitor was continuous at a 3 σ contour level in (Fo-Fc) maps and allowed unambiguous placement of its scaffold (Figure 45, page 93). The density was present in all four datasets and manual placement was straightforward. Water molecules were placed automatically using ARP/wARP and checked manually with COOT for stereochemistry. All residues were clearly defined, beginning with the modified pyroglutamate ring (residue 1) and ending with the proline residue (residue 202). Only long-chain residues at the protein surface, mainly lysine and arginine residues, were slightly disordered. During structure analysis, it became clear that residue 151, originally reported to be an aspartate because of poor electron density and poor sequencing data, is in fact a histidine. In all four crystal structures, an additional glycerol molecule from the cryoprotectant was identified near the active site; it is held in place by hydrogen bonds to the Nε-atoms of Arg110, a water molecule, and the inhibitor.

In all models a loop segment existed in double conformation and was modelled as such, which led to a significant drop in R factors (R, 0.2-0.6%; Rfree, 0.1-0.5%). The segment comprises four residues and is located directly before the Met-turn at positions 159 to 162. Interestingly, this loop would usually be expected to be rigid, as it is enclosed by Cys159 and Cys164, which form one of the three disulfide bridges (Cys117-Cys197, Cys157-Cys181, Cys159-Cys164) present in BaP1. Exactly this region, which directly follows the zinc-binding motif, has been suggested to be responsible for the hemorrhagic potential of SVMPs (Section 1.2.2) (Watanabe et al., 2003).

Owing to the high-resolution data and, consequently, the high redundancy of the collected data (high number of observables), it was possible to increase the accuracy of the model by increasing the number of parameters. Datasets I, III, and IV were refined with anisotropic temperature factors (Section 2.5.11), which require many more parameters but significantly improve the quality of the model description. The quality of a model refined with anisotropic B factors can be judged by the distribution of anisotropy and the agreement of ellipsoids across peptide C-N bonds. The corresponding plots were generated with PARVATI (protein anisotropic validation and analysis tool) (Merritt, 1999) and are shown in Figure 38 and Figure 39. No outliers (spikes) with values lower than 0.88 are observed in the plot of the correlation coefficient, and the peaks of all three anisotropy distributions are located very near the mean value (0.45) that is known to be typical for protein atoms.

The final models consist of 202 residues, corresponding to one molecule in the asymmetric unit and one chain per structure. Depending on resolution and model quality, several residues were built with half occupancy, and the number of protein atoms therefore varies between 1657 (dataset II) and 1747 (dataset I). The number of water molecules also varies, from 345 (dataset I) to 391 (dataset III), but all functionally and structurally important water molecules were observed at the same positions in all four models. The refinement statistics are given in Table 28 (Appendix 1.1).
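The trade-off between observations and parameters can be made concrete for dataset I. The following rough sketch is an illustration under simplifying assumptions: it counts only protein atoms and water molecules, ignores inhibitor, zinc, and glycerol atoms as well as occupancy parameters and stereochemical restraints, and uses all unique reflections.

```python
def observations_per_parameter(n_reflections: int, n_atoms: int,
                               params_per_atom: int) -> float:
    """Ratio of unique reflections to refined atomic parameters."""
    return n_reflections / (n_atoms * params_per_atom)


N_REFLECTIONS = 68_715  # unique reflections of dataset I (Table 7)
N_ATOMS = 1747 + 345    # protein atoms plus water molecules of dataset I

# Isotropic model: x, y, z and one B factor per atom.
iso_ratio = observations_per_parameter(N_REFLECTIONS, N_ATOMS, 4)
# Anisotropic model: x, y, z and six independent Uij terms per atom.
aniso_ratio = observations_per_parameter(N_REFLECTIONS, N_ATOMS, 9)

print(f"isotropic:   {iso_ratio:.1f} observations per parameter")
print(f"anisotropic: {aniso_ratio:.1f} observations per parameter")
```

Even with nine parameters per atom, this estimate leaves more than three observations per parameter before restraints are counted.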

Figure 38 Plot of the distribution of anisotropy by atom class for dataset I (solid line), III (dashed line), and IV (dashed and dotted line). The analysis was performed with PARVATI (protein anisotropic validation and analysis tool) (Merritt, 1999). The typical distribution of anisotropy is represented by a Gauss distribution with a mean peak located at 0.45 for protein atoms.


Figure 39 Plot of the correlation factor CCuij against residues for dataset I (above), III (central), and IV (bottom). The analysis was performed with PARVATI (protein anisotropic validation and analysis tool) (Merritt, 1999). The plot highlights bad joins between TLS groups. A bad join is where the C and N atoms of the bond linking two TLS groups have very different ADPs (thermal ellipsoids). If the atoms are similar, their correlation coefficient CCuij is near 1. A bad join shows up as a downward spike in CCuij at that point.


3.1.6 Validation of the BaP1*inhibitor complex models

Model quality

Global indicators of model quality were very good for all four models. The values of R and Rfree were equally good for the given resolution at 13.0% (Rfree 16.0%) for the model of dataset I, 14.8% (Rfree 17.8%) for dataset II, 11.7% (Rfree 14.4%) for dataset III, and 11.8% (Rfree 14.3%) for dataset IV. RMS deviations of bond distances and angles from ideality were also in the acceptable range (Table 28). Furthermore, all main chain torsion angles were in favored or allowed regions of the Ramachandran plots and not one single outlier was found (Figure 40).
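The difference between Rfree and R is a commonly used quick check for overfitting. The sketch below simply tabulates this gap from the values quoted above; the dictionary layout is illustrative.

```python
# (R, Rfree) in percent for the final models, as quoted in the text.
r_values = {
    "I": (13.0, 16.0),
    "II": (14.8, 17.8),
    "III": (11.7, 14.4),
    "IV": (11.8, 14.3),
}

# Gap between Rfree and R, rounded to the precision of the input values.
gaps = {name: round(r_free - r, 1) for name, (r, r_free) in r_values.items()}

for name, gap in gaps.items():
    print(f"dataset {name}: Rfree - R = {gap:.1f} percentage points")
```

All four gaps lie between 2.5 and 3.0 percentage points.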


Figure 40 Ramachandran diagrams for the models of dataset I (A), dataset II (B), dataset III (C), and dataset IV (D). Main chain torsion angles were analyzed with RAMPAGE (Lovell et al., 2003) and are indicated for each residue by crosses (glycines) or squares (all others). Dark and light background shading indicates energetically favorable and allowed regions, respectively. Four plots are presented for each model as follows: top left, general plot for all non-glycine residues; top right, special plot showing torsion angles and energetically favorable regions for glycine residues; bottom, plots for pre-proline (left) and proline (right) residues.

Temperature factor analysis

Analysis of the mean B factors per residue shows the same tendencies and distribution for all four datasets (Figure 41). The location of the peaks in the B factor plot with respect to the amino acid sequence is largely the same in all four models and usually coincides with loop regions between secondary structure elements. Although dataset II was refined with isotropic B factors and the other datasets with anisotropic B factors, the plots for the four models are virtually identical. Owing to the lower resolution and the different type of refinement, the peaks for dataset II are slightly higher. Datasets I, III, and IV all show peaks at the same positions with similar amplitudes, indicating reasonably refined models.
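To give the B factor scale a physical meaning, an isotropic B factor can be converted into a root-mean-square atomic displacement via B = 8π²⟨u²⟩. The sketch below applies this standard relation to the Wilson B values of Table 7 as an illustration; these are overall values, not the per-residue B factors plotted in Figure 41.

```python
import math


def rms_displacement(b_factor: float) -> float:
    """RMS displacement in Å for an isotropic B factor in Å^2 (B = 8*pi^2*<u^2>)."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# Wilson B factors in Å^2 as listed in Table 7.
wilson_b = {"I": 9.0, "II": 8.7, "III": 10.1, "IV": 8.7}
rms = {name: rms_displacement(b) for name, b in wilson_b.items()}

for name, u in rms.items():
    print(f"dataset {name}: B = {wilson_b[name]:.1f} Å^2 -> rms displacement = {u:.2f} Å")
```

B factors of 9 to 10 Å² thus correspond to displacements of roughly a third of an ångström.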

Figure 41 B factor plot for the models of the BaP1*inhibitor complex. The mean B factors of dataset I (green), II (black), III (blue), and IV (red) are drawn from residue number 1 to residue number 202. Secondary structure elements are given below the graph as follows: β-strands, black lines; α-helices, light gray lines; 3₁₀-helix, gray line.

3.1.7 Structure description of the BaP1*inhibitor complex

Secondary structure of BaP1

The overall structure of BaP1 is very similar to the eight other structurally known P-I SVMPs (Akao et al., 2010; Botos et al., 1996; Gomis-Ruth et al., 1993; Gomis-Ruth et al., 1994; Gomis-Ruth et al., 1998; Gong et al., 1998; Huang et al., 2002; Kumasaka et al., 1996; Lou et al., 2005; Pinto et al., 2006; Zhang et al., 1994; Zhu et al., 1999). Secondary structure elements were determined with DSSP (dictionary of secondary structure of proteins). The complete assignment is shown in Figure 42.

[Sequence diagram; the secondary structure elements along each row are given in parentheses]
1 ERFSPRYIELAVVADHGIFTKYNSNLNTIRTRVHEMLNTVNGFYRSVDVH 50 (β1, α1, α2, 3₁₀)
51 APLANLEVWSKQDLIKVQKDSSKTLKSFGEWRERDLLPRISHDHAQLLTA 100 (β2, α3, β3)
101 VVFDGNTIGRAYTGGMCDPRHSVGVVRDHSKNNLWVAVTMAHELGHNLGI 150 (β4, β5, α4)
151 HHDTGSCSCGAKSCIMASVLSKVLSYEFSDCSQNQYETYLTNHNPQCILNKP 202 (α5)

Figure 42 Sequence and secondary structure of BaP1 (β-strands, black lines; α-helices, light gray lines; 3₁₀-helix, gray line). Every fifth residue is marked by a dot and the Zn²⁺-coordinating histidine residues are highlighted. The three disulfide bonds (Cys117-Cys197, Cys157-Cys181, Cys159-Cys164) are indicated by a gray line.
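The residue numbering used throughout this section can be cross-checked against the sequence of Figure 42. In the sketch below, the sequence string is transcribed from the figure, with residue 1 (the pyroglutamate) written as E as printed there. The regular expression for the zinc-binding motif and the helper names are illustrative assumptions.

```python
import re

# BaP1 sequence as printed in Figure 42 (202 residues).
SEQUENCE = (
    "ERFSPRYIELAVVADHGIFTKYNSNLNTIRTRVHEMLNTVNGFYRSVDVH"
    "APLANLEVWSKQDLIKVQKDSSKTLKSFGEWRERDLLPRISHDHAQLLTA"
    "VVFDGNTIGRAYTGGMCDPRHSVGVVRDHSKNNLWVAVTMAHELGHNLGI"
    "HHDTGSCSCGAKSCIMASVLSKVLSYEFSDCSQNQYETYLTNHNPQCILNKP"
)


def residue(number: int) -> str:
    """One-letter code of the residue at a 1-based sequence position."""
    return SEQUENCE[number - 1]


# 1-based positions of all cysteines (the disulfide partners).
cysteines = [i + 1 for i, aa in enumerate(SEQUENCE) if aa == "C"]

# Zinc-binding consensus HEXXHXXGXXHD; report its 1-based first and last residue.
motif = re.search(r"HE..H..G..HD", SEQUENCE)
motif_range = (motif.start() + 1, motif.end())

print(len(SEQUENCE), cysteines, motif_range)
print([residue(n) for n in (142, 146, 152)])  # zinc-coordinating histidines
```

With this transcription, the six cysteines fall at positions 117, 157, 159, 164, 181, and 197, and the zinc-binding motif spans residues 142 to 153.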

Overall fold

Overall, the two subdomains of BaP1 fold into a single globular domain of 45 Å diameter with the typical α/β-fold of P-I SVMPs (Figure 45). An overview of the topology is given in Figure 43. BaP1 consists of a major N-terminal subdomain (residues 1-152) and a minor C-terminal subdomain (residues 153-202), which flank the shallow active site cleft. The major subdomain adopts an α/β fold containing four α-helices (α1, α2, α3, and α4) and a five-stranded β-sheet (β1, β2, β3, β4, and β5). All strands are parallel except the antiparallel strand β4. Additionally, a 3₁₀-helix follows the second helix (α2) in the N-terminal subdomain. The minor subdomain consists of one α-helix (α5) and various loops, including the highly conserved Met-turn.

Figure 43 Topology plot of the α/β-fold of BaP1 showing β-strands as arrows and α-helices as cylinders. The active site cleft is located between the α4- and α5-helices and is indicated by red dots which mark the positions of the three histidine residues coordinating the catalytic Zn²⁺ ion. The position of the highly conserved 1,4-β-turn (Met-turn) is indicated with an M.


The final models of the different datasets are highly similar to each other, as shown by high Z scores and very low RMS deviations (Section 2.6.3). In pairwise structure comparison with DALILITE, the highest Z score/lowest RMS deviation (40.7/0.07 Å) was obtained between the models of datasets III and IV, and the lowest Z score/highest RMS deviation (40.0/0.20 Å) between the structures of datasets II and III (Table 9).

Table 9 Z scores and RMS deviations [Å] of pairwise structure comparison.

Model | II | III | IV
I | 40.1/0.19 Å | 40.5/0.11 Å | 40.5/0.11 Å
II | - | 40.0/0.19 Å | 40.1/0.18 Å
III | - | - | 40.7/0.07 Å
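The most and least similar model pairs can be read directly from Table 9. As a small illustration, the sketch below stores the tabulated Z scores and RMS deviations in a dictionary and picks the pairs with the highest and lowest Z score; the data layout is an assumption made for this example.

```python
# (Z score, RMS deviation in Å) for each model pair, as listed in Table 9.
pairs = {
    ("I", "II"): (40.1, 0.19),
    ("I", "III"): (40.5, 0.11),
    ("I", "IV"): (40.5, 0.11),
    ("II", "III"): (40.0, 0.19),
    ("II", "IV"): (40.1, 0.18),
    ("III", "IV"): (40.7, 0.07),
}

# Rank the pairs by Z score (first element of each value tuple).
most_similar = max(pairs, key=lambda pair: pairs[pair][0])
least_similar = min(pairs, key=lambda pair: pairs[pair][0])

print("most similar: ", most_similar, pairs[most_similar])
print("least similar:", least_similar, pairs[least_similar])
```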

Likewise, superposition of the Cα positions of the models, representing the protein backbone, confirms their high similarity. Figure 44 shows the deviations of the models of datasets I, III, and IV from the model of dataset II, which was refined with isotropic B factors. The three other models are virtually the same, with only small differences between their main chains. Because of the different refinement method, the model of dataset II is the most distinct, but for its resolution (1.46 Å) it is still a good-quality model.

Figure 44 Plot of the deviation in Cα positions upon superposition. The models of dataset I (green), III (black), and IV (red) are aligned onto the model of dataset II and the deviations are drawn from residue number 1 to residue number 202. Secondary structure elements are given below the graph as follows: β-strands, black lines; α-helices, light gray lines; 3₁₀-helix, gray line.


Figure 45 Stereo top view of the inhibitor bound to the snake venom metalloproteinase BaP1. Ribbon plot of BaP1 with the initial (Fo-Fc) electron density map of the inhibitor at a contour level of 3 σ. The final model of the inhibitor is drawn into the density. The six cysteine residues involved in disulfide bonding and the conserved Met-turn with the methionine residue are indicated for reference. The catalytic zinc ion is shown as a magenta sphere.

Quaternary structure

An overview of the intermolecular contacts in the BaP1*inhibitor crystal lattices for all four models is given in Table 24, Table 25, Table 26, and Table 27 (Appendix 0). The chains in contact via the interfaces (no. I to V) are chain A of the sole molecule in the asymmetric unit and chain A of the corresponding symmetry-related monomer. Among the four models, there are only slight differences in surface area, number of hydrogen bonds/salt bridges, and interfacing residues. The same contacts were observed in all crystals, apart from crystal contacts no. III and IV being interchanged in dataset IV (Table 27). No contacts that could indicate a symmetric dimer were found. Dimer formation is therefore unlikely, in agreement with the gel filtration results showing that the BaP1*inhibitor complex is monomeric (Section 3.1.1). The contacts building up the crystal lattices are markedly nonpolar in nature, with only a few salt bridges and hydrogen bonds formed.

Active site and inhibitor binding

In the structure, the hydroxamate derivative occupies the peptide binding cleft of BaP1, forming interactions over its entire length. The peptidomimetic inhibitor can be divided into four parts (Figure 46). In the following, the functional groups hydroxamate (Hyd1), i-butyl (Byl2), t-butyl (Byl3), and methyl (Myl4) are used to identify the different parts of the inhibitor. The N-methylthiazole-2-carboxamide group of Byl2 is shortened to thiazole. Following common nomenclature for peptide/protease complexes (Section 1.1), these groups are assigned to P1 (thiazole), P1' (Byl2), P2' (Byl3), and P3' (Myl4). In its binding mode, the inhibitor closely mimics the C-terminal part of an enzyme-bound peptide substrate. It is held in position by hydrogen bonds and van der Waals contacts involving its backbone and side chains as well as by a cation-π interaction (Figure 48). The hydrogen bonding network between enzyme and inhibitor backbone resembles that of an antiparallel β-sheet, as known for metzincins (Section 1.1.2).

Figure 46 Schematic structure of the peptidomimetic inhibitor. Its functional groups (hydroxamate, i-butyl, t-butyl, and methyl) are used for abbreviation and fragmentation of its scaffold (Hyd1, Byl2, Byl3, and Myl4). The N-methylthiazole-2-carboxamide group of Byl2 is shortened to thiazole. Inhibitor residues P1 through P3' and enzymatic subsites involved in binding are indicated (S1, S1', S2', and S3').

Four regions of the BaP1 structure are involved in inhibitor binding. The first consists of part of strand β4 and the proceeding loop, the second is the zinc-binding motif, which contains part of helix α4 and the following loop, the third is part of the loop connecting helices α4 and α5, and the fourth is residue Thr139 (Figure 48). These determinants cover four binding subsites of BaP1, one on the N-terminal side of the catalytic center (S1) and three on the C-terminal side (S1', S2', and S3') (Figure 47). The first segment (Asn106-Thr107-Ile108-Gly109-Arg110) contributes to all peptide-binding subsites of BaP1. Thr107 and Arg110 are involved in forming subsite S1, Thr107 and Ile108 in subsite S1', Asn106 and Thr107 in subsite S2', and Asn106 and Ile108 in subsite S3'. As the first segment is part of strand β4, hydrogen bonding is dominated by backbone interactions, with bonds formed between the main chain O-atom of Asn106 and the N-atom of Myl4, the N-atom of Ile108 and the O-atom of Byl2, and between the O-atom of Gly109 and the amide nitrogen of Hyd1 (Figure 48). These main chain interactions are the reason for the observed 1.4 Å shift of the loop segment towards the inhibitor upon binding. Additionally, there are side chain hydrogen bonds between the Oδ-atom of Asn106 and the N-atom of Myl4 as well as between the Oγ-atom of Thr107 and the O-atom of P1. Whereas the Thr107 side chain changes its orientation only slightly, the Asn106 side chain, which is solvent exposed in the native structure, now points towards the inhibitor (Figure 48). Ile108 is part of the entrance of the S1' subsite tunnel, which is known for its high hydrophobicity in all SVMPs and is likely to be responsible for substrate specificity (Ramos and Selistre-de-Araujo, 2006; Watanabe et al., 2003). Its Cγ- and Cδ-atoms are found in van der Waals contact distance to the C-atoms of P1' (3.8 Å). Thr139 is also part of the S1' subsite tunnel entrance and its Cγ-atom, like that of Thr107, is in hydrophobic contact distance to the P1' residue of the inhibitor (Figure 48). A rarely seen type of inhibitor/enzyme interaction is observed between the Arg110 side chain (subsite S1) and the P1 residue of the peptidomimetic scaffold. The two form a cation-π interaction, in which the delocalized positive charge of the arginine side chain interacts with the π-system of the thiazole ring. In the native structure of BaP1, Arg110 extends into the bulk solvent, thereby increasing the height of the docking site wall (Figure 47). Interestingly, in the BaP1*inhibitor complex it is flipped towards helix α3, forming a hydrogen bond with the Ser72 side chain (not shown) and placing the positive charge in contact distance to the ring of P1 (Figure 48). The observed distances (3.5-3.7 Å) between the Nη- and Nε-atoms of Arg110 and the plane of the aromatic thiazole ring of P1 are close to the energetically most favorable distance (4.0 Å) for this kind of interaction. The P1 residue is also fixed by a hydrophobic interaction with the Cγ-atom (Figure 48).

The only amino acid of the zinc-binding motif (ranging from His142 to Asp153) involved in forming the BaP1 subsites is His142. It is also part of the entrance of the S1' pocket and thus interacts with residue P1' (Cγ- and Cε-atoms). On its other side (Nε2-atom) it is involved in coordination of the zinc ion (Figure 48). As has been observed in the majority of fourfold zinc coordination systems in enzymes, the zinc ion is tetrahedrally coordinated by the Nε2-atoms of three histidines (His142, His146, His152) and a catalytic water molecule in the non-complexed BaP1 structure (Figure 58A, page 120) (Alberts et al., 1998; Watanabe et al., 2003). In contrast, the zinc ion in the inhibitor complex is fivefold coordinated, with nearly perfect square pyramidal coordination geometry (Figure 58B). The Nε2-atoms of His146 and His152 and both oxygens of Hyd1 form the basal plane and the Nε2-atom of His142 forms the apex of the pyramid. The zinc ion is slightly displaced towards the apex, out of the plane of the basal atoms. Compared to the most commonly seen trigonal bipyramidal geometry, this square pyramidal arrangement is rarely observed in five-coordinate enzymatic zinc centers (26% vs. 74%) (Alberts et al., 1998).
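One common way to quantify how close a five-coordinate centre is to either limiting geometry is the τ5 index of Addison and co-workers, τ5 = (β − α)/60°, where β and α are the two largest ligand-metal-ligand angles. This descriptor is not used in this work, and the angle sets in the sketch below are idealised textbook geometries, not values measured in the BaP1 structures.

```python
def tau5(angles_deg):
    """Geometry index from the ligand-metal-ligand angles of a five-coordinate centre.

    Returns 0 for an ideal square pyramid and 1 for an ideal trigonal bipyramid.
    """
    largest, second_largest = sorted(angles_deg, reverse=True)[:2]
    return (largest - second_largest) / 60.0


# Idealised angle sets (ten ligand-metal-ligand angles each), NOT measured values.
# Square pyramid with the metal in the basal plane: two trans-basal angles of 180°.
square_pyramid = [180.0, 180.0] + [90.0] * 8
# Trigonal bipyramid: one axial-axial angle of 180°, three equatorial angles of 120°.
trigonal_bipyramid = [180.0] + [120.0] * 3 + [90.0] * 6

tau_sp = tau5(square_pyramid)
tau_tbp = tau5(trigonal_bipyramid)
print(tau_sp, tau_tbp)
```

Applied to the refined coordinates, an index close to 0 would correspond to the square pyramidal description given above. A metal displaced towards the apex lowers both trans-basal angles by similar amounts and therefore leaves the index nearly unchanged.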


Figure 47 The inhibitor sitting in the shallow binding cleft of BaP1 represented by its molecular surface. N-terminus of enzyme-bound substrate would be located on the left, C-terminus on the right side. The zinc ion, the coordinating histidine residues and the ribbon plot of BaP1 are shown beneath the surface. Involved binding subsites of BaP1 are indicated as black labels, secondary structure elements as white labels.

Figure 48 Stereo view of the BaP1 active site with the bound inhibitor as stick model. Carbon atoms are shown in gray (BaP1) and orange (inhibitor), oxygen atoms in red, nitrogen atoms in blue, sulfur atoms in yellow, and the catalytic zinc ion as a magenta sphere. Important interactions are indicated as follows: hydrogen bonds, black dashed lines; zinc coordination, magenta dashed lines; hydrophobic contacts, green dots; cation-π interaction, purple dots. All enzymatic and inhibitor residues are indicated and the complexed glycerol molecule is labelled with GOL. For reference, side chains of Asn106, Arg110, Glu143, Ser168, and Val169 of non-complexed BaP1 are shown as black sticks.

Binding of the inhibitor results in the displacement of several water molecules at the binding site of native BaP1, including the catalytic Wat67. Wat67 has been proposed to play a critical role as the catalytic nucleophile in SVMPs, attacking the scissile peptide bond after polarization by the highly conserved Glu143 (Section 1.1.3). Only two water molecules, Wat7 and Wat13 deep inside the S1' pocket, were found near the active site in the inhibitor complex. Wat7 interacts with the backbone O-atoms of Ile165 and Ala167, and Wat13 with the backbone N- and O-atoms of Ser171 as well as with the Ser163 side chain. Instead of coordinating the catalytic water molecule, Glu143 rotates towards the hydroxamate group (Hyd1) of the inhibitor and forms hydrogen bonds with it (Figure 48). The glycerol O3-atom is detected close to the original position of another water molecule in the native structure (Wat69).

Figure 49 Interaction of the inhibitor with the zinc ion and Glu143. The three zinc coordinating histidine residues are also depicted.

Backbone atoms of the last segment (Ser168-Val169-Leu170) form the main part of the S1' and S2' subsites of BaP1 and interact with the inhibitor scaffold via two hydrogen bonds. The first is formed between the backbone O-atom of Ser168 and the backbone N-atom of Byl3, and the second between the O-atom of Byl3 and the main chain N-atom of Leu170. Presumably, these interactions force this loop segment to shift by 1.3 Å towards the inhibitor upon binding. This could also be the reason for the rotamer change of the Ser168 side chain, although it is not directly involved in inhibitor binding (Figure 48). Besides the hydrogen bonds, almost all main chain atoms of the loop are within van der Waals contact distance of the inhibitor P1' residue. Additionally, the side chain atoms of Val169 are possible interaction partners for the P2' inhibitor residue, which points out of the cleft towards the bulk solvent. The Cγ-atoms of Val169 are located in close contact (3.8 and 4.2 Å) to one of the methyl groups of P2', which probably forces the Val169 side chain to rotate by 90° (Figure 48). Because the side chain of Leu170 is positioned alongside the binding site cleft, it interacts with the P1' as well as with the P3' residue and is part of two different subsites (S1', S3'). Overall, inhibitor binding only affects the substrate binding pocket, while the rest of the protease is unperturbed. Inhibitor binding causes the pocket-flanking loops to shift inward by 2.7 Å, significantly narrowing the substrate binding cleft. The most obvious differences occur at residues Asn106, Arg110, and Val169, which bind the inhibitor directly and change rotamers (Figure 48). This flexibility at the binding pocket is in agreement with the broad substrate specificity seen in SVMPs, making BaP1 well suited to cleave a variety of peptides in the extracellular matrix.

3.1.8 Structural comparison of BaP1 with enzymes of the reprolysin family

A comparison of the BaP1 structure with all PDB entries performed with PDBeFold (Krissinel and Henrick, 2004) (SSM threshold of the lowest acceptable match: 50%) yields 76 hits with RMS deviations up to 2.09 Å. The first 27 matches all originate from metalloproteinase domains of P-I or P-III class SVMP structures, either of apoenzymes or of complexes (no P-II class structures are known yet). Beginning with an RMS deviation of 1.21 Å, metalloproteinase domain structures of other, human reprolysin family members appear in the following order: ADAM33, ADAMTS5, ADAM17 (TACE), and ADAM22.

Comparison of BaP1 with P-I SVMPs

To date, ten X-ray structures of P-I SVMPs are known. The following nine of them are deposited in the Protein Data Bank (Berman et al., 2000): Acutolysin A (1BSW), Acutolysin C (1QUA), and FII (1YP1) from Agkistrodon acutus (Gong et al., 1998; Lou et al., 2005; Zhao et al., 2007; Zhu et al., 1999), Adamalysin II from Crotalus adamanteus (4AIG) (Gomis-Ruth et al., 1993; Gomis-Ruth et al., 1998), Atrolysin C from Crotalus atrox (1ATL) (Botos et al., 1996; Pinto et al., 2006; Zhang et al., 1994), BaP1 from Bothrops asper (2W15) (Lingott et al., 2009; Watanabe et al., 2003), BmooMPalpha-I from Bothrops moojeni (3GBO) (Akao et al., 2010), H2-Proteinase from Trimeresurus flavoviridis (1WNI) (Kumasaka et al., 1996), and TM-3 from Trimeresurus mucrosquamatus (1KUF) (Huang et al., 2002). The structure of Leucurolysin A has not yet been released and was kindly provided for analysis by R. N. Ferreira and A. M. de Pimenta, Universidade Federal de Minas Gerais, Brazil (Ferreira et al., 2009). All examples show a very similar arrangement of the secondary structure elements (Figure 50). The pairwise structure alignments of all structures with BaP1 confirmed the high similarity: the RMS deviations of all atoms lie between 0.50 and 0.77 Å (Cα-atoms between 0.40 and 0.71 Å). Considering the RMS deviations between the four models of the present work (Section 3.1.7), this is an astonishing similarity for distinct proteins. Disregarding the most distinct structure (Acutolysin A; 0.77 and 0.70 Å), the RMS deviations of all other examples are between 0.50 and 0.61 Å (Cα-atoms between 0.40 and 0.54 Å). The only region where major differences can be observed is the above-mentioned variable loop area that follows the zinc-binding motif (residues 153 to 177). These approximately 25 residues are split into two loops by the highly conserved Met-turn (164 to 166): the loop before the Met-turn (residues 153 to 163) and the loop after it (residues 167 to 177). In the presented structures, the first loop comprises either eight (Acutolysin C), nine (TM-3, H2-Proteinase, and FII), ten (BaP1, BmooMPalpha-I, Leucurolysin A, Adamalysin II, and Atrolysin C) or eleven residues (Acutolysin A). The second loop is always composed of exactly eleven residues. Interestingly, although the three residues of the Met-turn, and at least one cysteinyl residue in these two loops, are highly conserved in sequence and structure, there is a high variability in this region. This is not only the case for the side chain atoms, but also for the backbone atoms of the proteins.
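
The RMS deviations quoted for the pairwise comparisons follow the usual root-mean-square definition over paired atoms. As a minimal sketch, assuming the two structures have already been superposed and their atoms paired one-to-one (as an alignment program such as PDBeFold would do), the value can be computed as follows. The function name and the coordinates are hypothetical and serve only as illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired coordinate lists.

    Both arguments are lists of (x, y, z) tuples in Å. The structures
    are assumed to be superposed already; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical Cα positions of three paired residues (not real data).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(0.5, 0.0, 0.0), (4.3, 0.0, 0.0), (8.1, 0.0, 0.0)]

print(rmsd(reference, shifted))  # every atom is displaced by 0.5 Å
```

Restricting the input to Cα-atoms or using all atoms gives the two kinds of values reported in the text.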

Figure 50 Alignment of the metalloproteinase domains of all ten known P-I SVMP structures. Acutolysin A (1BSW) (Gong et al., 1998), Acutolysin C (1QUA) (Zhao et al., 2007; Zhu et al., 1999), Adamalysin II (4AIG) (Gomis-Ruth et al., 1993; Gomis-Ruth et al., 1998), Atrolysin C (1ATL) (Botos et al., 1996; Pinto et al., 2006; Zhang et al., 1994), BaP1 (2W15) (Lingott et al., 2009), BmooMPalpha-I (3GBO) (Akao et al., 2010), FII (1YP1) (Lou et al., 2005), H2-Proteinase (1WNI) (Kumasaka et al., 1996), TM-3 (1KUF) (Huang et al., 2002), and Leucurolysin A (Ferreira et al., 2009) perfectly fit together. The only variable areas are the N- and the C-terminus and the two loops before and after the Met-turn. Each step represents a rotation of 90° clockwise. The remaining nine structures are aligned onto the one of BaP1 (2W15).

Comparison of BaP1 with the metalloproteinase domain of P-III SVMPs

So far, there are seven known structures of the metalloproteinase domain of P-III SVMPs, and all of them are deposited in the Protein Data Bank (Berman et al., 2000). Very recently, Guan et al. solved the structures of Kaouthiagin-like (3K7N) and Atragin (3K7L) from Naja atra (Guan et al., 2010). The other five structures are AaHIV from Agkistrodon acutus (3HDB) (Zhu et al., 1997; Zhu et al., 2009), Bothropasin from Bothrops jararaca (3DSL) (Assakura et al., 2003; Muniz et al., 2008), VAP1 (2ERQ) and Catrocollastatin (2DWO), also called VAP2B, from Crotalus atrox (Kikushima et al., 2008; Takeda et al., 2006; Zhou et al., 1995), and RVV-X (2E3X) from Daboia russelli siamensis (Gowda et al., 1996; Takeda et al., 2007). Like those of the P-I SVMPs, the metalloproteinase domains analyzed here are very similar (Figure 51). To compare the P-III domains with the structures of the P-I proteins, all seven examples were aligned onto the structure of BaP1 (2W15). The resulting RMS deviations were only slightly larger than those among the P-I class proteins (all atoms between 0.66 and 0.79 Å; Cα-atoms between 0.55 and 0.71 Å). Again, the major differences between the structures occur at the variable loop region after the zinc-binding motif and at the N-terminus of the domains. In contrast, the C-terminus in P-III structures is more rigid as a consequence of the disintegrin domain, which usually follows the metalloproteinase domain. This domain is missing in the P-I SVMPs; therefore the C-terminus is not fixed and more flexibility is observed. Unlike in the P-I comparison, residues 103 to 108 (loop between β-sheets 3 and 4) show higher fluctuations. As this loop is located at the surface of the protein, this may be an effect of different packing of the protein in the crystal. Compared to BaP1, both β-sheets β1 (residues 9 to 13) and β2 (residues 52 to 57) are elongated in all P-III proteins and are positioned differently.
In the P-III structures, the loop before the Met-turn is composed of exactly ten residues (as in BaP1). The second loop also possesses eleven residues in all structures apart from the two examples from Naja atra (Kaouthiagin-like and Atragin), in which one residue is missing. As mentioned before, these two loops are separated by the highly conserved Met-turn. The first loop contains two highly conserved cysteinyl residues (see disulfide pattern of P-III SVMPs). Nevertheless, this region is again the most variable part of the structures, although the backbone atoms of the Met-turn (164 to 166) align perfectly.

Figure 51 Alignment of the metalloproteinase domains of all five known P-III SVMP structures (blue tones) with BaP1 (green). AaHIV (3HDB) (Zhu et al., 1997; Zhu et al., 2009), Bothropasin (3DSL) (Assakura et al., 2003; Muniz et al., 2008), VAP1 (2ERQ), Catrocollastatin (2DWO) (Kikushima et al., 2008; Takeda et al., 2006; Zhou et al., 1995), and RVV-X (2E3X) (Gowda et al., 1996; Takeda et al., 2007) are also very similar in structure. The only variable areas are the N- and the C-terminus and the two loops before and after the Met-turn. Each step represents a rotation of 90° clockwise. All structures are aligned onto the one of the P-I SVMP of BaP1 (2W15) which is not shown.

Comparison of BaP1 with the metalloproteinase domain of ADAMs and ADAMTSs

To date, crystal structures of the following human adamalysins are known: ADAM17 (PDB code: 2I47) (Condon et al., 2007), ADAM22 (3G5C) (Liu et al., 2009), ADAM33 (1R54) (Orth et al., 2004), ADAMTS1 (2JIH) (Gerhardt et al., 2007), ADAMTS4 (3B2Z) (Mosyak et al., 2008), and ADAMTS5 (3HY7) (Tortorella et al., 2009). As in the SVMPs, the metalloproteinase domains are very similar in the arrangement of their secondary structure elements. In contrast, the loop regions between the secondary structure elements vary widely (data not shown). All six structures were aligned onto the structure of BaP1 (2W15). Compared to the alignments among the metalloproteinase domains of SVMPs, the RMS deviations are now significantly higher but still indicate a remarkable similarity (all atoms between 1.21 and 2.02 Å; Cα-atoms between 1.19 and 2.01 Å). For this comparison, the connecting part between the metalloproteinase and the disintegrin-like domain was deleted in the structures of the ADAMs and ADAMTSs, and the proteins were shortened to 202 amino acid residues (data not shown).

3.2 Multiple sequence alignments of the metalloproteinase domain of SVMPs

All MSA were performed as described in Section 2.7.6. Sequences of the catalytic domain of different P-I, P-II, and P-III SVMPs deposited in the UniProtKB/SwissProt database were used (Apweiler et al., 2004). To date (July 2010), these are 37 sequences of P-I, 25 of P-II, and 31 of P-III SVMPs (Tables 29, 30, and 31; Appendix 7.5). Of these sequences, 33, 22, and 24 entries, respectively, are suitable for analysis, as only these contain the complete sequence of the metalloproteinase domain. In addition to the data from the UniProtKB/SwissProt database, the sequences of TM-1 (Huang et al., 1995) and FII (PDB code: 1YP1) (Lou et al., 2005) were used for the P-I SVMP analyses. For the P-III SVMP analyses, AaHIV (3HDB), Atragin (3K7L), and Kaouthiagin-like (3K7N) (Guan et al., 2010) were included. Overall, metalloproteinase domain sequences of 35 P-I, 22 P-II, and 27 P-III SVMPs from 31 different snake species have been analyzed. As the deposited sequences of Atrolysin D (UniProtKB entry name: P15167) and Atrolysin C (Q90392) are virtually the same, only the sequence of Atrolysin C (Q90392) is mentioned in further analyses. For a clearer view and better comparability, all sequences were aligned onto the sequence of BaP1 in all analyses (Figure 42, page 91). Furthermore, the secondary structure elements and the positions of the disulfide bonds are assigned according to the numbering of the BaP1 sequence (Section 3.1.7). Notably, a problem can occur concerning the classification of P-I and P-II SVMPs. In many cases, SVMPs classified as P-I may be synthesized as P-II and then cleaved to generate a P-I SVMP by the release of the disintegrin domain. Usually, if the sequence of a protein is deduced from cDNA, it will be classified as P-II because the disintegrin domain is present. Thus, the classification of a given sequence as P-I or P-II may depend on whether the sequence is based on the protein or on the cDNA. Nevertheless, the classification noted in the UniProtKB/SwissProt database is adopted unchanged in all following analyses.

3.2.1 Metalloproteinase domain of P-I snake venom metalloproteinases

All P-I SVMP sequences are listed in Table 29 (Appendix 7.5.1) and the corresponding sequence alignment of the 35 complete sequences is presented in Figure 80 (Appendix 7.6.1). Beginning with the sequence of BaP1 from Bothrops asper, the proteins were ordered by descending sequence identity to BaP1. The highest sequence identity is shown by BmooMPalpha-I from Bothrops moojeni (82%) (Akao et al., 2010), followed by three protein sequences of the same genus: BjussuMP-2 from B. jararacussu (80%) (Marcussi et al., 2007), Leucurolysin A from B. leucurus (78%) (Ferreira et al., 2009), and Neuwiedase from B. neuwiedi pauloensis (68%) (Rodrigues et al., 2000). The only two P-I SVMPs that originate from the Bothrops genus and do not directly follow are BITM02A from B. insularis (57%) (Junqueira-de-Azevedo and Ho, 2002) and Atroxlysin 1 from B. atrox (52%) (Sanchez et al., 2010). Considering FII from Agkistrodon acutus, the protein with the lowest sequence identity, it becomes obvious how similar the catalytic domains of P-I SVMPs really are: even in this case the sequence identity is 39%. Nevertheless, it is not possible to group the sequences according to snake genus by sequence identity (data not shown). Comparison of the physico-chemical character of each amino acid residue indicates that there are 65 positions at which the residues are at least semiconserved. This corresponds to 32% of all 202 amino acid residues, on the understanding that the 202 amino acid residues of BaP1 represent a complete metalloproteinase domain. In other words, one third of the catalytic domain of P-I SVMPs is possibly conserved for structural or functional reasons (or both). As mentioned before, SVMPs possess a highly conserved zinc-binding moiety and a consensus three-residue motif, the Met-turn (Section 1.1.2).
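
The sequence identities and the fraction of semiconserved positions are simple ratios. The following Python sketch shows one way to compute them from an alignment. It assumes that identity is counted relative to the non-gap positions of the reference sequence; the thesis does not state the exact convention of the alignment software, so this choice is an assumption. The two aligned fragments are invented for illustration and are not taken from the alignment in Figure 80.

```python
def percent_identity(reference, other):
    """Percent identity of two aligned sequences of equal length.

    Gaps are written as '-'. Assumption: identical residues are counted
    relative to the number of non-gap positions in the reference.
    """
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have the same length")
    columns = [(r, o) for r, o in zip(reference, other) if r != "-"]
    if not columns:
        raise ValueError("reference contains no residues")
    matches = sum(1 for r, o in columns if r == o)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments (12 columns, 9 identical residues).
ref_fragment = "HELGHNLGMEHD"
other_fragment = "HEMGHNLGVHHD"
print(percent_identity(ref_fragment, other_fragment))

# Fraction of semiconserved positions reported in the text:
# 65 of the 202 residues of the BaP1 metalloproteinase domain.
semiconserved_percent = 100.0 * 65 / 202
print(round(semiconserved_percent))
```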
Comparison of the 35 sequences suggests that the consensus sequence of P-I SVMPs has to be extended towards the N-terminus with respect to the physico-chemical properties of the preceding amino acid residues. Furthermore, there is an absolutely conserved three-residue motif (147-NLG-149) located directly after the second histidine residue (His146). This motif is probably responsible for the structural integrity of the active site (Section 3.2.5), and it has not yet been assigned as a conserved motif in P-I SVMPs. The six positions (136 to 141) prior to the first zinc-coordinating histidine residue (His142) are occupied by residues of the same physico-chemical character in all analyzed P-I sequences. For example, bulky hydrophobic residues (valine, methionine, isoleucine) are always found at position 136. In contrast, only alanine or glycine residues are found at the next position, which means that there is a clear preference towards small hydrophobic residues. At position 138 there is no preference regarding residue size, only regarding physico-chemical character, as either small or bulky hydrophobic residues (valine, isoleucine, leucine) are present. In 97% of the sequences a threonine is found at position 139. The last two positions before the first absolutely conserved histidine residue (His142) are again hydrophobic in character: bulky amino acid residues (methionine, leucine, isoleucine) at position 140 and small ones (alanine, glycine) at position 141. Then, the first zinc-coordinating histidine residue and the catalytically active glutamic acid (Glu143) follow. Further downstream in the sequence, hydrophobic residues are again required: bulky residues (leucine, isoleucine, methionine) at position 144 and small amino acid side chains (glycine, alanine) at position 145. Afterwards, the second catalytic histidine residue (His146) and the above-mentioned absolutely conserved three-residue motif (147-NLG-149) are present. This means that the formerly described bulky hydrophobic residue at position 148 (Section 1.1.2) is indeed a highly conserved leucine residue in P-I SVMPs. The sequence analyses could also confirm the bulky hydrophobic character of the next residue (methionine, valine or isoleucine at position 150). The position before the third zinc-coordinating histidine residue (His152) may be regarded as the most variable in the 18-residue motif. At this position, nine different amino acid residues of different physico-chemical character are found (Glu, His, Asn, Asp, Arg, Ser, Lys, Ala, Gly). However, a more detailed view reveals a clear preference towards polar/charged amino acid residues (94%) and not towards small hydrophobic ones (6%; alanine or glycine).
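
The position-by-position preferences described above for residues 136 to 153 can be summarized as a pattern. The following Python sketch encodes them as a regular expression, one character class per position. It is only an illustration of the description in the text: threonine at position 139 is treated as strict although it occurs in 97% of the sequences, position 151 is left open, and the test segments are invented rather than copied from a database entry.

```python
import re

# Positions 136-153 (BaP1 numbering), one element per position or block.
ZINC_MOTIF = re.compile(
    "[VMI]"  # 136: bulky hydrophobic
    "[AG]"   # 137: small hydrophobic
    "[VIL]"  # 138: hydrophobic, no size preference
    "T"      # 139: threonine (97% of sequences; strict here by assumption)
    "[MLI]"  # 140: bulky hydrophobic
    "[AG]"   # 141: small hydrophobic
    "H"      # 142: first zinc-coordinating histidine
    "E"      # 143: catalytic glutamic acid
    "[LIM]"  # 144: bulky hydrophobic
    "[GA]"   # 145: small
    "H"      # 146: second histidine
    "NLG"    # 147-149: conserved three-residue motif
    "[MVI]"  # 150: bulky hydrophobic
    "."      # 151: most variable position
    "HD"     # 152-153: third histidine and conserved aspartic acid
)


def has_zinc_motif(sequence):
    """Return True if the 18-residue pattern occurs in the sequence."""
    return ZINC_MOTIF.search(sequence.upper()) is not None


# Hypothetical segments for illustration only.
typical_segment = "VAVTMAHELGHNLGMEHD"
glu143_replaced = "VAVTMAHGLGHNLGMEHD"  # glycine instead of Glu143

print(has_zinc_motif(typical_segment))
print(has_zinc_motif(glu143_replaced))
```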
Thereafter, the third catalytic histidine residue (His152) and the structurally important, highly conserved aspartic acid residue (Asp153) are located. Interestingly, as mentioned before, the most variable region follows directly after the consensus zinc-binding moiety of P-I proteins. This region is only interrupted by the conserved Met-turn (164 to 166) and forms the two variable loops already mentioned. In P-I SVMPs, the loop before the Met-turn comprises eight to twelve residues and the one after the Met-turn eleven residues. The first loop contains one or, alternatively, two cysteinyl residues. These residues are essential for the disulfide pattern of P-I SVMPs and, consequently, represent the only conserved residues in this region. The first residue of the highly conserved three-residue motif of the Met-turn is always a cysteinyl residue (Cys164). Thereafter, a bulky hydrophobic residue is located at position 165; in 80% of the cases isoleucine, otherwise valine. These aliphatic residues are within hydrophobic interaction distance of the absolutely conserved Phe178 and the characteristically conserved position 141 (in 92% of the cases alanine or glycine). For further details on the conserved Phe178, see Section 3.2.5. Absolute invariance is observed at the last residue of the Met-turn (Met166). As discussed before (Section 1.1.2), this residue is supposed to maintain the structural integrity of the complete zinc-binding pocket and to provide the zinc ion with the energetic environment which is necessary to fulfill the catalytic function. Interestingly, the interaction partners of the mentioned residues are also invariant in all analyzed sequences (Section 3.2.5). At the consensus zinc-binding motif, 8 out of 11 positions show a clear preference towards hydrophobic residues (either small or bulky). Only at three positions polar/charged residues (Thr139, Asn147, Xxx151) seem to be required. The necessity of many hydrophobic residues can be explained on the basis of the secondary structure elements required for the distinct and highly conserved three-dimensional structure (Section 3.2.5). Most of the corresponding positions are part of the fourth helix, and it is known that hydrophobic residues are preferred for the formation of such a secondary structure element. Additionally, this helix represents the bottom of the binding site and most of its residues are directed towards the inner part of the protein, which requires a high degree of hydrophobicity.

Disulfide pattern of the metalloproteinase domain of P-I SVMPs

The disulfide pattern of the catalytic domain of SVMPs has already been described in different publications (Fox and Serrano, 2005, 2008). It was proposed that differences in the number and connectivity of cysteinyl residues could be related to stability or post-translational processing, such as the release of connected domains like the disintegrin domain. At least for structural stability this was disproven, because the nine known X-ray structures of P-I SVMPs show different disulfide patterns and are nevertheless highly similar in structure (Section 3.1.8). It was described that P-I SVMPs contain four, six, or seven cysteinyl residues, which form two or three disulfide bonds (Fox and Serrano, 2005). In the present work, nearly half of the analyzed P-I sequences (46%, 16 sequences) possess six cysteinyl residues, followed by about one third (31%, 11 sequences) with seven, and 14% (5 sequences) with four of these residues (Table 29, Appendix 7.5.1). It is proposed that all proteins which contain six or seven cysteinyl residues form three disulfide bonds because the corresponding positions are highly conserved (positions 117, 157, 159, 164, 181, and 197). The disulfide bonds would be the following: Cys117-Cys197, Cys157-Cys181, and Cys159-Cys164. The only exception is Ac1 from A. acutus (Nikai et al., 1995), which contains six cysteinyl residues but is missing Cys157 and is therefore only able to form two disulfide bonds (Cys117-Cys197 and Cys159-Cys164). Instead, there is an additional cysteinyl residue at position 187. This is also the position where the sequences with seven cysteinyl residues possess the odd cysteinyl residue. This is the case for at least the six Agkistrodon proteins (AaPA, ACLH, Acutolysin A, DaH1, DaH2, and DaH3) (Gong et al., 1998; Johnson and Ownby, 1993; Tsai et al., 2000) and the two Bothrops proteins (Atroxlysin 1 and BITM02A) (Junqueira-de-Azevedo and Ho, 2002). Two of the three examples of the Trimeresurus genus (TM-1 and TM-3) (Huang et al., 1995; Huang et al., 2002) present the additional residue at position 46 and the third one (H2-Proteinase) (Kumasaka et al., 1996) at position 87. In the sequence of Neuwiedase from B. neuwiedi pauloensis (Rodrigues et al., 2000) only five, and in the sequence of Atroxase from C. atrox (Willis and Tu, 1988) only three cysteinyl residues were found. Unfortunately, these sequences and the sequence of DaH4 from A. acutus (Tsai et al., 2000) are missing a segment at the C-terminus, exactly in a region where a highly conserved cysteinyl residue (Cys197) is usually located. Therefore, it is suggested that they actually contain either six or four cysteinyl residues. The five P-I sequences with only four cysteinyl residues originate from the Crotalus genus: Adamalysin II from C. adamanteus, Atrolysin B, Atrolysin C, and Atroxase from C. atrox, and HT-2 from C. ruber ruber (Gomis-Ruth et al., 1994; Hite et al., 1994; Takeya et al., 1990; Willis and Tu, 1988; Zhang et al., 1994). As described above for Ac1, these proteins are only able to form two disulfide bonds: Cys117-Cys197 and Cys159-Cys164. The only sequence which does not fit into this scheme is that of Kistomin from A. rhodostoma (Hsu et al., 2008), in which nine cysteinyl residues are found. The sequence of Kistomin shows the same disulfide pattern as the proteins with three disulfide bonds. However, it possesses its three additional cysteinyl residues at completely different positions (positions 43, 58, and 122) compared to proteins comprising seven residues. In P-I SVMP research this is a novelty, and the structural or functional implications remain unknown, but at least the first two positions are exposed to the surface in the structure of BaP1 and would be accessible for inter-domain disulfide bond formation.
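
The reasoning used above to derive the number of disulfide bonds from the cysteine positions can be written down as a small rule: a bond is predicted only if both partners of one of the three conserved pairs are present. The following Python sketch illustrates this rule. The function name is hypothetical, and the position sets are reconstructed from the description in the text (BaP1 numbering), not read from the deposited sequences.

```python
# Conserved disulfide pairs in BaP1 numbering, as listed in the text.
CONSERVED_PAIRS = [(117, 197), (157, 181), (159, 164)]


def predicted_disulfides(cys_positions):
    """Return (bonds, unpaired) for a collection of cysteine positions.

    A bond is predicted only when both cysteines of a conserved pair
    are present; all remaining cysteines are reported as unpaired.
    """
    present = set(cys_positions)
    bonds = [pair for pair in CONSERVED_PAIRS
             if pair[0] in present and pair[1] in present]
    bonded = {position for pair in bonds for position in pair}
    return bonds, sorted(present - bonded)


# Six conserved cysteines plus the odd one at position 187.
seven_cys = [117, 157, 159, 164, 181, 187, 197]
# Ac1-like case: Cys157 missing, additional cysteine at position 187.
ac1_like = [117, 159, 164, 181, 187, 197]
# Crotalus-like case with four cysteines.
four_cys = [117, 159, 164, 197]

for label, positions in [("seven", seven_cys), ("Ac1-like", ac1_like),
                         ("four", four_cys)]:
    bonds, unpaired = predicted_disulfides(positions)
    print(label, len(bonds), "bonds, unpaired:", unpaired)
```

Such a rule only reflects the conserved pattern; additional bonds between non-conserved cysteines, as discussed for Kistomin, are not covered by it.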

Computational phylogenetic analysis of the catalytic domain of P-I SVMPs

All phylograms were calculated as described before (Section 2.7.7). The phylogenetic tree of the P-I SVMP sequences is depicted in Figure 52. As reported by Fox and Serrano (2005), it was not possible to correlate hemorrhagic activity and organization of the catalytic domain with a phylogenetic analysis. In the present work, however, at least some of the groups seem to be arranged according to their hemorrhagic activity. For example, in the Agkistrodon branch, non-hemorrhagic P-I SVMPs (bottom) are separated from hemorrhagic ones (top) by proteins with slight activities (Acutolysin C, Aculysin 1, Acutolysin B). Despite the striking sequence similarity of the P-I SVMPs it is, however, possible to distinguish the seven different genera to which the analyzed sequences belong (Agkistrodon, Bothrops, Crotalus, Echis, Lachesis, Trimeresurus, and Vipera) by computational phylogenetics. The phylogenetic tree clearly separates the corresponding protein sequences and clusters them by snake genus. The only exceptions are the aforementioned BITM02A and Atroxlysin 1 (Junqueira-de-Azevedo and Ho, 2002; Sanchez et al., 2010) from the Bothrops genus, which form a group in the middle of a large Agkistrodon branch. This suggests that there could be some distinct sequence features that depend on the snake genus.

Figure 52 Phylogenetic tree obtained by the computational analysis of all complete P-I SVMP sequences. The seven different genera (Bothrops, Lachesis, Crotalus, Vipera, Echis, Agkistrodon, and Trimeresurus) can be separated by this computational analysis.

3.2.2 Metalloproteinase domain of P-II snake venom metalloproteinases

All P-II SVMP sequences are listed in Table 30 (Appendix 7.5.2) and the corresponding sequence alignment of 22 complete metalloproteinase domain sequences is presented in Figure 81 (Appendix 7.6.2). For better comparability, the P-II metalloproteinase domain sequences are aligned onto the sequence of the P-I SVMP BaP1. To underline the high similarity of the catalytic domains of the different P-classes, sequence identities to BaP1 were calculated and are ranked in descending order. In the P-II class the highest sequence identity is 82% and applies to a Bothrops SVMP: Insularinase A from B. insularis (Modesto et al., 2005). Another Bothrops protein follows with 81%: MP-2 from B. asper. Nevertheless, no correlation seems to exist between sequences and snake genus. Interestingly, the lowest sequence identity to BaP1 is even higher than within the group of P-I SVMPs (Section 3.2.1): the sequence of Rhodostoxin from Agkistrodon rhodostoma (Chung et al., 1996) is 45% identical to the sequence of BaP1.


Semiconserved residues are found at 77 positions of all sequences, which represents more than one third of the entire metalloproteinase domain. With regard to the zinc-binding motif, an arrangement similar to that of the P-I SVMPs is found (Section 3.2.1). The only differences occur at positions 136 and 137, where a valine and an alanine residue, respectively, are absolutely conserved in the P-II sequences. However, this still fits into the characteristic preferences of P-I SVMPs. Another distinction occurs at position 141. In the P-I class sequences a preference for small hydrophobic residues is observed, whereas P-II sequences do not strictly depend on this kind of residue. Here, a significant proportion of threonine residues (18%) is present. Compared to the P-I sequences (Section 3.2.1), the same characteristic or exactly the same amino acid residue is found at the other 15 positions. Nevertheless, a dramatic change is found in Albolatin from Trimeresurus albolabris (Singhamatr and Rojnuckarin, 2007), where the catalytically essential glutamic acid (Glu143) is replaced by a glycine residue. As both the proteolytic and the hemorrhagic activity depend on this residue, Albolatin is supposed to be inactive in all in vitro and in vivo experiments. This was also suggested by Singhamatr and Rojnuckarin (2007), who proposed that Albolatin only serves as precursor of a relevant disintegrin. As far as is known, Albolatin is the only SVMP without this catalytically essential glutamic acid. The adaptive role of catalytically inactive SVMPs remains unknown. However, there are also several examples of human ADAMs which have lost their proteolytic activity but may retain important functions through their other domains (Liu et al., 2009). The NLG-motif (147 to 149) after the second zinc-coordinating histidine residue is also found in the P-II class of SVMPs.
Again, the most variable part of the proteins directly follows further downstream. In the present sequences, the first loop (before the Met-turn) comprises seven to ten amino acid residues and the second loop (after the Met-turn) eleven residues. The former loop contains one or two cysteinyl residues, depending on how many disulfide bonds are formed. The latter loop contains only one semiconserved position: position 170 has to be occupied by a bulky hydrophobic residue (isoleucine, valine or leucine). Usually, the first residue of the Met-turn is a strictly conserved cysteinyl residue (Cys164), which also forms one of the essential disulfide bonds. The P-II SVMP Mt-d from Agkistrodon halys brevicaudus (Jeon and Kim, 1999) possesses a serine residue at this position and represents the only known naturally isolated SVMP with a changed Met-turn motif. At the central position of the Met-turn, mostly isoleucine but in a smaller number of cases also valine (9%) is found. Again, the last of the three residues, the methionine (Met166), is strictly conserved.


Disulfide pattern of the metalloproteinase domain of P-II SVMPs

It was observed that the metalloproteinase domain of P-II SVMPs contain five, six or seven cysteinyl residues enabling the formation of two or three disulfide bonds (Fox and Serrano, 2008). Since there are yet no X-ray structures of any P-II SVMPs it can not finally be proven that the arrangement of the disulfide bonds is the same as in the P-I class proteins. Nevertheless, the positions of the cysteinyl residues are the same compared to structurally known P-I SVMPs. Therefore, it is supposed that the disulfide pattern is structurally arranged in a similar way. Mostly, the proteins of the present analysis contain five (2 sequences, 9%), six (8 sequences, 36%) or seven (11 sequences, 50%) cysteinyl residues (Table 30, Appendix 7.5.2). However, there is one example, Rhodostoxin from Agkistrodon rhosdostoma (Chung et al., 1996), which has eight cysteinyl residues. As mentioned before, the structural or functional importance of this higher amount of cysteinyl residues is still not known. Indeed, it would be interesting to verify whether the two additional residues are used to form one more disulfide bond in the metalloproteinase domain of the protein. By all means, comparable residues in the structure of BaP1 (PDB code: 2W15) are the positions 19 and 60 which are located exactly beside each other. The distance between both C α- atoms is in a range (6.0 Å) which definitely is reasonable for disulfide bonding, especially, with the knowledge of the distances of the other three disulfide bonds in the BaP1 structure: 5.8 Å at Cys117-Cys197, 5.6 Å at Cys157-Cys181, and 5.9 Å at Cys159-Cys164. In the case of P-II SVMPs which possess six and seven cysteinyl residues it is proposed that these proteins form three disulfide bonds. All corresponding cysteinyl residues are positioned in the same way as in all structurally known P-I proteins (Section 3.2.1). 
The only protein with six cysteinyl residues at deviating positions is Mt-b from Agkistrodon halys brevicaudus (Kang et al., 1998). In this sequence Cys117 is missing and an additional cysteinyl residue exists at position 186. Probably, this leads to the formation of only two disulfide bonds and leaves two free cysteinyl residues. For all proteins containing seven cysteinyl residues it is proposed that they also possess three disulfide bonds and a single unpaired cysteinyl residue. Some of these proteins can be grouped with Mt-b, because the additional residue is also found at position 186. This is the case in the following sequences: Aculysin-2 and DaMD2 from Agkistrodon acutus, Agk.hal.pal. from Agkistrodon halys pallas, Crot.dur.dur. from Crotalus durissus durissus, HT-E from Crotalus atrox, Jararafibrase-2 from Bothrops jararaca, and Salmosin from Agkistrodon halys brevicaudus (Hite et al., 1992; Kang et al., 1998; Maruyama et al., 1992; Shin et al., 2003; Tsai et al., 2000). In the structure of BaP1 the residue at position 186 is exposed to the surface, so an inter-domain disulfide bond is conceivable, but further analyses will be necessary to confirm this assumption.


Both examples of proteins with only five cysteinyl residues, Mt-d from Agkistrodon halys brevicaudus (Jeon and Kim, 1999) and Bitisgabonin-1 from Bitis gabonica (Calvete et al., 2007), are capable of forming only two disulfide bonds. As mentioned before, in the sequence of Mt-d the cysteinyl residue of the Met-turn (Cys164) is mutated to a serine. It is not clear which kind of consequences this may have concerning structure and function of the protein. Instead, in the sequence of Bitisgabonin-1 the last cysteinyl residue (Cys197) is missing, although the counterpart (Cys117) is still present.
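The bookkeeping behind these proposals can be sketched in a few lines of Python. This is only an illustration under a strong simplifying assumption: a disulfide bond is counted only if both partners of one of the three conserved pairs (BaP1 numbering) are present, and any additional or inter-domain bond is ignored. The function name and the position sets are illustrative; the sets are read from the descriptions in the text, not from the alignment files.

```python
# Conserved intra-domain disulfide pairs (BaP1 numbering) as named in the text.
KNOWN_PAIRS = [(117, 197), (157, 181), (159, 164)]


def predict_disulfides(cys_positions):
    """Return (bonds, free) for a collection of cysteinyl positions.

    Assumption: only the three conserved pairs can form; every cysteine
    without its conserved partner is reported as free.
    """
    present = set(cys_positions)
    bonds = [pair for pair in KNOWN_PAIRS if present.issuperset(pair)]
    bonded = {pos for pair in bonds for pos in pair}
    free = sorted(present - bonded)
    return bonds, free


# Position sets inferred from the text (hypothetical encodings of the sequences).
examples = {
    "Mt-b": {157, 159, 164, 181, 186, 197},      # Cys117 missing, extra Cys186
    "Mt-d": {117, 157, 159, 181, 197},           # Cys164 replaced by serine
    "Bitisgabonin-1": {117, 157, 159, 164, 181}, # Cys197 missing
}

for name, positions in examples.items():
    bonds, free = predict_disulfides(positions)
    print(f"{name}: {len(bonds)} bond(s) {bonds}, free cysteines {free}")
```

Under this assumption all three sequences end up with two disulfide bonds, in line with the proposals above; whether the unpaired residues remain free in the native proteins cannot be decided from sequence data alone.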

Computational phylogenetic analysis of the catalytic domain of P-II SVMPs

In contrast to the P-I class, it is not possible to distinguish between the seven different genera of the proteins ( Agkistrodon , Bitis , Bothrops , Crotalus , Echis , Trimeresurus , and Vipera ) by the phylogenetic tree of the P-II SVMP sequences (Figure 53). Indeed, some smaller groups can be separated but no clear differentiation can be observed.

Figure 53 Phylogenetic tree obtained by the computational analysis of all complete P-II SVMP sequences. The seven different genera ( Bothrops , Agkistrodon , Vipera , Bitis , Echis , Trimeresurus , and Crotalus ) cannot be separated by computational analysis.

3.2.3 Metalloproteinase domain of P-III snake venom metalloproteinases

All P-III SVMP sequences are listed in Table 31 (Appendix 7.5.3) and the corresponding sequence alignment is presented in Figure 82 (Appendix 7.6.3). In the case of the P-III class, 27 complete metalloproteinase domain sequences are deposited in the UniProtKB/SwissProt database. Again, all sequences were aligned onto the sequence of BaP1. The sequence identities compared to BaP1 were calculated and the sequences are ranked in descending order of identity. Accordingly, a Bothrops protein, BjussuMP-1 from B. jararaca (Mazzi et al., 2004), shows the highest identity (70%). It is followed by a sequence of another genus, Graminelysin from Trimeresurus gramineus, with only 55% sequence identity (Wu et al., 2001). The sequence with the lowest identity (42%) is RVV-X from Daboia russelli siamensis (Gowda et al., 1996; Takeda et al., 2007); this value is still higher than that observed among the P-I SVMP sequences. In general, the similarity of the P-III protein sequences towards the catalytic domain of BaP1 is notably lower than before, suggesting that there are noticeable differences in the sequence. However, direct comparison of the P-III alignment with that of the P-I SVMPs does not reveal any significant discrepancy (Section 3.2.4). Again, the sequences are not ordered according to their snake genus, even if the sequence of BaP1 is ignored and not included in the alignment (data not shown). Again, a significant conservation of amino acid characteristics occurs in more than one third of the complete catalytic domain: at 71 positions at least a semiconserved residue is present in all sequences. The consensus zinc-binding motif of P-III SVMPs shows the same tendencies as in the other two classes. Only at two positions is the amino acid character ambiguous. In the other two classes, a definite preference towards a polar/charged residue (threonine) is observed at position 139. In contrast, two thirds of the P-III sequences possess a non-polar residue (valine, isoleucine, alanine) at this position, which is thus somewhat favored over the threonine residue. As in the case of P-II proteins, alanine predominates at position 141 (85%). However, a non-negligible proportion of threonine residues (15%) is found in the P-III class. 
All the other residues are of the same physico-chemical character. Additionally, the three residue motif (147-NLG-149) is also conserved in the P-III sequences. Moreover, the most variable region of the P-III proteins directly follows the most conserved motif. This time, the first loop (before the Met-turn) is composed of ten amino acid residues. Only in the case of Graminelysin from Trimeresurus gramineus (Wu et al., 2001) does this loop comprise eight residues. Mostly, the second loop (after the Met-turn) contains eleven residues; only three exceptions with ten residues are observed (Atragin and Kaouthiagin-like from Naja atra and Kaouthiagin from Naja kaouthia) (Guan et al., 2010; Ito et al., 2001). The Met-turn (residues 164 to 166) itself is again highly conserved. Both the first (Cys164) and the last residue (Met166) are invariant in all analyzed sequences. Again, only the central position is variable, possessing either an isoleucine (63%) or a valine residue (37%).


Disulfide pattern of the metalloproteinase domain of P-III SVMPs

The metalloproteinase domain of P-III SVMPs is known to have either six or seven cysteinyl residues to form all three formerly discussed disulfide bonds (Fox and Serrano, 2008). In the present analysis, two exceptions are found, TSV-DM from Trimeresurus stejnegeri (Wan et al., 2006) and RVV-X from Daboia russelli siamensis (Gowda et al., 1996; Takeda et al., 2007), both of which possess eight cysteinyl residues. All the other sequences contain six (5 sequences, 19%) or seven residues (20 sequences, 74%) (Table 31, Appendix 7.5.3). All six cysteinyl residues necessary for the known disulfide bonds are conserved in the P-III sequences (Cys117, Cys157, Cys159, Cys164, Cys181, Cys197). It is proposed that every P-III metalloproteinase domain forms the aforementioned three intra-domain disulfide bonds. The free cysteinyl residues are located at different positions, but two positions seem to be favored over others. Firstly, position 172 is occupied by a cysteinyl residue in the sequences of Agkihagin from Agkistrodon acutus (no reference), Eoc1 from Echis ocellatus (Wagstaff et al., 2006), Halysase from Agkistrodon halys pallas (You et al., 2003), VLAIP-A and VLAIP-B from Vipera lebetina (Trummal et al., 2005), and VAP1 from Crotalus atrox (Kikushima et al., 2008; Takeda et al., 2006). Secondly, sequences of the following proteins possess their seventh residue at position 186: AaHIV and Acutolysin E from Agkistrodon acutus (Zhu et al., 1997; 2009), Jararhagin and Bothropasin from Bothrops jararaca (Paine et al., 1992; Souza et al., 2001), BITM06A from Bothrops insularis (Junqueira-de-Azevedo and Ho, 2002; Paine et al., 1992; Souza et al., 2001), Brevilysin-H6 from Agkistrodon halys brevicaudus (Fujimura et al., 2000), and Catrocollastatin from Crotalus atrox (Zhou et al., 1995). 
The remaining seven P-III sequences with an additional cysteinyl residue do not show any correlation to the aforementioned positions: Graminelysin from Trimeresurus gramineus, position 39 (Wu et al., 2001); Berythractivase from Bothrops erythromelas, position 50 (Trummal et al., 2005); HCLD from Agkistrodon contortrix laticinctus, position 55 (Selistre de Araujo et al., 1997); VLFXA from Vipera lebetina, position 60 (Siigur et al., 2004); HR1a, position 121, and HR1b, position 84, from Trimeresurus flavoviridis (Kishimoto and Takahashi, 2002); and Atrolysin A from Crotalus atrox, position 167 (Hite et al., 1994). In TSV-DM and RVV-X, which contain eight cysteinyl residues, the additional residues are located at positions 84 and 172 and at positions 22 and 60, respectively. The latter two are located in close vicinity, but probably not near enough for an interaction via a disulfide bond (11.1 Å in the BaP1 structure). Nevertheless, loop shifts would probably make disulfide bonding possible. In the case of TSV-DM another disulfide bond is not possible, as the comparable residues in the BaP1 structure are located on opposite sides of the protein.


Indeed, the assumption of a fourth intra-domain disulfide bond in RVV-X is confirmed by the crystal structure of RVV-X (PDB code: 2E3X) in which all eight cysteinyl residues are involved in disulfide bond formation (Takeda et al., 2007). The free cysteinyl residue of the P-III proteins which contain seven such residues is mostly located at position 186. As mentioned in the section on P-II SVMPs, this residue is exposed to the surface in the structure of BaP1. In the P-III SVMP structures of AaHIV (3HDB), Bothropasin (3DSL), and Catrocollastatin (2DW0) the corresponding cysteinyl residues are also located at the surface but point towards the interior of the proteins. However, loop shifts should make Cys186 sufficiently accessible for disulfide bond formation with a free cysteinyl residue of another domain. Nevertheless, there is only a slight chance for this scenario in the case of residue 186, as it is located on the side of the metalloproteinase domain opposite to the disintegrin and cysteinyl-rich domains. The mentioned positions of the other P-III sequences with seven cysteinyl residues (39, 50, 55, 60, 84, 167, and 172) are also invariably located at the surface of the metalloproteinase domain. Structural alignment of the known P-III SVMP structures onto that of BaP1 has shown that especially the residues at positions 50, 80, and 172 are located in the middle of a domain-domain contact. Residues at positions 39 and 55 are located within interaction distance but at the edge of the interface. Only a slight possibility for an inter-domain disulfide bond is given for cysteinyl residues at positions 60 and 167, as they are not located in the vicinity of the disintegrin or cysteinyl-rich domains.

Computational phylogenetic analysis of the catalytic domain of P-III SVMPs

The eight different genera ( Agkistrodon , Bothrops , Crotalus , Daboia , Echis , Naja , Trimeresurus , and Vipera ) cannot be separated by phylogenetic analysis (Figure 54). The only small group, which is composed by proteins of one snake genus, is that of the sequences of Naja proteins (Kaouthiagin, Kaouthiagin-like, and Atragin). Besides this group, no clear differentiation in the phylogenetic tree is observable.


Figure 54 Phylogenetic tree obtained by the computational analysis of all complete P-III SVMP sequences. The eight different genera ( Bothrops , Crotalus, Trimeresurus, Agkistrodon , Vipera , Echis , Naja, and Daboia ) cannot be separated by this computational analysis.

3.2.4 Comparison of the different P-classes of snake venom metalloproteinases

The comparison of the three different alignments is presented in Figure 55. Overall, the alignments are very similar. However, slight differences can be observed, e.g. in the region of residues 18 to 29. Here, the P-I and the P-II protein sequences show five positions that are at least semiconserved, whereas the P-III SVMPs do not show any conservation in this region. The same applies to positions 10, 40, 47, 71, and 136. Residues 19 to 29, part of the first two helices (α1 and α2), are located at the surface of the protein and may therefore be able to act as an interaction site in functional processes, such as hemorrhagic activity. At the same time, one has to admit that residues at the surface of proteins are subject to higher evolutionary variability than those of functional or structural importance. Interestingly, the described patches are conserved in P-I and P-II but not in P-III proteins. Structural analyses revealed that this region is not hidden by additional domains in the crystal structures of P-III SVMPs, which means that these residues are accessible and may act as interfaces during protein-protein interaction processes.


Figure 55 Sequence alignment of the metalloproteinase domain of all P-I, P-II, and P-III SVMPs performed with ClustalW2 (Larkin et al., 2007). For comparison the sequence alignment of all SVMPs is given in the last line. Residue numbering and secondary structure elements (light gray: α-helices; black: β-sheets; gray: 3₁₀-helix) correspond to those of the P-I SVMP BaP1 (2W15). Consensus symbol explanation: (*) identical residues; (:) conserved residue substitution; (.) semi-conserved residue substitution.

In contrast, similar residues exist in the P-II and the P-III group, but not in the P-I group, in the regions of residues 33 to 35 and 86 to 89. However, in the latter region the P-I SVMPs lack conservation only because of the sequence of Atroxase from Crotalus atrox, in which, as mentioned before, a segment of seven residues is missing (Willis and Tu, 1988). Otherwise, the same kind of conservation as in the two larger P-classes would be observed. The corresponding residues 33 to 35 are located at the surface, but all P-classes contain sequences which show the same conserved residues in this region. Therefore, the classes do not seem to differ in this regard. Furthermore, the same effect can be observed at positions 12, 74, and 108. Similarities between the P-I and the P-III group which are not present in the P-II proteins can be detected at the following positions: 56, 109, 125, 130, 189. However, all of these residues are isolated and not located at the surface. Overall, no particular sequence motif which characterizes only one of the protein groups can be observed. This means that there are probably no class-specific amino acid residue motifs with which a classification would be possible. However, more detailed analyses are needed for further confirmation. Furthermore, phylogenetic analyses (data not shown) were also unable to separate the different P-classes or the genera of the snake species.


Conserved residues of the zinc-binding motif and the Met-turn

As mentioned before, SVMPs are very similar in primary and secondary structure, especially concerning the metalloproteinase domain. Many efforts have been made to find similarities or discrepancies between the different P-classes or in relation to variable activities of SVMPs, such as hemorrhagic activity. The presented sequence alignments of 84 different SVMP sequences show that conservation of the residues around the zinc-binding motif is higher than previously supposed (Figure 83, Appendix 7.6.4). To verify these results and to obtain more significant data, Jordi Durban & Juan José Calvete (unpublished data) kindly provided sequence segments of a recent venomics project. They analyzed 454-pyrosequenced venom gland transcriptomes of different Costa Rican viper species belonging to five different snake genera (Bothriechis, Bothrops, Atropoides, Cerrophidion, and Crotalus). Thereby, sequence segments of 48 different SVMPs of unknown P-class were found comprising both a complete zinc-binding moiety and a complete Met-turn. Hence, in total, the zinc-binding motif and the Met-turn of 132 SVMP sequences could be analyzed, and statistically more significant data could be gained concerning the conservation of amino acid residues of the metalloproteinase domain (Figure 83, Appendix 7.6.4). Most of the regarded positions show the aforementioned preferences independently of the P-class to which they belong. These preferences of amino acid characteristics are well-defined apart from position 139. There is a preferred use of threonine in P-I and P-II proteins, whereas P-III sequences present this amino acid residue in only one third of the analyzed cases. Instead, the tendency is slightly shifted towards non-polar residues (isoleucine and alanine). Furthermore, position 141 is mainly occupied by an alanine residue, but in 10% of all analyzed sequences a threonine residue is observed, independent of both snake genus and P-class. 
Only in one of the 132 analyzed sequences, the catalytic glutamic acid (position 143) is not conserved (glycine residue in the sequence of Albolatin). The three residue motif (147-NLG-149) after the second zinc coordinating histidine residue is conserved in all 132 sequences. As far as it is known from literature all known SVMP sequences possess this invariant three residue motif. With all this information an elongated and more detailed zinc-binding motif of SVMPs can be proposed (Figure 56).

136 - n(b) - n(s) - n(b) - T>n(s/b) - n(b) - A>T - H - E - n(b) - n(s) - H - N - L - G - n(b) - p(s/b) - H - D - 153

Figure 56 Elongated zinc-binding motif of SVMPs. Except from two positions (139 and 141) all 18 residues (residue 136 to 153) show a clear preference considering the physico-chemical characteristics of amino acids. Abbreviations are as follows: n, non-polar residue; p, polar residue; b, bulky residue; s, small residue.
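The proposed motif can be written down as a position-wise pattern and tested against a sequence segment. The following Python sketch is an illustration only: the assignment of amino acids to the classes "non-polar small", "non-polar bulky" and "polar" is an assumption made here (the text does not list the class members), `motif_mismatches` is a hypothetical helper, and the BaP1 segment is read from the residue labels of Table 10.

```python
# Assumed residue classes (one-letter codes); the grouping is illustrative.
NONPOLAR_SMALL = set("GA")
NONPOLAR_BULKY = set("VLIMFWP")
POLAR = set("STNQHDEKRYC")

# Elongated zinc-binding motif, positions 136-153 (BaP1 numbering).
MOTIF = [
    (136, NONPOLAR_BULKY), (137, NONPOLAR_SMALL), (138, NONPOLAR_BULKY),
    (139, {"T"} | NONPOLAR_SMALL | NONPOLAR_BULKY),  # T > n(s/b)
    (140, NONPOLAR_BULKY), (141, set("AT")),         # A > T
    (142, {"H"}), (143, {"E"}),
    (144, NONPOLAR_BULKY), (145, NONPOLAR_SMALL),
    (146, {"H"}), (147, {"N"}), (148, {"L"}), (149, {"G"}),
    (150, NONPOLAR_BULKY), (151, POLAR),             # p(s/b): any polar residue
    (152, {"H"}), (153, {"D"}),
]


def motif_mismatches(segment):
    """Return [(position, residue)] for residues that violate the motif."""
    if len(segment) != len(MOTIF):
        raise ValueError(f"expected {len(MOTIF)} residues, got {len(segment)}")
    return [(pos, res) for (pos, allowed), res in zip(MOTIF, segment)
            if res not in allowed]


bap1_segment = "VAVTMAHELGHNLGIHHD"    # residues 136-153 as labeled in Table 10
e143g_variant = "VAVTMAHGLGHNLGIHHD"   # hypothetical segment with Gly at position 143

print(motif_mismatches(bap1_segment))
print(motif_mismatches(e143g_variant))
```

With these class definitions the BaP1 segment satisfies the motif at all 18 positions, whereas a segment lacking the catalytic glutamic acid is flagged at position 143. A different class assignment (for example, treating histidine or proline differently) would change the outcome at the class-defined positions only.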


The three residue motif of the Met-turn is highly conserved in all analyzed sequences (Figure 57). In particular, for the first (Cys164) and the last residue (Met166) no alternative exists in an intact SVMP. As mentioned before, the cysteinyl residue forms one of the conserved disulfide bonds, and the methionine residue is supposed to be essential for structural integrity (Tallant et al., 2010). In contrast, the central residue shows variability, as either an isoleucine (in ¾ of all cases) or a valine residue is present at this position. P-III SVMPs show slightly less preference for the isoleucine residue. The only sequence in which Cys164 is absent is that of Mt-d, in which a serine residue is observed instead.

Figure 57 All complete sequences of SVMPs deposited in the UniProtKB/SwissProt database (Apweiler et al., 2004) were analyzed concerning amino acid composition at the highly conserved Met-turn. The analysis included 35 P-I, 22 P-II, and 27 P-III SVMP sequences of 31 different species and a further 48 SVMP sequences of unknown class affiliation (gen.). Only the second of the three amino acid residues shows variability.

3.2.5 Conserved amino acid residues - structural importance

To summarize, even more amino acid residues around the zinc-binding motif and the Met-turn are conserved. Structural analyses of the known metalloproteinase domains (Section 3.1.8) confirmed the importance of these residues as they are also conserved in alignments. Because most of the conserved residues require a specific counterpart for interaction, or at least a characteristic type of amino acid, even more amino acid residues can be included in the consensus motif of the metalloproteinase domain (Table 10). For instance, Asp153, which itself was added ex post to the conserved residues of the zinc-binding motif, was already noted to need a specific hydrogen bonding partner (Ramos and Selistre-de-Araujo, 2004). This interaction partner is the serine residue at position 179 preceding the last helix of the metalloproteinase domain (α5). It was proposed that the interaction may be related to the stabilization of the position of His152 and, therefore, to the conformation of the active site (Ramos and Selistre-de-Araujo, 2004). In all analyzed sequences this serine residue is conserved apart from the sequence of the P-I SVMP DaH4 from A. acutus (Tsai et al., 2000), in which a leucine residue is present at this position. However, its sequence is shortened and it is doubtful whether this is the true amino acid residue. Structurally, the serine is perfectly aligned in all 17 known metalloproteinase domain structures (data not shown), pointing towards the aspartic acid for hydrogen bonding. Moreover, there is another serine residue (Ser182) positioned within interaction distance of the second oxygen atom of the aspartic acid (Asp153). It is not absolutely conserved in all analyzed sequences, because DaH4 possesses a lysine residue and Mt-d from A. halys brevicaudus (Jeon and Kim, 1999) an asparagine residue at this position. As before, this serine residue is also structurally conserved in all analyzed proteins. Hence, these two residues most likely fulfil a structural function, as they have been retained throughout evolution. Besides, there is another absolutely conserved serine residue (Ser122) in close proximity to the zinc-binding motif. Again, this serine residue is conserved in both sequence and structure and interacts via a hydrogen bond with the first residue (Asn147) of the NLG-motif (147-NLG-149). As the latter residue was found in the present work to be absolutely conserved in the metalloproteinase domain sequence, the structural importance of these two residues is again evident. Besides the polar interactions of these three serine residues, most of the conserved residues are of hydrophobic nature (Table 10). 
As hydrophobic interactions often tolerate variable interaction distances and unspecific partners (apart from being hydrophobic), several of these residues are not absolutely identical, but conserved in their physico-chemical character. However, only methionine and phenylalanine residues are found at positions 116 and 178, respectively. Both seem to be important for the structural integrity of the active site. Met116 interacts with the hydrophobic residues at positions 8, 144, and 148 (in the case of BaP1: isoleucine, leucine, leucine). The former two residues are invariably of hydrophobic character, whereas the latter is the central residue of the NLG-motif and is an absolutely conserved leucine residue. At the same time, it is the last residue of the active site helix (α4) and serves as a hydrophobic anchor in a pocket composed of Met116 and the two hydrophobic residues at positions 8 and 144. The other conserved residue (Phe178) is positioned within hydrophobic interaction distance of two of the Met-turn residues (Ile165 and Met166). The side chain of Phe178 stabilizes that of Met166 in such a way that the latter can point towards the zinc ion. Although the role of the methionine residue has been discussed many times, future analyses have to include the phenylalanine residue in the considerations of the important Met-turn. This structural analysis shows that overall far more than the mentioned 18 amino acid residues are essential for the structural integrity of the zinc-binding region and for maintaining the function of the metalloproteinase domain.

Table 10 Interactions of amino acid residues of the zinc-binding motif and the Met-turn of BaP1.
Column headings: bb / bb a; bb / sc a; sc / sc a; sc / bb a; Residue; N(x)→O; N→O(x)
Val136 Thr139, Met140 Asn133 Val126 Leu198 Ala137 Met140, Ala141 Asn133 Val138 Ala141, His142 Leu134 Phe43 Leu170 Tyr176 Thr139 His142, Glu143 Trp135, Val136 His129 Met140 Leu144 Val136, Ala137 Gln96 Leu37 Val40 Val126 Ala141 Gly145 Ala137, Val138 Tyr44 Phe43 Ile165 His142 Gly145, His146 Val138, Thr139 Zn 2+ Ile165(O) Glu143 Asn147 Thr139 Leu144 Asn147, Leu148 Met140 Tyr44 Met116* Leu148 Gly145 Leu148, Ile150 Ala141, His142 His146 Gly114, Gly149 His142 Zn 2+ Ile165(O) Asn147 Met116 Glu143, Leu144 Ser122* Tyr112(O), His121(O) Ile8 Tyr44 Met116* Leu148 Leu144, Gly145 Leu144* Ile198 Gly149 His146 His151 Ile150 Gly145 His146* Met166* Tyr186 His151 Gln185 Gln185 Gly149 His152 Zn 2+ Asp153 Met166 Ser179* Ser182* Cys164 Ala167 Cys159* Ile165 Glu177 His142* Phe43 Ala141 Phe178* Met166 Asp153 Ile150
a bb, backbone; sc, side chain. Conserved residues are shaded gray (physico-chemical properties) or marked with an asterisk (absolute).

3.3 Proteolytic reaction mechanism of metzincins

As described elsewhere (Section 1.1.3), the dependency of the reaction mechanism of metzincins is still not understood in detail. Both the X-ray structure of the BaP1*inhibitor complex and MD simulations were used to address questions concerning the zinc ion coordination distances, the properties of the catalytic glutamate residue, the dynamic behavior of residues, and the displacement of water molecules from the binding site upon inhibitor binding.

3.3.1 Insight derived from the X-ray structures of the BaP1*inhibitor complexes

The catalytic zinc ion in BaP1 changes its coordination geometry from four-coordinate tetrahedral in the unliganded native state to five-coordinate square pyramidal in the inhibitor complex (Figure 58, Table 11). Such an expansion is expected to result in longer zinc coordination distances (Alberts et al., 1998). However, this is only the case for His152 (2.063 ± 0.009 Å between Zn and the Nε2 atom in the inhibitor complex, 2.00 Å in the native structure). The corresponding distance to His142 remains equal (2.048 ± 0.011 Å vs. 2.05 Å), while the one to His146 even decreases (2.048 ± 0.005 Å vs. 2.12 Å). For SVMPs, it has been suggested that proteolytic activity depends on the protonation state of the Zn2+-coordinating histidines on the one hand, and on the polarization capacity of the glutamate residue on the other (Ramos and Selistre-de-Araujo, 2006). In a study on Acutolysin C using X-ray absorption near-edge structure (XANES) spectroscopy at different pH values, it was shown that the overall arrangement of the catalytic zinc ion remained tetrahedral going from pH 8.0 to 3.0. In contrast, the zinc coordination distances and the distance between the glutamate and the active site water molecule increased (Zhao et al., 2007). However, there is no evidence for such a change in the corresponding Glu143(Oε1)-Hyd1(OH) distances in the BaP1*inhibitor complexes obtained at different pH values (Table 11). In a similar vein, pH-dependent changes in the protonation state of the active site histidine residues have been proposed to cause a change in zinc coordination distances (Section 1.1.3) (Zhu et al., 1999). In order to address this question, an additional refinement step with REFMAC was performed in which all metal-ligand restraints had been removed. Subsequent analysis of the BaP1*inhibitor complex at four different pH values showed no detectable trend in Zn2+ coordination (Table 11). All three histidine residues feature quite similar average Zn-N distances between pH 4.6 and 8.0: 2.048 ± 0.011 Å (His142 Nε2 atom), 2.048 ± 0.005 Å (His146 Nε2 atom), and 2.063 ± 0.009 Å (His152 Nε2 atom). Standard deviations are calculated between the four structures at different pH values and all lie within the coordinate error (0.025 Å for dataset IV). 
Additionally, there is no significant change in the distances between the zinc ion and the hydroxylic Hyd1(OH) and carboxylic Hyd1(O) atoms of the inhibitor (2.230±0.004 Å and 2.018±0.005 Å). Nevertheless, conclusions concerning atomic distances should be regarded critically, even though the estimated overall coordinate error of the present high-resolution structures is lower than that of previously available low-resolution structures (1.8-2.2 Å) (Zhu et al., 1999).

119 3 RESULTS

Table 11 Coordination distances (in Å) at the active site of complexed and non-complexed BaP1.

Dataset                  I      II     III    IV     BaP1 a
pH                       4.6    6.5    7.5    8.0    9.0
His142(Nε2)-Zn2+         2.04   2.08   2.03   2.04   2.05
His146(Nε2)-Zn2+         2.06   2.04   2.05   2.04   2.12
His152(Nε2)-Zn2+         2.06   2.09   2.05   2.05   2.00
Hyd1(OH)-Zn2+            2.23   2.23   2.24   2.22   -
Hyd1(O)-Zn2+             2.01   2.03   2.01   2.02   -
Glu143(Oε1)-Hyd1(OH)     2.62   2.54   2.61   2.67   -
Glu143(Oε2)-Hyd1(N)      3.24   3.23   3.25   3.27   -
Wat67-Zn2+               -      -      -      -      2.25
Glu143(Oε1)-Wat69        -      -      -      -      2.86
Glu143(Oε2)-Wat67        -      -      -      -      3.01
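The averages and spreads quoted in the text can be recomputed from the four inhibitor-complex datasets of Table 11. The short Python sketch below is an illustration, not the procedure used in this work; the dictionary keys and the helper `mean_and_sem` are names chosen here. Starting from the two-decimal values of the table, the quoted ± values are reproduced when the sample standard deviation is divided by the square root of the number of structures, which is an observation about the rounded table values and not a statement about how the original figures were derived.

```python
from math import sqrt
from statistics import mean, stdev

# Zinc coordination distances in angstrom for datasets I-IV
# (pH 4.6, 6.5, 7.5, 8.0), copied from Table 11.
distances = {
    "His142(NE2)-Zn": [2.04, 2.08, 2.03, 2.04],
    "His146(NE2)-Zn": [2.06, 2.04, 2.05, 2.04],
    "His152(NE2)-Zn": [2.06, 2.09, 2.05, 2.05],
    "Hyd1(OH)-Zn": [2.23, 2.23, 2.24, 2.22],
    "Hyd1(O)-Zn": [2.01, 2.03, 2.01, 2.02],
}


def mean_and_sem(values):
    """Return the mean and the sample standard deviation divided by sqrt(n)."""
    return mean(values), stdev(values) / sqrt(len(values))


summary = {label: mean_and_sem(values) for label, values in distances.items()}

for label, (avg, spread) in summary.items():
    print(f"{label}: {avg:.3f} +/- {spread:.3f}")
```

In every case the spread stays well below the coordinate error of 0.025 Å stated for dataset IV, which is the basis for the conclusion that no pH-dependent trend is detectable.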


Figure 58 Zinc coordination sphere of non-complexed (A) and inhibitor-bound BaP1 (B). In the former, the zinc ion is coordinated tetrahedrally by the N ε2-atoms of three histidine residues (H142, H146, and H152) and the oxygen atom of the catalytic water molecule (Wat67). Coordination distances are all given in angstrom and taken from the PDB entry 1ND1 (Watanabe et al., 2003). In case of BaP1 with the complexed inhibitor (only Hyd1 is depicted), the zinc ion is coordinated in a nearly perfect square based pyramidal geometry by the N ε2-atoms of three histidine residues (H142, H146, and H152) and both oxygen atoms of Hyd1. The latter atoms and those of His146 and His152 form the basal plane and the N ε2-atom of His142 the apex of the pyramid. Carbon atoms are shown as gray, nitrogen atoms as blue, oxygen atoms as red sticks, and the zinc ion is represented as magenta, the water molecule as red sphere. Coordination distances are given in angstrom and represent the standard deviations of all four presented structures.

Another possible mechanism for pH-dependent modulation of proteolytic activity is a more widespread change in enzyme structure (Xu et al., 2004). For the BaP1*inhibitor complex, there are only a few side chain and loop shifts, mainly at the protein surface and the active site, but no significant conformational changes in the peptide backbone are observable at different pH values. Structural alignments lead to RMS deviations (Cα atoms) between 0.05 and 0.20 Å in pairwise comparisons. Thus, the structures can be regarded as equal, and our high-resolution analysis does not support the idea of wide-range conformational shifts causing a change in activity.

3.3.2 Insight gained by MD simulations of different metzincin structures

As described before, with MD simulations it is possible to calculate several interesting aspects of structurally known protein targets (Section 2.7.3). All MD simulations mentioned in this section have been performed by Thomas Steinbrecher and are not yet published (personal communication).

Calculation of pKa shifts of the catalytically active glutamate residue

Generally, pH-dependent characteristics of proteins are defined by the pKa values of their amino acid side chains. To address the question of the protonation state and the detailed role of the catalytic glutamate residue (Glu143) during the reaction mechanism, three free energy calculations using the TI formalism were set up to simulate the protonation process in different environments: transformation of a deprotonated glutamate into the protonated form in the BaP1*inhibitor complex (PDB code: 2W15), in non-complexed BaP1 (same structure without the inhibitor), and in a virtual three-residue peptide (Ace-Glu-Nme; Ace, acetate; Nme, methylamine). The latter calculation is necessary because MM-forcefield-based free energy calculations can only give pKa values with respect to an arbitrary zero point; therefore, the protonation of the free glutamate with known pKa (approximately 4.3) was included as a reference point. In this way, pKa changes caused by the environment (protein and ligand) can be studied. Subsequently, the free energies are compared with the value of the free state and conclusions about the protonation state can be drawn. To obtain statistically valuable results, three consecutive simulations were performed (100 ps, 100 ps, and 300 ps). The resulting free energy values exhibit considerable drift between the two short simulations (simulations 1 and 2) and the following longer one (simulation 3), indicating that longer simulation times are necessary for converged results. Nevertheless, the data indicate that the pKa of the glutamic acid is raised in the receptor and, possibly to a smaller degree, in the receptor-ligand complex (Table 12). Comparisons between the three different environments of the glutamate residue are shown in Table 13.

Table 12 Free energies G0 for Glu143 of simulations 1-3 in kcal/mol.

Transformation           Simulation 1   Simulation 2   Simulation 3   Average
BaP1*inhibitor complex   +78.1          +76.8          +74.9          +76.6
Non-complexed BaP1       +72.5          +73.5          +74.2          +73.3
Free glutamate           +77.2          +77.2          +77.3          +77.2

Table 13 Free energy differences G0 for Glu143 in kcal/mol.

Free energy differences     Simulation 1   Simulation 2   Simulation 3
Complex vs. free            +0.9           -0.4           -2.4
Non-complexed vs. free      -4.7           -3.7           -3.1
Complex vs. uncomplexed     +5.6           +3.3           +0.7
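The differences in Table 13 follow from the per-simulation values in Table 12 by subtracting one transformation from another. The following Python sketch reproduces that arithmetic from the tabulated numbers only; the dictionary keys and the function name are illustrative and not part of the original analysis.

```python
# Free energies of protonation from Table 12 (kcal/mol), simulations 1-3.
TABLE_12 = {
    "complex": [78.1, 76.8, 74.9],
    "non_complexed": [72.5, 73.5, 74.2],
    "free": [77.2, 77.2, 77.3],
}


def differences(a, b, table=TABLE_12):
    """Per-simulation difference a - b, rounded to one decimal like the table."""
    return [round(x - y, 1) for x, y in zip(table[a], table[b])]


if __name__ == "__main__":
    pairs = [("complex", "free"),
             ("non_complexed", "free"),
             ("complex", "non_complexed")]
    for a, b in pairs:
        print(f"{a} vs. {b}: {differences(a, b)}")
```

Listing the three simulations side by side also makes the drift of the complex values (simulation 1 to simulation 3) easy to see.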

Because the catalytic activity of all metzincins depends on the invariant glutamate residue, the pKa shifts of other family members were also analyzed. Two examples, TACE (PDB code: 3EDZ) (Mazzola, Jr. et al., 2008) and MMP13 (1XUC) (Engel et al., 2005), were chosen and the corresponding MD simulations were calculated. In this case, five different simulations were analyzed (simulations 1 to 5). The residues considered were Glu406 in the TACE structure and Glu223 in the MMP13 structure. The results of the free energy calculations were consistent with the former data for the BaP1 structures; the results for all four samples (BaP1+I, BaP1*inhibitor complex; BaP1, non-complexed BaP1; TACE; MMP13) are shown in Table 14. The data for the BaP1*inhibitor complex (BaP1+I) are less well converged than the other free energies, so an average value would not be meaningful. The other data (BaP1, TACE, MMP13), however, indicate better convergence. The comparison with the free glutamate residue shows a shifted free energy in all examples and, correspondingly, an increased pKa value (Table 15).

Table 14 Free energies G for Glu143, Glu406, and Glu223 of simulations 1-5 in kcal/mol.

Protein   Simulation 1   Simulation 2   Simulation 3   Simulation 4   Simulation 5   Average
BaP1+I    78.1           76.8           74.9           73.0           70.8           ?
BaP1      72.5           73.5           74.2           72.9           72.3           73.1
TACE      74.5           74.1           75.3           74.4           74.3           74.5
MMP13     72.3           70.3           71.7           70.9           72.7           71.6

Table 15 Free energy differences G in kcal/mol and the corresponding pKa values.

Protein   Glu   Average   Compared to free Glu   pKa
BaP1+I    143   ?         ca. -5.0               +3.7
BaP1      143   73.1      -4.1                   +3.1
TACE      406   74.5      -2.7                   +2.1
MMP13     223   71.6      -5.6                   +4.2
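A free energy difference relative to free glutamate can be converted into a pKa shift with the standard thermodynamic relation ΔpKa = -ΔΔG / (RT ln 10). The sketch below applies this relation in Python. The temperature of 298.15 K is an assumption, since the value used for Table 15 is not stated here, so the output reproduces the tabulated shifts only approximately; the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def pka_shift(ddg_kcal, temperature=298.15):
    """pKa shift for a protonation free energy difference relative to free Glu.

    A more negative protonation free energy (protonation more favorable)
    corresponds to a higher pKa, hence the minus sign.
    The default temperature is an assumed value, not taken from the thesis.
    """
    return -ddg_kcal / (R_KCAL * temperature * math.log(10))


if __name__ == "__main__":
    # Differences to free glutamate as listed in Table 15 (kcal/mol).
    table_15 = [("BaP1+I", -5.0), ("BaP1", -4.1), ("TACE", -2.7), ("MMP13", -5.6)]
    for name, ddg in table_15:
        print(f"{name}: pKa shift = {pka_shift(ddg):+.2f}")
```

At the assumed temperature, one pKa unit corresponds to roughly 1.36 kcal/mol.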

Zinc ion coordination and Glu143 interaction

In contrast to the X-ray structures of the BaP1*inhibitor complexes (Section 3.1.7), in which the zinc ion is coordinated by the three histidine residues and two oxygen atoms of the hydroxamate group, an additional solvent molecule is ligated during the MD simulations. This water molecule is positioned opposite to His142, and the zinc ion coordination is virtually octahedral (compare Figure 58). The zinc ion coordination in the simulation of the non-complexed structure also differs from the X-ray structure of the native protein (Section 3.1.7): the zinc ion is ligated by the three histidine residues and three water molecules. This coordination behavior was also confirmed by the MD simulation of the native protein. MM methods are known to have difficulties in accurately representing the intricacies of metal-ligand binding geometries without extensive efforts at case-specific reparameterization. In the present case, the unmodified forcefield yielded a sufficient approximation with stable and realistic binding modes. The simulations give at least some indication that bound water molecules have to be displaced upon ligand binding. The coordination of Glu143 in the inhibitor complexes is dominated by both oxygen atoms of the hydroxamate group of the peptidomimetic and a solvent molecule positioned within hydrogen bonding distance of one of the oxygen atoms. This is also the case for the simulations of the complex structure. In both the non-complexed and the native structure, the side chain of Glu143 is ligated by three water molecules, one of which is exchanged for another during the simulation.

Water molecules of the binding pocket

In the X-ray structures of the BaP1*inhibitor complexes (Section 3.1.7), two water molecules were found in the S1' pocket of the protein (Wat10 and Wat15). In contrast, a total of six water molecules are found in the structure of the native protein (Wat402, Wat405, Wat456, Wat441, and Wat456). Interestingly, the S1' pocket of metzincins is usually hydrophobic and is suggested to play an important role in substrate specificity (Section 1.1.2). This means that a considerable number of solvent molecules has to be displaced upon inhibitor binding and protrusion of the substrate's P1' side chain into the pocket. During the MD simulation of the BaP1*inhibitor complex structure both water molecules stay in the pocket, whereas in the non-complexed model there is an active and fast interchange of a considerable number of solvent molecules. The same tendencies were confirmed by simulations of the native structure model. At least entropically, this would be an unfavorable process as the solvent molecules lose degrees of freedom.

Side chain rotamers and energetic aspects concerning inhibitor binding

In another approach, the conformational changes of the binding site and the resulting energetic cost for BaP1 upon inhibitor binding were analyzed. The MD simulations of the BaP1*inhibitor complex structure (2W15), the same structure without the inhibitor (non-complexed BaP1), and the native protein structure (1ND1) were taken as the basis for all following comparisons. The simulation length was 2 ns in all cases. First, the dynamic behaviour of important hydrogen bonds between inhibitor and protein, as observed in the X-ray structures (Section 3.1.7), was analyzed. Hydrogen bonds of the following backbone atoms were found to be stable in the BaP1*inhibitor complex: the amide hydrogen atoms of Ile108 and Leu170, and likewise the carbonyl oxygen atoms of Asn106, Gly109, and Ser168. In addition, both carboxylic oxygen atoms of Glu143 and the hydroxylic oxygen atom of Thr107 seem to maintain stable hydrogen bonds to the inhibitor. Furthermore, the consistent coordination of the zinc ion and the retention of the inhibitor in the binding pocket during the simulation indicate a realistic model and a considerable affinity between the complex partners.

In a more detailed analysis, side chain rotamers of selected residues were compared between the three different starting points. These residues were chosen because of significant conformational differences between the X-ray structures of the BaP1*inhibitor complex and of the native protein: Asn106, Arg110, Glu143, Ser168, and Val169. Dihedral angles of these residues were first compared between both X-ray structures and then monitored during the simulations (Table 16). In the case of Asn106, the dihedral angle differs significantly between the starting points and also during the simulations themselves. This could indicate low rigidity and a minor role of this residue in interactions. Three different angles were analyzed in Arg110. Here, in both the complex and the native protein, the angles do not diverge significantly from their starting points. Both conformations seem to be stable and seem to be separated by a significant energy barrier. The dihedral angles of the catalytic Glu143 remain consistent in all three simulations and also between the two starting points. In contrast, the starting points of the angle of Ser168 differ widely, and the angle changes frequently from one value to the other during the simulations. This suggests that the energetic cost of this transition is probably comparatively low. The corresponding angle of Val169 in the complex also differs from that in the native protein. Interestingly, during the simulation of the non-complexed protein it converts into the form found in the native protein structure. The rate of rotamer transitions therefore seems to be fast compared to the simulation timescale for Ser168 and Val169 and slow for Arg110.

Table 16 Dihedral angles of selected residues of the X-ray structures and MD simulations

                            X-ray structures     MD simulations
Residue   Dihedral angle    complex   native     complex      non-complexed   native
Asn106    Cα-Cβ-Cγ-Nδ       29        111        60 / 180     180             180
Arg110a   Cα-Cβ-Cγ-Cδ       -64       150        -60          -60             180 / 60
Arg110b   Cβ-Cγ-Cδ-Nε       -54       -84        -10 / -60    -60             180 / 60 / -60
Arg110c   Cγ-Cδ-Nε-Cζ       -74       140        -60          -60             180
Glu143    Cα-Cβ-Cγ-Cδ       -44       -49        -60          -60             -40
Ser168    C-Cα-Cβ-Oγ        -63       177        -60 / 180    180 / -60       180
Val169    C-Cα-Cβ-Cγ        59        169        60           180             180 / 60 / -60
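A dihedral angle such as those in Table 16 is defined by the Cartesian coordinates of four consecutive atoms. The following Python sketch computes the signed angle with the usual atan2 formulation and assigns it to the nearest staggered position. The function names and the binning into -60°, 60°, and 180° are illustrative assumptions, not the procedure used for the table.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle p1-p2-p3-p4 in degrees (between -180 and 180)."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    # x is proportional to the cosine, y to the sine of the angle.
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def nearest_rotamer(angle, positions=(-60, 60, 180)):
    """Staggered position closest to angle, using circular distance."""
    def circ(a, r):
        return abs((a - r + 180) % 360 - 180)
    return min(positions, key=lambda r: circ(angle, r))


if __name__ == "__main__":
    # Four hypothetical atom positions (Å) spanning a 90 degree dihedral.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
    print(nearest_rotamer(177), nearest_rotamer(-64))
```

Applied to every snapshot of a trajectory, such a binning gives the rotamer populations and the transitions between them over time.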

With the MM-PBSA technique (Section 2.7.3), the mean value of the total free enthalpy, representing the stability of the protein models, can be obtained. For the complex this gave a value of -5611.6 kcal/mol, whereas the values for the non-complexed form (-5634.3 kcal/mol) and the native protein (-5628.1 kcal/mol) were noticeably lower. The absolute values should not be over-interpreted, as MM-PBSA calculations rarely allow quantitative conclusions. At least, however, this is a hint that the protein has to adopt an energetically less favorable conformation upon inhibitor binding.

3.4 Elucidation of the hemorrhagic mode of action of SVMPs

As mentioned before (Section 1.2.2), the smallest class of SVMPs, the P-I class, comprises several examples of SVMPs with proteolytic activity towards the same substrates in a similar range but with widely differing hemorrhagic activity. The fact that hemorrhagic effects depend on the proteolytic activity of the metalloproteinase domain led to the suggestion that the metalloproteinase domain probably contains as yet unknown sequential or structural determinants.

3.4.1 Insight derived from the X-ray structures of BaP1*inhibitor complexes

Several investigations have correlated surface characteristics and structural variations within the family of SVMPs, especially at the mentioned loop region of residues 153 to 176, with their hemorrhagic potential (Gong et al., 1998; Ramos and Selistre-de-Araujo, 2004; Watanabe et al., 2003). It was proposed that this loop, by virtue of its surface localization and its vicinity to the active site, mediates extracellular matrix substrate binding and thereby influences the hemorrhagic potential of the enzymes. In the present analysis of BaP1*inhibitor structures, the first part of the loop (residues 154 to 162) shows high flexibility, with considerably higher B-factors (main and side chain atoms 12.8 Å² and 15.2 Å², respectively) compared to the average values of the whole protein (6.0 and 8.7 Å²). The last four residues in this segment (159-162) are in a double conformation in all four BaP1*inhibitor structures, representing a superposition of closed and open conformations. In the closed conformation, this loop segment packs against residues 175-177 through backbone interactions. In the open conformation, by contrast, it extends into the bulk solvent. In contrast to this highly flexible segment, the residues of the loop region on the C-terminal side of the Met-turn are rigid again (6.6 Å² and 9.1 Å², residues 167 to 176) (Figure 59). A comparison of the B-factor distribution of identical residues in the non-complexed form of BaP1 shows the same tendencies in flexibility, with unchanged rigidity of the active site residues (data not shown).

Figure 59 B factor plot of the flexible loop region (residues 152 to 176). For reference, the average B factor of all BaP1 Cα atoms (6.0 Å²) is indicated as a dotted line.

3.4.2 Insight gained by sequential and structural analyses of metalloproteinase domains

Many efforts have been made to find the key structural determinants of the hemorrhagic activity of SVMPs. Some of these proteins show proteolytic activity towards the same substrates but differ greatly in their level of hemorrhagic activity. As the latter depends on the proteolytic function of the metalloproteinase domain, it was suggested that the domain itself should contain structural determinants, either in primary or in secondary structure, that enable a differentiation. In an interesting attempt, a bioinformatics study by Ramos and Selistre-de-Araujo (2004) involving comparative surface analysis of several hemorrhagic and non-hemorrhagic SVMPs (P-I as well as P-III) revealed a slight but not linear correlation between molecular surface area characteristics and hemorrhagic potential. In the present work, four P-I SVMPs which are very similar in sequence and structure, but of which only two are hemorrhagically active, were analyzed. The latter two are BaP1 from Bothrops asper (Lingott et al., 2009; Watanabe et al., 2003) and Acutolysin A from Agkistrodon acutus (Gong et al., 1998). The other two, Leucurolysin A from Bothrops leucurus (Ferreira et al., 2009) and BmooMPalpha-I from Bothrops moojeni (Akao et al., 2010), are devoid of hemorrhagic activity.

Comparison of BaP1, Leucurolysin A, and BmooMPalpha-I

The sequence alignment of the three proteins shows very high identities (Figure 60). Compared to BaP1, Leucurolysin A contains the same amino acid residue at 78% of all 202 positions and BmooMPalpha-I even at 82%. This corresponds to 158 and 166 amino acid residues of the complete metalloproteinase domain. Between the two non-hemorrhagic proteins the sequence identity is in the same range (81%). A comparison of the pairwise sequence alignments (BaP1 with one of the others) with the alignment of both non-hemorrhagic proteins indicates that the intersection of differing amino acid residues which could explain the different hemorrhagic behaviour is quite small. There are only 25 positions (12%) which may be considered for an explanation. Furthermore, only 22 of these residues are located at the surface of the protein and could influence the protein-protein interaction which is supposed to precede hemorrhagic effects. Interestingly, most of these candidate residues are located around the binding pocket (Figure 61), indicating that this region may be responsible for an interaction with substrates in the extracellular matrix. These substrates have to be cleaved at the active site to cause hemorrhagic effects. Nearly half of these residues belong to the above-mentioned most variable region of the metalloproteinase domain: the two loops before and after the Met-turn. Six of those eight distinct positions are located in the first loop and two of them in the second.

Again, the alignment of the three structures reveals an impressive similarity. Compared to BaP1, RMS deviations for all atoms are 0.51 and 0.60 Å (BmooMPalpha-I and Leucurolysin A) and for Cα atoms only, 0.44 and 0.51 Å. Between the two non-hemorrhagic proteins the values are slightly lower: 0.45 Å for all atoms and 0.42 Å for Cα atoms only. Overall, the structures are again very similar apart from the N- and C-terminal residues and the most variable region following the zinc-binding motif. In detail, this time the backbone of the first loop of the two hemorrhagic proteins is arranged in a different way, whereas the second loop (after the Met-turn) is nearly in the same position in all four proteins. This distinction is the reason for the higher similarity between the non-hemorrhagic proteins compared to the hemorrhagic one.
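The RMS deviations quoted above measure the average distance between equivalent atoms of two structures after superposition. As a minimal Python sketch of the final step only, the function below assumes two coordinate lists that are already superposed and contain the same atoms in the same order; the superposition itself and the program used for the values in the text are outside its scope.

```python
import math


def rmsd(coords_a, coords_b):
    """RMS deviation (same unit as the input) between two superposed atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Two hypothetical two-atom structures, shifted by 1 Å along x.
    a = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
    b = [(1.0, 0.0, 0.0), (2.0, 1.0, 1.0)]
    print(rmsd(a, b))
```

Restricting the input lists to Cα atoms or passing all atoms gives the two kinds of values compared in the text.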

Figure 60 Sequence alignment of the hemorrhagic P-I SVMP BaP1 from B. asper and the non-hemorrhagic P-I SVMPs BmooMPalpha-I from B. moojeni and Leucurolysin A from B. leucurus performed with ClustalW2 (Larkin et al., 2007). Residue numbering and secondary structure elements (light gray: α-helices; black: β-sheets; gray: 3₁₀-helix) correspond to those of BaP1 (2W15). Highly conserved residues of the active site and the Met-turn are shaded red. The number of residues and the sequence identities compared to BaP1 are given at the end of the sequences. Pairwise alignments are shown. Consensus symbol explanation: (*) identical residues; (:) conserved residue substitution; (.) semi-conserved residue substitution. All positions which differ between BaP1 and the hemorrhagically inactive proteins are framed with a small box. By comparing the pairwise alignments (BaP1 and BmooMPalpha-I; BaP1 and Leucurolysin A), these positions could be reduced to 25 putatively responsible residues (shaded green). These amino acid residues were analyzed with respect to their exposure at the surface of the metalloproteinases ( ▲ or -).

Figure 61 Surface presentation of BaP1 (2W15) and the peptidomimetic inhibitor to indicate the binding site pocket. All amino acid residues are coloured gray apart from the highly conserved zinc-binding motif and the Met-turn (red). The green patches correspond to those amino acid positions at which a difference in pairwise sequence alignments (BaP1 - Leucurolysin A, BaP1 - BmooMPalpha-I, Leucurolysin A - BmooMPalpha-I) is found. Interestingly, most of the residues are located around the binding site pocket. Each step represents a rotation of 90° clockwise.

Comparison of BaP1, Acutolysin A, H2-Proteinase, and Leucurolysin A

In a pairwise sequence alignment, BaP1 and Leucurolysin A exhibit the same type of amino acid side chain in 78% of all residues (Figure 62). Even more impressive is the fact that only 5% of all amino acid side chains possess different chemical properties. Compared to H2-Proteinase, the sequence identity of BaP1 is still rather high (53%), especially considering that the identity between the two inactive proteins is the same (53%) and that between the two active proteins is even lower (51%). Comparing all four SVMPs, the only major difference in sequence occurs at the loop region which directly follows the highly conserved active site. Nevertheless, it is not possible to correlate the amino acid composition directly with the activity behaviour (hemorrhagically active or inactive) of the proteins. There are, however, some remarkable similarities within the groups of active and inactive proteins. For example, the hemorrhagically active proteins (BaP1 and Acutolysin A) both present a Gly-Ser-Cys-Ser-Cys-Gly-Ala/Gly-Lys-Ser sequence (residues 154 to 162) in the part before the Met-turn, whereas the inactive proteins do not show any identical residues in this section besides the two conserved cysteinyl residues. This is a further hint that flexibility might play a role in distinguishing between active and inactive proteins, bearing in mind that these amino acid side chains are known to enable a broader main chain conformational flexibility (Anderson et al., 2005; Dahl et al., 2008). In the part after the Met-turn (residues 167 to 177) no direct correlation between sequence and activity can be observed. Either both active and inactive proteins possess the same amino acid side chain (positions 167, 169 to 172, and 174 to 177) or, as at position 173, the side chains do not share the same properties within the active (Val and Glu) and the inactive proteins (Lys and Gly). Therefore, and as mentioned above, sequence alignments can lead to the localization of the potential interaction area but do not explain in detail why some SVMPs are hemorrhagically active and some are not.

β1 α1 α2 3₁₀
BaP1         |ERFSPRYIELAVVADHGIFTKYNSNLNTIRTRVHEMLNTVNGFYRSVDVH
LeucurolysinA|EQFSPRYIELVVVADHGMFKKYNSNLNTIRKWVHEMLNTVNGFFRSMNVD
H2-Proteinase|QRFPQRYIELAIVVDHGMYKKYNQNSDKIKVRVHQMVNHINEMYRPLNIA
AcutolysinA  |STEFQRYMEIVIVVDHSMVKKYNGDSDSIKAWVYEMINTITESYSYLKID
             | . **:*:.:*.**.: .*** : :.*: *::*:* :. : :.:

β2 α3 β3
BaP1         |APLANLEVWSKQDLIKVQKDSSKTLKSFGEWRERDLLPRISHDHAQLLTA
LeucurolysinA|ASLVNLEVWSKKDLIKVEKDSSKTLTSFGEWRERDLLPRISHDHAQLLTV
H2-Proteinase|ISLNRLQIWSKKDLITVKSASNVTLESFGNWRETVLLKQQNNDCAHLLTA
AcutolysinA  |ISLSGLEIWSGKDLIDVEASAGNTLKSFGEWRAKDLIHRISHDNAQLLTA
             | .* *::** :*** *: :. ** ***:** *: : .:* *:***.

β4 β5 α4
BaP1         |VVFDGNTIGRAYTGGMCDPRHSVGVVRDHSKNNLWVAVTMAHELGHNLGI
LeucurolysinA|IFLDEETIGIAYTAGMCDLSQSVAVVMDHSKKNLRVAVTMAHELGHNLGM
H2-Proteinase|TNLNDNTIGLAYKKGMCNPKLSVGLVQDYSPNVFMVAVTMTHELGHNLGM
AcutolysinA  |TDFDGATIGLAYVASMCNPKRSVGVIQDHSSVNRLVAITLAHEMAHNLGV
             | :: *** ** .**: **.:: *:* **:*::**:.****:

α5
BaP1         |HHD-TGSCSCGAKSCIMASVLSKVLSYEFSDCSQNQYETYLTNHNPQCILNKP
LeucurolysinA|RHD-GNQCHCNAPSCIMADTLSKGLSFEFSDCSQNQYQTYLTKHNPQCILNKP
H2-Proteinase|EHDDKDKCKCEA--CIMSDVISDKPSKLFSDCSKNDYQTFLTKYNPQCILNAP
AcutolysinA  |SHD-EGSCSCGGKSCIMSPSISDETIKYFSDCSYIQCRDYISKENPPCILNKP
             | ** ..* * . ***: :*. ***** : . :::: ** **** *

Figure 62 Sequence alignment of the four SVMPs performed with ClustalW2 (Larkin et al., 2007). Residue numbering and secondary structure elements (light gray: α-helices; black: β-sheets; gray: 3₁₀-helix) correspond to those of BaP1 (2W15). Highly conserved residues of the active site and the Met-turn are depicted in white letters on black background. Residues of the highly dynamic loop area are shaded light gray. Pairwise alignments of the four proteins resulted in the following sequence identities: BaP1 and Leucurolysin A: 78%; BaP1 and H2-Proteinase: 53%; BaP1 and Acutolysin A: 51%; Leucurolysin A and H2-Proteinase: 53%; Leucurolysin A and Acutolysin A: 50%; H2-Proteinase and Acutolysin A: 51%. Consensus symbol explanation: (*) identical residues; (:) conserved residue substitution; (.) semi-conserved residue substitution.
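Sequence identities of this kind are obtained by counting identical residues in aligned columns. The Python sketch below illustrates the counting on the loop segment before the Met-turn (residues 154 to 162), copied from the alignment above. Taking the number of residues of the first sequence as the denominator is an assumption made here, chosen to mirror the "percent of all 202 positions" phrasing; ClustalW2 may normalize differently, and the function name is illustrative.

```python
def percent_identity(ref, other):
    """Percent of residues in ref that face an identical residue in other.

    ref and other are rows of the same alignment (equal length, '-' = gap).
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    ref_len = sum(1 for r in ref if r != "-")
    same = sum(1 for r, o in zip(ref, other) if r != "-" and r == o)
    return 100.0 * same / ref_len


# Residues 154 to 162 (loop before the Met-turn), copied from the alignment.
LOOP = {
    "BaP1": "GSCSCGAKS",
    "AcutolysinA": "GSCSCGGKS",
    "LeucurolysinA": "NQCHCNAPS",
}

if __name__ == "__main__":
    for name in ("AcutolysinA", "LeucurolysinA"):
        value = percent_identity(LOOP["BaP1"], LOOP[name])
        print(f"BaP1 vs. {name}: {value:.1f}%")
    # Whole-domain figure quoted in the text: 158 identical of 202 positions.
    print(f"{100.0 * 158 / 202:.0f}%")
```

In this segment the two hemorrhagic proteins differ at a single position, whereas BaP1 and Leucurolysin A share fewer than half of the residues.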

3.4.3 Insight gained by MD simulations of P-I SVMP structures

All MD simulations were performed by Hannes Wallnoefer, and the results presented in this section are mainly published in Wallnoefer et al. (2010) or still unpublished (H. G. Wallnoefer, personal communication). For the MD simulations, the X-ray structures of BaP1 (PDB code: 2W15), Acutolysin A (1BUD), and H2-Proteinase (1WNI) were extracted from the Protein Data Bank (Berman et al., 2000). The structure of Leucurolysin A is unreleased and was kindly provided by R. N. Ferreira and A. M. de Pimenta, Universidade Federal de Minas Gerais, Brazil (personal communication). These four proteins were chosen because their activity was determined consistently with the same experimental protocol (Takahashi and Ohsaka, 1970; Xu et al., 1981) (T. Escalante, H. G. Wallnoefer, personal communication). For the MD simulations, both BaP1 conformations of the 160-162 loop were used (Section 3.1.5). These two structures are treated as two different starting points and are referred to as BaP1 conformer 1 and conformer 2. This led to a total of three simulations for the hemorrhagically active proteins. The peptidomimetic inhibitor of the BaP1*inhibitor complex (Section 3.1.5) was retained for the simulations. The structure of Leucurolysin A contains several 1,2-dihydroxyethane molecules, subsequently abbreviated as ‘edo’. Accordingly, two simulations were set up here as well: one under the original conditions with these molecules and another one without them. This provides two independent starting points, which are referred to as ‘Leucurolysin A with edo’ and ‘Leucurolysin A without edo’. After equilibration, 30 ns of MD simulation were performed for all six complexes. Stable trajectories with mean backbone RMS deviations below 2.0 Å were generated for further analysis.

Both conformers of BaP1 undergo a conformational shift in two regions (Figure 63). One is located at residues 59 to 70 and the other at residues 153 to 163. While the former lies on the opposite side of the protein with respect to the active site, the latter is positioned directly beside the active site and represents the first loop before the Met-turn. Concerning the two starting points of the distinct BaP1 conformations, neither a conversion from one conformation into the other nor a common conformational end point is found. Both are rather stable during the complete simulation time.

The flexibility can be analyzed by comparing the mean B factor of each residue over the entire simulation time (Figure 64). Overall, the simulations show very low B factors, which means that the structures are well equilibrated and stay rigid during the simulations. As is generally the case, increased flexibility is observed at the N- and C-termini. In addition, the already mentioned region from residues 153 to 175 shows considerable flexibility. In some cases peaks are found with values ten-fold the mean B factor of all structures over the entire simulation time (38.9 Å²). Only at the three-residue motif of the Met-turn (residues 164 to 166) is no flexibility observable (Figure 65). Interestingly, the models representing hemorrhagic P-I SVMPs (BaP1 conformer 1, BaP1 conformer 2, Acutolysin A) can be distinguished from the models of non-hemorrhagic proteins (Leucurolysin A with edo, Leucurolysin A without edo, H2-Proteinase) by the level of flexibility. The former show high flexibility in the first part of the loop and considerable rigidity in the second part. The opposite is observed for the simulations of the non-hemorrhagic samples, that is, rigidity before the Met-turn and flexibility after the Met-turn (Figure 65).
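B factors can be derived from an MD trajectory through the standard isotropic relation B = (8π²/3)·⟨Δr²⟩, where ⟨Δr²⟩ is the mean square fluctuation of an atom around its average position. The Python sketch below applies this relation to a list of frames that are assumed to be already superposed onto a common reference. Whether the B factors in the figures were obtained exactly this way is not stated here, so the function is an illustration only.

```python
import math


def b_factors(frames):
    """Per-atom B factors (Å^2) from superposed trajectory frames.

    frames: list of frames, each a list of (x, y, z) tuples in Å.
    Uses B = (8 * pi^2 / 3) * <|r - <r>|^2> for each atom.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Average position of atom i over all frames.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean square fluctuation around that average position.
        msf = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(8.0 * math.pi ** 2 / 3.0 * msf)
    return result


if __name__ == "__main__":
    # Two hypothetical frames: atom 0 moves by +/- 1 Å along x, atom 1 is fixed.
    frames = [[(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
              [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
    print(b_factors(frames))
```

Averaging the per-atom values over the atoms of a residue gives a residue-wise profile of the kind plotted in Figures 64 and 65.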

Figure 63 Ribbon plot of two snapshots of the BaP1 conformer 1 structure during the MD simulation (Wallnoefer et al., 2010). Alignment of a conformation after 15 ns (green) and a snapshot after 25 ns (pale green). Both loops, Trp59 to Asp70 and Asp153 to Ile165, which undergo structural variations are highlighted.

Figure 64 Residue-wise B factor plot for all six structures over the entire simulation time (Wallnoefer et al., 2010). Overall, the structures remain very rigid, as reflected by low B factors. In most parts of the structures only small differences in the B factors are observable. Only the N- and C-terminus and, especially, the region from residues 153 to 175 show large differences in flexibility. Acutolysin A, blue line; BaP1 conformer 1, red; BaP1 conformer 2, yellow; H2-Proteinase, green; Leucurolysin A with edo, purple; Leucurolysin A without edo, light blue.

Figure 65 Detailed depiction of the B factor plot from Figure 64 showing the flexible region of residues 156 to 175 (Wallnoefer et al., 2010). While the flexibility over the entire structures is very similar, in this region a differentiation of hemorrhagic (red, yellow, blue) and non-hemorrhagic proteins (light blue, green, purple) is possible. Same color code as in Figure 64.

To assess the relative weight of the side chain and backbone contributions, the B factors were split into separate terms. This showed that the flexibility does not depend solely on the kind of amino acid residue but that the backbone contributes significantly to the dynamics. To investigate this further, the trajectories were clustered and representative structures were calculated. Each representative structure characterizes a group of similar conformations observed at different points in time (snapshots) of the simulations. The number of snapshots in each cluster reflects the population and the statistical importance of the corresponding structure during the simulation. Figure 66 shows the backbone RMS deviation plotted against simulation time. Mostly, the clusters are stable and well populated during the simulation. This means that they represent important conformational contributions to the structural behavior of the models. The motion of the backbone can be shown impressively by an alignment of the representative structures of each cluster (Figure 67). In the present case, the corresponding five different clusters are aligned onto each X-ray structure of the four examples. It becomes clear that the flexibility, as represented by the B factors, is not due to higher fluctuations of side chains but rather due to a slow flip of the backbone. This results in a fan-shaped conformational ensemble of the loops (Figure 67).

Figure 66 Backbone RMS deviation plots against simulation time (personal communication, H. G. Wallnoefer). The 1200 snapshots extracted from each MD simulation of Acutolysin A (A), BaP1 conformer 1 (B), BaP1 conformer 2 (C), H2-Proteinase (D), Leucurolysin A with edo (E), and Leucurolysin A without edo (F) are colored according to the cluster they belong to. The population of the clusters is indicated by color, from red (highest) through yellow, green, and blue to orange. Besides the considerable stability of the simulations, the plots illustrate that the chosen clusters are reasonable, as the five generated clusters represent a significant time span of the simulations.

Figure 67 Ribbon plots of representative structures of all six P-I SVMPs (Wallnoefer et al., 2010). Each protein is illustrated by representative structures of five clusters which were aligned onto the starting point of each simulation (the known X-ray structures). The flexible loop region (residues 153 to 175) is highlighted. In the case of the hemorrhagic P-I SVMP models (above), the first part of the loops shows significant conformational changes (right side). In contrast, models of non-hemorrhagic P-I SVMPs (below) show flexibility in the second part of the loops (left side).

To determine whether the dynamics are due to local conditions or occur specifically in a particular protein only, the corresponding loops (residues 151 to 175) of hemorrhagic Acutolysin A and non-hemorrhagic H2-Proteinase were interchanged. Subsequently, new MD simulations were calculated for the in silico-mutated SVMPs. To prevent external influences, only the structures were changed, while the original MD protocol was kept unchanged. As an indicator of flexibility, the B factor plots were analyzed, and the results show that the dynamic behavior was exchanged as well (Figure 68). H2-Proteinase now behaves as if it were a hemorrhagic P-I SVMP and Acutolysin A as if it were a non-hemorrhagic protein. In detail, the mutated loop of H2-Proteinase shows high flexibility in the first part. Correspondingly, Acutolysin A has its more flexible part after the Met-turn.

Figure 68 B factor plots of the MD simulations of Acutolysin A and H2-Proteinase with interchanged loops (personal communication, H. G. Wallnoefer). A, B factors for all residues of Acutolysin A (blue) and H2-Proteinase (green). Overall, the proteins show the same flexibility as in former simulations (Figure 64) with the exception of the in silico -mutated loop region (residues 151 to 179). B, detailed view of the flexible region (residues 156 to 175). Dashed lines indicate the B factor plots of the native proteins (Figure 64). The plots show inverted dynamic behavior of the mutated proteins.

3.5 Search for new BaP1 inhibiting compounds

Different in silico approaches (Section 2.7) were used to search for new BaP1 inhibiting compounds. As a basis, the high-resolution crystal structures of BaP1 complexed to the peptidomimetic hydroxamate derivative (Section 3.1.7) were used. Mainly, and because the other structures were virtually identical, the highest resolution model (PDB code: 2W15) was taken into account.

3.5.1 Pharmacophore modeling and virtual screening

Pharmacophore modeling and subsequent virtual screening of small molecule databases were performed to find new inhibitor scaffolds. Methods and strategies are described in Section 2.7.1 and Section 2.7.5.

Pharmacophore model development

A prerequisite for successful structure-based pharmacophore modeling is sufficient knowledge of the structure-activity relationship between the inhibitor and the target protein (Section 2.7.1). In the present work, the interactions between BaP1 and the peptidomimetic inhibitor have been discussed extensively (Section 3.1.7). Additionally, a literature search was carried out for further compounds known to inhibit the proteolytic activity of SVMPs. For this study, the following SVMP complex structures deposited in the PDB were used: the structures of the P-I SVMP Adamalysin II complexed with different peptidomimetics (PDB codes: 2AIG, 3AIG, 4AIG) (Cirilli et al., 1997) (Gomis-Ruth et al., 1998), the structures of FII (1YP1) (Lou et al., 2005) and of TM-3 complexed with several tripeptides (1KUK, 1KUG, 1KUI) (Huang et al., 2002), and those of Atrolysin C in complex with Batimastat (1DTH) (Botos et al., 1996) and with two other peptide-like substances (1HTD) (Zhang et al., 1994). In the case of the metalloproteinase domain of P-III SVMPs, the structures of VAP2 (2DW0, 2DW1, and 2DW2) (Takeda et al., 2007) and of RVV-X (2E3X) (Takeda et al., 2007) complexed with peptide-like inhibitors were analyzed. Because of the high similarity of the binding site, crystal structures of TACE*inhibitor complexes, either with peptide-like substances or with compounds with distinct scaffolds, were also included as a source of information. The following TACE PDB entries were used as peptidomimetic complexes: 1BKC (Maskos et al., 1998), 2DDF (Ingram et al., 2006), 2FV9 (Ingram et al., 2006), 3E8R (Mazzola, Jr. et al., 2008), and 3EDZ (Mazzola, Jr. et al., 2008). X-ray structures of small molecules with distinct scaffolds complexed to TACE were: 1ZXC (Levin et al., 2005), 2A8H (Levin et al., 2006), 2FV5 (Ingram et al., 2006), 2I47 (Condon et al., 2007), 3E8R (Mazzola, Jr. et al., 2008), 2OI0 (Rao et al., 2007), 3EDZ (Mazzola, Jr. et al., 2008), and 3EWJ (Guo et al., 2009). The mentioned complex structures were analyzed with LIGANDSCOUT (Wolber and Langer, 2005) with respect to their structure-activity relationships. In Figure 69, the interactions between a metalloproteinase and its ligand are exemplified by the interaction maps of two TACE*inhibitor complexes.

Figure 69 Structure-activity analysis of a complex between TACE and the peptidomimetic inhibitor INN (3EDZ) (A) and the small molecule inhibitor IH6 (1ZXC) (B) depicted with LIGANDSCOUT (Wolber and Langer, 2005). Blue cylinders, zinc-binding feature; yellow forms, hydrophobic features; red arrows, hydrogen bonding acceptor function; green arrows, hydrogen bonding donor function; blue circles/arrows, aromatic feature.

First of all, the quality of the structure models was checked in terms of resolution range, R factors, B factors, electron density maps, etc. (Section 2.6.1), and only the well-interpreted models were chosen. The most important selection criterion was visible electron density at the binding site, which represents the actual position of the inhibitor scaffold on an atomic level. The information derived from zinc ion coordination, side chain mobility, and the positions of inhibitor residues and side chains of the target proteins was used to differentiate between important and less important interaction features. This discrimination is essential because an initial pharmacophore model derived from X-ray structures contains too many interaction features and would therefore be too restrictive for subsequent virtual screening. In Figure 70 an initial pharmacophore model is exemplified.

Figure 70 Initial pharmacophore model derived from the BaP1*inhibitor complex (2W15). For clarification the inhibitor scaffold is also depicted. Blue cylinder, zinc-binding feature; yellow spheres, hydrophobic features; red arrows, hydrogen bonding acceptor function; green arrows, hydrogen bonding donor function; blue circles/arrows, aromatic feature; gray spheres, exclusion volume spheres.

For ligand-based pharmacophore modeling (Section 2.7.1), all known inhibitors of SVMPs, and also of TACE, were aligned and common features were extracted to find the most important interactions. For this purpose, the ligands of the aforementioned SVMP and TACE*inhibitor complex structures were taken into account. Furthermore, several small molecule inhibitors were retrieved from the literature. A selection of these structures is given in Figure 72 and in Figure 73. In the following, no distinction is made between pharmacophore models generated by structure-based or ligand-based strategies. With the information from both methods, four different pharmacophore models were developed. The models can be divided into one group (two models) derived from peptide-like inhibitor structures and another group (two models) generated from small molecules with distinct scaffolds. The models differ in the composition of their individual features (hydrophobic, aromatic, hydrogen bonding donor/acceptor, zinc-binding feature). Steric constraints for virtual screening were included by adding exclusion volume spheres to the models. The composition of the pharmacophore models in terms of interaction features is listed in Table 17, and the corresponding models are shown in Figure 71.

Table 17 Number of distinct features of the four models.

Feature                     #1   #2   #3   #4
Zinc-binding feature         1    0    1    1
Aromatic feature             0    1    0    0
Hydrophobic feature          2    1    1    2
H-bond acceptor feature      1    2    2    2
H-bond donor feature         0    1    1    1
Exclusion volume spheres    16   15   15   15

Figure 71 The four final pharmacophore models which were developed with either structure-based or ligand-based methods. Models #1 and #2 have been generated by the use of small molecule inhibitors without peptide-like scaffolds, whereas model #3 and #4 have been developed with peptidomimetic structures. Blue cylinder, zinc-binding feature; yellow spheres, hydrophobic features; red arrows, hydrogen bonding acceptor function; green arrows, hydrogen bonding donor function; blue circles/arrows, aromatic feature. Exclusion volume spheres are not shown for clearer arrangement.

Validation of the pharmacophore models

A common problem in generating test and training sets for the validation of pharmacophore models is the lack of structurally similar ligands showing no or poor inhibitory effects (Section 2.7.5). Therefore, the four pharmacophore models were evaluated with test and training sets which mainly contain known inhibitors of TACE and only to a minor degree of SVMPs. As the pharmacophore models were developed either from small molecules or from inhibitors with peptide-like scaffolds, their validation had to be done with different sets. The test and training set for the peptide-like models consisted of 47 substances derived from the literature and the original inhibitor of BaP1. The set for the other pharmacophore models was built from 67 known inhibitor structures which are not characterized by a peptide-like scaffold. Prior to validation, members of the test and training sets were clustered and, in case of high structural similarity, separated to generate diverse libraries. Afterwards, conformer generation was performed with CATALYST using the following settings: maximum number of conformers: 250; generation type: best quality; energy range: 20 kcal/mol above the lowest calculated energy.
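The effect of the two numerical conformer settings (at most 250 conformers, within 20 kcal/mol of the lowest energy) can be illustrated with a minimal sketch. This is not CATALYST's algorithm, which also handles conformational sampling and diversity; the function name and the example energies are hypothetical.

```python
def select_conformers(energies_kcal, max_conformers=250, energy_window=20.0):
    """Keep the lowest-energy conformers within an energy window.

    energies_kcal  : conformer energies in kcal/mol (arbitrary reference)
    max_conformers : upper limit on the number of conformers retained
    energy_window  : maximum energy above the lowest conformer, in kcal/mol
    """
    if not energies_kcal:
        return []
    e_min = min(energies_kcal)
    within_window = sorted(e for e in energies_kcal if e - e_min <= energy_window)
    return within_window[:max_conformers]

# Hypothetical energies: the conformer at 25.0 kcal/mol lies outside the window.
print(select_conformers([0.0, 5.0, 25.0, 19.9]))
```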

Figure 72 Structures with small molecule scaffolds derived from PDB entries either for pharmacophore modeling or validation of the models. 615, 3E8R (Mazzola, Jr. et al., 2008); 4NH, 2A8H (Levin et al., 2006); 440, 3B92; KGY, 2I47 (Condon et al., 2007); IH6, 1ZXC (Levin et al., 2005); 541, 2FV5 (Ingram et al., 2006). 1BKC (Maskos et al., 1998), 2DDF (Ingram et al., 2006), 2FV9 (Ingram et al., 2006), 3E8R (Mazzola, Jr. et al., 2008), and 3EDZ (Mazzola, Jr. et al., 2008).

Figure 73 Structures with peptide-like scaffolds derived from PDB entries either for pharmacophore modeling or validation of the models. 002, 2FV9 (Ingram et al., 2006); INN, 3EDZ (Mazzola, Jr. et al., 2008).

Refinement of the pharmacophore models was performed until the virtual screening of the test sets provided satisfactory results. During the final validation step of pharmacophore model #1, 77% of the queries were found, with 31% showing BestFit values > 1. Pharmacophore model #2 provided similar results with 64% hits but only 6% of them with BestFit values > 1. Both peptidomimetics-derived models (#3 and #4) showed even higher hit rates (89% and 100%) but a lower number of well-aligned compounds (26% and 32% with BestFit values > 1). Further validation was performed through screening of the WDI database (Section 2.7.5). All models showed a reasonable quantity of hits while screening this database (#1, 163 hits; #2, 511 hits; #3, 620 hits; #4, 431 hits). Hit quantities between 100 and 500 were regarded as indicating a reasonable specificity. Pharmacophore models with many interaction features are very specific and only produce a small number of hits in the WDI database. In contrast, pharmacophore models with few interaction features are less specific and produce a high number of hits. All entries of this database are former (or current) drugs or drug candidates for which information on their biological target proteins is available. In the present case, it was possible to take advantage of this fact, as the database comprises many examples of known metalloproteinase inhibitors; a pharmacophore model was considered well-refined only if it found the majority of them.

Virtual screening results

Virtual screening was performed as described before (Section 2.7.5). The first screening experiments were carried out with the National Cancer Institute (NCI) database. This is a free biomedical database which is commonly used as a first step in in silico drug discovery approaches to find new lead structures. Results of the screening process with the aforementioned pharmacophore models were promising and met reasonable quality criteria (number of hits and BestFit values) (Table 18).

Table 18 Screening results with the NCI database.

Data/model      #1     #2     #3     #4
Hits           348   1153    542   1652
BestFit > 4      -      0      4      6
BestFit > 3      2      3     34     45
BestFit > 2     13     52   n.d.     68
BestFit > 1     54    157   n.d.   n.d.

n.d., not determined.

As described before (Section 2.7.5), virtual screening with CATALYST is based on an alignment of the compound conformations to the pharmacophore model, and the only restriction concerning shape is imposed through the exclusion volume spheres of the pharmacophore model. To increase the weight of shape in the ranking, a subsequent ROCS search was performed (Section 2.7.5). The best 100 hits of each NCI database screening were subjected to a search against the shape of the original BaP1 inhibitor (query), and putatively inhibiting compounds were rationally chosen from the resulting hit lists.
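Shape-based ranking of this kind is commonly expressed through a shape Tanimoto coefficient computed from overlap volumes. The following sketch shows only this scoring formula and a ranking step; the overlap volumes and compound names are invented for illustration, and the way ROCS actually computes and optimizes the overlaps is not reproduced here.

```python
def shape_tanimoto(overlap_ab, self_overlap_a, self_overlap_b):
    """Shape Tanimoto: O_AB / (O_AA + O_BB - O_AB); 1.0 means identical shapes."""
    return overlap_ab / (self_overlap_a + self_overlap_b - overlap_ab)

def rank_by_shape(query_self_overlap, candidates):
    """Sort candidates by shape similarity to the query, best first.

    `candidates` maps a compound name to (overlap with query, self overlap).
    """
    scored = {
        name: shape_tanimoto(o_ab, query_self_overlap, o_bb)
        for name, (o_ab, o_bb) in candidates.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

# Hypothetical overlap volumes (in Å^3) for three screening hits.
hits = {"hit_A": (300.0, 420.0), "hit_B": (350.0, 410.0), "hit_C": (150.0, 380.0)}
for name, score in rank_by_shape(400.0, hits):
    print(name, round(score, 2))
```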

Compound testing of the proteolytic activity of BaP1

In total, 74 different compounds were chosen from the hit lists for subsequent in vitro testing of their inhibitory effects. Compounds were chosen such that a wide range of distinct scaffolds was covered (Table 32, Appendix 1.1). With the exception of 7 substances, all compounds were soluble in DMSO. For these 7 compounds, no suitable solvent that leaves the native state of the protein unaffected was found. In a first assessment, two concentrations of the putative inhibitors were analyzed (50 and 100 µM). Testing of the proteolytic activity was performed as described before (Section 2.3.3). Most of the substances showed only low or very moderate inhibitory effects, and the data were also ambiguous between the two concentrations (Table 32, Appendix 1.1). Nevertheless, the 23 compounds with the most promising effects were selected for further analyses and tested at a considerably higher concentration (2.5 mM). Again, the results were not promising, as the highest BaP1 inhibition obtained was 7.9% (compound 307442_187).

Further development of pharmacophore models and second virtual screening

The information about the known inactive compounds was included in the original test and training sets, and the pharmacophore models were subjected to another round of refinement. In total, 36 compounds without a known inhibitory effect against BaP1 were chosen to act as structurally related but inactive compounds. As a result, the validation with the test and training sets was of higher quality. Additionally, a set of 651 decoys was extracted from the WOMBAT database (Olah et al., 2007). This is a database of compounds with known biological activity which is commonly used for virtual screening enrichment studies such as the generation of decoys. Furthermore, all active structures of the test and training sets were subjected to ligand-to-protein docking for the structural analysis of their binding to BaP1. On the basis of this information and the already developed pharmacophore models, three new models were generated (data not shown) and validated with the renewed test and training sets. The final step of screening produced promising results, with a hit rate between 0 and 8% in the set of the inactive substances and between 7 and 31% in the set of the actives (Table 19).

Table 19 Number of hits during the second validation.

Library/model                    #5         #6         #7
Inactives (empirical data)     0 / 36     1 / 36     3 / 36
Actives (WOMBAT data)         43 / 651  203 / 651  173 / 651
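The hit-rate ranges quoted in the text follow directly from the counts in Table 19, as the short calculation below illustrates. The dictionary layout and the function name are assumptions made only for this sketch; the counts are taken from the table.

```python
def hit_rate(hits, library_size):
    """Percentage of library members retrieved by a pharmacophore model."""
    return 100.0 * hits / library_size

# Counts from Table 19 as (hits, library size) per model and library.
validation = {
    "#5": {"inactives": (0, 36), "actives": (43, 651)},
    "#6": {"inactives": (1, 36), "actives": (203, 651)},
    "#7": {"inactives": (3, 36), "actives": (173, 651)},
}

for model, libraries in validation.items():
    rates = {name: round(hit_rate(*counts), 1) for name, counts in libraries.items()}
    print(model, rates)
```

A model is more useful the larger the gap between the two percentages, that is, the more actives and the fewer inactives it retrieves.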

Again, the WDI database was screened, and the number of hits found was within acceptable limits (#5, 633 hits; #6, 261 hits; #7, 345 hits). The subsequent screening of the NCI database also provided promising results (#5, 525 hits; #6, 403 hits; #7, 375 hits). Interestingly, only a small number of the formerly chosen and tested NCI substances overlapped with the hits of the new attempt (#5, 2/74, 2.7%; #6, 3/74, 4.1%; #7, 1/74, 1.4%) due to slight geometric changes in the new models. This time, commercial databases such as ENAMINE, ASINEX, CHEMBRIDGE, CHEMDIVERSE etc. were also screened. The resulting hits were imported into LIGANDSCOUT and ranked according to the shape of the original inhibitor. Similarly to the ROCS search before, this step enhances the influence of molecular shape similarity on the selection process. To further validate the results, the 300 best hits were subjected to ligand-to-protein docking, with the best two poses each as output. Again, the selection criterion was visual inspection of the docking poses and not calculated docking scores (Warren et al., 2006). The most promising compounds were then chosen for subsequent in vitro testing.

Biological activity testing of the final hits

After evaluation of the docking results, the four structures with the most promising docking poses were selected. All of these hits originated from the ENAMINE database. The in vitro analysis of the chosen substances showed only low inhibitory effects on the proteolytic activity of BaP1 (Table 20). As before, the compounds were subsequently tested at a higher concentration (2.5 mM), which likewise did not lead to promising inhibitory effects (T5777832, 10.1%; T5870487, 9.5%; T5960832, 7.4%; T6049755, 17.8%).

Table 20 The four most promising hits of the ENAMINE database (chemical structures not shown).

Compound ID   Mol. weight [g/mol]   Rel. BaP1 activity [%]
                                    50 µM     100 µM
T5777832      358.4                  92.4      90.7
T5870487      397.5                  97.4      95.1
T5960832      369.3                  99.8      94.7
T6049755      329.3                 100.2      94.8

3.5.2 Calculation of binding affinities with MD simulations

As mentioned before (Section 2.7.3), binding affinities of putatively inhibiting compounds towards a structurally known target can be obtained from MD simulations with trajectory postprocessing free energy calculations. Again, absolute values, e.g. dissociation constants KD or half maximal inhibitory concentrations IC50, are not available or are only partly meaningful, but the comparison between similar systems provides reliable data. All MD simulations mentioned in this context were performed by Thomas Steinbrecher. With the data of MD simulations of the BaP1*inhibitor complex (time range 2 ns) and the MM-GBSA technique, a binding free energy of ΔG = -8.5 kcal/mol was obtained, which would correspond to a dissociation constant of KD = 581 nM. Using MM-PBSA calculations, the value of ligand binding was twice as large; in both cases most of the affinity seems to be afforded by hydrophobic contacts. Electrostatic interactions and desolvation expenses are large but cancel each other almost perfectly, so that they play only a minor role. The same system and calculations were used to explore the contribution of the S1' pocket to ligand binding. In view of the known hydrophobicity of this pocket in metzincins and the data on trapped water molecules in the pocket (Section 3.3.2), the complexed inhibitor was virtually modified at the P1' residue (i-butyl) (Figure 74). The side chain was either elongated in its aliphatic chain or new hydrophobic residues were introduced (Table 21). All modifications at the P1' residue resulted in lower binding free energies and thus indicate a higher affinity towards BaP1.
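The correspondence between the binding free energy and the dissociation constant quoted above follows from ΔG = RT·ln(KD), with KD relative to a 1 M standard state. The sketch below assumes a temperature of 298.15 K, which is not stated in the text; the function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_from_dg(dg_kcal, temperature_k=298.15):
    """Dissociation constant (in M) from a binding free energy (in kcal/mol)."""
    return math.exp(dg_kcal / (R_KCAL * temperature_k))

def dg_from_kd(kd_molar, temperature_k=298.15):
    """Binding free energy (in kcal/mol) from a dissociation constant (in M)."""
    return R_KCAL * temperature_k * math.log(kd_molar)

# -8.5 kcal/mol lands in the sub-micromolar range; the exact value depends
# on the assumed temperature.
print(f"KD = {kd_from_dg(-8.5) * 1e9:.0f} nM")
print(f"dG = {dg_from_kd(581e-9):.2f} kcal/mol")
```

Because of the exponential relation, a change of about 1.4 kcal/mol in ΔG at room temperature corresponds to a roughly tenfold change in KD, which is why relative rather than absolute values are emphasized.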

Figure 74 Schematic structure of the original inhibitor apart from the residue at P1' which is manipulated with different aliphatic groups (Table 21).

Table 21 Binding free energies ΔG and T·ΔS for five different virtually designed inhibitors in kcal/mol.

Data/ligand residue R    CC(C)C   CC(C)CC   CC(C)CCC    CPh   CC1C=CC=C1    CCPh
ΔGtotal (MM-GBSA)         -40.3     -46.3      -40.0  -42.0        -38.7   -45.8
ΔGtotal (MM-PBSA)         -33.0     -38.5      -36.3  -37.4        -35.4   -42.1
T·ΔS (rot./transl.)        24.4      24.5       24.7   24.6         24.6    24.8
ΔG_B                       -8.5     -14.0      -11.6  -12.8        -10.8   -17.3
ΔΔG_B                       0        -5.5       -3.1   -4.3         -2.3    -8.8
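The last two rows of Table 21 can be reproduced from the rows above them by simple arithmetic. The sketch below rests on an observation about the tabulated numbers, not on a statement in the text: within rounding, the binding free energy row equals the MM-PBSA total plus the rotational/translational entropy term, and the final row is that value relative to the original i-butyl residue. Variable and function names are illustrative.

```python
R_GROUPS = ["CC(C)C", "CC(C)CC", "CC(C)CCC", "CPh", "CC1C=CC=C1", "CCPh"]
DG_TOTAL_PBSA = [-33.0, -38.5, -36.3, -37.4, -35.4, -42.1]  # kcal/mol
T_DS = [24.4, 24.5, 24.7, 24.6, 24.6, 24.8]                 # kcal/mol
DG_B_TABLE = [-8.5, -14.0, -11.6, -12.8, -10.8, -17.3]      # tabulated
DDG_B_TABLE = [0.0, -5.5, -3.1, -4.3, -2.3, -8.8]           # tabulated

def binding_free_energy(dg_total, t_ds):
    """Entropy-corrected binding free energy (assumed: total plus entropy term)."""
    return dg_total + t_ds

def relative_to_first(values):
    """Values relative to the first entry (the original inhibitor)."""
    reference = values[0]
    return [value - reference for value in values]

for group, total, entropy in zip(R_GROUPS, DG_TOTAL_PBSA, T_DS):
    print(f"{group:12s} {binding_free_energy(total, entropy):6.1f}")
print([round(v, 1) for v in relative_to_first(DG_B_TABLE)])
```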

To verify the results, longer and more descriptive MD simulations were performed. Additionally, calculations with P1' residues shortened by one or two methyl groups were carried out to address the problem of false positive results. The basic settings of the simulations were consistent with the prior experiments, but the simulation length was extended to 10 ns. Binding free energies were not corrected for entropy contributions, so that the results are so-called binding scores rather than absolute binding energies. Invariably, the elongated residues showed more favorable binding and the shortened ones a slightly decreased affinity towards BaP1 (Figure 75). The most favorable binding free energies were obtained with either a 2-methyl-pentyl side chain or a toluene residue. In all cases, the improved binding energies result from augmented van der Waals interactions with the protein, whereas solvation terms play only a minor role.

Figure 75 Relative binding free energies according to MM-GBSA (red) and MM-PBSA (green) trajectory postprocessing free energy calculations of seven putative inhibitors. All ΔG values are given relative to the original inhibitor. Error bars indicate fluctuations of the results over different subsets of the trajectory. The functional groups are specified in Table 21.

Subsequently, a set of structurally similar and commercially available compounds was analyzed in silico with respect to their binding properties towards BaP1. Known metalloproteinase inhibitors (Tapi-2, Batimastat, Tosedostat, Actinonin) and small molecules of unknown activity against metalloproteinases (norvaline hydroxamate and lysine hydroxamate) were analyzed (Figure 76). These substances were chosen because their structures differ only slightly from that of the original inhibitor. Additional aims were to further elucidate the structure-activity relationship with respect to elongation at the P1' residue (Actinonin) and the importance of the hydroxamate group (lysine and norvaline hydroxamate) for metalloproteinase inhibitor binding. The results of the MM-PBSA and MM-GBSA calculations differ significantly from each other, so that a clear interpretation of the data is barely feasible (Figure 77). Nevertheless, norvaline and lysine hydroxamate can presumably be excluded from being significant inhibitors of BaP1. Furthermore, Actinonin and Tosedostat possibly show a lower or comparable activity, whereas Tapi-2 and Batimastat should turn out to be more potent inhibitors of BaP1. As some inhibitors may exist in different protonation states, MD simulations were conducted for all possible states, and the best result for ΔG was chosen in each case.

Figure 76 Schematic structures of the putative inhibitors of BaP1 used for affinity calculations. Tapi-2, Tosedostat, Actinonin, and Batimastat are known metalloproteinase inhibitors. The inhibiting effects of norvaline and lysine hydroxamate were not known. For reference, the structure of the original inhibitor is also depicted.

Figure 77 Relative binding free energies resulting from MM-GBSA (red) and MM-PBSA (green) trajectory postprocessing of MD simulations of 10 ns length. Error bars indicate fluctuations of the results over different subsets of the trajectory. Binding free energies are calculated relative to the original inhibitor (WT). Actinonin, Batimastat, and Tosedostat were analyzed in the presented form (Figure 76), whereas Tapi-2 and norvaline hydroxamate were analyzed in the neutral (neut.) and in the protonated form (prot.). Simulations of norvaline and lysine hydroxamate were performed in neutral and protonated forms.

To confirm the in silico data, in vitro BaP1 inhibition tests were set up with Batimastat, Tosedostat, the original inhibitor and norvaline hydroxamate. The experiments on the inhibitory effects on the proteolytic activity of BaP1 led to the following IC50 values: 6.6 µM (Batimastat), 12.9 µM (original inhibitor), and 19.2 µM (Tosedostat) (Figure 78). The IC50 value of norvaline hydroxamate could not be determined, as 95% of BaP1’s proteolytic activity was still observable at 2.5 mM (data not shown).

Figure 78 Inhibitory effect of the hydroxamate derivatives on the proteolytic activity of BaP1. Sigmoidal dose-response curves shown as semi-log plot of the inhibitor concentration against the relative inhibition of the proteolytic activity of BaP1. Activity was determined on the basis of unselective cleavage of azocasein. Statistical analysis and non-linear regression were performed with PRISM 5.0 (GraphPad Software, Inc.) and led to IC50 values of 6.6 µM (Batimastat), 12.9 µM (original inhibitor), and 19.2 µM (Tosedostat). The corresponding 95% confidence intervals were calculated as 6.2-6.9 µM (Batimastat), 12.4-13.3 µM (original inhibitor), and 18.6-19.9 µM (Tosedostat). Data are expressed as mean ± standard deviation of the relative inhibition.
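The shape of such sigmoidal dose-response curves can be sketched with a simple Hill-type model. The IC50 values are those reported above; the Hill slope of 1 and the limits of 0% and 100% inhibition are assumptions made for illustration, since the fitted curve parameters are not given here, and the function name is hypothetical.

```python
def relative_inhibition(conc_um, ic50_um, hill_slope=1.0):
    """Relative inhibition (in %) at a given inhibitor concentration (in µM).

    Assumes a curve running from 0% to 100% inhibition; by construction
    the inhibition is 50% when the concentration equals the IC50.
    """
    return 100.0 / (1.0 + (ic50_um / conc_um) ** hill_slope)

IC50_UM = {"Batimastat": 6.6, "original inhibitor": 12.9, "Tosedostat": 19.2}

# Predicted inhibition of each compound at a common concentration of 10 µM.
for name, ic50 in sorted(IC50_UM.items(), key=lambda item: item[1]):
    print(f"{name:20s} {relative_inhibition(10.0, ic50):5.1f} %")
```

On a semi-log plot, curves of this form have identical shape and are merely shifted along the concentration axis according to their IC50 values.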

4 Summary

Crystal structure of the BaP1*inhibitor complex

The first part of this dissertation concerned the structure determination of the complex of BaP1 from Bothrops asper with a peptidomimetic inhibitor. BaP1 is a P-I SVMP and one of the major enzymatic components of the venom of B. asper. It is responsible for the unselective cleavage of peptides and other proteins in the tissue of prey, which leads to severe systemic and local effects. The purification procedure of the native enzyme was modified, and crystals of the inhibitor complex were obtained under different crystallization conditions, either by cocrystallization or by crystal soaking. All crystals belonged to the same space group, and the structures were solved by molecular replacement at 1.46, 1.14, 1.08, and 1.05 Å resolution. The two subdomains of BaP1 fold into a single globular domain with the typical α/β-fold of SVMP metalloproteinase domains. The major N-terminal subdomain comprises four α-helices and a five-stranded β-sheet. Between the two subdomains, a typical proteinase binding pocket is found. Already after the first rigid body refinement, the electron density of the inhibitor lying in this substrate binding pocket was unambiguous. The peptidomimetic inhibitor is held in place by extensive interactions of distinct character. Indeed, the hydrogen bonding network between enzyme and inhibitor backbone resembles that of an antiparallel β-sheet. In addition, hydrophobic contacts and a rarely seen cation-π interaction play a considerable role. In total, four different subsites of the enzyme are involved in inhibitor binding. A comparison with the unliganded protein shows that inhibitor binding only affects the substrate binding pocket, while the rest of the protein is unperturbed. The binding cleft is significantly narrowed, as both pocket-flanking loops shift inward by about 3 Å. In all models, a considerable number of residues were found in double conformations, most of which are located at the protein surface. Nevertheless, an unexpected finding in all four high-quality structures was the double conformation of a complete loop region. It consists of four residues and is located directly before the invariant Met-turn.

Multiple sequence alignments of the metalloproteinase domain of SVMPs

Comprehensive multiple sequence alignments of the metalloproteinase domain of SVMPs were the goal of another part of this thesis. These studies were based on the SVMP sequence entries in the UniProtKB/SwissProt database. Additional data and greater statistical significance were gained by including sequence segments of SVMPs with unknown P-classes. Notably, high sequence identities are found between the metalloproteinase domains, not only within but also between the different P-classes of SVMPs. In particular, high invariance is observed around the zinc-binding region. Altogether, the previously known zinc-binding motif could be extended according to preferences of amino acid residues. Furthermore, directly after the second zinc-binding histidine residue, an invariant three-residue motif (147-NLG-149) is found in all 132 analyzed sequences. In the present work, this invariant motif is noted for the first time. Most of the interacting counterparts are also conserved amino acid residues, and their structural importance can no longer be denied. The first and the last of the three residues of the Met-turn are highly conserved. Only the central residue shows slight variance but has to be of hydrophobic character. The functional importance of the methionine residue is still widely discussed, but in the present work it could be shown that an invariant phenylalanine residue is positioned within interaction distance of this methionine residue. The minimum number of cysteinyl residues within the metalloproteinase domain is four. This means that the SVMP metalloproteinase domain contains at least two invariant disulfide bonds (Cys117-Cys197 and Cys159-Cys164). The majority of the sequences are characterized by a third one (Cys157-Cys181). In the case of two P-III SVMPs and one P-II SVMP, even eight cysteinyl residues are found. From the structure of the P-III SVMP RVV-X it is known that it forms four disulfide bonds, and in the P-II example these residues also seem to lie within interaction distance. Overall, P-I SVMPs were observed to contain either four, six or seven, P-II SVMPs either five, six, seven or eight, and P-III SVMPs either six, seven or eight cysteinyl residues. The three disulfide bonds are formed in all proteins containing six or more cysteinyl residues. Structural comparisons have revealed that there are three predominant positions for odd cysteinyl residues, which may form inter-domain disulfide bonds and thereby may influence the proteolytic processing of SVMPs, such as the release of disintegrins.

Proteolytic reaction mechanism of metzincins

Additionally, the proteolytic reaction mechanism of metzincins, especially its dependency on the pH value, was elucidated. Results of the X-ray structures and theoretical calculations of pKa values were combined to gain further insight. X-ray structures revealed that the zinc coordination geometry changes from four-coordinate tetrahedral to five-coordinate square pyramidal upon inhibitor binding, which was observed in all structures derived from crystals grown at different pH conditions (4.6, 6.5, 7.5, and 9.0). The high quality of the structures permits statements about the atomic interaction distances, especially because metal-ligand restraints were removed during the last step of refinement. It could be shown that the coordination distances between the histidine Nε2 atoms and the zinc ion do not vary between the distinct pH values. Similarly, the distance between the Oε1 atom of the glutamate residue and the hydroxamate substructure of the inhibitor is invariant. This interaction represents the catalytically essential interaction between the glutamate residue and the activated water molecule during the reaction mechanism of native metzincins.

Furthermore, MD simulations showed that the pKa value of the catalytic glutamate residue in the protein is decreased in comparison to the free amino acid. This is the case for SVMPs, but also for other metzincins, such as MMP-13 and TACE. It was also shown that displacement of a considerable number of bound water molecules is required for inhibitor binding. MD simulations indicated that the energetic expenses of the side chain rotamer changes seem to be compensated by the interaction network between the inhibitor and the protein. This is the reason why inhibitor binding is an energetically favored process.

Elucidation of the hemorrhagic mode of action of SVMPs

X-ray structures, sequence alignments, and MD simulations of hemorrhagic and non-hemorrhagic SVMPs were used to elucidate the hemorrhagic mode of action of SVMPs. In all four X-ray structures, a loop region located close to the active site was found in a double conformation. Crystallographic B factors indicate high flexibility in this part of the protein, although the succeeding Met-turn and the rest of the protein are highly rigid. Hemorrhagic activity depends on the proteolytic action of the metalloproteinase domain. Interestingly, certain P-I SVMPs show similar proteolytic activity towards the same substrates but differ widely in their potential to induce hemorrhage. This observation led to the assumption that differences must already exist within the metalloproteinase domain of SVMPs. The multiple sequence alignments discussed above were set up to address this question. Specific distinctions in the primary structures of the metalloproteinase domain that correlate with hemorrhagic activity could not be found, mainly because of the lack of hemorrhagic activity data suitable for reliable comparison. Many different ways to measure these effects are known, and most of them are not directly comparable. Consequently, only a small set of proteins was taken into account for further investigations (BaP1, Acutolysin A, BmooMPalpha-I, Leucurolysin A, and H2-Proteinase). Detailed sequence alignments showed that the differences are only marginal and that the corresponding residues mostly lie at the surface of the proteins. Notably, the majority of these residues are located around the active site, especially within the aforementioned loop region. Further alignment showed that hemorrhagic SVMPs contain a consensus motif in the first part of the loop. These ten residues are bordered by the zinc-binding motif on the N-terminal side and the Met-turn on the C-terminal side. The amino acid residues found at these positions in the hemorrhagic SVMPs are known to permit a high degree of backbone flexibility, which is another hint that flexibility might be the sought-after criterion. Ultimately, MD simulations indicated different flexibility patterns in hemorrhagic and non-hemorrhagic P-I SVMPs. While the former show enhanced flexibility in the first part of the loop and are rigid in the second part, the latter are flexible in the second and rigid in the first part. Even the conversion of an inactive into an active protein was possible in in silico experiments: the mutational exchange of the complete loop led to interchanged flexibilities in subsequent MD simulations.
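
The flexibility comparison described above is commonly quantified per residue as the root-mean-square fluctuation (RMSF) over the frames of an MD trajectory. The following is a minimal sketch, not the analysis pipeline used in this work. The function name, the toy coordinates, and the assumption that all frames have already been superposed on a common reference are illustrative only.

```python
import math

def rmsf_per_atom(frames):
    """Root-mean-square fluctuation of each atom around its mean position.

    frames: list of frames; each frame is a list of (x, y, z) tuples,
    one per atom (e.g. one C-alpha atom per loop residue).
    Assumes the frames were superposed beforehand (assumption of this sketch).
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for a in range(n_atoms):
        # Mean position of atom a over all frames
        mean = [sum(f[a][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared deviation from that mean position
        msd = sum(
            sum((f[a][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

# Hypothetical toy trajectory: atom 0 is rigid, atom 1 moves by +/- 1 Å along x
toy_frames = [
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
]
toy_rmsf = rmsf_per_atom(toy_frames)
```

Applied to the loop residues of a hemorrhagic and a non-hemorrhagic SVMP, such per-residue values would make the "flexible first part, rigid second part" pattern directly comparable between proteins.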

Search for new BaP1 inhibiting compounds

The last part of this dissertation dealt with the search for new BaP1 inhibiting compounds. Two different approaches were used to address this topic. Firstly, pharmacophore modeling and virtual screening were applied to generate new lead structures. Secondly, binding affinities were calculated theoretically via MD simulations to distinguish between important and less important inhibitor residues. In the former procedure, different well-validated pharmacophore models were established via either the structure-based or the ligand-based approach. They were based on known metalloproteinase*inhibitor complex structures or on known inhibiting compounds taken from the literature. Preferentially, this information was related to SVMPs, but data concerning ADAMs, such as TACE, were also taken into account. Several databases were screened with the validated models, which resulted in a selection of 74 NCI database screening hits. Subsequent biological testing revealed no significant inhibitory effects on the proteolytic activity. Including this information in further pharmacophore model development led to a more qualitative validation, and two different models were generated. The screening hits were additionally docked to the binding site of the BaP1 structure for rational selection. Compounds with the most promising docking poses were chosen for biological testing. However, all four substances showed only low inhibitory effects on the proteolytic activity of BaP1. Through MD simulations, the dissociation constant of the original inhibitor was calculated as KD = 581 nM, whereas in vitro testing led to an estimated IC50 = 12.9 µM. Furthermore, qualitative data could be gained by comparing the relative values of other compounds with the original inhibitor. In a first attempt, it could be shown that aliphatic elongation of the inhibitor's P1' residue leads to higher affinities, because the inhibitor can protrude deeper into the hydrophobic S1' pocket of the protein. Subsequently, a set of structurally similar and commercially available compounds was analyzed, comprising known metalloproteinase inhibitors (Batimastat, Tapi-2, Tosedostat, Actinonin) and small molecules with unknown activity against metalloproteinases (lysine hydroxamate and norvaline hydroxamate). Calculations indicated that only Batimastat and Tapi-2 should be more potent inhibitors. These results were confirmed by subsequent biological testing, which gave a lower IC50 value for Batimastat (6.6 µM), a higher IC50 value for Tosedostat (19.2 µM), and only slight activity for norvaline hydroxamate (5% inhibition at 2.5 mM).
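
To put the calculated KD and the measured IC50 on a common energy scale, a dissociation constant can be converted into a standard binding free energy via ΔG = RT ln(KD / 1 M). The sketch below is illustrative only. The temperature of 298.15 K is an assumption, and treating the IC50 as an apparent dissociation constant is a simplification, since an IC50 also depends on the assay conditions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol from a dissociation constant,
    using dG = R*T*ln(KD / 1 M). The default temperature is an assumption."""
    return R_KCAL * temperature_k * math.log(kd_molar)

dg_calc = binding_free_energy(581e-9)    # KD calculated via MD simulations
dg_ic50 = binding_free_energy(12.9e-6)   # IC50 treated as apparent KD (assumption)
gap = dg_ic50 - dg_calc                  # difference on the free energy scale
```

Under these assumptions, the calculated KD corresponds to roughly -8.5 kcal/mol, and the factor of about 22 between the two values corresponds to about 1.8 kcal/mol.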

5 Discussion

Metzincins play an important role in many diseases affecting humans (Edwards et al., 2008; Klein and Bischoff, 2010a/b; Murphy, 2008). However, effective and selective drugs against metalloproteinases are still rare. There is no neutralizing drug against the local pathological effects caused by snake bites, and in spite of massive metalloproteinase research, only a few inhibitors of human ADAMs and MMPs have passed clinical trials (Hu et al., 2007; Ramos and Selistre-de-Araujo, 2006). In the majority of cases, these small-molecule inhibitors contain a zinc-binding group, a peptide backbone or peptidomimetic scaffold, and suitable side chains that bind to the enzymatic subsites (Jacobsen et al., 2007). Because metalloproteinases often cleave a wide range of targets, they are quite flexible in binding peptides and often lack specificity (Sternlicht and Werb, 2001). Thus, developing highly selective metalloproteinase inhibitors remains one of the main challenges in the search for successful clinical candidates. In particular, three-dimensional structures of metalloproteinase*inhibitor complexes can provide crucial information on the structural determinants of substrate selectivity and form the basis for structure-based drug design approaches.

5.1 Crystal structure of the BaP1*inhibitor complex

The BaP1*inhibitor complex reported here is among the metalloproteinase crystal structures with the highest resolution obtained so far. As it is the first structure of BaP1 in complex with an inhibitor, it is an important step towards inhibitor design for SVMPs and other metalloproteinases. Despite the moderate IC50 value of the inhibitor, the high-resolution structures are excellent starting points for further optimization. The atomic detail offers insight into inhibitor binding and the structural determinants of proteolytic activity. As mentioned in Section 1.1.3, it is generally accepted that the conserved glutamate residue plays an important role by acting as general base in the reaction mechanism of metzincins (Gomis-Ruth, 2003). During the addition step, the catalytic water molecule is deprotonated, allowing its nucleophilic attack at the carbonyl carbon atom of the peptide bond (Lovejoy et al., 1994; Stocker and Bode, 1995). This yields a tetrahedral reaction intermediate and a protonated glutamate residue. During the elimination step, the proton is transferred to the N-terminus of the cleaved substrate. Presumably, the peptidomimetic inhibitor traps the proteolytic reaction of BaP1 in the transition state. This seems to be possible through an extensive interaction network between the inhibitor scaffold and the protein. These interactions are dominated by backbone/backbone hydrogen bonding, as is known for substrates bound to metzincins (Gomis-Ruth, 2003). However, side chain interactions of the inhibitor residues also contribute to the binding (Figure 46, page 94). The hydrophobic interaction of the t-butyl group (P2') and the cation-π interaction of the thiazole residue (P1) seem to match the preferences of the enzymatic subsites (S2' and S1, respectively) perfectly. The deep tunnel-like S1' subpocket of BaP1 suggests that the activity of an inhibitor might be improved by using a larger residue at the P1' position. Comparison of the inhibitor complex with the native protein structure has revealed that most of the metalloproteinase domain is rigidly kept in its typical globular shape, apart from the binding site, in which several changes occur (Figure 47, page 96). Interestingly, the affected side chains seem to be energetically flexible enough to adapt to different side chains of the corresponding substrate. This, in turn, seems to contribute to the broad substrate specificity known for SVMPs, MMPs, and other metalloproteinases. Another interesting change that takes place upon inhibitor binding is the movement of both sides of the binding site. The corresponding loops shift inward to hold the substrate in place while it is cleaved during the catalytic reaction. Like the claws of a crab, the protein seems to hold its substrate with the scissile bond at the active site for subsequent cleavage.

5.2 Proteolytic reaction mechanism of metzincins

In several metzincins, a rapid decrease of activity can be observed under acidic conditions (Fasciglione et al., 2000; Ramos and Selistre-de-Araujo, 2006; Vuotila et al., 2002; Xu et al., 2004; Zhu et al., 1999). It was proposed that the protonation state of the glutamate causes the pH dependency of the reaction mechanism of metzincins (Gomis-Ruth, 2003; Lovejoy et al., 1994; Stocker and Bode, 1995). In contrast, Xu et al. (2004) proposed that a widespread change in enzyme structure is responsible for the loss of proteolytic activity at low pH values. Although the protonation state of the glutamate residue cannot be determined by a simple X-ray experiment, one may speculate that it is protonated in the complex structures, because both of its oxygen atoms are found at hydrogen bonding distance to the hydroxamic group (Table 11, page 120). As the pKa value of the hydroxamic oxygen atoms is significantly lower than the pKa of the glutamic acid, both oxygen atoms may be deprotonated and, hence, the only possibility is the protonated form of the glutamic residue. It seems that the putative reaction is trapped by the inhibitor between the addition and the final elimination step (Figure 49, page 97). In the present case, an effect of the pH value on the acidic behavior of the glutamate residue, which would definitely occur for a free amino acid in solution, cannot be detected. No significant variations in the corresponding Glu143(Oε1)-Hyd1(OH) and Glu143(Oε2)-Hyd1(N) distances can be observed at different pH values (Table 11, page 120). This contrasts with other observations in unliganded SVMPs, for which a dependency of the interaction distance between Glu143(Oε1) and the oxygen atom of the active water molecule on the pH was proposed (Zhao et al., 2007). However, no evidence exists for such a change in the BaP1*inhibitor complexes. In contrast to proteolytic activity, inhibitor binding does not seem to depend on the pH. Additionally, the extensive interaction network between the inhibitor and the protein may contribute to and explain the invariant distances. The energetic benefit of these interactions could be in a similar range as the cost of keeping the atomic interaction distance fixed even if the ionic character of the glutamate changes. However, the distances between the zinc ion and the Nε2 atoms of the histidine residues cannot be correlated with the pH value either. Therefore, explaining the pH dependency by atomic distances is rather doubtful, but the energetic benefit may be high enough to account for unfavorable binding. Overall, conclusions concerning atomic distances should be regarded with caution, even though the estimated overall coordinate errors of the present high-resolution structures are significantly lower than those of previously available low-resolution structures (1.8-2.2 Å) (Zhu et al., 1999). Another proposed explanation is a correlation between pH conditions and widespread changes in enzyme structure. However, this does not apply to the BaP1*inhibitor complex structures either, as they are virtually identical. The high-resolution structures provide detailed information on atomic positions; the differences between the distinct pH conditions are marginal, and only single amino acid residues at the protein surface are affected. The entire protein core, the backbone, and the important residues and consensus motifs are positioned in the same way.
Indeed, the loss of activity upon a change of the pH value has to be correlated with an alteration of the ionic behavior of a participating component, and the glutamate residue as well as the zinc ion are essential elements. However, according to the results of the present work, atomic interaction distances do not seem to play this role. A more probable explanation is the one proposed by Wu et al. (2009) (Section 1.1.3). By MD simulations of the native structure of Acutolysin A, they showed that the distance between both oxygen atoms of the glutamate residue and the water molecule does not vary during the simulation. However, the distances between the hydrogen atoms and the oxygen atom differ. The distance for the hydrogen atom pointing towards the glutamate residue is significantly larger than the distance for the other hydrogen atom. This supports the proposal that water activation is involved in the reaction mechanism. On this basis, the pH dependency of the proteolytic activity may be explained by the probability of the water molecule being present as a hydroxide ion. This probability is considerably lower under acidic conditions than under basic conditions. The results of the present work support this hypothesis rather than the one mentioned before.
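
The pH argument above can be made concrete with the Henderson-Hasselbalch relation, which gives the fraction of an acid present in its deprotonated form at a given pH. The following minimal sketch evaluates it at the pH values of the four deposited crystal structures. Both pKa values are assumptions chosen for illustration (an approximate solution value for a free glutamate side chain and a hypothetical value for a zinc-bound water molecule); neither was determined in this work, and pKa values inside a protein can differ strongly from solution values.

```python
def fraction_deprotonated(ph, pka):
    """Henderson-Hasselbalch: fraction of an acid in its deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

# pH values of the crystallization conditions of the four deposited structures
crystal_ph = [4.6, 6.5, 7.5, 8.0]

# Assumed pKa values, for illustration only (not determined in this work)
PKA_FREE_GLU = 4.25   # approximate side-chain pKa of free glutamic acid
PKA_ZN_WATER = 7.0    # hypothetical pKa of a zinc-bound water molecule

for ph in crystal_ph:
    glu = fraction_deprotonated(ph, PKA_FREE_GLU)
    hydroxide = fraction_deprotonated(ph, PKA_ZN_WATER)
    print(f"pH {ph:.1f}: Glu deprotonated {glu:.2f}, hydroxide fraction {hydroxide:.3f}")
```

With any such pKa, the hydroxide fraction rises monotonically with pH, which is the qualitative behavior the hypothesis relies on.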

MD simulations have confirmed that the glutamate residue is probably protonated, at least in a considerable percentage of molecules in solution. This applies to the inhibitor complex, to the native protein structure of BaP1, and even to the human metalloproteinases MMP-13 and TACE. Therefore, the generally accepted reaction mechanism has to be reconsidered, because in a protonated state the glutamate residue would not (or only barely) be able to activate the water molecule for nucleophilic attack at the scissile bond. To clarify the actual importance of the glutamate residue, more specific investigations focusing on its ionic behavior are required. The displacement of a considerable number of water molecules from the binding site upon substrate/inhibitor binding should usually lead to an entropically favored process. This contribution and the comprehensive interaction network dominated by backbone/backbone interactions provide the energy required for the frequent changes of side chain rotamers. This energetic system is probably responsible for the broad specificity of SVMPs. The energetic gain of substrate/inhibitor binding is relatively high, and this gain can be invested in pronounced side chain rotamer flips and loop shifts. This, in turn, results in increased interaction between the protein and structurally different substrates.

5.3 Elucidation of the hemorrhagic mode of action of SVMPs

The main function of SVMPs is to hydrolyze a variety of extracellular matrix proteins, including those of the BM surrounding endothelial cells in the microvasculature (Bjarnason and Fox, 1994; Hati et al., 1999). Binding and hydrolysis of BM components is a key step in the pathogenesis of venom-induced hemorrhage (Gutierrez et al., 2005; Markland, 1998a). Earlier work has already reported that the hemorrhagic effects of SVMPs can be correlated with a loop region located between the consensus zinc-binding motif and the invariant Met-turn (Ramos and Selistre-de-Araujo, 2004; Watanabe et al., 2003). The unexpected finding that this loop region is present in a double conformation gives new insight into the structural prerequisites for the hemorrhagic activity of P-I SVMPs. It has to be noted, however, that the loop area is marginally touched by one of the crystal contacts, which is known to affect the flexible behavior of protein loops (Table 24, Appendix 1.1, page 183). The present high-resolution structures indicate that the first part of this loop region is considerably flexible in the BaP1 molecule. In fact, most of the amino acid residues between the zinc-binding motif and the equally invariant and very rigid Met-turn seem to be able to easily change the conformation of the complete loop. This observation gave rise to the idea that the hemorrhagic potential of SVMPs may be correlated with the structural behavior of this part of the protein. Possibly, substrate recognition during the hemorrhagic mode of action is realized by an ‘open-close’ mechanism at the protein-protein interface. This hypothesis is supported by the observed flexibility of the loop, which argues for its functional relevance. The catalytic center and other parts involved in proteolytic activity, like the conserved Met-turn, are not affected by the loop’s conformational change. Therefore, it can be assumed that the proteins are capable of varying their surface properties without influencing their proteolytic activity. This property would enable the proteins to dock to different kinds of substrates in the extracellular matrix without losing their functionality as peptide-cleaving enzymes. Interestingly, in a recently published work, the same loop region was described as very flexible in P-III SVMPs (Guan et al., 2010). However, the detailed mode of action and unambiguous proof of flexibility as the structural determinant of hemorrhagic activity are still missing. This flexible loop region may act as a so-called exosite and determine the binding selectivity of SVMPs towards certain substrates of the extracellular matrix. It was already shown for MMP-13 that considerable selectivity in inhibition can be achieved by inhibitors which interact neither with residues nor with the zinc ion of the protein’s active site but with a nearby loop region (Engel et al., 2005; Gooljarsingh et al., 2008). This loop region corresponds structurally to the above-mentioned flexible loop of SVMPs. The hypothesis can only be confirmed by inhibitor design studies focused on the interaction with this loop region of SVMPs. At least, MD simulations have shown distinct flexible behavior of the first and the second loop within the corresponding region. Interestingly, the flexibility does not seem to be restricted to side chain atoms but is observed for the backbone of the complete loops. During the MD simulations, a so-called conformational ensemble is observed, which gives rise to the assumption that hemorrhagic activity in SVMPs is controlled by multispecificity.
In contrast to the historic ‘lock and key’ model of Fischer (1894), the binding process does not seem to be a static insertion of a low-energy conformation of the ligand into a rigid binding site but is characterized by a large number of coexisting loop conformations, from which the ligand selects the most favorable one. This binding mechanism is known as ‘conformational selection’ (Ma et al., 1999; Tsai and Nussinov, 1997a/b). Accordingly, multispecificity is characterized by a rugged energy hypersurface which provides several minima, and each ligand chooses its favorable counterpart from the conformational ensemble prior to binding. In the present case, this means that SVMPs bind to different protein components of the basement membrane at the protein-protein interface and subsequently cleave them through the intrinsic peptide-cleaving activity of the metalloproteinase domain. Further studies, especially in vivo experiments, are needed to finally prove this concept, not only because more detailed insight into the structural requirements for the hemorrhagic potential of snake venoms will be helpful for the local therapy of snake venom poisonings, but also to reach a better understanding of the protein-protein interactions of other metzincins.
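
The picture of a conformational ensemble on a rugged energy surface can be illustrated with Boltzmann statistics: each minimum is populated according to its relative free energy, so several loop conformations coexist and are available for selection by a ligand. The sketch below is purely illustrative. The three conformer names and their relative free energies are hypothetical and are not results of the MD simulations discussed here.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def boltzmann_populations(free_energies_kcal, temperature_k=298.15):
    """Equilibrium populations of conformers from relative free energies."""
    rt = R_KCAL * temperature_k
    lowest = min(free_energies_kcal)
    # Boltzmann weight of each conformer relative to the lowest minimum
    weights = [math.exp(-(g - lowest) / rt) for g in free_energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical relative free energies (kcal/mol) of three loop conformations
loop_conformers = {"open": 0.0, "intermediate": 0.5, "closed": 1.0}
populations = dict(
    zip(loop_conformers, boltzmann_populations(list(loop_conformers.values())))
)
```

When the minima differ by only about 1 kcal/mol, as assumed here, all conformations remain appreciably populated at room temperature, which is the situation the conformational selection model presupposes.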

5.4 Multiple sequence alignments of the metalloproteinase domain of SVMPs

Since the first sequences of metalloproteinases became known, efforts have been made to find common or distinct features in their primary structures. Thus, investigations of zinc-dependent metalloendopeptidase sequences rapidly succeeded in recognizing the sequence distinctions between zincins and inverzincins (Hooper, 1994). Subsequently, these enzyme groups were divided into the subfamilies gluzincins, aspzincins, and metzincins according to characteristic features, such as the third zinc-coordinating residue or the invariant Met-turn (Bode et al., 1993; Stocker et al., 1995; Stocker and Bode, 1995). The metzincins can be differentiated from the other zincins by the three-residue motif which lies about 20 amino acid residues downstream of the first zinc-coordinating histidine residue. With the appearance of the first primary structure data of SVMPs in the late 1980s, it became apparent that these proteins belong to the metzincins (Shannon et al., 1989; Takeya et al., 1989). Since then, many broad analyses of primary structures have been performed (Fox and Serrano, 2005; 2008; 2009; Ramos and Selistre-de-Araujo, 2004; 2006). It is now known that the metalloproteinase domain of SVMPs is characterized by the following 12-residue consensus sequence: HEbxHxbGbxHD (one-letter amino acid code; x, any amino acid residue; b, bulky hydrophobic residue) (Ramos and Selistre-de-Araujo, 2006). All SVMP sequence alignment studies performed so far encompass fewer examples than the present study, in which 132 sequences or sequence segments were analyzed. The aforementioned consensus zinc-binding motif could be extended towards the N-terminus by six further residues. In total, 18 residues are now known to be invariant around the active site. In addition, the majority of the counterparts of these residues seem to be invariant in sequence and structure.
The functional importance of these residues is now clear, as they are essential for the structural arrangement of the complete binding site. Combined with the aforementioned finding of high adaptability in the binding pocket, it seems that nature allows significant flexibility for substrate binding. Nevertheless, the regions responsible for catalytic activity show high rigidity. Interestingly, the areas of high flexibility and high rigidity are located close to each other. The residues which seem to contribute to the structural arrangement of the active site should be included in future studies to confirm their importance. In another attempt, characteristic features in the primary structure of the metalloproteinase domains were sought that might be used to identify the corresponding P-class even for sequences or sequence segments of unknown SVMPs. Unfortunately, no such features were detectable. The only way to differentiate between the P-classes of SVMPs remains the presence or absence of the corresponding domains. This finding also underlines the high similarity within the metalloproteinase domain of SVMPs, independent of the P-class. The disulfide pattern of the metalloproteinase domains of SVMPs has been analyzed and discussed many times. In particular, Fox and Serrano (2005; 2008) contributed many interesting considerations on this topic. Nevertheless, apart from the intrinsic structural stabilization within the protein domain, the further role of the cysteinyl residues is still unknown. However, it was proposed that the disulfide arrangement influences the feasibility of proteolysis of the SVMP chain and thereby affects the release of additional domains. The absolute number of cysteinyl residues in the metalloproteinase domain varies in all former sequence alignment approaches. For example, P-II and P-III metalloproteinase sequences with eight cysteinyl residues had not yet been included in these studies. The observation of two P-II proteins containing only five cysteinyl residues raises the question of their actual function, as both missing residues belong to the disulfide bonds which are otherwise conserved in all SVMPs (Cys117-Cys197 and Cys159-Cys164). Further structural investigations and activity studies have to be performed to ultimately decide whether disulfide bonding is essential for common proteinase activity and/or the structural arrangement of the metalloproteinase domain. Presumably, P-I and P-II/P-III SVMPs can be differentiated by the number of cysteinyl residues and the sequence of the metalloproteinase domain alone, at least when either four or five cysteinyl residues are present, as only 7% of P-II and none of the P-III SVMP metalloproteinase sequences possess fewer than six cysteinyl residues. With the aid of the comprehensive sequence alignments, a considerable number of additionally conserved residues at the active site were found.
In addition, the variation of cysteinyl residues was investigated in more detail. As mentioned before, ‘disulfide bond engineering’ through evolution may explain the different numbers of intact disulfide bonds and of free cysteinyl residues. The actual influence of free cysteinyl residues on the biological function of SVMPs has to be investigated in more detail. However, a first hint is given that additional domains, such as the disintegrin domain, may only be released under certain circumstances. Thus, at least three predominant positions have been found at which inter-domain disulfide bonds are easily possible. Further studies have to be performed to resolve the uncertainty about the role of the cysteinyl residues in the metalloproteinase domain of SVMPs. Notably, the additional domains (disintegrin, cysteine-rich, and C-type lectin-like) have to be included in these sequence and structural investigations.
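
The two sequence features discussed in this section, the 12-residue consensus motif HEbxHxbGbxHD and the number of cysteinyl residues, can both be checked with a few lines of code. The following is a minimal sketch and not the alignment procedure used in this work. The set of residues counted as "bulky hydrophobic" and the example fragment are assumptions made for illustration.

```python
import re

# Assumption of this sketch: residues treated as "bulky hydrophobic" (b)
BULKY_HYDROPHOBIC = "FILMVWY"

# HEbxHxbGbxHD written as a regular expression (x = any residue)
ZINC_MOTIF = re.compile(
    "HE[{b}].H.[{b}]G[{b}].HD".format(b=BULKY_HYDROPHOBIC)
)

def find_zinc_motif(sequence):
    """Return (start_index, matched_segment) for each motif occurrence."""
    return [(m.start(), m.group()) for m in ZINC_MOTIF.finditer(sequence.upper())]

def count_cysteines(sequence):
    """Number of cysteinyl residues in a one-letter amino acid sequence."""
    return sequence.upper().count("C")

# Hypothetical sequence fragment for illustration, not taken from the alignment
fragment = "TCAHELGHNLGMEHDGKDCLRG"
hits = find_zinc_motif(fragment)
n_cys = count_cysteines(fragment)
```

For this fragment, the motif is found once, starting at zero-based index 3, and two cysteinyl residues are counted. A cysteine count obtained this way is only a first filter for the P-class consideration above, since it says nothing about which residues actually form disulfide bonds.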

5.5 Search for new BaP1 inhibiting compounds

As described before, the use of in silico approaches in drug discovery is nowadays a well-accepted and frequently used procedure (Drews, 2000; Jorgensen, 2004). In particular, pharmacophore modeling and virtual screening have evolved into important and successful alternatives to biological high-throughput screening of potential inhibitors (Koppen, 2009; Leach et al., 2010). Although the development and validation of the pharmacophore models led to promising results, none of the actual screening hits showed significant inhibitory effects on the proteolytic activity of BaP1. Likewise, the second round of screening did not lead to a potent inhibiting compound. One reason for this may be the inclusion of structural data from other metalloproteinases (TACE) in the development and validation process. However, the similarity of the active sites has been shown, and important features of the models were chosen according to their interaction potential towards the binding pocket of BaP1. Besides, the original inhibitor was developed as a TACE inhibitor, and its potential to inhibit BaP1 was found only subsequently. This suggests that the inhibitory mechanism of BaP1 may be very similar to that of TACE. Nevertheless, at least the higher-concentration test series should have shown a correlation between concentration and inhibitory effect. To rule out uncertainties caused by impure, putatively inhibiting compounds, a further screening should be performed with a database of high-quality substances. By MD simulations, the affinity of the original inhibitor was calculated in the form of the dissociation constant KD. Although the calculated KD value differed from the half-maximal inhibitory concentration obtained by in vitro testing, this approach represents an excellent starting point for further calculations relative to the original inhibitor.
Thus, it could be shown that an elongation of the P1' residue of the inhibitor would enhance the affinity towards BaP1. This finding may be used in further studies concerning the specificity of BaP1 inhibitors, as the hydrophobic S1' pocket varies in width and length between different metalloproteinases. To validate the theoretical calculation of binding affinities, a set of known metalloproteinase inhibitors was also subjected to biological testing. In all examples, the activity predicted relative to the original inhibitor was observed. Therefore, the method can be a suitable tool for the evaluation of unknown, putatively inhibiting compounds. In a further approach, an attempt was made to estimate the actual contribution of the zinc-binding group to the overall affinity of the inhibitor. In the case of the original inhibitor, this is a hydroxamic substructure; such scaffolds are currently actively discussed in metalloproteinase inhibitor design. Beyond the early inhibitor design approaches, many different zinc-binding groups have been inspected (Jacobsen et al., 2007; Puerta et al., 2004; Puerta et al., 2006). It was proposed that hydroxamic scaffolds show an excessive affinity towards all kinds of zinc-dependent proteins and that, consequently, the toxicity of such inhibitors is considerable. To address this question, norvaline hydroxamate, which contains the corresponding substructure but not the peptide-like scaffold, was biologically tested. Interestingly, it showed only very low affinity for the binding pocket of BaP1. This finding contrasts with the common opinion that the inhibitory effects of hydroxamate derivatives on zinc-dependent metalloproteinases are dominated by zinc complexation. In the present case, the specific interactions of the inhibitor residues with the binding site seem to account significantly for the affinity. On the basis of this result and with the aid of further studies, metalloproteinase inhibitor design approaches may lead to a reconsideration of the contribution of the zinc-binding group, and novel scaffolds of metalloproteinase inhibitors may be discovered.

6 Bibliography

Akao, P.K., Tonoli, C.C., Navarro, M.S., Cintra, A.C., Neto, J.R., Arni, R.K., and Murakami, M.T. (2010). Structural studies of BmooMPalpha-I, a non-hemorrhagic metalloproteinase from Bothrops moojeni venom. Toxicon 55, 361-368.

Alberts, I.L., Nadassy, K., and Wodak, S.J. (1998). Analysis of zinc binding sites in protein crystal structures. Protein Sci 7, 1700-1716.

Anderson, R.J., Weng, Z., Campbell, R.K., and Jiang, X. (2005). Main-chain conformational tendencies of amino acids. Proteins 60, 679-689.

Apte, S.S. (2009). A disintegrin-like and metalloprotease (reprolysin-type) with thrombospondin type 1 motif (ADAMTS) superfamily: functions and mechanisms. J Biol Chem 284, 31493-31497.

Apweiler, R., Bairoch, A., and Wu, C.H. (2004). Protein sequence databases. Curr Opin Chem Biol 8, 76-80.

Arndt, U.W., Champness, J.N., Phizackerly, R.P., and Wonacott, A.J. (1973). A single-crystal oscillation camera for large unit cells. J Appl Cryst 6, 463.

Assakura, M.T., Silva, C.A., Mentele, R., Camargo, A.C., and Serrano, S.M. (2003). Molecular cloning and expression of structural domains of bothropasin, a P-III metalloproteinase from the venom of Bothrops jararaca. Toxicon 41, 217-227.

Auld, D.S. (2001). Zinc coordination sphere in biochemical zinc sites. Biometals 14, 271-313.

Benner, S.E., Chothia, C., and Hubbard, T.J. (1997). Population statistics of protein structures: Lessons from structural classifications. Curr Opin Struct Biol 7, 369-376.

Bergfors, T.M. (1999). Protein crystallization: Techniques, strategies, and tips, (International University Line: La Jolla).

Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. (2000). The Protein Data Bank. Nucleic Acids Res 28, 235-242.

Biertumpfel, C., Basquin, J., Suck, D., and Sauter, C. (2002). Crystallization of biological macromolecules using agarose gel. Acta Crystallogr D Biol Crystallogr 58, 1657-1659.

Bilgrami, S., Tomar, S., Yadav, S., Kaur, P., Kumar, J., Jabeen, T., Sharma, S., and Singh, T.P. (2004). Crystal structure of schistatin, a disintegrin homodimer from saw-scaled viper (Echis carinatus) at 2.5 Å resolution. J Mol Biol 341, 829-837.

Bjarnason, J.B., and Fox, J.W. (1983). Proteolytic Specificity and Cobalt Exchange of Hemorrhagic Toxin-E, A Zinc Protease Isolated from the Venom of the Western Diamondback Rattlesnake (Crotalus-Atrox). Biochemistry 22, 3770-3778.

Bjarnason, J.B., and Fox, J.W. (1994). Hemorrhagic Metalloproteinases from Snake-Venoms. Pharmacol Therapeut 62, 325-372.

Bode, W., Gomis-Ruth, F.X., and Stocker, W. (1993). Astacins, serralysins, snake venom and matrix metalloproteinases exhibit identical zinc-binding environments (HEXXHXXGXXH and Met-turn) and topologies and should be grouped into a common family, the 'metzincins'. FEBS Lett 331, 134-140.

Bode, W., and Maskos, K. (2001). Structural studies on MMPs and TIMPs. Methods Mol Biol 151 , 45-77.

Botos, I., Scapozza, L., Zhang, D.C., Liotta, L.A., and Meyer, E.F. (1996). Batimastat, a potent matrix metalloproteinase inhibitor, exhibits an unexpected mode of binding. PNAS 93 , 2749-2754.

Brooijmans, N., and Kuntz, I.D. (2003). Molecular recognition and docking algorithms. Annu Rev Biophys Biomol Struct 32 , 335-373.

Brown, C.K., Madauss, K., Lian, W., Beck, M.R., Tolbert, W.D., and Rodgers, D.W. (2001). Structure of neurolysin reveals a deep channel that limits substrate access. PNAS 98 , 3127-3132.

Brunger, A.T. (1993). Assessment of phase accuracy by cross validation: the free R value. Methods and applications. Acta Crystallogr D Biol Crystallogr 49 , 24-36.

Brunger, A.T., Adams, P.D., Clore, G.M., DeLano, W.L., Gros, P., Grosse-Kunstleve, R.W., Jiang, J.S., Kuszewski, J., Nilges, M., Pannu, N.S., Read, R.J., Rice, L.M., Simonson, T., and Warren, G.L. (1998). Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr D Biol Crystallogr 54 , 905-921.

Calvete, J.J., Moreno-Murciano, M.P., Theakston, R.D., Kisiel, D.G., and Marcinkiewicz, C. (2003). Snake venom disintegrins: novel dimeric disintegrins and structural diversification by disulphide bond engineering. Biochem J 372 , 725-734.

Calvete, J.J. (2005). Structure-function correlations of snake venom disintegrins. Curr Pharm Design 11 , 829-835.

Calvete, J.J., Escolano, J., and Sanz, L. (2007). Snake venomics of Bitis species reveals large intragenus venom toxin composition variation: Application to taxonomy of congeneric taxa. J Proteom Res 6, 2732-2745.

Carter, C.W., Jr., and Carter, C.W. (1979). Protein crystallization using incomplete factorial experiments. J Biol Chem 254 , 12219-12223.

CCP4 (1994). The CCP4 suite: programs for protein crystallography. Acta Crystallogr D Biol Crystallogr 50 , 760-763.

Chayen, N.E., and Saridakis, E. (2008). Protein crystallization: from purified protein to diffraction-quality crystal. Nat Methods 5, 147-153.

Chen, I.J., and Foloppe, N. (2008). Conformational sampling of druglike molecules with MOE and catalyst: implications for pharmacophore modeling and virtual screening. J Chem Inf Model 48 , 1773-1791.

Chen, X., Plasencia, C., Hou, Y., and Neamati, N. (2005). Synthesis and biological evaluation of dimeric RGD peptide-paclitaxel conjugate as a model for integrin-targeted drug delivery. J Med Chem 48 , 1098-1106.

Chippaux, J.P. (2008). Estimating the Global Burden of Snakebite Can Help To Improve Management. Plos Medicine 5, 1538-1539.

Chung, M.C.M., Ponnudurai, G., Kataoka, M., Shimizu, S., and Tan, N.H. (1996). Structural studies of a major hemorrhagin (rhodostoxin) from the venom of Calloselasma rhodostoma (Malayan pit viper). Arch Biochem Biophys 325 , 199-208.

Cirilli, M., Gallina, C., Gavuzzo, E., Giordano, C., Gomis-Ruth, F.X., Gorini, B., Kress, L.F., Mazza, F., Paradisi, M.P., Pochetti, G., and Politi, V. (1997). 2 angstrom X-ray structure of adamalysin II complexed with a peptide phosphonate inhibitor adopting a retro-binding mode. FEBS Lett 418 , 319-322.

Cominetti, M.R., Ribeiro, J.U., Fox, J.W., and Selistre-de-Araujo, H.S. (2003). BaG, a new dimeric metalloproteinase/disintegrin from the Bothrops alternatus snake venom that interacts with alpha5beta1 integrin. Arch Biochem Biophys 416 , 171-179.

Condon, J.S., Joseph-McCarthy, D., Levin, J.I., Lombart, H.G., Lovering, F.E., Sun, L.H., Wang, W.H., Xu, W.X., and Zhang, Y.H. (2007). Identification of potent and selective TACE inhibitors via the S1 pocket. Bioorg Med Chem Lett 17 , 34-39.

Cudney, R., Patel, S., Weisgraber, K., Newhouse, Y., and McPherson, A. (1994). Screening and optimization strategies for macromolecular crystal growth. Acta Crystallogr D Biol Crystallogr 50 , 414-423.

D'Arcy, A., Mac, S.A., and Haber, A. (2003). Using natural seeding material to generate nucleation in protein crystallization experiments. Acta Crystallogr D Biol Crystallogr 59 , 1343-1346.

Dahl, D.B., Bohannan, Z., Mo, Q., Vannucci, M., and Tsai, J. (2008). Assessing side-chain perturbations of the protein backbone: a knowledge-based classification of residue Ramachandran space. J Mol Biol 378 , 749-758.

Dauter, Z. (1998). Data collection strategies. Methods Enzymol 276 , 326-343.

Debye, P. (1914). Interferenz von Röntgenstrahlen und Wärmebewegung. Annal Phys 43 , 49-95.

Derewenda, Z.S. (2004). Rational protein crystallization by mutational surface engineering. Structure 12 , 529-535.

Diederichs, K., and Karplus, P.A. (1997). Improved R-factors for diffraction data analysis in macromolecular crystallography. Nat Struct Biol 4, 269-275.

Diederichs, K. (2006). Some aspects of quantitative analysis and correction of radiation damage. Acta Crystallogr D Biol Crystallogr 62 , 96-101.

Do, C.B., and Katoh, K. (2008). Protein multiple sequence alignment. Methods Mol Biol 484 , 379-413.

Dodson, E.J., Davies, G.J., Lamzin, V.S., Murshudov, G.N., and Wilson, K.S. (1998). Validation tools: can they indicate the information content of macromolecular crystal structures? Structure 6, 685-690.

Drews, J. (2000). Drug discovery: a historical perspective. Science 287 , 1960-1964.

Edwards, B.S., Bologa, C., Young, S.M., Balakin, K.V., Prossnitz, E.R., Savchuck, N.P., Sklar, L.A., and Oprea, T.I. (2005). Integration of virtual screening with high-throughput flow cytometry to identify novel small molecule formylpeptide receptor antagonists. Mol Pharmacol 68 , 1301-1310.

Edwards, D.R., Handsley, M.M., and Pennington, C.J. (2008). The ADAM metalloproteinases. Mol Aspects Med 29 , 258-289.

Egeblad, M., and Werb, Z. (2002). New functions for the matrix metalloproteinases in cancer progression. Nat Rev Cancer 2, 161-174.

Emsley, P., and Cowtan, K. (2004). Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr 60 , 2126-2132.

Engel, C.K., Pirard, B., Schimanski, S., Kirsch, R., Habermann, J., Klingler, O., Schlotte, V., Weithmann, K.U., and Wendt, K.U. (2005). Structural basis for the highly selective inhibition of MMP-13. Chem Biol 12 , 181-189.

Engh, R., and Huber, R. (1991). Accurate bond and angle parameters for X-ray protein structure refinement. Acta Crystallogr A 47 , 392-400.

Evans, P. (2006). Scaling and assessment of data quality. Acta Crystallogr D Biol Crystallogr 62 , 72-82.

Ewald, P.P. (1921). Das 'reziproke' Gitter in der Strukturtheorie. Z Kristallogr 56 , 129-156.

Fabiola, F., Korostelev, A., and Chapman, M.S. (2006). Bias in cross-validated free R factors: mitigation of the effects of non-crystallographic symmetry. Acta Crystallogr D Biol Crystallogr 62 , 227-238.

Fasciglione, G.F., Marini, S., D'Alessio, S., Politi, V., and Coletta, M. (2000). pH- and temperature-dependence of functional modulation in metalloproteinases. A comparison between neutrophil collagenase and gelatinases A and B. Biophys J 79 , 2138-2149.

Fenn, T.D., Ringe, D., and Petsko, G.A. (2003). POVScript+: A program for model and data visualization using persistence of vision ray-tracing. J Appl Cryst 36 , 944-947.

Ferreira, R.N., Rates, B., Richardson, M., Guimaraes, B.G., Sanchez, E.O.F., Pimenta, A.M.D., and Nagem, R.A.P. (2009). Complete amino-acid sequence, crystallization and preliminary X-ray diffraction studies of leucurolysin-a, a nonhaemorrhagic metalloproteinase from Bothrops leucurus snake venom. Acta Crystallogr F Struct Biol Cryst Comm 65 , 798-801.

Fischer, E. (1894). Einfluss der Configuration auf die Wirkung der Enzyme. Ber Deut Chem Ges 27 , 2985-2993.

Fox, J.W., and Serrano, S.M.T. (2005). Structural considerations of the snake venom metalloproteinases, key members of the M12 reprolysin family of metalloproteinases. Toxicon 45 , 969-985.

Fox, J.W., and Serrano, S.M.T. (2008). Insights into and speculations about snake venom metalloproteinase (SVMP) synthesis, folding and disulfide bond formation and their contribution to venom complexity. FEBS J 275 , 3016-3030.

Fox, J.W., and Serrano, S.M.T. (2009). Timeline of key events in snake venom metalloproteinase research. J Proteom 72 , 200-209.

Fradera, X., and Mestres, J. (2004). Guided docking approaches to structure-based design and screening. Curr Top Med Chem 4, 687-700.

Freddolino, P.L., and Schulten, K. (2009). Common structural transitions in explicit-solvent simulations of villin headpiece folding. Biophys J 97 , 2338-2347.

Friesner, R.A., Banks, J.L., Murphy, R.B., Halgren, T.A., Klicic, J.J., Mainz, D.T., Repasky, M.P., Knoll, E.H., Shelley, M., Perry, J.K., Shaw, D.E., Francis, P., and Shenkin, P.S. (2004). Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J Med Chem 47 , 1739-1749.

Fujimura, S., Oshikawa, K., Terada, S., and Kimoto, E. (2000). Primary structure and autoproteolysis of brevilysin H6 from the venom of Gloydius halys brevicaudus. J Biochem 128 , 167-173.

Fushimi, N., Ee, C.E., Nakajima, T., and Ichishima, E. (1999). Aspzincin, a family of metalloendopeptidases with a new zinc-binding motif. Identification of new zinc-binding sites (His(128), His(132), and Asp(164)) and three catalytically crucial residues (Glu(129), Asp(143), and Tyr(106)) of deuterolysin from Aspergillus oryzae by site-directed mutagenesis. J Biol Chem 274 , 24195-24201.

Garman, E. (1999). Cool data: quantity AND quality. Acta Crystallogr D Biol Crystallogr 55 , 1641-1653.

Garman, E.F., and Mitchell, E.P. (1996). Glycerol concentrations required for cryoprotection of 50 typical protein crystallization solutions. J Appl Cryst 29 , 584-587.

Gerhardt, S., Hassall, G., Hawtin, P., McCall, E., Flavell, L., Minshull, C., Hargreaves, D., Ting, A., Pauptit, R.A., Parker, A.E., and Abbott, W.M. (2007). Crystal structures of human ADAMTS-1 reveal a conserved catalytic domain and a disintegrin-like domain with a fold homologous to cysteine-rich domains. J Mol Biol 373 , 891-902.

Gill, S.C., and von Hippel, P.H. (1989). Calculation of protein extinction coefficients from amino acid sequence data. Anal Biochem 182 , 319-326.

Gilson, M.K., and Zhou, H.X. (2007). Calculation of protein-ligand binding affinities. Annu Rev Biophys Biomol Struct 36 , 21-42.

Girard, E., Legrand, P., Roudenko, O., Roussier, L., Gourhant, P., Gibelin, J., Dalle, D., Ounsy, M., Thompson, A.W., Svensson, O., Cordier, M.O., Robin, S., Quiniou, R., and Steyer, J.P. (2006). Instrumentation for synchrotron-radiation macromolecular crystallography. Acta Crystallogr D Biol Crystallogr 62 , 12-18.

Goldman, N., and Yang, Z. (2008). Introduction. Statistical and computational challenges in molecular phylogenetics and evolution. Philos T Roy Soc B 363 , 3889-3892.

Gomis-Ruth, F.X., Kress, L.F., and Bode, W. (1993). First structure of a snake venom metalloproteinase: a prototype for matrix metalloproteinases/collagenases. EMBO J 12 , 4151-4157.

Gomis-Ruth, F.X., Kress, L.F., Kellermann, J., Mayr, I., Lee, X., Huber, R., and Bode, W. (1994). Refined 2.0 A X-ray crystal structure of the snake venom zinc-endopeptidase adamalysin II. Primary and tertiary structure determination, refinement, molecular structure and comparison with astacin, collagenase and thermolysin. J Mol Biol 239 , 513-544.

Gomis-Ruth, F.X., Meyer, E.F., Kress, L.F., and Politi, V. (1998). Structures of adamalysin II with peptidic inhibitors. Implications for the design of tumor necrosis factor alpha convertase inhibitors. Protein Sci 7, 283-292.

Gomis-Ruth, F.X. (2003). Structural aspects of the metzincin clan of metalloendopeptidases. Mol Biotechnol 24 , 157-202.

Gomis-Ruth, F.X. (2009). Catalytic domain architecture of metzincin metalloproteases. J Biol Chem 284 , 15353-15357.

Gong, W.M., Zhu, X.Y., Liu, S.J., Teng, M.K., and Niu, L.W. (1998). Crystal structures of acutolysin A, a three-disulfide hemorrhagic zinc metalloproteinase from the snake venom of Agkistrodon acutus. J Mol Biol 283 , 657-668.

Gonzalez, A., and Nave, C. (1994). Radiation damage in protein crystals at low temperature. Acta Crystallogr D Biol Crystallogr 50 , 874-877.

Good, A.C., and Oprea, T.I. (2008). Optimization of CAMD techniques 3. Virtual screening enrichment studies: a help or hindrance in tool selection? J Comput Aided Mol Des 22 , 169-178.

Gooljarsingh, L.T., Lakdawala, A., Coppo, F., Luo, L., Fields, G.B., Tummino, P.J., and Gontarek, R.R. (2008). Characterization of an exosite binding inhibitor of matrix metalloproteinase 13. Protein Sci 17 , 66-71.

Gowda, D.C., Jackson, C.M., Kurzban, G.P., McPhie, P., and Davidson, E.A. (1996). Core sugar residues of the N-linked oligosaccharides of Russell's viper venom factor X-activator maintain functionally active polypeptide structure. Biochemistry 35 , 5833-5837.

Grams, F., Huber, R., Kress, L.F., Moroder, L., and Bode, W. (1993). Activation of snake venom metalloproteinases by a cysteine switch-like mechanism. FEBS Lett 335 , 76-80.

Green, D.W., Ingram, V.M., and Perutz, M.F. (1954). The structure of haemoglobin IV. Sign determination by the isomorphous replacement method. Proc Roy Soc A 225 , 287-307.

Greene, J., Kahn, S., Savoj, H., Sprague, P., and Teig, S.L. (1994). Chemical Function Queries for 3D Database Search. J Chem Inf Comput Sci 34 , 1297-1308.

Guan, H.H., Goh, K.S., Davamani, F., Wu, P.L., Huang, Y.W., Jeyakanthan, J., Wu, W.G., and Chen, C.J. (2010). Structures of two elapid snake venom metalloproteases with distinct activities highlight the disulfide patterns in the D domain of ADAMalysin family proteins. J Struct Biol 169 , 294-303.

Guimaraes, J.A., and Carlini, C.R. (2004). Most cited papers in Toxicon. Toxicon 44 , 345-359.

Gund, P. (2000). Evolution of the pharmacophore concept in pharmaceutical research. In: Pharmacophore: Perception, development, and use in drug design . (International University Line: La Jolla), pp. 5-11.

Guner, O.F. (2002). History and evolution of the pharmacophore concept in computer-aided drug design. Curr Top Med Chem 2, 1321-1332.

Guo, Z.Y., Orth, P., Wong, S.C., Lavey, B.J., Shih, N.Y., Niu, X., Lundell, D.J., Madison, V., and Kozlowski, J.A. (2009). Discovery of novel spirocyclopropyl hydroxamate and carboxylate compounds as TACE inhibitors. Bioorg Med Chem Lett 19 , 54-57.

Gutierrez, J.M., Romero, M., Diaz, C., Borkow, G., and Ovadia, M. (1995). Isolation and characterization of a metalloproteinase with weak hemorrhagic activity from the venom of the snake Bothrops asper (terciopelo). Toxicon 33 , 19-29.

Gutierrez, J.M., Rucavado, A., Escalante, T., and Diaz, C. (2005). Hemorrhage induced by snake venom metalloproteinases: biochemical and biophysical mechanisms involved in microvessel damage. Toxicon 45 , 997-1011.

Gutierrez, J.M., Rucavado, A., Chaves, F., Diaz, C., and Escalante, T. (2009). Experimental pathology of local tissue damage induced by Bothrops asper snake venom. Toxicon 54 , 958-975.

Harp, J.M., Hanson, B.L., Timm, D.E., and Bunick, G.J. (1999). Macromolecular crystal annealing: evaluation of techniques and variables. Acta Crystallogr D Biol Crystallogr 55 , 1329-1334.

Hati, R., Mitra, P., Sarker, S., and Bhattacharyya, K.K. (1999). Snake venom hemorrhagins. Crit Rev Toxicol 29 , 1-19.

Hege, T., and Baumann, U. (2001). The conserved methionine residue of the metzincins: a site-directed mutagenesis study. J Mol Biol 314 , 181-186.

Heras, B., and Martin, J.L. (2005). Post-crystallization treatments for improving diffraction quality of protein crystals. Acta Crystallogr D Biol Crystallogr 61 , 1173-1180.

Hite, L.A., Fox, J.W., and Bjarnason, J.B. (1992). A new family of proteinases is defined by several snake venom metalloproteinases. Biol Chem H-S 373 , 381-385.

Hite, L.A., Jia, L.G., Bjarnason, J.B., and Fox, J.W. (1994). cDNA Sequences for 4 Snake-Venom Metalloproteinases - Structure, Classification, and Their Relationship to Mammalian Reproductive Proteins. Arch Biochem Biophys 308 , 182-191.

Holm, L., and Sander, C. (1993). Protein structure comparison by alignment of distance matrices. J Mol Biol 233 , 123-138.

Holm, L., and Park, J. (2000). DaliLite workbench for protein structure comparison. Bioinformatics 16 , 566-567.

Hooper, N.M. (1994). Families of zinc metalloproteases. FEBS Lett 354 , 1-6.

Hori, T., Kumasaka, T., Yamamoto, M., Nonaka, N., Tanaka, N., Hashimoto, Y., Ueki, T., and Takio, K. (2001). Structure of a new 'aspzincin' metalloendopeptidase from Grifola frondosa: implications for the catalytic mechanism and substrate specificity based on several different crystal forms. Acta Crystallogr D Biol Crystallogr 57 , 361-368.

Hsu, C.C., Wu, W.B., and Huang, T.F. (2008). A snake venom metalloproteinase, kistomin, cleaves platelet glycoprotein VI and impairs platelet functions. J Thromb Haemost 6, 1578-1585.

Hu, J., Van den Steen, P.E., Sang, Q.X., and Opdenakker, G. (2007). Matrix metalloproteinase inhibitors as therapy for inflammatory and vascular diseases. Nat Rev Drug Discov 6 , 480-498.

Huang, K.F., Hung, C.C., Pan, F.M., Chow, L.P., Tsugita, A., and Chiou, S.H. (1995). Characterization of Multiple Metalloproteinases with Fibrinogenolytic Activity from the Venom of Taiwan Habu (Trimeresurus Mucrosquamatus) - Protein Microsequencing Coupled with cDNA Sequence-Analysis. Biochem Biophys Res Comm 216 , 223-233.

Huang, K.F., Chiou, S.H., Ko, T.P., Yuann, J.M., and Wang, A.H.J. (2002). The 1.35 angstrom structure of cadmium-substituted TM-3, a snake-venom metalloproteinase from Taiwan habu: elucidation of a TNF alpha-converting enzyme-like active-site structure with a distorted octahedral geometry of cadmium. Acta Crystallogr D Biol Crystallogr 58 , 1118-1128.

Huang, T.F., Holt, J.C., Lukasiewicz, H., and Niewiarowski, S. (1987). Trigramin. A low molecular weight peptide inhibiting fibrinogen interaction with platelet receptors expressed on glycoprotein IIb-IIIa complex. J Biol Chem 262 , 16157-16163.

Ingram, R.N., Orth, P., Strickland, C.L., Le, H.V., Madison, V., and Beyer, B.M. (2006). Stabilization of the autoproteolysis of TNF-alpha converting enzyme (TACE) results in a novel crystal form suitable for structure-based drug design studies. Protein Eng Design Sel 19 , 155-161.

International Tables for Crystallography (1996). Space-Group Symmetry, (Kluwer Academic Publishers: Dordrecht).

Ito, M., Hamako, J., Sakurai, Y., Matsumoto, M., Fujimura, Y., Suzuki, M., Hashimoto, K., Titani, K., and Matsui, T. (2001). Complete amino acid sequence of kaouthiagin, a novel cobra venom metalloproteinase with two disintegrin-like sequences. Biochemistry 40 , 4503-4511.

Jacobsen, F.E., Lewis, J.A., and Cohen, S.M. (2007). The design of inhibitors for medicinally relevant metalloproteins. Chem Med Chem 2, 152-171.

Jancarik, J., Scott, W.G., Milligan, D.L., Koshland, D.E., Jr., and Kim, S.H. (1991). Crystallization and preliminary X-ray diffraction study of the ligand-binding domain of the bacterial chemotaxis-mediating aspartate receptor of Salmonella typhimurium. J Mol Biol 221 , 31-34.

Jeon, O.H., and Kim, D.S. (1999). Molecular cloning and functional characterization of a snake venom metalloprotease. Eur J Biochem 263 , 526-533.

Jia, L.G., Wang, X.M., Shannon, J.D., Bjarnason, J.B., and Fox, J.W. (2000). Inhibition of platelet aggregation by the recombinant cysteine-rich domain of the hemorrhagic snake venom metalloproteinase, atrolysin A. Arch Biochem Biophys 373 , 281-286.

Johnson, E.K., and Ownby, C.L. (1993). Isolation of A Hemorrhagic Toxin from the Venom of Agkistrodon-Contortrix-Laticinctus (Broad-Banded Copperhead) and Pathogenesis of the Hemorrhage Induced by the Toxin in Mice. Int J Biochem 25 , 267-278.

Jorgensen, W.L. (2004). The many roles of computation in drug discovery. Science 303 , 1813-1818.

Junqueira-de-Azevedo, I.L., and Ho, P.L. (2002). A survey of gene expression and diversity in the venom glands of the pitviper snake Bothrops insularis through the generation of expressed sequence tags (ESTs). Gene 299 , 279-291.

Kabsch, W., and Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22 , 2577-2637.

Kabsch, W. (2010). XDS. Acta Crystallogr D Biol Crystallogr 66 , 125-132.

Kang, I.C., Chung, K.H., Lee, S.J., Yun, Y., Moon, H.M., and Kim, D.S. (1998). Purification and molecular cloning of a platelet aggregation inhibitor from the snake (Agkistrodon halys brevicaudus) venom. Thromb Res 91 , 65-73.

Kantardjieff, K.A., and Rupp, B. (2003). Matthews coefficient probabilities: Improved estimates for unit cell contents of proteins, DNA, and protein-nucleic acid complex crystals. Protein Sci 12 , 1865-1871.

Karplus, M., and McCammon, J.A. (2002). Molecular dynamics simulations of biomolecules. Nat Struct Biol 9, 646-652.

Kasturiratne, A., Wickremasinghe, A.R., de Silva, N., Gunawardena, N.K., Pathmeswaran, A., Premaratna, R., Savioli, L., Lalloo, D.G., and de Silva, H.J. (2008). The Global Burden of Snakebite: A Literature Analysis and Modelling Based on Regional Estimates of Envenoming and Deaths. Plos Medicine 5, 1591-1604.

Kenny, P.A. (2007). TACE: a new target in epidermal growth factor receptor dependent tumors. Differentiation 75 , 800-808.

Kikushima, E., Nakamura, S., Oshima, Y., Shibuya, T., Miao, J.Y., Hayashi, H., Nikai, T., and Araki, S. (2008). Hemorrhagic activity of the vascular apoptosis-inducing proteins VAP1 and VAP2 from Crotalus atrox. Toxicon 52 , 589-593.

Kirchmair, J., Laggner, C., Wolber, G., and Langer, T. (2005). Comparative analysis of protein-bound ligand conformations with respect to catalyst's conformational space subsampling algorithms. J Chem Inf Model 45 , 422-430.

Kirchmair, J., Ristic, S., Eder, K., Markt, P., Wolber, G., Laggner, C., and Langer, T. (2007). Fast and efficient in silico 3D screening: toward maximum computational efficiency of pharmacophore-based and shape-based approaches. J Chem Inf Model 47 , 2182-2196.

Kirchmair, J., Markt, P., Distinto, S., Wolber, G., and Langer, T. (2008). Evaluation of the performance of 3D virtual screening protocols: RMSD comparisons, enrichment assessments, and decoy selection--what can we learn from earlier mistakes? J Comput Aided Mol Des 22 , 213-228.

Kishimoto, M., and Takahashi, T. (2002). Molecular cloning and sequence analysis of cDNA encoding flavoridin, a disintegrin from the venom of Trimeresurus flavoviridis. Toxicon 40 , 1033-1040.

Klein, T., and Bischoff, R. (2010a). Active Metalloproteases of the A Disintegrin And Metalloprotease (ADAM) family: biological function and structure. J Proteome Res DOI: 10.1021/pr100556z.

Klein, T., and Bischoff, R. (2010b). Physiology and pathophysiology of matrix metalloproteases. Amino Acids DOI: 10.1007/s00726-010-0689-x.

Kleywegt, G.J., and Jones, T.A. (1995). Where freedom is given, liberties are taken. Structure 3, 535-540.

Kleywegt, G.J., and Brunger, A.T. (1996). Checking your imagination: applications of the free R value. Structure 4, 897-904.

Kleywegt, G.J., Henrick, K., Dodson, E.J., and van Aalten, D.M. (2003). Pound-wise but penny-foolish: How well do micromolecules fare in macromolecular refinement? Structure 11 , 1051-1059.

Koh, D.C., Armugam, A., and Jeyaseelan, K. (2006). Snake venom components and their applications in biomedicine. Cell Mol Life Sci 63 , 3030-3041.

Koppen, H. (2009). Virtual screening - what does it give us? Curr Opin Drug Discov Devel 12 , 397-407.

Krissinel, E., and Henrick, K. (2004). Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr D Biol Crystallogr 60 , 2256-2268.

Kumasaka, T., Yamamoto, M., Moriyama, H., Tanaka, N., Sato, M., Katsube, Y., Yamakawa, Y., OmoriSatoh, T., Iwanaga, S., and Ueki, T. (1996). Crystal structure of H-2-proteinase from the venom of Trimeresurus flavoviridis. J Biochem 119 , 49-57.

Lacy, D.B., Tepp, W., Cohen, A.C., DasGupta, B.R., and Stevens, R.C. (1998). Crystal structure of botulinum neurotoxin type A and implications for toxicity. Nat Struct Biol 5, 898-902.

Laemmli, U.K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227 , 680-685.

Lafrenie, R.M., Lee, S.F., Hewlett, I.K., Yamada, K.M., and Dhawan, S. (2002). Involvement of integrin alphavbeta3 in the pathogenesis of human immunodeficiency virus type 1 infection in monocytes. Virology 297 , 31-38.

Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A., Lopez, R., Thompson, J.D., Gibson, T.J., and Higgins, D.G. (2007). Clustal W and clustal X version 2.0. Bioinformatics 23 , 2947-2948.

Laskowski, R.A., MacArthur, M.W., Moss, D.S., and Thornton, J.M. (1993). PROCHECK: A program to check the stereochemical quality of protein structures. J Appl Cryst 26 , 283-291.

Leach, A.R., Gillet, V.J., Lewis, R.A., and Taylor, R. (2010). Three-dimensional pharmacophore methods in drug discovery. J Med Chem 53 , 539-558.

Lee, E.H., Hsin, J., Sotomayor, M., Comellas, G., and Schulten, K. (2009). Discovery through the computational microscope. Structure 17 , 1295-1306.

Leslie, A.G.W. (2006). The integration of macromolecular diffraction data. Acta Crystallogr D Biol Crystallogr 62 , 48-57.

Levin, J.I., Chen, J.M., Laakso, L.M., Du, M., Du, X., Venkatesan, A.M., Sandanayaka, V., Zask, A., Xu, J., Xu, W., Zhang, Y., and Skotnicki, J.S. (2005). Acetylenic TACE inhibitors. Part 2: SAR of six-membered cyclic sulfonamide hydroxamates. Bioorg Med Chem Lett 15 , 4345-4349.

Levin, J.I., Chen, J.M., Laakso, L.M., Du, M., Schmid, J., Xu, W., Cummons, T., Xu, J., Jin, G., Barone, D., and Skotnicki, J.S. (2006). Acetylenic TACE inhibitors. Part 3: Thiomorpholine sulfonamide hydroxamates. Bioorg Med Chem Lett 16 , 1605-1609.

Lingott, T. (2005). Mutagenese, Präparation, Kristallisation und röntgenkristallographische Untersuchungen der Apocarotinoid-15,15'-Oxygenase aus Synechocystis sp. PCC 6803, Freiburg im Breisgau, Albert-Ludwigs-Universität.

Lingott, T., Schleberger, C., Gutierrez, J.M., and Merfort, I. (2009). High-Resolution Crystal Structure of the Snake Venom Metalloproteinase BaP1 Complexed with a Peptidomimetic: Insight into Inhibitor Binding. Biochemistry 48 , 6166-6174.

Liu, H.L., Shim, A.H.R., and He, X.L. (2009). Structural Characterization of the Ectodomain of a Disintegrin and Metalloproteinase-22 (ADAM22), a Neural Adhesion Receptor Instead of Metalloproteinase: Insights on ADAM Function. J Biol Chem 284 , 29077-29086.

Longenecker, K.L., Garrard, S.M., Sheffield, P.J., and Derewenda, Z.S. (2001). Protein crystallization by rational mutagenesis of surface residues: Lys to Ala mutations promote crystallization of RhoGDI. Acta Crystallogr D Biol Crystallogr 57 , 679-688.

Lou, Z., Hou, J., Liang, X., Chen, J., Qiu, P., Liu, Y., Li, M., Rao, Z., and Yan, G. (2005). Crystal structure of a non-hemorrhagic fibrin(ogen)olytic metalloproteinase complexed with a novel natural tri-peptide inhibitor from venom of Agkistrodon acutus. J Struct Biol 152 , 195-203.

Lovejoy, B., Hassell, A.M., Luther, M.A., Weigl, D., and Jordan, S.R. (1994). Crystal structures of recombinant 19-kDa human fibroblast collagenase complexed to itself. Biochemistry 33 , 8207-8217.

Lovell, S.C., Word, J.M., Richardson, J.S., and Richardson, D.C. (2000). The penultimate rotamer library. Proteins 40 , 389-408.

Lovell, S.C., Davis, I.W., Arendall, W.B., III, de Bakker, P.I., Word, J.M., Prisant, M.G., Richardson, J.S., and Richardson, D.C. (2003). Structure validation by Calpha geometry: phi,psi and Cbeta deviation. Proteins 50 , 437-450.

Ma, B., Kumar, S., Tsai, C.J., and Nussinov, R. (1999). Folding funnels and binding mechanisms. Protein Eng 12 , 713-720.

Mackessy, S.P. (2009). Handbook of Venoms and Toxins of Reptiles, (CRC Press: Boca Raton, Florida).

Maher, D., Wu, X., Schacker, T., Horbul, J., and Southern, P. (2005). HIV binding, penetration, and primary infection in human cervicovaginal tissue. PNAS 102 , 11504-11509.

Maiorov, V.N., and Crippen, G.M. (1994). Significance of root-mean-square deviation in comparing three-dimensional structures of globular proteins. J Mol Biol 235 , 625-634.

Marcinkiewicz, C. (2005). Functional characteristic of snake venom disintegrins: potential therapeutic implication. Curr Pharm Design 11 , 815-827.

Marcussi, S., Oliveira, C.Z., Sant'Ana, C.D., Quintero, A., Menaldo, D.L., Beleboni, R.O., Stabeli, R.G., Giglio, J.R., Fontes, M.R.M., and Soares, A.M. (2007). Snake venom phospholipase A(2) inhibitors: Medicinal chemistry and therapeutic potential. Curr Top Med Chem 7, 743-756.

Markland, F.S. (1998a). Snake venoms and the hemostatic system. Toxicon 36 , 1749-1800.

Markland, F.S. (1998b). Snake venom fibrinogenolytic and fibrinolytic enzymes: an updated inventory. Registry of Exogenous Hemostatic Factors of the Scientific and Standardization Committee of the International Society on Thrombosis and Haemostasis. Thromb Haemost 79 , 668-674.

Maruyama, M., Sugiki, M., Yoshida, E., Mihara, H., and Nakajima, N. (1992). Purification and Characterization of 2 Fibrinolytic Enzymes from Bothrops-Jararaca (Jararaca) Venom. Toxicon 30 , 853-864.

Maskos, K., Fernandez-Catalan, C., Huber, R., Bourenkov, G.P., Bartunik, H., Ellestad, G.A., Reddy, P., Wolfson, M.F., Rauch, C.T., Castner, B.J., Davis, R., Clarke, H.R.G., Petersen, M., Fitzner, J.N., Cerretti, D.P., March, C.J., Paxton, R.J., Black, R.A., and Bode, W. (1998). Crystal structure of the catalytic domain of human tumor necrosis factor-alpha-converting enzyme. PNAS 95 , 3408-3412.

Matthews, B.W. (1968). Some crystal forms of bovine chymotrypsinogen B and chymotrypsinogen A. J Mol Biol 33 , 499-501.

Matthews, B.W., Colman, P.M., Jansonius, J.N., Walsh, K.A., Titani, K., and Neurath, H. (1972). Structure of Thermolysin. Nature 238 , 41-45.

Matthews, B.W. (1975). Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta 405 , 442-451.

Mazzi, M.V., Marcussi, S., Carlos, G.B., Stabeli, R.G., Franco, J.J., Ticli, F.K., Cintra, A.C., Franca, S.C., Soares, A.M., and Sampaio, S.V. (2004). A new hemorrhagic metalloprotease from Bothrops jararacussu snake venom: isolation and biochemical characterization. Toxicon 44 , 215-223.

Mazzi, M.V., Magro, A.J., Amui, S.F., Oliveira, C.Z., Ticli, F.K., Stabeli, R.G., Fuly, A.L., Rosa, J.C., Braz, A.S., Fontes, M.R., Sampaio, S.V., and Soares, A.M. (2007). Molecular characterization and phylogenetic analysis of BjussuMP-I: a RGD-P-III class hemorrhagic metalloprotease from Bothrops jararacussu snake venom. J Mol Graph Model 26 , 69-85.

Mazzola, R.D., Jr., Zhu, Z., Sinning, L., McKittrick, B., Lavey, B., Spitler, J., Kozlowski, J., Neng-Yang, S., Zhou, G., Guo, Z., Orth, P., Madison, V., Sun, J., Lundell, D., and Niu, X. (2008). Discovery of novel hydroxamates as highly potent tumor necrosis factor-alpha converting enzyme inhibitors. Part II: optimization of the S3' pocket. Bioorg Med Chem Lett 18 , 5809-5814.

McCoy, A.J. (2004). Liking likelihood. Acta Crystallogr D Biol Crystallogr 60 , 2183.

McPherson, A. (1990). Current approaches to macromolecular crystallization. Eur J Biochem 189 , 1-23.

Merritt, E.A. (1999). Comparing anisotropic displacement parameters in protein structures. Acta Crystallogr D Biol Crystallogr 55 , 1997-2004.

Modesto, J.C.D., Junqueira-de-Azevedo, I.L.M., Neves-Ferreira, A.G.C., Fritzen, M., Oliva, M.L.V., Ho, P.L., Perales, J., and Chudzinski-Tavassi, A.M. (2005). Insularinase A, a prothrombin activator from Bothrops insularis venom, is a metalloprotease derived from a gene encoding protease and disintegrin domains. Biol Chem 386 , 589-600.

Moss, M.L., Sklair-Tavron, L., and Nudelman, R. (2008). Drug insight: tumor necrosis factor-converting enzyme as a pharmaceutical target for rheumatoid arthritis. Nat Clin Pract Rheumatol 4, 300-309.

Mosyak, L., Georgiadis, K., Shane, T., Svenson, K., Hebert, T., McDonagh, T., Mackie, S., Olland, S., Lin, L., Zhong, X., Kriz, R., Reifenberg, E.L., Collins-Racie, L.A., Corcoran, C., Freeman, B., Zollner, R., Marvell, T., Vera, M., Sum, P.E., Lavallie, E.R., Stahl, M., and Somers, W. (2008). Crystal structures of the two major aggrecan degrading enzymes, ADAMTS4 and ADAMTS5. Protein Sci 17 , 16-21.

Muniz, J.R., Ambrosio, A.L., Selistre-de-Araujo, H.S., Cominetti, M.R., Moura-da-Silva, A.M., Oliva, G., Garratt, R.C., and Souza, D.H. (2008). The three-dimensional structure of bothropasin, the main hemorrhagic factor from Bothrops jararaca venom: insights for a new classification of snake venom metalloprotease subgroups. Toxicon 52 , 807-816.

Murphy, G. (2008). The ADAMs: signalling scissors in the tumour microenvironment. Nat Rev Cancer 8, 929-941.

Murphy, G. (2009). Regulation of the proteolytic disintegrin metalloproteinases, the 'Sheddases'. Semin Cell Dev Biol 20 , 138-145.

Murshudov, G.N., Vagin, A.A., and Dodson, E.J. (1997). Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr D Biol Crystallogr 53 , 240-255.

Nagase, H., and Woessner, J.F., Jr. (1999). Matrix metalloproteinases. J Biol Chem 274, 21491-21494.

Neurath, H., and Walsh, K.A. (1976). Role of proteolytic enzymes in biological regulation (a review). PNAS 73 , 3825-3832.

Nikai, T., Kato, C., Komori, Y., Nodani, H., Homma, M., and Sugihara, H. (1995). Primary Structure of Ac(1)-Proteinase from the Venom of Deinagkistrodon-Acutus (100-Pace Snake) from Taiwan. Biol Pharm Bull 18 , 631-633.

Oefner, C., D'Arcy, A., Hennig, M., Winkler, F.K., and Dale, G.E. (2000). Structure of human neutral endopeptidase (Neprilysin) complexed with phosphoramidon. J Mol Biol 296 , 341-349.

Olah, M., Rad, R., Ostopovici, L., Bora, A., Hadaruga, N., Hadaruga, D., Moldovan, R., Fulias, A., Mracec, M., and Oprea, T.I. (2007). WOMBAT and WOMBAT-PK: Bioactivity Databases for Lead and Drug Discovery. In: Chemical Biology: From Small Molecules to Systems Biology and Drug Design. S.L. Schreiber, T.M. Kapoor, and G. Wess, eds. (Wiley-VCH: New York), pp. 760-786.

Ondetti, M.A., Williams, N.J., Sabo, E.F., Pluscec, J., Weaver, E.R., and Kocy, O. (1971). Angiotensin-converting enzyme inhibitors from the venom of Bothrops jararaca. Isolation, elucidation of structure, and synthesis. Biochemistry 10 , 4033-4039.

Orth, P., Reichert, P., Wang, W., Prosise, W.W., Yarosh-Tomaine, T., Hammond, G., Ingram, R.N., Xiao, L., Mirza, U.A., Zou, J., Strickland, C., Taremi, S.S., Le, H.V., and Madison, V. (2004). Crystal structure of the catalytic domain of human ADAM33. J Mol Biol 335 , 129-137.

Overall, C.M., and Lopez-Otin, C. (2002). Strategies for MMP inhibition in cancer: innovations for the post-trial era. Nat Rev Cancer 2, 657-672.

Paine, M.J., Desmond, H.P., Theakston, R.D., and Crampton, J.M. (1992). Purification, cloning, and molecular characterization of a high molecular weight hemorrhagic metalloprotease, jararhagin, from Bothrops jararaca venom. Insights into the disintegrin gene family. J Biol Chem 267, 22869-22876.

Paine, M.J., Moura-da-Silva, A.M., Theakston, R.D., and Crampton, J.M. (1994). Cloning of metalloprotease genes in the carpet viper (Echis pyramidum leakeyi). Further members of the metalloprotease/disintegrin gene family. Eur J Biochem 224 , 483-488.

Painter, J., and Merritt, E.A. (2006). Optimal description of a protein structure in terms of multiple groups undergoing TLS motion. Acta Crystallogr D Biol Crystallogr 62 , 439-450.

Perrakis, A., Morris, R., and Lamzin, V.S. (1999). Automated protein model building combined with iterative structure refinement. Nat Struct Biol 6, 458-463.

Petsko, G.A. (1985). Preparation of isomorphous heavy-atom derivatives. Methods Enzymol 114 , 147-156.

Pinto, A.F., Terra, R.M., Guimaraes, J.A., and Fox, J.W. (2007). Mapping von Willebrand factor A domain binding sites on a snake venom metalloproteinase cysteine-rich domain. Arch Biochem Biophys 457 , 41-46.

Pinto, A.F.M., Terra, R.M.S., Guimaraes, J.A., Kashiwagi, M., Nagase, H., Serrano, S.M.T., and Fox, J.W. (2006). Structural features of the reprolysin atrolysin C and tissue inhibitors of metalloproteinases (TIMPs) interaction. Biochem Biophys Res Comm 347 , 641-648.

Puerta, D.T., Lewis, J.A., and Cohen, S.M. (2004). New beginnings for matrix metalloproteinase inhibitors: identification of high-affinity zinc-binding groups. J Am Chem Soc 126 , 8388-8389.

Puerta, D.T., Griffin, M.O., Lewis, J.A., Romero-Perez, D., Garcia, R., Villarreal, F.J., and Cohen, S.M. (2006). Heterocyclic zinc-binding groups for use in next-generation matrix metalloproteinase inhibitors: potency, toxicity, and reactivity. J Biol Inorg Chem 11 , 131-138.

Raha, K., Peters, M.B., Wang, B., Yu, N., Wollacott, A.M., Westerhoff, L.M., and Merz, K.M., Jr. (2007). The role of quantum mechanics in structure-based drug design. Drug Discov Today 12 , 725-731.

Ramachandran, G.N., and Sasisekharan, V. (1968). Conformation of polypeptides and proteins. Adv Protein Chem 23 , 283-438.

Ramos, O.H., and Selistre-de-Araujo, H.S. (2004). Comparative analysis of the catalytic domain of hemorrhagic and non-hemorrhagic snake venom metallopeptidases using bioinformatic tools. Toxicon 44 , 529-538.

Ramos, O.H., and Selistre-de-Araujo, H.S. (2006). Snake venom metalloproteases-structure and function of catalytic and disintegrin domains. Comp Biochem Phys C 142 , 328-346.

Rao, B.G., Bandarage, U.K., Wang, T.S., Come, J.H., Perola, E., Wei, Y.Y., Tian, S.K., and Saunders, J.O. (2007). Novel thiol-based TACE inhibitors: Rational design, synthesis, and SAR of thiol-containing aryl sulfonamides. Bioorg Med Chem Lett 17 , 2250-2253.

Read, R.J. (1986). Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Crystallogr Sect A 36 , 878-884.

Read, R.J. (1990). Structure factor probabilities for related structures. Acta Crystallogr Sect A 46 , 900-912.

Rella, M., Rushworth, C.A., Guy, J.L., Turner, A.J., Langer, T., and Jackson, R.M. (2006). Structure-based pharmacophore design and virtual screening for novel angiotensin converting enzyme 2 inhibitors. J Chem Inf Model 46 , 708-716.

Rhodes, G. (2000). Crystallography made crystal clear, (Academic press: San Diego).

Rice, L.M., and Brunger, A.T. (1994). Torsion angle dynamics: reduced variable conformational sampling enhances crystallographic structure refinement. Proteins 19 , 277-290.

Richardson, J.S. (1985). Schematic drawings of protein structures. Methods Enzymol 115, 359-380.

Richardson, J.S. (1985). Describing patterns of protein tertiary structure. Methods Enzymol 115 , 341-358.

Righetti, P.G. (1990). Recent developments in electrophoretic methods. J Chromatogr 516 , 3-22.

Robeva, A., Politi, V., Shannon, J.D., Bjarnason, J.B., and Fox, J.W. (1991). Synthetic and endogenous inhibitors of snake venom metalloproteinases. Biomed Biochim Acta 50 , 769-773.

Rodrigues, V.M., Soares, A.M., Guerra-Sa, R., Rodrigues, V., Fontes, M.R.M., and Giglio, J.R. (2000). Structural and functional characterization of neuwiedase, a nonhemorrhagic fibrin(ogen)olytic metalloprotease from Bothrops neuwiedi snake venom. Arch Biochem Biophys 381 , 213-224.

Rosenbaum, G., Holmes, K.C., and Witz, J. (1971). Synchrotron Radiation As A Source for X-Ray Diffraction. Nature 230, 434-439.

Rossmann, M.G., and Blow, D.M. (1962). The detection of sub-units within the crystallographic asymmetric unit. Acta Crystallogr 15 , 31.

Rucavado, A., Nunez, J., and Gutierrez, J.M. (1998). Blister formation and skin damage induced by BaP1, a haemorrhagic metalloproteinase from the venom of the snake Bothrops asper. Int J Exp Pathol 79 , 245-254.

Saitou, N., and Nei, M. (1987). The Neighbor-Joining Method - A New Method for Reconstructing Phylogenetic Trees. Mol Biol Evol 4, 406-425.

Sanbonmatsu, K.Y., and Tung, C.S. (2007). High performance computing in biology: multimillion atom simulations of nanoscale systems. J Struct Biol 157 , 470-480.

Sanchez, E.F., Schneider, F.S., Yarleque, A., Borges, M.H., Richardson, M., Figueiredo, S.G., Evangelista, K.S., and Eble, J.A. (2010). The novel metalloproteinase atroxlysin-I from Peruvian Bothrops atrox (Jergon) snake venom acts both on blood vessel ECM and platelets. Arch Biochem Biophys 496 , 9-20.

Sander, C., and Schneider, R. (1991). Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins 9, 56-68.

Schechter, I., and Berger, A. (1967). On the size of the active site in proteases. I. Papain. Biochem Biophys Res Commun 27, 157-162.

Schomaker, V., and Trueblood, K.N. (1968). On the rigid-body motion of molecules in crystals. Acta Crystallogr Sect B 24, 63-76.

Schulz, G.E. (1981). Protein Differentiation - Emergence of Novel Proteins During Evolution. Angew Chem Int Edit 20 , 143-151.

Schuttelkopf, A.W., and van Aalten, D.M. (2004). PRODRG: a tool for high-throughput crystallography of protein-ligand complexes. Acta Crystallogr D Biol Crystallogr 60 , 1355-1363.

Seals, D.F., and Courtneidge, S.A. (2003). The ADAMs family of metalloproteases: multidomain proteins with multiple functions. Genes Dev 17 , 7-30.

Selistre de Araujo, H.S., and Ownby, C.L. (1995). Molecular cloning and sequence analysis of cDNAs for metalloproteinases from broad-banded copperhead Agkistrodon contortrix laticinctus. Arch Biochem Biophys 320 , 141-148.

Selistre de Araujo, H.S., de Souza, D.H., and Ownby, C.L. (1997). Analysis of a cDNA sequence encoding a novel member of the snake venom metalloproteinase, disintegrin-like, cysteine-rich (MDC) protein family from Agkistrodon contortrix laticinctus. Biochim Biophys Acta 1342 , 109-115.

Serrano, S.M., Jia, L.G., Wang, D., Shannon, J.D., and Fox, J.W. (2005). Function of the cysteine-rich domain of the haemorrhagic metalloproteinase atrolysin A: targeting adhesion proteins collagen I and von Willebrand factor. Biochem J 391 , 69-76.

Serrano, S.M., Kim, J., Wang, D., Dragulev, B., Shannon, J.D., Mann, H.H., Veit, G., Wagener, R., Koch, M., and Fox, J.W. (2006). The cysteine-rich domain of snake venom metalloproteinases is a ligand for von Willebrand factor A domains: role in substrate targeting. J Biol Chem 281 , 39746-39756.

Serrano, S.M., Wang, D., Shannon, J.D., Pinto, A.F., Polanowska-Grabowska, R.K., and Fox, J.W. (2007). Interaction of the cysteine-rich domain of snake venom metalloproteinases with the A1 domain of von Willebrand factor promotes site-specific proteolysis of von Willebrand factor and inhibition of von Willebrand factor-mediated platelet aggregation. FEBS J 274 , 3611-3621.

Shannon, J.D., Baramova, E.N., Bjarnason, J.B., and Fox, J.W. (1989). Amino acid sequence of a Crotalus atrox venom metalloproteinase which cleaves type IV collagen and gelatin. J Biol Chem 264 , 11575-11583.

Sheu, J.R., Yen, M.H., Kan, Y.C., Hung, W.C., Chang, P.T., and Luk, H.N. (1997). Inhibition of angiogenesis in vitro and in vivo: comparison of the relative activities of triflavin, an Arg-Gly-Asp-containing peptide and anti-alpha(v)beta3 integrin monoclonal antibody. Biochim Biophys Acta 1336, 445-454.

Shimokawa, K., Jia, L.G., Wang, X.M., and Fox, J.W. (1996). Expression, activation, and processing of the recombinant snake venom metalloproteinase, pro-atrolysin E. Arch Biochem Biophys 335 , 283-294.

Shin, J., Hong, S.Y., Chung, K., Kang, I., Jang, Y., Kim, D.S., and Lee, W. (2003). Solution structure of a novel disintegrin, salmosin, from Agkistrondon halys venom. Biochemistry 42, 14408-14415.

Siigur, E., Aaspollu, A., Trummal, K., Tonismagi, K., Tammiste, I., Kalkkinen, N., and Siigur, J. (2004). Factor X activator from Vipera lebetina venom is synthesized from different genes. BBA-Prot Proteom 1702, 41-51.

Singhamatr, P., and Rojnuckarin, P. (2007). Molecular cloning of albolatin, a novel snake venom metalloprotease from green pit viper (Trimeresurus albolabris), and expression of its disintegrin domain. Toxicon 50 , 1192-1200.

Smellie, A., Teig, S.L., and Towbin, P. (1994). Poling: Promoting Conformational Variation. J Comp Chem 16 , 171-187.

Smellie, A., Kahn, S.D., and Teig, S.L. (1995a). Analysis of Conformational Coverage. 1. Validation and Estimation of Coverage. J Chem Inf Comput Sci 35 , 285-294.

Smellie, A., Kahn, S.D., and Teig, S.L. (1995b). Analysis of Conformational Coverage. 2. Applications of Conformational Models. J Chem Inf Comput Sci 35 , 295-304.

Sotomayor, M., and Schulten, K. (2007). Single-molecule experiments in vitro and in silico. Science 316 , 1144-1148.

Souza, D.H., Selistre-de-Araujo, H.S., Moura-da-Silva, A.M., Della-Casa, M.S., Oliva, G., and Garratt, R.C. (2001). Crystallization and preliminary X-ray analysis of jararhagin, a metalloproteinase/disintegrin from Bothrops jararaca snake venom. Acta Crystallogr D Biol Crystallogr 57 , 1135-1137.

Springman, E.B., Angleton, E.L., Birkedal-Hansen, H., and Van Wart, H.E. (1990). Multiple modes of activation of latent human fibroblast collagenase: evidence for the role of a Cys73 active-site zinc complex in latency and a "cysteine switch" mechanism for activation. PNAS 87, 364-368.

Steinbrecher, T., and Labahn, A. (2010). Towards accurate free energy calculations in ligand protein-binding studies. Curr Med Chem 17 , 767-785.

Sternlicht, M.D., and Werb, Z. (2001). How matrix metalloproteinases regulate cell behavior. Annu Rev Cell Dev Biol 17 , 463-516.

Stocker, K. (1999). Use of snake venom proteins in medicine. Schweiz Med Wschr 129 , 205-216.

Stocker, W., and Bode, W. (1995). Structural features of a superfamily of zinc-endopeptidases: the metzincins. Curr Opin Struct Biol 5, 383-390.

Stocker, W., Grams, F., Baumann, U., Reinemer, P., Gomis-Ruth, F.X., McKay, D.B., and Bode, W. (1995). The metzincins--topological and sequential relations between the astacins, adamalysins, serralysins, and matrixins (collagenases) define a superfamily of zinc-peptidases. Protein Sci 4, 823-840.

Swaminathan, S., and Eswaramoorthy, S. (2000). Structural analysis of the catalytic and binding sites of Clostridium botulinum neurotoxin B. Nat Struct Biol 7, 693-699.

Swenson, S., and Markland, F.S., Jr. (2005). Snake venom fibrin(ogen)olytic enzymes. Toxicon 45 , 1021-1039.

Takahashi, T., and Ohsaka, A. (1970). Purification and characterization of a proteinase in the venom of Trimeresurus flavoviridis. Complete separation of the enzyme from hemorrhagic activity. Biochim Biophys Acta 198, 293-307.

Takeda, S., Igarashi, T., Mori, H., and Araki, S. (2006). Crystal structures of VAP1 reveal ADAMs' MDC domain architecture and its unique C-shaped scaffold. EMBO J 25 , 2388-2396.

Takeda, S., Igarashi, T., and Mori, H. (2007). Crystal structure of RVV-X: An example of evolutionary gain of specificity by ADAM proteinases. FEBS Lett 581 , 5859-5864.

Takeda, S. (2009). Three-dimensional domain architecture of the ADAM family proteinases. Semin Cell Dev Biol 20 , 146-152.

Takeya, H., Arakawa, M., Miyata, T., Iwanaga, S., and Omori-Satoh, T. (1989). Primary structure of H2-proteinase, a non-hemorrhagic metalloproteinase, isolated from the venom of the habu snake, Trimeresurus flavoviridis. J Biochem 106 , 151-157.

Takeya, H., Onikura, A., Nikai, T., Sugihara, H., and Iwanaga, S. (1990). Primary Structure of A Hemorrhagic Metalloproteinase, Ht-2, Isolated from the Venom of Crotalus-Ruber-Ruber. J Biochem 108 , 711-719.

Takeya, H., Nishida, S., Miyata, T., Kawada, S., Saisaka, Y., Morita, T., and Iwanaga, S. (1992). Coagulation factor X activating enzyme from Russell's viper venom (RVV-X). A novel metalloproteinase with disintegrin (platelet aggregation inhibitor)-like and C-type lectin-like domains. J Biol Chem 267 , 14109-14117.

Tallant, C., Garcia-Castellanos, R., Baumann, U., and Gomis-Ruth, F.X. (2010). On the Relevance of the Met-turn Methionine in Metzincins. J Biol Chem 285 , 13951-13957.

Taylor, A.B., Smith, B.S., Kitada, S., Kojima, K., Miyaura, H., Otwinowski, Z., Ito, A., and Deisenhofer, J. (2001). Crystal structures of mitochondrial processing peptidase reveal the mode for specific cleavage of import signal sequences. Structure 9, 615-625.

Tortorella, M.D., Tomasselli, A.G., Mathis, K.J., Schnute, M.E., Woodard, S.S., Munie, G., Williams, J.M., Caspers, N., Wittwer, A.J., Malfait, A.M., and Shieh, H.S. (2009). Structural and inhibition analysis reveals the mechanism of selectivity of a series of aggrecanase inhibitors. J Biol Chem 284, 24185-24191.

Tronrud, D.E. (2004). Introduction to macromolecular refinement. Acta Crystallogr D Biol Crystallogr 60 , 2156-2168.

Trummal, K., Tonismagi, K., Siigur, E., Aaspollu, A., Lopp, A., Sillat, T., Saat, R., Kasak, L., Tammiste, I., Kogerman, P., Kalkkinen, N., and Siigur, J. (2005). A novel metalloprotease from Vipera lebetina venom induces human endothelial cell apoptosis. Toxicon 46 , 46-61.

Tsai, C.J., and Nussinov, R. (1997a). Hydrophobic folding units at protein-protein interfaces: implications to protein folding and to protein-protein association. Protein Sci 6, 1426-1437.

Tsai, C.J., and Nussinov, R. (1997b). Hydrophobic folding units derived from dissimilar monomer structures and their interactions. Protein Sci 6, 24-42.

Tsai, I.H., Wang, Y.M., Chiang, T.Y., Chen, Y.L., and Huang, R.J. (2000). Purification, cloning and sequence analyses for pro-metalloprotease-disintegrin variants from Deinagkistrodon acutus venom and subclassification of the small venom metalloproteases. Eur J Biochem 267, 1359-1367.

Unge, T. (1999). Crystallization methods. In: Protein Crystallization: Techniques, Strategies, and Tips. T.M. Bergfors, ed. (International University Line: La Jolla), pp. 7-17.

Vagin, A., and Teplyakov, A. (1997). MOLREP: an automated program for molecular replacement. J Appl Cryst 30 , 1022-1025.

van Goor, H., Melenhorst, W.B., Turner, A.J., and Holgate, S.T. (2009). Adamalysins in biology and disease. J Pathol 219 , 277-286.

Verdonk, M.L., Cole, J.C., Hartshorn, M.J., Murray, C.W., and Taylor, R.D. (2003). Improved protein-ligand docking using GOLD. Proteins 52 , 609-623.

Vieth, M., Hirst, J.D., and Brooks, C.L., III (1998). Do active site conformations of small ligands correspond to low free-energy solution structures? J Comput Aided Mol Des 12 , 563-572.

Vonrhein, C., and Schulz, G.E. (1999). Locating proper non-crystallographic symmetry in low-resolution electron-density maps with the program GETAX. Acta Crystallogr D Biol Crystallogr 55, 225-229.

Vuotila, T., Ylikontiola, L., Sorsa, T., Luoto, H., Hanemaaijer, R., Salo, T., and Tjaderhane, L. (2002). The relationship between MMPs and pH in whole saliva of radiated head and neck cancer patients. J Oral Pathol Med 31 , 329-338.

Wagstaff, S.C., Laing, G.D., Theakston, R.D.G., Papaspyridis, C., and Harrison, R.A. (2006). Bioinformatics and multiepitope DNA immunization to design rational snake antivenom. Plos Medicine 3, 832-844.

Wallace, A.C., Laskowski, R.A., and Thornton, J.M. (1995). LIGPLOT: a program to generate schematic diagrams of protein-ligand interactions. Protein Eng 8, 127-134.

Wallnoefer, H.G., Lingott, T., Gutierrez, J.M., Merfort, I., and Liedl, K.R. (2010). Backbone Flexibility Controls the Activity and Specificity of a Protein-Protein Interface: Specificity in Snake Venom Metalloproteases. J Am Chem Soc 132 , 10330-10337.

Wan, S.G., Jin, Y., Lee, W.H., and Zhang, Y. (2006). A snake venom metalloproteinase that inhibited cell proliferation and induced morphological changes of ECV304 cells. Toxicon 47, 480-489.

Wang, W.J., Shih, C.H., and Huang, T.F. (2004). A novel P-I class metalloproteinase with broad substrate-cleaving activity, agkislysin, from Agkistrodon acutus venom. Biochem Biophys Res Commun 324 , 224-230.

Warren, G.L., Andrews, C.W., Capelli, A.M., Clarke, B., LaLonde, J., Lambert, M.H., Lindvall, M., Nevins, N., Semus, S.F., Senger, S., Tedesco, G., Wall, I.D., Woolven, J.M., Peishoff, C.E., and Head, M.S. (2006). A critical assessment of docking programs and scoring functions. J Med Chem 49 , 5912-5931.

Watanabe, L., Rucavado, A., Kamiguti, A., Theakston, R.D., Gutierrez, J.M., and Arni, R.K. (2002). Crystallization and preliminary diffraction data of BaP1, a haemorrhagic metalloproteinase from Bothrops asper snake venom. Acta Crystallogr D Biol Crystallogr 58 , 1034-1035.

Watanabe, L., Shannon, J.D., Valente, R.H., Rucavado, A., Alape-Giron, A., Kamiguti, A.S., Theakston, R.D.G., Fox, J.W., Gutierrez, J.M., and Arni, R.K. (2003). Amino acid sequence and crystal structure of BaP1, a metalloproteinase from Bothrops asper snake venom that exerts multiple tissue-damaging activities. Protein Sci 12, 2273-2281.

White, J.M. (2003). ADAMs: modulators of cell-cell and cell-matrix interactions. Curr Opin Cell Biol 15 , 598-606.

Willis, T.W., and Tu, A.T. (1988). Purification and Biochemical-Characterization of Atroxase, A Nonhemorrhagic Fibrinolytic Protease from Western Diamondback Rattlesnake Venom. Biochemistry 27 , 4769-4777.

Wilson, A.J.C. (1949). The probability distributions of X-ray intensities. Acta Crystallogr 2, 318-321.

Winn, M.D., Isupov, M.N., and Murshudov, G.N. (2001). Use of TLS parameters to model anisotropic displacements in macromolecular refinement. Acta Crystallogr D Biol Crystallogr 57 , 122-133.

Winn, M.D., Murshudov, G.N., and Papiz, M.Z. (2003). Macromolecular TLS refinement in REFMAC at moderate resolutions. Methods Enzymol 374 , 300-321.

Wolber, G., and Langer, T. (2005). LigandScout: 3-D pharmacophores derived from protein-bound ligands and their use as virtual screening filters. J Chem Inf Model 45, 160-169.

Wolfsberg, T.G., Bazan, J.F., Blobel, C.P., Myles, D.G., Primakoff, P., and White, J.M. (1993). The precursor region of a protein active in sperm-egg fusion contains a metalloprotease and a disintegrin domain: structural, functional, and evolutionary implications. PNAS 90 , 10783-10787.

Wu, E.L., Wong, K.Y., Zhang, X., Han, K.L., and Gao, J.L. (2009). Determination of the Structure Form of the Fourth Ligand of Zinc in Acutolysin A Using Combined Quantum Mechanical and Molecular Mechanical Simulation. Journal of Physical Chemistry B 113, 2477-2485.

Wu, W.B., Chang, S.C., Liau, M.Y., and Huang, T.F. (2001). Purification, molecular cloning and mechanism of action of graminelysin I, a snake-venom-derived metalloproteinase that induces apoptosis of human endothelial cells. Biochem J 357 , 719-728.

Xu, X., Wang, C., Liu, J., and Lu, Z. (1981). Purification and characterization of hemorrhagic components from Agkistrodon acutus (hundred pace snake) venom. Toxicon 19 , 633-644.

Xu, X.L., Liu, X.H., Wu, B., Liu, Y., Liu, W.Q., Xie, Y.S., and Liu, Q.L. (2004). Metal-ion- and pH-induced conformational changes of acutolysin D from Agkistrodon acutus venom probed by fluorescent spectroscopy. Biopolymers 74 , 336-344.

Yong, V.W. (2005). Metalloproteinases: mediators of pathology and regeneration in the CNS. Nat Rev Neurosci 6, 931-944.

You, W.K., Seo, H.J., Chung, K.H., and Kim, D.S. (2003). A novel metalloprotease from Gloydius halys venom induces endothelial cell apoptosis through its protease and disintegrin-like domains. J Biochem 134 , 739-749.

Zaki, M.J., and Hsiao, C.-J. (1999). CHARM: An efficient algorithm for closed association rule mining. (Department of Computer Science, Rensselaer Polytechnic Institute: Troy, NY).

Zhang, D.C., Botos, I., Gomisruth, F.X., Doll, R., Blood, C., Njoroge, F.G., Fox, J.W., Bode, W., and Meyer, E.F. (1994). Structural Interaction of Natural and Synthetic Inhibitors with the Venom Metalloproteinase, Atrolysin-C (Form-D). PNAS 91 , 8447-8451.

Zhao, W., Chu, W.S., Li, S.J., Liu, Y.W., Gao, B., Niu, L.W., Teng, M.K., Benfatto, M., Hu, T.D., and Wu, Z.Y. (2007). X-ray absorption near edge structure study on Acutolysin-C, a zinc-metalloproteinase from Agkistrodon acutus venom: Insight into the acid-inactive mechanism. Spectrochim Acta B 62, 1246-1251.

Zhou, Q., Smith, J.B., and Grossman, M.H. (1995). Molecular cloning and expression of catrocollastatin, a snake-venom protein from Crotalus atrox (western diamondback rattlesnake) which inhibits platelet adhesion to collagen. Biochem J 307 , 411-417.

Zhu, X.Y., Teng, M.K., and Niu, L.W. (1999). Structure of acutolysin-C, a haemorrhagic toxin from the venom of Agkistrodon acutus, providing further evidence for the mechanism of the pH-dependent proteolytic reaction of zinc metalloproteinases. Acta Crystallogr D Biol Crystallogr 55, 1834-1841.

Zhu, Z., Gong, W., Zhu, X., Teng, M., and Niu, L. (1997). Purification, characterization and conformational analysis of a haemorrhagin from the venom of Agkistrodon acutus. Toxicon 35 , 283-292.

Zhu, Z.Q., Gao, Y.X., Zhu, Z.L., Yu, Y., Zhang, X., Zang, J.Y., Teng, M.K., and Niu, L.W. (2009). Structural basis of the autolysis of AaHIV suggests a novel target recognizing model for ADAM/reprolysin family proteins. Biochem Biophys Res Comm 386 , 159-164.

Zink, M., and Grubmuller, H. (2009). Mechanical properties of the icosahedral shell of southern bean mosaic virus: a molecular dynamics study. Biophys J 96 , 1350-1363.

7 Appendix

7.1 Calibration of gel permeation column

Figure 79 Calibration curve for the Superdex-200 26/60 column.

7.2 Successful crystallization conditions with streak seeding

Table 22 Successful crystallization conditions of the Wizard™ II screen.

| Wizard™ II | Precipitant | Buffer [0.1 M] | Salt [0.2 M] | pH value |
|---|---|---|---|---|
| #1 | 10% (w/v) PEG-3k | acetate | Zn(OAc)2 | 4.5 |
| #2 | 35% (v/v) MPD | MES | Li2SO4 | 6.0 |
| #7 | 30% (w/v) PEG-3k | Tris | NaCl | 7.0 |
| #8 | 10% (w/v) PEG-8k | Na/K phosphate | NaCl | 6.2 |
| #9 | 2.0 M (NH4)2SO4 | cacodylate | NaCl | 6.5 |
| #20 | 15% (v/v) ethanol | MES | Zn(OAc)2 | 6.0 |
| #24 | 30% (w/v) PEG-8k | imidazole | NaCl | 8.0 |
| #26 | 30% (w/v) PEG-400 | CHES | - | 9.5 |
| #32 | 20% (w/v) PEG-1k | Tris | - | 8.5 |
| #34 | 10% (w/v) PEG-8k | imidazole | - | 8.0 |
| #37 | 1.0 M Na/K tartrate | Tris | Li2SO4 | 7.0 |
| #38 | 2.5 M NaCl | acetate | Li2SO4 | 4.5 |
| #44 | 20% (w/v) PEG-1k | cacodylate | MgCl2 | 6.5 |
| #47 | 2.5 M NaCl | imidazole | Zn(OAc)2 | 8.0 |

Table 23 Successful crystallization conditions of the Wizard™ I screen.

| Wizard™ I | Precipitant | Buffer [0.1 M] | Salt [0.2 M] | pH value |
|---|---|---|---|---|
| #1 | 20% (w/v) PEG-8k | CHES | - | 9.5 |
| #2 | 10% (v/v) iso-propanol | HEPES | NaCl | 7.5 |
| #6 | 20% (w/v) PEG-3k | citrate | - | 5.5 |
| #8 | 2.0 M (NH4)2SO4 | citrate | - | 5.5 |
| #15 | 10% (w/v) PEG-3k | imidazole | Li2SO4 | 8.0 |
| #17 | 30% (w/v) PEG-8k | acetate | Li2SO4 | 4.5 |
| #26 | 10% (w/v) PEG-3k | CHES | - | 9.5 |
| #28 | 20% (w/v) PEG-3k | HEPES | NaCl | 7.5 |
| #41 | 30% (w/v) PEG-3k | CHES | - | 9.5 |
| #43 | 35% (v/v) MPD | Na/K phosphate | - | 6.2 |

7.3 Crystal contacts in BaP1 models of dataset I, II, III, and IV

Table 24 Crystal contacts of the BaP1*inhibitor model of dataset I in space group P2₁2₁2₁.

| No. | Sym-op # / translation (a) | Surface area (b) [Å²] | H-bonds / salt bridges | Interfacing residues, chain 1 | Interfacing residues, chain 2 |
|---|---|---|---|---|---|
| I | #3 / 0, -1, 0 | 553 | 7 / 2 | 20-23, 25, 28, 31, 102, 105-106, 127, 129-132, 135, 172 | 5, 7, 9, 34, 38, 50, 52-55, 89-91, 93, 197-202 |
| II | #3 / 1, -1, 0 | 334 | 0 / 0 | 72, 75, 104-105, 107, 110-113, 121, 151-152 | 191-194, 196, 199, 201-202 |
| III | #2 / 0, 1, -1 | 263 | 1 / 0 | 61-66, 84-85, 88-89 | 1-2, 46, 158-161, 177, 180, 183, 187 |
| IV | #1 / -1, 0, 0 | 235 | 1 / 0 | 24-28, 30-31, 34-35, 38 | 112-115, 118, 121, 188, 192-193 |
| V | #4 / -1, 0, 1 | 59 | 1 / 0 | 161, 174-175 | 154, 156, 158 |

(a) Number of the symmetry operation as defined in the International Tables for Crystallography (1996): #1, x, y, z; #2, ½-x, -y, ½+z; #3, -x, ½+y, ½-z; #4, ½+x, ½-y, -z; unit cell translation in fractional coordinates. Applying the lattice translation and the symmetry operation to chain 2 generates the contact.
(b) Area covered by the contact. Total surface area for chain A, 9'527 Å².
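The symmetry operations and unit cell translations listed for the crystal contacts can be applied to fractional coordinates as in the following minimal sketch. It is an illustration only, not the procedure used to generate the tables: the function name `apply_contact` and the dictionary `SYM_OPS` are hypothetical, and the order of operations (symmetry operation first, then addition of the lattice translation) is an assumption, since the footnote does not specify it.

```python
from fractions import Fraction as F

# Symmetry operations of P2(1)2(1)2(1) as listed in the table footnote.
# Each entry is (signs, shifts), so that x' = sign * x + shift per axis.
SYM_OPS = {
    1: ((1, 1, 1), (F(0), F(0), F(0))),          # x, y, z
    2: ((-1, -1, 1), (F(1, 2), F(0), F(1, 2))),  # 1/2-x, -y, 1/2+z
    3: ((-1, 1, -1), (F(0), F(1, 2), F(1, 2))),  # -x, 1/2+y, 1/2-z
    4: ((1, -1, -1), (F(1, 2), F(1, 2), F(0))),  # 1/2+x, 1/2-y, -z
}


def apply_contact(frac_xyz, sym_op, translation):
    """Map one fractional coordinate to its symmetry mate.

    Assumption: the symmetry operation is applied first and the
    unit cell translation (in fractional units) is added afterwards.
    """
    signs, shifts = SYM_OPS[sym_op]
    return tuple(
        sign * coord + shift + trans
        for sign, coord, shift, trans in zip(signs, frac_xyz, shifts, translation)
    )


# Contact I is listed as "#3 / 0, -1, 0"; the coordinate below is made up.
atom = (F(1, 10), F(1, 5), F(3, 10))
mate = apply_contact(atom, 3, (0, -1, 0))
```

Exact fractions are used so that the half-cell shifts of the screw axes do not pick up rounding noise; ordinary floats would work equally well for real coordinates.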

Table 25 Crystal contacts of the BaP1*inhibitor model of dataset II in space group P2₁2₁2₁.

| No. | Sym-op # / translation (a) | Surface area (b) [Å²] | H-bonds / salt bridges | Interfacing residues, chain 1 | Interfacing residues, chain 2 |
|---|---|---|---|---|---|
| I | #3 / 0, -1, 0 | 540 | 7 / 2 | 20-23, 25, 28, 102, 105-106, 127, 129-132, 135, 172 | 5, 7, 9, 34, 38, 50, 52-55, 89-91, 93, 197-202 |
| II | #3 / 1, -1, 0 | 382 | 2 / 0 | 72, 75, 104-105, 107, 110-113, 121, 146, 151-152, 168 | 191-194, 196, 199, 201-202 |
| III | #2 / 0, 1, -1 | 254 | 2 / 1 | 61-66, 84-85, 88-89 | 1-2, 46, 159-161, 177, 180, 187 |
| IV | #1 / -1, 0, 0 | 237 | 4 / 0 | 24-28, 30-31, 34-35, 38 | 112-115, 118, 121, 188, 192-193 |
| V | #4 / -1, 0, 1 | 59 | 0 / 0 | 161-162, 174-175 | 154, 156, 158 |

(a) See above.
(b) Area covered by the contact. Total surface area for chain A, 9'423 Å².

Table 26 Crystal contacts of the BaP1*inhibitor model of dataset III in space group P2₁2₁2₁.

| No. | Sym-op # / translation (a) | Surface area (b) [Å²] | H-bonds / salt bridges | Interfacing residues, chain 1 | Interfacing residues, chain 2 |
|---|---|---|---|---|---|
| I | #3 / 0, -1, 0 | 549 | 7 / 2 | 20-23, 25, 28, 102, 105-106, 127, 129-132, 135, 172 | 5, 7, 9, 34, 38, 50, 52-55, 89-91, 93, 197-202 |
| II | #3 / 1, -1, 0 | 379 | 0 / 0 | 72, 75, 104-105, 107, 110-113, 151-152, 168 | 191-194, 196, 199, 201-202 |
| III | #2 / 0, 1, -1 | 258 | 2 / 2 | 61-66, 84-85, 88-89 | 1-2, 46, 159-161, 177, 180, 187 |
| IV | #1 / -1, 0, 0 | 242 | 3 / 0 | 24-28, 30-31, 34-35, 38 | 112-115, 118, 121, 188, 192-193 |
| V | #4 / -1, 0, 1 | 54 | 0 / 0 | 161, 174-175 | 154, 156, 158 |

(a) See above.
(b) Area covered by the contact. Total surface area for chain A, 9'444 Å².

Table 27 Crystal contacts of the BaP1*inhibitor model of dataset IV in space group P2₁2₁2₁.

| No. | Sym-op # / translation (a) | Surface area (b) [Å²] | H-bonds / salt bridges | Interfacing residues, chain 1 | Interfacing residues, chain 2 |
|---|---|---|---|---|---|
| I | #3 / 0, -1, 0 | 571 | 7 / 2 | 20-23, 25, 28, 102, 105-106, 127, 129-132, 135, 169, 172 | 5, 7, 9, 30, 34, 38, 50, 52-55, 89-91, 93, 197-202 |
| II | #3 / 1, -1, 0 | 369 | 0 / 0 | 72, 75, 104-105, 107, 110-113, 151-152, 168 | 191-194, 196, 199, 201-202 |
| III | #1 / -1, 0, 0 | 253 | 3 / 0 | 24-28, 30-31, 34-35, 38 | 112-115, 118, 121, 188, 192-193 |
| IV | #2 / 0, 1, -1 | 241 | 1 / 0 | 61-66, 84-85, 88-89 | 1-2, 46, 159-161, 177, 180, 187 |
| V | #4 / -1, 0, 1 | 54 | 0 / 0 | 161, 174-175 | 154, 156, 158 |

(a) See above.
(b) Area covered by the contact. Total surface area for chain A, 9'587 Å².

7.4 Refinement statistics of dataset I, II, III, and IV

Table 28 Refinement statistics for all four datasets.

| Dataset | I | II | III | IV |
|---|---|---|---|---|
| Resolution range [Å] | 18.2 - 1.14 | 17.8 - 1.46 | 19.7 - 1.05 | 18.2 - 1.08 |
| Unique reflections | 66'242 | 32'627 | 82'161 | 74'965 |
| Reflections in test set (a) | 2'403 (3.5) | 1'631 (5.0) | 2'542 (3.0) | 2'318 (3.0) |
| Rcryst (b) [%] | 13.0 | 14.8 | 11.7 | 11.8 |
| Rfree (b) [%] | 16.0 | 17.8 | 14.4 | 14.3 |
| Number of atoms: protein | 1747 | 1657 | 1698 | 1702 |
| Number of atoms: water molecules | 345 | 376 | 391 | 370 |
| Number of atoms: ligands | 51 | 37 | 37 | 42 |
| Number of atoms: total | 2143 | 2070 | 2126 | 2114 |
| B-factors [Å²]: main chain | 5.6 | 6.2 | 5.3 | 6.0 |
| B-factors [Å²]: side chain | 8.0 | 9.0 | 8.1 | 8.7 |
| B-factors [Å²]: ligands | 10.3 | 8.2 | 8.2 | 8.0 |
| B-factors [Å²]: water molecules | 20.4 | 22.2 | 19.8 | 20.5 |
| B-factors [Å²]: total | 9.1 | 10.3 | 9.1 | 9.8 |
| RMS deviations (b): bond lengths [Å] | 0.016 | 0.018 | 0.017 | 0.017 |
| RMS deviations (b): bond angles [°] | 1.776 | 1.861 | 1.793 | 1.748 |
| Ramachandran statistics (c): favored region [%] | 98.0 | 98.5 | 98.5 | 99.0 |
| Ramachandran statistics (c): allowed region [%] | 2.0 | 1.5 | 1.5 | 1.0 |
| Ramachandran statistics (c): outlier region [%] | 0.0 | 0.0 | 0.0 | 0.0 |
| PDB accession code | 2W13 | 2W12 | 2W15 | 2W14 |

(a) Percentage of reflections in the test set is given in parentheses.
(b) R-factors and RMS deviations after the last refinement step with REFMAC.
(c) Ramachandran statistics calculated with RAMPAGE.

7.5 SVMP sequences deposited in the UniProtKB/SwissProt database

7.5.1 P-I snake venom metalloproteinases

7.5.2 P-II snake venom metalloproteinases

7.5.3 P-III snake venom metalloproteinases

7.6 Multiple sequence alignments of the metalloproteinase domain of SVMPs

7.6.1 Metalloproteinase domain of P-I snake venom metalloproteinases

Figure 80 Sequence alignment of the catalytic domain of all P-I SVMPs performed with ClustalW2 (Larkin et al., 2007). Residue numbering and secondary structure elements (light gray: α-helices; black: β-sheets; gray: 3₁₀-helix) correspond to those of BaP1 (2W15). Highly conserved residues of the active site and the Met-turn are depicted in white letters on black background. Residues of the highly dynamic loop area are shaded light gray. Sequence identities from binary alignments with the BaP1 sequence are given at the end of each sequence. Consensus symbol explanation: (*) identical residues; (:) conserved residue substitution; (.) semi-conserved residue substitution.
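The sequence identity of a binary alignment, as reported at the end of each aligned sequence, can be computed along the lines of the following sketch. This is an illustration under stated assumptions rather than the calculation performed by ClustalW2: the function name `percent_identity` is hypothetical, and the choice to count only columns in which neither sequence has a gap is one common convention that the thesis does not specify. The two aligned strings are toy examples, not real SVMP sequences.

```python
def percent_identity(aligned_ref, aligned_query, gap="-"):
    """Percent identity of two rows of a pairwise alignment.

    Assumption: columns containing a gap in either row are ignored,
    so the denominator is the number of aligned residue pairs.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both rows carry a residue.
    pairs = [
        (a, b)
        for a, b in zip(aligned_ref, aligned_query)
        if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Toy alignment rows (made up for illustration).
identity = percent_identity("HEMGH-NLG", "HELGHANLG")
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, which gives slightly different percentages for gapped alignments.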

7.6.2 Metalloproteinase domain of P-II snake venom metalloproteinases

Figure 81 Sequence alignment of the catalytic domain of all P-II SVMPs performed with ClustalW2 (Larkin et al., 2007). Residue numbering and secondary structure elements (light gray: α-helices; black: β-sheets; gray: 3₁₀-helix) correspond to those of the P-I SVMP BaP1 (2W15), which is also aligned for reference. Highly conserved residues of the active site and the Met-turn are depicted in white letters on a black background. Residues of the highly dynamic loop area are shaded light gray. Sequence identities from binary alignments with the sequence of BaP1 are given at the end of the sequences. Consensus symbols: (*) identical residues; (:) conserved residue substitution; (.) semi-conserved residue substitution.

7.6.3 Metalloproteinase domain of P-III snake venom metalloproteinases

Figure 82 Sequence alignment of the catalytic domain of all P-III SVMPs performed with ClustalW2 (Larkin et al., 2007). Residue numbering and secondary structure elements (light gray: α-helices; black: β-sheets; gray: 3₁₀-helix) correspond to those of the P-I SVMP BaP1 (2W15), which is also aligned for reference. Highly conserved residues of the active site and the Met-turn are depicted in white letters on a black background. Residues of the highly dynamic loop area are shaded light gray. Sequence identities from binary alignments with the sequence of BaP1 are given at the end of the sequences. Consensus symbols: (*) identical residues; (:) conserved residue substitution; (.) semi-conserved residue substitution.

7.6.4 Statistical distribution of amino acid residues at the active site of SVMPs

Figure 83 All complete sequences of SVMPs deposited in the UniProtKB/SwissProt database were analyzed with respect to the amino acid composition of the highly conserved zinc-binding motif. The analysis covered 35 P-I, 22 P-II, and 27 P-III SVMP sequences from 31 different species, plus a further 48 SVMP sequences of unknown class affiliation (gen.). The 18 residues of an elongated zinc-binding motif show a clear preference for certain physicochemical characteristics of amino acids.
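A per-position tally of this kind can be sketched as follows. The grouping of amino acids into physicochemical classes used for Figure 83 is not stated here, so the four-class scheme in `RESIDUE_CLASS` is an assumption (one common simplification), as are the function name `class_profile` and the made-up 11-residue toy motifs, which are shorter than the 18-residue motif analyzed in the figure.

```python
from collections import Counter

# Assumed, simplified grouping of the 20 standard amino acids.
_CLASSES = {
    "hydrophobic": "AVLIMFWP",
    "polar": "STNQYCG",
    "positive": "KRH",
    "negative": "DE",
}
RESIDUE_CLASS = {res: cls for cls, letters in _CLASSES.items() for res in letters}


def class_profile(motifs):
    """Return, for each motif position, the fraction of sequences per class.

    All motifs must be equally long and contain only standard residues.
    """
    if not motifs:
        raise ValueError("at least one motif is required")
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("all motifs must have the same length")
    profile = []
    for column in zip(*motifs):  # iterate over alignment columns
        counts = Counter(RESIDUE_CLASS[res] for res in column)
        profile.append({cls: n / len(column) for cls, n in counts.items()})
    return profile


# Made-up motif variants for illustration, not taken from the database.
toy_motifs = ["HELGHNLGMEH", "HEMGHNLGVHH", "HEIAHNFGMYH"]
for position, fractions in enumerate(class_profile(toy_motifs), start=1):
    print(position, fractions)
```

Non-standard symbols such as `X` or gap characters raise a `KeyError` in this sketch; real database sequences would need those cases handled explicitly.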

7.6.5 Hemorrhagic and non-hemorrhagic P-I snake venom metalloproteinases

Figure 84 Sequence alignment of the catalytic domain of hemorrhagic and non-hemorrhagic P-I SVMPs performed with ClustalW2 (Larkin et al., 2007). Residue numbering and secondary structure elements (light gray: α-helices; black: β-sheets; gray: 3₁₀-helix) correspond to those of the P-I SVMP BaP1 (2W15). Highly conserved residues of the active site and the Met-turn are depicted in bold letters. Residues of the highly dynamic loop area are shaded gray. Sequence identities from binary alignments with the sequence of BaP1 are given at the end of the sequences. Consensus symbols: (*) identical residues; (:) conserved residue substitution; (.) semi-conserved residue substitution.

7.7 Biological testing of NCI database screening hits

Table 32 NCI compounds chosen for biological testing.

Compound ID a   Structure   Mol. weight [g/mol]   Solvent   Rel. BaP1 activity [%] (50 µM, 100 µM, 100 µM)

111595_19* 319.3 DMSO 97.0 89.3

112506_176 431.5 DMSO 92.2 91.3

116451_8 514.6 DMSO 92.4 99.1

131114_214 391.5 - - -

134169_44 474.5 DMSO 98.7 91.0

137541_166 394.3 - - -

14417_82 470.5 - - -

145975_137 505.5 DMSO 98.4 99.2

148988_202* 466.6 DMSO 94.2 85.7

15228_88* 337.4 DMSO 85.3 89.6

159938_51* 367.3 DMSO 89.6 85.2

159940_143 369.3 DMSO 95.6 91.4

187784_357 442.5 DMSO 96.6 96.4

222550_53 250.3 DMSO 95.5 95.3

243024_234* 340.4 DMSO 87.9 94.1

262668_28 431.5 DMSO 95.0 89.5

262679_10* 417.5 DMSO 83.5 87.9

305987_4 453.4 DMSO 101.7 97.4

307442_187* 415.3 DMSO 88.5 103.2

321073_358* 367.2 DMSO 84.5 97.1

333465_131 431.5 DMSO 92.4 98.1

333760_54* 396.5 DMSO 95.6 89.9

334043_59 362.2 DMSO 99.6 95.4

334319_6 370.4 DMSO 98.6 97.9

335980_131* 431.5 DMSO 90.2 87.7

338490_125* 372.4 DMSO 85.1 84.4

343504_0 431.5 DMSO 96.4 99.2

344553_73* 377.3 DMSO 89.4 97.5

350115_5 393.4 DMSO 100.6 97.0

369685_20 376.4 DMSO 93.5 96.5

372329_55* 382.4 DMSO 103.7 86.6

382712_131* 391.3 DMSO 91.4 85.7

382713_179 319.3 DMSO 102.1 95.2

382924_69* 307.3 DMSO 91.8 87.9

382925_61 342.3 DMSO 99.1 100.3

38751_146* 411.5 DMSO 98.2 84.7

407177_0 446.4 DMSO 101.8 92.3

41795_32 411.4 DMSO 95.9 92.6

45749_214* 467.5 DMSO 85.3 85.2

45750_8 453.5 DMSO 91.4 90.1

46390_260* 417.4 DMSO 87.4 79.9

618432_165 352.4 DMSO 101.4 95.7

618833_17 338.1 DMSO 100.2 98.1

622433_54 351.5 DMSO 92.9 94.8

625896_21 416.9 DMSO 97.8 95.6

627349_24 363.3 DMSO 105.5 97.3

635833_59* 412.5 DMSO 93.7 83.5

637697_271 402.4 DMSO 100.5 97.9

640988_101* 447.4 DMSO 93.4 85.1

640990_257 518.5 DMSO 90.9 94.6

640996_295 441.8 DMSO 107.7 100.0

640998_317 415.8 DMSO 107.6 101.8

641002_154 443.8 DMSO 106.1 100.2

641445_52 495.4 - - -

641992_157 379.8 DMSO 101.3 90.1

644743_145 463.5 - - -

644744_312* 483.3 DMSO 87.8 95.2

644778_268 436.4 DMSO 95.6 92.5

645876_260 637.1 DMSO 103.2 95.1

648607_232 511.5 DMSO 96.4 93.6

648649_224 461.4 - - -

648656_64 517.2 - - -

649558_24* 470.8 DMSO 92.7 88.5

655366_5 449.6 DMSO 100.2 90.5

659162_12 434.4 DMSO 91.1 90.8

661081_83 470.0 DMSO 92.2 98.3

666723_5 329.4 DMSO 93.9 93.8

668455_90 319.2 DMSO 95.7 93.9

675197_109 505.5 DMSO 98.9 92.5

71425_15 483.5 DMSO 97.7 102.3

74817_146 306.4 DMSO 104.6 103.0

86747_58 498.6 DMSO 101.7 99.2

89644_25* 371.4 DMSO 88.4 91.4

97916_42 429.6 DMSO 99.1 92.7

a Compounds marked with an asterisk were also analyzed at a higher concentration.
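For readers who want to work with these values, the rows of Table 32 can be read into a simple structure and ranked by residual BaP1 activity. The sketch below is only an illustration: the names `AssayRow`, `parse_row`, and `lowest_activity` are hypothetical, the four example rows are copied from the table, and the two activity values are kept in table order without assigning them to specific concentrations.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AssayRow:
    compound_id: str
    starred: bool                  # True if the ID carried an asterisk
    mol_weight: float              # g/mol
    solvent: Optional[str]         # None where the table shows "-"
    activities: Tuple[Optional[float], Optional[float]]  # rel. activity [%]


def _num(token: str) -> Optional[float]:
    """Convert a table cell to float; '-' means no value."""
    return None if token == "-" else float(token)


def parse_row(line: str) -> AssayRow:
    """Parse one whitespace-separated row: ID, weight, solvent, two values."""
    cid, weight, solvent, first, second = line.split()
    return AssayRow(
        compound_id=cid.rstrip("*"),
        starred=cid.endswith("*"),
        mol_weight=float(weight),
        solvent=None if solvent == "-" else solvent,
        activities=(_num(first), _num(second)),
    )


def lowest_activity(row: AssayRow) -> Optional[float]:
    """Smallest reported relative activity, or None if nothing was reported."""
    values = [v for v in row.activities if v is not None]
    return min(values) if values else None


ROWS = [
    "111595_19* 319.3 DMSO 97.0 89.3",
    "131114_214 391.5 - - -",
    "46390_260* 417.4 DMSO 87.4 79.9",
    "640996_295 441.8 DMSO 107.7 100.0",
]
rows = [parse_row(r) for r in ROWS]
measured = sorted(
    (r for r in rows if lowest_activity(r) is not None), key=lowest_activity
)
for r in measured:
    print(r.compound_id, lowest_activity(r))
```

Sorting by the lowest reported value puts the compounds with the strongest apparent effect first; rows without reported values are left out of the ranking rather than treated as zero.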

7.8 Acknowledgements

This work was carried out in the group of Prof. I. Merfort at the Institute for Pharmaceutical Sciences, Albert-Ludwigs-University of Freiburg, Germany.

Foremost, I would like to thank Prof. I. Merfort for offering the opportunity to work freely and independently on this interesting and special topic, for her unequivocal support in setting up new collaborations with research groups at the University of Innsbruck, the University of Costa Rica, and the University of Karlsruhe, and for her openness to discussion.

Thanks to everyone at the Instituto Clodomiro Picado, especially José Maria Gutiérrez, Teresa Escalante, and Alexandra Rucavado, for teaching me solid fundamentals related to snake venom proteins and for the great atmosphere in which work did not feel like work. I would also like to thank Prof. Renato Murillo at the Escuela de Química, without whom this PhD thesis probably would not have been possible. Thanks to all the Costa Ricans for so much Pura Vida.

At the University of Innsbruck, I would like to thank everyone at the Computer Aided Molecular Design Group, especially Gerhard Wolber, Patrick Markt, and Simona Distinto, and at the Center for Molecular Biosciences, especially Hannes Wallnoefer and Klaus Liedl, for their endless support, the numerous discussions on in silico topics, and the successful cooperation.

I thank Thomas Steinbrecher for the calculation of binding affinities and pKa values via MD simulations and for his help with the theory behind in silico drug discovery methods.

My special thanks go to the past and present members of the Merfort group, particularly to Christoph Jäger, Kathrin Naumann, Christian Lass, Sandra Ebeling, and Claudia Kern. For help with so many technical and organizational questions, I would like to thank Barbara Schuler.

I am very grateful to the French-Brazilian connection, Agnès Millet, Cleber Schmidt, and Marcio Fronza, for the great atmosphere while sharing the office and for the many talks about hiking, skiing, soccer, Brazilian cachaça, and German beer, which helped me a lot to stay sane during the last few months.

I would like to thank Christian Schleberger, Dirk Reinert, and especially Daniel Kloer of the former Schulz group for teaching me the principles of X-ray crystallography and the essentials of Linux, especially Zossing, and for the critical reading of so many manuscripts.

Finally, I would like to thank my parents for always being there for me and supporting me in all my ideas, as well as my brothers for giving me the possibility to grow up in the best family ever.
