Docking Study of Matrix Metalloproteinase Inhibitors

MASARYK UNIVERSITY FACULTY OF SCIENCE NATIONAL CENTRE FOR BIOMOLECULAR RESEARCH

Docking study of matrix metalloproteinase inhibitors

BACHELOR THESIS

Jan Ryška

Brno, spring 2011

Declaration

I hereby declare that this thesis is my original work, which I have written on my own. All sources, references and literature used or excerpted during its preparation are properly cited and listed with a complete reference to the source.

Jan Ryška

Supervisor: RNDr. Radka Svobodová Vařeková, Ph.D.
Consultant: MSc. Sushil Kumar Mishra

Acknowledgement

I would like to thank my supervisor, RNDr. Radka Svobodová Vařeková, Ph.D., for her patient guidance and help throughout the writing of this thesis. I would also like to thank my consultant, MSc. Sushil Kumar Mishra, for his valuable insights and advice on the topic, Mgr. Martin Prokop, Ph.D., for the implementation of parameters for metals in TRITON, and all members of LCC for their support.

Keywords

docking, matrix metalloproteinase inhibitors, AutoDock, DOCK, zinc parameters, structure-based drug design

Contents

1 Introduction
2 Theory
  2.1 Matrix metalloproteinases
    2.1.1 Structure and function
    2.1.2 Active site
    2.1.3 Inhibition
  2.2 Molecular docking
    2.2.1 Search algorithms
    2.2.2 Scoring function
3 Methods
  3.1 Test set
  3.2 Docking preparation
    3.2.1 Ligand preparation
    3.2.2 Receptor preparation
  3.3 Docking software
    3.3.1 AutoDock 3
    3.3.2 AutoDock 4
    3.3.3 AutoDock Vina
    3.3.4 UCSF DOCK 6.4
  3.4 Analysis of results
    3.4.1 RMSD
    3.4.2 Binding score
4 Results and discussion
  4.1 Software comparison
    4.1.1 Geometry prediction
    4.1.2 Binding affinity prediction
  4.2 Publication outputs
5 Conclusions
6 Summary
7 Souhrn (summary in Czech)
8 Appendices
  8.1 Docking results by receptor
  8.2 List of complexes ordered by binding affinity
  8.3 Contents of the attached CD
Bibliography

1 Introduction

Computational chemistry [1] is a branch of chemistry which uses principles from computer science to solve chemical problems. Computational and classical chemistry are based on the same theoretical grounds, but they also differ in many ways.

Cooperation of computational and classical chemistry is beneficial for both fields. The computational methods and mathematical descriptions of chemical systems are developed based on results from classical chemistry. On the other hand, computational chemistry can be used to predict experimental results, so that researchers can prepare more efficient and targeted experiments. It can also be used to predict properties of molecules which are too unstable to work with, or too difficult or expensive to prepare or purchase.

Molecular docking [2–4] is an in silico technique used to predict the conformation and binding affinity of intermolecular complexes based on the three-dimensional structures of the individual molecules. This method is widely used in the field of structure-based drug design, in which researchers try to find compounds that will form a stable intermolecular complex with a target protein. The target protein is usually known to play a vital role in a pathological process, so finding a potent inhibitor is crucial for disrupting its function.

Initial screening of possibly millions of compounds under laboratory conditions is often too expensive and time-consuming to be feasible, and thus fast molecular docking methods are used to eliminate unlikely candidates.

Like many methods in the field of computational chemistry, molecular docking uses a number of approximations to reduce the time required for each simulation. As it has many applications, we need to be aware of its accuracy and possible limitations. One of the identified challenges is how docking software handles non-standard atoms, such as metal atoms [5].

Because docking programs often employ descriptions of chemical systems based on molecular mechanics, the majority of them require parameters for individual atom types. Extensive effort has been put into the optimization of these parameters for common atom types, such as those of carbon and oxygen. However, parameters for non-standard atoms are usually based on much smaller data sets, or are entirely missing in the software.

In this thesis, we present an overview of molecular docking methods as well as a comparison of several docking programs. As a test set, we used a family of matrix metalloproteinases [6], where the interaction of a ligand with a metal ion is crucial in the process of complex formation.

2 Theory

This chapter is divided into two parts. The first section introduces matrix metalloproteinases, the biologically important proteins that are the focus of this thesis. The second section provides an overview of the theory behind molecular docking and describes several algorithms employed to solve the docking problem.

2.1 Matrix metalloproteinases

Matrix metalloproteinases (MMPs) [6] are a family of zinc-dependent, calcium-containing endopeptidases. The MMPs belong to a larger family of proteases known as the metzincin superfamily [7]. Research in the field of MMPs was initiated in 1962, when Gross and Lapiere [8] reported the discovery of a collagenolytic enzyme involved in resorbing amphibian tadpole tails during metamorphosis. The enzyme was named interstitial collagenase (MMP-1) and became the first member of the MMP family. Since then, many other members of this family have been found in vertebrates, including humans, as well as in invertebrates and plants.

2.1.1 Structure and function

MMPs can be divided into eight groups based on their domain structure [9]. Three common homologous domains include the N-terminal pro-peptide, the catalytic domain and the hemopexin-like C-terminal domain, which is linked to the catalytic domain by a flexible hinge region. Many MMPs also contain one or more specific structural features, as shown in Figure 2.1.

Figure 2.1: Domain structure of MMPs. Meaning of the important abbreviations: Pre: signal peptide; Pro: pro-peptide domain; Fi: fibronectin-like repeats; Fu: recognition motif for furin-like serine proteases; Vn: vitronectin-like inserts [9].

Cysteine switch

MMPs are synthesized in the form of an inactive zymogen. In this form, a conserved cysteine residue in the pro-peptide domain interacts with the zinc atom in the active site and prevents binding and cleavage of a substrate. In most MMPs, this particular cysteine is present in the conserved sequence PRCGxPD. Cleavage of the pro-peptide domain is therefore crucial in the mechanism of MMP activation, the so-called cysteine switch [10].

Upon release of the proenzyme from the cell, changes in the proenzyme conformation or other proteases may open the cysteine switch and trigger proMMP activation. The pro-peptide domain is subsequently removed autocatalytically or by proteases. Other, already active MMPs are also capable of proMMP activation.

Biological function

MMPs perform a variety of roles in living organisms. They are responsible for tissue remodeling and for the degradation of many extracellular matrix (ECM) proteins, including collagens, elastins, gelatin, matrix glycoproteins and proteoglycans [6]. More recently, it has also been recognized that they cleave many other peptides and proteins and perform other functions that may be independent of proteolytic activity [11].

Physiological processes in which MMPs are involved include angiogenesis (formation of new blood vessels), apoptosis (programmed cell death), bone modeling and wound healing.

Role in pathological processes

Under normal physiological conditions, the level of MMP expression and activity is very low. Transcription of these enzymes is tightly regulated by cytokines and growth factors, including transforming growth factors, interleukins (IL-1, IL-4, IL-6) and tumor necrosis factor alpha (TNFα) [12]. Post-transcriptionally, MMP activity is controlled by the interaction between the zinc-containing catalytic site and the N-terminal pro-peptide domain.

When the balance between MMPs and their natural inhibitors is shifted towards enzyme expression and activity, increased tissue degradation occurs. Increased levels of MMP expression have been shown to be involved in a large number of pathological conditions, such as arthritis, Alzheimer’s disease, cardiovascular disease and cancer [6]. MMPs are therefore now considered important pharmaceutical targets, and extensive effort has been put into the design of potential drugs based on MMP inhibition [13, 14].

2.1.2 Active site

The active site of MMPs features two distinct regions [6]. The first one is a groove centered on the catalytic zinc. The second one is the S1’ specificity site, which varies among different members of the family and is important for substrate (or inhibitor) selectivity. The volume of this S1’ subsite varies greatly, from a small hydrophobic pocket in MMP-7 to a very large site in MMP-8. An example illustrating the protein-ligand interactions occurring in the active site is presented in Figure 2.2.

Figure 2.2: Interaction diagram of the MMP-1/CGS complex (PDB ID: 3AYK) generated by LIGPLOT [15]. Note the group chelating the zinc, hydrogen bonds (green) and hydrophobic contacts (orange).


2.1.3 Inhibition

Binding affinity

The strength of the interaction between a receptor protein and a ligand is an important (though not the only) criterion for distinguishing potent inhibitors (i.e. potential drugs) from non-binding compounds. The strength of this attraction between protein and ligand is called binding affinity. It is influenced by non-covalent interactions, such as hydrogen bonding, electrostatic interactions and van der Waals interactions.

MMP inhibitors

As mentioned earlier, MMPs are promising pharmaceutical targets, especially for cancer therapy. A large number of both synthetic and natural inhibitors have been identified and tested in clinical trials, but so far with only limited success.

While many of these compounds showed cytostatic or anti-angiogenic activity, the results were disappointing, either because of side effects or because of low specificity leading to excessive inhibition of MMPs not involved in the particular pathological process [16].

Current effort is focused on computer-aided design of more specific inhibitors based on knowledge of the three-dimensional structures of many MMPs [17].

The requirements [6] for a compound to be a potent MMP inhibitor are the following:

a) a functional group capable of chelating the catalytic zinc(II) ion [e.g. carboxylate (COO−), thiolate (S−) or hydroxamate (CONH-O−)];
b) one or more functional groups capable of interacting with the enzyme backbone via hydrogen bonds;
c) at least one functional group which undergoes van der Waals interactions with the protein subsites.

2.2 Molecular docking

With the rapid increase in computational power, in silico methods have become widely used in the fields of structural molecular biology and structure-based drug design. Molecular docking [2–4] is one of these computational techniques.

Docking is a method which predicts the preferred orientation of one molecule relative to a second one when they bind to form a stable complex. In the field of drug design, the first molecule is usually a protein and the second one is a small organic molecule, a potential drug candidate.

Knowledge of the preferred orientation of ligand and protein can then be used to predict binding affinity, thus discriminating high-affinity drug candidates from low-affinity compounds.

As docking is a well-established technique, certain terms are commonly used in the field. A brief overview of the terminology is presented in Table 2.1.

Lock-and-key analogy

Molecular docking is sometimes described as a lock-and-key problem, where one is interested in finding the correct orientation of a key (ligand) that will open the lock (protein). While this analogy is simple to understand, it does not account for the inherent flexibility of both molecules, which is why the more appropriate term hand-in-glove is sometimes used.

Term                     Meaning
Receptor, host or lock   The "receiving" molecule, commonly a protein
Ligand, guest or key     The complementary molecule binding to a receptor, often a small organic molecule
Binding mode             Relative position of the ligand to the receptor
Pose                     A candidate binding mode
Scoring                  Evaluation of a particular pose based on the number and strength of favorable intermolecular interactions
Ranking                  Classification of ligands based on the predicted binding affinity or binding score

Table 2.1: Docking terminology.

Rigid-body docking vs. flexible docking

The docking problem involves many degrees of freedom [18]. There are three translational and three rotational degrees of freedom for each molecule, as well as the conformational degrees of freedom of both molecules.

The simplest approach to docking is to take into account only the translational and rotational degrees of freedom and to treat both receptor and ligand as rigid objects. This approach is known as rigid-body docking [18]. Whether this approximation is accurate enough depends on the particular case.

If there are significant conformational changes within the molecules during complex formation, this approach is inadequate. However, generation and scoring of all possible conformations is prohibitively expensive in computer time.

Flexible docking algorithms [19] must therefore take into consideration only a selected subset of the possible conformational changes. Today, with the continual increase in computational resources, ligands are often considered flexible and, depending on the required accuracy, the flexibility of amino acid side chains in the vicinity of the active site may be considered as well.

Explicit vs. implicit solvent

Another division can be made based on how the docking software treats the effect of solvent. There are two ways to include the solvent (usually water) and its interaction with the protein in a simulation.

It is possible to include individual water molecules in the calculations [20]. The simplest model treats water molecules as rigid and relies only on non-bonded interactions. Coulomb’s law is used to calculate electrostatic interactions, and dispersion and repulsion forces are treated by the Lennard-Jones potential.

The accuracy of this approach can be enhanced by the addition of interaction sites to each molecule, but including explicit water molecules greatly increases the number of atoms in the system and therefore the computational cost. Because docking simulations need to be as fast as possible, the following approach is commonly used instead.

Implicit solvation [21] (sometimes referred to as continuum solvation) is a method which approximates the behavior of many highly dynamic solvent molecules by a continuous medium. In liquids, the potential of mean force can be used to approximate the averaged behavior of the individual solvent molecules. This approach is less computationally demanding and is commonly used in molecular dynamics and other applications of molecular mechanics.
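As a rough illustration of the explicit, non-bonded model described above, the following Python sketch evaluates the interaction energy of two rigid molecules as a sum of Coulomb and 12-6 Lennard-Jones terms. The function names, the tuple layout of an atom, the combining rules and the conversion constant of 332.06 are assumptions made for this example; they are not taken from any of the programs discussed in this thesis.

```python
import math

# Assumed conversion factor so that charges in e and distances in Angstrom
# give energies in kcal/mol (a value commonly used in molecular mechanics).
COULOMB_CONSTANT = 332.06


def coulomb(r, q1, q2, dielectric=1.0):
    """Electrostatic energy of two point charges separated by distance r."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


def lennard_jones(r, well_depth, r_min):
    """12-6 Lennard-Jones energy; the minimum -well_depth lies at r = r_min."""
    ratio = r_min / r
    return well_depth * (ratio ** 12 - 2.0 * ratio ** 6)


def interaction_energy(molecule_a, molecule_b):
    """Sum pairwise terms over all atom pairs of two rigid molecules.

    Each atom is a tuple (charge, well_depth, radius, (x, y, z)).
    """
    total = 0.0
    for q_a, eps_a, rad_a, xyz_a in molecule_a:
        for q_b, eps_b, rad_b, xyz_b in molecule_b:
            r = math.dist(xyz_a, xyz_b)
            total += coulomb(r, q_a, q_b)
            # Assumed combining rules: geometric mean of the well depths,
            # sum of the radii for the position of the minimum.
            total += lennard_jones(r, math.sqrt(eps_a * eps_b), rad_a + rad_b)
    return total
```

The double loop makes the cost of explicit solvent visible: every added water molecule contributes a new set of atom pairs to be evaluated.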

2.2.1 Search algorithms

The search space which the docking software should take into account theoretically consists of every possible conformation and orientation of the receptor and ligand. While it is impossible to explore this search space exhaustively, an efficient search algorithm is able to explore a large portion of it and identify global extrema (i.e. minima in the energy corresponding to the preferred conformations) [22].

The docking problem can be handled manually with the help of interactive computer graphics. This solution may work if we have a good idea of the binding mode of a similar ligand. Automatic software will, however, be less biased than a human and will consider many more possibilities in a much shorter time frame.

An overview of three commonly used automatic docking algorithms is presented in the following paragraphs.

Shape complementarity

As the name suggests, software using this geometry-based algorithm tries to find the preferred complex conformation based on the degree of shape complementarity [18]. A good example is the algorithm used in the docking program DOCK [23].

The algorithm first generates a "negative image" of the binding site from the molecular surface of the receptor. This image consists of a number of overlapping spheres of varying radii. Each sphere touches the receptor surface at only two points.

Ligand atoms are then matched to the sphere centers to find matching sets (cliques) in which all the distances between the ligand atoms in the set are equal, within a specified tolerance, to the distances between the corresponding sphere centers. The ligand can then be oriented in the binding site by performing a least-squares fit of the atoms to the sphere centers.

After the generated conformations are checked for unfavorable steric clashes, a score is calculated for each conformation. The top-scoring conformations are stored for further analysis.
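The distance-matching condition described above can be sketched in a few lines of Python. This is only an illustration of the compatibility test for one candidate set of atom-sphere pairs, not the clique search implemented in DOCK; the function name, the data layout and the default tolerance of 0.5 Å are assumptions.

```python
import math
from itertools import combinations


def is_distance_compatible(match, atoms, spheres, tolerance=0.5):
    """Check one candidate set of (atom index, sphere index) pairs.

    The set is compatible if every atom-atom distance agrees with the
    corresponding sphere-sphere distance to within the tolerance.
    """
    for (atom_1, sphere_1), (atom_2, sphere_2) in combinations(match, 2):
        d_atoms = math.dist(atoms[atom_1], atoms[atom_2])
        d_spheres = math.dist(spheres[sphere_1], spheres[sphere_2])
        if abs(d_atoms - d_spheres) > tolerance:
            return False
    return True
```

A compatible set of three or more pairs is what the subsequent least-squares fit would use to place the ligand in the binding site.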

Monte Carlo methods

Monte Carlo methods [24] are simulations which use a computer algorithm dependent on a series of (pseudo)random numbers. The name, derived from the famous casino in Monaco, emphasizes the influence of randomness in the method. The basic algorithm can be described in the following steps:

1. Randomly generate starting conformation C1.

2. Calculate energy E1 (e.g. using molecular mechanics).

3. Generate a new conformation C2. At each iteration, this conformation is produced by a random change of the internal conformation of the ligand (i.e. rotation about a bond in the ligand by a random angle) or by rotation or translation of the whole molecule.

4. Calculate E2.

5. Apply the so-called Metropolis criterion to determine whether conformation C2 is accepted in place of the starting conformation C1. The Metropolis criterion works as follows:

   (a) If the difference between the energy of the resulting conformation and the energy of the starting conformation,

       $\Delta E = E_2 - E_1$,   (2.1)

       is negative (i.e. the energy of the resulting conformation is lower), then the resulting conformation is accepted and stored as C1.

   (b) If $\Delta E$ is positive, however, a (pseudo)random number $R$ between 0 and 1, $0 < R < 1$, is generated. The C2 conformation is in this case accepted only if the following condition is true:

       $e^{-\Delta E/T} > R$.   (2.2)

       The parameter $T$ introduced in this equation denotes a temperature-like quantity used to control the acceptance probability of energetically unfavorable states. As with a physical temperature, higher values of this parameter allow high-energy states to be considered.

   (c) If

       $e^{-\Delta E/T} < R$,   (2.3)

       the resulting conformation C2 is rejected.

6. Repeat steps 3-5.

Because Monte Carlo is a stochastic method, it is not guaranteed to find the optimal complex conformation. The severity of the implications for docking has not been firmly established, but various versions of the Monte Carlo approach have been implemented in common docking algorithms.
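The steps above can be sketched as a short Python routine. It is a minimal sketch on a generic state, assuming that the caller supplies an energy function and a random perturbation; the names `metropolis_accept` and `monte_carlo`, the bookkeeping of the best state and the fixed seed are choices made for this example and do not describe any particular docking program.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: always accept downhill moves, and accept
    uphill moves only if exp(-dE/T) exceeds a random number R."""
    if delta_e < 0:
        return True
    return math.exp(-delta_e / temperature) > rng.random()


def monte_carlo(energy, perturb, start, temperature, n_steps, seed=0):
    """Run a basic Metropolis Monte Carlo search and return the
    lowest-energy state visited together with its energy."""
    rng = random.Random(seed)
    current, e_current = start, energy(start)          # steps 1 and 2
    best, e_best = current, e_current
    for _ in range(n_steps):                           # step 6
        candidate = perturb(current, rng)              # step 3
        e_candidate = energy(candidate)                # step 4
        if metropolis_accept(e_candidate - e_current, temperature, rng):
            current, e_current = candidate, e_candidate    # step 5
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


if __name__ == "__main__":
    # Toy one-dimensional "energy" with a single minimum at x = 2.
    result = monte_carlo(
        energy=lambda x: (x - 2.0) ** 2,
        perturb=lambda x, rng: x + rng.uniform(-0.5, 0.5),
        start=10.0,
        temperature=0.1,
        n_steps=5000,
    )
    print(result)
```

In a docking context the state would hold the ligand position, orientation and torsion angles, and the perturbation would change one of them at random.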

Genetic algorithms

Genetic algorithms [25] are search methods that mimic the process of evolution by incorporating techniques inspired by natural evolution, such as inheritance, mutation and crossover.

In a genetic algorithm, an initial population of one-dimensional strings (called chromosomes), which encode candidate solutions (individuals), evolves toward better solutions. In the case of molecular docking, each individual may represent one possible system configuration and each string may contain information about its conformation (e.g. the values of the angles of rotatable bonds).

At the beginning, the initial population is randomly generated. In the next step, a subset of the initial population is chosen based on the results of the fitness function, which evaluates the quality of a particular individual. This subset is subsequently used to produce the next generation. New generations are produced until a certain number of steps has been performed or until a required level of fitness is reached.

One example of a genetic algorithm, which was used in this work, is the Lamarckian Genetic Algorithm (LGA).
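The generational loop described above can be illustrated with a small Python sketch. It assumes that each chromosome is a list of torsion angles in degrees and that a lower value of the supplied function means a fitter individual; the truncation selection, one-point crossover, Gaussian mutation, elitism and all default parameter values are illustrative assumptions rather than the settings of any docking program.

```python
import random


def genetic_search(energy, n_genes, pop_size=30, generations=40,
                   mutation_rate=0.1, seed=0):
    """Evolve a population of float-valued chromosomes toward low energy."""
    rng = random.Random(seed)
    # Random initial population: each gene is an angle in degrees.
    population = [[rng.uniform(-180.0, 180.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=energy)
        parents = ranked[: pop_size // 2]      # selection of the fitter half
        children = [ranked[0][:]]              # elitism: keep the best as is
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = mother[:cut] + father[cut:]            # crossover
            for i in range(n_genes):
                if rng.random() < mutation_rate:           # mutation
                    child[i] += rng.gauss(0.0, 10.0)
            children.append(child)
        population = children
    return min(population, key=energy)
```

Because the best individual is copied unchanged into each new generation, the best energy in the population can never get worse from one generation to the next.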


Lamarckian Genetic Algorithm

LGA is a hybrid genetic algorithm named after Jean-Baptiste Lamarck (1744-1829), a French soldier and academic, who proposed the idea that organisms can pass on to their offspring characteristics that they learned or acquired during their life [26]. While this idea contradicts Mendelian genetics and was later disproved, its implementation in a genetic algorithm may lead to more accurate docking results: an individual can be improved by a local search, and the improved state is written back into its genome and thus inherited by its offspring.

In LGA, the genome is represented by floating point genes (unlike classical genetic algorithms, which use a binary representation), each of which encodes one state variable describing molecular position, orientation or conformation.
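The Lamarckian element, writing the result of a local search back into the genome, can be sketched as follows. A plain random hill climber stands in for the local optimizer here; this substitution, the function names and the default values are assumptions made for the example and do not reproduce the local search used in AutoDock.

```python
import random


def local_search(genes, energy, rng, step=2.0, n_trials=100):
    """Simple hill climbing: keep a random nearby trial only if it
    lowers the energy. Returns the improved genes and their energy."""
    best = list(genes)
    e_best = energy(best)
    for _ in range(n_trials):
        trial = [g + rng.uniform(-step, step) for g in best]
        e_trial = energy(trial)
        if e_trial < e_best:
            best, e_best = trial, e_trial
    return best, e_best


def lamarckian_step(population, energy, rng, search_fraction=0.5):
    """Apply the local search to a random fraction of the individuals
    and store the optimized genes back in the population."""
    updated = []
    for genes in population:
        if rng.random() < search_fraction:
            genes, _ = local_search(genes, energy, rng)
        updated.append(list(genes))
    return updated
```

In a full algorithm this step would be applied once per generation, between evaluation of the fitness function and selection of the parents.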

2.2.2 Scoring function

Search algorithms are able to quickly generate a large number of possible conformations. The "quality" of these possible solutions needs to be compared, so that the best binding modes can be selected. This is the purpose of the scoring function used in docking software [27].

Many of the scoring functions in common use attempt to approximate the binding free energy (or another energy-like quantity) of the ligand binding to the receptor; a low (negative) energy indicates a stable system and thus a likely receptor-ligand binding interaction.

While many ways to predict free energies of binding exist, most of them are too computationally expensive to be of use in the field of molecular docking. Faster, more approximate scoring functions tend to be used. These simplified scoring functions usually assume that the binding free energy can be written as a sum of several additive components representing various contributions to the binding free energy. An equation of this kind would have the following contributions:

$\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{solvent}} + \Delta G_{\mathrm{conf}} + \Delta G_{\mathrm{int}} + \Delta G_{\mathrm{rot}} + \Delta G_{\mathrm{t/r}} + \Delta G_{\mathrm{vib}}$   (2.4)

$\Delta G_{\mathrm{solvent}}$ represents the contributions of solvent effects, which arise from the interaction of the solvent with the ligand, the receptor and the intermolecular complex. $\Delta G_{\mathrm{conf}}$ arises from the conformational changes in both the protein and, especially, the more flexible ligand. $\Delta G_{\mathrm{int}}$ stands for the free energy of specific protein-ligand interactions. $\Delta G_{\mathrm{rot}}$ is the free energy loss caused by freezing of the internal rotations. $\Delta G_{\mathrm{t/r}}$ is the change in translational and rotational free energy due to the association of receptor and ligand into a single body, and $\Delta G_{\mathrm{vib}}$ corresponds to free energy changes in vibrational modes. More details on each term can be found, for example, in [18] or [28].

3 Methods

This chapter describes the procedure used for docking of MMP/ligand complexes and for analysis of the results. It also reports the auxiliary software used in docking preparation and evaluation, and gives more details on the algorithms and parameters employed by the tested docking software.

3.1 Test set

The test set used in this thesis consists of 38 complexes of MMPs with various ligands. Three-dimensional coordinates of the MMP/ligand structures were obtained from the RCSB Protein Data Bank [29]. The structures were determined experimentally by X-ray crystallography or NMR spectroscopy.

Complexes of the following MMPs were used: MMP-1, -3, -7, -9, -12, -13 and -20. These were reported in complexes with ligands varying in size (the smallest ligand contains only 5 heavy atoms, the largest 33) and in the functional groups they contain.

3.2 Docking preparation

At the beginning of the docking procedure, water and possibly other superfluous molecules (e.g. artifacts of the crystallization process) are removed from the structure, and the three-dimensional coordinates of receptor and ligand are divided into separate files. Each receptor and ligand is subsequently prepared for docking in several steps. This section describes the steps of the docking procedure that are common to all tested software.

3.2.1 Ligand preparation

The geometry of each ligand was optimized and its partial charges were calculated. For the geometry optimization, we used the Hartree-Fock method with the 6-31G* basis set, as implemented in Gaussian 03 [30].

Partial charges were calculated by antechamber [31]. Antechamber is a program in the Amber software suite [32], which can be used for atom type assignment, conversion between formats and generation of charges by several implemented methods.

Docking programs are usually tuned to work better with partial charges calculated by a certain method. For this reason, we used different charge calculation methods for AutoDock and DOCK. These are mentioned in Sections 3.3.1 and 3.3.4, respectively.

The process of ligand preparation can be summarized in these steps:

1. Add all the hydrogen atoms using UCSF Chimera [33]. Hydrogen atoms are important for correct geometry optimization and calculation of partial charges.

2. Convert the ligand file to the .com format with Open Babel [34]. This file contains the three-dimensional Cartesian coordinates of each ligand atom, as well as commands recognized by Gaussian.

3. Run the geometry optimization procedure in Gaussian using the previously prepared .com file as input. The output file will contain the ligand with optimized geometry.

4. Assign partial charges by antechamber.
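To make step 2 more concrete, the following Python sketch writes a Gaussian input (.com) file for a Hartree-Fock/6-31G* geometry optimization. The route line uses standard Gaussian keywords, but the exact input used in this work is not reproduced here; the function name, the checkpoint file name and the default charge and multiplicity are assumptions, and in the procedure above the file was produced by Open Babel rather than by a script.

```python
def gaussian_input(title, atoms, charge=0, multiplicity=1,
                   checkpoint="ligand.chk"):
    """Build the text of a Gaussian input file.

    atoms is a list of (element symbol, (x, y, z)) with coordinates
    in Angstrom. The checkpoint file name is a hypothetical default.
    """
    lines = [
        f"%chk={checkpoint}",     # Link 0 command: checkpoint file
        "# HF/6-31G* Opt",        # route section: method, basis set, job type
        "",
        title,                    # title section
        "",
        f"{charge} {multiplicity}",
    ]
    for element, (x, y, z) in atoms:
        lines.append(f"{element:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines.append("")              # the molecule block ends with a blank line
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    water = [("O", (0.0, 0.0, 0.117)),
             ("H", (0.0, 0.757, -0.469)),
             ("H", (0.0, -0.757, -0.469))]
    print(gaussian_input("water test", water))
```

For charged ligands, such as those containing a carboxylate or hydroxamate group, the total charge passed to the function has to match the protonation state chosen in step 1.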

3.2.2 Receptor preparation

The receptor structures were checked and, where necessary, corrected by tleap (using the ff99SB force field), another program in the Amber suite, which is used to prepare input files for simulation programs. These corrections included the addition of missing atoms, such as N- and C-terminal atoms or missing hydrogens. Further manipulation of the receptor structure then depends on the particular docking software used.

3.3 Docking software

In this section, we present detailed information about the tested docking software, including software-specific steps in receptor and ligand preparation as well as the parameters used in the docking simulations.

3.3.1 AutoDock 3

AutoDock 3 [35] is an automatic docking program introduced in 1998. While preceding versions of AutoDock used the Metropolis method described in Section 2.2.1 to search the conformational space, version 3 introduced an implementation of the Lamarckian Genetic Algorithm (LGA). It also uses an enhanced scoring function [36] based on the principles of QSAR (quantitative structure-activity relationship), which was parameterized using a large number of protein-ligand complexes.

AutoDock actually consists of two main programs: AutoDock and AutoGrid. AutoDock performs the actual docking of the ligand to a set of pre-calculated grids describing the target protein. AutoGrid, which is run prior to AutoDock, calculates these grids.

AutoGrid

AutoGrid pre-calculates grid maps of interaction energies between a macromolecule, such as a protein, and various atom types, such as aliphatic carbons or hydrogen-bonding oxygens. This pre-calculation saves time during the docking, as it reduces the order of complexity of the problem from N^2 to N, with N being the number of interacting atoms.
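The idea behind the grid maps can be shown with a small Python sketch: the receptor is visited once to fill the grid, and every later pose is scored by one table lookup per ligand atom instead of a loop over all receptor atoms. The dictionary-based grid, the nearest-point lookup (used here instead of interpolation for brevity) and all names are assumptions made for this example and do not describe AutoGrid's file format or implementation.

```python
import math


def build_grid(receptor, origin, spacing, shape, pair_energy):
    """Pre-calculate, for one probe atom type, the summed interaction
    energy with all receptor atoms at every grid point."""
    nx, ny, nz = shape
    grid = {}
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                point = (origin[0] + i * spacing,
                         origin[1] + j * spacing,
                         origin[2] + k * spacing)
                grid[(i, j, k)] = sum(pair_energy(math.dist(point, atom))
                                      for atom in receptor)
    return grid


def lookup(grid, origin, spacing, xyz):
    """Energy of a probe atom at xyz, taken from the nearest grid point."""
    index = tuple(round((c - o) / spacing) for c, o in zip(xyz, origin))
    return grid[index]


def ligand_energy(grid, origin, spacing, ligand_coords):
    """Score a pose with one lookup per ligand atom."""
    return sum(lookup(grid, origin, spacing, xyz) for xyz in ligand_coords)
```

A separate grid would be needed for each ligand atom type, which is why the maps are calculated per atom type before the docking starts.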

Ligand preparation

RESP charges were added to the ligand file by antechamber. Further steps in ligand and receptor preparation, as well as preparation of the docking simulation, were carried out in the in-house developed interactive graphics software TRITON [37].

In the next step, the non-polar hydrogens (i.e. the hydrogens in methyl and methylene groups) were merged. This step is necessary because AutoDock uses the United Atom model [18].

United Atom model

AutoDock uses the United Atom model to simplify the system and to reduce the number of degrees of freedom. In this model, non-polar hydrogens are deleted and their charges are merged into the carbon atom to which they are bonded. In this way, methyl and methylene groups effectively form a single interaction center.
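The merging step can be sketched in Python as follows. The sketch treats every hydrogen bonded to a carbon as non-polar, adds its charge to that carbon and removes it; the dictionary-based molecule representation and the function name are assumptions made for the example and do not correspond to the data structures of TRITON or AutoDock.

```python
def merge_nonpolar_hydrogens(atoms, bonds):
    """Return a copy of the molecule with carbon-bound hydrogens removed
    and their partial charges added to the parent carbon.

    atoms: dict mapping atom id -> {"element": str, "charge": float}
    bonds: list of (atom id, atom id) pairs
    """
    neighbors = {atom_id: [] for atom_id in atoms}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    merged = {atom_id: dict(atom) for atom_id, atom in atoms.items()}
    for atom_id, atom in atoms.items():
        if atom["element"] != "H":
            continue
        bonded = neighbors[atom_id]
        # Non-polar hydrogen: bonded to exactly one atom, which is a carbon.
        if len(bonded) == 1 and atoms[bonded[0]]["element"] == "C":
            merged[bonded[0]]["charge"] += atom["charge"]
            del merged[atom_id]
    return merged
```

The total charge of the molecule is unchanged by this operation; polar hydrogens, such as those bonded to oxygen or nitrogen, are kept as explicit atoms.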

Another parameter that affects the accuracy and the time required for the simulation is the flexibility allowed to the ligand. This flexibility is expressed as the number of rotatable bonds.

While taking the full flexibility of a ligand into account is desirable, as a larger portion of the conformational space becomes accessible this way, ligands with too many rotatable bonds may present a challenge to the docking software and increase the time required to complete the simulation.

In our case, the flexibility of the ligands was not restricted. That is, all bonds except those that are inherently non-rotatable (e.g. double or aromatic bonds) were considered freely rotatable.

Receptor preparation

The charges on receptor atoms were assigned using the Kollman united-atom force field (ff84) [38], as recommended by the AutoDock authors. The charges on metal atoms, namely calcium (+2.0 e) and zinc (+0.95 e), were added manually. The non-polar hydrogens were merged and solvation parameters were set automatically by TRITON.

Zinc parameters

The catalytic zinc plays a crucial role in the binding of a ligand to the active site. Zinc parameters are therefore very important for correct conformation prediction. For this thesis, the results of a parametrization study [39], summarized in Table 3.1, were used.

Parameter     Value
Charge        +0.95 e
Atom radius   0.87 Å
Well depth    0.35 kcal/mol

Table 3.1: Values of zinc parameters.

Specifically for this work, parameters for metal atoms and the ability to conveniently change atom radii and well depths were implemented in TRITON.

Docking preparation

Once the receptor and ligand input files have been prepared, it is necessary to set up the parameters used for the actual docking simulation, such as the search algorithm, the search exhaustiveness and the location of the input files.

Search space definition

Docking software requires a definition of the search space. Because we know the location of the active site, we can reduce the size of the search space to a box enclosing the binding site. In our case, each box was centered on the catalytic zinc and its dimensions were proportional to the ligand size, so that the ligand could freely move and rotate in the box during the simulation.
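One possible way to construct such a box is sketched below in Python. The text above states only that the box was centered on the zinc and scaled with the ligand; the specific rule used here (a cube whose edge equals the longest interatomic distance in the ligand plus a padding), the padding value and the function names are assumptions made for illustration.

```python
import math
from itertools import combinations


def search_box(zinc_xyz, ligand_coords, padding=8.0):
    """Cubic search box centered on the catalytic zinc.

    The edge length is the largest distance between any two ligand
    atoms plus an assumed padding (in Angstrom), so that the ligand
    can rotate freely inside the box.
    """
    longest = max((math.dist(a, b)
                   for a, b in combinations(ligand_coords, 2)), default=0.0)
    edge = longest + padding
    return {"center": tuple(zinc_xyz), "size": (edge, edge, edge)}


def contains(box, xyz):
    """True if the point lies inside the box (boundaries included)."""
    return all(abs(c - center) <= size / 2.0
               for c, center, size in zip(xyz, box["center"], box["size"]))
```

The same center and dimensions can then be passed to each docking program, which keeps the search spaces comparable between programs.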

LGA was used as the search algorithm and the maximum number of energy evaluations was set to 2.5 × 10^6. One hundred docking runs were performed and the best-scoring conformations were saved for analysis. The remaining parameters were left at their default values.

3.3.2 AutoDock 4

Although AutoDock 4 [40] has several new features and improvements over AutoDock 3, such as an enhanced scoring function, the docking procedure itself remains very similar. The only notable exception is that it uses different atom types, which were set automatically by TRITON.

For the catalytic zinc, the same parameter set as in AutoDock 3 was used, as we are currently not aware of a similar parametrization study focused on AutoDock 4.

3.3.3 AutoDock Vina

AutoDock Vina [41] is the newest member of the AutoDock suite, introduced in 2010. It was developed by Dr. Oleg Trott in the Molecular Graphics Lab at The Scripps Research Institute.

It differs from previous versions in many regards, one of them being user-friendliness. Only the three-dimensional structures of the molecules (with polar hydrogens only, as AutoDock Vina still uses the United Atom model) and a box definition of the search space are required. Partial charges, solvation parameters and pre-calculated interaction energy grids are not necessary for the simulation.

Unlike previous versions, it does not provide the user with a choice of search algorithm; instead, it uses an Iterated Local Search global optimizer. In this algorithm, steps consisting of a mutation and a subsequent local optimization are performed, with each step being accepted according to the Metropolis criterion. Details on this algorithm, as well as on the scoring function used by AutoDock Vina, can be found in the original paper [41].

The preparation of receptor and ligand structures was performed using AutoDock Tools [42], an interactive graphics program distributed with AutoDock.
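The small amount of input that AutoDock Vina needs can be illustrated by a Python helper that writes a Vina configuration file. The option names (receptor, ligand, out, center_x, size_x, exhaustiveness and so on) are those of Vina's configuration format; the function name, the file names and the numerical values in the usage example are hypothetical and are not the settings used in this work.

```python
def vina_config(receptor, ligand, center, size, out, exhaustiveness=8):
    """Return the text of an AutoDock Vina configuration file.

    center and size are (x, y, z) tuples in Angstrom describing the
    search box; receptor and ligand are paths to PDBQT files.
    """
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"out = {out}",
        f"center_x = {cx:.3f}",
        f"center_y = {cy:.3f}",
        f"center_z = {cz:.3f}",
        f"size_x = {sx:.3f}",
        f"size_y = {sy:.3f}",
        f"size_z = {sz:.3f}",
        f"exhaustiveness = {exhaustiveness}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names and box; the center would be the zinc position.
    print(vina_config("receptor.pdbqt", "ligand.pdbqt",
                      (1.0, 2.0, 3.0), (20.0, 20.0, 20.0), "docked.pdbqt"))
```

The resulting file would typically be passed to the program with its `--config` option.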

3.3.4 UCSF DOCK 6.4

UCSF DOCK [23] is docking software that uses the geometry-based search algorithm described in Section 2.2.1. For scoring, we used the original DOCK score as well as the GBSA (generalized Born/surface area) method [43], which is more computationally expensive, for re-scoring of the best conformations.

Ligand and receptor preparation

Both ligands and proteins were prepared for docking using UCSF Chimera. This interactive graphics software includes a set of tools for convenient preparation of molecules, grouped under the Dock Prep procedure. It automatically carries out all necessary steps (deletion of solvent molecules, addition of hydrogens, charge assignment) and writes the output to a file in Mol2 format. The charges used in DOCK were calculated by the AM1-BCC method [44].

Sphere generation

The spheres required by the DOCK algorithm were generated using the sphgen program [23], which is provided with DOCK. As sphgen creates a negative image of the protein, it needs information about the receptor surface. The first step in sphere set generation was to prepare a file containing only the receptor molecule without any hydrogens. This file was used as input for the program dms [45], which calculated the surface of the receptor and saved it in the .ms file format. Sphgen subsequently used the .ms file for the actual sphere generation. As with AutoDock, because the location of the active site is known, the search space can be narrowed down significantly. In DOCK, this is achieved by creating a box (with the same center and dimensions as in AutoDock) around the binding site and by selecting the spheres to be used in the docking procedure. In our case, only spheres within 10 Å of the original ligand conformation were selected, using the sphere_selector utility.
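The distance criterion used for sphere selection can be sketched in a few lines of Python. This is not the code of the DOCK utility; it only mirrors the criterion described above, under the assumption that a sphere is kept when its center lies within the cutoff of at least one ligand atom. The function name and the coordinates are hypothetical.

```python
import numpy as np


def select_spheres(sphere_centers, ligand_xyz, cutoff=10.0):
    """Keep spheres whose center is within `cutoff` Å of any ligand atom."""
    spheres = np.asarray(sphere_centers, dtype=float)
    ligand = np.asarray(ligand_xyz, dtype=float)
    # Distance from every sphere center to every ligand atom.
    dist = np.linalg.norm(spheres[:, None, :] - ligand[None, :, :], axis=-1)
    keep = dist.min(axis=1) <= cutoff
    return spheres[keep]


# Made-up coordinates, for illustration only.
spheres = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [25.0, 0.0, 0.0]]
ligand = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
print(select_spheres(spheres, ligand))
```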

3.4 Analysis of results

Two commonly used criteria were utilized for assessment of docking software accuracy.

3.4.1 RMSD

The first way to evaluate the quality of a docked pose is to compare its geometry with the original experimental structure. The difference between two conformations (or any three-dimensional structures) is often measured by computing the root-mean-square deviation (RMSD) [46].

RMSD can be calculated using the formula

$$\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \delta_i^2}, \tag{3.1}$$

where N is the total number of atoms in the molecule and δ_i is the distance between the i-th pair of corresponding atoms. Given the accuracy of current docking software, an RMSD of 2 Å is commonly used as a cutoff. Poses closer to the experimentally determined structure (i.e. with RMSD lower than 2 Å) are generally considered sound. RMSD for heavy atoms was calculated using the RMSD Tool plugin of the interactive graphics software VMD [47].
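For illustration, Equation 3.1 can be evaluated with the short Python sketch below. It assumes that both coordinate sets list the same atoms in the same order and that no superposition is applied, i.e. the docked pose is compared with the crystal pose in the receptor frame. The function name and the coordinates are hypothetical; the values reported in this thesis were obtained with VMD, not with this code.

```python
import numpy as np


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate sets (Å)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both structures must contain the same atoms")
    # Squared distance delta_i^2 for each pair of corresponding atoms.
    delta_sq = ((a - b) ** 2).sum(axis=1)
    return float(np.sqrt(delta_sq.mean()))


# Made-up two-atom example: the docked pose is shifted by 1 Å along z.
crystal = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
docked = [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]]
print(rmsd(crystal, docked))  # 1.0
```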

3.4.2 Binding score

The accuracy of a scoring function was measured by comparing the predicted binding score of a ligand with the experimental value of the free energy of binding. Because scoring functions employed in docking software use various approximations to enhance their speed, their accuracy does not reach the level of more computationally expensive methods. To provide context, the standard error of AutoDock 4 was estimated to be around 2.5 kcal/mol [48]. When binding energies predicted by docking software are compared with experimentally determined values, two criteria are commonly considered. The first is a comparison of the absolute values of energy, i.e. the accuracy of the binding energy prediction. In drug design, however, researchers are often more interested in the potency of inhibitors relative to each other. For this purpose, docking software should ideally be able to rank the ligands from the most to the least potent (predict the correct binding trend), even if the absolute values of binding energy are not accurate. Two correlation coefficients are commonly used [49] to quantify the relationship between the actual and the predicted binding trend.


Pearson's correlation coefficient

The Pearson product-moment correlation coefficient [50] is a measure of correlation between variables X and Y. It is defined as the covariance of the two variables divided by the product of their standard deviations:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}. \tag{3.2}$$

The above formula defines the population correlation coefficient ρ. If we substitute estimates of the covariance and variances based on a sample, we get the sample correlation coefficient r:

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}. \tag{3.3}$$

Pearson's correlation coefficient ranges from -1 to +1. A value of +1 means that all data points lie on a line for which Y increases as X increases. A value of -1 implies that all data points lie on a line for which Y decreases as X increases. A value of 0 implies that there is no linear relationship between X and Y. The higher the value of Pearson's coefficient between the experimental and predicted binding energies, the better the ability of the docking software to correctly determine the order of potency of the ligands.
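Equation 3.3 translates directly into code. The sketch below is a plain-Python illustration (the function name is hypothetical); as example input it uses the experimental and AutoDock Vina values for the three MMP-12 complexes listed in Table 8.12. Three points are far too few for a meaningful correlation and serve only to show the calculation.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient r (Equation 3.3)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the root sums of squared deviations.
    den = math.sqrt(sum((x - mean_x) ** 2 for x in xs)) * math.sqrt(
        sum((y - mean_y) ** 2 for y in ys)
    )
    return num / den


# MMP-12 complexes 1JIZ, 1RMZ, 1Y93 (Table 8.12), in kcal/mol.
experimental = [-11.9, -10.9, -2.9]
vina = [-7.6, -7.5, -4.3]
print(pearson_r(experimental, vina))
```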

Spearman's rank correlation coefficient

Spearman's rank correlation coefficient [51] is defined as Pearson's correlation coefficient between ranked values. The values X_i, Y_i are first converted to ranks x_i, y_i and ρ_s is computed from these:

$$\rho_s = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}. \tag{3.4}$$

The reason why it is used alongside Pearson's correlation coefficient is that it is much less influenced by outlier values.
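The conversion to ranks is the only step that distinguishes Equation 3.4 from Equation 3.3. The following sketch shows one way to do it in plain Python; assigning tied values the mean of their ranks is a common convention and is an assumption here, as are the function names and the toy data.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end of the group of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den


def spearman_rho(xs, ys):
    """Spearman's coefficient: Pearson's coefficient of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Toy data: y grows monotonically with x, but the last value is an outlier.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 4.0, 9.0, 100.0]
print(spearman_rho(x, y), pearson(x, y))
```

Because only the order of the values enters the calculation, the outlier in the toy data leaves the rank correlation at its maximum while lowering Pearson's coefficient.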

4 Results and discussion

In this chapter, we present the results of the docking simulations performed in the course of this thesis and compare the accuracy of the tested docking programs. For raw data (i.e. RMSD and predicted binding energy for individual complexes), see Section 8.1.

4.1 Software comparison

4.1.1 Geometry prediction

The geometry prediction accuracy of the individual programs is presented in Figure 4.1 and Table 4.1. Figure 4.1 shows how many MMP/ligand complexes each program predicted with RMSD in a certain range, while Table 4.1 summarizes these data in a single value of average RMSD.

[Figure 4.1: bar chart of the number of complexes (y-axis) in each RMSD range in Å (x-axis: < 1, 1-2, 2-3, 3-4, > 4) for AutoDock 3, AutoDock 4, AutoDock Vina, UCSF DOCK 6.4 (DOCK score) and UCSF DOCK 6.4 (GBSA score).]

Figure 4.1: Comparison of geometry prediction accuracy in terms of RMSD.


Software            AD3    AD4    Vina   DOCK   GBSA
Average RMSD (Å)    2.03   2.73   1.49   6.02   6.33

Table 4.1: Comparison of software accuracy in terms of average RMSD.
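The two summaries used here, counts per RMSD range and the average RMSD, can be reproduced from the raw data with a few lines of Python. The sketch below uses the AutoDock 3 values for the five MMP-1 complexes from Table 8.1 as input. The function names are hypothetical, and two conventions are assumptions: a value lying exactly on a bin edge is placed in the higher bin, and missing values (N/A, passed as `None`) are skipped.

```python
from bisect import bisect_right

BIN_EDGES = [1.0, 2.0, 3.0, 4.0]
BIN_LABELS = ["< 1", "1 - 2", "2 - 3", "3 - 4", "> 4"]


def rmsd_histogram(values):
    """Count RMSD values (Å) in the ranges used in Figure 4.1."""
    counts = [0] * len(BIN_LABELS)
    for value in values:
        if value is None:  # complex for which docking gave no result
            continue
        counts[bisect_right(BIN_EDGES, value)] += 1
    return dict(zip(BIN_LABELS, counts))


def average_rmsd(values):
    """Mean RMSD over the complexes for which a value is available."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid)


# AutoDock 3, MMP-1 complexes 966C, 3AYK, 1FBL, 2TCL, 1HFC (Table 8.1).
ad3_mmp1 = [0.86, 2.05, 3.72, 2.38, 2.81]
print(rmsd_histogram(ad3_mmp1))
print(round(average_rmsd(ad3_mmp1), 3))  # 2.364
```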

AutoDock Vina

When we consider the accuracy of ligand geometry prediction expressed in RMSD values, AutoDock Vina provided the best results. As can be seen in Figure 4.1, its results are also very consistent, with only one MMP/ligand complex predicted with RMSD higher than 4 Å. This conclusion is confirmed by the average RMSD values.

AutoDock 3, AutoDock 4

Although AutoDock 4 predicted a number of complexes with RMSD higher than 4 Å, both programs provided reasonably good results, with the majority of conformations under the 2 Å threshold.

UCSF DOCK 6.4

DOCK, a commonly used and widely recognized docking program, failed in this test. After examining the output files, we found that DOCK often generated poses very close to the native structure, but evaluated their energy as much higher than that of the best-scoring pose. This was apparent in the case of MMP-13, which has a narrow tunnel in the active site in which the ligand is usually deeply buried. While all versions of AutoDock mostly managed to predict the correct ligand position, DOCK repeatedly placed the ligand on the edge of the binding pocket. In the case of three MMP-13/ligand complexes, its Anchor-and-Grow algorithm was not able to finish the ligand placement. Although we generally followed the instructions given in the DOCK manual and mailing lists, this failure may have arisen from the van der Waals parameter file used, as the vdW term often contributed the most to the high energies of native conformations. Further investigation is, however, needed to confirm the cause.

4.1.2 Binding afﬁnity prediction

Figure 4.2 shows the binding trends predicted by each docking program. Each data point represents one complex, and the complexes are ordered by decreasing binding potency (expressed in kcal/mol as negative free energy of binding or negative binding score) as determined from experimental data. For a list of MMP/ligand complexes ordered by binding energy, see Section 8.2. To see how accurately the docking software predicted the binding trend, we calculated Spearman's rank correlation coefficient and Pearson's sample correlation coefficient. The values are shown in Table 4.2. The initial values of Pearson's coefficient (r1) were close to zero, suggesting weak predictive capability of the tested software. As Pearson's correlation coefficient is much more sensitive to the presence of outliers, we examined the data sets and identified several outliers. Exclusion of three out of thirty-eight data points increased Pearson's sample correlation coefficient (r2), most notably in the case of AutoDock 4 (from 0.05 to 0.45).

Software   AD3     AD4     Vina    DOCK    GBSA
r1         0.06    0.05    -0.04   -0.01   -0.21
r2         0.25    0.45    0.10    -0.01   -0.21
ρs         0.001   0.23    -0.06   0.09    -0.13

Table 4.2: Pearson’s sample correlation coefﬁcients including (r1) and excluding (r2) outliers and Spearman’s rank correlation coefﬁcients (ρs).
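The effect of excluding data points on Pearson's coefficient can be illustrated with the following sketch. It assumes that the indices of the outliers are supplied by the user; how the outliers were identified in this thesis is not reproduced here. The function name and the toy data are hypothetical.

```python
import numpy as np


def pearson_with_exclusion(x, y, excluded=()):
    """Pearson's r computed after dropping the data points in `excluded`."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.ones(len(x), dtype=bool)
    keep[np.array(list(excluded), dtype=int)] = False
    # np.corrcoef returns the 2 x 2 correlation matrix of the two samples.
    return float(np.corrcoef(x[keep], y[keep])[0, 1])


# Toy data: a perfect linear trend spoiled by the last point.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.0, 3.0, 4.0, -20.0]
r_all = pearson_with_exclusion(x, y)
r_without_outlier = pearson_with_exclusion(x, y, excluded=[4])
print(r_all, r_without_outlier)
```

A single extreme point is enough to reverse the sign of the coefficient in this toy example, which is why r1 and r2 are reported side by side.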

These results suggest that while docking programs are often able to predict the correct binding modes of MMP/ligand complexes, they are still inaccurate in predicting binding affinity. This is in agreement with a recently published article [49], in which the authors tested several commercially available docking programs.


[Figure 4.2: six panels (Experiment, AutoDock 3, AutoDock 4, AutoDock Vina, DOCK score, GBSA score); x-axis: MMP/ligand complex; y-axis: binding score.]

Figure 4.2: Comparison of experimental binding energy and binding score predicted by each software.

4.2 Publication outputs

The results reported in this thesis have also been presented at a conference:

Ryška J., Mishra S. K., Svobodová Vařeková R., Koča J.: Docking study of matrix metalloproteinase inhibitors. IX Discussions in Structural Molecular Biology, 2011. (Poster, March 2011)

5 Conclusions

This thesis is focused on the evaluation of molecular docking methods and their applications in structure-based drug design.

The first part provides an introduction to in silico molecular docking techniques. It also presents information on the family of matrix metalloproteinases, the zinc-dependent proteins used in this study to compare the accuracy of docking software.

The second part describes in detail the procedure used for docking of MMP/ligand complexes, and gives an overview of the tested software and of the criteria used in its assessment.

The third part presents the docking results obtained from the individual programs and their comparison. These results show that the recently developed AutoDock Vina finds the ligand conformations closest to the crystal structure. All tested programs, however, had significant problems predicting reasonable binding energies for the docked complexes.

Further work may include closer examination of the UCSF DOCK results and refinement of its docking procedure for metalloproteins. In future research, AutoDock Vina can be used for docking compounds of interest whose crystal structure is not known. To obtain a more accurate estimate of binding affinity, the binding energy should be calculated using molecular-dynamics-based free energy methods such as LIE or MM-PBSA.

6 Summary

Molecular docking is an important tool in computational chemistry and computer-aided drug design. The goal of ligand-protein docking is to identify favored binding modes of a ligand with a protein of known three-dimensional structure.

This thesis is focused on describing several approaches and algorithms used to find the optimal conformation of the resulting ligand-protein complex. It also aims to provide an overview and assessment of several commonly used docking programs.

We tested the programs on a set of matrix metalloproteinases to evaluate their accuracy and their treatment of metal atoms. The initial protein and ligand structures were optimized and docked with each program, and the results were compared with experimental data.

While the programs were often able to find correct ligand conformations, the results revealed significant problems of the tested docking software in predicting binding energy.

7 Souhrn

Molecular docking is an important tool used in many areas of computational chemistry. The goal of protein-ligand docking is to find energetically favorable binding modes of a ligand with a protein whose three-dimensional structure is known.

This thesis focuses on describing several common approaches and algorithms used to search for the optimal conformation of the resulting protein-ligand complex. It also aims to provide an overview and comparison of several commonly used docking programs.

The software was tested on the family of matrix metalloproteinases in order to evaluate how the individual programs treat metal atoms. The structures of all proteins and ligands were optimized and docked with all programs, and the results were compared with experimentally measured data.

The evaluated software was in many cases able to find the correct ligand conformation; however, the results show serious problems in the calculation of the binding energies of the resulting complexes.

8 Appendices

8.1 Docking results by receptor

The following tables present the RMSD values and binding scores (energies) predicted by AutoDock 3 (AD3), AutoDock 4 (AD4), AutoDock Vina (Vina) and the two methods (original DOCK score and GBSA re-scoring) implemented in UCSF DOCK 6.4.

MMP-1

RMSD (Å)
PDB ID   Ligand   AD3    AD4    Vina   DOCK   GBSA
966C     RS2      0.86   0.98   1.13   2.64   3.59
3AYK     CGS      2.05   1.94   1.87   4.46   7.54
1FBL     HTA      3.72   6.21   1.83   7.32   7.35
2TCL     RO4      2.38   2.00   2.40   7.50   7.48
1HFC     PLH      2.81   6.31   2.13   6.49   6.66

Table 8.1: RMSD values of MMP-1/ligand complexes.

Binding score / energy (kcal/mol)
PDB ID   Exp.    AD3     AD4     Vina    DOCK    GBSA
966C     -10.4   -12.0   -8.8    -9.7    -66.8   -14.4
3AYK     -10.6   -10.4   -6.6    -7.3    -62.4   +0.2
1FBL     -11.2   -9.5    -7.5    -6.9    -68.2   -20.5
2TCL     -10.0   -9.2    -6.8    -6.7    -74.7   -41.7
1HFC     -11.1   -11.3   -8.6    -7.6    -68.8   -29.0

Table 8.2: Binding score/energy values of MMP-1/ligand complexes.


MMP-3

RMSD (Å)
PDB ID   Ligand   AD3    AD4    Vina   DOCK    GBSA
1G05     BBH      1.56   6.42   0.95   1.60    1.54
2JT5     JT5      1.63   1.25   1.77   8.83    3.07
2JT6     JT6      1.76   1.42   1.56   1.34    1.75
2JNP     NGH      0.95   1.16   2.65   2.50    2.40
1G4K     HQQ      1.24   1.65   1.04   2.80    3.04
1BIW     S80      1.87   1.63   1.04   7.87    10.34
1B3D     S27      2.14   2.14   1.39   7.82    7.59
1D5J     MM3      0.56   0.61   1.12   1.77    2.24
1D7X     SPC      0.80   0.71   0.72   0.90    1.34
1D8F     SPI      2.72   2.61   1.94   8.79    8.59
1G49     111      1.36   1.68   1.57   4.24    1.35
2D1O     FA4      2.15   3.67   2.65   13.22   13.09

Table 8.3: RMSD values of MMP-3/ligand complexes.

Binding score / energy (kcal/mol)
PDB ID   Exp.    AD3     AD4     Vina    DOCK    GBSA
1G05     -11.6   -10.6   -8.9    -8.7    -68.7   -42.7
2JT5     -9.7    -11.2   -9.4    -9.4    -61.7   -21.7
2JT6     -9.5    -11.2   -9.7    -9.6    -63.6   -23.5
2JNP     -9.8    -8.1    -6.7    -6.7    -65.1   -24.4
1G4K     -8.3    -11.1   -9.3    -10.1   -54.2   -18.3
1BIW     -9.5    -10.4   -8.2    -7.9    -24.3   -16.6
1B3D     -10.4   -7.8    -7.8    -7.9    -52.6   -19.7
1D5J     -12.5   -11.0   -9.8    -8.6    -55.7   -20.8
1D7X     -10.5   -10.0   -8.2    -7.6    -70.6   -28.2
1D8F     -10.6   -10.4   -9.5    -8.8    -48.1   -33.3
1G49     -10.6   -9.8    -8.0    -8.5    -78.1   -33.4
2D1O     -10.5   -12.0   -5.0    -8.1    -36.8   -28.1

Table 8.4: Binding score/energy values of MMP-3/ligand complexes.


MMP-7

RMSD (Å)
PDB ID   Ligand   AD3    AD4    Vina   DOCK    GBSA
1MMQ     RRS      1.05   0.62   0.41   10.32   8.99

Table 8.5: RMSD values of MMP-7/ligand complexes.

Binding score / energy (kcal/mol)
PDB ID   Exp.    AD3     AD4     Vina   DOCK    GBSA
1MMQ     -11.6   -12.8   -10.1   -8.7   -39.3   -20.6

Table 8.6: Binding score/energy values of MMP-7/ligand complexes.

MMP-8

RMSD (Å)
PDB ID   Ligand   AD3    AD4    Vina   DOCK    GBSA
1ZP5     2NI      1.32   3.21   1.42   1.27    1.26
3DNG     AXA      1.20   1.16   0.82   11.71   11.75
3DPE     AXB      1.21   1.32   0.92   6.19    6.06
1ZVX     FIN      1.72   1.15   1.67   6.32    6.87
1ZS0     EIN      1.71   1.90   1.34   6.44    6.33
1MNC     PLH      2.02   6.08   1.54   6.44    6.42

Table 8.7: RMSD values of MMP-8/ligand complexes.

Binding score / energy (kcal/mol)
PDB ID   Exp.    AD3     AD4     Vina    DOCK    GBSA
1ZP5     -4.0    -11.7   -12.2   -10.4   -62.8   -41.7
3DNG     -11.1   -15.5   -10.4   -12.4   -33.2   -22.8
3DPE     -9.9    -17.5   -12.0   -12.8   -43.4   -9.8
1ZVX     -12.6   -11.8   -10.3   -10.2   -63.1   -41.5
1ZS0     -8.4    -12.2   -10.8   -10.3   -66.1   -55.3
1MNC     -11.9   -9.9    -9.7    -8.2    -71.5   -37.2

Table 8.8: Binding score/energy values of MMP-8/ligand complexes.


MMP-9

RMSD (Å)
PDB ID   Ligand   AD3    AD4    Vina   DOCK   GBSA
2OVX     4MR      8.41   2.37   1.72   8.24   12.12

Table 8.9: RMSD values of MMP-9/ligand complexes.

Binding score / energy (kcal/mol)
PDB ID   Exp.    AD3     AD4    Vina    DOCK    GBSA
2OVX     -11.9   -12.9   -9.7   -11.3   -30.8   -15.7

Table 8.10: Binding score/energy values of MMP-9/ligand complexes.

MMP-12

RMSD (Å)
PDB ID   Ligand   AD3    AD4    Vina   DOCK   GBSA
1JIZ     CGS      1.24   1.21   1.41   4.78   10.67
1RMZ     NGH      1.34   1.49   1.29   1.06   0.75
1Y93     HAE      2.32   2.31   0.83   0.83   2.03

Table 8.11: RMSD values of MMP-12/ligand complexes.

Binding score / energy (kcal/mol)
PDB ID   Exp.    AD3     AD4    Vina   DOCK    GBSA
1JIZ     -11.9   -10.1   -8.1   -7.6   -30.4   -17.5
1RMZ     -10.9   -9.2    -8.8   -7.5   -68.4   -38.3
1Y93     -2.9    -5.1    -5.3   -4.3   -36.9   -23.3

Table 8.12: Binding score/energy values of MMP-12/ligand complexes.


MMP-13

RMSD (Å)
PDB ID   Ligand   AD3    AD4     Vina   DOCK    GBSA
1CXV     CBP      0.89   0.70    1.14   6.87    17.72
830C     RS1      1.05   0.76    0.61   1.56    1.79
3I7I     518      2.10   1.19    2.63   N/A     N/A
1XUC     PB3      4.74   12.15   0.67   11.50   0.42
1XUD     PB4      N/A    3.12    0.61   11.96   16.76
1XUR     PB5      3.59   8.53    0.98   10.90   13.60
2PJT     347      2.64   1.82    1.95   N/A     N/A
2D1N     FA4      1.77   2.37    1.77   N/A     N/A
1YOU     PFD      2.87   2.53    1.14   9.52    3.19

Table 8.13: RMSD values of MMP-13/ligand complexes.

Binding score / energy (kcal/mol)
PDB ID   Exp.    AD3     AD4     Vina    DOCK    GBSA
1CXV     -13.3   -12.6   -11.2   -9.9    -27.0   -16.3
830C     -12.9   -13.4   -12.4   -10.4   -82.1   -28.4
3I7I     -8.5    -14.7   -13.7   -11.6   N/A     N/A
1XUC     -9.7    -12.7   -8.1    -11.5   -64.8   -46.9
1XUD     -11.0   N/A     -8.2    -11.9   -66.4   -28.7
1XUR     -7.1    -12.5   -6.6    -10.5   -64.6   -24.9
2PJT     -8.3    -11.7   -8.5    -9.3    N/A     N/A
2D1N     -11.1   -11.0   -7.5    -7.9    N/A     N/A
1YOU     -12.1   -11.2   -9.9    -9.1    -61.2   -11.4

Table 8.14: Binding score/energy values of MMP-13/ligand complexes.


MMP-20

RMSD (Å)
PDB ID   Ligand   AD3    AD4    Vina   DOCK   GBSA
2JSD     NGH      1.37   5.23   3.97   4.62   5.90

Table 8.15: RMSD values of MMP-20/ligand complexes.

Binding score / energy (kcal/mol)
PDB ID   Exp.    AD3    AD4    Vina   DOCK    GBSA
2JSD     -10.6   -9.7   -8.9   -7.1   -69.7   -34.7

Table 8.16: Binding score/energy values of MMP-20/ligand complexes.

8.2 List of complexes ordered by binding affinity

Table 8.17 lists all MMP/ligand complexes used in the course of this thesis ordered by decreasing binding afﬁnity. The ranks of complexes correspond to the ranking of complexes in Figure 4.2.

Rank   PDB ID   ∆Gexp. (kcal/mol)     Rank   PDB ID   ∆Gexp. (kcal/mol)
1      1CXV     -13.3                 20     2JSD     -10.6
2      830C     -12.7                 21     1D7X     -10.5
3      1ZVX     -12.6                 22     2D1O     -10.5
4      1YOU     -12.6                 23     966C     -10.4
5      1D5J     -12.5                 24     2JT6     -10.4
6      1FBL     -12.3                 25     1B3D     -10.4
7      1MNC     -11.9                 26     1MMQ     -10.3
8      2OVX     -11.9                 27     2TCL     -10.1
9      1JIZ     -11.9                 28     3DPE     -9.9
10     1G05     -11.6                 29     2JNP     -9.8
11     1HFC     -11.1                 30     1XUC     -9.7
12     3DNG     -11.1                 31     1BIW     -9.5
13     2D1N     -11.1                 32     3I7I     -8.5
14     1XUD     -11.0                 33     1ZS0     -8.4
15     1RMZ     -10.9                 34     2PJT     -8.3
16     3AYK     -10.6                 35     1G4K     -7.8
17     2JT5     -10.6                 36     1XUR     -7.1
18     1D8F     -10.6                 37     1Y93     -7.0
19     1G49     -10.6                 38     1ZP5     -4.0

Table 8.17: MMP/ligand complexes ordered by binding afﬁnity.

8.3 Contents of the attached CD

• Receptor and ligand structures used in docking (folder TestSet).

• Table with complete results of docking simulations (folder Results).

• Text and source code of this thesis (folder Thesis).

Bibliography

[1] F. Jensen. Introduction to Computational Chemistry. John Wiley & Sons, 1999.

[2] T. Lengauer and M. Rarey. Computational methods for biomolecular docking. Curr. Opin. Struct. Biol., 6(3):402–406, 1996.

[3] D. B. Kitchen, H. Decornez, J. R. Furr, and J. Bajorath. Docking and scoring in virtual screening for drug discovery: methods and applications. Nature Reviews Drug Discovery, 3(11):935–949, 2004.

[4] D. A. Gschwend, A. C. Good, and I. D. Kuntz. Molecular docking towards drug discovery. Journal of Molecular Recognition, 9(2):175–186, 1996.

[5] J. J. Irwin, F. M. Raushel, and B. K. Shoichet. Virtual screening against metalloenzymes for inhibitors and substrates. Biochemistry, 44:12316–12328, 2005.

[6] R. P. Verma and C. Hansch. Matrix metalloproteinases (MMPs): Chemical-biological functions and (Q)SARs. Bioorg. Med. Chem., 15:2223–2268, 2007.

[7] F. X. Gomis-Rüth. Structural aspects of the metzincin clan of metalloendopeptidases. Mol. Biotechnol., 24(2):157–202, 2003.

[8] J. Gross and C. M. Lapiere. Collagenolytic activity in amphibian tissues: a tissue culture assay. Proc. Natl. Acad. Sci. USA, 48(6):1014–1022, 1962.

[9] M. Egeblad and Z. Werb. New functions for the matrix metalloproteinases in cancer progression. Nature Reviews Cancer, 2:161–174, 2002.

[10] H. E. Van Wart and H. Birkedal-Hansen. The cysteine switch: a principle of regulation of metalloproteinase activity with potential applicability to the entire matrix metalloproteinase gene family. Proc. Natl. Acad. Sci. USA, 87(14):5578–5582, 1990.


[11] C. M. Overall and C. Lopez-Otin. Strategies for MMP inhibition in cancer: innovations for the post-trial era. Nature Reviews Cancer, 2:657–672, 2002.

[12] M. D. Sternlicht and Z. Werb. How matrix metalloproteinases regulate cell behavior. Annu. Rev. Cell. Dev. Biol., 17:463–516, 2001.

[13] V. Aranapakam, J. M. Davis, G. T. Grosu, J. Baker, J. Ellingboe, A. Zask, J. I. Levin, V. P. Sandanayaka, M. Du, J. S. Skotnicki, J. F. DiJoseph, A. Sung, M. A. Sharr, L. M. Killar, T. Walter, G. Jin, R. Cowling, J. Tillett, W. Zhao, J. McDevitt, and Z. B. Xu. Synthesis and structure-activity relationship of n-substituted 4-arylsulfonylpiperidine-4-hydroxamic acids as novel, orally active matrix metalloproteinase inhibitors for the treatment of osteoarthritis. Journal of Medicinal Chemistry, 46(12):2376–2396, 2003.

[14] M. Whittaker, C. D. Floyd, P. Brown, and A. J. Gearing. Design and therapeutic application of matrix metalloproteinase inhibitors. Chem. Rev., 99:2735–2776, 1999.

[15] A. C. Wallace, R. A. Laskowski, and J. M. Thornton. LIGPLOT: a program to generate schematic diagrams of protein-ligand interactions. Protein Eng., 8:127–134, 1996.

[16] L. M. Coussens, B. Fingleton, and L. M. Matrisian. Matrix metalloproteinases and cancer: Trials and tribulations. Science, 295:2387–2392, 2002.

[17] F. Mannello, G. Tonti, and S. Papa. Matrix metalloproteinase inhibitors as anticancer therapeutics. Curr. Cancer Drug Targets, 5:285–298, 2005.

[18] A. R. Leach. Molecular modelling: Principles and Applications, pages 662–663. Prentice Hall, second edition, 2001.

[19] C. A. Baxter, C. W. Murray, D. E. Clark, D. R. Westhead, and M. D. Eldridge. Flexible docking using tabu search and an empirical estimate of binding affinity. Proteins: Structure, Function, and Bioinformatics, 33(3):367–382, 1998.


[20] M. P. Allen and D. J. Tildesley. Computer Simulation of Liquids. Oxford University Press, 1989.

[21] B. Roux and T. Simonson. Implicit solvent models. Biophys. Chem., 78(1-2):1–20, 1999.

[22] I. Halperin, B. Ma, H. Wolfson, and R. Nussinov. Principles of docking: An overview of search algorithms and a guide to scoring functions. Proteins, 47(4):409–443, 2002.

[23] I. D. Kuntz, J. M. Blaney, S. J. Oatley, R. Langridge, and T. E. Ferrin. A geometric approach to macromolecule-ligand interactions. J. Mol. Biol., 161(2):269–288, 1982.

[24] D. P. Kroese, T. Taimre, and Z. I. Botev. Handbook of Monte Carlo Methods, page 772. John Wiley & Sons, 2011.

[25] C. M. Oshiro, I. D. Kuntz, and J. S. Dixon. Flexible ligand docking using a genetic algorithm. Journal of Computer-Aided Molecular Design, 9:113–130, 1995.

[26] P. J. Bowler. Evolution: The History of an Idea. University of California Press, 2003.

[27] Ajay and M. A. Murcko. Computational methods to predict binding free energy in ligand-receptor complexes. J. Med. Chem., 38:4953–4967, 1995.

[28] J. Boström, P.-O. Norrby, and T. J. Liljefors. Conformational energy penalties of protein-bound ligands. Journal of Computer-Aided Molecular Design, 12:383–396, 1998.

[29] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E Bourne. The Protein Data Bank. Nucleic Acids Research, 28:235–242, 2000.

[30] M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, J. A. Montgomery, Jr., T. Vreven, K. N. Kudin, J. C. Burant, J. M. Millam, S. S. Iyengar, J. Tomasi, V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G. A. Petersson, H. Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, M. Klene, X. Li, J. E. Knox, H. P. Hratchian, J. B. Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli, J. W. Ochterski, P. Y. Ayala, K. Morokuma, G. A. Voth, P. Salvador, J. J. Dannenberg, V. G. Zakrzewski, S. Dapprich, A. D. Daniels, M. C. Strain, O. Farkas, D. K. Malick, A. D. Rabuck, K. Raghavachari, J. B. Foresman, J. V. Ortiz, Q. Cui, A. G. Baboul, S. Clifford, J. Cioslowski, B. B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. L. Martin, D. J. Fox, T. Keith, M. A. Al-Laham, C. Y. Peng, A. Nanayakkara, M. Challacombe, P. M. W. Gill, B. Johnson, W. Chen, M. W. Wong, C. Gonzalez, and J. A. Pople. Gaussian 03, Revision E.01. Gaussian, Inc., Wallingford, CT, 2004.

[31] J. Wang, W. Wang, P. A. Kollman, and D. A. Case. Automatic atom type and bond type perception in molecular mechanical calculations. Journal of Molecular Graphics and Modelling, 25:247–260, 2006.

[32] University of California San Francisco. The Amber Molecular Dynamics Package. http://ambermd.org/, 2011. [Online; accessed 22-April-2011].

[33] University of California San Francisco. UCSF Chimera Home Page. http://www.cgl.ucsf.edu/chimera/, 2011. [Online; accessed 22-April-2011].

[34] R. Guha, M. T. Howard, G. R. Hutchison, P. Murray-Rust, H. Rzepa, C. Steinbeck, J. K. Wegner, and E. L. Willighagen. The Blue Obelisk–Interoperability in Chemical Informatics. Journal of Chemical Information and Modeling, 46:991–998, 2006.

[35] G. M. Morris, R. Huey, W. Lindstrom, M. F. Sanner, R. K. Belew, D. S. Goodsell, and A. J. Olson. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. Journal of Computational Chemistry, 19:1639–1662, 1998.


[36] G. M. Morris, D. S. Goodsell, R. Huey, W. E. Hart, S. Halliday, R. Belew, and A. J. Olson. AutoDock 3 User’s Guide.

[37] M. Prokop et al. TRITON: a graphical tool for ligand-binding protein engineering. Bioinformatics, 24:1955–1956, 2008.

[38] S. J. Weiner, P. A. Kollman, D. A. Case, U. C. Singh, C. Ghio, G. Alagona, S. Profeta, and P. Weiner. A new force ﬁeld for molecular mechanical simulation of nucleic acids and proteins. Journal of the American Chemical Society, 106(3):765–784, 1984.

[39] X. Hu and W. H. Shelver. Docking studies of matrix metalloproteinase inhibitors: zinc parameter optimization to improve the binding free energy prediction. Journal of Molecular Graphics and Modelling, 22(2):115–126, 2003.

[40] G. M. Morris, D. S. Goodsell, R. S. Halliday, R. Huey, W. E. Hart, R. K. Belew, and A. J. Olson. AutoDock4 and AutoDockTools4: Automated docking with selective receptor ﬂexibility. Journal of Computational Chemistry, 30(16):2785–2791, 2009.

[41] O. Trott and A. J. Olson. Autodock Vina: improving the speed and accuracy of docking with a new scoring function, efﬁcient optimization, and multithreading. Journal of Computational Chemistry, 31(2):455–461, 2010.

[42] M. F. Sanner. Python: A programming language for software integration and development. Journal of Molecular Graphics and Modelling, 17:57–61, 1999.

[43] D. Qiu, P. Shenkin, F. Hollinger, and W. Still. The GB/SA continuum model for solvation. A fast analytical method for the calculation of approximate Born radii. J. Phys. Chem. A, 101(16):3005–3014, 1997.

[44] A. Jakalian, D. B. Jack, and C. I. Bayly. Fast, efficient generation of high-quality atomic charges. AM1-BCC model: II. Parameterization and validation. J. Comput. Chem., 23(16):1623–1641, 2002.


[45] C. Huang. dms. http://www.cgl.ucsf.edu/chimera/docs/UsersGuide/midas/dms1.html, 2011. [Online; accessed 22-April-2011].

[46] E. W. Weisstein. Root-mean-square. http://mathworld.wolfram.com/Root-Mean-Square.html, 2011. [Online; accessed 22-April-2011].

[47] W. Humphrey, A. Dalke, and K. Schulten. VMD – Visual Molecular Dynamics. Journal of Molecular Graphics, 14:33–38, 1996.

[48] The Scripps Research Institute. AutoDock - AutoDock. http://autodock.scripps.edu/, 2011. [Online; accessed 22-April-2011].

[49] D. Plewczynski, M. Łaźniewski, M. von Grotthuss, L. Rychlewski, and K. Ginalski. VoteDock: Consensus docking method for prediction of protein-ligand interactions. Journal of Computational Chemistry, 32(4):568–581, 2011.

[50] J. L. Rodgers and W. A. Nicewander. Thirteen ways to look at the correlation coefﬁcient. The American Statistician, 42(1):59– 66, 1988.

[51] J. L. Myers and A. D. Well. Research Design and Statistical Analysis, page 508. Lawrence Erlbaum, 2003.