Computational Studies on Druggable Targets of Mycobacterium Tuberculosis and Human Immunodeficiency Virus, and Identification of Potential Inhibitors

Thesis submitted for partial fulfillment of

requirements for the degree of

DOCTORATE OF PHILOSOPHY

in

COMPUTATIONAL NATURAL SCIENCES

by Chinmayee Choudhury

201066007

International Institute for Information Technology (Deemed to be University) Hyderabad -500032 August 2015

Dedicated to My Parents

Acknowledgement

It gives me immense pleasure to place on record the contributions of several people who have been influential in crystallizing this thesis. I have benefited from numerous people during the course of this work and I am also thankful to many others who have laid the foundations for this research during my graduate education and even earlier. I attempt to list these people here, and I fervently hope I have not missed anyone. My first thanks must go out to my advisors, Dr. G. Narahari Sastry and Dr. U. Deva Priyakumar, whose encouraging support and mentorship have made it possible for me to achieve this goal. My association with Dr. Sastry dates back to January 2008, when I joined him for four months as an M. Sc. project trainee, and a year later I got the opportunity to pursue my Ph. D. under him. Right from those days, he has been of immense support and an excellent mentor throughout the course of my stay at the Institute. I feel extremely privileged to have had the opportunity to work with Dr. Priyakumar, who has been my inspiration and has provided me with constant support, encouragement and strong guidance. I thank the former Directors, CSIR-IICT, Hyderabad, Dr. J. S. Yadav and Dr. M. Lakshmi Kantam, for the academic support and the facilities provided to carry out the research work at the Institute. I thank Dr. P. J. Narayanan, Director, IIITH, for the research facilities at IIITH. I also thank all the professors of CCNSB, IIITH for their insightful teaching during my coursework, which helped me to understand many basic concepts and improved my confidence to carry out the research work. Dr. Y. Soujanya, Scientist, CSIR-IICT, has been very encouraging and supportive, and I express my gratitude to her. I acknowledge the Department of Science and Technology (DST), New Delhi, for the financial support in the form of a DST-INSPIRE Fellowship for pursuing my Ph. D. I thank my senior colleague Dr. Hemant Srivastava for his help and critical suggestions, particularly for the fruitful discussions on the QSAR studies on HIV protease inhibitors presented in this thesis. I also extend my thanks to Dr. Preethi Badrinarayan for her constructive suggestions and support as a co-author while writing the ‘’ book chapter. I would like to express my heartfelt thanks to my friend Mohammed and seniors Dr. Swati, Dr. Anirban, and Dr. Soumen for their constant affectionate support and care in a very special way during my Ph. D. time. I have learnt a lot from them, through both their personal and scholarly interactions and their suggestions at various points of time.

I extend my thanks to all my seniors (Dr. Gayathri, Dr. Subha, Dr. Janardhan, Dr. Bhaskar, Dr. Uma, Dr. Purushotham, Dr. Prem, Althaf, Neela) and lab mates (Sirisha, Srikant, Ram Vivek, Prashanti, Arun, Parameswari, Lijo and Anamika), who provided a friendly and cooperative atmosphere in the lab during my Ph. D. period. I also thank my friends Siladitya, Prathyusha, Suresh, Navneet, Koushik, Sandhya, Antarip, Sohini, Rajitha, Ramesh, Mohan, Karthik and Thamanna for their cooperative company and help during my stay in IIITH. I am deeply grateful to all my teachers from Sri Aurobindo Integral Education Centre, GNUP School, MRGH School and SKCG College, Paralakhemundi and OUAT, Bhubaneswar for not only teaching me the curriculum but also giving lessons on how to be passionate for knowledge and dream big. Words cannot express my gratitude to my parents Shri Ram Prasad Choudhury and Smt. Pratibha Choudhury, to whom this thesis is dedicated, for their unconditional love, selfless support, constant encouragement and for all of the sacrifices that they have made for making me what I am today. I owe everything in life to them. I must specially mention my mom, who had to endure a lot more of my own pressures during my difficult times. I thank my younger brother Bikash for believing in me and my abilities and for his critical, constructive comments, which have made me confident. I also thank him for making my hectic and stressful Ph. D. time cheerful with all his love, care and sense of humor. I extend my thanks to my sister-in-law Resami for her affectionate support. I also take this opportunity to extend my cordial thanks to my grandmother, all my uncles, aunts and my sweet little cousins, who have always been my well-wishers and have made me strong by appreciating and encouraging me. I am very much indebted to my parents-in-law for their love, care and constant encouragement and support. I am extremely thankful to my husband Dr. Sourabh for standing beside me throughout my career and for his strong support in all difficult situations in life. He has always been my inspiration and motivation in every step of my career. I am eternally grateful to him for always making me smile and for understanding me on those days when I was working on my thesis far away from him instead of accompanying him. Last but not least, I bow my head before the Almighty, without whose grace nothing would be possible.

Abstract

World Health Organization (WHO) surveys report that in 2014 there were 8.8 million (range, 8.5-9.2 million) incident cases of tuberculosis (TB), 1.1 million (range, 0.9-1.2 million) deaths from TB among human immunodeficiency virus (HIV)-negative people and an additional 0.35 million (range, 0.32-0.39 million) deaths from HIV-associated TB. In addition, treating Mycobacterium tuberculosis (M. Tb) and HIV simultaneously raises several issues of drug-drug interactions. Therefore, molecules that inhibit multiple M. Tb targets, and molecules that act simultaneously on both M. Tb and HIV, are the need of the hour to tackle therapeutic challenges such as drug resistance and coexistence with HIV, respectively. This thesis reports the application of rigorous computational approaches at various length and time scales to understand the important druggable targets of M. Tb and HIV, as well as strategies for identifying dual inhibitors of M. Tb and HIV. The work carried out for the Ph.D. has been divided into 8 chapters.

Chapter 1 introduces the advances in computational modeling of biomolecules at various levels of complexity, followed by a brief discussion of the challenges in M. Tb drug discovery.

Chapter 2 gives the theoretical background of computational methods used in the thesis.

Mycobacterial cyclopropane synthase 1 (CmaA1) is an important drug target of M. Tb as it is responsible for cis-cyclopropanation at the distal position of unsaturated mycolates, which is an essential step for the pathogenicity, persistence and drug resistance.

Chapter 3 of the thesis reports a study in which five representative models of CmaA1, corresponding to different stages in the cyclopropanation process, have been studied using molecular dynamics (MD) simulations. The MD simulations and structural analyses provide a detailed account of the structural changes in the active sites of CmaA1. CmaA1 has two distinct binding sites, i.e., the cofactor binding site (CBS) and the acyl substrate binding site (ASBS). The apo state of CmaA1 corresponds to a closed conformation in which the CBS is inaccessible due to an H-bond between Pro202 of loop 10 (L10) and Asn11 of the N-terminal α1 helix. However, cofactor binding breaks this H-bond. Upon cofactor binding, the hydrophobic side chains orient towards the inner side of the ASBS to create a hydrophobic environment for the substrate. The opening of L10 allows the cofactor and the substrate to come close to each other so that the methyl group can be transferred from the cofactor to the substrate. The MD study also revealed that the system tends to regain the apo conformation within 40 ns after releasing the product.

In Chapter 4, structure based pharmacophore models generated using the MD trajectories of the model systems are presented. The performance of these pharmacophore models has been validated by mapping 23 molecules that have previously been reported to exhibit inhibitory activity on CmaA1. The models were further validated by comparing the results of the pharmacophore mapping with the results obtained from docking these molecules with the respective protein structures. On the basis of their screening ability and consistency with the docking results, five models have been proposed. The models generated from the MD trajectories were found to perform better than the one generated from the crystal structure, demonstrating the importance of incorporating receptor flexibility in drug design.

Chapter 5 gives the details of a novel virtual screening (VS) approach in which the five structure based pharmacophore models proposed in the previous chapter have been used along with five validated ligand based pharmacophore models. With the aim of repurposing existing drugs to inhibit CmaA1, 6,429 drugs reported in DrugBank were considered for screening. To find compounds that inhibit multiple targets of M. Tb as well as HIV, 701 and 11,109 compounds showing activity below 1 μM on M. Tb and HIV cell lines, respectively, were also collected from the ChEMBL database. Thus, a total of 18,239 compounds were screened against CmaA1 using four levels of screening, i.e., ligand based pharmacophore screening, structure based pharmacophore screening, docking, and absorption, distribution, metabolism, excretion, toxicity (ADMET) filters. Twelve compounds were identified as potential hits for CmaA1 at the end of the fourth step. These compounds were found to interact with the key active site residues of CmaA1.

Chapter 6 reports an analogue-based approach employed to predict the inhibitory activities of compounds against HIV protease. This study critically examines the role of conceptual density functional theory (C-DFT) descriptors and docking scores on a diverse set of 156 inhibitors of HIV proteases. A large number of quantitative structure activity relationship (QSAR) models were developed based on available experimental IC50 values (HIV-I and HIV-IIIB infected MT4 and CEMSS cells and HIV-I infected C8166 cells). B3LYP/6-31G(d) optimizations were carried out on all the protease inhibitors considered, and the results were compared with the more economical semi-empirical SCF AM1 results in order to find the best and most efficient way of calculating descriptors. Interestingly, the semi-empirical results appeared to be satisfactory for this class of inhibitors. Selected QSAR models were validated by placing about 20% of the inhibitors in the test sets. Models based on 3-4 orthogonal descriptors were selected as the optimum ones to avoid over-correlation. A systematic comparison of conventional descriptors generated by CODESSA, C-DFT descriptors and docking scores was carried out. Their ability to generate statistically significant QSAR models reveals the prominence of conventional and C-DFT descriptors compared to docking scores.

Chapter 7 reports an exhaustive analysis of the target binding and ADMET/physicochemical properties of 110 natural product (NP) and NP derivative (ND) drugs reported in DrugBank that contain the hexadecahydro-1H-cyclopenta[a]phenanthrene framework (HHCPF), carried out to understand their structural and functional diversities and target specificities. Analyses of the target information collected from DrugBank, UniProt and PDB show the selectivity of the scaffolds for different targets and vice versa. The substituents present at 17 different positions of the scaffolds were classified into six features, viz., H-bond donors, H-bond acceptors, aromatic rings, hydrophobic, charged and halogen groups. Good correlations (R > 0.8) exist between the number of such features present at certain positions of the scaffolds and the ADMET/physicochemical properties of the HHCPF drugs. Docking studies revealed the role of substituents at different positions in making specific interactions with their respective targets. Based on the docking interactions, we proposed structure based e-Pharmacophore models for the seven most common targets of HHCPF drugs. The study enables preliminary prediction of the target selectivity and ADMET properties of a new HHCPF molecule based on the type of scaffold, the substitutions and the spatial arrangement of the pharmacophoric features.

The work presented in this thesis illustrates, systematically and for the first time, the dynamic structural and conformational changes in the active sites of CmaA1 during various stages of cyclopropanation. The study thus brings out the necessity of incorporating receptor flexibility during virtual screening, which led to the use of dynamic pharmacophore models. An exhaustive QSAR study critically examines various molecular descriptors, especially QM descriptors, and their role in the QSAR models of HIV protease inhibitors. The work carried out in the thesis also tries to capture the dramatic consequences of small structural variations in the HHCPF scaffolds, which lead to diverse target selectivity.

Contents

Declaration of Authorship
Certificate
Abstract
List of Figures
List of Tables
List of Abbreviations

Chapter 1: Introduction
1.1 Computations in biology and drug design
1.1.1 First principles calculations
1.1.2 Molecular mechanics (MM)
1.1.3 Hybrid QM/MM methods
1.1.4 Simulating large macro-molecules
1.2 Computer aided drug design (CADD)
1.3 Tuberculosis
1.3.1 Current anti-TB drugs, vaccines and therapies
1.3.2 Current TB therapy (DOTS)
1.3.3 Drawbacks of the current therapy
1.3.4 Co-existence of M. Tb and HIV
1.4 Role of CADD in TB Drug Discovery
1.5 Objectives of the current work
1.6 Overview of the thesis
References

Chapter 2: Computational Methods
2.1 Molecular Dynamics
2.1.1 Statistical ensembles
2.1.2 Steps of MD simulations
2.2 Pharmacophore modeling
2.2.1 Ligand-based pharmacophore
2.2.2 Structure-based pharmacophore
2.2.3 Dynamic pharmacophore
2.2.4 e-Pharmacophore
2.3 Docking
2.3.1 Glide
2.3.2 GOLD
2.4 QSAR
2.4.1 Descriptors
2.4.2 Generation of the QSAR equation
2.4.3 Statistical parameters
References

Chapter 3: Active Site Dynamics of CmaA1 during Various Stages of the Cyclopropanation Process
3.1 Background
3.2 Methodology
3.2.1 Model systems
3.2.2 Molecular dynamics (MD) simulations
3.3 Results and discussion
3.3.1 Analysis of structural properties
3.3.2 Active site dynamics
3.4 Conclusions
References

Chapter 4: Dynamics Based Pharmacophore Models for Screening Potential Inhibitors of Mycobacterial Cyclopropane Synthase
4.1 Background
4.2 Methodology
4.2.1 Model systems
4.2.2 Molecular dynamics (MD) simulations
4.2.3 Generation of structure based pharmacophore models
4.2.4 Pharmacophore screening and docking
4.3 Results and Discussion
4.3.1 Comparison of e-pharmacophore models generated from different model systems
4.3.2 Screening CmaA1 inhibitors by e-pharmacophore and docking
4.4 Conclusions
References

Chapter 5: Dynamic Ligand Based Pharmacophore Modeling and Virtual Screening to Identify Mycobacterial Cyclopropane Synthase Inhibitors
5.1 Background
5.2 Methodology
5.2.1 Generation and validation of ligand based pharmacophore models
5.2.2 Virtual Screening
5.2.2.1 Preparation of dataset
5.2.2.2 Screening
5.3 Results and Discussion
5.3.1 Details and validation of ligand based pharmacophore models
5.3.2 Virtual Screening
5.3.2.1 Choice of the dataset
5.3.2.2 Design of the VS protocol
5.3.2.3 First level filter: Dynamic ligand based pharmacophore screening
5.3.2.4 Second level filter: Dynamic structure based pharmacophore screening
5.3.2.5 Third level filter: Docking
5.3.2.6 Fourth level filter: ADMET properties
5.3.3 Interactions of screened compounds with the active site of CmaA1
5.4 Conclusions
References

Chapter 6: The efficacy of conceptual DFT descriptors and docking scores on the QSAR models of HIV protease inhibitors
6.1 Background
6.2 Methodology
6.2.1 Dataset preparation
6.2.2 Geometry Optimization and descriptor calculation
6.2.3 QSAR model generation and validation
6.3 Results and Discussion
6.4 Conclusions
References

Chapter 7: The Structural and Functional Diversities of Hexadecahydro-1H-Cyclopenta[a]Phenanthrene Framework
7.1 Background
7.2 Methodology
7.2.1 Preparation of dataset
7.2.2 Targets
7.2.3 Physicochemical and ADMET properties
7.2.4 Docking
7.2.5 Generation and validation of e-Pharmacophore models
7.3 Results and discussion
7.3.1 Classification of the NP/ND drugs based on scaffold
7.3.2 Classification of the targets
7.3.3 Promiscuity of HHCPF scaffolds
7.3.4 Interaction of the scaffolds and substituents of HHCPF drugs with the targets
7.3.5 e-Pharmacophore models for the common HHCPF drugs
7.3.6 Correlation between the number/nature of substituents and the ADMET properties
7.4 Conclusions
References

Chapter 8: Summary

List of Figures

Figure # Figure Caption

Figure 1.1 Hierarchical order of molecular modeling approaches at different time and length scales. The figure depicts typical systems and methods which can be useful for varying time and length scales.

Figure 1.2 Examples of various types of atomic interactions in a molecule considered in MM.

Figure 1.3 Outline of a typical CADD approach.

Figure 1.4 Currently available anti-TB drugs along with their targets and pathways.

Figure 2.1 Overview of all the computational methods used in different chapters.

Figure 2.2 Flow-chart depicting the general steps of a typical MD simulation.

Figure 2.3 Steps of QSAR modeling.

Scheme 3.1 Mechanism of cyclopropanation of unsaturated mycolates in mycobacteria.

Figure 3.1 Crystal structures of CmaA1.

Scheme 3.2 Schematic representation of the cyclopropanation cycle.

Figure 3.2 Covariance matrices of the residues of systems.

Figure 3.3 A RMSD (in Å) matrices of the 5 systems with respect to each other.

Figure 3.3 B Probability distribution of RMSD (in Å) of the 5 systems with respect to each other.

Figure 3.4 CBS and ASBS residues of each of the systems.

Figure 3.5 Relative movements of L10 and the N-terminus in the five model systems.

Figure 3.6 Inward orientation of the hydrophobic side chains of the ASBS upon SAM binding.

Figure 4.1 Schematic representation of the generation of various types of pharmacophore models as filters for virtual screening.

Scheme 4.1 Compounds used for validation of the performance of the pharmacophore models to screen active inhibitors.

Figure 4.2 e-pharmacophore model generated from the crystal structure of CmaA1 (1KPH).

Figure 4.3 Pharmacophore fitness and docking scores of the reference compounds with the MD based models/snapshots (dark blue) and the crystal structure based models/snapshots (cyan).

Figure 4.4 Compounds screened by the best 5 e-pharmacophore models and docking with the respective snapshots.

Figure 4.5 Pharmacophore fitness scores and XP docking scores of the reference compounds with the models and the respective snapshots.

Figure 4.6 Selected e-pharmacophore models with the active site residues associated with the pharmacophore features.

Figure 4.7 Best 5 e-pharmacophore models mapped to SAM/SAHC.

Figure 5.1 Superposition and mutual root mean squared deviations (RMSD) of the 40 MD snapshots of the cofactors (SAM/SAHC).

Figure 5.2 The most active reference compound C1 mapped with all the selected ligand based pharmacophore models.

Figure 5.3 Schematic representation of the step by step virtual screening process.

Scheme 5.1 Structures of all the selected compounds from the three datasets.

Figure 5.4 A Interaction of the screened ChEMBL-MTb compounds with CmaA1. Among the five docking poses with the 5 selected snapshots, the complexes with the highest docking scores are shown.

Figure 5.4 B Interaction of the screened DrugBank compounds with CmaA1. Among the five docking poses with the 5 selected snapshots, the complexes with the highest docking scores are shown.

Figure 5.4 C Interaction of the screened ChEMBL-HIV compounds with CmaA1. Among the five docking poses with the 5 selected snapshots, the complexes with the highest docking scores are shown.

Scheme 6.1 Scaffolds representing the 156 HIV protease inhibitors, with the cell line; the number of inhibitors in each scaffold is given in parentheses.

Figure 6.1 Various descriptors employed in the study. The types of descriptors are given in bold and the names of the descriptors are given in parentheses.

Figure 6.2 Effect of the number of descriptors on the correlation coefficient of cell line based QSAR models.

Figure 6.3 Regression summary.

Figure 6.4 A The predicted pIC50 values plotted against the experimental pIC50 values for 3 (Set1-5) and 5 (Set6) descriptor based models.

Figure 6.4 B The predicted pIC50 values plotted against the experimental pIC50 values for 4 (Set1-5) and 6 (Set6) descriptor based models.

Figure 7.1 Classification of HHCPF drugs into various scaffolds based on the number and position of double bonds in the HHCP skeleton.

Figure 7.2 A) Structural classification (CATH) of all the targets of the HHCPF drugs (targets that have 3D structures reported in the PDB), B) Classification of the functional families of the targets.

Figure 7.3 Frequencies of binding of the 15 HHCPF scaffolds to different targets and families of targets.

Figure 7.4 Important targets binding various scaffolds. The targets that bind to at least 3 compounds are shown here. The figure shows the preference of the targets to bind a particular scaffold.

Figure 7.5 Interactions of the HHCPF drugs with the active sites of the corresponding HHCPF drug targets. One representative drug for each of the 7 important HHCPF drug targets is shown here.

Figure 7.6 e-Pharmacophore models proposed for the seven most important HHCPF drug targets. The important active site residues associated with the features are shown.

List of Tables

Table # Table Caption

Table 1.1 List of recently reported targets in M. tuberculosis along with the pathways represented by them, and their inhibitors.

Table 1.2 Stages of the HIV life cycle and where they can be targeted.

Table 2.1 Types of statistical ensembles used in MD simulations.

Table 3.1 Averages of various structural and energetic properties of the systems along with the standard deviations.

Table 3.2 Key intramolecular H-bonds.

Table 3.3 Key H-bond interactions between the cofactors and the active site residues.

Table 4.1 Features of the selected 5 best e-pharmacophore models.

Table 4.2 Comparison of the number of inhibitors and non-inhibitors screened by the selected models and the ranges of their fitness scores.

Table 5.1 Comparison of the number of inhibitors and non-inhibitors screened by the ligand based pharmacophore models and the ranges of their fitness scores.

Table 6.1 Names of the scaffolds, cell lines, number of inhibitors and pIC50 range of the various sets of HIV protease inhibitors considered for the study.

Table 6.2 Effect of conceptual DFT based descriptors and docking scores on the statistical quality of QSAR models of Set-1 obtained by three, four and five conventional descriptors.

Table 6.3 Effect of conceptual DFT based descriptors and docking scores on the statistical quality of QSAR models of Set-2 obtained by three, four and five conventional descriptors.

Table 6.4 Effect of conceptual DFT based descriptors and docking scores on the statistical quality of QSAR models of Set-3 obtained by three, four and five conventional descriptors.

Table 6.5 Effect of conceptual DFT based descriptors and docking scores on the statistical quality of QSAR models of Set-4 obtained by three, four and five conventional descriptors.

Table 6.6 Effect of conceptual DFT based descriptors and docking scores on the statistical quality of QSAR models of Set-5 obtained by three, four and five conventional descriptors.

Table 6.7 Effect of conceptual DFT based descriptors and docking scores on the statistical quality of QSAR models of Set-6 obtained by three, four, five and six conventional descriptors.

Table 6.8 Regression equations and statistical significance of the final selected QSAR models for all the sets.

Table 6.9 Comparative performance of the AM1, B3LYP/6-31G(d)//AM1 and B3LYP/6-31G(d) levels of theory on the statistical quality of QSAR models.

Table 7.1 List of targets of the tetracyclic drugs reported in DrugBank with the number of drugs binding to them.

Table 7.2 Correlation between the number of certain chemical features at a particular position and the ADMET properties, grouped based on the scaffold.

List of Abbreviations

Absorption, distribution, metabolism, excretion, toxicity (ADMET)
Acyl substrate binding site (ASBS)
Adopted Basis Newton-Raphson (ABNR)
Cofactor binding site (CBS)
Conceptual density functional theory (C-DFT)
Extensively drug-resistant (XDR)
Food and Drug Administration (FDA)
Hexadecahydro-1H-Cyclopenta[a]Phenanthrene Framework (HHCPF)
High throughput screens (HTS)
Human immunodeficiency virus (HIV)
Hydrogen bond (H-bond)
Molecular dynamics (MD)
Multi-drug resistant (MDR)
Mycobacterial cyclopropane synthase 1 (CmaA1)
Mycobacterium tuberculosis (M. Tb)
Natural products (NP)
NP derivative/mimetic/inspired (ND)
Quantitative structure activity relationship (QSAR)
S-adenosyl-L-homocysteine (SAHC)
S-adenosyl-L-methionine (SAM)
Steepest descent (SD)
Structure-activity relationship (SAR)
Structure-based drug design (SBDD)
Tuberculosis (TB)
Virtual Screening (VS)
World Health Organization (WHO)

Chapter 1 Introduction

“But medicine has long had all its means to hand, and has discovered both a principle and a method, through which the discoveries made during a long period are many and excellent, while full discovery will be made, if the inquirer be competent, conduct his research with knowledge of the discoveries already made, and make them his starting point.”
Hippocrates, Ancient Medicine, in Hippocrates, trans. W. H. S. Jones (1923), Vol. I, 15.

1.1 Computations in biology and drug design

Computational modeling and informatics have emerged as indispensable approaches in all branches of science, engineering and medicine. Drug discovery is inherently an interdisciplinary field, and understanding the mechanism of drug action requires an effective integration of chemistry, biology and allied areas. Computations and informatics have proved to be of outstanding importance in understanding biomolecular structure, function and mechanism. In most cases, computational power is the bottleneck, as it determines how far these methods can be applied to explore the complex relationship between a biological structure and its function. The past decade has witnessed an exponential growth in computational power, resulting in immense advances in understanding the complexity of biological and chemical problems through computer simulations [1, 2]. In 1964, the CDC 6600, the world’s first supercomputer, developed by Seymour Cray, ushered in a revolution in the world of computational research [3]. Since then, there has been an explosion in computer power, culminating in today’s petaflop supercomputers [4]. Such a tremendous increase in computational power has facilitated the exploration of larger biological systems by long explicit simulations.

In recent years, advances in biotechnology, strengthened by the colossal leap in computational resources, have provided a wealth of biological data at all levels of biological organization. A range of genome and proteome projects [5, 6] have revealed the details of the constituent parts and basic structures of living organisms at the cellular level, while advanced techniques in molecular biology, spectroscopy and biochemistry have given atomistic insight. These atomistic details obtained from in vitro laboratory experiments are mostly static observations at the scale of single molecules. However, a complete understanding of biological functions will be possible only when all relevant information is integrated at multiple levels of organization and the dynamic interactions are recreated. For example, a successful drug therapy depends not only on the interactions of the drug with the target protein, but also on its interactions with the surrounding environment, i.e., with transporters, P-glycoproteins, other receptors, enzymes, etc., and on its toxicity and side effects [7]. Thus, a holistic approach is required to deal with complex systems and the dynamic interactions among their components. In practice, not all of these dynamic interactions can be recreated purely by experimental observation, so mathematical or computational models play a crucial role in unraveling the perplexing non-linear biological processes [8]. To arrive at abstract descriptions of biological processes that capture the dynamic complexity underlying the required biological function, an iterative interplay between experiment and modeling is necessary [9]. The models should take into account the multiple spatial and temporal scales across which these processes take place. The spatial scales that may need to be considered in a complete model of a biological process range from 10^-10 m (small molecules) to 10^-5 m (a cell), a span of 5 orders of magnitude, and the temporal scales range from 1 fs (bond vibrations) to 1 ms (protein folding), a span of 12 orders of magnitude [10]. Such a wide range of scales requires a coupled hierarchy of models, each describing behavior at a different scale, to encapsulate the complete system. These multi-scale models are so complex that they can only be solved numerically, which explains the increasing importance of computational simulation methods.
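As a quick check of the scale ranges quoted above, the number of orders of magnitude between two endpoints is simply the base-10 logarithm of their ratio. The short sketch below uses only the endpoint values stated in the text; the function name is illustrative and not part of any method used in the thesis.

```python
import math

def orders_of_magnitude(smallest: float, largest: float) -> float:
    """Return log10(largest / smallest), i.e. the number of decades spanned."""
    return math.log10(largest / smallest)

# Endpoints quoted in the text, expressed in SI units.
length_span = orders_of_magnitude(1e-10, 1e-5)  # small molecule to cell, in metres
time_span = orders_of_magnitude(1e-15, 1e-3)    # 1 fs to 1 ms, in seconds

print(f"length scales span about {round(length_span)} orders of magnitude")
print(f"time scales span about {round(time_span)} orders of magnitude")
```

With these endpoints the length scales cover 5 decades and the time scales, taking 1 fs and 1 ms as the limits, cover 12 decades.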

At different time and length scales, different levels of computation have to be applied. While ab initio computational approaches can, in principle, model every experimentally measurable property, their applicability becomes very limited as the size of the system increases. Consequently, for macromolecules, the application of quantum mechanical approaches is severely limited. The time scales that can be probed using ab initio methods are also very small. Thus, the application of quantum chemical methods based on ab initio theory is in practice restricted to systems with very limited length and time scales, and one needs to resort to methods based on classical mechanics to reach larger length and time scales (Figure 1.1). In the following sections we discuss different levels of computations that are suitable for different length and time scales.
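The size limitation of ab initio methods can be made concrete with a rough, back-of-the-envelope sketch. The exponents below are the commonly quoted formal scaling orders of each method with system size N (textbook values assumed here, not numbers taken from this thesis), and the pairwise molecular mechanics entry assumes a naive all-pairs non-bonded evaluation without cutoffs. Real implementations often scale better, so the figures are indicative only.

```python
# Formal scaling exponents with system size N (assumed textbook values).
FORMAL_SCALING = {
    "MM (all-pairs)": 2,   # naive pairwise non-bonded interactions
    "HF": 4,               # Hartree-Fock
    "MP2": 5,              # second-order perturbation theory
    "CCSD(T)": 7,          # coupled cluster with perturbative triples
}

def relative_cost(method: str, size_factor: float) -> float:
    """Factor by which the formal cost grows when the system size is
    multiplied by size_factor."""
    return size_factor ** FORMAL_SCALING[method]

# Cost increase when the system is made ten times larger.
for method in FORMAL_SCALING:
    print(f"{method:15s} x{relative_cost(method, 10.0):.0e}")
```

Under these assumptions, a tenfold increase in system size multiplies the formal cost of a classical all-pairs calculation by a factor of a hundred, but that of a CCSD(T) calculation by ten million, which is why classical methods are needed at larger length scales.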

[Figure 1.1: plot of time scale (seconds) against length scale (meters), showing the regions covered by quantum mechanics, hybrid QM/MM, atomistic MD simulations and coarse grained simulation.]

Figure 1.1 Hierarchical order of molecular modeling approaches at different time and length scales. The figure depicts typical systems and methods which can be useful for varying time and length scales.

1.1.1 First principles calculations

Quantum chemical approaches are needed to accurately model the systems at atomistic scale and more importantly for obtaining electronic structure information [8, 11, 12]. According to quantum mechanics (QM), all possible information on a molecular system can be obtained from a wave function, ψ, which is obtained by solving the Schrödinger wave equation (Eqn. 1), which states that, when the Hamiltonian operator acts on a certain wave function Ψ, and the result is proportional to the same wave function Ψ, then Ψ is a stationary state, and the proportionality constant, V, is the potential energy of the state Ψ.


$$ i\hbar \frac{\partial}{\partial t}\psi(r,t) = \left[ -\frac{\hbar^{2}}{2\mu}\nabla^{2} + V(r,t) \right] \psi(r,t) \qquad \text{Eqn. (1)} $$

where i is the imaginary unit, ħ is Planck's constant divided by 2π, μ is the particle's reduced mass, ∇² is the Laplacian (a differential operator) and V(r, t) is the potential energy. However, the Schrödinger wave equation can be solved exactly only for one-electron systems; no exact analytical solution is available for many-electron systems. The Schrödinger equation is the fundamental equation of QM and provides the basis for a complete description of the electronic structure of a molecule. Because of the difficulty of solving it for many-electron systems, a large number of approximations have been introduced. One of the primary approximations used to simplify the wave function is the Born-Oppenheimer approximation, which allows the wave function of a molecule to be separated into its electronic and nuclear (vibrational, rotational) components.

$$ \psi_{\mathrm{Total}} = \psi_{\mathrm{Electronic}} \times \psi_{\mathrm{Nuclear}} \qquad \text{Eqn. (2)} $$

Following this, the variational theorem has proved extremely effective because it provides an upper bound to the ground-state energy. A trial wave function depending on one or more parameters is first chosen, and the values of these parameters that give the lowest possible expectation value of the energy are obtained through iterative procedures. The second important approximation is perturbation theory truncated at second, third or higher order. However, the most widely used electronic structure approach is ab initio self-consistent field (SCF) theory, in which the reference single-determinant wave function is obtained using the Hartree-Fock method. Electron correlation, which in principle may be divided into static and dynamic contributions, is one of the most important effects that must be included for an accurate description of the wave function. Thus, methods that go beyond the Hartree-Fock level are warranted to obtain reliable properties of molecular systems. The most popular variants of these methods, for cases where a single Slater determinant can reasonably describe the system, are based on Møller-Plesset perturbation theory, configuration interaction and coupled cluster methods. However, for open-shell systems, non-dynamic electron correlation becomes important, and more than one Slater determinant is needed for the reference wave function. Under such conditions multi-configurational self-consistent field (MCSCF) procedures become necessary. Methods based on these approximations have become very popular and have contributed greatly to the understanding of molecular structure, function and property relationships. The most rigorous among these are based on ab initio molecular orbital theory. One of the main bottlenecks in applying rigorous ab initio calculations to large molecules is computational cost. To overcome this, several economical semi-empirical SCF methods have emerged. More recently, advances in density functional theory have made it very effective for medium to large biomolecules [12].
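As a small worked illustration of the variational idea described above (it is not a calculation from this thesis), consider the hydrogen atom with a single Gaussian trial function exp(-αr²). In atomic units the energy expectation value has the closed form E(α) = 3α/2 - 2√(2α/π). The function names, the search interval and the golden-section minimizer below are illustrative assumptions.

```python
import math

def trial_energy(alpha: float) -> float:
    """Energy (hartree) of the H atom for the trial function exp(-alpha * r**2)."""
    kinetic = 1.5 * alpha                                  # <T> for a Gaussian
    potential = -2.0 * math.sqrt(2.0 * alpha / math.pi)    # <V> for a Gaussian
    return kinetic + potential

def minimize_golden(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Golden-section search for the minimum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    return 0.5 * (a + b)

EXACT_GROUND_STATE = -0.5  # hartree, exact non-relativistic hydrogen atom

best_alpha = minimize_golden(trial_energy, 0.01, 2.0)
best_energy = trial_energy(best_alpha)
print(f"optimal alpha = {best_alpha:.5f}")
print(f"variational energy = {best_energy:.5f} hartree")
print(f"exact energy       = {EXACT_GROUND_STATE:.5f} hartree")
```

The optimized energy, -4/(3π) or about -0.424 hartree, lies above the exact value of -0.5 hartree, as the variational theorem requires; a more flexible trial function would lower the estimate towards the exact energy.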

1.1.2 Molecular mechanics (MM)

Next, at the level of large molecules, the system is defined in terms of the (classical) interactions between atoms (and ions). Molecules in matter exert attractive and repulsive forces on each other when they are close enough to influence one another, and these forces are essentially responsible for the various physical, chemical and biological properties of matter.

The goal of MM is to predict the detailed structure and physical properties of molecules by calculating the energy of a molecule and then optimizing its geometry through changes in bond lengths and angles to obtain minimum energy structures. The system is treated by the classical laws of motion, and many different force fields are used to calculate the forces, which arise from the deformation of chemical bonds, H-bonding, and electrostatic, dipole–dipole and van der Waals interactions [13]. Atomistic MM methods treat molecules as balls joined by springs, wherein each atom is a single particle with an assigned radius (typically the van der Waals radius), polarizability and constant net charge (generally derived from quantum calculations and experiment). The bonded interactions are treated as springs with an equilibrium distance equal to the experimental or calculated bond length. MM estimates the steric energy of a molecule, that is, the energy due to its geometry, from a few specific interactions within the molecule (Figure 1.2). These interactions include the stretching or compression of bonds and the bending of angles away from their equilibrium values, torsional effects of twisting about single bonds, the van der Waals attractions or repulsions of atoms that come close together, and the electrostatic interactions between partial charges in a molecule due to polar bonds. To quantify the contribution of each, these interactions can be modeled by a potential function that gives the energy of the interaction as a function of distance, angle or charge.

[Figure 1.2: schematic panels for bond stretching, stretch-bend, angle bending, dihedrals and improper (out-of-plane bending) terms, with the coordinates labelled r, θ, φ and ψ.]

Figure 1.2 Examples of the various types of atomic interactions in a molecule considered in MM

The total steric energy of a molecule can be written as a sum of the energies of the interactions:

$$ E = E_{\mathrm{bonded}} + E_{\mathrm{non\text{-}bonded}} \qquad \text{Eqn. (3)} $$

$$ E_{\mathrm{bonded}} = E_{\mathrm{stretching}} + E_{\mathrm{bending}} + E_{\mathrm{improper}} + E_{\mathrm{dihedral}} \qquad \text{Eqn. (4)} $$

$$ E_{\mathrm{non\text{-}bonded}} = E_{\mathrm{elec}} + E_{\mathrm{vdw}} \qquad \text{Eqn. (5)} $$

Taken together, these bonded and non-bonded terms constitute a functional form, or force field, used to calculate the potential energy of a molecular system in a given conformation. A force field refers to a mathematical function with a set of parameters (obtained experimentally as well as theoretically from computationally intensive quantum calculations) that represents the potential energy of a molecular system. There are various types of force fields, depending on the level of accuracy. For example, "coarse-grained" force fields, which are used to simulate large proteins, provide a crude representation to save computational time, while "all-atom" force fields, although computationally expensive, explicitly treat every atom, including the terminal hydrogen atoms [14]. Apart from a representative function for the potential energy, each force field has a set of parameters for each of the bonded and non-bonded terms. Each force field also has its own atom typing; for example, an oxygen atom in a carbonyl group and one in a hydroxyl group are given distinct parameters. The typical parameter set includes values for atomic mass, van der Waals radius and partial charge for the various atom types, and equilibrium values of bond lengths, bond angles, dihedral angles and impropers, together with the spring constants associated with them. The parameters for given atom types are generally derived from observations on small organic molecules, which are more tractable for experimental studies and quantum calculations, and are extrapolated to larger molecules such as proteins and DNA [14]. Unlike at the quantum level, simulations of relatively large systems are possible at the molecular level.

Commercial MM packages can be used to estimate the structure of very large molecules.

However, the accuracy with which the thermodynamic behavior of molecules is predicted depends largely on the accuracy of the force field used. MD simulations of pure lipids can deal with length scales of the order of 10–30 nm and time scales of hundreds of nanoseconds [15]. MD simulations have been used extensively in the current thesis. A detailed discussion of MD simulations and the CHARMM force field is given in the following chapter; hence, the current section is limited to MM and the different types of force fields.
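The decomposition of the steric energy into bonded and non-bonded terms (Eqns. (3)-(5)) can be sketched in a few lines of Python. This is a minimal illustration, not the CHARMM implementation used later in the thesis: the functional forms (harmonic bonds, angles and impropers, a cosine dihedral, Coulomb and Lennard-Jones 12-6 terms) are common textbook choices, and every function name and parameter value below is a hypothetical placeholder.

```python
import math

# Approximate Coulomb conversion factor in kcal*Angstrom/(mol*e^2) (assumed units).
COULOMB_CONSTANT = 332.06

def stretching_energy(r, r0, k):
    """Harmonic bond stretching about the equilibrium length r0 (Angstrom)."""
    return k * (r - r0) ** 2

def bending_energy(theta, theta0, k):
    """Harmonic angle bending about the equilibrium angle theta0 (radians)."""
    return k * (theta - theta0) ** 2

def improper_energy(psi, psi0, k):
    """Harmonic out-of-plane (improper) term (radians)."""
    return k * (psi - psi0) ** 2

def dihedral_energy(phi, k, n, delta):
    """Periodic torsion term with multiplicity n and phase delta (radians)."""
    return k * (1.0 + math.cos(n * phi - delta))

def elec_energy(r, q_i, q_j):
    """Coulomb interaction between two fixed partial charges."""
    return COULOMB_CONSTANT * q_i * q_j / r

def vdw_energy(r, epsilon, r_min):
    """Lennard-Jones 12-6 term with well depth epsilon at separation r_min."""
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 ** 2 - 2.0 * ratio6)

def total_energy(bonded_terms, nonbonded_terms):
    """Return (E, E_bonded, E_non-bonded) following Eqns. (3)-(5)."""
    e_bonded = sum(bonded_terms)
    e_nonbonded = sum(nonbonded_terms)
    return e_bonded + e_nonbonded, e_bonded, e_nonbonded

# Hypothetical geometry and parameters for a single term of each type.
bonded = [
    stretching_energy(1.10, 1.09, 340.0),
    bending_energy(math.radians(110.0), math.radians(109.5), 35.0),
    improper_energy(math.radians(2.0), 0.0, 20.0),
    dihedral_energy(math.radians(60.0), 0.2, 3, 0.0),
]
nonbonded = [
    elec_energy(3.0, 0.4, -0.4),
    vdw_energy(3.8, 0.1, 3.8),
]
e_total, e_bonded, e_nonbonded = total_energy(bonded, nonbonded)
print(f"E_bonded = {e_bonded:.3f}, E_non-bonded = {e_nonbonded:.3f}, E = {e_total:.3f} kcal/mol")
```

In a real force field the sums run over all bonds, angles, dihedrals and atom pairs of the molecule, with parameters looked up by atom type rather than passed in by hand.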

1.1.3 Hybrid QM/MM methods

QM methods are too computationally demanding to be applied to large biomolecular systems, whereas MM methods fail to model enzyme-mediated reaction mechanisms. Considering the individual shortcomings of these methods, a hybrid QM/MM method that exploits their individual strengths is warranted [16]. The QM/MM approach partitions the biomolecular system into two regions. The active site comprises the smaller region and is treated quantum mechanically, while the rest of the system is treated with classical molecular mechanics force fields. There are two schemes for calculating the total energy of the system, namely the additive (Eqn. (6)) and subtractive (Eqn. (7)) schemes [17].

$$ E_{\mathrm{QM/MM}}(\mathrm{system}) = E_{\mathrm{MM}}(\mathrm{MM}) + E_{\mathrm{QM}}(\mathrm{QM}) + E_{\mathrm{QM\text{-}MM}}(\mathrm{QM, MM}) \qquad \text{Eqn. (6)} $$

$$ E_{\mathrm{QM/MM}}(\mathrm{system}) = E_{\mathrm{MM}}(\mathrm{system}) + E_{\mathrm{QM}}(\mathrm{QM}) - E_{\mathrm{MM}}(\mathrm{QM}) \qquad \text{Eqn. (7)} $$

The key to such QM/MM methods is the coupling between the electric field of the surroundings and the QM Hamiltonian in the active-site region. This requires careful treatment of the boundary between the QM and MM regions, using either hybrid orbitals for the connection or a link atom approach. Free energies can be calculated from QM/MM simulations by averaging over the system's configurations via perturbations from a reference surface; however, the sampling needed for accurate free energy evaluations, as well as the calculation of pKa values, remains challenging and forms an active area of research [18].
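The bookkeeping behind the two schemes can be made concrete with a short sketch. The additive scheme is taken here in its usual form, in which the explicit QM-MM coupling energy is added to the energies of the two regions, while the subtractive scheme removes the MM energy of the QM region to avoid double counting. The function names and the numerical energies are hypothetical and are used only to show how the terms combine.

```python
def additive_qmmm(e_mm_of_mm_region, e_qm_of_qm_region, e_qm_mm_coupling):
    """Additive scheme: region energies plus an explicit QM-MM coupling term."""
    return e_mm_of_mm_region + e_qm_of_qm_region + e_qm_mm_coupling

def subtractive_qmmm(e_mm_of_system, e_qm_of_qm_region, e_mm_of_qm_region):
    """Subtractive scheme: MM energy of the whole system, corrected by
    replacing the MM description of the QM region with its QM energy."""
    return e_mm_of_system + e_qm_of_qm_region - e_mm_of_qm_region

# Hypothetical energies in kcal/mol, for illustration only.
e_additive = additive_qmmm(
    e_mm_of_mm_region=-1200.0,
    e_qm_of_qm_region=-350.0,
    e_qm_mm_coupling=-45.0,
)
e_subtractive = subtractive_qmmm(
    e_mm_of_system=-1300.0,
    e_qm_of_qm_region=-350.0,
    e_mm_of_qm_region=-60.0,
)
print(f"additive: {e_additive:.1f} kcal/mol, subtractive: {e_subtractive:.1f} kcal/mol")
```

The subtractive form needs no separate coupling term, because the interaction between the two regions is already contained, at the MM level, in the MM energy of the full system.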


1.1.4 Simulating large macro-molecules

MD simulations work well for biomolecular systems such as proteins, lipids and nucleic acids, but if the system of interest contains a large number of atoms and needs to be simulated for a significantly long period, atomistic MD simulations are no longer practically feasible. They fall short in investigating complex phenomena such as protein-protein assembly, vesicle diffusion, membrane deformation, DNA supercoiling, DNA packaging in bacteriophages and folding of RNA in the ribosome. However, it is possible to reduce the number of degrees of freedom in the system by approaches such as coarse-grained simulations, wherein several atoms are grouped into single entities (beads) [19], allowing simulations to be run for longer time scales at the cost of some accuracy. Depending on the level of accuracy required, either one amino acid or a group of amino acids is defined as a bead, and the beads interact with each other according to the force field. The accuracy and utility of a coarse-grained model depend largely on the force field parameterization, which implicitly accounts for the enthalpic and entropic contributions to the free energy. The key steps in coarse graining include the development of primary models based on experimental results, followed by large-scale simulation and identification of the interactions influencing the energetics of the model system. Coarse graining retains the primary physical features of the system and represents the atomistic-scale information as simplified, low-resolution models. The final step is therefore to link back to the molecular scale through all-atom MD or Monte Carlo simulations based on the previous coarse-grained simulation results, so as to bridge the atomistic and mesoscopic scales. Precision in such cases can be improved through multiple iterations of the entire protocol. Thus, properties or responses that are inaccessible at the atomistic or continuum levels of theory can be effectively simulated through a coarse-grained approach. Tieleman et al. (2001) describe how coarse graining can be used in the simulation of a large number of ions passing through a channel in order to calculate a macroscopic current [20].
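The grouping of atoms into beads described above can be illustrated with a minimal mapping in which each residue becomes one bead placed at its center of mass. This is only a sketch of the mapping step under that assumption; the data layout (tuples of residue identifier, mass and coordinates) and the function name are hypothetical, and real coarse-grained models also require a bead-level force field.

```python
from collections import defaultdict

def coarse_grain(atoms):
    """Map atoms to one bead per residue, placed at the residue center of mass.

    atoms: iterable of (residue_id, mass, (x, y, z)).
    Returns {residue_id: (bead_mass, (x, y, z))}.
    """
    mass_sum = defaultdict(float)
    weighted = defaultdict(lambda: [0.0, 0.0, 0.0])
    for residue_id, mass, position in atoms:
        mass_sum[residue_id] += mass
        for axis in range(3):
            weighted[residue_id][axis] += mass * position[axis]
    return {
        residue_id: (
            mass_sum[residue_id],
            tuple(component / mass_sum[residue_id] for component in weighted[residue_id]),
        )
        for residue_id in mass_sum
    }

# Hypothetical five-atom system spanning two residues.
atoms = [
    (1, 12.0, (0.0, 0.0, 0.0)),
    (1, 12.0, (2.0, 0.0, 0.0)),
    (2, 14.0, (5.0, 1.0, 0.0)),
    (2, 1.0, (5.0, 2.0, 0.0)),
    (2, 1.0, (5.0, 0.0, 0.0)),
]
beads = coarse_grain(atoms)
for residue_id, (bead_mass, center) in sorted(beads.items()):
    print(residue_id, bead_mass, center)
```

Here five atomic positions (15 coordinates) are reduced to two bead positions (6 coordinates), which is the reduction in degrees of freedom that makes longer time scales accessible.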

1.2 Computer aided drug design (CADD)

Computational approaches have become an integral and indispensable part of drug discovery, both in academia and industry. Deciphering the human genome was one of the first definitive accomplishments towards a molecular-level understanding of biology. It has provided a quantitative understanding of the functional aspects of biology at a fundamental level, unraveling a multitude of disease targets for drug discovery [21]. New drugs are constantly required to improve the treatment of existing and newly identified diseases, and safer drugs are needed to reduce or remove adverse side effects. Consequently, the pharmaceutical industry is channeling huge investments into research and development. The interdisciplinary nature of drug discovery warrants fruitful collaboration among chemists, biologists, pharmacologists, physicians, and computational and informatics scientists. New lead design is now more a strategic than a serendipity-driven process. Thus, in the last couple of decades, in silico approaches have become an integral part of essentially all rational drug discovery programs. The rational approaches in drug discovery are traditionally classified as analogue-based and structure-based drug design (SBDD) (Figure 1.3).

Validated computational predictions offer a rational route to finding potent compounds against a target disease before they are synthesised in the laboratory. CADD has given new dimensions to the pharmaceutical industry by helping to identify and optimise new potent compounds and by focusing laboratory experiments on the most promising candidates. Its main advantage is the potential to save millions of dollars and seven to eight years of research. Analogue-based approaches such as QSAR [22-25] and pharmacophore modelling, and structure-based approaches such as virtual screening (VS) [26-28], molecular docking [29] and binding free energy calculations to understand binding affinity (MM-PBSA and MM-GBSA) [30, 31], are the typical approaches employed in CADD to understand ligand-receptor interactions and to establish the biological activity of a ligand or compound.
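For orientation, the end-point binding free energy estimate that underlies MM-PBSA and MM-GBSA is commonly written as shown below. This is the generic textbook decomposition rather than the specific protocol used in this thesis, and the symbols are introduced here only for illustration: each free energy G is averaged over simulation snapshots, E_MM is the molecular mechanics energy, G_polar is the polar solvation term from a Poisson-Boltzmann (PB) or generalized Born (GB) model, G_nonpolar is the surface-area (SA) term, and TS is the entropic contribution.

```latex
\begin{align}
\Delta G_{\mathrm{bind}} &= G_{\mathrm{complex}} - G_{\mathrm{receptor}} - G_{\mathrm{ligand}} \\
G &= E_{\mathrm{MM}} + G_{\mathrm{solv}} - TS \\
E_{\mathrm{MM}} &= E_{\mathrm{bonded}} + E_{\mathrm{elec}} + E_{\mathrm{vdw}} \\
G_{\mathrm{solv}} &= G_{\mathrm{polar}} + G_{\mathrm{nonpolar}}
\end{align}
```

The two methods differ only in how the polar solvation term is evaluated, which is why they are usually discussed together.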

The past few decades have witnessed an explosion of structural information as a result of numerous structural genomics initiatives and methodological advances in high-throughput X-ray crystallography and NMR spectroscopy. The rational design of novel drug-like candidates using SBDD is a viable alternative to phenotypic high-throughput screens (HTS). For this approach, the structure of the therapeutic target protein should be determined by X-ray crystallography or NMR spectroscopy, or generated using homology modelling techniques that rely on the structural details of a family member.

[Figure 1.3: flowchart. Target structures (3D structure databases, homology modelling databases) and small molecules (molecular and fragment databases) are prepared and selected for virtual screening (structure based: docking; ligand based: pharmacophore, QSAR) and de novo lead design (structure based: group build, link-grow; ligand based: scaffold hopping), followed by lead optimization (scaffold enrichment, chemotype selectivity, MD simulation, MM-PBSA/GBSA, development of scoring function, rescoring/refinement of docked poses), ADME/toxicity prediction and estimation of binding affinity.]

Figure 1.3 Outline of a typical CADD approach.

In SBDD, structural data of the target receptor, preferably in complex with a ligand, is requisite in the discovery process. These structural data provide important structure-activity relationship (SAR) details of the therapeutic target of interest in a ligand-bound conformation.

In addition, structural details gathered from structures of the target in the presence of several unique ligands provide important insight into the geometric fit of these compounds in the binding site, including the low-energy conformation, ideal molecular electrostatic potentials, the presence of charged and/or neutral hydrogen bonds (H-bonds) between functional groups, and hydrophobic interactions between lipophilic surfaces [29]. While hydrophobic interactions increase receptor-ligand binding affinities, the contribution of H-bonds to the overall binding free energy depends on the balance between the desolvation cost of the interaction and the energy gained from the newly formed H-bonds. Subtle functional group alterations within the drug-like ligand can therefore have complex structure-function consequences [27].
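To give a sense of scale for the binding free energies discussed above, the standard relation between a dissociation constant and the binding free energy, ΔG° = RT ln(Kd/c°) with c° = 1 M, can be evaluated directly. This is a generic thermodynamic identity and not a result of this thesis; the function name and the example Kd values are illustrative assumptions.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K)

def binding_free_energy(kd_molar: float, temperature: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant in M.

    Uses Delta G = R * T * ln(Kd / 1 M); tighter binding gives a more negative value.
    """
    if kd_molar <= 0.0:
        raise ValueError("Kd must be positive")
    return GAS_CONSTANT * temperature * math.log(kd_molar)

# Hypothetical dissociation constants from micromolar to nanomolar.
for kd in (1e-6, 1e-7, 1e-8, 1e-9):
    print(f"Kd = {kd:.0e} M  ->  dG = {binding_free_energy(kd):6.2f} kcal/mol")
```

At room temperature each tenfold improvement in Kd corresponds to roughly 1.4 kcal/mol, which is comparable to the net contribution of a single well-placed H-bond and illustrates why subtle functional group changes can shift affinity noticeably.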

Using sophisticated molecular modelling software, the ligand can be modified in silico to achieve a better theoretical fit between defined binding sites and complementary molecular volumes. In these studies the bound ligand is removed from the binding site of the receptor and new molecular structures are computationally docked into the binding site [29]. Evidence of the success of structure-based design efforts in drug development is reflected in the large number of new drug entities that are currently in clinical evaluation. SBDD is fundamentally an iterative process that involves multiple cycles of design and optimization with the singular goal of identifying a modulator (inhibitor, activator, agonist or antagonist) of the target enzyme or receptor. Ideally, this drug-like compound would have demonstrable activity in the nanomolar concentration range and possess good selectivity, high binding affinity, significant ligand efficiency and acceptable pharmacokinetic properties. A typical in silico drug design cycle consists of docking, scoring and ranking initial hits on the basis of their steric and electrostatic interactions with the target site, which is commonly referred to as VS [26-28]. One alternative approach employs a ligand-based pharmacophore strategy, often partnered with structure-based docking that uses a more stringent scoring matrix, to enhance the enrichment of initial hits and identify the best compounds for biochemical evaluation; these are the first-generation hits [32]. In the second phase, the molecular interactions between the target and biologically validated hits/leads (per the industry metrics KD, IC50 or Ki) often identify ligand-based sites for optimizing these metrics for a unique molecular chemotype. In many situations, 2D similarity searches of databases are performed using chemotype information from the first-generation hits. This approach often identifies commercially available modifications of these lead compounds (limited SAR data), which are thoroughly evaluated computationally before being ordered and evaluated in a biological assay [33]. Subsequent iterations of this process may necessitate the chemical synthesis of commercially unavailable and/or novel modifications around the lead chemotype. Ultimately, a lead-optimized compound that is active in vitro and in vivo is co-crystallized with the target to provide additional SAR information and permit structure-guided pharmacokinetic and pharmacodynamic optimization, with the goal of reducing toxicity while improving potency [34]. While this remains a highly iterative process, in silico structure-guided identification, design and optimization cycles significantly decrease the time needed to identify and develop optimized lead compounds that merit clinical evaluation.
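The 2D similarity searches mentioned above are commonly scored with the Tanimoto coefficient between binary fingerprints, that is, the number of shared "on" bits divided by the number of bits set in either fingerprint. The sketch below assumes fingerprints are already available as sets of bit indices; the compound names, bit sets and threshold are hypothetical, and in practice a cheminformatics toolkit would generate the fingerprints from the chemical structures.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)

def rank_by_similarity(query_fp, library, threshold=0.0):
    """Return (name, score) pairs with score >= threshold, most similar first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Hypothetical fingerprints of a first-generation hit and a small library.
query = {1, 4, 7, 9, 15}
library = {
    "analogue_A": {1, 4, 7, 9, 16},
    "analogue_B": {1, 4, 20, 21},
    "unrelated_C": {30, 31, 32},
}
ranking = rank_by_similarity(query, library, threshold=0.25)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Compounds above the chosen threshold would then be passed on to docking or other computational evaluation before purchase, in line with the workflow described in the text.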

1.3 Tuberculosis

TB remains one of the most important and worsening health problems in India. Approximately 2.9 million people die from TB each year worldwide, about one fifth of them in India alone [35]. The disease is of particular concern to India and Asia, with more than half of all deaths occurring in Asia. Further, about 500,000 new multi-drug resistant (MDR) TB cases are estimated to occur every year [36]. The principal etiological agent of TB in humans, M. Tb, enters the human body via the respiratory tract through the inhalation of respiratory droplet nuclei, which are typically 1–2 µm in size. After entering the lungs, the pathogen may be destroyed by a strongly effective initial host response, or it may grow and multiply immediately after infection, causing a primary TB infection. The bacilli may also become dormant and never cause disease at all, or the latent bacilli may eventually become active and progress to disease [37]. Cole and co-workers [38] were the first to sequence the complete genome of M. Tb, in 1998, and Camus and co-workers re-annotated it in 2002 [39]. The M. Tb genome, comprising 4,411,529 base pairs, is very rich in guanine and cytosine, which is reflected in a biased amino acid content. A very large portion of the M. Tb genome codes for lipogenic and lipolytic enzymes, making it strikingly different from other bacteria [38]. Slow growth, intracellular pathogenesis, dormancy, a complex cell envelope and genetic homogeneity are some of the characteristic features of M. Tb [38]. The cell envelope of M. Tb contains an additional layer beyond the peptidoglycan cell wall that is very rich in unusual lipids, glycolipids and polysaccharides. Specific biosynthetic pathways generate components such as mycolic acids, mycocerosic acid, lipoarabinomannan, phenolthiocerol and arabinogalactan, which are responsible for mycobacterial longevity, host-pathogen reactions and pathogenesis [40-43]. Arabinogalactan-mycolate [44], covalently linked with peptidoglycan and trehalose dimycolate, gives rise to a thick layer that protects M. Tb from general antibiotics and the immune system of the host [45]. Mycolic acids are long-chain α-alkyl-β-hydroxy fatty acids and the major constituents of the mycobacterial cell envelope. The biosynthetic pathway of mycolic acid has been shown to be crucial for the survival of M. Tb [46]. InhA, an enoyl-ACP reductase involved in mycolic acid synthesis, is a well-known target of anti-tubercular drugs [47] such as the front-line drug isoniazid [48] and ethionamide [49]. CmaA1 [50], a novel and important drug target responsible for the cyclopropanation of mycolic acids in the mycolic acid pathway, was therefore chosen for our study (Chapters 3, 4 and 5).

1.3.1 Current anti-TB drugs, vaccines and therapies

Most of the anti-TB drugs currently in use were discovered during the 1950s and 1960s and are of either chemical or antibiotic origin. Streptomycin, the first effective TB drug, was isolated from Streptomyces griseus by Albert Schatz and Selman Waksman in 1944 [51]. Earlier, in 1938, sulphanilamide, a sulpha drug developed for the treatment of Gram-positive bacterial infections, was found to be effective against TB infection in guinea pigs, which opened the path to refining the sulpha drugs for TB treatment. Consequently, thiosemicarbazones were synthesized; they were more efficacious than sulphanilamide but less so than streptomycin [51]. The discovery of isoniazid in 1952, by reshuffling of chemical groups in thiosemicarbazone, was a major breakthrough in the history of anti-TB drug discovery. The nicotinamide lead also led to the discovery of pyrazinamide in 1952. The synthesis of diamine analogs, inspired by the anti-TB effect of diamines and polyamines, led to the discovery of ethambutol in 1961. Screening for antibiotics from soil microbes led to many other anti-TB drugs, such as cycloserine, kanamycin and its derivative amikacin, viomycin, capreomycin, and the rifamycins, including rifampicin [51]. Broad-spectrum quinolones were developed in the 1980s on the basis of the antibacterial activity of nalidixic acid, discovered in the 1960s. The quinolones were subsequently shown to have high activity against M. Tb and have been used as second-line drugs for drug-resistant TB since the late 1980s [51]. Currently, among the more than 20 available anti-TB drugs, isoniazid, rifampicin, pyrazinamide, streptomycin and ethambutol are used as the front-line drugs. The next preferred drugs are kanamycin, capreomycin, amikacin and viomycin, which are injectable.


Fluoroquinolones such as ciprofloxacin and ofloxacin have been found to be indispensable in the treatment of MDR-TB. p-Aminosalicylic acid, ethionamide and cycloserine are used as second-line drugs that show high clinical efficacy but also have severe side effects [52]. Isoniazid and ethionamide target mycolic acid biosynthesis [48, 49], while cycloserine and ethambutol inhibit the synthesis of peptidoglycan [53] and cell wall arabinogalactan [54, 55], respectively, weakening the cell wall of the bacterium. Rifampicin and amikacin act by targeting the transcription and translation processes of M. Tb, respectively [56-57]. The first-line drugs are active against actively metabolizing bacteria, while the second-line drugs are used to combat resistance.

Isoniazid has been shown to be the strongest first-line drug against bacteria growing actively in cavities, followed by rifampicin, streptomycin and the quinolones. However, isoniazid can be toxic and causes side effects such as fever [59].

Category                      Drug                      Target                   Pathway targeted
First-line drugs              Isoniazid                 InhA                     Mycolic acid pathway
First-line drugs              Rifampicin                RNA polymerase           RNA synthesis
First-line drugs              Pyrazinamide              FAS-I                    Mycolic acid pathway
First-line drugs              Streptomycin              30S ribosomal subunit    Protein synthesis
First-line drugs              Ethambutol                Arabinosyltransferase    AG synthesis
Second-line drugs             PAS                       DHPS                     Folate synthesis
Second-line drugs             Kanamycin                 30S ribosomal subunit    Protein synthesis
Second-line drugs             Ethionamide               InhA                     Mycolic acid pathway
Second-line drugs             Amikacin                  30S ribosomal subunit    Protein synthesis
Second-line drugs             Cycloserine               L-ala racemase           Alanine metabolism
Second-line drugs             Capreomycin               16S rRNA                 Protein synthesis
Second-line drugs             Fluoroquinolones          DNA gyrase               DNA synthesis
Drugs under clinical trials   Gatifloxacin              DNA gyrase               DNA synthesis
Drugs under clinical trials   Moxifloxacin              DNA gyrase               DNA synthesis
Drugs under clinical trials   TMC 207                   ATP synthase             ATP synthesis
Drugs under clinical trials   PA-824                    Unknown                  Mycolic acid pathway
Drugs under clinical trials   OPC67683                  Unknown                  Mycolic acid pathway
Drugs under clinical trials   SQ109                     MmpL3                    Mycolic acid pathway

[The figure also shows the chemical structures of isoniazid, pyrazinamide, ethambutol, rifampicin, streptomycin, p-aminosalicylic acid (PAS), kanamycin, fluoroquinolones, ethionamide, amikacin, cycloserine, gatifloxacin, moxifloxacin, TMC 207, PA-824 and SQ109.]

Figure 1.4 Currently available anti-TB drugs along with their targets and pathways


Challenges in TB chemotherapy include drug inactivation (e.g. β-lactam antibiotics), decreased influx of drugs (e.g. in dormant bacilli), increased efflux (e.g. fluoroquinolones, isoniazid), target alteration (e.g. rifampicin, streptomycin, kanamycin, ethionamide, isoniazid), target overexpression (e.g. isoniazid) and failure to activate the drug (e.g. isoniazid, pyrazinamide, ethionamide).

Figure 1.4 shows the current drugs available for TB, along with their targets and pathways.

1.3.2 Current TB therapy (DOTS)

The WHO-initiated DOTS (Directly Observed Therapy, Short-course) strategy recommends the use of isoniazid, rifampicin, pyrazinamide and ethambutol for the first two months, followed by isoniazid and rifampicin for the next four months. DOTS has a cure rate of up to 95%; however, this depends on patient compliance [60]. The initial phase aims at inhibiting cell wall, nucleic acid and mycobacterial protein synthesis. The second phase aims at eliminating all remaining bacilli through bactericidal action to consolidate the treatment. Although the DOTS strategy has many shortcomings, it prevents the occurrence of new infections and reduces the emergence of MDR and extensively drug-resistant (XDR) TB [61].

1.3.3 Drawbacks of the current therapy

The lengthy six-month therapy adopted in DOTS results in patient non-compliance and generates drug-resistant strains. Although TB chemotherapy leaves patients infection-free within a few weeks of the initiation of therapy, the therapy needs to be continued for a much longer period.

Persistence is one of the critical aspects of TB, posing a serious challenge to the effectiveness of anti-tubercular drugs and therapies [51]. Antibiotics act only on growing bacteria and are ineffective against bacteria in the stationary phase, persisting bacteria not killed during antibiotic exposure and dormant bacteria [51]. The mechanism of phenotypic resistance in M. Tb is still largely unexplored. Severe side effects are another aspect that adversely affects patient compliance.

1.3.4 Co-existence of M. Tb and HIV

According to the World Health Organization (WHO), the increase in the global incidence of TB is attributed to co-infection with HIV/AIDS, which takes a toll of about 0.35 million people every year. During HIV infection, the risk of acquisition, reactivation and re-infection of TB increases 30-fold because of the immune deficiency [61]. HIV infection also represents the major risk factor for the progression of latent TB infection to active disease. HIV infection mainly depletes CD4+ T cells, which are crucial for killing M. Tb. These T cells produce interferon gamma, which activates macrophages to produce reactive oxygen and nitrogen intermediates, leading to phago-lysosome formation and killing of the bacteria [61]. Tumor necrosis factor alpha (TNF-α) production in response to M. Tb infection is crucial for the control of bacterial replication, and the mycobacteria are contained within granulomas, which limit the spread of disease and are in turn maintained by CD4+ T cells and TNF. The risk of TB acquisition doubles during primary HIV infection and is about 10 times higher when the CD4+ cell count is below 100 per μl than when it is above 500 per μl. Conversely, TB infection facilitates HIV infection: it releases many pro-inflammatory cytokines such as TNF and IL-6 and triggers other mechanisms, such as overexpression of the co-receptors CXCR4 and CCR5, thus leading to an increased viral load and further suppression of the CD4+ cell count [62]. Antiretroviral therapy has been shown to reduce TB incidence; however, several factors make simultaneous treatment difficult. These include additive or overlapping toxicity profiles, drug interactions that lead to sub-therapeutic concentrations of one or more agents and culminate in treatment failure, and complicated regimens that make adherence difficult.

Rifampicin, an integral component of anti-tuberculosis therapy, interacts with both non-nucleoside reverse transcriptase inhibitors (NNRTIs) and protease inhibitors (PIs). Rifampicin induces CYP3A4, which in turn increases the metabolism of NNRTIs and reduces their concentrations to sub-therapeutic levels, resulting in treatment failure [63]. For example, rifampicin decreases the serum concentration of nevirapine by approximately 50%. Rifampicin has also been shown to substantially decrease the therapeutic concentrations of the CCR5 inhibitor maraviroc and the integrase inhibitor raltegravir [64]. PIs such as ritonavir increase the serum levels of rifabutin by inhibiting CYP3A4 and thus preventing its breakdown, resulting in increased toxicity [65].

1.4 Role of CADD in TB drug discovery

Most of the anti-TB drugs in current practice were discovered by a combination of serendipity and chemical modification of existing lead compounds. Since most of these discoveries were made decades ago, there is a pressing need to apply newer strategies to anti-TB drug discovery in order to address the serious threats posed by TB and the rise of resistant strains. The emergence of MDR and XDR TB, which are spreading to many countries and pose a major threat to TB eradication programmes, demands the exploration of newer and more rational strategies for the identification of new drugs and drug targets for TB. Hence, new chemical entities that shorten therapy and dual inhibitors active against both M. Tb and HIV are urgently needed. These can come from multidisciplinary approaches to identifying and understanding the structure and dynamic behavior of the potential drug targets of M. Tb and HIV. Considering the complexity of the disease, computational approaches play a very important role in accelerating the drug discovery process and in attacking the problems at the atomistic/molecular level in a time- and cost-effective way.


Many computational studies have recently been carried out to identify potential druggable targets in M. Tb. Cui et al. analysed a protein-protein interaction network constructed by the homogenous protein mapping (HPM) method, comprising 738 proteins and 5639 interaction pairs, and found molecular chaperones, ribosomal proteins and ABC transporters to be highly interconnected proteins [65]. Comparison of the HPM data with other computational predictive methods, such as the gene cluster (GC), conserved gene (GN), phylogenetic profile (PP) and Rosetta-Stone (RS) methods, indicates a high overlap with the data produced by the RS method and a very low overlap with the GC method. Kushwaha et al. reported a study that identifies non-homologous proteins (selective to the pathogen) as preferential drug targets through protein interaction network analysis. In this study, comparative metabolic pathway analysis is performed on the non-homologous proteins identified through BLASTp, followed by metabolic chokepoint analysis and protein interaction network analysis to identify interacting proteins [66]. For the last few decades, researchers worldwide have been trying to identify, characterize and understand the pathways of the druggable targets of this deadly pathogen, both theoretically and experimentally. A recent study proposes a structural annotation of the M. Tb proteome and provides detailed information about the predominant folds and the combinations of folds that constitute multi-domain proteins [67]. The publication of the complete genome of M. Tb has led to the development of new genetic tools to ascertain the function of individual genes, leading to the subsequent identification and validation of potential drug targets. With the help of computer-aided approaches it is possible to find molecules with the desired chemical and geometric properties that bind in a receptor cavity of a specific target protein.
These approaches could lead to promising new drugs for the effective treatment not only of TB but also of MDR-TB, XDR-TB, HIV and persistent TB. The importance of modeling in anti-TB drug discovery efforts has also been demonstrated by numerous QSAR, pharmacophore and docking studies. Studies using MD and docking techniques to elucidate resistance to TB drugs such as isoniazid [68] help in understanding the mechanism of resistance and may help in developing newer inhibitors of the same target. QSAR, docking and QM/MM studies on inhibitors of MbtA (important in the synthesis of mycobactin, a siderophore) are examples of excellent modeling studies pursued in M. Tb [69]. Another investigation, focusing on inhibition of thymidine monophosphate kinase (TMP kinase) of M. Tb, has been reported [70], in which a receptor-independent 4D QSAR formalism is combined with a novel 3D pharmacophore generation. The same enzyme has also been part of a study generating potent anti-tubercular moieties using a VS approach with molecular fingerprints [71]. IspF, an enzyme of the non-mevalonate route to isoprenoid synthesis that has been validated as a target by genetic approaches in bacteria, has been subjected to VS based on a hierarchical filtering methodology and docking studies, leading to novel IspF inhibitors [72]. Recent studies have also focused on the in silico design, synthesis and inhibitory activity analysis of inhibitors of pantothenate synthetase, which is important in the non-replicating persistent form of M. Tb [73]. In addition, a number of inhibitors of cell wall component biosynthesis have been scrutinized using docking and MD simulations. Work on protease inhibitors targeting both HIV and TB proteases is also of significant interest for AIDS patients [74]. Thus, a wide range of proteins that are key to the survival of M. Tb are being investigated as druggable targets [52]. Although numerous attempts have been made to develop broad-range anti-TB drugs, strategies that help in evolving more potent and effective classes of drugs are still needed.
Several initiatives in both the public and private sectors (OSDD [75], the TB Structural Genomics Consortium, the TB Alliance, etc. [76]) are currently in progress. Several groups have endeavored to identify novel drug targets for M. Tb with a bioinformatics approach following the complete sequencing of its genome [77]. These efforts include the use of comparative metabolic pathway analysis [78], systems biology approaches [79], weighted gene ranking for druggability [80] and comparison of 3D structure databases [81]. Target validation might include many components: demonstration of the biochemical activity of the enzyme, determination of its crystal structure in complex with a substrate or an inhibitor, confirmation of essentiality, and the identification of potent growth inhibitors either in vitro or in an infection model. An important consideration in identifying targets is to ensure that the target is crucial for the survival of the bacterium; it should also be sufficiently different from the proteins of the host and of the gut flora [82]. The metabolic pathways currently being targeted in anti-TB therapy are given in Table 1.1.

Table 1.1 List of recently reported targets in M. tuberculosis along with pathways represented by them, and their inhibitors.

| Stage of life cycle of M. Tb | Target | Pathway | Inhibitor |
|---|---|---|---|
| Actively growing | GlgE | Maltose metabolism | - |
| Actively growing | Mycolic acid cyclopropanation | Mycolic acid metabolism | Dioctylamine |
| Actively growing | DprE1/DprE2 | Cell wall metabolism | Benzothiazinone, dinitrobenzamides |
| Actively growing | MshC | Mycothiol ligase | Dequalinium chloride |
| Actively growing | HisG | Histidine biosynthesis | Nitrobenzothiazole |
| Actively growing | AtpE | ATP synthesis | Diarylquinoline, TMC207 |
| Actively growing | Def | Protein processing | LBK-611 |
| Actively growing | Methionine aminopeptidase | Protein processing | 2,3-dichloro-1,4-naphthoquinones |
| Dormant | Isocitrate lyase | Energy metabolism | |
| Dormant | Proteasome complex | Protein processing | Oxathiazol-2-one |
| Dormant | L,D transpeptidase | Peptidoglycan metabolism | - |
| Dormant | DosR (DevR) | Regulation of dormancy | - |
| Dormant | CarD | Stringent response | - |

Cell wall composition and permeability are of substantial significance in helping the bacterium to counter the host-induced immune response. Typical targets include enzymes involved in the synthesis of unique components characteristic of mycobacterial cell walls [83]. Enzymes involved in basic metabolic pathways and in the siderophore biosynthesis needed for iron sequestration have also been keenly studied. Factors that confer virulence on different strains [84], and those involved in the shift from the highly active aerobic form to a dormant anaerobic state under host-induced hypoxia, are also being extensively researched [85, 86]. Target identification and validation are thus crucial steps for in silico drug design [87].

A review by Youcef Mehellou and Erik De Clercq throws light on the targets and inhibitors of HIV [88]. The anti-HIV drugs currently approved by the Food and Drug Administration (FDA) can be divided into seven groups: nucleoside reverse transcriptase inhibitors (NRTIs), nucleotide reverse transcriptase inhibitors (NtRTIs), non-nucleoside reverse transcriptase inhibitors (NNRTIs), protease inhibitors (PIs), fusion inhibitors (FIs), co-receptor inhibitors (CRIs), and integrase inhibitors (INIs). Table 1.2 lists the stages of the HIV life cycle and the points at which they can be targeted. Hence, these are considered the major targets of HIV.

Table 1.2 Stages of the HIV life cycle and the points at which they can be targeted.

| Stage of HIV life cycle | Potential intervention |
|---|---|
| Binding to target cell | Antibodies to the virus or cell receptor |
| Early entry to target cell | Drugs that block fusion or interfere with retroviral uncoating |
| Transcription of RNA to DNA by reverse transcriptase | Reverse transcriptase inhibitors |
| Degradation of viral RNA in the RNA-DNA hybrid | Inhibitors of RNase H activity |
| Integration of DNA into the host genome | Drugs that inhibit "integrase" function |
| Expression of viral genes | "Antisense" constructs; inhibitors of the tat protein or art/trs protein |
| Viral component production and assembly | Myristoylation, glycosylation, and protease inhibitors |
| Budding of virus | Interferons |

1.5 Objectives of the current work

The overall work presented in this thesis employs various computational techniques as well as new strategies to identify new inhibitors of M. Tb and HIV. Identification of new drug targets and discovery of new drugs and vaccine candidates for TB is the need of the hour.


Considering the complexity of the disease, multidisciplinary strategies are essential to tackle its multiple aspects, such as resistance, permeability, toxicity, drug efflux and compatibility with anti-HIV therapy. In these circumstances, computational approaches play a very important role in accelerating the drug discovery process and in attacking the problems at the atomistic/molecular level in a time- and cost-effective way. This thesis aims to employ rigorous computational methods to understand the active site dynamics of CmaA1 and to carry out pharmacophore-based VS that accounts for the flexibility of the target. The thesis also presents strategies such as drug repositioning and polypharmacology to screen inhibitors of CmaA1, keeping drug resistance, toxicity and coexistence with HIV in mind. Other objectives are to evaluate the efficiency of C-DFT descriptors in modeling the structure-activity relationship of HIV protease inhibitors, and to understand the target binding preferences and the structural and functional diversity of important NP scaffolds.

1.6 Overview of the thesis

Chapter 1 of this thesis gives a brief account of the importance and applications of computations in investigating biomolecules at different length and time scales, followed by an account of the challenges and bottlenecks of the current therapies for TB and for the parallel treatment of HIV and TB. Chapter 2 gives the theoretical details of the computational methods and analyses used in the thesis. Chapter 3 uses MD simulations to study the precise changes that occur in the active site of CmaA1 during the various stages of the cyclopropanation reaction. In Chapter 4, dynamics-based pharmacophore models are generated from snapshots of the MD trajectories of CmaA1 model systems. The predictive abilities of these dynamic structure-based pharmacophore models are validated by mapping the known CmaA1 inhibitors onto the models, and the performance of the pharmacophore models generated from the static crystal structure and from the MD snapshots is compared. Chapter 5 presents the generation and validation of ligand-based pharmacophore models from the MD trajectories and the screening of the DrugBank and ChEMBL databases to identify inhibitors of CmaA1. Screening is done with the dynamic ligand- and structure-based pharmacophore models, docking and ADMET filters. Chapter 6 describes the generation of a large number of QSAR models from 156 HIV PIs belonging to nine different clusters, and discusses the importance of quantum chemical and C-DFT based descriptors in explaining the activities of the HIV protease inhibitors. Chapter 7 reports an exhaustive analysis of the target binding and ADMET/physicochemical properties of 110 NP and ND drugs reported in DrugBank that contain the HHCPF. HHCPF is one of the most privileged NP scaffolds, and we aim to use the insights from this chapter to design compound libraries that inhibit M. Tb and HIV targets in future. The last chapter summarizes the work presented in the thesis.


References

1. Badrinarayan, P., Choudhury, C., & Sastry, G. N. (2014). Molecular modeling. In "Systems and Synthetic Biology (S2B2)", Ed. by P. K. Dhar and V. Singh, Springer Press, 93-128.
2. Leach, A. R. (1996). Molecular modelling: principles and applications (Second Edition). Prentice Hall.
3. Thornton, J. E. (1970). Design of a computer: the Control Data 6600 (pp. 60-63). Glenview, Ill.: Scott, Foresman.
4. Shaw, D. E., Deneroff, M. M., Dror, R. O., Kuskin, J. S., Larson, R. H., Salmon, J. K., Wang, S. C. (2007). 34th Annual International Symposium on Computer Architecture (ISCA '07), San Diego.
5. States, D. J., Omenn, G. S., Blackwell, T. W., Fermin, D., Eng, J., Speicher, D. W., Hanash, S. M. (2006). Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study. Nat Biotechnol, 24, 333-338.
6. Snow, C. D., Sorin, E. J., Rhee, Y. M., Pande, V. S. (2005). How well can simulation predict protein folding kinetics and thermodynamics? Annu Rev Biophys Biomol Struct, 34, 43-69.
7. Noble, D. (2002). Modeling the heart - from genes to cells to the whole organ. Science, 295, 1678-1682.
8. Cramer, C. J. (2013). Essentials of computational chemistry: theories and models. John Wiley & Sons.
9. Werner, E. (2005). Meeting report: the future and limits of systems biology. Sci. STKE, 5, pe16.
10. Burrowes, K. S., Tawhai, M. H., & Hunter, P. J. (2004). Modeling RBC and neutrophil distribution through an anatomically based pulmonary capillary network. Ann Biomed Engg, 32(4), 585-595.
11. Jensen, F. (2013). Introduction to computational chemistry. John Wiley & Sons.
12. Levine, I. N. (2013). Quantum chemistry (Seventh Edition). Prentice Hall.
13. Field, M. J. (2007). A practical introduction to the simulation of molecular systems (Second Edition). Cambridge University Press.
14. Ponder, J. W., & Case, D. A. (2003). Force fields for protein simulations. Adv Prot Chem, 66, 27-85.
15. Tieleman, D. P. (2006). Computer simulations of transport through membranes: passive diffusion, pores, channels and transporters. Clin Exp Pharmacol Physiol, 33(10), 893-903.
16. Ayton, G. S., Noid, W. G., & Voth, G. A. (2007). Multiscale modeling of biomolecular systems: in serial and in parallel. Curr Opin Struct Biol, 17(2), 192-198.
17. Sherwood, P., Brooks, B. R., & Sansom, M. S. (2008). Multiscale methods for macromolecular simulations. Curr Opin Struct Biol, 18(5), 630-640.
18. Senn, H. M., & Thiel, W. (2009). QM/MM methods for biomolecular systems. Angew Chem Int Ed, 48(7), 1198-1229.
19. Saunders, M. G., & Voth, G. A. (2013). Coarse-graining methods for computational biology. Annu Rev Biophys, 42, 73-93.
20. Tieleman, D. P., Biggin, P. C., Smith, G. R., & Sansom, M. S. P. (2001). Simulation approaches to ion channel structure-function relationships. Quart Reviews Biophys, 34(04), 473-561.


21. Hopkins, A. L., & Groom, C. R. (2002). The druggable genome. Nat Rev Drug Discov, 1(9), 727-730.
22. Bohari, M. H., Srivastava, H. K., & Sastry, G. N. (2011). Analogue-based approaches in anti-cancer compound modelling: the relevance of QSAR models. Org Med Chem Lett, 1(1), 1-12.
23. Srivani, P., & Sastry, G. N. (2009). Potential choline kinase inhibitors: a molecular modeling study of bis-quinolinium compounds. J Mol Graph Model, 27(6), 676-688.
24. Ravindra, G. K., Achaiah, G., & Sastry, G. N. (2008). Molecular modeling studies of phenoxypyrimidinyl imidazoles as p38 kinase inhibitors using QSAR and docking. Eur J Med Chem, 43(4), 830-838.
25. Srivastava, H. K., Bohari, M. H., & Sastry, G. N. (2012). Modeling anti-HIV compounds: the role of analogue-based approaches. Curr Comput-Aid Drug, 8(3), 224-248.
26. Reddy, A. S., Pati, S. P., Kumar, P. P., Pradeep, H. N., & Sastry, G. N. (2007). Virtual screening in drug discovery - a computational perspective. Curr Protein Pept Sc, 8(4), 329-351.
27. Badrinarayan, P., & Sastry, G. N. (2012). Virtual screening filters for the design of type II p38 MAP kinase inhibitors: a fragment based library generation approach. J Mol Graph Model, 34, 89-100.
28. Badrinarayan, P., & Sastry, G. N. (2011). Virtual high throughput screening in new lead identification. Comb Chem High T Scr, 14(10), 840-860.
29. Bohari, M. H., & Sastry, G. N. (2012). FDA approved drugs complexed to their targets: evaluating pose prediction accuracy of docking protocols. J Mol Model, 18(9), 4263-4274.
30. Srivastava, H. K., & Sastry, G. N. (2013). Efficient estimation of MMGBSA-based BEs for DNA and aromatic furan amidino derivatives. J Biomol Struct Dyn, 31(5), 522-537.
31. Srivastava, H. K., & Sastry, G. N. (2012). Molecular dynamics investigation on a series of HIV protease inhibitors: assessing the performance of MM-PBSA and MM-GBSA approaches. J Chem Inf Model, 52(11), 3088-3098.
32. Yang, S. Y. (2010). Pharmacophore modeling and applications in drug discovery: challenges and recent advances. Drug Discov Today, 15(11), 444-450.
33. Willett, P. (2006). Similarity-based virtual screening using 2D fingerprints. Drug Discov Today, 11(23), 1046-1053.
34. Jain, A. N. (2004). Virtual screening in lead discovery and optimization. Curr Opin Drug Disc, 7(4), 396-403.
35. World Health Organization. (2010). Global tuberculosis control: WHO report 2010. World Health Organization.
36. Bass Jr, J. B., Farer, L. S., Hopewell, P. C., O'Brien, R., Jacobs, R. F., Ruben, F., Thornton, G. (1994). Treatment of tuberculosis and tuberculosis infection in adults and children. American Thoracic Society and the Centers for Disease Control and Prevention. Am J Resp Crit Care, 149(5), 1359-1374.
37. Schluger, N. W., & Rom, W. N. (1998). The host immune response to tuberculosis. Am J Resp Crit Care, 157(3), 679-691.
38. Cole, S., Brosch, R., Parkhill, J., Garnier, T., Churcher, C., Harris, D., Barrell, B. G. (1998). Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature, 393(6685), 537-544.


39. Camus, J. C., Pryor, M. J., Médigue, C., & Cole, S. T. (2002). Re-annotation of the genome sequence of Mycobacterium tuberculosis H37Rv. Microbiology, 148(10), 2967-2973.
40. Smith, I. (2003). Mycobacterium tuberculosis pathogenesis and molecular determinants of virulence. Clin Microbiol Rev, 16(3), 463-496.
41. Barry, C. E., Lee, R. E., Mdluli, K., Sampson, A. E., Schroeder, B. G., Slayden, R. A., & Yuan, Y. (1998). Mycolic acids: structure, biosynthesis and physiological functions. Prog Lipid Res, 37(2), 143-179.
42. Dubnau, E., Chan, J., Raynaud, C., Mohan, V. P., Lanéelle, M. A., Yu, K., Daffé, M. (2000). Oxygenated mycolic acids are necessary for virulence of Mycobacterium tuberculosis in mice. Mol Microbiol, 36(3), 630-637.
43. Glickman, M. S., Cox, J. S., & Jacobs, W. R. (2000). A novel mycolic acid cyclopropane synthetase is required for cording, persistence, and virulence of Mycobacterium tuberculosis. Mol Cell, 5(4), 717-727.
44. Crick, D. C., Mahapatra, S., & Brennan, P. J. (2001). Biosynthesis of the arabinogalactan-peptidoglycan complex of Mycobacterium tuberculosis. Glycobiology, 11(9), 107R-118R.
45. Takayama, K., Wang, C., & Besra, G. S. (2005). Pathway to synthesis and processing of mycolic acids in Mycobacterium tuberculosis. Clin Microbiol Rev, 18(1), 81-101.
46. Draper, P., & Daffé, M. (2005). The cell envelope of Mycobacterium tuberculosis with special reference to the capsule and the outer permeability barrier. Tuberculosis, 17(3), 261-273.
47. Pasqualoto, K. F., Ferreira, E. I., Santos-Filho, O. A., & Hopfinger, A. J. (2004). Rational design of new antituberculosis agents: receptor-independent four-dimensional quantitative structure-activity relationship analysis of a set of isoniazid derivatives. J Med Chem, 47(15), 3755-3764.
48. Lei, B., Wei, C. J., & Tu, S. C. (2000). Action mechanism of antitubercular isoniazid. Activation by Mycobacterium tuberculosis KatG, isolation, and characterization of InhA inhibitor. J Biol Chem, 275(4), 2520-2526.
49. Banerjee, A., Dubnau, E., Quemard, A., Balasubramanian, V., Um, K. S., Wilson, T., Jacobs, W. R. (1994). inhA, a gene encoding a target for isoniazid and ethionamide in Mycobacterium tuberculosis. Science, 263(5144), 227-230.
50. Huang, C. C., Smith, C. V., Glickman, M. S., Jacobs, W. R., & Sacchettini, J. C. (2002). Crystal structures of mycolic acid cyclopropane synthases from Mycobacterium tuberculosis. J Biol Chem, 277(13), 11559-11569.
51. Cole, S., Brosch, R., Parkhill, J., Garnier, T., Churcher, C., Harris, D., Barrell, B. G. (1998). Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature, 393(6685), 537-544.
52. Janin, Y. L. (2007). Antituberculosis drugs: ten years of research. Bioorgan Med Chem, 15(7), 2479-2513.
53. Feng, Z., & Barletta, R. G. (2003). Roles of Mycobacterium smegmatis D-alanine:D-alanine ligase and D-alanine racemase in the mechanisms of action of and resistance to the peptidoglycan inhibitor D-cycloserine. Antimicrob Agents Ch, 47(1), 283-291.
54. Deng, L., Mikusová, K., Robuck, K. G., Scherman, M., Brennan, P. J., & McNeil, M. R. (1995). Recognition of multiple effects of ethambutol on metabolism of mycobacterial cell envelope. Antimicrob Agents Ch, 39(3), 694-701.


55. Belanger, A. E., Besra, G. S., Ford, M. E., Mikusová, K., Belisle, J. T., Brennan, P. J., & Inamine, J. M. (1996). The embAB genes of Mycobacterium avium encode an arabinosyl transferase involved in cell wall arabinan biosynthesis that is the target for the antimycobacterial drug ethambutol. P Natl Acad Sci USA, 93(21), 11919-11924.
56. Telenti, A., Imboden, P., Marchesi, F., Matter, L., Schopfer, K., Bodmer, T., Cole, S. (1993). Detection of rifampicin-resistance mutations in Mycobacterium tuberculosis. The Lancet, 341(8846), 647-651.
57. Busscher, G. F., Rutjes, F. P., & Van Delft, F. L. (2005). 2-Deoxystreptamine: central scaffold of aminoglycoside antibiotics. Chem Rev, 105(3), 775-792.
58. Maus, C. E., Plikaytis, B. B., & Shinnick, T. M. (2005). Molecular analysis of cross-resistance to capreomycin, kanamycin, amikacin, and viomycin in Mycobacterium tuberculosis. Antimicrob Agents Ch, 49(8), 3192-3197.
59. Ducati, R. G., Ruffino-Netto, A., Basso, L. A., & Santos, D. S. (2006). The resumption of consumption: a review on tuberculosis. Mem I Oswaldo Cruz, 101(7), 697-714.
60. Zhang, Y. (2007). Advances in the treatment of tuberculosis. Clin Pharmacol Ther, 82(5), 595-600.
61. Padyana, M., Bhat, R. V., Dinesha, M., & Nawaz, A. (2012). HIV-tuberculosis: a study of chest x-ray patterns in relation to CD4 count. N Am J Med Sci, 4(5), 221.
62. Burman, W. J., Gallicano, K., & Peloquin, C. (1999). Therapeutic implications of drug interactions in the treatment of human immunodeficiency virus-related tuberculosis. Clin Infect Dis, 419-429.
63. Burman, W. J., Gallicano, K., & Peloquin, C. (1999). Therapeutic implications of drug interactions in the treatment of human immunodeficiency virus-related tuberculosis. Clin Infect Dis, 419-429.
64. Varghese, G. M., Janardhanan, J., Ralph, R., & Abraham, O. C. (2013). The twin epidemics of tuberculosis and HIV. Curr Infect Dis Rep, 15(1), 77-84.
65. Cui, T., Zhang, L., Wang, X., & He, Z. G. (2009). Uncovering new signaling proteins and potential drug targets through the interactome analysis of Mycobacterium tuberculosis. BMC Genomics, 10(1), 118.
66. Kushwaha, S. K., & Shakya, M. (2010). Protein interaction network analysis - approach for potential drug target identification in Mycobacterium tuberculosis. J Theor Biol, 262(2), 284-294.
67. Raman, K., Yeturu, K., & Chandra, N. (2008). targetTB: a target identification pipeline for Mycobacterium tuberculosis through an interactome, reactome and genome-scale structural analysis. BMC Syst Biol, 2(1), 109.
68. Wahab, H. A., Choong, Y. S., Ibrahim, P., Sadikun, A., & Scior, T. (2008). Elucidating isoniazid resistance using molecular modeling. J Chem Inf Model, 49(1), 97-107.
69. Neres, J., Labello, N. P., Somu, R. V., Boshoff, H. I., Wilson, D. J., Vannada, J., Chen, L., Barry, III, C. E., Bennett, E. M., Aldrich, C. C. (2008). Inhibition of Siderophore Biosynthesis in Mycobacterium tuberculosis with Nucleoside Bisubstrate Analogues: Structure-Activity Relationships of the Nucleobase Domain of 5′-O-[N-(Salicyl)sulfamoyl]adenosine. J Med Chem, 51, 5349-5370.
70. Andrade, C. H., Pasqualoto, K. F., Ferreira, E. I., & Hopfinger, A. J. (2009). Rational design and 3D-pharmacophore mapping of 5′-thiourea-substituted α-thymidine analogues as mycobacterial TMPK inhibitors. J Chem Inf Model, 49(4), 1070-1078.
71. Kumar, A., Chaturvedi, V., Bhatnagar, S., Sinha, S., & Siddiqi, M. I. (2008). Knowledge based identification of potent antitubercular compounds using structure based virtual screening and structure interaction fingerprints. J Chem Inf Model, 49(1), 35-42.
72. Ramsden, N. L., Buetow, L., Dawson, A., Kemp, L. A., Ulaganathan, V., Brenk, R., Hunter, W. N. (2009). A Structure-Based Approach to Ligand Discovery for 2C-Methyl-d-erythritol-2,4-cyclodiphosphate Synthase: A Target for Antimicrobial Therapy. J Med Chem, 52(8), 2531-2542.
73. Velaparthi, S., Brunsteiner, M., Uddin, R., Wan, B., Franzblau, S. G., & Petukhov, P. A. (2008). 5-tert-Butyl-N-pyrazol-4-yl-4,5,6,7-tetrahydrobenzo[d]isoxazole-3-carboxamide derivatives as novel potent inhibitors of Mycobacterium tuberculosis pantothenate synthetase: initiating a quest for new antitubercular drugs. J Med Chem, 51(7), 1999-2002.
74. Bonora, S., & Di Perri, G. (2008). Interactions between antiretroviral agents and those used to treat tuberculosis. Curr Opin HIV AIDS, 3(3), 306-312.
75. OSDD, Open source drug discovery, http://www.osdd.net/
76. TB structural genomics consortium, http://www.doe-mbi.ucla.edu/TB/.
77. Global Alliance for TB Drug Development, http://www.tballiance.org/home, 2009.
78. Cole, S. T., Brosch, R., Parkhill, J., Garnier, T., Churcher, C., Harris, D., Barrell, B. G. (1998). Erratum: Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature, 396(6707), 190-190.
79. Anishetty, S., Pulimi, M., & Pennathur, G. (2005). Potential drug targets in Mycobacterium tuberculosis through metabolic pathway analysis. Comput Biol Chem, 29(5), 368-378.
80. Raman, K., Yeturu, K., & Chandra, N. (2008). targetTB: a target identification pipeline for Mycobacterium tuberculosis through an interactome, reactome and genome-scale structural analysis. BMC Syst Biol, 2(1), 109.
81. Hasan, S., Daugelat, S., Rao, P. S., & Schreiber, M. (2006). Prioritizing genomic drug targets in pathogens: application to Mycobacterium tuberculosis. PLoS Comput Biol, 2(6), e61.
82. Jose Freitas da Silveira, N., Eduardo Bonalumi, C., Andrade Arcuri, H., & Filgueira de Azevedo Junior, W. (2007). Molecular modeling databases: a new way in the search of protein targets for drug development. Curr Bioinform, 2(1), 1-10.
83. Levy, J. (2000). The effects of antibiotic use on gastrointestinal function. Am J Gastroenterol, 95(1), S8-S10.
84. Crick, D. C., Mahapatra, S., & Brennan, P. J. (2001). Biosynthesis of the arabinogalactan-peptidoglycan complex of Mycobacterium tuberculosis. Glycobiology, 11(9), 107R-118R.
85. Mdluli, K., & Spigelman, M. (2006). Novel targets for tuberculosis drug discovery. Curr Opin Pharmacol, 6(5), 459-467.
86. Voskuil, M. I., Schnappinger, D., Visconti, K. C., Harrell, M. I., Dolganov, G. M., Sherman, D. R., & Schoolnik, G. K. (2003). Inhibition of respiration by nitric oxide induces a Mycobacterium tuberculosis dormancy program. J Exp Med, 198(5), 705-713.
87. Kumar, A., Toledo, J. C., Patel, R. P., Lancaster, J. R., & Steyn, A. J. (2007). Mycobacterium tuberculosis DosS is a redox sensor and DosT is a hypoxia sensor. P Natl Acad Sci USA, 104(28), 11568-11573.
88. Hopkins, A. L., & Groom, C. R. (2002). The druggable genome. Nat Rev Drug Discov, 1(9), 727-730.


Chapter 2 Computational Methods

“In the beginning (if there was such a thing), God created Newton’s laws of motion together with the necessary masses and forces. This is all; everything beyond this follows from the development of appropriate mathematical methods by means of deduction.”
Albert Einstein, The Ultimate Quotable Einstein (2011), 397.

The present thesis employs rigorous computational methods like MD simulations, structure and ligand based pharmacophore modeling, docking, QSAR and ADMET property calculations in order to understand the structures of the druggable targets of M. Tb and HIV, their interactions with the small molecules as well as to identify potential lead compounds for these complex diseases. This chapter gives an account of the important computational methods that have been used in the thesis. Figure 2.1 depicts the various computational methods used in various chapters.

| Computational Methods | Programs/Software/Servers/Databases Used | Chapter |
|---|---|---|
| MD Simulations | CHARMM-GUI, CHARMM, NAMD | Chapter 3, 4 |
| e-Pharmacophore Modeling | Phase Module, Schrodinger Molecular Modeling Suite | Chapter 4, 5, 7 |
| Pharmacophore Screening | Phase Module, Schrodinger Molecular Modeling Suite | Chapter 4, 5, 7 |
| Docking | Glide Module, Schrodinger Molecular Modeling Suite, GOLD | Chapter 3, 4, 5, 6, 7 |
| ADMET Property Calculations | QikProp Module, Schrodinger Molecular Modeling Suite, DrugBank | Chapter 5, 7 |
| Descriptor Calculation | CODESSA, GAUSSIAN09 | Chapter 6 |
| QSAR Modeling | CODESSA, Explorer | Chapter 6 |

Figure 2.1 Overview of all the computational methods used in different chapters.

2.1 Molecular Dynamics

Biological processes are complex and involve a repertoire of atomic interactions. Although experiments help to deduce a molecular-level understanding of biological processes, they are time consuming and costly, and often cannot describe the atomic interactions. MD simulations are therefore used to estimate the microscopic properties and dynamic motions of biomolecular assemblies. MD simulations provide access to the thermally accessible states and help to correlate them with the functions of biomolecular systems [1]. MD is a method that integrates the Newtonian equations of motion for the N particles of a system over a period of time, resulting in a trajectory that is used to calculate the microscopic and macroscopic properties. The calculation of properties from MD trajectories is based on the principles of statistical mechanics [2]. MD simulations calculate the microscopic properties of the system, such as the position and velocity of each individual atom. However, the properties of higher practical value are the macroscopic properties, such as the number of particles (N), volume (V), energy (E), temperature (T), pressure (P) and chemical potential of the particles (μ) [3]. These bulk properties are used to gauge the changes in the thermodynamic behavior of a system with time.
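As a minimal illustration of how a bulk property follows from the microscopic state, the sketch below estimates the instantaneous temperature from atomic masses and velocities using the equipartition relation, T = Σ m v² / (N_dof k_B). It is only a sketch under stated assumptions: the function name `kinetic_temperature`, the use of SI units and the choice of 3N unconstrained degrees of freedom are illustrative and are not taken from the simulation protocol used in this thesis.

```python
K_B = 1.380649e-23  # Boltzmann constant in J/K


def kinetic_temperature(masses, velocities):
    """Instantaneous temperature (K) from masses (kg) and velocity vectors (m/s).

    Assumes 3 degrees of freedom per particle (no bond constraints and no
    removal of centre-of-mass motion); this is an illustrative simplification.
    """
    # Twice the total kinetic energy: sum over particles of m * |v|^2
    twice_kinetic_energy = sum(
        m * sum(component * component for component in v)
        for m, v in zip(masses, velocities)
    )
    n_dof = 3 * len(masses)
    return twice_kinetic_energy / (n_dof * K_B)


if __name__ == "__main__":
    mass = 6.63e-26  # roughly the mass of one argon atom, in kg
    speed = (K_B * 300.0 / mass) ** 0.5  # per-component speed matching 300 K
    print(kinetic_temperature([mass], [(speed, speed, speed)]))
```

In a production MD code the number of degrees of freedom is usually reduced by the number of constraints, so the value returned here should be read as an upper-level illustration of the micro-to-macro link rather than as a thermostat implementation.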

2.1.1 Statistical ensembles

The positions and momenta of all the particles of a system define a microscopic state. They are treated as the coordinates of a 6N-dimensional space, also called the phase space [3, 4]. Thus, at any given time, the system corresponds to a point in this multidimensional space. The evolution of a system with time therefore corresponds to a trajectory in the phase space, which can be determined by solving the equations of motion based on the potential energy (PE). An ensemble is a collection of systems with the same macroscopic properties, wherein each system corresponds to a point in the phase space. Different types of ensembles are distinguished by the set of macroscopic properties held constant: the canonical (NVT), grand canonical (μVT), microcanonical (NVE) and isothermal-isobaric (NPT) ensembles. The partition function given in Table 2.1 is an explicit function of the microscopic states of a system. However, because a biomolecular system has a very large number of microscopic states, which are sampled according to the Boltzmann distribution in the canonical ensemble, direct calculation of $Z_{NVT}$ is not feasible.
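For a toy system with only a handful of microstates, the canonical sum can be evaluated directly, which makes the role of the Boltzmann weights concrete. The sketch below is a hedged illustration and not a method used in this thesis: the function name `boltzmann_probabilities`, the choice of kcal/mol energy units and the example energies are all assumptions. Energies are shifted by the lowest value before exponentiation for numerical stability; the shift cancels in the normalized probabilities.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def boltzmann_probabilities(energies, temperature):
    """Canonical (NVT) probabilities of discrete microstates.

    energies    : potential energies of the microstates in kcal/mol
    temperature : absolute temperature in K
    """
    beta = 1.0 / (K_B * temperature)
    e_min = min(energies)
    # Boltzmann weights exp(-beta * PE), relative to the lowest-energy state
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z_shifted = sum(weights)  # partition function up to a constant factor
    return [w / z_shifted for w in weights]


if __name__ == "__main__":
    # Three hypothetical microstates separated by 1 kcal/mol at 300 K
    for p in boltzmann_probabilities([0.0, 1.0, 2.0], 300.0):
        print(round(p, 4))
```

For a solvated protein the number of microstates is astronomically large, so this explicit enumeration is impossible; MD instead samples states along a trajectory.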

Table 2.1 Types of statistical ensembles used in MD simulations [4].

| S. No. | Ensemble | Features | Partition function | Remarks |
|---|---|---|---|---|
| 1 | Microcanonical | Constant number of particles (N), volume (V) and energy (E). Entropy of the system increases continuously. | $\Omega(E)$ | $\Omega(E)$ is the number of micro-states corresponding to the system's energy E. |
| 2 | Canonical | Number of particles (N), volume (V) and temperature (T) are constant. | $Z_{NVT} = \sum_{j=1}^{j_{max}} e^{-\beta\, PE(x_j)}$ | $\beta = 1/(K_B T)$; PE(x) = energy of the x-th microstate of the system. |
| 3 | Grand canonical | Chemical potential or fugacity (µ), volume (V) and temperature (T) are constant. | $Z_{\mu VT} = \sum_{i} e^{(N_i \mu - PE(x_i))/(K_B T)}$ | $\beta = 1/(K_B T)$; PE(x) = energy of the x-th microstate of the system. |
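For a system with only a handful of microstates the canonical sum in Table 2.1 can be evaluated directly, which makes the role of the Boltzmann weights concrete. The following is a minimal sketch under stated assumptions: the four microstate energies are hypothetical, energies are taken in kcal/mol, and the function name `canonical_partition` is introduced here only for illustration.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def canonical_partition(energies, temperature):
    """Return Z_NVT and the Boltzmann probabilities of the given microstate energies."""
    beta = 1.0 / (K_B * temperature)
    # Shift by the lowest energy for numerical stability; probabilities are unaffected.
    e_min = min(energies)
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z_shifted = sum(weights)
    probabilities = [w / z_shifted for w in weights]
    z = z_shifted * math.exp(-beta * e_min)
    return z, probabilities

energies = [0.0, 0.5, 1.0, 3.0]  # hypothetical microstate energies (kcal/mol)
z, probs = canonical_partition(energies, 300.0)
```

For a biomolecule the number of terms in this sum is astronomically large, which is why MD samples the distribution instead of enumerating it.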

2.1.2 Steps of MD simulations

[Figure 2.2: flow-chart. Read in the parameters (Ti, N, D, time, Δt) → Initialization (t = 0, vi, xi) → Force calculation (F = -dU/dx) → Integration of the laws of motion (calculate xf, vf) → if t < time: set xi = xf, vi = vf, t = t + Δt and return to the force calculation; otherwise calculate properties → End.]

Figure 2.2 Flow-chart depicting the general steps of a typical MD simulation.

Figure 2.2 depicts the general steps of a typical MD simulation. An MD simulation is initialized with the assignment of initial positions and velocities of all particles in the system.


The initial velocities are assigned to the particles in such a way that the total momentum of the system is zero and the Maxwellian velocity distribution law is obeyed [5]:

$\langle v_\alpha^2 \rangle = \frac{k_B T}{m}$    Eqn. (8)

where $v_\alpha$ is the α component of the velocity of a given particle, $k_B$ is the Boltzmann constant, T is the absolute temperature and m is the mass of the particle. With the initial velocities assigned, the next step is the calculation of the potential energy of the system using a potential function, i.e. a description of the terms by which the particles in the simulation interact, referred to as a force field. In chapters 3 and 4 of the present thesis, all the model systems have been studied using the CHARMM force field [6], which uses the following equations to calculate the potential energy of the system.

$E = E_{bonded} + E_{non\text{-}bonded}$    Eqn. (3)

$E = \sum_{bonds} k_b (b_i - b_0)^2 + \sum_{angles} k_\theta (\theta_i - \theta_0)^2 + \sum_{dihedrals} k_\chi \left[ 1 + \cos(n\chi - \delta) \right] + \sum_{UB} k_{UB} (S_i - S_0)^2 + \sum_{imp} k_\varphi (\varphi_i - \varphi_0)^2 + \sum_{i,\, j>i} \epsilon_{ij} \left[ \left( \frac{R_{min,ij}}{r_{ij}} \right)^{12} - \left( \frac{R_{min,ij}}{r_{ij}} \right)^{6} \right] + \sum_{i,\, j>i} \frac{q_i q_j}{\epsilon_i r_{ij}}$    Eqn. (9)

where kb, kθ, kχ, kUB and kϕ are force constants; bi, θi, Si and ϕi are the bond length, angle, Urey-Bradley (UB) 1,3-distance and improper angle, respectively; b0, θ0, S0 and ϕ0 are the ideal values of the bond length, angle, Urey-Bradley 1,3-distance and improper angle, respectively (the Urey-Bradley term is a harmonic term in the distance between atoms 1 and 3 of some of the angle terms and was introduced on a case-by-case basis during the final optimization of vibrational spectra); χ is the dihedral angle, n is the multiplicity and δ is the phase; εij is the Lennard-Jones well depth, Rmin,ij is the distance at the Lennard-Jones minimum, rij is the distance between atoms i and j, qi and qj are the atomic charges and εi is the effective dielectric constant.
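To make the functional forms of the individual terms in Eqn. (9) concrete, the harmonic, dihedral and electrostatic contributions can be evaluated for single internal coordinates. This is only an illustrative sketch and not the CHARMM implementation: the force constants, geometries and charges below are hypothetical, and the conversion constant 332.0716 (e²/Å to kcal/mol) is an assumption about the unit system.

```python
import math

def harmonic(k, value, ideal):
    """Harmonic term k*(value - ideal)**2, the form shared by the bond, angle,
    Urey-Bradley and improper terms."""
    return k * (value - ideal) ** 2

def dihedral(k_chi, n, chi, delta):
    """Dihedral term k_chi*(1 + cos(n*chi - delta)); angles in radians."""
    return k_chi * (1.0 + math.cos(n * chi - delta))

def coulomb(q_i, q_j, r_ij, dielectric=1.0, conversion=332.0716):
    """Electrostatic term q_i*q_j/(dielectric*r_ij), scaled to kcal/mol (assumed units)."""
    return conversion * q_i * q_j / (dielectric * r_ij)

# Hypothetical parameters and coordinates.
e_bond = harmonic(300.0, 1.55, 1.50)             # bond stretched by 0.05 Angstrom
e_dihedral = dihedral(0.2, 3, math.pi / 3, 0.0)  # threefold term at its minimum
e_coulomb = coulomb(0.4, -0.4, 3.0)              # opposite partial charges 3 Angstrom apart
```

The total bonded energy is then simply the sum of such terms over all bonds, angles, dihedrals and impropers of the molecule.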

The potential energy calculation is followed by deducing the force acting on each particle of the system by differentiating the calculated energy with respect to the atomic positions, i.e. the force on the particles is calculated as –dE/dx. This is the most time-consuming part of almost all MD simulations. If we consider a model system with pairwise additive interactions, we have to consider the contribution to the force on particle i due to all its neighbors [5]. If we consider only the interaction between a particle and the nearest image of another particle, then for a system of N particles we must evaluate N(N-1)/2 pair distances, so the time needed for the evaluation of the forces scales as N². Efficient techniques are often used to speed up the calculation of the short-range and long-range forces so that the computing time scales as N rather than N². If a given pair of particles is close enough to interact, we must compute the force between these particles and their contribution to the potential energy. Cell lists [7] and Verlet lists [8] are typically used for finding all atom pairs within a given cut-off distance of each other in MD simulations.
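The difference between checking all N(N-1)/2 pairs and using a cell list can be illustrated with a short sketch. It is written under simplifying assumptions that are not taken from the text: the box is non-periodic (no nearest-image convention), the particle coordinates are random, and the function names are hypothetical.

```python
import itertools
import random
from collections import defaultdict

def within_cutoff(p, q, cutoff):
    return sum((a - b) ** 2 for a, b in zip(p, q)) <= cutoff ** 2

def brute_force_pairs(positions, cutoff):
    """Test all N*(N-1)/2 pairs."""
    return {(i, j)
            for i, j in itertools.combinations(range(len(positions)), 2)
            if within_cutoff(positions[i], positions[j], cutoff)}

def cell_list_pairs(positions, cutoff):
    """Bin particles into cubic cells of edge `cutoff`; compare only neighbouring cells."""
    cells = defaultdict(list)
    for index, p in enumerate(positions):
        cells[tuple(int(c // cutoff) for c in p)].append(index)
    pairs = set()
    for cell, members in cells.items():
        for offset in itertools.product((-1, 0, 1), repeat=3):
            neighbour = tuple(c + o for c, o in zip(cell, offset))
            for i in members:
                for j in cells.get(neighbour, ()):
                    if i < j and within_cutoff(positions[i], positions[j], cutoff):
                        pairs.add((i, j))
    return pairs

random.seed(0)
positions = [tuple(random.uniform(0.0, 30.0) for _ in range(3)) for _ in range(200)]
n_all_pairs = len(positions) * (len(positions) - 1) // 2
close_pairs = cell_list_pairs(positions, cutoff=5.0)
```

Both routines return the same set of interacting pairs; the cell list simply avoids measuring distances between particles that sit in non-adjacent cells, which is what brings the cost down towards linear scaling at fixed density.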

After calculating the force on each particle, Newton’s laws of motion are integrated to generate new positions and velocities for specified time steps using various algorithms. For essentially all systems that we study by MD simulations, the trajectory of the system through the 6N-dimensional space spanned by all particle coordinates and momenta depends sensitively on the initial conditions. For the MD simulation studies presented in this thesis, the Leap-Frog algorithm [9] has been used. This algorithm defines the velocities at half-integer time steps and uses these velocities to compute the new positions. Eqn. (10) and Eqn. (11) represent the equations used to update the positions and velocities of the particles in the Leap-Frog algorithm, where x(t) is the position at time t, x(t + Δt) is the updated position after a time step Δt, v(t - Δt/2) and v(t + Δt/2) are the velocities at the previous and the updated half-integer time steps, f(t) is the force on the particle at time t and m is its mass. Since the velocities are not defined at the same time as the positions, the kinetic and potential energies are also not defined at the same time, and hence the total energy cannot be directly computed in the Leap-Frog scheme.
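The update cycle of Eqn. (10) and Eqn. (11) can be sketched for a single particle in one dimension. The example below is an illustration only, not the integrator of any MD package: the harmonic force, unit mass and time step are hypothetical, and the half-step back-shift of the initial velocity is a common start-up convention that is assumed here.

```python
import math

def leap_frog(x0, v0, force, mass, dt, n_steps):
    """Integrate one particle in 1D; returns positions at integer steps
    and velocities at half-integer steps."""
    # Start-up (assumed convention): shift the initial velocity back by half a step.
    v_half = v0 - 0.5 * dt * force(x0) / mass
    x = x0
    positions, half_velocities = [x], []
    for _ in range(n_steps):
        v_half = v_half + dt * force(x) / mass  # Eqn. (11): velocity at t + dt/2
        x = x + dt * v_half                     # Eqn. (10): position at t + dt
        positions.append(x)
        half_velocities.append(v_half)
    return positions, half_velocities

k = 1.0    # hypothetical spring constant
dt = 0.01  # hypothetical time step
positions, half_velocities = leap_frog(1.0, 0.0, lambda x: -k * x,
                                       mass=1.0, dt=dt, n_steps=628)
t_final = 628 * dt  # roughly one oscillation period (2*pi)
```

For this harmonic test case the exact solution is x(t) = cos(t), so the final position can be compared directly with the analytical value to check the integrator.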

$x(t + \Delta t) = x(t) + \Delta t\, v\!\left(t + \frac{\Delta t}{2}\right)$    Eqn. (10)

$v\!\left(t + \frac{\Delta t}{2}\right) = v\!\left(t - \frac{\Delta t}{2}\right) + \Delta t\, \frac{f(t)}{m}$    Eqn. (11)

2.2 Pharmacophore modeling

Pharmacophore modeling and screening have gained popularity in the last few years because they offer a simple way of capturing and representing the chemical features of a compound that are responsible for its interactions with the target protein. In the current thesis, structure-based and ligand-based pharmacophore models have been used in combination with MD simulations for in silico screening in order to account for the flexibility of the active sites. According to the IUPAC definition [10], “A pharmacophore is the ensemble of steric and electronic features that is necessary to ensure the optimal supra-molecular interactions with a specific biological target structure and to trigger (or to block) its biological response.” The chemical signatures in a molecule that are actually responsible for a certain type of non-covalent interaction with the receptor are called pharmacophore features. A few examples of such functional features are H-bond donors, H-bond acceptors, aromatic rings (ring atoms, ring center or normal to the ring), hydrophobic centers (also called neutral centers), positive charge centers, negative charge centers, acidic groups, basic groups, bulky groups engaged in steric interactions, planar atoms, CO2 centroids (i.e., ester or carboxylic acid), metals (also called metal ligators) and excluded volumes, i.e., forbidden regions that are occupied by the protein and where the ligand cannot place functional groups [11]. Pharmacophore models can be created either in a ligand-based manner, wherein a set of active molecules is superposed and the common chemical features that are essential for their bioactivity are extracted, or in a structure-based manner, where the possible key interaction points between the receptor and ligands are probed.

2.2.1 Ligand-based pharmacophore

Ligand-based pharmacophore models are generated by extracting, from the 3D structures of a set of known ligands, the common chemical features that represent the key interactions between the ligands and their target. In general, pharmacophore generation from multiple ligands is performed in two steps: (i) conformer generation for each ligand in the training set, and (ii) alignment of the ligands in the training set and determination of the essential common chemical features to construct pharmacophore models. Handling the conformational flexibility of ligands and conducting molecular alignment are the key techniques, and also the main difficulties, in ligand-based pharmacophore modeling [12]. Various programs such as HipHop and HypoGen (Accelrys Inc., http://www.accelrys.com), DISCO, GASP and GALAHAD (Tripos Inc., http://www.tripos.com), PHASE (Schrödinger Inc., http://www.schrodinger.com) and MOE (Chemical Computing Group, http://www.chemcomp.com) have been developed for generating ligand-based pharmacophore models. These programs differ mainly in the algorithms used for conformation generation of the ligands and for the alignment of the molecules.
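The second step, extraction of the features shared by a set of aligned ligands, can be illustrated with a toy example. This sketch does not reproduce the algorithm of any of the programs listed above: the ligands are assumed to be already aligned, each is reduced to a hypothetical list of typed 3D feature points, and the 1.5 Å matching tolerance is an arbitrary choice.

```python
import math

def common_features(ligands, tolerance=1.5):
    """Keep the features of the first ligand that every other ligand
    matches in type and, within `tolerance` Angstrom, in position."""
    reference, others = ligands[0], ligands[1:]
    shared = []
    for kind, xyz in reference:
        if all(any(k == kind and math.dist(xyz, p) <= tolerance for k, p in lig)
               for lig in others):
            shared.append((kind, xyz))
    return shared

# Hypothetical, pre-aligned ligands: (feature type, (x, y, z) in Angstrom).
ligand_a = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)),
            ("aromatic", (6.0, 1.0, 0.0))]
ligand_b = [("donor", (0.4, 0.2, 0.0)), ("acceptor", (3.2, -0.3, 0.1)),
            ("hydrophobic", (6.0, 1.0, 0.0))]
ligand_c = [("donor", (-0.3, 0.1, 0.2)), ("acceptor", (2.7, 0.4, 0.0)),
            ("aromatic", (9.0, 1.0, 0.0))]

model = common_features([ligand_a, ligand_b, ligand_c])
```

Here the donor and the acceptor survive as common features, while the aromatic ring of the first ligand is dropped because one ligand lacks it and another places it too far away.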

2.2.2 Structure-based pharmacophore

Structure-based pharmacophore modeling requires the 3D structure of the receptor or of a receptor-ligand complex. The models are generated by analysing the complementary chemical features of the active site and their spatial relationships, followed by assembly of the pharmacophore model from selected features. Structure-based pharmacophore models can be derived from a receptor-ligand complex and/or from the receptor alone (without ligand). The receptor-ligand-complex-based approach is useful in locating the ligand-binding site of the target and determining the key interaction points between ligands and protein. LigandScout [13], Pocket v.2 [14] and GBPM [15] are a few examples of programs that incorporate receptor-ligand-complex-based methods for generating structure-based pharmacophore models. The limitation of this approach is the need for the 3D structure of a receptor-ligand complex, implying that it cannot be applied when no compounds targeting the binding site of interest are known. This limitation can be overcome by the receptor-based (ligand-free) approach, one example of which converts LUDI [16] interaction maps within the protein-binding site into Catalyst pharmacophoric features.

2.2.3 Dynamic pharmacophore

The active sites of drug targets are mostly very flexible. Hence, structure-based pharmacophore models generated from a single conformational state of the receptor would not adequately account for the potential drug-receptor interactions. In this context, MD simulation is an efficient method to address receptor flexibility in structure-based drug design (SBDD). Carlson et al. [17] developed a “dynamic” pharmacophore model of HIV integrase from an ensemble of MD snapshots and identified two new inhibitors by screening the Available Chemicals Directory (ACD). Another approach, by Moitessier et al., involves prediction of binding modes by analyzing the dynamic pharmacophore model and the orientation of the pharmacophore points. Several studies by Carlson and coworkers have demonstrated the successful incorporation of protein flexibility in SBDD over a range of pharmacologically relevant targets [18, 19].

2.2.4 e-Pharmacophore

The e-Pharmacophore method of the Schrödinger suite is a relatively new approach that uses the extra precision (XP) scoring function [20] of grid-based ligand docking with energetics (Glide) to characterize protein-ligand interactions accurately. Each pharmacophore feature site is first assigned an energetic value equal to the sum of the Glide XP contributions of the atoms comprising the site, allowing sites to be quantified and ranked on the basis of the energetic terms. Glide XP descriptors include terms for hydrophobic enclosure, hydrophobically packed correlated H-bonds, electrostatic rewards, π-π stacking, cation-π and other interactions. ChemScore H-bonding and lipophilic atom pair interaction terms are included when the Glide XP terms for H-bonding and hydrophobic enclosure are zero. E-Pharmacophores also allow for excluded volumes, which represent the regions of space occupied by the receptor where no portion of the ligand can be accommodated. E-Pharmacophores have been shown to retrieve a more diverse set of actives than traditional structure-based pharmacophore methods, which makes them more useful [21]. In chapters 4 and 7, e-pharmacophore models have been generated from the MD snapshots and docked complexes. These models have been validated by screening active compounds. The scoring function and the details of the screening method and parameters are discussed in the corresponding chapters.
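The ranking of feature sites by summed atomic contributions can be illustrated with a few lines of code. This is a sketch of the bookkeeping only, not of the Schrödinger implementation: the atom names, the per-atom energies and the atom-to-site mapping are all hypothetical.

```python
from collections import defaultdict

def rank_feature_sites(atom_contributions, atom_to_site):
    """Sum the per-atom energy contributions of each site and
    rank the sites, most favourable (most negative) first."""
    site_energy = defaultdict(float)
    for atom, energy in atom_contributions.items():
        site_energy[atom_to_site[atom]] += energy
    return sorted(site_energy.items(), key=lambda item: item[1])

# Hypothetical per-atom energetic contributions (kcal/mol) and site assignment.
atom_contributions = {"N1": -1.2, "H1": -0.6, "O2": -0.9,
                      "C5": -0.3, "C6": -0.25, "C7": -0.25}
atom_to_site = {"N1": "donor_1", "H1": "donor_1", "O2": "acceptor_1",
                "C5": "ring_1", "C6": "ring_1", "C7": "ring_1"}

ranked = rank_feature_sites(atom_contributions, atom_to_site)
```

A hypothesis can then be built from the top-ranked sites, so that the features retained are those contributing most to the computed binding energetics.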

2.3 Docking

Docking is an automated computational procedure that predicts how a compound will bind in the active site of a protein. This includes determining the orientation of the compound, its conformational geometry and a score. The score may be a binding energy, a free energy or a qualitative numerical measure [22]. A docking program has two key components, namely the search algorithm and the scoring function. The search algorithm automatically generates many different orientations and conformations of the compound in the active site, and a score is computed for each. The identified orientations are then refined through energy minimization to obtain the optimal conformations. The choice of the search algorithm determines how thoroughly the program checks the possible positions of the molecule and how much time it takes. The scoring function determines whether the orientations chosen by the search algorithm are energetically the most favorable and computes the binding energy. In the present thesis, docking studies have been carried out using the grid-based Glide [23] method of the Schrödinger suite in chapters 3 to 7. The genetic optimization for ligand docking (GOLD) program [24] has also been used in chapter 6 along with Glide. The following subsections describe the search algorithms and scoring functions of these two programs.

2.3.1 Glide

Glide uses a hierarchical series of filters to search for possible locations of the ligand in the binding pocket of the receptor. The shape and properties of the receptor are represented on a grid by several different sets of fields that provide progressively more accurate scoring of the ligand poses. Glide performs an extensive conformational search complemented by a heuristic screen that eliminates energetically unfavorable conformations. For each generated conformation, an exhaustive search of the possible locations and possible interactions with the active site residues is carried out. In the next level, Glide examines the placement of atoms that lie within the ligand diameter and omits the orientations that have many steric clashes with the receptor. The third stage in the hierarchy is energy minimization on the pre-computed OPLS-AA van der Waals and electrostatic grids for the receptor. Finally, the minimized poses are re-scored using Schrödinger’s proprietary GlideScore scoring function. The XP mode of Glide [20] combines a powerful sampling protocol with a custom scoring function designed to avoid false positives. XP mode involves a more extensive docking method and a specialized scoring method, which are strongly coupled. The scoring functions for the Glide GScore and Glide XP are given below as Eqn. (12) and Eqn. (13).

$\mathrm{GScore} = 0.065 \times \mathrm{vdW} + 0.130 \times \mathrm{Coul} + \mathrm{Lipo} + \mathrm{Hbond} + \mathrm{Metal} + \mathrm{BuryP} + \mathrm{RotB} + \mathrm{Site}$    Eqn. (12)

$\mathrm{GScore\,(XP)} = \mathrm{Glerotb} + \mathrm{GlEientern} + \mathrm{GlEvdw} + \mathrm{GlEcoul} + \mathrm{GlEnergy} + \mathrm{GlEmodel} + \mathrm{Glligeff} + \mathrm{LigEffsa} + \mathrm{XPlipvdw} + \mathrm{XPPhoben} + \mathrm{PhobEn} + \mathrm{XPelectro} + \mathrm{XPlowmw} + \mathrm{XPPenal} + \mathrm{XProtpenal} + \mathrm{XPpicat}$    Eqn. (13)

where the terms are: Glerotb (penalty for freezing rotatable bonds in Glide score), GlEientern (internal torsional energy), GlEvdw (van der Waals energy), GlEcoul (Coulomb energy), GlEnergy (modified Coulomb-van der Waals interaction energy), GlEmodel (model energy), Glligeff (Glide score/number of heavy atoms), LigEffsa (Glligeff, approximating the effect of surface area), XPlipvdw (lipophilic term derived from the hydrophobic grid potential), PhobEn (hydrophobic enclosure reward), XPPhoben (reward for hydrophobically packed hydrophobic groups), XPelectro (electrostatic rewards; includes Coulomb and metal terms), XPlowmw (reward for ligands with low molecular weight), XPPenal (polar atom burial and desolvation penalties, and penalty for intra-ligand contacts), XProtpenal (rotatable bond penalty) and XPpicat (pi-cation interactions).

2.3.2 GOLD

GOLD uses a genetic algorithm for the conformational search, and a fitness function, which is a particular type of objective function, is used to summarize, as a single figure of merit, how energetically favorable a given conformation is. Each conformational state is commonly represented as a string of numbers (referred to as a chromosome). After each round of testing, or simulation, the idea is to discard the n worst conformations and to breed n new ones from the best solutions. Each conformation therefore needs to be awarded a figure of merit that indicates how close it came to meeting the overall specification, and this is generated by applying the fitness function. The GOLD fitness function, given by Eqn. (14) and Eqn. (15), is made up of four components: protein-ligand hydrogen bond energy (HBext), protein-ligand van der Waals (vdw) energy (vdwext), ligand internal vdw energy (vdwint) and ligand torsional strain energy (torsion).
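The two ingredients described above, a weighted-sum fitness in the form of Eqn. (14) and Eqn. (15) and the replacement of the n worst chromosomes by offspring of the better ones, can be sketched as follows. This is a generic illustration and not GOLD's actual implementation: the one-point crossover, Gaussian mutation, population size and the quadratic toy fitness used to drive the search are all assumptions made for the example.

```python
import random

def gold_like_fitness(hb_ext, vdw_ext, hb_int, vdw_int, torsion):
    """Weighted sum in the form of Eqn. (14) and Eqn. (15)."""
    lig_int = vdw_int + torsion
    return hb_ext + 1.3750 * vdw_ext + hb_int + 1.0000 * lig_int

def next_generation(population, fitness, n_replace, rng, mutation=0.1):
    """Discard the n_replace worst chromosomes and breed replacements from the survivors."""
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[:len(ranked) - n_replace]
    children = []
    for _ in range(n_replace):
        mother, father = rng.sample(survivors, 2)
        cut = rng.randrange(1, len(mother))
        child = mother[:cut] + father[cut:]                             # one-point crossover
        children.append([g + rng.gauss(0.0, mutation) for g in child])  # Gaussian mutation
    return survivors + children

def toy_fitness(chromosome):
    # Hypothetical stand-in for a docking fitness: best when all genes are zero.
    return -sum(g * g for g in chromosome)

rng = random.Random(1)
population = [[rng.uniform(-5.0, 5.0) for _ in range(4)] for _ in range(20)]
initial_best = max(map(toy_fitness, population))
for _ in range(50):
    population = next_generation(population, toy_fitness, n_replace=5, rng=rng)
final_best = max(map(toy_fitness, population))
```

Because the best chromosomes are always carried over unchanged, the best fitness in the population can never decrease from one generation to the next.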

$\mathrm{GoldFitness\ Score} = HB_{ext} + 1.3750 \times vdw_{ext} + HB_{int} + 1.0000 \times Lig_{int}$    Eqn. (14)

$Lig_{int} = vdw_{int} + \mathrm{torsion}$    Eqn. (15)

2.4 QSAR

Analogue-based approaches for rational drug design have emerged in parallel with the structure-based approaches. They complement the structure-based approaches in cases where the structure of the target is unknown but active inhibitors of the target are known. The main concept of analogue-based drug design rests on the assumption that the chemical structure and biological activity of the analogues of a drug are often similar to those of the lead drug [22]. QSAR modeling is one of the analogue-based computational tools; it establishes a quantitative correlation between the biological activity/toxicity/property of a molecule and its structural features. In a QSAR study, the variations of biological activity/toxicity/property within a series of compounds are correlated with changes in a group of computed features of the molecules referred to as descriptors. In chapter 6 of the present thesis, exhaustive QSAR studies have been carried out with constitutional, topological, geometrical, electrostatic and quantum chemical descriptors and, in addition, C-DFT and docking-based descriptors.

A QSAR model predicts a certain property of a molecule from its structure through a mathematical expression of the form

$y = m_1 x_1 + m_2 x_2 + \cdots + C$    Eqn. (16)

where y is the predicted property (the dependent variable) and x1, x2, … are the known molecular properties called descriptors. Each descriptor is a single number describing some aspect of the molecule, such as molecular weight, number of atoms, topological indices etc. The coefficients m1, m2, … in the QSAR equation are the weights of the descriptors, obtained by using various curve-fitting methods. The activities and properties being modeled by QSAR/QSPR are known as the dependent variables (y) of the QSAR model. A dependent variable can be a biological property such as receptor binding, inhibition constant, permeability, pharmacokinetics, biodegradation, carcinogenicity, drug metabolism and clearance, mutagenicity, toxicity etc., or a chemical property such as boiling point, chromatographic retention time, dielectric constant, diffusion coefficient, dissociation constant, melting point, reactivity, solubility, stability, thermodynamic properties, viscosity etc. [22].

2.4.1 Descriptors

QSAR modeling typically describes molecular structures in terms of descriptors and then correlates these molecular descriptors with observed activities using various statistical methods. The first step of QSAR modeling is the preparation of a dataset of molecules whose activities follow a uniform distribution, followed by the calculation of descriptors. Molecular descriptors encode the chemical information contained in the molecular structures and are collectively responsible for a particular activity of the molecule [25]. The descriptors serve as the independent variables of a QSAR model. The steps of a typical QSAR model generation protocol and the various categories of descriptors employed in QSAR are shown in Figure 2.3 [26, 27].

Figure 2.3 Steps of QSAR modeling.

Constitutional descriptors are simple descriptors that represent only the molecular composition of the compound, independent of its geometry and electronic structure. Examples are the number of atoms, number of bonds, molecular weight etc. Topological descriptors (topological indices) describe the atomic connectivity in the molecule. Examples are the Wiener index, Randic and Kier & Hall indices, Kier flexibility index, information content index and its derivatives etc. [26]. Geometrical descriptors depend on the 3D coordinates of the atoms in the given molecule, for example moments of inertia, shadow indices, molecular volume, molecular surface area, gravitation indices etc. Electrostatic descriptors are calculated from the charge distribution of the molecule. Examples are the topological electronic index and charged partial surface area descriptors. Quantum-chemical descriptors are calculated from quantum chemical data at various levels of theory, for example the extreme (maximum and minimum) values of the atomic nucleophilic (NA), electrophilic (EA) and one-electron (RA) Fukui reactivity indices, εLUMO and εHOMO etc. [27]. Hydrophobicity descriptors such as log P, aqueous solubility and chromatographic parameters are also very useful for QSAR studies [26]. However, the development of simple and new descriptors is still a topic of high interest [29, 30].

Among the new descriptors, the DFT-based ones have been studied extensively. In many studies, DFT-based descriptors show good performance in predicting biological activities [31]. DFT was founded on the two basic theorems provided by Hohenberg and Kohn in the 1960s [32, 33]. The performance of DFT methods in the description of structural, energetic and magnetic molecular properties has been reviewed quite substantially in recent times. DFT methods are, in general, capable of generating a variety of isolated molecular properties, such as ionization energies, dipole moment, electrostatic potential, electron affinities, electronegativities, electrophilicity index, chemical potential and hardness, quite accurately [34].

The use of docking scores as QSAR descriptors is another new approach. The free energies of binding calculated by MM-PBSA/GBSA methods have also been tested in several studies and show excellent correlation with the bioactivities [35]. Once the descriptors are computed, it is crucial to choose which of them should be included in the QSAR model. Preprocessing of the dataset should also be performed carefully, as anomalies, errors and missing or incomplete data may lead to severely erroneous or misleading predictions. The data should also be normalized or standardized where there is a large range of variability in the dataset. Inter-correlated descriptors should be removed from the dataset before model construction [36].
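The two preprocessing steps mentioned above, standardization and removal of inter-correlated descriptors, can be sketched as follows. The descriptor table is hypothetical, and both the greedy keep-in-order filter and the 0.9 correlation threshold are illustrative choices rather than the protocol used in this thesis.

```python
import math

def standardize(column):
    """Scale a descriptor column to zero mean and unit (population) standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]

def pearson(a, b):
    """Pearson correlation of two already standardized columns."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

def drop_intercorrelated(descriptors, threshold=0.9):
    """Keep each descriptor (in the given order) only if its |r| with every
    previously kept descriptor is below `threshold`."""
    scaled = {name: standardize(values) for name, values in descriptors.items()}
    kept = []
    for name in scaled:
        if all(abs(pearson(scaled[name], scaled[other])) < threshold for other in kept):
            kept.append(name)
    return kept, {name: scaled[name] for name in kept}

# Hypothetical descriptor table for five molecules.
descriptors = {
    "mol_weight": [180.0, 250.0, 310.0, 420.0, 505.0],
    "n_atoms": [24, 33, 41, 56, 67],   # nearly proportional to mol_weight
    "logP": [1.2, 3.5, 0.8, 2.9, 1.7],
}
kept, scaled = drop_intercorrelated(descriptors)
```

In this example the atom count is discarded because it carries almost the same information as the molecular weight, while log P is retained.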


2.4.2 Generation of the QSAR equation

Various techniques based on multi-linear regression (MLR) analysis are employed to obtain the QSAR equation. This equation essentially describes the variation of the activities of the molecules as a function of the variations in the molecular structures present in the data set [37]. MLR analysis is usually used to correlate a given bioactivity with molecular descriptors, but different statistical methods come into play for building a QSAR model. Depending on the type of dataset and other parameters, it is also possible to generate nonlinear equations that contain exponents of best fit, logarithms of descriptors, etc. MLR, principal component regression (PCR), partial least squares (PLS), artificial neural networks (ANN), genetic function approximation (GFA), factor analysis, discriminant analysis and cluster analysis are a few of the statistical methods that can be employed in QSAR modeling [38].
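An MLR fit of an equation of the form y = m1x1 + m2x2 + … + C can be obtained by solving the least-squares normal equations. The sketch below uses only the standard library and is meant as an illustration: the data are hypothetical and noise-free, and a production analysis would normally rely on an established statistics package instead of this hand-written solver.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_mlr(descriptor_rows, activities):
    """Least-squares fit of y = m1*x1 + m2*x2 + ... + C; returns [m1, m2, ..., C]."""
    design = [list(row) + [1.0] for row in descriptor_rows]  # last column models C
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, activities)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data generated from y = 2*x1 - 0.5*x2 + 1 (no noise).
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0)]
activities = [2.0 * x1 - 0.5 * x2 + 1.0 for x1, x2 in rows]
m1, m2, c = fit_mlr(rows, activities)
```

Since the data were generated without noise, the fit recovers the generating coefficients, which serves as a simple sanity check of the procedure.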

2.4.3 Statistical parameters

For linear QSAR equations, the squared correlation coefficient r² gives a quantitative measure of how well the descriptors describe the activity [37]. r² is calculated as follows:

$r^2 = 1 - \frac{\sum (y_{obs} - y_{calc})^2}{\sum (y_{obs} - y_{mean})^2}$    Eqn. (17)

where ycalc, yobs and ymean are the predicted, actual and mean values of the target property, respectively. Thus, the descriptors with the highest correlation coefficient can be selected. The predictive power of a QSAR model can be verified through statistical measures such as the correlation coefficient between actual and predicted values. Various other statistical parameters, such as the cross-validated correlation coefficient and the Fisher statistic (F-value), are also used.

The cross-validated r², also called q² or r²cv, signifies how well the model predicts. It is calculated by omitting each compound once from the training set and then predicting its activity using the model constructed from the remaining compounds. This cycle is repeated until all the molecules of the dataset have been omitted once. The cross-validated squared correlation coefficient q² is calculated as follows.

$q^2 = 1 - \frac{\sum (y_{obs} - y_{calc})^2}{\sum (y_{obs} - y_{mean})^2}$    Eqn. (18)

where ycalc, yobs and ymean are the predicted, actual and mean values of the target property, respectively; here each ycalc is the value predicted for a compound by the model built without that compound. The F-value is also an important measure of the statistical significance of the regression model, and is given by the following equation [39]:

$F = \frac{r^2}{1 - r^2}$    Eqn. (19)

where r² is the squared correlation coefficient. As an external validation, some of the compounds with known results are also left out of the training set and used as a test of the predictive ability of the QSAR model.
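The statistics of Eqn. (17) to Eqn. (19) can be computed for a small single-descriptor model as a worked illustration. The descriptor values and activities below are hypothetical, the model is a simple straight-line fit, and the F-value is evaluated exactly as written in Eqn. (19).

```python
def fit_line(x, y):
    """Least-squares slope and intercept for a single descriptor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x

def determination(y_obs, y_calc):
    """1 - sum((y_obs - y_calc)^2) / sum((y_obs - y_mean)^2), as in Eqn. (17) and Eqn. (18)."""
    mean = sum(y_obs) / len(y_obs)
    residual = sum((o - c) ** 2 for o, c in zip(y_obs, y_calc))
    total = sum((o - mean) ** 2 for o in y_obs)
    return 1.0 - residual / total

def r_squared(x, y):
    slope, intercept = fit_line(x, y)
    return determination(y, [slope * a + intercept for a in x])

def q_squared_loo(x, y):
    """Leave-one-out: predict each compound from a model fitted to all the others."""
    predictions = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        predictions.append(slope * x[i] + intercept)
    return determination(y, predictions)

# Hypothetical descriptor values and activities for six compounds.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
r2 = r_squared(x, y)
q2 = q_squared_loo(x, y)
f_value = r2 / (1.0 - r2)  # Eqn. (19) as written in the text
```

As expected for a least-squares model, q² is lower than r², because each compound is predicted by a model that has not seen it.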

QSAR is a valuable tool for predicting molecular properties that cannot be computed in any other way. It is very useful for the prediction of a wide range of biological properties, which is essential for identifying potential leads [40]. Although it may not be a reliable tool for predicting drug activity, pharmacokinetic properties such as blood–brain barrier permeability and passive intestinal absorption can be predicted fairly well by QSAR methods. Hence, QSAR models are of immense help in predicting the properties of new and untested compounds that possess molecular structures analogous to those of the compounds used in the development of the models.


References

1. Frenkel, D., & Smit, B. (2001). Understanding molecular simulation: from algorithms to applications (Vol. 1). Academic Press. Allen, M. P., & Tildesley, D. J. (1987). Computer Simulation of Liquids. Oxford University Press.
2. Rapaport, D. C. (2004). The art of molecular dynamics simulation. Cambridge University Press.
3. Gibbs, J. W. (2014). Elementary principles in statistical mechanics. Courier Corporation.
4. Zeigler, B. P., Praehofer, H., & Kim, T. G. (2000). Theory of modeling and simulation: integrating discrete event and continuous complex dynamic systems. Academic Press.
5. Field, M. J. (1999). A practical introduction to the simulation of molecular systems. Cambridge University Press.
6. Mattson, W., & Rice, B. M. (1999). Near-neighbor calculations using a modified cell-linked list method. Comput Phys Commun, 119(2), 135-148.
7. Verlet, L. (1967). Computer "experiments" on classical fluids. I. Thermodynamical properties of Lennard-Jones molecules. Phys Rev, 159(1), 98.
8. Birdsall, C. K., & Langdon, A. B. (2014). Plasma physics via computer simulation. CRC Press.
9. Wermuth, C. G., Ganellin, C. R., Lindberg, P., & Mitscher, L. A. (1998). Glossary of terms used in medicinal chemistry (IUPAC Recommendations 1998). Pure Appl Chem, 70(5), 1129-1143.
10. Wermuth, C. G. (2006). Pharmacophores: historical perspective and viewpoint from a medicinal chemist. Methods and Principles in Medicinal Chemistry, 32, 3.
11. Yang, S. Y. (2010). Pharmacophore modeling and applications in drug discovery: challenges and recent advances. Drug Discov Today, 15(11), 444-450.
12. Ekins, S., de Groot, M. J., & Jones, J. P. (2001). Pharmacophore and three-dimensional quantitative structure activity relationship methods for modeling cytochrome P450 active sites. Drug Metab Dispos, 29(7), 936-944.
13. Wolber, G. et al. (2005). LigandScout: 3-D pharmacophores derived from protein bound ligands and their use as virtual screening filters. J Chem Inf Model, 45, 160–169.
14. Chen, J., & Lai, L. H. (2006). Pocket v.2: further developments on receptor-based pharmacophore modeling. J Chem Inf Model, 46, 2684–2691.
15. Ortuso, F. et al. (2006). GBPM: GRID based pharmacophore model. Concept and application studies to protein–protein recognition. Bioinformatics, 22, 1449–1455.
16. Böhm, H. J. (1992). The computer program LUDI: a new method for the de novo design of enzyme inhibitors. J Comput Aid Mol Des, 6, 61–78.
17. Carlson, H. A., Masukawa, K. M., Rubins, K., Bushman, F. D., Jorgensen, W. L., Lins, R. D., McCammon, J. A. (2000). Developing a dynamic pharmacophore model for HIV-1 integrase. J Med Chem, 43(11), 2100-2114.
18. Carlson, H. A. (2002). Protein flexibility is an important component of structure-based drug discovery. Curr Pharm Des, 8(17), 1571-1578.
19. Carlson, H. A., Masukawa, K. M., & McCammon, J. A. (1999). Method for including the dynamic fluctuations of a protein in computer-aided drug design. J Phys Chem A, 103(49), 10213-10219.


20. Dixon, S. L., Smondyrev, A. M., Knoll, E. H., Rao, S. N., Shaw, D. E., & Friesner, R. A. (2006). PHASE: a new engine for pharmacophore perception, 3D QSAR model development, and 3D database screening: 1. Methodology and preliminary results. J Comput Aided Mol Des, 20(10-11), 647-671.
21. Friesner, R. A., Murphy, R. B., Repasky, M. P., Frye, L. L., Greenwood, J. R., Halgren, T. A., Mainz, D. T. (2006). Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. J Med Chem, 49(21), 6177-6196.
22. Loving, K., Salam, N. K., & Sherman, W. (2009). Energetic analysis of fragment docking and application to structure-based pharmacophore hypothesis generation. J Comput Aided Mol Des, 23(8), 541-554.
23. Young, D. C. (2009). Computational drug design: a guide for computational and medicinal chemists. John Wiley & Sons.
24. Friesner, R. A., Banks, J. L., Murphy, R. B., Halgren, T. A., Klicic, J. J., Mainz, D. T., Shenkin, P. S. (2004). Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J Med Chem, 47(7), 1739-1749.
25. Morris, G. M., Goodsell, D. S., Halliday, R. S., Huey, R., Hart, W. E., Belew, R. K., & Olson, A. J. (1998). Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J Comput Chem, 19(14), 1639-1662.
26. Todeschini, R., & Consonni, V. (2008). Handbook of molecular descriptors (Vol. 11). John Wiley & Sons.
27. Katritzky, A. R., Lobanov, V. S., & Karelson, M. (1994). CODESSA 2.0, Comprehensive Descriptors for Structural and Statistical Analysis. University of Florida, U.S.A.
28. Karelson, M., Lobanov, V. S., & Katritzky, A. R. (1996). Quantum-chemical descriptors in QSAR/QSPR studies. Chem Rev, 96(3), 1027-1044.
29. Helguera, A. M., Combes, R. D., González, M. P., & Cordeiro, M. N. D. S. (2008). Applications of 2D descriptors in drug design: a DRAGON tale. Curr Top Med Chem, 8(18), 1628-1655.
30. Badrinarayan, P., Srivani, P., & Sastry, G. N. (2011). Design of 1-arylsulfamido-2-alkylpiperazine derivatives as secreted PLA2 inhibitors. J Mol Model, 17(4), 817-831.
31. Srivani, P., Srinivas, E., Raghu, R., & Sastry, G. N. (2007). Molecular modeling studies of pyridopurinone derivatives—potential phosphodiesterase 5 inhibitors. Journal of Molecular Graphics and Modelling, 26(1), 378-390.
32. Parr, R. G. (1983). Density functional theory. Annu Rev Phys Chem, 34(1), 631-656.
33. Singh, P. P., Srivastava, H. K., & Pasha, F. A. (2004). DFT-based QSAR study of testosterone and its derivatives. Bioorganic & Medicinal Chemistry, 12(1), 171-177.
34. Wadehra, A., & Ghosh, S. K. (2005). A density functional theory-based chemical potential equalisation approach to molecular polarizability. J Chem Sci, 117(5), 401-409.
35. Kumar Srivastava, H., Choudhury, C., & Narahari Sastry, G. (2012). The efficacy of conceptual DFT descriptors and docking scores on the QSAR models of HIV protease inhibitors. Med Chem, 8(5), 811-825.
36. Srivastava, H. K., & Sastry, G. N. (2012). Molecular dynamics investigation on a series of HIV protease inhibitors: assessing the performance of MM-PBSA and MM-GBSA approaches. J Chem Inf Model, 52(11), 3088-3098.



37. Nantasenamat, C., Isarankura-Na-Ayudhya, C., Naenna, T., & Prachayasittikul, V. (2009). A practical overview of quantitative structure-activity relationship. EXCLI J, 8(7). 38. Mannhold, R., Krogsgaard-Larsen, P., & Timmerman, H. (2008). QSAR: Hansch analysis and related approaches (Vol. 1). H. Kubinyi (Ed.). John Wiley & Sons. 39. Dehmer, M., Varmuza, K., & Bonchev, D. (Eds.) (2012). Statistical modeling of descriptors in QSAR and QSPR. Wiley-VCH. 40. Wold, S. (1991). Validation of QSAR's. Quant Struct-Act Rel, 10(3), 191-193.


Chapter 3 Active Site Dynamics of CmaA1 during Various Stages of the Cyclopropanation Process

So many of the chemical reactions occurring in living systems have been shown to be catalytic processes occurring isothermally on the surface of specific proteins, referred to as enzymes, that it seems fairly safe to assume that all are of this nature and that the proteins are the necessary basis for carrying out the processes that we call life. — John Bernal J.D. Bernal: The Sage of Science (2005), 359.


3.1 Background

The unique architecture of the thick, waxy cell wall of M. Tb is crucial for the integrity of the bacterium. This cell wall prevents dehydration and offers protection against varying pH and the detrimental effects of free radicals, properties that are essential for pathogenicity, persistence and drug resistance [1-3]. Thus, characterizing the proteome [3] and studying the dynamics and permeability of the cell wall [4, 5] are interesting in their own right. The main constituents that impart these characteristics to the cell wall are mycolic acids, a group of highly hydrophobic long-chain α-alkyl-β-hydroxy fatty acids [6-8]. Mycolic acids have been proposed to be biosynthesized via a diversion of normal fatty acid metabolism in which short-chain fatty acids are extended and modified to form lipids of exceptional length [9]. The M. Tb cell wall contains three types of mycolic acids, viz., α-mycolates and the oxygenated keto- and methoxy-mycolates, of which the α-mycolates are the most abundant and important, with an average length of 70-80 carbons [9, 10]. It has been reported that in pathogenic M. Tb the majority of the unsaturated α-mycolates undergo cyclopropanation of their double bonds, with S-adenosyl-L-methionine (SAM) as the methyl donor. The reaction is carried out by a family of enzymes called cyclopropane synthases, which catalyse cyclopropanation selectively at different positions of the unsaturated mycolates [10-17], and this cyclopropanation is essential for the proper functioning of the mycolic acids [18]. We have considered M. Tb CmaA1 for our study, which is responsible for cis-cyclopropanation at the distal position of α-mycolates. It has been shown experimentally that overexpression of CmaA1 makes the bacteria resistant to hydrogen peroxide, suggesting that cyclopropanation at the distal position may be an important adaptation of M. Tb against oxidative stress. Hence CmaA1 is an important target for anti-TB therapy [19]. The crystal structures of the apo and holo forms of CmaA1 reveal the existence of two distinct binding sites, viz., a CBS containing S-adenosyl-L-homocysteine (SAHC) and an ASBS containing a lipid inhibitor, DDDMAB [20]. The lipid inhibitor mimics the carbocation intermediate in the cyclopropanation reaction and indicates the probable binding site of the hydrophobic mycolic acids, which provides valuable information on the ligand orientation in the active site.

CmaA1 has several additions to the core methyltransferase fold, such as a short helix inserted between β4 and α4, two helices inserted between β5 and α5, three helices inserted between β6 and β7, and additions that form the active site cover [21, 22]. CmaA1 shows >50% sequence identity with the other proteins of the family, CmaA2 and PcaA, which may point to a similar reaction mechanism [23]. The mechanism of cyclopropanation proposed by earlier studies [24-30] is illustrated in Scheme 3.1. In the first step, a methyl group is transferred from SAM to the substrate double bond to form a carbocation intermediate, and SAM is converted to SAHC. The bicarbonate ion then acts as a base and abstracts a proton from the methyl group, resulting in ring closure [27]. DFT studies by Liao et al. (2011) have shown that the rearrangement of the carbocation intermediate after methyl transfer has a slightly lower barrier than the proton transfer, supporting this mechanism.

Scheme 3.1 Mechanism of cyclopropanation of unsaturated mycolates in mycobacteria.


Figure 3.1 Crystal structures of CmaA1. A. Holo (1KPH) and apo (1KP9) structures. The various secondary structural components and major motifs are shown for the holo structure. The major differences between the apo and holo structures are demarcated with different colors (10A is the other name for the DDDMAB ligand). B. Topology of the different secondary structural elements of CmaA1.


As seen from the crystal structure, this protein consists of 7 β strands (12%) and 14 α helices (51%). The core region comprises the β strands arranged in a parallel fashion (except for β6, which is antiparallel to its neighboring strands β5 and β7) in the order β3-β2-β1-β4-β5-β7-β6, with α helices situated on both sides of each strand. The CBS consists of four structural motifs, viz., the loop between β1 and α4 (motif I), the loop between β2 and α5 (motif II), the loop between β3 and α6 (motif III) and α7 (motif IV), together with α1 of the N-terminus and the loop between α1 and α2. The major difference between the apo and holo structures of CmaA1 lies in the residue range 137-144, which is a two-turn helix in the holo form but a loop in the apo form. This region constitutes a major part of motif IV, which is involved in cofactor binding. In addition, residues 170-210 of the ASBS are situated towards the surface in the apo protein (Figure 3.1).

In this chapter, we aim to study the structural properties, energetics and dynamics of the active sites of CmaA1 in detail, as understanding the rigidity and flexibility of the binding sites is an essential first step in rational inhibitor design [31]. The apo and holo model systems of CmaA1 were used to elucidate the precise changes in the conformations of the binding pockets using MD simulation and to correlate the structure with the catalytic and mechano-chemical functions of the enzyme [32, 33]. The stabilities and roles of various molecular interactions in the active sites have been studied thoroughly using the MD simulations. Particular attention has been paid to the way in which non-bonded interactions such as H-bonds modulate the active site dynamics. Cation-π and π-π interactions also play major roles in maintaining the protein structure and facilitate various enzyme catalyses [34, 35]. CmaA1 has been recognized as an important drug target, and inhibitors of this enzyme have been reported in the literature, including thiacetazone and its analogues [36] and dideoxy nucleosides [37]. Upon CmaA1 inhibition by thiacetazone and its chemical analogues, Alahari et al. observed a significant reduction in the mycolic acid content of various mycobacterial strains. Rai et al. synthesized and experimentally tested various classes of dideoxy nucleosides as potent and selective inhibitors of Mycobacterium bovis, M. Tb and drug-resistant M. Tb. A better understanding of the conformational changes of the active site of CmaA1 is expected to help structure-based drug design considerably. The geometric and dynamic properties of the active site are likely to differ between the several stages of the reaction cycle, and hence all these conformations/states need to be considered during ligand design.

Five model systems of CmaA1 representing various stages of cyclopropanation have been considered and we propose Scheme 3.2 for the cyclopropanation cycle based on the previous studies.

Scheme 3.2 Schematic representation of the cyclopropanation cycle


3.2 Methodology

3.2.1 Model systems

The following five model systems of CmaA1 were generated and subjected to MD simulation using the CHARMM program [38] (Scheme 3.2). All the systems except the first were modeled from the crystal structure of the holo form (PDB ID: 1KPH), whereas the first, E, was modeled from the crystal structure of the apo form (PDB ID: 1KP9) of CmaA1.

E: Apo form of CmaA1. As the first 17 N-terminal residues were unresolved in the original apo crystal structure, they were modeled from the holo structure.

E-SAM: Holo form of CmaA1 with SAM in the CBS. SAM was modeled from the SAHC present in the original PDB structure by adding a methyl group to the S atom, followed by 100 steps of Adopted Basis Newton-Raphson (ABNR) minimization. The ligand present in the ASBS was removed.

E-SAM-S: Holo form with SAM in the CBS and the substrate in the ASBS. The unsaturated substrate (Scheme 3.1) was modeled from the ligand DDDMAB present in the original crystal structure (Figure 3.1A). Both the SAM and the substrate molecules were subjected to 100 steps of ABNR minimization.

E-SAHC-P: Holo form with SAHC in the CBS and the cyclopropanated product in the ASBS. The cyclopropanated product (Scheme 3.1) was also modeled from the DDDMAB present in the original crystal structure. The product molecule was subjected to 100 steps of ABNR minimization. The HCO3- ion was converted to H2CO3 in order to represent the system after cyclopropanation.

E-SAHC: Holo form with SAHC in the CBS.

3.2.2 Molecular dynamics (MD) simulations

The individual components of each system were generated using CHARMM-GUI [39]. Topology files for the substrate and product molecules were created and parameters were assigned to the atoms using CGenFF [40]. Some of the parameters for the cyclopropanated product were taken from the recent literature [41]. The topology and parameters for the cofactors SAHC and SAM are available with the CHARMM protein force field [42, 43]. The protonation states of the histidine residues were assigned manually by examining their local environments in the crystal structure. Each whole system was minimized with 1000 steps of the ABNR method to remove any nonphysical contacts. The systems were then solvated in a TIP3P water box of 80 × 75 × 75 Å³, and an appropriate number of counter ions was added to neutralize each system. After 500 steps of steepest descent (SD) and 500 steps of ABNR energy minimization, the systems were equilibrated for 1 ns in the NVT ensemble using periodic boundary conditions. During the initial minimization and equilibration, the solute molecules were held by a harmonic restraint with a force constant of 5 kcal/mol/Å². The systems were then subjected to 40 ns of MD simulation in the NPT ensemble at 298 K with the leapfrog integrator and a time step of 0.002 ps. The Langevin piston algorithm, SHAKE and particle mesh Ewald (PME) were used, respectively, to control the pressure, to constrain covalent bonds involving hydrogen atoms and to treat long-range interactions. The coordinates were saved every 5 ps for analysis.
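The protocol above fixes the size of each production run. The short sketch below only does the bookkeeping implied by the stated numbers (40 ns, 0.002 ps time step, coordinates saved every 5 ps); the function and variable names are illustrative and are not taken from the CHARMM input files.

```python
# Bookkeeping for the production run described in the text.
TIME_STEP_PS = 0.002      # integration time step (from the text)
PRODUCTION_NS = 40        # NPT production length (from the text)
SAVE_INTERVAL_PS = 5      # coordinate saving interval (from the text)


def n_steps(length_ns: float, time_step_ps: float) -> int:
    """Number of integration steps needed to cover length_ns."""
    return round(length_ns * 1000 / time_step_ps)


def n_frames(length_ns: float, save_interval_ps: float) -> int:
    """Number of coordinate frames written over length_ns."""
    return round(length_ns * 1000 / save_interval_ps)


production_steps = n_steps(PRODUCTION_NS, TIME_STEP_PS)
production_frames = n_frames(PRODUCTION_NS, SAVE_INTERVAL_PS)
steps_between_saves = round(SAVE_INTERVAL_PS / TIME_STEP_PS)

print(production_steps, production_frames, steps_between_saves)
```

Under these settings each 40 ns trajectory corresponds to 20 million integration steps and 8000 saved frames, one every 2500 steps.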

3.3 Results and discussion

3.3.1 Analysis of structural properties

In this section, we describe the model systems and present a systematic analysis and comparison of the following properties of the five systems of the cycle, computed from the MD trajectories: root mean square deviation (RMSD), radius of gyration (RGyr), solvent accessible surface area (SASA), root mean square fluctuation (RMSF) and covariance matrices. Energetic properties were studied by calculating the interaction energies of cofactor and substrate/product binding. Table 3.1 lists the averages of these structural and energetic properties, taken over the final 30 ns of the trajectories. The errors are standard deviations, which are low (<10% of the corresponding averages), indicating that the values are reliable. The active site dynamics during cyclopropanation were studied by analyzing the stabilities of the H-bond interactions in the active sites of the five model systems.

Table 3.1 Averages of various structural and energetic properties of the systems along with the standard deviations.

Property                                            E                E-SAM             E-SAM-S           E-SAHC-P         E-SAHC

Structural properties
RMSD (Å)                                            1.35±0.08        2.31±0.04         1.67±0.07         2.10±0.06        1.57±0.07
RGyr (Å)                                            18.75±0.09       18.79±0.07        18.78±0.04        18.81±0.09       18.75±0.07
SASA of the whole system (Å²)                       12758.30±71.9    13004.54±138.0    13008.17±175.1    12806.65±84.7    12822.98±60.9
Surface area buried by the cofactors (Å²)           -                292.41±3.71       268.45±5.68       268.90±19.45     236.03±5.84
Surface area buried by the substrate/product (Å²)   -                -                 266.51±5.12       262.36±7.09      -

Energetics
Interaction energy of the cofactors (kcal/mol)      -                -237.04±5.37      -304.82±2.44      -131.24±8.14     -129.19±6.08
Interaction energy of the substrate/product         -                -                 -56.24±0.49       -51.95±0.49      -
(kcal/mol)
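The entries of Table 3.1 are averages over the final 30 ns with standard deviations as errors. A minimal sketch of that averaging is given below, assuming one value per saved frame (every 5 ps); the function name and the use of the population standard deviation are assumptions, since the text does not state which estimator was used.

```python
import statistics


def tail_average(values, save_interval_ps, discard_ns):
    """Mean and standard deviation of a per-frame property after
    discarding the first discard_ns of the trajectory."""
    first = round(discard_ns * 1000 / save_interval_ps)
    tail = values[first:]
    if len(tail) < 2:
        raise ValueError("need at least two frames after discarding")
    # Population standard deviation; an assumption of this sketch.
    return statistics.fmean(tail), statistics.pstdev(tail)


# Toy series: 8000 frames (40 ns at 5 ps); the first 10 ns are discarded.
toy_series = [3.0] * 2000 + [1.0, 2.0] * 3000
mean, sd = tail_average(toy_series, save_interval_ps=5, discard_ns=10)
print(f"{mean:.2f} ± {sd:.2f}")
```

Discarding the first 10 ns of a 40 ns run leaves the final 30 ns (6000 frames), which matches the averaging window described in the text.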

The average RMSDs of the systems over the last 30 ns are below 2.5 Å (Table 3.1), which suggests that the systems do not deviate much during the simulations and are well equilibrated. The systems E-SAM and E-SAHC-P show marginally higher RMSD; however, no major structural rearrangements were noticed except for some rigid body motions, which are discussed later. The RGyr gives an idea of the compactness of a system during the simulation. The RGyr values also show stable profiles, indicating that the overall size and compactness of the systems do not vary much across the five stages of (un)binding of the substrate and the cofactor (Table 3.1). Notably, the RGyr of the holo systems was found to be marginally higher than that of the apo system, owing to the increase in the size of the protein upon ligand and cofactor binding. The RMSD of the apo form shows a very stable profile during the simulation, whereas the system shows slightly higher deviations after SAM binding, mostly in the N-terminal residues, L6 and α1, because of the conformational changes upon SAM binding. Such larger deviations are understandable since the initial structures of four of the model systems were taken from the same X-ray structure. However, the simulations reach the equilibrated state.

The CBS as well as the ASBS residues show relatively high RMSF upon SAM binding (E-SAM). The degrees of correlated movement between the different secondary structural units of CmaA1 were analyzed by calculating the covariance matrices [44] (Figure 3.2). The covariance matrix of the system E-SAM shows highly anti-correlated movement of residues 66-136 (including β3, α4, β2, α5, β1, α6 and β4 of the CBS) with respect to residues 45-85 (including α2 and α3) and residues 247-274 (including α10 and α11 lining the ASBS), which may move away from each other in order to accommodate the cofactor and the substrate. After substrate binding, the α2 and L6, and L8 of the ASBS show higher deviations. Formation of the cyclopropanated product (E-SAHC-P) brings about some structural deviations in α1, L6 and the N-terminal residues. Residues 170-180 of the ASBS show a high RMSF in this system. These local structural changes are induced by the binding of different cofactors and ligands in the CBS and ASBS, respectively, during the sequential events of cyclopropanation.
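The three structural measures discussed above can be written down compactly. The sketch below is illustrative only and is not the CHARMM analysis used in this work: it assumes coordinates given as (x, y, z) tuples in Å, frames that have already been superimposed on a common reference, and unit masses unless masses are supplied.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two pre-superimposed coordinate sets of equal length."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration (unit masses if none are given)."""
    if masses is None:
        masses = [1.0] * len(coords)
    m_tot = sum(masses)
    center = tuple(
        sum(m * c[k] for m, c in zip(masses, coords)) / m_tot for k in range(3)
    )
    weighted = sum(m * math.dist(c, center) ** 2 for m, c in zip(masses, coords))
    return math.sqrt(weighted / m_tot)


def rmsf(trajectory):
    """Per-atom RMSF about the time-averaged position.

    trajectory is a list of frames; each frame is a list of (x, y, z)."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        mean_pos = tuple(
            sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)
        )
        msf = sum(math.dist(frame[i], mean_pos) ** 2 for frame in trajectory) / n_frames
        result.append(math.sqrt(msf))
    return result


# Tiny illustrative inputs (not simulation data).
frame_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame_b = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(rmsd(frame_a, frame_b))
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))
print(rmsf([[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]]))
```

RMSD compares a frame with a reference, RGyr measures the compactness of a single frame, and RMSF measures the fluctuation of each atom over the whole trajectory.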

Figure 3.2 Covariance matrices of the residues of systems A. E, B. E-SAM, C. E-SAM-S, D. E-SAHC-P, E. E-SAHC. The residue numbers are plotted on both axes. A positive correlation coefficient indicates correlated motion, while a negative correlation coefficient indicates anti-correlated motion. A threshold of 0.25 for the correlation coefficient has been recommended and is used in the present study to identify correlated movements of the protein domains in the various systems. Accordingly, the blue patches correspond to anti-correlated movements of the residues and the green ones to correlated movements.
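A correlation map of the kind shown in Figure 3.2 can be sketched as follows. The normalization used here, C_ij = <Δr_i·Δr_j> / sqrt(<Δr_i²><Δr_j²>), is the common dynamic cross-correlation form and is an assumption of this sketch; one representative position per residue (for example the Cα atom) and pre-superimposed frames are also assumed. Only the 0.25 threshold comes from the text.

```python
import math


def correlation_matrix(trajectory):
    """Normalized cross-correlation of residue displacements.

    trajectory is a list of frames; each frame holds one (x, y, z)
    position per residue."""
    n_frames = len(trajectory)
    n_res = len(trajectory[0])
    means = [
        tuple(sum(f[i][k] for f in trajectory) / n_frames for k in range(3))
        for i in range(n_res)
    ]
    # Displacement of every residue from its mean position in every frame.
    disp = [
        [tuple(f[i][k] - means[i][k] for k in range(3)) for i in range(n_res)]
        for f in trajectory
    ]

    def avg_dot(i, j):
        return sum(sum(d[i][k] * d[j][k] for k in range(3)) for d in disp) / n_frames

    var = [avg_dot(i, i) for i in range(n_res)]
    matrix = [[0.0] * n_res for _ in range(n_res)]
    for i in range(n_res):
        for j in range(n_res):
            denom = math.sqrt(var[i] * var[j])
            matrix[i][j] = avg_dot(i, j) / denom if denom > 0 else 0.0
    return matrix


def classify(coefficient, threshold=0.25):
    """Label a coefficient using the threshold quoted in the text."""
    if coefficient >= threshold:
        return "correlated"
    if coefficient <= -threshold:
        return "anti-correlated"
    return "uncorrelated"


# Toy trajectory: residues 0 and 1 move together, residue 2 moves oppositely.
toy = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (7.0, 0.0, 0.0), (8.0, 0.0, 0.0)],
]
corr = correlation_matrix(toy)
print(classify(corr[0][1]), classify(corr[0][2]))
```

Coefficients range from -1 to 1, so values between -0.25 and 0.25 are treated as uncorrelated under the threshold used here.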

Figure 3.3 A RMSD (in Å) matrices of the 5 systems with respect to each other. The X and Y axes represent the coordinates of each trajectory saved every 100 ps (400 time frames) and the Z axis represents the mutual RMSDs. Different colors indicate different ranges of RMSD, on a scale from 0 to 5.

In order to understand the sequential structural variations in CmaA1 during cyclopropanation, as it binds different cofactors and ligands in the CBS and ASBS, pairwise RMSD values were computed by comparing each snapshot (coordinates saved every 100 ps) of one trajectory with each snapshot of another. The RMSD matrices for each trajectory pair and their probability distributions are shown in Figure 3.3 A and Figure 3.3 B, respectively. These matrices give an account of how the structure of CmaA1 changes between consecutive steps of cyclopropanation. The structural changes are fairly large when SAM binds to the apo structure and also after cyclopropanation: bimodal distributions of the RMSDs were observed for the systems E-SAM and E-SAHC-P with respect to the other systems. The probability distributions of the RMSD matrices thus indicate a large conformational change upon SAM binding and upon product formation.
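The pairwise comparison described above can be sketched as follows. This is an illustrative outline rather than the procedure actually used: it assumes that all snapshots have already been superimposed on a common reference (no fitting is done here), and the bin width and upper limit of the histogram are hypothetical choices.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two pre-superimposed coordinate sets of equal length."""
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def pairwise_rmsd_matrix(traj_a, traj_b):
    """Matrix whose (i, j) entry is the RMSD between snapshot i of
    trajectory A and snapshot j of trajectory B."""
    return [[rmsd(fa, fb) for fb in traj_b] for fa in traj_a]


def probability_distribution(matrix, bin_width=0.5, max_value=5.0):
    """Normalized histogram of all matrix entries.

    Values at or above max_value are counted in the last bin."""
    n_bins = round(max_value / bin_width)
    counts = [0] * n_bins
    total = 0
    for row in matrix:
        for value in row:
            index = min(int(value / bin_width), n_bins - 1)
            counts[index] += 1
            total += 1
    return [c / total for c in counts]


# Toy trajectories with one atom and two snapshots each.
traj_a = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]]
traj_b = [[(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]]
matrix = pairwise_rmsd_matrix(traj_a, traj_b)
histogram = probability_distribution(matrix, bin_width=1.0, max_value=5.0)
print(matrix, histogram)
```

With snapshots taken every 100 ps from 40 ns trajectories, each matrix would have 400 × 400 entries, and a bimodal histogram would show up as two separated groups of populated bins.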

Figure 3.3 B Probability distribution of RMSD (in Å) of the 5 systems with respect to each other. The X axis shows the RMSD (in Å) of each trajectory with respect to the other, from snapshots saved every 100 ps, and the Y axis shows the probability distribution.

The variation of the SASA of the five systems during the simulations (Table 3.1) reveals that, as the RGyr of the system increases upon SAM binding (E-SAM), the system becomes less compact and hence its SASA also increases compared with the apo structure. The surface area buried by the cofactor decreases as the cyclopropanation reaction proceeds: the average values for the systems E-SAM, E-SAM-S, E-SAHC-P and E-SAHC are 292.41, 268.45, 268.90 and 236.03 Å², respectively.

3.3.2 Active site dynamics

To understand in detail the structural changes that take place in the active sites of CmaA1 during each stage of cyclopropanation, all the intramolecular H-bonds formed among the residues within 6 Å of the cofactor and of the ligand in the crystal structure 1KPH were identified and their distance profiles were studied. The H-bonds that the cofactors/HCO3-/H2CO3 make with the respective active site residues were also identified and their stabilities were verified from their distance profiles. The most probable distances for the key intramolecular and intermolecular H-bonds that undergo major variations during cyclopropanation are given in Table 3.2 and Table 3.3, respectively, for each stage of cyclopropanation.
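A "most probable distance" can be read off a distance profile as the center of the most populated histogram bin. The sketch below illustrates this; the 0.1 Å bin width and the function name are assumptions, since the text does not specify how the values in Tables 3.2 and 3.3 were binned.

```python
def most_probable_distance(distances, bin_width=0.1):
    """Center of the most populated histogram bin of a distance profile.

    On ties, the bin at the shorter distance is returned."""
    if not distances:
        raise ValueError("distance profile is empty")
    counts = {}
    for d in distances:
        index = int(d // bin_width)
        counts[index] = counts.get(index, 0) + 1
    best = max(counts, key=lambda i: (counts[i], -i))
    return (best + 0.5) * bin_width


# Toy profile (in Å): mostly bonded, with two frames where the bond is broken.
profile = [2.81, 2.84, 2.86, 2.88, 4.95, 5.12]
print(round(most_probable_distance(profile), 2))
```

Unlike the mean, this measure is not pulled upwards by the frames in which the H-bond is transiently broken, which makes it a convenient single number for comparing the stages of the cycle.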

Table 3.2 Key intramolecular H-bonds undergoing major variations during cyclopropanation. The most probable distances (in Å) for each bond are given for the 5 model systems representing various stages of cyclopropanation.

ID (residues involved)       Most probable distance (Å)
                             E       E-SAM   E-SAM-S   E-SAHC-P   E-SAHC
IHB14 (TYR33-TYR265)         4.81    5.23    2.83      2.78       5.00
IHB18 (GLY72-LEU93)          4.81    3.02    3.00      3.00       5.81
IHB19 (CYS73-LEU93)          2.85    2.84    2.70      2.84       2.85
IHB21 (TRP75-GLN99)          4.93    6.92    5.69      6.93       7.31
IHB34 (ARG146-GLU124)        4.44    4.37    3.88      4.40       4.36
IHB35 (ARG146-PRO7)          4.56    2.72    6.91      4.88       2.64
IHB36 (HIS167-SER135)        4.03    2.87    2.84      2.84       4.03
IHB37 (ILE169-GLU140)        3.02    3.98    4.04      3.06       4.78
IHB43 (ARG204-GLU140)        5.42    2.87    4.89      2.89       2.80
IHB45 (ASN12-PRO202)         2.89    5.45    3.61      5.04       3.00

Table 3.3 Key H-bond interactions between the cofactors and the active site residues. The most probable distances (in Å) for each bond are given for the model systems representing various stages of cyclopropanation.

ID (residues involved)       Most probable distance (Å)
                             E-SAM   E-SAM-S   E-SAHC-P   E-SAHC
HB1 (cof-TYR33)              4.82    5.94      2.79       2.97
HB2 (cof-SER34)              3.27    6.05      2.62       2.63
HB3 (cof-GLY74)              3.16    7.40      3.29       4.27
HB4 (cof-THR94)              3.50    6.45      2.86       2.86
HB5 (cof-LEU95)              3.02    7.80      3.09       2.91
HB6 (cof-SER96)              4.44    7.33      3.84       3.47
HB7 (cof-GLN99)              5.05    2.86      4.06       4.25
HB8 (cof-TRP123)             6.88    8.80      3.31       3.50
HB9 (cof-GLY137)             4.85    7.65      4.82       4.91
HB10 (cof-ILE136)            2.87    4.78      2.94       2.76
HB11 (cof-GLY72)             2.67    4.10      2.69       5.32
HB12 (cof-GLN99)             3.85    5.43      2.63       4.09
HB13 (cof-GLU124)            2.67    4.10      2.69       5.32
HB14 (cof-ASP70)             5.14    2.65      4.84       7.86
HB15 (cof-THR78)             4.74    2.75      5.16       7.03
HB16 (cof-LEU95)             7.27    3.40      5.73       5.72

Figure 3.4 CBS (left part) and ASBS (right part) residues of each of the systems along with a 2D representation of the interactions of SAM/SAHC/substrate/product with their respective active sites. A. System E, B. System E-SAM, C. System E-SAM-S, D. System E-SAHC-P, E. System E-SAHC. The same representations for interactions are used in the rest of the thesis.


The CBS of CmaA1 consists of four structural motifs, viz., the loop between β1 and α4 (motif I), the loop between β2 and α5 (motif II), the loop between β3 and α6 (motif III) and α7 (motif IV), together with α1 of the N-terminus and the loop between α1 and α2; these residues are highly conserved among the other proteins of the methyltransferase class. The cofactors bind to CmaA1 mostly through H-bonding, electrostatic interactions and hydrophobic interactions. The stabilities of these interactions at the various stages of cyclopropanation are discussed below. The substrate binding site is a ~10 Å long tunnel lined by residues 170-210, the loop between β5 and α11. This pocket is composed mainly of residues with hydrophobic side chains, as the substrate is a hydrophobic fatty acid chain that is bound to the protein by very weak hydrophobic interactions. The HCO3-/H2CO3 interacts with the residues Cys35, Glu140 and Tyr232. Figure 3.4 shows the interactions of the cofactors/substrate/product with their respective active site residues in the various model systems.

Figure 3.5 Relative movements of L10 and the N-terminus (α1) in the five model systems. The H-bond distances (in Å) between Asn12 of the N-terminal α1 helix and Pro202 of L10 are 3.17, 8.71, 5.62, 7.59 and 3.26 at the end of 40 ns for the E, E-SAM, E-SAM-S, E-SAHC-P and E-SAHC systems, respectively.


Binding of SAM to the apo structure brings about many conformational changes in the binding sites of CmaA1, as mentioned in the literature (Huang et al., 2002). In our analysis, SAM binding is mostly stabilized by strong H-bonds with Ile136 (HB10), Gly137 (HB9), Glu124 (HB13) and Gly72 (HB11). SAM also makes weak but stable H-bonds with Ser34 (HB2), Gly74 (HB3), Thr94 (HB4) and Leu95 (HB5). The interaction energy between CmaA1 and SAM was calculated to be -237.04 kcal/mol (Table 3.1). The major conformational change upon SAM binding involves the H-bond between Asn12 of the N-terminal α1 helix and Pro202 of L10 (IHB45), which in the apo structure keeps the CBS closed and inaccessible from the outside. When SAM approaches, this H-bond breaks as α1 moves towards the periphery to facilitate the entry of SAM into the CBS. Figure 3.5 shows the orientation of L10 with respect to the N-terminus at the various stages of cyclopropanation. Residues 170-210 of the ASBS, including L8, α9, L10 and α10, which lie towards the periphery in the apo structure, move towards the CBS.

Figure 3.6 Inward orientation of the hydrophobic side chains of the ASBS upon SAM binding. The apo structure (System E, black) is superimposed on the SAM-bound structure (System E-SAM, red) at the end of 40 ns.


The intramolecular H-bond distance analysis reveals that the residues Gly72 and Cys73 of motif III move towards β2 to form H-bonds with Leu93 (IHB18, IHB19), whereas Gln99 of α5 moves ~2 Å away from Trp75 (IHB21) of this motif as it forms a weak H-bond (HB7) with SAM. Arg146 of α8 forms H-bonds with Pro7 of the N-terminus (IHB35) and Glu124 of the CBS (IHB34) in the apo form, but both these H-bonds break after SAM binding as both the N-terminus and α6 move outwards. His167, Ile169 and Arg204 of the substrate binding pocket come closer to α7 (motif IV) of the cofactor binding residues upon cofactor binding, as revealed by the distance profiles of IHB36, IHB37 and IHB43. SAM binding also brings about some conformational changes in the ASBS. The hydrophobic side chains of the residues Ile169, Leu192, Ile195, Phe200, Leu205, Tyr232, Leu236, Phe273, Ile278 and Tyr33 become oriented towards the lining of the ASBS (Figure 3.6). This state may represent the structure of the protein that is capable of binding the substrate. Binding of the cofactor SAM seems to be essential for such a conformational transition.

SAM undergoes a conformational change when the substrate binds. The -CH3 group attached to the S+ of SAM tends to form intramolecular CH-π interactions with its own purine π system. At the same time, SAM tends to come closer to the C-C double bond of the substrate, as shown by the distance profiles from the S+, and from the C of the CH3 group attached to the S+, to the two double-bonded carbons of the substrate, which the CH3+ from SAM is supposed to attack. The distance between the S+ of SAM and the double-bond carbons of the substrate decreases from ~10 Å to ~6 Å. Such a decrease is consistent with the overall reaction mechanism, which requires the cofactor and the substrate to be close to each other. At this stage the L10 loop lifts up, providing more space for the cofactor and the substrate to approach each other and so facilitating the reaction (Figure 3.5). Because of this conformational change, almost all the H-bonds previously formed by SAM are lost after substrate binding. SAM moves towards the substrate binding site and makes strong interactions with Asp70 (HB14) and Thr78 (HB15). Nevertheless, SAM binds more strongly to CmaA1, as the interaction energy was calculated to be -304.8 kcal/mol. The interaction energy between Asp70 and SAM was calculated to be ~-150 kcal/mol, owing to the salt bridge formed between the NH3+ group of SAM and the COO- group of Asp70, which are separated by ~2.65 Å; it is dominated by the electrostatic term of the potential energy function. Leu95, which previously made H-bonds with N3 and O2’ of SAM, now makes an H-bond with N7 of SAM (HB16) because of the altered orientation of SAM. Tyr33, which previously made an H-bond with N of SAM, forms an H-bond with O3’ (HB17) after substrate binding. The H-bond distance between Trp75 of motif III and Gln99 of α5 (IHB21) decreases again by ~2 Å, as Gln99 forms two strong H-bonds with SAM upon substrate binding. The distance between Gln99 and Ser96 also increases by ~1 Å at this stage. The distance between Arg146 and Glu124 (IHB34) now increases by ~2 Å and shows a more stable profile. The loop between α9 and α10 now moves ~2 Å away from α7 of the CBS to accommodate the substrate, as seen from the distance profile of IHB43.
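The scale of the Asp70-SAM interaction energy can be checked against a rough Coulomb estimate. The sketch below treats the NH3+ and COO- groups as two unit point charges at the quoted separation of 2.65 Å in vacuum; this idealization, the dielectric constant of 1 and the function name are assumptions of the sketch, whereas the force field uses distributed partial charges.

```python
# Approximate Coulomb conversion factor in kcal·Å/(mol·e²).
COULOMB_CONSTANT = 332.06


def coulomb_energy(q1, q2, distance_angstrom, dielectric=1.0):
    """Coulomb energy (kcal/mol) of two point charges given in units of e."""
    if distance_angstrom <= 0:
        raise ValueError("distance must be positive")
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * distance_angstrom)


# Idealized salt bridge: +1 e and -1 e separated by 2.65 Å.
salt_bridge = coulomb_energy(+1.0, -1.0, 2.65)
print(f"{salt_bridge:.1f} kcal/mol")
```

The point-charge estimate is of the same order as the ~-150 kcal/mol reported above, which is consistent with the statement that the electrostatic term dominates this interaction.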

After the methyl transfer and cyclopropanation, SAM is converted to SAHC and the HCO3- becomes H2CO3 by receiving a proton from the carbocation intermediate, which closes the cyclopropane ring. SAHC shows a relatively straight structure compared with SAM, as the bulky CH3 group has been removed. The interaction energy of SAHC decreases in magnitude by ~150 kcal/mol compared with that of SAM (Table 3.1). This is mainly due to the loss of the electrostatic component of the interaction energy, since SAM is positively charged whereas SAHC is neutral; hence the two are not strictly comparable. SAHC binding is mainly stabilized by stable H-bonds with Tyr33 (HB1), Ser34 (HB2), Gly74 (HB3), Thr94 (HB4), Leu95 (HB5), Gly137 (HB9), Ile136 (HB10) and Gln99 (HB12), but the strong interactions are with Gly72, Ile136, Ser34 and Tyr33. After formation of the product, the N-terminus undergoes a large conformational change after 15 ns. The RMSD of the N-terminus rises up to 15 Å as it lifts up and moves far from motif IV, as revealed by the distance profile of IHB35. The distance between Tyr33, present in the loop between α2 and α3 (CBS), and Tyr265 of α12 (ASBS) decreases to ~2.5 Å, as revealed by the distance profile of IHB14. The distance between Trp75 and Gln99 (IHB21) increases again and shows a profile similar to that in the SAM-bound system before substrate binding.

After the enzyme turnover, the distances between Arg204 and Glu140 (IHB43) as well as Ile169 and Glu140 (IHB37) decrease and show profiles similar to those of the SAM-bound systems. SAHC moves closer to the N-terminus and hence away from the binding pocket. The distance between SAHC and the product is initially smaller than that between SAM and the substrate, but it gradually increases. The distance between the S atom of SAHC and the carbon atoms of the product corresponding to the double-bond carbons of the substrate decreases from ~10 Å to ~5 Å.

The H-bonding pattern remains almost the same after product release, with a few exceptions, and consequently the interaction energy between CmaA1 and SAHC weakens only slightly, by ~2 kcal/mol. The distances of the H-bonds with Gly72, Gly74, Ile136 and Gln99 increase, but the H-bonds that SAHC forms with Trp123 and Ser96 show stable distance profiles. SAHC binding is stabilized by relatively stronger interactions (~5-20 kcal/mol) with Gln99, Leu95, Gly74, Thr94, Tyr33 and Trp123. The residues Gly72 and Leu93 of the CBS move 6 Å apart (IHB18), compared with ~5 Å in the apo form. The H-bonds between Glu140 and the residues Ile169 and Arg204 of the substrate binding site are completely lost at this stage, possibly because of the removal of H2CO3. The bond between Ile169 and Glu140 (IHB37) is regained after 20 ns of simulation, which may indicate a return to the apo structure. This is also supported by the H-bond between Arg146 of α8 and the N-terminal Pro7, which is regained, resembling the apo conformation.

In this chapter, an attempt has been made to understand the conformational changes that occur in the binding sites at each step of the cyclopropanation process. The knowledge of the rigidities and flexibilities of the active site residues obtained from the MD simulations can be used to propose a common framework of the active site that represents the spatial locations of chemical fingerprints such as H-bond donors, H-bond acceptors and hydrophobic groups, which can be useful for screening chemical libraries to obtain suitable lead compounds. The dynamic changes of the active site of CmaA1 discussed in this chapter have been exploited in rational drug design by generating dynamics-based pharmacophore models for CmaA1, followed by VS and SBDD, which are discussed in the following chapters.

The energetics of the cyclopropanation cycle were studied by calculating the interaction energies between the enzyme and the cofactors/substrate/product in all the systems (Table 3.1). The interaction energy between SAM and the protein was found to be much lower (by ~150-200 kcal/mol) than that of SAHC. The interaction energy between SAM and CmaA1 decreases further by ~70 kcal/mol after substrate binding, which indicates a more stable interaction of SAM with CmaA1 after substrate binding. After methyl transfer, SAM is converted to SAHC, which shows a lower binding affinity towards the enzyme. The interaction energy of the product indicates less stable binding to the enzyme than that of the substrate. The interaction energy of SAHC increases further after removal of the product, making the enzyme-SAHC complex weaker and suggesting favored unbinding of SAHC after product release to regain the apo structure.
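The stepwise changes discussed above follow directly from the averages in Table 3.1. The short sketch below only tabulates those differences; the dictionary and function names are illustrative, and the energies are copied from the table without their standard deviations.

```python
# Average interaction energies (kcal/mol) copied from Table 3.1.
COFACTOR_ENERGY = {
    "E-SAM": -237.04,
    "E-SAM-S": -304.82,
    "E-SAHC-P": -131.24,
    "E-SAHC": -129.19,
}
LIGAND_ENERGY = {"E-SAM-S": -56.24, "E-SAHC-P": -51.95}
CYCLE = ["E-SAM", "E-SAM-S", "E-SAHC-P", "E-SAHC"]


def stepwise_changes(energies, order):
    """Change in interaction energy between consecutive stages.

    Negative values mean the interaction becomes more favorable."""
    return {
        f"{a} -> {b}": round(energies[b] - energies[a], 2)
        for a, b in zip(order, order[1:])
    }


changes = stepwise_changes(COFACTOR_ENERGY, CYCLE)
product_minus_substrate = round(
    LIGAND_ENERGY["E-SAHC-P"] - LIGAND_ENERGY["E-SAM-S"], 2
)

for step, delta in changes.items():
    print(f"{step}: {delta:+.2f} kcal/mol")
print(f"product - substrate: {product_minus_substrate:+.2f} kcal/mol")
```

The signs show the trend described in the text: the cofactor interaction becomes more favorable on substrate binding, much less favorable once SAM is converted to SAHC, and marginally less favorable again after product release, while the product interacts slightly less favorably than the substrate.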


3.4 Conclusions

Various stages of the cyclopropanation reaction in CmaA1 have been systematically analyzed through MD simulations on carefully chosen model systems. It was observed that the apo state is a closed state, in which the CBS is closed off from its surroundings by a stable H-bond between Pro202 of L10 and Asn11 of the N-terminus, whose distance increases from ~3.17 Å to ~8.71 Å upon SAM binding to open the binding site and accommodate SAM. SAM binding also creates a hydrophobic environment in the ASBS, by orienting the hydrophobic side chain towards the inner side of the ASBS, so that the hydrophobic substrate can bind. Upon substrate binding, SAM changes its conformation in such a way that the –CH3 group attached to the S+ of SAM comes closer to the double bond carbon atoms of the substrate. The distances between the –CH3 carbon attached to the S+ and the double bond carbon atoms decrease from ~10 Å to ~4.5 Å. The distance between the S+ and the double bonded carbon atoms also decreases from ~10 Å to ~6 Å. Lifting up of L10 facilitates this proximity.

After product formation, SAM converts to SAHC and the distance between SAHC and the product starts increasing, as revealed by the distances between the S atom of SAHC and the carbon atoms of the product corresponding to the double bond carbons of the substrate, which increase from ~5 Å to ~10 Å. Although we could not model the intermediate state with the carbocation species, the energies and structural variations in the model systems with the reactants and products gave enough information to understand the structural rearrangements occurring at each step of the cyclopropanation reaction. The substrate binds more strongly than the product. Similarly, SAM binds more strongly to CmaA1 than SAHC, by ~150 kcal/mol. These findings support the favored unbinding of the product and then of SAHC after cyclopropanation, and the regain of the apo structure. Thus, the study illustrates the sequence of events and the structural as well as energetic changes occurring in the active sites of CmaA1 at each step of the cyclopropanation of unsaturated mycolic acid, which will be very useful for designing inhibitors of CmaA1.


References

1. Józefowski, S., Sobota, A., & Kwiatkowska, K. (2008). How Mycobacterium tuberculosis subverts host immune responses. Bioessays, 30(10), 943-954.
2. Flynn, J. L., & Chan, J. (2001). Immunology of tuberculosis. Annu Rev Immunol, 19(1), 93-129.
3. Wolfe, L. M., Mahaffey, S. B., Kruh, N. A., & Dobos, K. M. (2010). Proteomic definition of the cell wall of Mycobacterium tuberculosis. J Proteome Res, 9(11), 5816-5826.
4. Hong, X., & Hopfinger, A. J. (2004). Construction, molecular modeling, and simulation of Mycobacterium tuberculosis cell walls. Biomacromolecules, 5(3), 1052-1065.
5. Banerjee, R., Vats, P., Dahale, S., Kasibhatla, S. M., & Joshi, R. (2011). Comparative genomics of cell envelope components in mycobacteria. PloS One, 6(5), e19280.
6. Vander Beken, S., Al Dulayymi, J. A. R., Naessens, T., Koza, G., Maza-Iglesias, M., Rowles, R., Grooten, J. (2011). Molecular structure of the Mycobacterium tuberculosis virulence factor, mycolic acid, determines the elicited inflammatory pattern. Eur J Immunol, 41(2), 450-460.
7. Minnikin, D. E., & Polgar, N. (1966). Studies on the mycolic acids from human tubercle bacilli. Tetrahedron Lett, 7(23), 2643-2647.
8. Wayne, L. G., & Kubica, G. P. (Eds.). (1984). The Mycobacteria: a Sourcebook. Dekker.
9. Kaneda, K., Imaizumi, S., Mizuno, S., Baba, T., Tsukamura, M., & Yano, I. (1988). Structure and molecular species composition of three homologous series of α-mycolic acids from Mycobacterium spp. J Gen Microbiol, 134(8), 2213-2229.
10. Yuan, Y., Lee, R. E., Besra, G. S., Belisle, J. T., & Barry, C. E. (1995). Identification of a gene involved in the biosynthesis of cyclopropanated mycolic acids in Mycobacterium tuberculosis. P Natl Acad Sci, 92(14), 6630-6634.
11. George, K. M., Yuan, Y., Sherman, D. R., & Barry, C. E. (1995). The biosynthesis of cyclopropanated mycolic acids in Mycobacterium tuberculosis: Identification and functional analysis of CMAS-2. J Biol Chem, 270(45), 27292-27298.
12. Grogan, D. W., & Cronan, J. E. (1997). Cyclopropane ring formation in membrane lipids of bacteria. Microbiol Mol Biol Rev, 61(4), 429-441.
13. Glickman, M. S. (2003). The mmaA2 gene of Mycobacterium tuberculosis encodes the distal cyclopropane synthase of the α-mycolic acid. J Biol Chem, 278(10), 7844-7849.
14. Glickman, M. S., Cahill, S. M., & Jacobs, W. R. (2001). The Mycobacterium tuberculosis cmaA2 gene encodes a mycolic acid trans-cyclopropane synthetase. J Biol Chem, 276(3), 2228-2233.
15. Glickman, M. S., Cox, J. S., & Jacobs, W. R. (2000). A novel mycolic acid cyclopropane synthetase is required for cording, persistence, and virulence of Mycobacterium tuberculosis. Mol Cell, 5(4), 717-727.
16. Barkan, D., Rao, V., Sukenick, G. D., & Glickman, M. S. (2010). Redundant function of cmaA2 and mmaA2 in Mycobacterium tuberculosis cis cyclopropanation of oxygenated mycolates. J Bacteriol, 192(14), 3661-3668.
17. Boissier, F., Bardou, F., Guillet, V., Uttenweiler-Joseph, S., Daffé, M., Quémard, A., & Mourey, L. (2006). Further insight into S-adenosylmethionine-dependent methyltransferases: structural characterization of hma, an enzyme essential for the biosynthesis of oxygenated mycolic acids in Mycobacterium tuberculosis. J Biol Chem, 281(7), 4434-4445.
18. Barkan, D., Liu, Z., Sacchettini, J. C., & Glickman, M. S. (2009). Mycolic acid cyclopropanation is essential for viability, drug resistance, and cell wall integrity of Mycobacterium tuberculosis. Chem Biol, 16(5), 499-509.
19. Arcus, V. L., Lott, J. S., Johnston, J. M., & Baker, E. N. (2006). The potential impact of structural genomics on tuberculosis drug discovery. Drug Discov Today, 11(1), 28-34.
20. Lamichhane, G. (2011). Novel targets in M. tuberculosis: search for new drugs. Trends Mol Med, 17(1), 25-33.
21. Huang, C. C., Smith, C. V., Glickman, M. S., Jacobs, W. R., & Sacchettini, J. C. (2002). Crystal structures of mycolic acid cyclopropane synthases from Mycobacterium tuberculosis. J Biol Chem, 277(13), 11559-11569.
22. Martin, J. L., & McMillan, F. M. (2002). SAM (dependent) I AM: the S-adenosylmethionine-dependent methyltransferase fold. Curr Opin Struc Biol, 12(6), 783-793.
23. Kozbial, P. Z., & Mushegian, A. R. (2005). Natural history of S-adenosylmethionine-binding proteins. BMC Struc Biol, 5(1), 19.
24. Umbarger, H. E. (1978). Amino acid biosynthesis and its regulation. Annu Rev Biochem, 47(1), 533-606.
25. Courtois, F., Guérard, C., Thomas, X., & Ploux, O. (2004). Escherichia coli cyclopropane fatty acid synthase. Eur J Biochem, 271(23-24), 4769-4778.
26. Iwig, D. F., Grippe, A. T., McIntyre, T. A., & Booker, S. J. (2004). Isotope and elemental effects indicate a rate-limiting methyl transfer as the initial step in the reaction catalyzed by Escherichia coli cyclopropane fatty acid synthase. Biochemistry, 43(42), 13510-13524.
27. Iwig, D. F., Uchida, A., Stromberg, J. A., & Booker, S. J. (2005). The activity of Escherichia coli cyclopropane fatty acid synthase depends on the presence of bicarbonate. J Am Chem Soc, 127(33), 11612-11613.
28. Molitor, E. J., Paschal, B. M., & Liu, H. W. (2003). Cyclopropane Fatty Acid Synthase from Escherichia coli: Enzyme Purification and Inhibition by Vinylfluorine and Epoxide-Containing Substrate Analogues. ChemBioChem, 4(12), 1352-1356.
29. Veyron-Churlet, R., Bigot, S., Guerrini, O., Verdoux, S., Malaga, W., Daffé, M., & Zerbib, D. (2005). The biosynthesis of mycolic acids in Mycobacterium tuberculosis relies on multiple specialized elongation complexes interconnected by specific protein–protein interactions. J Mol Biol, 353(4), 847-858.
30. Liao, R. Z., Georgieva, P., Yu, J. G., & Himo, F. (2011). Mechanism of mycolic acid cyclopropane synthase: a theoretical study. Biochemistry, 50(9), 1505-1513.
31. Carlson, H. A. (2002). Protein flexibility and drug design: how to hit a moving target. Curr Opin Chem Biol, 6(4), 447-452.
32. Yang, L. W., & Bahar, I. (2005). Coupling between catalytic site and collective dynamics: a requirement for mechanochemical activity of enzymes. Structure, 13(6), 893-904.
33. McGeagh, J. D., Ranaghan, K. E., & Mulholland, A. J. (2011). Protein dynamics and enzyme catalysis: insights from simulations. BBA-Proteins Proteom, 1814(8), 1077-1092.


34. Chourasia, M., Sastry, G. M., & Sastry, G. N. (2011). Aromatic–Aromatic Interactions Database, A2ID: An analysis of aromatic π-networks in proteins. Int J Biol Macromol, 48(4), 540-552.
35. Mahadevi, A. S., & Sastry, G. N. (2012). Cation-π interaction: Its role and relevance in chemistry, biology, and material science. Chem Rev, 113(3), 2100-2138.
36. Alahari, A., Trivelli, X., Guérardel, Y., Dover, L. G., Besra, G. S., Sacchettini, J. C., Kremer, L. (2007). Thiacetazone, an antitubercular drug that inhibits cyclopropanation of cell wall mycolic acids in mycobacteria. PLoS One, 2(12), e1343.
37. Rai, D., Johar, M., Srivastav, N. C., Manning, T., Agrawal, B., Kunimoto, D. Y., & Kumar, R. (2007). Inhibition of Mycobacterium tuberculosis, Mycobacterium bovis, and Mycobacterium avium by novel dideoxy nucleosides. J Med Chem, 50(19), 4766-4774.
38. Brooks, B. R., Brooks, C. L., MacKerell, A. D., Nilsson, L., Petrella, R. J., Roux, B., Karplus, M. (2009). CHARMM: the biomolecular simulation program. J Comput Chem, 30(10), 1545-1614.
39. Jo, S., Kim, T., Iyer, V. G., & Im, W. (2008). CHARMM-GUI: a web-based graphical user interface for CHARMM. J Comput Chem, 29(11), 1859-1865.
40. Vanommeslaeghe, K., Raman, E. P., & MacKerell Jr, A. D. (2012). Automation of the CHARMM General Force Field (CGenFF) II: assignment of bonded parameters and partial atomic charges. J Chem Inf Model, 52(12), 3155-3168.
41. Pandit, K. R., & Klauda, J. B. (2012). Membrane models of E. coli containing cyclic moieties in the aliphatic lipid chain. BBA-Biomembranes, 1818(5), 1205-1210.
42. Mackerell, A. D., & Banavali, N. K. (2000). All-atom empirical force field for nucleic acids: II. Application to molecular dynamics simulations of DNA and RNA in solution. J Comput Chem, 21(2), 105-120.
43. MacKerell, A. D., Feig, M., & Brooks, C. L. (2004). Extending the treatment of backbone energetics in protein force fields: Limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations. J Comput Chem, 25(11), 1400-1415.
44. Roy, A., & Post, C. B. (2012). Detection of long-range concerted motions in protein by a distance covariance. J Chem Theory Comput, 8(9), 3009-3014.


Chapter 4 Dynamics Based Pharmacophore Models for Screening Potential Inhibitors of Mycobacterial Cyclopropane Synthase

[Chapter graphic: snapshots of CmaA1 from MD simulations and a derived e-pharmacophore with the features P6 (Ile136), N14 (Ser34), A6 (Leu95), D8 (Thr94) and D7 (Tyr16).]

No one who has experienced the intense involvement of computer modeling would deny that the temptation exists to use any data input that will enable one to continue playing what is perhaps the ultimate game of solitaire. — James Lovelock Gaia: A New Look at Life on Earth (1979), 137-8.


4.1 Background

In the past few decades, computational methods have become mainstream in understanding ligand-receptor interactions as well as in the in silico screening of huge chemical libraries, providing a fast and less expensive alternative to traditional HTS [1-5]. Rapid developments in high throughput NMR spectroscopy and X-ray crystallography [6, 7] have also paved the way for rapid structure determination of targets and have accelerated the use of structure based methods manyfold [8, 9]. Incorporating receptor flexibility in docking calculations in an efficient way is a major challenge, and a few methods have been proposed to address it. The early methods included ‘soft docking’ by Jiang et al. [10], in which the penalty for van der Waals clashes between the receptor and ligand atoms was reduced; other researchers subsequently contributed significantly to the development and improvement of soft docking methods [11-13]. Leach developed an algorithm that considers the conformational flexibility of amino acid side chains and ligands using a rotamer library and identifies the lowest-energy combinations of side chain and ligand conformations [14]. Some examples include the studies of Murray et al. [15] on thrombin, thermolysin and neuraminidase, the studies by Bouzida et al. [16] and Erickson et al. [17] on HIV protease, Cavasotto and Abagyan's flexible-ligand/grid-receptor docking experiments on 33 crystal structures of four protein kinase sub-families [18], Schapira and Abagyan's study on VS against nuclear receptors [19], and the docking study by Daeyaert et al. [20] using the crystal structures of HIV reverse transcriptase complexes with different non-nucleoside inhibitors. Representation of receptor flexibility through collective degrees of freedom also became popular in the early 1990s [21, 22]. The use of multiple protein conformations, which allows the ligand to select from an ensemble of pre-existing, partially fitting protein conformations representing the highly populated low-energy states of a receptor, also provided a strong basis for incorporating protein flexibility in SBDD [23-25]. MD simulation has emerged as a powerful and efficient method to address receptor flexibility in SBDD. Carlson et al. [26] developed a “dynamic” pharmacophore model of HIV integrase from an ensemble of MD snapshots and identified two new inhibitors by screening the Available Chemicals Directory (ACD). They also validated this method by generating a dynamic pharmacophore model from the apo form of HIV protease, which could discriminate between known HIV-1 protease inhibitors and other drug-like molecules [27].

Another approach, by Moitessier et al. [28], involves the prediction of binding modes by analyzing the dynamic pharmacophore model and the orientation of the pharmacophore points. Several studies by Carlson and coworkers have demonstrated the successful incorporation of protein flexibility in SBDD over a range of pharmacologically relevant targets [29-34]. CmaA1 is an essential drug target, which is responsible for cis-cyclopropanation at the distal position of α-mycolates. In the previous chapter, the structural, energetic and dynamic properties of CmaA1 were discussed by analyzing the results of 40 ns MD simulations on each of the five model systems of CmaA1 representing various stages of the cyclopropanation process [35]. This chapter presents a study in which an attempt has been made to incorporate the dynamic nature of the active site of CmaA1 in drug design by generating structure based e-pharmacophore models from snapshots taken from the MD trajectories. These models have been validated by examining their efficiency in screening previously known CmaA1 inhibitors [35]. The results from the models were also compared with docking calculations done on multiple conformations of the protein structure.


4.2 Methodology

4.2.1 Model systems

The following five model systems of CmaA1 were considered for generation of dynamics based pharmacophore models.

E-SAM: Holo form of CmaA1 with SAM in the CBS.
E-SAM-S: Holo form with SAM in the CBS and the substrate in the ASBS.
E-SAHC-P: Holo form with SAHC in the CBS and the cyclopropanated product in the ASBS.
E-SAHC: Holo form with SAHC in the CBS.
E-SAHC-D: Holo form with SAHC in the CBS and a CmaA1 inhibitor, DDDMAB, in the ASBS.

The first four systems were taken from our previous study [36], and the fifth model (E-SAHC-D) was based on MD simulations done on the crystal structure of the holo state of CmaA1 (PDB ID: 1KPH [37]).

4.2.2 Molecular dynamics (MD) simulations

The methodology for MD simulation with the CHARMM program [38] on the system E-SAHC-D was the same as for the other four systems. The individual components of each system were generated using CHARMM-GUI [39] with the CGenFF force field [40]. Non-physical contacts were removed by minimizing the whole system using a 1000-step ABNR minimization. The system was then solvated in a TIP3P water box of 80 × 75 × 75 Å³, and an appropriate number of counter ions was added for neutralization. After 500 steps of SD and 500 steps of ABNR energy minimization, a 1 ns equilibration run was carried out in the NVT ensemble with periodic boundary conditions. The solute molecules were restrained by applying a constant harmonic force of 5 kcal/mol during the initial minimization and equilibration. The system was then subjected to a 40 ns MD simulation in the NPT ensemble at 298 K with the leap-frog integrator and a time step of 0.002 ps. The Langevin piston algorithm, SHAKE and PME were used to control the pressure, to constrain covalent bonds involving hydrogen atoms, and to treat long range interactions, respectively. The coordinates were saved every 5 ps for analysis.
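The run length, time step and save interval quoted above fix the number of integration steps and stored frames per trajectory. The following is a minimal bookkeeping sketch in Python; the variable names are illustrative and are not taken from the CHARMM input used in this work.

```python
# Bookkeeping for one production run, using only values quoted in the text.
# Variable names are illustrative; they are not CHARMM keywords.

SIM_LENGTH_NS = 40       # length of the NPT production run
TIME_STEP_PS = 0.002     # leap-frog integration time step
SAVE_INTERVAL_PS = 5     # coordinates written every 5 ps

sim_length_ps = SIM_LENGTH_NS * 1000

# Total number of integration steps in the production run
n_steps = round(sim_length_ps / TIME_STEP_PS)

# Number of integration steps between two saved coordinate sets
steps_per_frame = round(SAVE_INTERVAL_PS / TIME_STEP_PS)

# Number of coordinate sets stored per trajectory
n_frames = n_steps // steps_per_frame

print(n_steps, steps_per_frame, n_frames)  # 20000000 2500 8000
```

Each 40 ns trajectory therefore corresponds to 2 × 10^7 integration steps and 8000 stored frames.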

4.2.3 Generation of structure based pharmacophore models

Forty snapshots were collected at 5 ns intervals from the MD trajectories of the five model systems of CmaA1 (eight snapshots per trajectory). Including the static crystal structure 1KPH, a total of 41 structures were thus considered for constructing the structure based e-pharmacophore models. Glide energy grids were generated for each snapshot to define the active site as a cubic box of 12 × 12 × 12 Å³ around the cofactors, and the interactions between the cofactor and the protein were then assessed using the “Score in place” mode of Glide (Schrödinger molecular modeling suite), with the option to output Glide XP descriptor information [41].
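The composition of the 41-structure set can be enumerated as in the sketch below. It assumes that the eight snapshots per trajectory were taken at 5, 10, ..., 40 ns and that the first stored frame corresponds to 5 ps; both are assumptions made for illustration, and the function names are hypothetical.

```python
# Enumerate the structures used for e-pharmacophore generation.
# Assumption: snapshots at 5, 10, ..., 40 ns (eight per trajectory).
# Assumption: frame k of a trajectory corresponds to k * 5 ps.

SYSTEMS = ["E-SAM", "E-SAM-S", "E-SAHC-P", "E-SAHC", "E-SAHC-D"]
SNAPSHOT_INTERVAL_NS = 5
SIM_LENGTH_NS = 40
SAVE_INTERVAL_PS = 5


def snapshot_times_ns():
    """Return the snapshot times (ns) taken from one trajectory."""
    return list(range(SNAPSHOT_INTERVAL_NS, SIM_LENGTH_NS + 1, SNAPSHOT_INTERVAL_NS))


def frame_index(time_ns):
    """Return the 1-based index of the stored frame at a given time (ns)."""
    return time_ns * 1000 // SAVE_INTERVAL_PS


# The crystal structure has no associated simulation time.
structures = [("1KPH", None)]
structures += [(system, t) for system in SYSTEMS for t in snapshot_times_ns()]

print(len(snapshot_times_ns()), len(structures))
```

With these assumptions, each trajectory contributes eight snapshots and the full set contains 41 structures.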

[Figure 4.1 workflow: 40 ns MD simulations on the five model systems (E-SAHC-D, E-SAM, E-SAM-S, E-SAHC-P, E-SAHC); eight snapshots at 5 ns intervals from each trajectory plus the crystal structure 1KPH; extraction of the cofactors and Glide XP scoring (score in place) to build e-pharmacophore models, alongside flexible Glide XP docking; validation by screening 23 reference thiacetazone analogues (MIC in μg/mL); further validation by comparing pharmacophore screening and docking; selection of the best pharmacophore models.]

Figure 4.1 Schematic representation of the generation of various types of pharmacophore models as filters for VS.


Default settings were used for the scoring. The resulting protein-cofactor complexes, along with the XP energy terms, were then subjected to the e-Pharmacophore [42] generation tool of Schrödinger to generate energy based pharmacophore models. Figure 4.1 schematically shows the generation and selection of the e-pharmacophore models as filters for VS.

4.2.4 Pharmacophore screening and docking

A set of 23 CmaA1 inhibitors with reported MIC values ranging from 0.0125 to 12.5 μg/mL [35] (Scheme 4.1) was used to verify the abilities of all the generated structure based e-pharmacophore models. These are the only inhibitors of CmaA1 available with reported activities. All these compounds were energy minimized using the default parameters of the LigPrep [43] module of the Schrödinger Suite. The five best conformers were chosen for each compound. The ‘Advanced Pharmacophore Screening’ option of the Phase module of the Schrödinger Suite [44] was used, with an option to generate five conformations per rotatable bond and a maximum of 100 conformations per compound. Rapid sampling was used for screening, and the default option of skipping structures with more than 15 rotatable bonds was retained. The minimum number of sites that a molecule must match was set to 4 for all the models.

Among the many conformers of a ligand, the one with the best fitness score (S), given by Eqn. (14) [49], was retained for each compound.

$$S = W_{site}\left(1 - \frac{S_{align}}{C_{align}}\right) + W_{vec}\,S_{vec} + W_{vol}\,S_{vol} + W_{ivol}\,S_{ivol} \qquad \text{Eqn. (14)}$$

where $S_{align}$ is the alignment score, i.e., the RMS deviation between the site point positions in the matching conformation and the site point positions in the hypothesis; $C_{align}$ is the alignment cutoff; $W_{site}$ is the weight of the site score ($1 - S_{align}/C_{align}$); $S_{vec}$ is the vector score, i.e., the average cosine between the vector features in the matching conformation and the vector features in the reference conformation; $W_{vec}$ is the weight of the vector score; $S_{vol}$ is the volume score, i.e., the ratio of the common volume occupied by the matching conformer and the reference conformer to the total volume (the volume occupied by both); $W_{vol}$ is the weight of the volume score; $S_{ivol}$ is the included volume score, i.e., the ratio of the volume overlap between the matching conformer and the included volumes (if present) to the total included volume; and $W_{ivol}$ is the weight of the included volume score. Volumes were computed using van der Waals models of all atoms except nonpolar hydrogens. $C_{align}$, $W_{site}$, $W_{vec}$, $W_{vol}$ and $W_{ivol}$ are user-adjustable parameters, with default values of 1.20, 1.00, 1.00, 1.00 and 0.0 respectively.
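As an illustration of how the terms defined above combine, the sketch below evaluates a fitness score of the assumed form S = W_site (1 − S_align/C_align) + W_vec S_vec + W_vol S_vol + W_ivol S_ivol, with the default weights quoted in the text. The function name and the input values are hypothetical; this is not the Phase implementation.

```python
# Minimal sketch of a pharmacophore fitness score.
# Assumed form: S = W_site*(1 - S_align/C_align) + W_vec*S_vec
#                   + W_vol*S_vol + W_ivol*S_ivol
# Default weights follow the values quoted in the text.

def fitness_score(s_align, s_vec, s_vol, s_ivol=0.0,
                  c_align=1.2, w_site=1.0, w_vec=1.0, w_vol=1.0, w_ivol=0.0):
    """Combine the alignment, vector and volume terms into one score."""
    site_score = 1.0 - s_align / c_align   # equals 1 for a perfect alignment
    return (w_site * site_score
            + w_vec * s_vec
            + w_vol * s_vol
            + w_ivol * s_ivol)


# Hypothetical conformer: 0.6 Å RMSD, vector score 0.8, volume score 0.5
example = fitness_score(s_align=0.6, s_vec=0.8, s_vol=0.5)

# A perfect match reaches the maximum of 3.0 with the default weights
perfect = fitness_score(s_align=0.0, s_vec=1.0, s_vol=1.0)
```

With the default weights, the score is bounded above by 3, and an alignment RMSD larger than the cutoff makes the site term negative.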

Scheme 4.1 Compounds used for validation of the performance of the pharmacophore models to screen active inhibitors. The compound name and the MIC values (in μg/mL) are given below each compound [35].


To verify whether the models screen any inactive compounds, 2050 compounds reported to be inactive against M. tuberculosis were collected. Of these, 1398 compounds were found to be within the molecular weight range of 180-400 and to contain 12 to 27 heavy atoms (similar to the 23 active compounds, SAM and SAHC). These 1398 compounds were then screened against all the e-pharmacophore models using the same criteria, to check whether these models screen any inactive compounds.
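The size-matching step for the inactive set can be expressed as a simple property filter. The sketch below uses the molecular weight and heavy atom ranges quoted above; the `Compound` record and the example entries are hypothetical and do not correspond to compounds from the actual data set.

```python
from dataclasses import dataclass

# Property filter for the decoy (inactive) set, using the ranges from the text.
# The Compound record and the example entries are hypothetical.


@dataclass
class Compound:
    name: str
    mol_weight: float   # molecular weight
    heavy_atoms: int    # number of non-hydrogen atoms


def within_active_range(compound, mw_range=(180.0, 400.0), heavy_range=(12, 27)):
    """True if the compound is similar in size to the active compounds."""
    return (mw_range[0] <= compound.mol_weight <= mw_range[1]
            and heavy_range[0] <= compound.heavy_atoms <= heavy_range[1])


candidates = [
    Compound("hypo-1", 236.3, 16),   # inside both ranges
    Compound("hypo-2", 452.5, 31),   # too heavy, too many heavy atoms
    Compound("hypo-3", 150.2, 11),   # too light, too few heavy atoms
    Compound("hypo-4", 398.0, 27),   # at the upper edge, still retained
]

retained = [c.name for c in candidates if within_active_range(c)]
```

In practice, the two properties would be computed from the structures with a cheminformatics toolkit before applying such a filter.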

For further validation, all the 23 reference inhibitors were docked to the active sites of each of the snapshots and the crystal structure. The compounds were first subjected to the Glide SP module to generate the five best poses for every compound. The Glide SP generated poses were further subjected to XP docking, and the best scoring pose of each compound was retained; the compounds were then ranked on the basis of the docking score for each snapshot. The top scored compounds from docking and the hits screened by the corresponding structure based e-pharmacophore models were compared.
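The pose selection and ranking step can be sketched as follows. The sketch assumes that the best pose is the one with the most negative docking score, as is conventional for Glide scores; the pose list is hypothetical and the functions are not part of the Glide interface.

```python
# Keep the best XP pose per compound and rank compounds for one snapshot.
# Assumption: a more negative docking score indicates a better pose.
# The (compound, score) pairs below are hypothetical.

def best_pose_per_compound(poses):
    """Map each compound to the most negative score among its poses."""
    best = {}
    for compound, score in poses:
        if compound not in best or score < best[compound]:
            best[compound] = score
    return best


def rank_compounds(best_scores):
    """Return compound names ordered from best to worst docking score."""
    return sorted(best_scores, key=lambda name: best_scores[name])


poses = [("C1", -5.2), ("C1", -6.1), ("C2", -7.4), ("C2", -3.0), ("C3", -4.8)]
best = best_pose_per_compound(poses)
ranking = rank_compounds(best)
```

Repeating this for every snapshot gives one ranked list per receptor conformation, which can then be compared with the hits from the corresponding pharmacophore model.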

4.3 Results and Discussion

Proteins, being very flexible, sample a number of conformations under physiological conditions, some of which are relevant for ligand binding. Hence, choosing the right conformations of the binding pockets of receptors for structure based VS is crucial for identifying suitable lead compounds. In the previous chapter, we observed significant conformational changes in the binding sites of CmaA1 across the different stages of cyclopropanation. Given the possibility of diverse conformational states, considering receptor flexibility seems crucial for SBDD in this case. Many hypotheses also suggest that the binding site of a protein exists as an equilibrium ensemble of different conformations of similar energy, and that ligands can choose the most favorable conformer from this ensemble to bind [45, 46].


Hence, we collected snapshots of the holo CmaA1 structure bound to different cofactors/substrate/product from the MD simulation trajectories, representing all the possible natural states of the target and exploring all types of interactions of the active site residues with the natural ligands SAM and SAHC. The crystal structure (1KPH) was also considered for generation of pharmacophore models, as we were interested in comparing the predictive abilities of the pharmacophore models generated from the MD snapshots and from the static crystal structure. e-pharmacophore models were generated based on the interaction of SAM/SAHC with the active site residues of CmaA1 from each of the structures considered for the study (the MD snapshots and the static crystal structure). e-pharmacophore is a new approach, which generates structure based pharmacophore models using the energetic binding terms from Glide XP and is a relatively fast method for screening. The protein-ligand interactions are reasonably well characterized in the e-pharmacophore by the XP terms, and excluded volumes corresponding to regions of space occupied by the receptor are also allowed. The models consist of six different chemical features, viz., H-bond acceptor (A), H-bond donor (D), hydrophobic sites (H), negative ionizable sites (N), positive ionizable sites (P) and aromatic rings (R), and the maximum number of features per model was set to 9 [47]. H-bond donors were represented as projected points, located at the corresponding H-bond acceptor positions in the binding site. Projected points allow structurally dissimilar active compounds to form H-bonds to the same location, regardless of their point of origin and directionality. Each pharmacophore feature site is first assigned an energetic value equal to the sum of the Glide XP contributions of the atoms comprising the site, allowing sites to be quantified and ranked on the basis of the energetic terms. Glide XP descriptors include terms for hydrophobic enclosure, hydrophobically packed correlated H-bonds, electrostatic rewards, π-π stacking, cation-π, and other interactions. ChemScore H-bonding and lipophilic atom pair interaction terms are included when the Glide XP terms for H-bonding and hydrophobic enclosure are zero.
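The energetic ranking of feature sites described above can be sketched as a sum over per-atom contributions followed by a sort. The feature labels and energy values below are hypothetical, and the sketch assumes that more negative energies indicate more favorable sites; it is not the Schrödinger implementation.

```python
# Rank pharmacophore feature sites by summed per-atom energy contributions.
# Assumption: more negative summed energy means a more favorable site.
# Feature labels and energy values (kcal/mol) are hypothetical.

def rank_feature_sites(atom_contributions, max_features=9):
    """Sum the per-atom terms of each site and keep the most favorable sites."""
    site_energy = {label: sum(terms) for label, terms in atom_contributions.items()}
    ranked = sorted(site_energy.items(), key=lambda item: item[1])
    return ranked[:max_features]


contributions = {
    "D7": [-0.5, -0.25],      # donor site built from two atoms
    "R10": [-1.5],            # aromatic ring site
    "A6": [-0.125, -0.125],   # acceptor site built from two atoms
}

top_two = rank_feature_sites(contributions, max_features=2)
```

Setting `max_features=9` mirrors the cap on the number of features per model used in this study.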

4.3.1 Comparison of e-pharmacophore models generated from different model systems

The e-pharmacophore model generated from the static crystal structure has 7 features (Figure 4.2). The A and D features near the sugar part of SAHC are formed as a result of H-bonding with the residues T94, L95 and Q99. The terminal –NH3+ group interacts with G72, I136 and G137, resulting in the P feature. The interaction of the –NH2 group attached to the C6 atom of the adenine moiety of SAHC with E124 results in a donor feature.

[Figure 4.2 image: labeled active site residues W123, I136, A138, G137, E124, G72, L93, T94, L95, S96 and Q99.]

Figure 4.2 e-pharmacophore model generated from the crystal structure of CmaA1 (1KPH) along with the associated active site residues. Color codes for the pharmacophoric features are as follows. Cyan: D, Pink: A, Red: N, Blue: P, Green: H and Orange: R. The same color code for the features is followed for all the other figures in the thesis.

Two R features are located near the aromatic ring of the adenine part of SAHC. The presence or absence of the cofactor and/or the ligand induces several conformational changes, and the resulting differences among the pharmacophore models are discussed below.

Overall, many similarities were found among the models generated from the trajectories of the different model systems, the most common being the R features, which occur near the F142 residue. The other common features are the D and A features near the residues G72, T94, Y16, L95, S96 and Q99, which result from H-bonds made by the sugar moiety of SAM/SAHC. The P and N features are found mostly near the residues I136 and Y33, S34 respectively, which show electrostatic interactions with the polar terminal part of the cofactors. In the system E-SAM, SAM has a different conformation than in the other systems and interacts with different residues. The spatial locations of the features of the models generated from the system E-SAM were therefore found to be strikingly different from those of the others. H-bonding of the residues G72, S96 and Q99 with the sugar part of SAM, as seen in E-SAHC-D, was not found in this system. Owing to the conformational differences between SAM and SAHC, G72 interacts with the terminal –NH3+ group of SAM rather than with the sugar part as in E-SAHC-D. Similarly, in the E-SAM-S system, the two –OH groups of the sugar part of SAM mostly make H-bonds with Q99 as H-bond acceptors, and with Y16 as H-bond donors in some snapshots, contributing to the A and D features. Initially, the terminal –NH3+ and –COOH groups make H-bond and salt bridge interactions with G72 and S34 respectively, but after 10 ns there is a large conformational change in SAM, and new interactions are formed with D70 and T78 respectively in all snapshots, giving rise to the P and N features.

It was observed that the P features have higher scores in the E-SAM and E-SAM-S systems, while the A and D features have higher scores in the E-SAHC-D, E-SAHC-P and E-SAHC systems. The models generated from the crystal structure do not have an N feature whereas the ones generated from the E-SAHC-D system do not have P features. Such differences among the pharmacophore models further illustrate the major variations in the conformational states of the residues that form the binding site, and hence the importance of accounting for the flexible nature of the protein.


4.3.2 Screening CmaA1 inhibitors by e-pharmacophore and docking

The predictive abilities of the generated models were verified by screening a set of 23 reference compounds showing CmaA1 inhibitory activities in the range from 0.0125 to 12.5 μg/mL. These are the anti-tubercular drug thiacetazone and its clinical analogues, which have been shown to cause significant loss of cyclopropanation in various mycobacterial strains [35]. The reversal of their effect on cyclopropanation upon over-expression of the cyclopropane synthase enzymes has provided evidence of direct binding of these compounds to cyclopropane synthase [35]. The pharmacophore fitness score was obtained for each compound against each model, and hits were fetched in order of decreasing fitness. The fitness score is a linear combination of the site and vector alignment scores and the volume score; it measures how well the matching pharmacophore site points align with those of the hypothesis, how well the matching vector features (acceptors, donors, aromatic rings) overlay those of the hypothesis, and how well the matching conformation superimposes, in an overall sense, on the reference ligand conformation. The reference compounds were also docked to the active sites of the MD snapshots and the crystal structure, to analyze the interactions made by the reference compounds with the active site residues of CmaA1 and to compare these interactions with the pharmacophoric features of the corresponding e-pharmacophore models matched by the same compounds. The pharmacophore fitness scores and the XP docking scores were compared for each molecule with each pharmacophore model and the corresponding snapshot.

The models generated from the MD snapshots could screen up to 17 out of 23 reference compounds, while the model generated from the crystal structure could screen only one compound. It was found that, for a given compound, the highest fitness score with the e-pharmacophore models obtained from the MD snapshots was higher than that obtained from the crystal structure. The docking scores of a given reference compound with the crystal structure were mostly found to be lower than those with the MD snapshots. This shows the advantage of sampling many conformations of the binding site from MD studies in screening. Figure 4.3 shows the pharmacophore fitness and docking scores for each reference compound with the models generated from the crystal structure and the MD snapshots.

[Figure 4.3 plots: fitness score and docking score versus reference compounds C1-C23.]

Figure 4.3 Pharmacophore fitness and docking scores of the reference compounds with the MD based models/snapshots (dark blue) and the crystal structure based models/snapshots (cyan).

It was also observed that at least one snapshot from each trajectory gave a model that could screen 13-17 out of 23 active compounds. At the same time, models generated from certain snapshots of each of the five trajectories screened none of the active compounds. This is in agreement with the hypothesis that ligand binding is an event in which the ligand selects the most suitable binding site conformation from an ensemble of pre-existing, partially fitting receptor conformations, as stated by Bosshard [48, 49]. This concept of pre-existing receptor conformational ensembles offers strong support for incorporating receptor flexibility through multiple receptor conformations generated by MD simulations [45]. As the model systems considered in this study are representative conformations of the highly populated low-energy states of the CmaA1 binding site when bound to its natural ligands (E-SAM, E-SAM-S, E-SAHC-P, E-SAHC) at each step of the cyclopropanation reaction, as well as when bound to an inhibitor (E-SAHC-D), the inhibitors could recognize the representative conformation at each step of cyclopropanation that best suits them for binding. A single static crystal structure of CmaA1 does not appropriately represent all such possible states, and hence does not account for the binding of all the ligands considered. The generated e-pharmacophore models were also tested for their ability to differentiate between inhibitors and non-inhibitors of similar size and molecular weight. For this purpose, 1398 non-inhibitors of M. tuberculosis were collected from the ChEMBL database. Upon screening with the e-pharmacophore models using the same criteria, it was observed that none of the models could screen more than 180 non-inhibitors out of 1398.

Also, the ranges of the fitness scores for the non-inhibitors were found to be lower than those of the inhibitors. The maximum fitness scores of the non-inhibitors screened by each model were mostly below 1, and mostly lower than those of the active compounds.

Comparing the screening and docking results, we found that the reference compounds C17 and C23 were screened by 24 and 26 of the 41 pharmacophore models, respectively. Interestingly, these two compounds showed the highest docking scores, viz. -9.2 and -9.5, with the snapshots E-SAM at 30 ns and E-SAHC-P at 10 ns. The correlation between the pharmacophore fitness and docking scores is insignificant, which is not surprising because both methods rely on many approximations in calculating the scores. In addition, the activity range of the reference compounds is not very wide, and we consider all of them to be active. Hence, a low correlation between these two scores, and also with the activity, is expected.
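The comparison between fitness and docking scores can be illustrated with a minimal sketch of a Pearson correlation coefficient. The function name `pearson` and the score values are hypothetical and are not taken from the thesis data; the thesis does not state which correlation measure was used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, all without the 1/n factor,
    # which cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical fitness and docking scores for five compounds (illustration only).
fitness = [1.2, 0.8, 1.5, 0.6, 1.1]
docking = [-6.1, -7.4, -5.9, -6.8, -8.0]
print(round(pearson(fitness, docking), 2))
```

A coefficient close to zero would correspond to the weak relationship described above; values near +1 or -1 would indicate a strong linear relationship.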

[Figure 4.4: five panels, A) E-SAHC-D (15 ns), B) E-SAHC-D (20 ns), C) E-SAM (30 ns), D) E-SAHC (10 ns) and E) E-SAHC (35 ns), each listing the reference compounds found as pharmacophore screening hits, common hits and docking hits.]

Figure 4.4 Compounds screened by the best 5 e-pharmacophore models and docking with the respective snapshots.

Pharmacophore screening and docking are both recognized as integral parts of drug design, each with its own strengths and weaknesses. We therefore tried to identify the snapshots for which the e-pharmacophore model and docking screened common reference compounds. For each snapshot, the top scoring compounds were taken from the corresponding docking results, in the same number as screened by the pharmacophore model.

Table 4.1 Features of the selected 5 best e-pharmacophore models.

Feature Label  Type  Score    X       Y        Z      Source

A) E-SAHC-D (15 ns)
D8             D     -1.61   -3.72   -6.27    4.90    HBond
A5             A     -1.60   -3.06   -6.97    4.84    HBond
A6             A     -1.52   -1.28   -4.83    5.28    HBond
R21            R     -0.97   -5.27   -8.56    0.97    RingChemscoreHphobe
N18            N     -0.73   -0.64    2.56    5.19    HBond
D9             D     -0.59   -2.21   -4.77    5.48    HBond
R20            R     -0.59   -3.20   -8.26    0.77    RingChemscoreHphobe
D13            D     -0.27   -6.10  -11.18   -0.95    HBond

B) E-SAHC-D (20 ns)
A2             A     -1.95   -6.27   -9.45    0.03    PhobEnHB+HBond
A5             A     -1.60   -2.90   -7.40    4.16    HBond
D9             D     -1.60   -2.27   -4.91    4.67    none
R21            R     -0.92   -5.11   -8.72    0.31    RingChemscoreHphobe
D8             D     -0.83   -3.43   -6.63    4.36    HBond
N18            N     -0.56   -0.46    2.23    5.08    HBond
R20            R     -0.56   -3.01   -8.56    0.24    RingChemscoreHphobe
A6             A     -0.25   -1.34   -5.11    4.81    HBond
D14            D     -0.25   -5.88  -11.64   -0.96    HBond

C) E-SAM (30 ns)
P16            P     -2.61   -4.07    0.08    3.59    Electro
A6             A     -1.60   -3.04   -8.72    5.56    HBond
D7             D     -1.60   -0.37   -6.73    6.50    HBond
R18            R     -0.56   -2.76   -9.49    0.46    RingChemscoreHphobe
D8             D     -0.47   -3.35   -8.00    6.09    HBond
N14            N     -0.33   -1.64    1.33    4.62    none+HBond

D) E-SAHC (10 ns)
A2             A     -2.20   -6.81   -9.73   -0.11    PhobEnHB+HBond
A5             A     -1.60   -3.23   -8.70    5.46    HBond
D8             D     -1.60   -3.79   -8.31    6.14    HBond
D9             D     -1.60   -3.65   -5.84    6.75    HBond
A6             A     -1.26   -3.77   -5.89    5.80    HBond
D14            D     -0.34   -6.15  -11.90   -1.25    None
N18            N     -0.25    0.44   -0.01    4.78    none+HBond
R21            R     -0.89   -5.92   -9.20    0.76    RingChemscoreHphobe

E) E-SAHC (35 ns)
A5             A     -1.60   -3.36   -8.55    4.48    None
D9             D     -1.60   -2.38   -5.80    6.31    HBond
A2             A     -1.50   -6.84   -9.70   -0.29    PhobEnHB
D8             D     -1.46   -3.70   -8.05    5.23    HBond
N18            N     -0.94   -0.70    1.68    4.84    HBond
R21            R     -0.87   -5.82   -9.13    0.41    RingChemscoreHphobe

[Figure 4.5: five horizontal bar-chart panels, A) E-SAHC-D (15 ns), B) E-SAHC-D (20 ns), C) E-SAM (30 ns), D) E-SAHC (10 ns) and E) E-SAHC (35 ns); vertical axis: Reference Compounds; horizontal axis: Fitness and Docking Score.]

Figure 4.5 Pharmacophore fitness score and XP docking scores of the reference compounds with the models and the respective snapshots. The cyan bars represent the docking scores while the dark blue ones represent the pharmacophore fitness scores.

The top scoring docked compounds were compared with the compounds screened by the respective e-pharmacophore models to identify the compounds screened by both methods. For most of the MD snapshots, the pharmacophore model generated from the snapshot and docking screened a large number of common reference ligands as top scoring hits. Five models, generated from the snapshots of the MD trajectories of the model systems E-SAHC-D at 15 ns and 20 ns, E-SAM at 30 ns and E-SAHC at 10 ns and 35 ns, were selected as the best models, as the number of common compounds screened by the pharmacophore model and docking was more than 10.
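The selection of common compounds described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the workflow actually run in the Schrödinger Suite: the function name `common_hits`, the compound labels and the scores are hypothetical, and a more negative docking score is assumed to rank higher.

```python
def common_hits(pharm_hits, docking_scores):
    """Compounds found both by a pharmacophore model and among the
    equally sized set of top scoring docked compounds."""
    n = len(pharm_hits)
    # Rank docked compounds from the most negative (best) score upwards.
    ranked = sorted(docking_scores, key=docking_scores.get)
    # Keep as many docked compounds as the pharmacophore model screened.
    top_docked = set(ranked[:n])
    return sorted(set(pharm_hits) & top_docked)


# Hypothetical data for one snapshot (illustration only).
pharm_hits = ["C1", "C2", "C4"]
docking_scores = {"C1": -7.9, "C2": -5.1, "C3": -8.4, "C4": -6.6, "C5": -4.2}
print(common_hits(pharm_hits, docking_scores))  # ['C1', 'C4']
```

In this toy case the model screens three compounds, so the three best docked compounds (C3, C1, C4) are retained, and two of them are shared with the pharmacophore hits.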

Table 4.1 lists the details of the five best models and Figure 4.5 shows the docking scores and pharmacophore fitness scores of the reference compounds with these snapshots/models.

These models were also found to screen a very small number of non-inhibitors, with lower fitness scores than the reference inhibitors. This shows the selectivity of the models for the true inhibitors of CmaA1. Table 4.2 compares the abilities of the selected models to screen the inhibitors and non-inhibitors, together with their fitness scores. The ability of the models to screen a majority of the inhibitors and only a small fraction of the non-inhibitors further validates the models proposed here.

Table 4.2 Comparison of number of inhibitors and non-inhibitors screened by the selected models and ranges of their fitness scores.

Model              % of inhibitors  Fitness Score  % of non-inhibitors  Fitness Score
                   screened         Range          screened             Range
E-SAHC-D (15 ns)   74               0.56-1.56      4.5                  0.4-1.06
E-SAHC-D (20 ns)   74               0.46-1.41      5.4                  0.32-0.96
E-SAM (30 ns)      70               1.02-1.51      0                    -
E-SAHC (10 ns)     74               0.44-1.70      5.1                  0.33-0.87
E-SAHC (35 ns)     70               0.66-1.64      0.5                  0.38-0.91
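The percentages in Table 4.2 follow from simple counting. The short sketch below assumes that the values 74 and 70 correspond to 17 and 16 of the 23 reference inhibitors; this is an inference from the rounded percentages, not a count stated in the table. The figure of 180 out of 1398 is the upper bound on screened non-inhibitors quoted earlier in the text. The function name `percent_screened` is hypothetical.

```python
def percent_screened(n_hits, n_total):
    """Percentage of a compound set retrieved by a pharmacophore model."""
    return 100.0 * n_hits / n_total


# Assumed inhibitor counts behind the rounded percentages in Table 4.2.
print(round(percent_screened(17, 23)))        # 74
print(round(percent_screened(16, 23)))        # 70
# Upper bound on non-inhibitors screened by any of the models.
print(round(percent_screened(180, 1398), 1))  # 12.9
```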

The models obtained from the snapshots E-SAHC-D at 15 and 20 ns have similar features and screen the same set of compounds. The reference compounds C1, C3 and C12 have their highest fitness scores with the model generated from the snapshot E-SAHC-D at 15 ns, and the number of common top scored compounds by pharmacophore screening and docking for this snapshot is 10. Similarly, the model generated from the snapshot E-SAHC-D at 20 ns matches the compounds C7, C9 and C11 with the highest fitness scores, and the number of common compounds is 13. The model generated from the snapshot E-SAM at 30 ns screens the compounds C2, C4 and C13 with the highest scores, and the snapshot produced the highest docking score for C23. The number of common top scored compounds by pharmacophore screening and docking for this snapshot is 11. The model generated from the snapshot E-SAHC at 10 ns has 12 common compounds among the top scored hits of pharmacophore screening and docking. The reference compounds C6 and C17 have their highest fitness scores with this model. The pharmacophore model from the snapshot E-SAHC at 35 ns has 13 common compounds among the top scored matches of both pharmacophore screening and docking. Figure 4.6 shows these five best models along with the active site residues associated with the pharmacophoric features.

Figure 4.7 depicts SAHC/SAM mapped to the five pharmacophore models and also the interactions of SAM/SAHC with the respective active site residues that give rise to the features.

Figure 4.8 shows the most active compound C1 mapped to the selected e-pharmacophore models, together with the interactions of the compound with the active site residues of CmaA1 obtained from docking. The reference compounds were found to interact with the associated active site residues in the same way as they are matched to the pharmacophore models. We propose the e-pharmacophore models obtained from the snapshots E-SAHC-D at 15 ns and 20 ns, E-SAM at 30 ns and E-SAHC at 10 ns and 35 ns as the best models for screening compound databases. Further studies on large scale VS based on the models obtained here are presented in the next chapter.

[Figure 4.6: five panels, A) to E), each showing an e-pharmacophore model with its features labelled by the associated active site residue (for example A2:Trp123, A5:Leu95, D14:Glu124, N18:Tyr33).]

Figure 4.6 Selected e-pharmacophore models with the active site residues associated with the pharmacophore features. A) E-SAHC-D (15 ns), B) E-SAHC-D (20 ns), C) E-SAM (30 ns), D) E-SAHC (10 ns) and E) E-SAHC (35 ns).

[Figure 4.7: five panels, A) E-SAHC-D (15 ns), B) E-SAHC-D (20 ns), C) E-SAM (30 ns), D) E-SAHC (10 ns) and E) E-SAHC (35 ns), each showing SAM/SAHC mapped to the pharmacophore model and the surrounding active site residues, labelled by one-letter code and residue number.]

Figure 4.7 Best 5 e-pharmacophore models mapped to SAM/SAHC and the interactions of SAM/SAHC with the active site residues which give rise to the pharmacophoric features.

[Figure 4.8: five panels, A) E-SAHC-D (15 ns), B) E-SAHC-D (20 ns), C) E-SAM (30 ns), D) E-SAHC (10 ns) and E) E-SAHC (35 ns), each showing compound C1 and the surrounding active site residues, labelled by one-letter code and residue number.]

Figure 4.8 The most active reference compound C1 mapped to the best 5 e-pharmacophore models, and the docked pose and interactions of the compound with the active site residues of the corresponding snapshots.

4.4 Conclusions

In this chapter, structure based e-pharmacophore models were generated from the snapshots obtained from the MD simulation trajectories of five model systems of mycobacterial CmaA1 representing various stages of cyclopropanation, and from the reported crystal structure of CmaA1. The performance of these pharmacophore models was validated by mapping 23 thiacetazone analogues showing CmaA1 inhibitory activities (MIC) in the 0.0125 to 12.5 μg/mL range. The e-pharmacophore models generated from the MD snapshots were able to screen up to 17 reference compounds. The models generated by considering the flexible nature of the protein screened the reference compounds more efficiently than the models generated from the static crystal structure, indicating the need to incorporate receptor flexibility in VS. We found that at least one model generated from the MD snapshots is able to screen up to 16 of the 23 reference compounds, supporting the theory of pre-existing receptor conformational ensembles. The models were further validated by comparing the hits screened by the e-pharmacophore models with those obtained by docking the compounds to the respective snapshots. Five best models have been proposed for VS, based on the number of common compounds screened by both the pharmacophore screening and docking methods. Further studies comparing the present results with ligand-based pharmacophore models, and VS approaches, are discussed in the next chapter.

References

1. Badrinarayan, P., & Sastry, G. N. (2012). Virtual screening filters for the design of type II p38 MAP kinase inhibitors: A fragment based library generation approach. J Mol Graph Model, 34, 89-100.
2. Badrinarayan, P., & Narahari Sastry, G. (2011). Virtual high throughput screening in new lead identification. Comb Chem High Thr Scr, 14(10), 840-860.
3. Badrinarayan, P., & Sastry, G. N. (2010). Sequence, structure, and active site analyses of p38 MAP kinase: exploiting DFG-out conformation as a strategy to design new type II leads. J Chem Inf Model, 51(1), 115-129.
4. Reddy, A. S., Pati, S. P., Kumar, P. P., Pradeep, H. N., & Sastry, G. N. (2007). Virtual screening in drug discovery-a computational perspective. Curr Protein Pept Sci, 8(4), 329-351.
5. Badrinarayan, P., & Sastry, G. N. (2013). Rational approaches towards lead optimization of kinase inhibitors: The issue of specificity. Curr Pharm Des, 19(26), 4714-4738.
6. Villar, H. O., Yan, J., & Hansen, M. R. (2004). Using NMR for ligand discovery and optimization. Curr Opin Chem Biol, 8(4), 387-391.
7. Blundell, T. L., & Patel, S. (2004). High-throughput X-ray crystallography for drug discovery. Curr Opin Pharmacol, 4(5), 490-496.
8. Amzel, L. M. (1998). Structure-based drug design. Curr Opin Biotechnol, 9(4), 366-369.
9. Jorgensen, W. L. (2004). The many roles of computation in drug discovery. Science, 303(5665), 1813-1818.
10. Jiang, F., & Kim, S. H. (1991). “Soft docking”: matching of molecular surface cubes. J Mol Biol, 219(1), 79-102.
11. Walls, P. H., & Sternberg, M. J. (1992). New algorithm to model protein-protein recognition based on surface complementarity: Applications to antibody-antigen docking. J Mol Biol, 228(1), 277-297.
12. Chan, H. S., & Dill, K. A. (1998). Protein folding in the landscape perspective: Chevron plots and non‐Arrhenius kinetics. Proteins: Struct Funct Bioinf, 30(1), 2-33.
13. Schnecke, V., Swanson, C. A., Getzoff, E. D., Tainer, J. A., & Kuhn, L. A. (1998). Screening a peptidyl database for potential ligands to proteins with side-chain flexibility. Proteins: Struct Funct Bioinf, 33, 74.
14. Leach, A. R. (1994). Ligand docking to proteins with discrete side-chain flexibility. J Mol Biol, 235, 345-356.
15. Murray, C. W., Baxter, C. A., & Frenkel, A. D. (1999). The sensitivity of the results of molecular docking to induced fit effects: application to thrombin, thermolysin and neuraminidase. J Comput Aided Mol Des, 13, 547-562.
16. Bouzida, D., Rejto, P. A., Arthurs, S., Colson, A. B., Freer, S. T., Gehlhaar, D. K., Larson, V., Luty, B. A., Rose, P. W., & Verkhivker, G. M. (1999). Computer simulations of ligand–protein binding with ensembles of protein conformations: A Monte Carlo study of HIV‐1 protease binding energy landscapes. Int J Quantum Chem, 72, 73-84.
17. Erickson, J. A., Jalaie, M., Robertson, D. H., Lewis, R. A., & Vieth, M. (2004). Lessons in molecular recognition: The effects of ligand and protein flexibility on molecular docking accuracy. J Med Chem, 47, 45-55.
18. Cavasotto, C. N., & Abagyan, R. (2004). Protein flexibility in ligand docking and virtual screening to protein kinases. J Mol Biol, 337, 209-225.

19. Schapira, M., Abagyan, R., & Totrov, M. (2003). Nuclear hormone receptor targeted virtual screening. J Med Chem, 46, 3045-3059.
20. Daeyaert, F., de Jonge, M., Heeres, J., Koymans, L., Lewi, P., Vinkers, M. H., & Janssen, P. A. (2004). A pharmacophore docking algorithm and its application to the cross‐docking of 18 HIV‐NNRTI's in their binding pockets. Proteins: Struct Funct Bioinf, 54, 526-533.
21. Case, D. A. (1994). Normal mode analysis of protein dynamics. Curr Opin Struct Biol, 4(2), 285-290.
22. Hayward, S., & Go, N. (1995). Collective variable description of native protein dynamics. Annu Rev Phys Chem, 46(1), 223-250.
23. Ferrari, A. M., Wei, B. Q., Costantino, L., & Shoichet, B. K. (2004). Soft docking and multiple receptor conformations in virtual screening. J Med Chem, 47(21), 5076-5084.
24. Totrov, M., & Abagyan, R. (2008). Flexible ligand docking to multiple receptor conformations: a practical alternative. Curr Opin Struct Biol, 18(2), 178-184.
25. Carlson, H. A. (2002). Protein flexibility is an important component of structure-based drug discovery. Curr Pharm Des, 8(17), 1571-1578.
26. Carlson, H. A., Masukawa, K. M., Rubins, K., Bushman, F. D., Jorgensen, W. L., Lins, R. D., McCammon, J. A. (2000). Developing a dynamic pharmacophore model for HIV-1 integrase. J Med Chem, 43(11), 2100-2114.
27. Carlson, H. A., Masukawa, K. M., & McCammon, J. A. (1999). Method for including the dynamic fluctuations of a protein in computer-aided drug design. J Phys Chem A, 103(49), 10213-10219.
28. Moitessier, N., Henry, C., Maigret, B., & Chapleur, Y. (2004). Combining pharmacophore search, automated docking, and molecular dynamics simulations as a novel strategy for flexible docking. Proof of concept: Docking of arginine-glycine-aspartic acid-like compounds into the αvβ3 binding site. J Med Chem, 47(17), 4178-4187.
29. Meagher, K. L., & Carlson, H. A. (2004). Incorporating protein flexibility in structure-based drug discovery: using HIV-1 protease as a test case. J Am Chem Soc, 126(41), 13276-13281.
30. Meagher, K. L., & Carlson, H. A. (2005). Solvation influences flap collapse in HIV‐1 protease. Proteins: Struct Funct Bioinf, 58(1), 119-125.
31. Damm, K. L., & Carlson, H. A. (2007). Exploring experimental sources of multiple protein conformations in structure-based drug design. J Am Chem Soc, 129(26), 8225-8235.
32. Lerner, M. G., Bowman, A. L., & Carlson, H. A. (2007). Incorporating dynamics in E. coli dihydrofolate reductase enhances structure-based drug discovery. J Chem Inf Model, 47(6), 2358-2365.
33. Meagher, K. L., Lerner, M. G., & Carlson, H. A. (2006). Refining the multiple protein structure pharmacophore method: consistency across three independent HIV-1 protease models. J Med Chem, 49(12), 3478-3484.
34. Lexa, K. W., & Carlson, H. A. (2012). Protein flexibility in docking and surface mapping. Q Rev Biophys, 45(03), 301-343.
35. Alahari, A., Trivelli, X., Guérardel, Y., Dover, L. G., Besra, G. S., Sacchettini, J. C., Kremer, L. (2007). Thiacetazone, an antitubercular drug that inhibits cyclopropanation of cell wall mycolic acids in mycobacteria. PLoS One, 2(12), e1343.

36. Choudhury, C., Priyakumar, U. D., & Sastry, G. N. (2014). Molecular dynamics investigation of the active site dynamics of mycobacterial cyclopropane synthase during various stages of the cyclopropanation process. J Struct Biol, 187(1), 38-48.
37. Huang, C. C., Smith, C. V., Glickman, M. S., Jacobs, W. R., & Sacchettini, J. C. (2002). Crystal structures of mycolic acid cyclopropane synthases from Mycobacterium tuberculosis. J Biol Chem, 277(13), 11559-11569.
38. Brooks, B. R., Brooks, C. L., MacKerell, A. D., Nilsson, L., Petrella, R. J., Roux, B., Karplus, M. (2009). CHARMM: the biomolecular simulation program. J Comput Chem, 30(10), 1545-1614.
39. Jo, S., Kim, T., Iyer, V. G., & Im, W. (2008). CHARMM‐GUI: a web‐based graphical user interface for CHARMM. J Comput Chem, 29(11), 1859-1865.
40. Vanommeslaeghe, K., Raman, E. P., & MacKerell Jr, A. D. (2012). Automation of the CHARMM General Force Field (CGenFF) II: assignment of bonded parameters and partial atomic charges. J Chem Inf Model, 52(12), 3155-3168.
41. Friesner, R. A., Murphy, R. B., Repasky, M. P., Frye, L. L., Greenwood, J. R., Halgren, T. A., Mainz, D. T. (2006). Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. J Med Chem, 49(21), 6177-6196.
42. Salam, N. K., Nuti, R., & Sherman, W. (2009). Novel method for generating structure-based pharmacophores using energetic analysis. J Chem Inf Model, 49(10), 2356-2368.
43. LigPrep, version 2.5, Schrödinger, LLC, New York, NY, 2012.
44. Dixon, S. L., Smondyrev, A. M., Knoll, E. H., Rao, S. N., Shaw, D. E., & Friesner, R. A. (2006). PHASE: a new engine for pharmacophore perception, 3D QSAR model development, and 3D database screening: 1. Methodology and preliminary results. J Comput Aided Mol Des, 20(10-11), 647-671.
45. Ma, B., Shatsky, M., Wolfson, H. J., & Nussinov, R. (2002). Multiple diverse ligands binding at a single protein site: A matter of pre‐existing populations. Protein Sci, 11(2), 184-197.
46. Van Regenmortel, M. H. (1999). Molecular recognition in the post‐reductionist era. J Mol Recognit, 12(1), 1-2.
47. Salam, N. K., Nuti, R., & Sherman, W. (2009). Novel method for generating structure-based pharmacophores using energetic analysis. J Chem Inf Model, 49(10), 2356-2368.
48. Bosshard, H. R. (2001). Molecular recognition by induced fit: how fit is the concept? Physiology, 16(4), 171-173.
49. Boehr, D. D., Nussinov, R., & Wright, P. E. (2009). The role of dynamic conformational ensembles in biomolecular recognition. Nat Chem Biol, 5(11), 789-796.

Chapter 5 Dynamic Ligand Based Pharmacophore Modeling and Virtual Screening to Identify Mycobacterial Cyclopropane Synthase Inhibitors

A drug is a substance which, if injected into a rabbit, produces a paper.
Otto Loewi, 'Some Reminiscences of My Life as a Scientist', Int J Quant Biol Symposium (1976), 3, 7.

5.1 Background

WHO surveys report that in 2014 there were 8.8 million (range, 8.5-9.2 million) incident cases of TB, 1.1 million (range, 0.9-1.2 million) deaths from TB among HIV-negative people and an additional 0.35 million (range, 0.32-0.39 million) deaths from HIV-associated TB [1]. There are also several issues of drug-drug interactions in treating M. Tb and HIV simultaneously [2].

Hence, molecules that can inhibit multiple M. Tb targets, as well as act simultaneously on both M. Tb and HIV, are required to address therapeutic challenges such as drug resistance and coexistence with HIV. ‘Drug repositioning/repurposing’ is one of the recent approaches for faster drug discovery, in which new indications are found for existing approved drugs [3-6]. Drug repositioning saves the time and expense of early tests and thereby speeds up the drug discovery process. The literature reveals many examples of FDA approved drugs that showed activity against other diseases, suggesting new indications, identified using a wide range of approaches [6-10]. Approved drugs are already validated for their pharmacokinetic properties and toxicological profiles; hence the use of existing drugs for new indications can potentially reduce the expense of preliminary testing of the hit compounds. A few recent studies have reported attempts to identify existing drugs acting against M. Tb. These include the application of chemical systems biology by Kinnings et al., in which entacapone and tolcapone, catechol-O-methyltransferase inhibitors used for Parkinson’s disease, were found to show inhibitory activity against the M. Tb target InhA [11]. Similarly, upon screening against replicating and non-replicating M. Tb, nitazoxanide, which was earlier used for infections caused by Giardia and Cryptosporidium species, was found to be active on multiple potential M. Tb targets [12]. The anthelmintic drug pyrvinium pamoate was found to act against M. Tb as well as Cryptosporidium parvum and Trypanosoma brucei in the nanomolar range [13-15].

Polypharmacology is another new concept which has gained popularity in the last few years. Polypharmacological approaches, which aim to target multiple proteins in an organism, are among the effective strategies to combat drug resistance in M. Tb [16-18]. If a drug acts on many targets belonging to different functional classes, the chance of drug resistance by mutation will be significantly reduced. Sarah L. Kinnings et al. report a computational study that integrates structural bioinformatics and systems biology to construct a proteome-wide drug-target network of M. Tb called the ‘TB-drugome’ [19]. Analysis of this TB-drugome reveals that approximately one-third of the existing drugs can potentially be repositioned to inhibit M. Tb. It also indicates the possible existence of many unexploited druggable targets of M. Tb. Thomas Evangelidis and Lei Xie have developed a ‘Proteome-wide Off-target Pipeline’ integrating analysis of binding sites, docking and scoring, and calculation of electrostatic potential, and hypothesized the molecular mechanism of Nelfinavir as an anti-cancer agent [20]. Apart from addressing drug resistance, it is also essential to identify dual inhibitors of M. Tb and HIV, as the incidence of TB in HIV infected patients is a challenge to be considered seriously [1, 2]. One example of such an approach is the work of Lauren Blake and Mahmoud E. S. Soliman, who developed a hybrid pharmacophore and structure-based approach to design five scaffolds as potential anti-HIV/TB dual inhibitors. They employed binding mode analysis, molecular dynamics (MD) simulations and binding free energy calculations, and proposed that the designed compounds exhibit better binding affinities towards the HIV protease and BlaC enzymes than the original drugs darunavir and meropenem, respectively [21].

Along with effective strategies for anti-TB drug discovery that consider resistance and coincidence with HIV, the reliability of the techniques used in the process plays a crucial role. In recent years, technological advances in in silico methodologies such as structure and ligand based pharmacophore screening [22-25], docking [26] and 2D and 3D QSAR modeling [27] have offered medicinal chemists fast and cost-effective alternatives to traditional HTS for screening drug libraries against therapeutic targets. At the same time, all these methodologies have their own limitations, as they are based on many severe approximations. For example, most docking methods calculate interaction energies and free energies of binding by empirical methods, and they do not accurately incorporate receptor flexibility. Hence, it is more reliable to use multiple approaches while designing a VS protocol [28-32]. Consideration of receptor flexibility using MD simulations in such VS methods adds great value to the accuracy of inhibitor prediction [33]. Pharmacokinetic properties such as the ADMET profiles of the molecules are very important aspects to be considered while screening compounds, in order to avoid costly failures in the later stages [34, 35]. Molecules with poor ADMET properties may increase development costs and pose risks to patients; for example, drugs with lower absorption capacities have to be administered at a higher dosage. In silico prediction of these properties can rapidly analyze a set of molecules prior to synthesis and help prioritize the molecules to be investigated further experimentally [36, 37]. CmaA1 is an important target for anti-TB drug discovery. The previous chapter reported the investigation of its active site dynamics and the generation of dynamic structure based pharmacophore models of CmaA1 [38, 39].

The current chapter presents the development and validation of dynamic ligand based pharmacophore models, followed by VS. A systematic approach employing dynamic ligand and structure based pharmacophore screening, docking and ADMET filters has been used to identify existing drugs in DrugBank [40], and highly active anti-HIV and anti-M. Tb molecules reported in the ChEMBL database [41], that would potentially inhibit CmaA1.

5.2 Methodology

5.2.1 Generation and validation of ligand based pharmacophore models

Forty snapshots were collected at 5 ns intervals from the MD trajectories of the five model systems of CmaA1 reported in our earlier study [39]. The cofactors were extracted from each of these snapshots, and an averaged structure of the extracted cofactors was generated for each trajectory using the “General tools” module of the Schrödinger Suite. All the cofactors extracted from one trajectory were first superimposed on the first selected entry, and the averaged structure was then calculated using the uniform weighting method. Thus, we obtained five averaged structures of the cofactors from the five model systems. These five averaged structures, along with the cofactor extracted from the static crystal structure, were used to generate the ligand based pharmacophore models using the Phase module of the Schrödinger program [42]. Each model consisted of the six types of features. The number of features present in each model varied from 8 to 11. A set of 23 CmaA1 inhibitors with reported MIC values ranging from 0.0125 to 12.5 μg/mL [43] was used to verify the ability of the ligand based pharmacophore models to screen active CmaA1 inhibitors. All these compounds were energy minimized using the default parameters of the LigPrep module of the Schrödinger Suite [44]. A maximum of 10 conformations per molecule were generated during the matching. The matching criterion was that a compound must match at least 4 features of a model. A set of 1398 M. Tb inactive compounds reported in the ChEMBL database was selected, with molecular weights in the range of 180-400 and containing 12 to 27 heavy atoms (similar to the 23 active compounds, SAM and SAHC). These 1398 compounds were then screened against all e-Pharmacophore models using the same criteria to check whether these models screen any inactive compounds.
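The averaging step described above can be illustrated with a minimal sketch. It assumes that the cofactor conformers have already been superimposed on a common reference and share the same atom ordering; the superposition itself is not shown. The function name `average_structure` and the toy coordinates are hypothetical, and this is not the implementation used in the Schrödinger Suite.

```python
def average_structure(conformers, weights=None):
    """Weighted mean coordinates of pre-superimposed conformers.

    conformers: list of conformers, each a list of (x, y, z) tuples
                with identical atom ordering.
    weights:    one weight per conformer; uniform weights if omitted.
    """
    n_conf = len(conformers)
    if weights is None:
        # Uniform weighting: every snapshot contributes equally.
        weights = [1.0 / n_conf] * n_conf
    n_atoms = len(conformers[0])
    averaged = []
    for i in range(n_atoms):
        x = sum(w * conf[i][0] for w, conf in zip(weights, conformers))
        y = sum(w * conf[i][1] for w, conf in zip(weights, conformers))
        z = sum(w * conf[i][2] for w, conf in zip(weights, conformers))
        averaged.append((x, y, z))
    return averaged


# Two toy two-atom conformers (illustration only).
conf_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conf_b = [(2.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(average_structure([conf_a, conf_b]))
```

Averaged coordinates of this kind can distort bond lengths when the conformers differ strongly, which is one reason the averaged structure is best read as a summary of the sampled geometries rather than a physical conformation.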

5.2.2 Virtual Screening

5.2.2.1 Preparation of dataset

Three sets of compounds were used for screening against CmaA1. The first consists of all the 6583 drugs (approved/illicit/withdrawn/nutraceuticals) reported in DrugBank. The second set comprises 701 unique compounds reported in the ChEMBL database as showing activity below 1 μM on M. Tb cell lines, and the third set comprises 11,089 compounds collected from the ChEMBL database that show activity below 1 μM on various HIV cell lines. All these compounds were energy minimized using the default parameters of the LigPrep module of the Schrödinger Suite. The individual datasets listed above are referred to as ‘DrugBank’, ‘ChEMBL-Mtb’ and ‘ChEMBL-HIV’, respectively.

5.2.2.2 Screening

The three datasets were screened individually to identify potential CmaA1 inhibitors from each set. Four different levels of filters were used for screening each of these datasets. The ligand based pharmacophore models were chosen as the first level filters for VS as they have a larger number of features and hence can screen diverse compounds. The ‘Advanced Pharmacophore Screening’ option of the Phase module of the Schrodinger suite was used with an option to generate five conformations per rotatable bond, and the maximum number of conformations per compound was set to 100. Rapid sampling was used for screening, and the default option of skipping structures with > 15 rotatable bonds was retained. The minimum number of sites a molecule must match was set to six for all the ligand based models. All the compounds passing the first level filter, i.e., matching six features of any one of the five models, were subjected to the second level filter. Five structure based pharmacophore models that were generated and validated in our earlier study were used as the second level of screening as they can screen target specific compounds [39]. All the screening criteria were the same as for the first filter, but here thorough sampling was used and the minimum number of pharmacophore sites a molecule must match was set to 4. All the compounds that matched four features of any of the five structure based pharmacophore models were docked to the active sites of the protein structures corresponding to the selected structure based pharmacophore models. Compounds having reactive functional groups were removed before the Glide docking [45]. Glide energy grids were generated for each snapshot to define the active site as a cubic box of 12 × 12 × 12 Å³ around the cofactors. Docking was performed in 2 sub-steps, i.e., standard precision (SP) docking followed by extra precision (XP) docking [46]. The top 50% of compound poses ranked by Glide SP score were subjected to Glide XP docking, and the top 50% of compounds ranked by XP score were retained as top hits. The compounds common to the top 25% of all five docking results were selected for the ADMET/drug-likeness property calculation with the QikProp module of Schrodinger [47] to obtain the final hits. Further, the non-covalent interactions made by the final hits obtained from the VS with the CBS residues were examined to see whether these compounds can be potential competitive inhibitors of CmaA1.
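The rank-based selection used in the docking stage (keeping a top fraction per run, then retaining only compounds common to every run) can be sketched as below. This is an illustrative sketch and not the Glide workflow itself: the score dictionaries and compound names are invented, and the code assumes that a more negative docking score is better.

```python
def top_fraction(scores, fraction):
    """Keep the best-scoring fraction of compounds (more negative score = better)."""
    ranked = sorted(scores, key=scores.get)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]

def consensus_hits(runs, fraction=0.25):
    """Compounds that appear in the top fraction of every docking run."""
    tops = [set(top_fraction(scores, fraction)) for scores in runs]
    return set.intersection(*tops)

# Invented docking scores for two hypothetical snapshots.
run_a = {"c1": -9.1, "c2": -8.7, "c3": -7.9, "c4": -6.0}
run_b = {"c1": -8.8, "c2": -6.5, "c3": -9.3, "c4": -5.1}

print(consensus_hits([run_a, run_b], fraction=0.5))
```

The same `top_fraction` helper could serve for the SP to XP hand-over, by passing the SP scores with `fraction=0.5` and docking only the returned compounds in the next step.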

5.3 Results and Discussion

SAM, the native cofactor of CmaA1, transfers a methyl group to the substrate double bond during the cyclopropanation process and is converted to SAHC after the methyl transfer. MD simulations on the model systems of CmaA1 in our previous study [38] revealed large conformational changes in the cofactors that facilitate the cyclopropanation reaction.

Along with the conformation, the pattern and types of interactions of these cofactors with the active site residues also showed wide diversity. Figure 5.1 shows the superposition and mutual root mean squared deviations (RMSD) of the 40 MD snapshots of the cofactors. The predominance of the red color in this figure clearly shows the large difference in the geometries of the cofactors in different snapshots, especially when they are from different model systems. Hence, including the flexibility of the native cofactors within the active site of CmaA1 at different stages of cyclopropanation should be useful when screening for CmaA1 inhibitors. Representative structures of the cofactors from each trajectory were generated, and a ligand based pharmacophore model was developed from each of them.

[Figure 5.1: superposed SAM/SAHC snapshots and the 40 × 40 matrix of mutual RMSD values, colour coded in the bins 0.1 to 1, 1 to 2, 2 to 3, 3 to 4 and above 4. Snapshots are grouped by model system: E-SAHC-D, E-SAM, E-SAM-S, E-SAHC-P and E-SAHC.]

Figure 5.1 Superposition and mutual root mean squared deviations (RMSD) of the 40 MD snapshots of the cofactors (SAM/SAHC).
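A mutual RMSD matrix of the kind summarised in Figure 5.1 can be computed as in the following sketch. It is a minimal illustration under stated assumptions: the conformations are taken to be already superposed in a common frame (no fitting is performed here), atoms are listed in the same order in every snapshot, and the coordinates are invented.

```python
import math

def rmsd(a, b):
    """RMSD between two conformations given as equal-length lists of (x, y, z)."""
    if len(a) != len(b):
        raise ValueError("conformations must have the same number of atoms")
    total = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(total / len(a))

def rmsd_matrix(conformations):
    """Symmetric matrix of mutual RMSDs with zeros on the diagonal."""
    n = len(conformations)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(conformations[i], conformations[j])
    return matrix

# Three invented two-atom "conformations", assumed to be already superposed.
confs = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
for row in rmsd_matrix(confs):
    print(" ".join(f"{v:.2f}" for v in row))
```

Only the upper triangle is computed and then mirrored, since the RMSD between snapshots i and j equals that between j and i.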

5.3.1 Details and validation of ligand based pharmacophore models

The ligand based pharmacophore models consist of six different chemical features, viz., hydrogen bond (H-bond) acceptor (A), H-bond donor (D), hydrophobic sites (H), negative ionic sites (N), positive ionic sites (P) and aromatic rings (R). D features were represented as projected points, located at the corresponding A positions in the binding site. Projected points allow structurally dissimilar active compounds to form H-bonds to the same location, regardless of their point of origin and directionality. All the ligand based models consisted of essentially similar features, except for an extra P feature for SAM and an extra H feature for SAHC. However, the inter-feature distances, spacing and orientations vary considerably with the conformation of these cofactors in the binding pocket. The adenine part consists of 2 A features and 1 D feature, and the sugar part contains 2 D and 2 A features situated on the two –OH groups. The methionine part has a P feature in the ligand based models of E-SAM and E-SAM-S, while the corresponding homocysteine parts in E-SAHC, E-SAHC-D and E-SAHC-P have H features. The terminal parts of all the ligand based models contain the P and N features. The abilities of the generated models were verified by screening a set of 23 reference compounds showing CmaA1 inhibitory activities in the μg/mL range. These are the anti-tubercular drug thiacetazone and its clinical analogues, which have been shown to cause significant loss of cyclopropanation in various mycobacterial strains. The reversal of their effect on cyclopropanation upon overexpression of the cyclopropane synthase enzymes has provided evidence of direct binding of these compounds to the cyclopropane synthase enzymes [43]. The ligand based models were able to screen up to 22 of the 23 active compounds. This was expected, as these models contain a larger number of features (8-11) and could therefore screen more compounds when the matching criterion required a minimum of 4 features to match a molecule. The ligand based model generated from the co-crystallized SAHC in the static structure could screen only 4 active compounds, in spite of having the same number and types of features as the ligand based models generated from the MD trajectories. Hence, the 5 ligand based pharmacophore models generated from the MD trajectories were used as the first level filters in the VS process. Figure 5.2 shows the five ligand based models with the most active reference compound C1 mapped to them.

[Figure 5.2 panels: ligand based pharmacophore models E-SAHC-D, E-SAM, E-SAM-S, E-SAHC-P and E-SAHC]

Figure 5.2 The most active reference compound C1 mapped with all the selected ligand based pharmacophore models.

When the 1398 non-inhibitors of M. Tb were screened using the 5 ligand based models, the models were found to screen a much smaller percentage of non-inhibitors than of inhibitors (Table 5.1). Table 5.1 also shows that the fitness scores of the non-inhibitors are lower than those of the inhibitors, demonstrating the ability of the models to discriminate inhibitors from non-inhibitors. Hence, the 5 ligand based pharmacophore models generated from the MD trajectories were used as the first level filters in the VS process.

Table 5.1 Comparison of number of inhibitors and non-inhibitors screened by the ligand based pharmacophore models and ranges of their fitness scores.

Model      % of inhibitors   Fitness score   % of non-inhibitors   Fitness score
           screened          range           screened              range
E-SAHC-D   65                0.68-1.15       10.2                  0.25-1.04
E-SAM      96                1.65-1.90       8.1                   0.26-1.29
E-SAM-S    83                0.57-1.38       0.14                  0.06-1.09
E-SAHC-P   91                0.62-1.45       10.8                  0.24-1.06
E-SAHC     91                0.46-1.69       12.4                  0.20-1.05
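The discrimination reported in Table 5.1 can be condensed into a single number per model, for example the ratio of the two hit rates. The ratio below is an illustrative metric introduced only for this sketch and is not one used in the study; the percentages are transcribed from Table 5.1.

```python
# Percentages transcribed from Table 5.1: (inhibitors screened, non-inhibitors screened).
table_5_1 = {
    "E-SAHC-D": (65, 10.2),
    "E-SAM": (96, 8.1),
    "E-SAM-S": (83, 0.14),
    "E-SAHC-P": (91, 10.8),
    "E-SAHC": (91, 12.4),
}

def selectivity_ratio(pct_inhibitors, pct_non_inhibitors):
    """Ratio of the two hit rates; a larger value means better discrimination."""
    return pct_inhibitors / pct_non_inhibitors

ratios = {model: selectivity_ratio(*pcts) for model, pcts in table_5_1.items()}
for model, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:9s} {ratio:7.1f}")
```

Every model retrieves inhibitors at a rate several times higher than non-inhibitors, which is the behaviour the table is meant to demonstrate.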

5.3.2 Virtual Screening

5.3.2.1 Choice of the dataset

Given the immediate need for new therapeutic agents to treat tuberculosis, we were interested in screening existing drugs against CmaA1, inspired by the concept of ‘drug repurposing/drug repositioning’, wherein new indications are sought for existing drugs. This is also a less costly and less time consuming process because pharmacokinetic, toxicology and safety data are already available. Drug repurposing has been proposed to be one of the primary strategies in drug discovery [48]. Hence, the first dataset (DrugBank) was chosen to screen all the existing drugs against CmaA1 and explore the possibility of repurposing them. The second set (ChEMBL-MTb) was chosen because its molecules are experimentally proven inhibitors of one M. Tb target. Compounds that inhibit multiple targets can more effectively combat drug resistance arising from mutation [11]. We were therefore interested in identifying compounds that inhibit CmaA1 along with other targets of M. Tb. The third dataset (ChEMBL-HIV) was chosen in view of the urgent need for new drugs that can inhibit both M. Tb and HIV simultaneously.

5.3.2.2 Design of the VS protocol

[Figure 5.3 flowchart, with the number of compounds at each step:
18239 input compounds: DrugBank (6429 approved, illicit, withdrawn and experimental drugs and nutraceuticals), ChEMBL (701 compounds having anti-M. Tb activities < 1 μM) and ChEMBL (11109 compounds having anti-HIV activities < 1 μM)
Screening with ligand based pharmacophore models (minimum number of matching features = 6): 4146 compounds/drugs screened by all the 5 ligand based models
Screening with selected structure based pharmacophore models (minimum number of matching features = 4): 948 compounds/drugs screened by all the 5 structure based models
Docking with the respective snapshots: 55 common compounds/drugs screened by all the 5 dockings
ADMET filters: 12 hits]

Figure 5.3 Schematic representation of the step by step VS process.

Once the datasets to be screened were chosen, the VS protocol was carefully designed. The literature reports several methods for screening compounds against potential targets. Although pharmacophore screening is a fast and cost effective technique, it has its own limitations, which have been discussed elsewhere [49]. Similarly, docking based screening techniques use severe approximations to estimate the binding affinity of ligands for a particular protein, and usually do not consider receptor flexibility in order to save time and computational cost. Hence, it is wise to design VS protocols that employ several techniques, so that their strengths can be exploited and their limitations compensated by one another. In our work, we therefore employed ligand and structure based pharmacophore models incorporating the dynamics of the receptor, docking with multiple receptor conformations, and ADMET filters for screening the datasets of interest. Figure 5.3 shows the step by step VS process used in our study.
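The attrition along the four filters can be made explicit with a few lines of arithmetic. The sketch below uses the compound counts given in Figure 5.3; the stage labels and the helper name `retention` are chosen here for illustration only.

```python
# Compound counts at each stage, as given in Figure 5.3.
funnel = [
    ("input datasets", 18239),
    ("ligand based pharmacophores", 4146),
    ("structure based pharmacophores", 948),
    ("consensus docking", 55),
    ("ADMET filter", 12),
]

def retention(stages):
    """Fraction of compounds kept at each stage relative to the previous one."""
    return [
        (name, count, count / prev_count)
        for (_, prev_count), (name, count) in zip(stages, stages[1:])
    ]

for name, count, kept in retention(funnel):
    print(f"{name:32s} {count:6d}  {kept:6.1%} of previous stage")
print(f"overall: {funnel[-1][1] / funnel[0][1]:.3%}")
```

Each pharmacophore stage keeps roughly a quarter of its input, while the consensus docking step is the most stringent filter in the workflow.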

5.3.2.3 First level filter: Dynamic ligand based pharmacophore screening

The first level filter used in our study was the set of five ligand based pharmacophore models. These models are based on the diverse conformations of the cofactors in the binding site of CmaA1 at various stages of the enzyme catalysis. They have 10-11 features and hence can screen structurally diverse compounds satisfying the ligand based requirements of a CmaA1 binder. The datasets of interest were screened initially by the five ligand based models, and the unique set of compounds obtained from all five models was selected for the next level of screening. This first level screening returned 1 439, 166 and 2 542 compounds from the DrugBank, ChEMBL-MTb and ChEMBL-HIV datasets respectively, giving a total of 4 146 compounds.
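The pass rule of this filter (a compound is retained if it matches the minimum number of sites in any one of the models, and the per-model hit lists are merged into a unique set) can be sketched as follows. The 3D feature matching itself is performed by the pharmacophore software and is not reproduced here; the sketch assumes the number of matched sites per compound and model is already available, and the match counts are invented.

```python
def passes_model(matched_sites, min_sites=6):
    """A compound passes a model if it matches at least min_sites features."""
    return matched_sites >= min_sites

def first_level_hits(match_table, min_sites=6):
    """Unique compounds that pass at least one of the models.

    match_table maps model name -> {compound id: number of matched sites}.
    """
    hits = set()
    for matches in match_table.values():
        hits.update(c for c, n in matches.items() if passes_model(n, min_sites))
    return hits

# Invented match counts for two of the five models.
match_table = {
    "E-SAM": {"c1": 7, "c2": 5, "c3": 6},
    "E-SAHC": {"c1": 6, "c2": 4, "c4": 9},
}
print(sorted(first_level_hits(match_table)))
```

Calling the same function with `min_sites=4` would correspond to the looser matching criterion used with the structure based models.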

5.3.2.4 Second level filter: Dynamic structure based pharmacophore screening

The next level filter applied was the set of five structure based models selected in the previous chapter [39]. These models are based on the interactions of the cofactors with the active site residues of CmaA1 and are hence more specific for the target. In the selected structure based models, the D features were formed near the residues G72, T94 and Q99, which make H-bonds with the sugar moiety of SAM/SAHC, and near E124 and Y16, which make H-bonds with the adenine moiety of SAM/SAHC. In most cases the A features were found near L95, Q99 and G74, which make H-bonds with the sugar moiety of the cofactors, and near W123, which makes H-bonds with the adenine part of the cofactors. The P and N features are found mostly near the residues I136 and Y33, S34 respectively, which show electrostatic interactions with the polar terminal part of the cofactors. The reference compounds were found to interact with the associated active site residues in the same way as they are matched to the pharmacophore models. Thorough sampling was used at this stage of screening, and a unique set of 948 compounds in total, comprising 532, 33 and 383 compounds from the DrugBank, ChEMBL-MTb and ChEMBL-HIV datasets respectively, was obtained at the second level.

5.3.2.5 Third level filter: Docking

The compounds screened at the previous level were subjected to the third level of screening, i.e., docking. All the 948 compounds were docked to the active sites of the parent snapshots of the selected structure based models, i.e., the snapshots from the MD trajectories of the model systems E-SAHC-D at 15 ns and 20 ns, E-SAM at 30 ns and E-SAHC at 10 ns and 35 ns. These snapshots were shown to screen optimum numbers of CmaA1 active compounds [39]. The top scoring compounds of each docking run with each snapshot were analyzed. The compounds present in the top 25% for all five snapshots were passed to the next level of screening. The screened compounds are thus expected to bind to many optimal conformations of the flexible active site of CmaA1. A total of 55 compounds, viz., 30 from DrugBank, 8 from ChEMBL-MTb and 17 from ChEMBL-HIV, were obtained as the top scoring hits common to all five snapshots.

5.3.2.6 Fourth level filter: ADMET properties

These compounds were then subjected to calculation of ADMET properties with the QikProp module of Schrodinger, which predicts many physically significant and pharmacologically relevant properties to estimate the drug-likeness of a given molecule. Certain properties of a particular molecule can be compared with the ranges observed for 95% of known drugs. QikProp can also identify the presence of 30 types of reactive functional groups that may cause false positives in VS. The important properties that are calculated and can be compared with the ranges of known drugs are MW, dipole, IP, EA, SASA, FOSA, FISA, PISA, WPSA, PSA, volume, #rotor, donorHB, accptHB, glob, QPpolrz, QPlogPC16, QPlogPoct, QPlogPw, QPlogPo/w, logS, QPLogKhsa, QPlogBB, #metabol etc. All these descriptors are described in reference 47. The screened compounds were prioritized based on the #stars value calculated by QikProp, which is the number of property or descriptor values that fall outside the range covered by 95% of known drugs. Hence, a smaller #stars value suggests that a molecule is more drug-like than molecules with larger #stars values. All the compounds that passed the previous filters were screened to retain those with a #stars value of 0. A total of 12 compounds passed this filter: 3 DrugBank compounds, 3 ChEMBL-MTb compounds and 6 ChEMBL-HIV compounds. Scheme 5.1 shows the structures of all the selected compounds from the three datasets.
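The idea behind the #stars descriptor, i.e., counting how many properties of a molecule fall outside a reference range, can be sketched as below. This is not the ADMET software's implementation: only four of the listed descriptors are included, the ranges are illustrative placeholders (the actual 95% ranges are those documented in reference 47), and the property values are invented.

```python
# Illustrative ranges only; the actual 95% ranges are those documented in reference 47.
RANGES = {
    "MW": (130.0, 725.0),
    "donorHB": (0.0, 6.0),
    "accptHB": (2.0, 20.0),
    "QPlogPo/w": (-2.0, 6.5),
}

def count_stars(properties, ranges=RANGES):
    """Number of listed properties that fall outside their allowed range."""
    stars = 0
    for name, (low, high) in ranges.items():
        value = properties.get(name)
        if value is not None and not (low <= value <= high):
            stars += 1
    return stars

# Invented property values for two hypothetical compounds.
compounds = {
    "cpd-1": {"MW": 304.3, "donorHB": 4.0, "accptHB": 7.0, "QPlogPo/w": 0.9},
    "cpd-2": {"MW": 812.0, "donorHB": 9.0, "accptHB": 12.0, "QPlogPo/w": 3.1},
}
zero_star = [cid for cid, props in compounds.items() if count_stars(props) == 0]
print(zero_star)
```

Keeping only the compounds for which the count is zero mirrors the selection criterion applied at this level.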

5.3.3 Interactions of the screened compounds with the active site residues of CmaA1

Figures 5.4A-5.4C show the interactions of these 12 selected compounds with the CmaA1 snapshot with which they have the highest docking score. Among the three selected ChEMBL-MTb compounds, the first two, i.e., CHEMBL460104 and CHEMBL462376, have been synthesized, and their biological activities against the beta-ketoacyl-ACP-synthase III (FabH) enzymes of M. Tb, Plasmodium falciparum and Escherichia coli have also been evaluated by Alhamadsheh et al. [50].

[Scheme 5.1 structures: DB02224, DB02375, DB03800, CHEMBL460104, CHEMBL462376, CHEMBL512633, CHEMBL462018, CHEMBL37869, CHEMBL67076, CHEMBL209958, CHEMBL340775 and CHEMBL1173780]

Scheme 5.1 Structures of the compounds selected from the three datasets.

These two compounds show very low IC50 values of 3.69 and 31.4 nM against M. Tb FabH. Our docking studies reveal that the aromatic rings of both these compounds make π-π interactions with the F142 residue and that both compounds make H-bonds with the residues T94 and L95. CHEMBL460104 makes an additional π-π interaction with the H141 residue. The third ChEMBL-MTb compound screened in our study, CHEMBL512633, was synthesized and biologically evaluated by Guzel et al. This compound showed Ki values of 7.2 and 7.5 nM against the M. Tb recombinant carbonic anhydrases Rv3273 and Rv1284 respectively, and showed selectivity for the M. Tb targets over their human homologues [51]. This compound makes H-bonds with the E124 and H141 residues, and its aromatic system makes π-π interactions with the F142 residue. The present study identifies these compounds as potential M. Tb CmaA1 inhibitors as well, thereby exploring new indications for them.

[Figure 5.4 A panels: interaction diagrams of CHEMBL460104, CHEMBL462376 and CHEMBL512633 with the surrounding CmaA1 active site residues]

Figure 5.4 A Interaction of the screened ChEMBL-MTb compounds with CmaA1. Among the five docking poses with the 5 selected snapshots, the complexes with the highest docking scores are shown.

One of the three selected DrugBank compounds is (2s, 3s)-trans-dihydroquercetin (DB02224) which is an experimental drug belonging to the class flavonoids with no information available about the target in DrugBank.

[Figure 5.4 B panels: interaction diagrams of DB02224, DB02375 and DB03800 with the surrounding CmaA1 active site residues]

Figure 5.4 B Interaction of the screened DrugBank compounds with the CmaA1. Among the five docking poses with 5 selected snapshots, the complexes with highest docking scores are shown.

This compound has high similarity to the approved drug hesperetin (Tanimoto coefficient of 0.907), a sterol O-acyltransferase 1 inhibitor used for lowering cholesterol. It binds to M. Tb CmaA1 by making H-bonds with the residues W123, E124, H141 and Q99. The electron rich aromatic ring also makes π-π interactions with F142. The second screened DrugBank compound is myricetin (DB02375), which also belongs to the flavonoid class and acts as an inhibitor of multidrug resistance-associated protein 1, which may be useful in managing antimicrobial drug resistance [52]. This compound makes H-bonds with the residues Y16, G74, S96, Q99 and W123. It also makes π-π interactions with F142. The third DrugBank compound is an experimental small molecule, 2'-deoxyuridylic acid (DB03800), belonging to the class of carbohydrate conjugates, which binds to proteins from a wide range of organisms, including deoxyuridine 5'-triphosphate nucleotidohydrolase of M. Tb and human thymidylate synthase. This compound binds to CmaA1 by making H-bond interactions with H8, G72, T94, L95, Q99, W123 and E124.
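The Tanimoto coefficient quoted above for DB02224 and hesperetin is the ratio of shared to total fingerprint bits of two molecules. A minimal sketch of the calculation is given below; the fingerprint type used for the reported value is not stated in this section, so the on-bit sets here are invented and serve only to illustrate the formula.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union

# Invented on-bit sets; real fingerprints would come from a cheminformatics toolkit.
fp_query = {1, 4, 7, 9, 15, 22}
fp_reference = {1, 4, 7, 9, 15, 30}
print(round(tanimoto(fp_query, fp_reference), 3))
```

The coefficient ranges from 0 (no bits in common) to 1 (identical fingerprints), so a value close to 1 indicates two very similar structures.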

Among the 6 screened ChEMBL-HIV compounds, the first is CHEMBL37869, which shows an IC50 of 300 nM against HIV integrase as reported by Zouhiri et al., who performed structure activity relationship studies and tested this compound for inhibition of HIV-1 integrase and of HIV-1 replication in cell culture. This compound makes H-bonds with the residues H8, Y33, Q99, G137 and H141 and makes π-π interactions with F142. The compound CHEMBL67076 has been reported to inhibit HIV-1 integrase with an IC50 of 1 μM. This compound makes H-bonds with H8, G72, T94, L95, Q99 and E124 and π-π interactions with F142 [22]. The compound CHEMBL209958 was synthesized and its in vitro structure bioactivity study was performed by Fardi et al. [53]. This compound shows an IC50 of 80 nM against HIV-1 integrase. Our docking study shows that it makes H-bonds with W123 and H141 and π-π interactions with F142 of CmaA1. The compound CHEMBL340775 was synthesized and tested for HIV-1 integrase inhibitory activity by Artico et al. in 1998 [54]. This compound showed an IC50 of 200 nM against HIV-1 integrase. It binds to CmaA1 by making H-bonds with the residues L5, Q99, E124 and R146. The compound CHEMBL1173780 is a 13 hydroxylated 2-arylnaphthalene synthesized and tested for its inhibitory activity against HIV-1 integrase by Maurin et al. in 2010 [55]. This compound showed an IC50 value of 500 nM against HIV-1 integrase. In our docking study it was found to bind to CmaA1 by making H-bond interactions with H8, Y33, G74, T94, Q99 and H141 and π-π interactions with F142. The compound CHEMBL462018, one of the 6-N-acyltriciribine analogues synthesized and tested by Porcari et al., showed an IC50 of 40 nM against reverse transcriptase activity in cells acutely infected with HIV-1 [56]. This compound binds to CmaA1 by making H-bonds with the residues H8, Y33, G72 and E124. The compounds screened by the dynamics based pharmacophore models and docking mostly interact with the residues H8, Y33, G72, G74, T94, L95, Q99, W123, E124 and H141, which were found to be important for cofactor binding. Hence, these compounds are expected to be potential inhibitors of CmaA1 that compete with the natural cofactors for binding to the cofactor binding pocket of CmaA1.
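The recurrence of particular residues among the H-bond partners can be tallied directly from the lists given above. The sketch below does this for the six ChEMBL-HIV compounds, with the residue lists transcribed from the preceding paragraph; π-π contacts are left out, and the variable names are chosen here for illustration.

```python
from collections import Counter

# H-bond partners of the six ChEMBL-HIV hits, transcribed from the paragraph above.
hbond_partners = {
    "CHEMBL37869": ["H8", "Y33", "Q99", "G137", "H141"],
    "CHEMBL67076": ["H8", "G72", "T94", "L95", "Q99", "E124"],
    "CHEMBL209958": ["W123", "H141"],
    "CHEMBL340775": ["L5", "Q99", "E124", "R146"],
    "CHEMBL1173780": ["H8", "Y33", "G74", "T94", "Q99", "H141"],
    "CHEMBL462018": ["H8", "Y33", "G72", "E124"],
}

# Count in how many compounds each residue appears as an H-bond partner.
counts = Counter(res for residues in hbond_partners.values() for res in residues)
for residue, n in counts.most_common(5):
    print(residue, n)
```

H8 and Q99 each appear for four of the six compounds, which is in line with the statement that the hits mostly contact residues involved in cofactor binding.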

5.4 Conclusions

In this study, ligand based pharmacophore models were generated from the snapshots obtained from the MD simulation trajectories of five model systems of CmaA1 representing various stages of cyclopropanation, taken from the previous study, and from the reported crystal structure of CmaA1. The performance of these pharmacophore models was validated by mapping 23 active and 1398 inactive reference compounds of CmaA1. The ligand based pharmacophore models generated from the averaged structures of the cofactors extracted from the snapshots were able to screen up to 22 of the 23 CmaA1 active compounds and only a small percentage of the inactive compounds. The performance of the dynamics based models was found to be better than that of the model obtained from the conformation of SAHC in the crystal structure. A novel VS workflow was designed with four levels of filters, viz., ligand based pharmacophore screening, structure based pharmacophore screening, docking and ADMET filters.

[Figure 5.4 C panels: interaction diagrams of CHEMBL462018, CHEMBL37869, CHEMBL67076, CHEMBL209958, CHEMBL340775 and CHEMBL1173780 with the surrounding CmaA1 active site residues]

Figure 5.4 C Interaction of the screened ChEMBL-HIV compounds with the CmaA1. Among the five docking poses with 5 selected snapshots, the complexes with highest docking scores have been shown.

A dataset containing a total of 18 239 compounds (6 583 drugs reported in DrugBank, and 701 and 11 089 compounds showing activity below 1 μM on M. Tb and HIV cell lines respectively, collected from the ChEMBL database) was screened using the VS workflow. The 12 screened compounds were found to bind to the CmaA1 active site by interacting with the residues H8, Y33, G72, G74, T94, L93, L95, Q99, W123, E124 and H141. These residues not only have important roles in the binding of the natural substrates of CmaA1, but were also found to undergo conformational changes that are necessary during the cyclopropanation reaction. Hence, the screened compounds may be effective inhibitors of CmaA1, of which 6 compounds may act as dual inhibitors of HIV and M. Tb.

References

1. Global tuberculosis control: WHO report 2015. (http://www.who.int/tb/publications/global_report/en/index.html).
2. Varghese, G. M., Janardhanan, J., Ralph, R., & Abraham, O. C. (2013). The twin epidemics of tuberculosis and HIV. Curr Infect Dis Rep, 15(1), 77-84.
3. Ekins, S., Williams, A. J., Krasowski, M. D., & Freundlich, J. S. (2011). In silico repositioning of approved drugs for rare and neglected diseases. Drug Discov Today, 16(7), 298-310.
4. Ashburn, T. T., & Thor, K. B. (2004). Drug repositioning: identifying and developing new uses for existing drugs. Nat Rev Drug Discov, 3(8), 673-683.
5. Sardana, D., Zhu, C., Zhang, M., Gudivada, R. C., Yang, L., & Jegga, A. G. (2011). Drug repositioning for orphan diseases. Brief Bioinform, 12(4), 346-356.
6. Cheng, F., Liu, C., Jiang, J., Lu, W., Li, W., Liu, G., Tang, Y. (2012). Prediction of drug-target interactions and drug repositioning via network-based inference. PLoS Comput Biol, 8(5), e1002503.
7. Iorio, F., Bosotti, R., Scacheri, E., Belcastro, V., Mithbaokar, P., Ferriero, R., di Bernardo, D. (2010). Discovery of drug mode of action and drug repositioning from transcriptional responses. Proc Nat Acad Sci, 107(33), 14621-14626.
8. Yang, L., & Agarwal, P. (2011). Systematic drug repositioning based on clinical side-effects. PLoS One, 6(12), e28025.
9. Diaz-Gonzalez, R., Kuhlmann, F. M., Galan-Rodriguez, C., da Silva, L. M., Saldivia, M., Karver, C. E., Pollastri, M. P. (2011). The susceptibility of trypanosomatid pathogens to PI3/mTOR kinase inhibitors affords a new opportunity for drug repurposing. PLoS Negl Trop Dis, 5(8), e1297.
10. Moriaud, F., Richard, S. B., Adcock, S. A., Chanas-Martin, L., Surgand, J. S., Jelloul, M. B., & Delfaud, F. (2011). Identify drug repurposing candidates by mining the Protein Data Bank. Brief Bioinform, 12(4), 336-340.
11. Kinnings, S. L., Liu, N., Buchmeier, N., Tonge, P. J., Xie, L., & Bourne, P. E. (2009). Drug discovery using chemical systems biology: repositioning the safe medicine Comtan to treat multi-drug and extensively drug resistant tuberculosis. PLoS Comput Biol, 5(7), e1000423.
12. de Carvalho, L. P. S., Lin, G., Jiang, X., & Nathan, C. (2009). Nitazoxanide kills replicating and nonreplicating Mycobacterium tuberculosis and evades resistance. J Med Chem, 52(19), 5789-5792.
13. Lougheed, K. E., Taylor, D. L., Osborne, S. A., Bryans, J. S., & Buxton, R. S. (2009). New anti-tuberculosis agents amongst known drugs. Tuberculosis, 89(5), 364-370.
14. Downey, A. S., Chong, C. R., Graczyk, T. K., & Sullivan, D. J. (2008). Efficacy of pyrvinium pamoate against Cryptosporidium parvum infection in vitro and in a neonatal mouse model. Antimicrob Agents Chemother, 52(9), 3106-3112.
15. Mackey, Z. B., Baca, A. M., Mallari, J. P., Apsel, B., Shelat, A., Hansell, E. J., McKerrow, J. H. (2006). Discovery of trypanocidal compounds by whole cell HTS of Trypanosoma brucei. Chem Biol Drug Des, 67(5), 355-363.
16. Xie, L., Xie, L., Kinnings, S. L., & Bourne, P. E. (2012). Novel computational approaches to polypharmacology as a means to define responses to individual drugs. Annu Rev Pharmacol Toxicol, 52, 361-379.

17. Keiser, M. J., Setola, V., Irwin, J. J., Laggner, C., Abbas, A. I., Hufeisen, S. J., Roth, B. L. (2009). Predicting new molecular targets for known drugs. Nature, 462(7270), 175-181.
18. Liu, X., Zhu, F., Ma, X. H., Shi, Z., Yang, S. Y., Wei, Y. Q., & Chen, Y. Z. (2013). Predicting targeted polypharmacology for drug repositioning and multi-target drug discovery. Curr Med Chem, 20(13), 1646-1661.
19. Kinnings, S. L., Xie, L., Fung, K. H., Jackson, R. M., Xie, L., & Bourne, P. E. (2010). The Mycobacterium tuberculosis drugome and its polypharmacological implications. PLoS Comput Biol, 6(11), e1000976.
20. Xie, L., Evangelidis, T., Xie, L., & Bourne, P. E. (2011). Drug discovery using chemical systems biology: weak inhibition of multiple kinases may contribute to the anti-cancer effect of nelfinavir. PLoS Comput Biol, 7(4), e1002037.
21. Blake, L., & Soliman, M. E. S. (2013). Bifunctional anti-HIV/TB inhibitors: perspective from in-silico design and molecular dynamics simulations. Lett Drug Des Discov, 10(8), 706-712.
22. Carlson, H. A., Masukawa, K. M., Rubins, K., Bushman, F. D., Jorgensen, W. L., Lins, R. D., McCammon, J. A. (2000). Developing a dynamic pharmacophore model for HIV-1 integrase. J Med Chem, 43(11), 2100-2114.
23. Meagher, K. L., & Carlson, H. A. (2004). Incorporating protein flexibility in structure-based drug discovery: using HIV-1 protease as a test case. J Am Chem Soc, 126(41), 13276-13281.
24. Meagher, K. L., & Carlson, H. A. (2005). Solvation influences flap collapse in HIV-1 protease. Proteins: Struct Funct Bioinf, 58(1), 119-125.
25. Damm, K. L., & Carlson, H. A. (2007). Exploring experimental sources of multiple protein conformations in structure-based drug design. J Am Chem Soc, 129(26), 8225-8235.
26. Kitchen, D. B., Decornez, H., Furr, J. R., & Bajorath, J. (2004). Docking and scoring in virtual screening for drug discovery: methods and applications. Nat Rev Drug Discov, 3(11), 935-949.
27. Kubinyi, H. (1997). QSAR and 3D QSAR in drug design Part 1: methodology. Drug Discov Today, 2(11), 457-467.
28. Badrinarayan, P., & Sastry, G. N. (2012). Virtual screening filters for the design of type II p38 MAP kinase inhibitors: A fragment based library generation approach. J Mol Graph Model, 34, 89-100.
29. Badrinarayan, P., & Sastry, G. N. (2011). Virtual high throughput screening in new lead identification. Comb Chem High Thr Scr, 14(10), 840-860.
30. Badrinarayan, P., & Sastry, G. N. (2010). Sequence, structure, and active site analyses of p38 MAP kinase: exploiting DFG-out conformation as a strategy to design new type II leads. J Chem Inf Model, 51(1), 115-129.
31. Reddy, A. S., Pati, S. P., Kumar, P. P., Pradeep, H. N., & Sastry, G. N. (2007). Virtual screening in drug discovery-a computational perspective. Curr Protein Pept Sci, 8(4), 329-351.
32. Badrinarayan, P., & Sastry, G. N. (2013). Rational approaches towards lead optimization of kinase inhibitors: The issue of specificity. Curr Pharm Des, 19(26), 4714-4738.
33. Carlson, H. A. (2002). Protein flexibility and drug design: how to hit a moving target. Curr Opin Chem Biol, 6(4), 447-452.


34. Selick, H. E., Beresford, A. P., & Tarbit, M. H. (2002). The emerging importance of predictive ADME simulation in drug discovery. Drug Discov Today, 7(2), 109-116. 35. Kubinyi, H. (2003). Drug research: myths, hype and reality. Nat Rev Drug Discov, 2(8), 665-668. 36. Van de Waterbeemd, H., & Gifford, E. (2003). ADMET in silico modelling: towards prediction paradise? Nat Rev Drug Discov, 2(3), 192-204. 37. Oprea, T. I., Davis, A. M., Teague, S. J., & Leeson, P. D. (2001). Is there a difference between leads and drugs? A historical perspective. J Chem Inf Comput Sci, 41(5), 1308-1315. 38. Choudhury, C., Priyakumar, U. D., & Sastry, G. N. (2014). Molecular dynamics investigation of the active site dynamics of mycobacterial cyclopropane synthase during various stages of the cyclopropanation process. J Struct Biol, 187(1), 38-48. 39. Choudhury, C., Priyakumar, U. D., & Sastry, G. N. (2015). Dynamics Based Pharmacophore Models for Screening Potential Inhibitors of Mycobacterial Cyclopropane Synthase. J Chem Inf Model (Article ASAP), DOI: 10.1021/ci500737b 40. Wishart, D. S., Knox, C., Guo, A. C., Shrivastava, S., Hassanali, M., Stothard, P., Woolsey, J. (2006). DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res, 34(suppl 1), D668-D672. 41. Bento, A. P., Gaulton, A., Hersey, A., Bellis, L. J., Chambers, J., Davies, M., Overington, J. P. (2014). The ChEMBL bioactivity database: an update. Nucleic Acids Res, 42(D1), D1083-D1090. 42. Dixon, S. L., Smondyrev, A. M., Knoll, E. H., Rao, S. N., Shaw, D. E., & Friesner, R. A. (2006). PHASE: a new engine for pharmacophore perception, 3D QSAR model development, and 3D database screening: 1. Methodology and preliminary results. J Comput Aided Mol Des, 20(10-11), 647-671. 43. Alahari, A., Trivelli, X., Guérardel, Y., Dover, L. G., Besra, G. S., Sacchettini, J. C., Kremer, L. (2007). 
Thiacetazone, an antitubercular drug that inhibits cyclopropanation of cell wall mycolic acids in mycobacteria. PLoS One, 2(12), e1343. 44. LigPrep, version 2.5, Schrödinger, LLC, New York, NY, 2012. 45. Glide, version 5.8, Schrödinger, LLC, New York, NY, 2012. 46. Friesner, R. A., Murphy, R. B., Repasky, M. P., Frye, L. L., Greenwood, J. R., Halgren, T. A., Mainz, D. T. (2006). Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. J Med Chem, 49(21), 6177-6196. 47. QikProp, version 3.5, Schrödinger, LLC, New York, NY, 2012. 48. Tobinick, E. L. (2009). The value of drug repositioning in the current pharmaceutical market. Drug News Perspect, 22(2), 119-125. 49. Yang, S. Y. (2010). Pharmacophore modeling and applications in drug discovery: challenges and recent advances. Drug Discov Today, 15(11), 444-450. 50. Alhamadsheh, M. M., Waters, N. C., Sachdeva, S., Lee, P., & Reynolds, K. A. (2008). Synthesis and biological evaluation of novel sulfonyl-naphthalene-1,4-diols as FabH inhibitors. Bioorg Med Chem Lett, 18(24), 6402-6405. 51. Güzel, Ö., Maresca, A., Scozzafava, A., Salman, A., Balaban, A. T., & Supuran, C. T. (2009). Discovery of low nanomolar and subnanomolar inhibitors of the mycobacterial β-carbonic anhydrases Rv1284 and Rv3273. J Med Chem, 52(13), 4063-4067.


52. Zhou, S. F., Wang, L. L., Di, Y. M., Xue, C. C., Duan, W., Li, C. G., & Li, Y. (2008). Substrates and inhibitors of human multidrug resistance associated proteins and the implications in drug development. Curr Med Chem, 15(20), 1981-2039. 53. Fardis, M., Jin, H., Jabri, S., Cai, R. Z., Mish, M., Tsiang, M., & Kim, C. U. (2006). Effect of substitution on novel tricyclic HIV-1 integrase inhibitors. Bioorg Med Chem Lett, 16(15), 4031-4035. 54. Artico, M., Di Santo, R., Costi, R., Novellino, E., Greco, G., Massa, S., La Colla, P. (1998). Geometrically and conformationally restrained cinnamoyl compounds as inhibitors of HIV-1 integrase: synthesis, biological evaluation, and molecular modeling. J Med Chem, 41(21), 3948-3960. 55. Maurin, C., Lion, C., Bailly, F., Touati, N., Vezin, H., Mbemba, G., Cotelle, P. (2010). New 2-arylnaphthalenediols and triol inhibitors of HIV-1 integrase—Discovery of a new polyhydroxylated antiviral agent. Bioorg Med Chem, 18(14), 5194-5201. 56. Porcari, A. R., Ptak, R. G., Borysko, K. Z., Breitenbach, J. M., Drach, J. C., & Townsend, L. B. (2000). 6-n-acyltriciribine analogues: Structure-activity relationship between acyl carbon chain length and activity against HIV-1. J Med Chem, 43(12), 2457-2463.


Chapter 6 The efficacy of conceptual DFT descriptors and docking scores on the QSAR models of HIV protease inhibitors

[Chapter title figure; labels: Inhibitor, Asp25, Asp125]

Compare ... the various quantities of the same element contained in the molecule of the free substance and in those of all its different compounds and you will not be able to escape the following law: The different quantities of the same element contained in different molecules are all whole multiples of one and the same quantity, which always being entire, has the right to be called an atom. — Stanislao Cannizzaro Sketch of a Course of Chemical Philosophy (1858), Alembic Club Reprint (1910), 11.


6.1 Background

Worldwide, TB is one of the most hazardous infections affecting people with HIV. Although highly effective therapy exists for both HIV and TB, concomitant administration is fraught with difficulties. In patients co-infected with HIV and TB, highly active antiretroviral therapy (HAART) is normally delayed for two or more months to minimize the risk of toxic side effects, because after two months the TB pill burden and the number of TB drugs have been reduced [1]. Anti-HIV and anti-TB drugs have similar routes of metabolism and elimination, extensive drug interactions and overlapping toxicity profiles, which may result in the interruption or alteration of TB and HIV regimens. Hence, there is a pressing need to address the adverse events that disrupt the simultaneous treatment of TB and HIV. The FDA-approved anti-HIV drugs in current use can be classified into seven groups, including PIs, NRTIs, NNRTIs, FIs, CRIs, and INIs [2]. Among them, PIs are the most popular and important class. In the current chapter we report a study of 156 HIV PIs from nine different scaffolds. HIV protease is a non-covalent homodimer that acts catalytically as an aspartic acid protease. The active site is situated at the dimer interface and possesses a catalytic Asp residue from each monomer [3]. Vierling et al. tagged the clinically used PIs nelfinavir (S1), saquinavir (S2) and indinavir (S3) with a mannose residue and synthesized various mannose-derived PI prodrugs [4]. Palinavir is a potent inhibitor of the protease and of viral replication in vitro and exhibited a good pharmacological profile in several laboratory animal species [5]. Beaulieu et al. modified palinavir and developed 2’,6’-dimethylphenoxy-acetyl derivatives (S4), which are simpler and smaller versions of palinavir and provide an achiral, high-affinity P3-P2 ligand for peptidomimetic-based HIV PIs [6]. Aminodiol (S5) is an effective HIV PI that shows comparable activity against the isolated enzyme (IC50 = 125 nM), HIV-I (ED50 = 80 nM), and HIV-II (ED50 = 120 nM) in cell culture and demonstrates a promising pharmacokinetic profile in rats [7]. Chen et al. observed a significant correlation between lipophilicity and cytotoxicity for modified aminodiol inhibitors and applied it to improve the in vitro therapeutic index of these inhibitors [8]. Ghosh et al. reported tetrahydrofurans (S6) as nonpeptidal high-affinity ligands for the HIV protease substrate binding site [9]. These ligands were designed on the basis of available three-dimensional structures of protein-ligand complexes by incorporating a conformationally constrained functionality that replaces a peptide bond and mimics the biological mode of action. Structure-activity studies have established that the position of the ring oxygens, the ring size, and the stereochemistry are all crucial to the potency of the tetrahydrofurans [10]. 4-Hydroxy-5,6-dihydropyrones (S7) are a novel class of nonpeptidic PIs [11]. Attempts to enhance the antiviral potency by manipulating the lipophilicity and polarity of the target inhibitors resulted in the discovery of a dihydropyrone that demonstrated excellent activity against HIV protease and displayed an EC50 of 200 nM with a therapeutic index of >1000 [12]. Hagen et al. synthesized this series of inhibitors and presented their preliminary pharmacokinetics [12]. Wilkerson et al. synthesized a series of non-symmetrically substituted cyclic urea carboxamides (S8) and evaluated their antiviral activity as a function of HIV protease inhibition [13]. Dorsey et al. presented the 3-pyridylmethyl-hydroxylaminpentenamides (S9) as a clinical backup of indinavir, which is the most widely prescribed HIV PI [14]. The foregoing discussion of HIV PIs reveals that the scaffolds and their substituents are structurally diverse and complex. Given the importance of the topic, several crystal structures of HIV PIs have been reported over the last couple of decades. Various structure- and shape-based attempts to design novel HIV PIs have appeared in the literature [15].

QSAR model generation is one of the most important analogue-based CADD methods. QSAR models quantitatively predict the biological activities of compounds on a particular target from molecular descriptors calculated for a set of known inhibitors of the target. Considering the complexity and diversity of the HIV PIs, developing QSAR models for them is challenging. QSAR equations are conventionally based on constitutional (C), topological (T), geometrical (G) and electrostatic (E) descriptors [16]. While three- and higher-dimensional QSAR equations based on molecular fields and similarity indices are quite useful, the 2D QSAR approach is straightforward in tracing the right kind of descriptors [17]. Recently, quantum chemical descriptors have become popular in the development of QSAR models owing to their reliability and versatility. In particular, net atomic charges, HOMO-LUMO energies, frontier orbital electron densities and superdelocalizabilities have been used frequently to generate QSAR models [18]. Density functional theory based descriptors are very useful in predicting the reactivity of atoms and molecules as well as site selectivity [19]. It has been shown that the stability of a molecule is related to its hardness [20]. The importance of the electrophilicity index, hardness, chemical potential and other descriptors for predicting the biological activity and toxicity of various compounds has been extensively studied [21, 22].

In this chapter, we report a study investigating the importance of the various types of molecular descriptors that contribute significantly towards the activities of HIV PIs. The study combines analogue-based approaches with structure-based docking methods and DFT calculations to generate QSAR models. The inclusion of descriptors based on docking scores, and of those based on the computationally demanding conceptual DFT (C-DFT), is a topic of great interest [23]. The objective of the study is therefore twofold. The first is to develop reliable QSAR models with a minimal number of descriptors that quantitatively predict the activities of the HIV PIs. The second is to explore the importance of including each subcategory of descriptor types in developing QSAR equations.


6.2 Methodology

6.2.1 Dataset preparation

Nine different scaffolds comprising 156 HIV PIs were collected from the literature along with their experimental IC50 values. These scaffolds include nelfinavir (S1), saquinavir (S2), indinavir (S3), 2’,6’-dimethylphenoxy-acetyl derivatives (S4), aminodiols (S5), tetrahydrofurans (S6), 4-hydroxy-5,6-dihydropyrones (S7), non-symmetric cyclic urea derivatives (S8) and 3-pyridylmethyl-hydroxylaminpentenamides (S9). The inhibitors were classified into five sets based on the cell lines on which their activity was reported (Table 6.1). In addition, a sixth set was made by combining the molecules of all five sets. Scheme 6.1 shows the structures of all nine scaffolds together with the cell line against which the inhibitor activity was reported.

Table 6.1 Name of the scaffolds, cell lines, number of inhibitors and pIC50 range of various sets of HIV PIs considered for the study.

Sets   Scaffolds Name                                        Cell line                                 # of inhibitors  pIC50 range
Set1   Tetrahydrofurans;                                     HIV-IIIB infected MT4 cells               44               5.77–9.96
       3-pyridylmethyl-hydroxylaminpentenamides
Set2   Aminodiols; non-symmetric cyclic urea derivatives     HIV-I and HIV-II infected CEMSS cells     43               6.52–8.07
Set3   2’,6’-dimethylphenoxy-acetyl derivatives              HIV-I infected C8166 cells                22               4.95–8.82
Set4   Aminodiols; non-symmetric cyclic urea derivatives;    HIV-IIIB and HIV-I infected CEMSS cells   90               5.67–9.96
       4-hydroxy-5,6-dihydropyrones; mannosylated
       saquinavir, nelfinavir and indinavir prodrugs
Set5   Tetrahydrofurans;                                     HIV-IIIB and HIV-I MT4 cells              56               5.65–9.96
       3-pyridylmethyl-hydroxylaminpentenamides;
       mannosylated saquinavir, nelfinavir and indinavir
       prodrugs
Set6   All the above scaffolds                               All cell lines                            156              4.95–9.96
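
The pIC50 values in Table 6.1 follow from the experimental IC50 values through the usual definition pIC50 = -log10(IC50 in mol/L). The short Python sketch below illustrates the conversion. The function name is hypothetical and nanomolar input units are assumed, since the units of the collected IC50 values are not stated in this section.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


# 125 nM is the enzyme IC50 quoted for the aminodiol (S5) in Section 6.1.
print(round(pic50_from_ic50_nm(125.0), 2))  # 6.9
```

On this scale a tenfold drop in IC50 raises pIC50 by one unit, so the range 4.95–9.96 of Set6 spans about five orders of magnitude in potency.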


[Scheme 6.1: structure drawings of the nine scaffolds. Panel labels:]
S1: Nelfinavir prodrug, HIV-IIIB and HIV-I infected MT4 and CEMSS cells (4)
S2: Saquinavir derivatives, HIV-IIIB and HIV-I infected MT4 and CEMSS cells (4)
S3: Indinavir derivatives, HIV-IIIB and HIV-I infected MT4 and CEMSS cells (4)
S4: 2’,6’-dimethylphenoxy-acetyl derivatives, HIV-I infected C8166 cells (22)
S5: Aminodiols, HIV-I and HIV-II infected CEMSS cells (28)
S6: Tetrahydrofuran derivatives, HIV-IIIB infected MT4 cells (18)
S7: 4-Hydroxy-5,6-dihydropyrones, HIV-IIIB infected CEMSS cells (35)
S8: Non-symmetrical cyclic urea derivatives, HIV-IIIB infected MT4 cells (15)
S9: 3-pyridylmethyl-hydroxylaminpentenamides, HIV-IIIB infected MT4 cells (26)

Scheme 6.1 Scaffolds representing the 156 HIV PIs, with the cell line against which activity was reported; the number of inhibitors in each scaffold is given in parentheses.

6.2.2 Geometry optimization and descriptor calculation

The Gaussian-09 program was used for three different types of computations [AM1, B3LYP/6-31G(d)//AM1 and B3LYP/6-31G(d)] for all the inhibitors. Conventional descriptors (constitutional, topological, geometrical, electrostatic and quantum chemical) were calculated using the CODESSA program [24]; C-DFT based descriptors (hardness, chemical potential and electrophilicity index) [25] were calculated manually; and docking-based descriptors (the docking fitness scores from GOLD [26] and GLIDE [27], the H-bond and vdW components of the docking fitness scores, and the energy and Emodel values from GLIDE) were taken from the docking results. The inhibitors were docked into the crystal structure of HIV protease (PDB ID 3NU3) using the default parameters of the GOLD and GLIDE docking protocols. Figure 6.1 gives the details of all the descriptors employed in the study.

[Figure 6.1: descriptor pool, in three groups]

Default descriptors of CODESSA: constitutional descriptors (RNH, RNO); geometrical descriptors (ZXS/R, XYS); topological descriptors (RI1, AIC2, KSI2); thermodynamic descriptors (AM1) (LoNMVFr, MiRECHB); electrostatic descriptors (MiPCOZ, Qmin, RNCG(QM/QT)Z); quantum chemical descriptors (B3LYP & AM1) (MaNAC, Mi1ERIO, Ma1ERIO, MiNRIN, Mi(>0.1) BOC, ESPMaNACO, HAHDCA-1/TMSAQ, Mi e-n AHNB, MiVO, Ma1ERIO, MiNRIC, FHACAQ, ABOC, Mi(>0.1) BOO).

DFT based descriptors: hardness, η = ½(εLUMO – εHOMO); chemical potential, μ = ½(εLUMO + εHOMO); electrophilicity, ω = μ²/2η.

Docking scores: Gold Score (GS); Gold Score_vdw component (GV); Gold Score_hbond component (GH); Glide Score (GlS); Glide Score_vdw component (GlV); Glide Score_hbond component (GlH); Glide Emodel (GlEm); Glide Energy (GlE).

Figure 6.1 Various descriptors employed in the study. The types of descriptors are mentioned in bold and the names of the descriptors are given in parentheses.
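
As an illustration of the manual calculation of the C-DFT descriptors, the three expressions listed in Figure 6.1 can be written as a short Python sketch. The function and field names are hypothetical, and the orbital energies in the example are made-up values in eV, not results from this study.

```python
from typing import NamedTuple


class CDFTDescriptors(NamedTuple):
    hardness: float            # eta
    chemical_potential: float  # mu
    electrophilicity: float    # omega


def cdft_descriptors(e_homo: float, e_lumo: float) -> CDFTDescriptors:
    """Global C-DFT descriptors from frontier orbital energies (same unit in, same unit out)."""
    eta = 0.5 * (e_lumo - e_homo)   # hardness
    mu = 0.5 * (e_lumo + e_homo)    # chemical potential
    if eta <= 0:
        raise ValueError("LUMO energy must lie above HOMO energy")
    omega = mu ** 2 / (2.0 * eta)   # electrophilicity index
    return CDFTDescriptors(eta, mu, omega)


# Hypothetical frontier orbital energies in eV (illustration only).
print(cdft_descriptors(e_homo=-6.0, e_lumo=-1.0))
# CDFTDescriptors(hardness=2.5, chemical_potential=-3.5, electrophilicity=2.45)
```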

6.2.4 QSAR model generation and validation

Almost all possible combinations of conventional descriptors, C-DFT based descriptors and docking scores were tested to select QSAR models based on three to five descriptors by the heuristic method. All two-parameter regression models with the remaining descriptors were subsequently developed and ranked by the correlation coefficient R². Further descriptors were then added stepwise to find the best multi-parameter regression models with the optimum values of the statistical criteria (highest values of R², the cross-validated R²cv, and the F-value). Descriptors were selected by the heuristic method using the CODESSA program, and regression analysis was carried out using the ProjectLeader program associated with the Scigress Explorer package [28]. All data sets were divided into training and test sets for rigorous validation of the generated models. The inter-correlation level among the individual descriptors was set to below 0.2 in all the models.
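
The statistical criteria named above can be sketched in Python as follows. This is only an illustration on made-up data, not the CODESSA or ProjectLeader implementation: it assumes ordinary least squares, takes R²cv to be the leave-one-out cross-validated value (1 - PRESS/SS_tot), and uses the standard F-statistic for a model with k descriptors and n compounds. All function names are hypothetical.

```python
from typing import List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear descriptors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(X: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares fit via the normal equations; returns [intercept, b1, ..., bk]."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coef: List[float], x: List[float]) -> float:
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def r_squared(X: List[List[float]], y: List[float]) -> float:
    coef = fit_mlr(X, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coef, xi)) ** 2 for xi, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def loo_r_squared(X: List[List[float]], y: List[float]) -> float:
    """Leave-one-out cross-validated R2: refit without compound i, then predict it."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot


def f_value(r2: float, n: int, k: int) -> float:
    """F-statistic of a regression with k descriptors fitted to n compounds."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Made-up descriptor values (two descriptors) and activities for eight compounds.
X = [[1.0, 0.5], [2.0, 1.5], [3.0, 1.0], [4.0, 3.0],
     [5.0, 2.0], [6.0, 4.0], [7.0, 3.5], [8.0, 5.5]]
y = [5.2, 5.9, 6.8, 7.1, 8.3, 8.6, 9.4, 9.9]

r2 = r_squared(X, y)
print(f"R2 = {r2:.3f}, R2cv = {loo_r_squared(X, y):.3f}, F = {f_value(r2, len(y), 2):.1f}")
```

For a least-squares fit the leave-one-out value defined this way can never exceed R², which is why a small gap between the two is commonly read as a sign that a model is not over-fitted.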

6.3 Results and Discussion

QSAR models were developed using CODESSA-generated conventional descriptors along with C-DFT descriptors and docking scores in order to derive the structural requirements of these inhibitors for HIV protease inhibition. Quantum chemical methods can be applied to QSAR by direct derivation of electronic descriptors from the molecular wave functions [18]. The quantum chemical descriptors are therefore of considerable importance in QSAR model building. Since electrophilicity is a reactivity descriptor that allows a quantitative classification of the global electrophilic nature of a molecule within a relative scale, and hardness is related to stability, both are expected to be useful for describing the biological activity of the inhibitors. Considering the usefulness of the quantum chemical descriptors, conceptual DFT (C-DFT) based descriptors (hardness, chemical potential and electrophilicity index) were calculated and their contribution to the QSAR models for predicting the activities of HIV PIs was analyzed. Among them, the electrophilicity index and hardness showed good correlations in combination with the conventional descriptors. Further, the relative importance of the docking scores as descriptors was evaluated and compared with that of the conventional and C-DFT descriptors. The inhibitors were docked to the receptor (PDB ID 3NU3) to calculate the docking-based descriptors. Figure 6.1 gives the details of all the descriptor classes employed in this study. All the collected inhibitors were divided into five data sets on the basis of the cell lines on which their IC50 values were reported, and the sixth set was constructed by combining the inhibitors from all five data sets.

Set 1 contains the 44 inhibitors tested on HIV-IIIB infected MT4 cells, and the regression details for this set using three, four and five descriptors are given in Table 6.2. QSAR analyses were carried out using constitutional (C), topological (T), geometrical (G), electrostatic (E) and quantum chemical (Q; B3LYP and AM1) descriptors separately to delineate the comparative performance of these classes of descriptors. The Q class of descriptors shows slightly better statistical significance than the C, T, G and E classes in the three-descriptor models. Combining all of these classes improves the statistical quality and provides a good correlation. Similar results were obtained in the regression analyses using four and five descriptors. Adding C-DFT based descriptors improves the coefficient values obtained from the C, T, E and combined-descriptor QSAR models, which indicates that C-DFT based descriptors are an important class of descriptors for Set 1. Surprisingly, docking-based descriptors make no positive contribution to the coefficient values obtained from the conventional-descriptor QSAR models, which suggests that a better docking score is not associated with higher inhibitory activity for Set 1. The statistical coefficients obtained by adding the combination of C-DFT and docking-based descriptors are essentially the same as those obtained by adding C-DFT based descriptors alone.

Set 2 contains the 43 inhibitors tested on HIV-I and HIV-II infected CEMSS cells, and the regression details for this set using three, four and five descriptors are given in Table 6.3. As for Set 1, the Q class of descriptors performs better than the C, T, G and E classes, and the combination of all these descriptors improves the statistical quality and provides a good correlation, as is evident from Table 6.3. Adding C-DFT and docking-based descriptors makes no positive contribution to the coefficient values obtained from the conventional-descriptor QSAR models for Set 2.


Table 6.2 Effect of conceptual DFT based descriptors and docking scores on the statistical quality of QSAR models of Set-1 obtained by three, four and five conventional descriptors.

S. No.  Descriptor type   #   Default        C-DFT          Docking        C-DFT+Docking
                              R²     R²cv    R²     R²cv    R²     R²cv    R²     R²cv
1       C                 3   0.74   0.69    0.78   0.74    0.74   0.69    0.78   0.74
2       T                 3   0.71   0.66    0.78   0.76    0.71   0.66    0.78   0.76
3       G                 3   0.69   0.62    0.69   0.62    0.69   0.62    0.69   0.62
4       E                 3   0.73   0.68    0.79   0.73    0.73   0.68    0.79   0.73
5       Q (B3LYP)         3   0.76   0.70    0.76   0.70    0.76   0.70    0.76   0.70
6       All (B3LYP)       3   0.79   0.75    0.82   0.77    0.79   0.75    0.82   0.77
7       Q (AM1)           3   0.80   0.75    0.80   0.75    0.80   0.75    0.80   0.75
8       All (AM1)         3   0.80   0.75    0.80   0.75    0.80   0.75    0.80   0.75
9       All (B3LYP)       4   0.82   0.77    0.84   0.78    0.82   0.77    0.84   0.78
10      C+T+G+E+Q (AM1)   4   0.83   0.78    0.83   0.78    0.83   0.78    0.83   0.78
11      All (AM1)         4   0.83   0.78    0.83   0.78    0.83   0.78    0.83   0.78
12      All (B3LYP)       5   0.82   0.81    0.84   0.78    0.84   0.81    0.84   0.78
13      C+T+G+E+Q (AM1)   5   0.84   0.77    0.84   0.77    0.84   0.77    0.84   0.77
14      All (AM1)         5   0.84   0.77    0.84   0.77    0.84   0.77    0.84   0.77

n = 44; pIC50 range = 5.7709–9.959; cell line = HIV-IIIB infected MT4.

Table 6.3 Effect of conceptual DFT based descriptors and docking scores on the statistical quality of QSAR models of Set-2 obtained by three, four and five conventional descriptors.

S. No.  Descriptor type   #   Default        C-DFT          Docking        C-DFT+Docking
                              R²     R²cv    R²     R²cv    R²     R²cv    R²     R²cv
1       C                 3   0.76   0.74    0.76   0.74    0.76   0.74    0.76   0.74
2       T                 3   0.80   0.76    0.80   0.76    0.80   0.76    0.80   0.76
3       G                 3   0.76   0.74    0.76   0.74    0.76   0.74    0.76   0.74
4       E                 3   0.72   0.64    0.72   0.64    0.72   0.64    0.72   0.64
5       Q (B3LYP)         3   0.86   0.84    0.86   0.84    0.86   0.84    0.86   0.84
6       All (B3LYP)       3   0.86   0.84    0.86   0.84    0.86   0.84    0.86   0.84
7       Q (AM1)           3   0.83   0.74    0.83   0.74    0.83   0.74    0.83   0.74
8       All (AM1)         3   0.83   0.74    0.83   0.74    0.83   0.74    0.83   0.74
9       Q (B3LYP)         4   0.89   0.86    0.89   0.86    0.89   0.86    0.89   0.86
10      C+T+G+E+Q (AM1)   4   0.86   0.82    0.86   0.82    0.86   0.82    0.86   0.82
11      All (AM1)         4   0.86   0.82    0.86   0.82    0.86   0.82    0.86   0.82
12      Q (B3LYP)         5   0.90   0.86    0.90   0.86    0.90   0.86    0.90   0.86
13      C+T+G+E+Q (AM1)   5   0.88   0.82    0.88   0.82    0.88   0.82    0.88   0.82
14      All (AM1)         5   0.88   0.82    0.88   0.82    0.88   0.82    0.88   0.82

n = 43; pIC50 range = 6.523–8.066; cell line = HIV-I and II infected CEMSS.


Set 3 contains the 22 inhibitors tested on HIV-I infected C8166 cells, and the regression details for this set using three, four and five descriptors are given in Table 6.4. Although the C, T, G and E classes of descriptors show poor statistical quality for this set, the Q class shows a good correlation. Moreover, the combination of all these descriptors improves the statistical quality and provides a better correlation, as is evident from Table 6.4. As for Set 2, adding C-DFT and docking-based descriptors makes essentially no positive contribution to the coefficient values obtained from the conventional-descriptor QSAR models for Set 3; the only notable exception in Table 6.4 is the three-descriptor C model, for which docking-based descriptors raise R² from 0.33 to 0.52.

Table 6.4 Effect of conceptual DFT based descriptors and docking scores on the statistical quality of QSAR models of Set-3 obtained by three, four and five conventional descriptors.

S. No.  Descriptor type   #   Default        C-DFT          Docking        C-DFT+Docking
                              R²     R²cv    R²     R²cv    R²     R²cv    R²     R²cv
1       C                 3   0.33   0.09    0.33   0.09    0.52   0.32    0.52   0.32
2       T                 3   0.49   0.24    0.49   0.24    0.49   0.22    0.49   0.22
3       G                 3   0.22   0.06    0.22   0.06    0.23   0.09    0.23   0.09
4       E                 3   0.62   0.49    0.62   0.49    0.62   0.49    0.62   0.49
5       Q (B3LYP)         3   0.70   0.53    0.70   0.53    0.70   0.53    0.70   0.53
6       All (B3LYP)       3   0.73   0.57    0.73   0.57    0.73   0.58    0.73   0.58
7       Q (AM1)           3   0.91   0.85    0.91   0.85    0.91   0.85    0.91   0.85
8       All (AM1)         3   0.91   0.85    0.91   0.85    0.91   0.85    0.91   0.85
9       Q (B3LYP)         4   0.75   0.59    0.75   0.59    0.75   0.59    0.75   0.59
10      C+T+G+E+Q (AM1)   4   0.93   0.89    0.93   0.80    0.93   0.89    0.93   0.80
11      All (AM1)         4   0.93   0.89    0.93   0.80    0.93   0.89    0.93   0.80
12      Q (B3LYP)         5   0.78   0.70    0.78   0.70    0.78   0.66    0.78   0.66
13      C+T+G+E+Q (AM1)   5   0.94   0.89    0.94   0.89    0.94   0.89    0.94   0.89
14      All (AM1)         5   0.94   0.89    0.94   0.89    0.94   0.89    0.94   0.89

n = 22; pIC50 range = 4.95–8.82; cell line = HIV-I infected C8166.

Set 4 contains the 90 inhibitors tested on HIV-IIIB and HIV-I infected CEMSS cells, and the regression details for this set using three, four and five descriptors are given in Table 6.5. This set behaves differently: the T class of descriptors shows slightly better statistical significance, followed by the Q, E, C and G classes. However, as for the other sets, the combination of all these descriptors improves the statistical quality. Adding C-DFT and docking-based descriptors makes no positive contribution to the coefficient values obtained from the conventional-descriptor QSAR models for Set 4.

Table 6.5 Effect of conceptual DFT based descriptors and docking scores on the statistical quality of QSAR models of Set-4 obtained by three, four and five conventional descriptors.

S. No.  Descriptor type   #   Default        C-DFT          Docking        C-DFT+Docking
                              R²     R²cv    R²     R²cv    R²     R²cv    R²     R²cv
1       C                 3   0.74   0.73    0.74   0.73    0.74   0.73    0.74   0.73
2       T                 3   0.77   0.74    0.77   0.74    0.77   0.74    0.77   0.74
3       G                 3   0.73   0.71    0.73   0.71    0.74   0.72    0.74   0.72
4       E                 3   0.76   0.74    0.76   0.74    0.76   0.74    0.76   0.74
5       Q (B3LYP)         3   0.76   0.74    0.76   0.74    0.76   0.74    0.76   0.74
6       All (B3LYP)       3   0.79   0.76    0.79   0.76    0.79   0.76    0.79   0.76
7       Q (AM1)           3   0.75   0.73    0.75   0.73    0.75   0.73    0.75   0.73
8       All (AM1)         3   0.78   0.77    0.78   0.77    0.78   0.77    0.78   0.77
9       Q (B3LYP)         4   0.79   0.76    0.79   0.76    0.79   0.76    0.79   0.76
10      C+T+G+E+Q (AM1)   4   0.81   0.78    0.81   0.78    0.81   0.78    0.81   0.78
11      All (AM1)         4   0.81   0.78    0.81   0.78    0.81   0.78    0.81   0.78
12      Q (B3LYP)         5   0.80   0.76    0.80   0.76    0.80   0.76    0.80   0.76
13      C+T+G+E+Q (AM1)   5   0.82   0.79    0.82   0.79    0.82   0.79    0.82   0.79
14      All (AM1)         5   0.82   0.79    0.82   0.79    0.82   0.79    0.82   0.79

n = 90; pIC50 range = 5.67–9.96; cell line = CEMSS cells.

Set 5 contains the 56 inhibitors tested on HIV-IIIB and HIV-I infected MT4 cells, and the regression details for this set using three, four and five descriptors are given in Table 6.6. The T and G classes of descriptors show poor statistical quality, while the E, C and Q classes show a similar trend, with the Q class performing slightly better. As for the other sets, the combination of all these descriptors improves the statistical quality. Adding C-DFT based descriptors improves the coefficient values obtained from the C, T, G, E and combined-descriptor QSAR models, which indicates that, as for Set 1, C-DFT based descriptors are an important class of descriptors for Set 5. Although the docking-based descriptors make a positive contribution to the coefficient values obtained from the C, T, G and E descriptor based QSAR models, this effect is not observed for the quantum chemical and combined-descriptor models.

Table 6.6 Effect of conceptual DFT based descriptors and docking scores on the statistical quality of QSAR models of Set-5 obtained by three, four and five conventional descriptors.

S. No.  Descriptor type   #   Default        C-DFT          Docking        C-DFT+Docking
                              R²     R²cv    R²     R²cv    R²     R²cv    R²     R²cv
1       C                 3   0.70   0.66    0.75   0.71    0.70   0.66    0.75   0.71
2       T                 3   0.42   0.34    0.52   0.44    0.57   0.50    0.59   0.50
3       G                 3   0.23   0.14    0.50   0.43    0.52   0.44    0.54   0.45
4       E                 3   0.67   0.64    0.70   0.66    0.67   0.64    0.70   0.66
5       Q (B3LYP)         3   0.71   0.65    0.71   0.65    0.71   0.65    0.71   0.65
6       All (B3LYP)       3   0.75   0.70    0.77   0.72    0.75   0.70    0.77   0.72
7       Q (AM1)           3   0.76   0.71    0.76   0.71    0.76   0.71    0.76   0.71
8       All (AM1)         3   0.78   0.74    0.78   0.74    0.78   0.74    0.78   0.74
9       Q (B3LYP)         4   0.73   0.67    0.73   0.67    0.73   0.67    0.73   0.67
10      C+T+G+E+Q (AM1)   4   0.79   0.75    0.79   0.75    0.79   0.75    0.79   0.75
11      All (AM1)         4   0.82   0.78    0.82   0.78    0.82   0.78    0.82   0.78
12      Q (B3LYP)         5   0.77   0.63    0.77   0.63    0.77   0.63    0.77   0.63
13      C+T+G+E+Q (AM1)   5   0.80   0.48    0.80   0.48    0.80   0.48    0.80   0.48
14      All (AM1)         5   0.83   0.79    0.83   0.79    0.83   0.79    0.83   0.79

n = 56; pIC50 range = 5.651–9.959; cell line = MT4 cells.

Based on these observations, we selected the two best models (one each from the three- and four-descriptor models) for each set of HIV PIs. The models were selected on the basis of their high correlation coefficients (R² ~0.84 for Set 1, 0.89 for Set 2, 0.93 for Set 3, 0.81 for Set 4 and 0.82 for Set 5) and cross-validation coefficients (R²cv ~0.77 for Set 1, 0.77 for Set 2, 0.89 for Set 3, 0.78 for Set 4 and 0.78 for Set 5). These models are therefore appropriate for predicting the activities of unknown derivatives of the respective types of HIV PIs. The inter-correlation among the descriptors is well within 0.2 for most of the developed QSAR models (694 of 696 models), while the value is 0.39 for the other two models. Thus, the generated QSAR models are robust and the descriptors employed are unique and only weakly correlated with each other. We found that the five-descriptor models in all five sets contain slightly inter-correlated descriptors, so these models are not statistically very significant. In addition, the correlation and cross-validation coefficients of the three- and four-descriptor models are similar in all five sets, so the three-descriptor models are optimal for the QSAR study of these classes of inhibitors.
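
A check of the descriptor inter-correlation of a model can be sketched as below. The sketch assumes that the 0.2 limit applies to the absolute Pearson correlation coefficient between each pair of descriptors (the text does not say whether r or r² is meant); the descriptor labels and values are made up for illustration.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson_r(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient between two equally long columns."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def intercorrelated_pairs(
    descriptors: Dict[str, List[float]], limit: float = 0.2
) -> List[Tuple[str, str, float]]:
    """Return every descriptor pair whose |r| reaches the limit (assumed to be on r)."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(descriptors.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) >= limit:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical three-descriptor model evaluated on five compounds.
model = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.0, 1.0, 4.0, 3.0, 5.0],
    "d3": [4.0, 1.0, 3.0, 5.0, 2.0],
}
print(intercorrelated_pairs(model))  # [('d1', 'd2', 0.8)]
```

Here d1 and d2 would have to be kept out of the same model, whereas d3 could be combined with either of them.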

Table 6.7 Effect of conceptual DFT based descriptors and docking scores on the statistical quality of QSAR models of Set-6 obtained by three, four, five and six conventional descriptors.

S. No.  Descriptor type   #   Default        C-DFT          Docking        C-DFT+Docking
                              R²     R²cv    R²     R²cv    R²     R²cv    R²     R²cv
1       C                 3   0.37   0.35    0.37   0.35    0.37   0.35    0.37   0.35
2       T                 3   0.34   0.32    0.37   0.34    0.34   0.32    0.37   0.34
3       G                 3   0.26   0.24    0.26   0.24    0.26   0.24    0.26   0.24
4       E                 3   0.44   0.41    0.44   0.41    0.44   0.41    0.44   0.41
5       Q (B3LYP)         3   0.52   0.49    0.52   0.49    0.52   0.49    0.52   0.49
6       All (B3LYP)       3   0.58   0.56    0.58   0.56    0.58   0.56    0.58   0.56
7       Q (AM1)           3   0.47   0.44    0.47   0.44    0.47   0.44    0.47   0.44
8       All (AM1)         3   0.49   0.46    0.49   0.46    0.49   0.46    0.49   0.46
9       Q (B3LYP)         4   0.54   0.50    0.54   0.51    0.54   0.50    0.54   0.51
10      C+T+G+E+Q (AM1)   4   0.53   0.51    0.53   0.51    0.53   0.51    0.53   0.51
11      All (AM1)         4   0.53   0.51    0.53   0.51    0.53   0.51    0.53   0.51
12      Q (B3LYP)         5   0.56   0.51    0.56   0.52    0.56   0.51    0.56   0.52
13      C+T+G+E+Q (AM1)   5   0.64   0.60    0.66   0.63    0.64   0.60    0.66   0.63
14      All (AM1)         5   0.64   0.60    0.66   0.63    0.64   0.60    0.66   0.63
15      Q (B3LYP)         6   0.56   0.51    0.57   0.53    0.57   0.52    0.57   0.52
16      C+T+G+E+Q (AM1)   6   0.64   0.61    0.68   0.64    0.58   0.55    0.64   0.61
17      All (AM1)         6   0.64   0.61    0.68   0.64    0.58   0.53    0.64   0.61

n = 156; pIC50 range = 4.953–9.959; cell line = All cell lines.

Set 6 contains all the inhibitors of the above five sets, and the regression details for this set using three, four, five and six descriptors are given in Table 6.7. The C, T, G, E and Q class descriptors show poor statistical performance for this set with three and four descriptors, and combining the descriptor classes does not improve the statistical quality much. The five and six descriptor based models show a slight improvement in statistical quality and provide coefficient values of about 0.64. The five descriptor based model does not contain inter-correlated descriptors and seems reasonable for the QSAR study of the inhibitors of Set 6. Addition of C-DFT based descriptors slightly improves the coefficient values, while docking based descriptors do not make any positive contribution. The coefficient values obtained from the combination of C-DFT and docking based descriptors are essentially the same as those from C-DFT based descriptors alone. Based on these observations, we have selected the five descriptor based model, which shows moderate coefficient values (R² ~0.64; R²cv ~0.60). The poor result for this set may be associated with the high diversity of scaffolds and cell lines in the same model.

A quick look at the types of descriptors applied reveals that the quantum chemical descriptors are highly predominant in almost all the models. The use of quantum chemical descriptors in the QSAR models signifies that the compounds and their various fragments and substituents have been characterized directly on the basis of their molecular structure only. As a large amount of physico-chemical information can be related to these theoretical descriptors, these models tend to be accurate with good predictive ability. Therefore, the derived QSAR models are suitable choices for determining the biological activity of the HIV PIs. The inhibitors in each set were divided into a training set and a test set, with about 20% of the inhibitors placed in the test set, to validate the predictability of the developed models. The statistical significance of the final selected QSAR models is presented in Table 6.8. In order to assess and compare the predictive power of the QSAR models, widely applied statistical parameters other than R² and R²cv, such as s², F and AE, are also reported. Here s² is the squared standard deviation of the regression, F is the Fisher statistic (the ratio between explained and unexplained variance for a given number of degrees of freedom, which indicates the significance level of a QSAR model) and AE is the average residual (residuals being obtained by subtracting the predicted activity from the experimental activity).
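The statistical parameters named above can be made concrete with a small sketch. It uses a single hypothetical descriptor and toy pIC50 values, not data from this study, and several definitions are assumptions: s² is taken as the residual sum of squares divided by n - k - 1, R²cv is computed from leave-one-out predictions as 1 - PRESS/SStot, and AE is taken as the mean absolute residual. The function names `fit_line` and `regression_stats` are illustrative only.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for a single descriptor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def regression_stats(y_obs, y_fit, y_loo, k):
    """R2, R2cv, s2, F, PRESS and AE for a model with k descriptors."""
    n = len(y_obs)
    my = mean(y_obs)
    ss_tot = sum((y - my) ** 2 for y in y_obs)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_obs, y_fit))
    press = sum((y - p) ** 2 for y, p in zip(y_obs, y_loo))
    r2 = 1.0 - ss_res / ss_tot
    return {
        "R2": r2,
        "R2cv": 1.0 - press / ss_tot,                 # leave-one-out cross validation
        "s2": ss_res / (n - k - 1),                   # assumed definition of s2
        "F": (r2 / k) / ((1.0 - r2) / (n - k - 1)),   # explained vs unexplained variance
        "PRESS": press,
        "AE": mean(abs(y - f) for y, f in zip(y_obs, y_fit)),  # assumed: mean |residual|
    }

# Hypothetical toy data: one descriptor value and one pIC50 value per compound
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [5.1, 5.9, 6.4, 7.2, 7.6, 8.5]

slope, intercept = fit_line(x, y)
y_fit = [slope * xi + intercept for xi in x]

# Leave-one-out: refit without compound i, then predict compound i
y_loo = []
for i in range(len(x)):
    s, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    y_loo.append(s * x[i] + b)

stats = regression_stats(y, y_fit, y_loo, k=1)
```

For a least-squares fit, PRESS is never smaller than the residual sum of squares, so R²cv cannot exceed R²; a large gap between the two is a common warning sign of over-fitting.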


Table 6.8 Regression equations and statistical significance for final selected QSAR models for all the sets.

Set    # of Des.  # of Comp.  R²     R²cv   s²     F        PRESS   Regression equation
Set1   3          44          0.82   0.75   0.26   60.17    7.31    = 0.433498*RI1 - 0.237058*ω + 88.0869*MiPCOZ + 15.5711
Set1   4          44          0.84   0.77   0.16   58.60    5.93    = 0.395912*RI1 - 0.263153*ω + 93.3027*MiPCOZ + 3.82766*ZXS/R + 14.6994
Set2   3          43          0.81   0.76   0.05   53.01    1.67    = -0.71509*MaNAC + 14.8642*Mi1ERIO - 35.8078*Ma1ERIO + 8.51041
Set2   4          43          0.85   0.77   0.04   44.76    1.30    = -0.663176*MaNAC + 15.264*Mi1ERIO - 88.5042*MiNRIN - 23.9661*Mi(>0.1) BOC + 10.9222
Set3   3♣         22          0.91   0.89   0.16   45.26    2.18    = 13.8533*ESPMaNACO + 0.351533*HAHDCA-1/TMSAQ - 0.609693*Mi e-n AHNB + 59.5109
Set3   4♣         22          0.94   0.87   0.11   53.47    1.37    = 13.428*ESPMaNACO - 0.495094*Mi e-n AHNB + 0.335295*HAHDCA-1Q - 0.322176*KSI3 + 52.8475
Set4   3          90          0.80   0.75   0.18   110.86   14.05   = 20.3734*RNCG(QM/QT)Z + 5.10815*MiVO - 71.4143*Ma1ERIO - 3.53739
Set4   4          90          0.83   0.76   0.18   82.97    12.33   = 19.2496*RNCG(QM/QT)Z + 5.08487*MiVO - 73.1173*Ma1ERIO + 116707*MiNRIC - 3.40974
Set5   3          56          0.77   0.73   0.41   50.44    18.01   = -14.9413*FHACAQ - 37.4679*RNH - 0.338127*ω + 29.8884
Set5   4♣         56          0.81   0.76   0.34   41.66    13.08   = 15.1738*RNAB - 0.03505*XYS + 0.0403574*LoNMVFr + 2.30838*MiRECHB - 13.1028
Set6   5          156         0.66   0.65   0.52   46.96    60.97   = 17.6559*ABOC + 1584190*MiNRIN + 75.5666*MiPCOZ + 0.516385*μ - 13.7422*RNO + 6.32054
Set6   6          156         0.65   0.61   0.50   41.64    78.43   = 19.2278*ABOC + 165934*MiNRIN - 12.736*FHACAQ + 77.9327*MiPCOZ + 0.4995*μ + 9.93943*Mi(>0.1) BOO + 3.32132

♣ AM1 based models.

It is clear from this table that all the models show very good statistical significance except the models for Set 6. On the one hand, the uniqueness of a molecule and its total chemical information cannot be described by very few descriptors; on the other hand, a large number of descriptors complicates interpretation and reduces the statistical robustness of the model. The effect of the number of descriptors on the correlation coefficient values of all the models was tested by correlating 1 to 10 descriptors separately, and the results are presented in Figure 6.2.
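The trade-off described above can be sketched numerically: in a least-squares fit, R² can only stay the same or increase as descriptors are added, which is why R² alone cannot decide how many descriptors to keep. The sketch below fits nested models on randomly generated, hypothetical data and records R² for one to five descriptors. The helper names `solve` and `r_squared` are illustrative, and the descriptors are simply added in column order, which is an assumption and not the descriptor selection procedure of the QSAR software.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def r_squared(X, y):
    """R2 of a least-squares fit of y on the columns of X plus an intercept."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(A, b)
    fit = [sum(bi * ri for bi, ri in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fit))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: 20 compounds, 5 descriptors; only the first two carry signal
rng = random.Random(0)
n_compounds, n_descriptors = 20, 5
X = [[rng.random() for _ in range(n_descriptors)] for _ in range(n_compounds)]
y = [6.0 + 2.0 * row[0] - 1.0 * row[1] + 0.3 * rng.random() for row in X]

r2_by_size = [r_squared([row[:k] for row in X], y) for k in range(1, n_descriptors + 1)]
```

Because `r2_by_size` never decreases, the point at which it flattens, together with R²cv and the inter-correlation of the descriptors, is the more useful guide to the optimum model size.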


Figure 6.2 Effect of number of descriptors on the correlation coefficient of cell line based QSAR models (R² plotted against the number of descriptors for Set1 to Set6).

Figure 6.3 Regression summary (correlation coefficient R², cross validation coefficient R²cv and average residual AE values) for the 3, 4 & 5 descriptor based QSAR models. The stack column represents the type of descriptors involved in the models (legend entries: C, T, G, E, Q, Th, CDFT and Docking; Set1 to Set5 with 3 and 4 descriptors, Set6 with 5 and 6 descriptors).


We observed that for the first five sets three to four descriptors are sufficient for a good correlation, and using more than four descriptors has only a small effect on the statistical quality of the models. So the three and four descriptor based models were selected as the optimum ones, to avoid over-correlation while retaining sufficient chemical information. Most of these QSAR models do not have any outlier; in some cases at most one outlier is present, identified by its high deviation between observed and predicted activities. Low average residual values were obtained for the training set as well as the test set of molecules in all the models. Figure 6.3 depicts the R², R²cv and AE values and descriptor classes for the three to five descriptor based models for Sets 1-5 and the five to six descriptor based models for Set 6. Figures 6.4 A and 6.4 B are the plots of predicted against experimental pIC50 values for the three and four descriptor based models, respectively.

Figure 6.4 A The predicted pIC50 values plotted against the experimental pIC50 values for 3 (Set1-5) & 5 (Set6) descriptor based models. No. of inhibitors are mentioned in parenthesis. Training set and test set compounds are marked separately.
Panels: Set 1 (44): R² = 0.82, R²cv = 0.75; Set 2 (43): R² = 0.81, R²cv = 0.76; Set 3 (22): R² = 0.91, R²cv = 0.89; Set 4 (90): R² = 0.80, R²cv = 0.75; Set 5 (56): R² = 0.77, R²cv = 0.73; Set 6 (156): R² = 0.65, R²cv = 0.61.


Figure 6.4 B The predicted pIC50 values plotted against the experimental pIC50 values for 4 (Set1-5) & 6 (Set6) descriptor based models. No. of inhibitors are mentioned in parenthesis. Training set and test set compounds are marked separately.
Panels: Set 1 (44): R² = 0.84, R²cv = 0.77; Set 2 (43): R² = 0.85, R²cv = 0.77; Set 3 (22): R² = 0.94, R²cv = 0.87; Set 4 (90): R² = 0.83, R²cv = 0.76; Set 5 (56): R² = 0.81, R²cv = 0.76; Set 6 (156): R² = 0.62, R²cv = 0.59.

A linear relationship between the experimental and predicted pIC50 values is clear from these plots for most of the models.

Table 6.9 Comparative performance of the AM1, B3LYP/6-31G(d)//AM1 and B3LYP/6-31G(d) levels of theory on the statistical quality of QSAR models.

Set    # of Comp.  # of Des.  AM1            B3LYP/6-31G(d)//AM1   B3LYP/6-31G(d)
                              R²     R²cv    R²     R²cv           R²     R²cv
Set1   44          3          0.80   0.75    0.82   0.77           0.78   0.71
Set1   44          4          0.83   0.78    0.85   0.76           -      -
Set2   43          3          0.83   0.74    0.86   0.84           0.85   0.80
Set2   43          4          0.86   0.82    0.89   0.86           0.88   0.83
Set3   22          3          0.91   0.85    0.73   0.58           0.76   0.63
Set3   22          4          0.93   0.89    0.78   0.61           0.79   0.67
Set4   90          3          0.79   0.77    0.80   0.76           0.78   0.74
Set4   90          4          0.81   0.78    0.82   0.78           0.78   0.77
Set5   56          3          0.78   0.74    0.77   0.72           0.74   0.70
Set5   56          4          0.82   0.78    0.79   0.76           0.78   0.73
Set6   156         5          0.66   0.65    0.63   0.59           0.57   0.53
Set6   156         6          0.65   0.61    0.64   0.61           0.57   0.52


Table 6.9 shows the comparative performance of the AM1, B3LYP/6-31G(d)//AM1 and B3LYP/6-31G(d) levels of theory on the statistical quality of the QSAR models. In general, none of these methods consistently shows the best performance, but B3LYP/6-31G(d)//AM1 is a good compromise between accuracy and computational cost.

6.4 Conclusions

In this chapter, the importance of various types of molecular descriptors that contribute significantly towards the activities of 156 HIV PIs collected from the literature has been investigated. The effectiveness of a minimal number of descriptors in providing reasonable correlations for structurally diverse sets of HIV PIs has been illustrated. The study employs docking scores and conceptual DFT descriptors and compares them with the conventional (constitutional, topological, geometrical, electrostatic and quantum chemical) descriptors. Quantum chemical descriptors appear to be very important, and conceptual DFT descriptors improve the statistical quality of the models in many cases. Surprisingly, docking scores were not very effective. This may be due to the limitations of the scoring functions. Although the analysis was done with models in which the number of descriptors was increased from 1 to 10, in most cases a three descriptor based model is adequate. Interestingly, in some cases even a single descriptor provides a satisfactory correlation. The systematic study also reveals that the more economical semi-empirical procedures are as effective as the computationally demanding B3LYP/6-31G(d) method. The study therefore highlights the importance of conventional and conceptual DFT descriptors even for the structurally diverse data sets considered here.


References:

1. Varghese, G. M., Janardhanan, J., Ralph, R., & Abraham, O. C. (2013). The twin epidemics of tuberculosis and HIV. Curr Infect Dis Rep, 15(1), 77-84.
2. Mehellou, Y., & De Clercq, E. (2010). Twenty-six years of anti-HIV drug discovery: where do we stand and where do we go?. J Med Chem, 53(2), 521-538.
3. Wlodawer, A., & Gustchina, A. (2000). Structural and biochemical studies of retroviral proteases. BBA-Protein Struct Mol Enzymol, 1477(1), 16-34.
4. Roche, D., Greiner, J., Aubertin, A. M., & Vierling, P. (2006). Synthesis and in vitro biological evaluation of mannose-containing prodrugs derived from clinically used HIV-protease inhibitors with improved transepithelial transport. Bioconjugate Chem, 17(6), 1568-1581.
5. Lamarre, D., Croteau, G., Wardrop, E., Bourgon, L., Thibeault, D., Clouette, C., Anderson, P. C. (1997). Antiviral properties of palinavir, a potent inhibitor of the human immunodeficiency virus type 1 protease. Antimicrob Agents Chemother, 41(5), 965-971.
6. Beaulieu, P. L., Anderson, P. C., Cameron, D. R., Croteau, G., Gorys, V., Grand-Maître, C., Tong, L. (2000). 2', 6'-Dimethylphenoxyacetyl: A New Achiral High Affinity P3-P2 Ligand for Peptidomimetic-Based HIV Protease Inhibitors. J Med Chem, 43(6), 1094-1108.
7. Barrish, J. C., Gordon, E., Alam, M., Lin, P. F., Bisacchi, G. S., Chen, P., Greytok, J. A. (1994). Amino diol HIV protease inhibitors. 1. Design, synthesis, and preliminary SAR. J Med Chem, 37(12), 1758-1768.
8. Chen, P., Cheng, P. T., Alam, M., Beyer, B. D., Bisacchi, G. S., Dejneka, T., Barrish, J. C. (1996). Aminodiol HIV Protease Inhibitors. Synthesis And Structure-Activity Relationships Of P1/P1' Compounds: Correlation between Lipophilicity and Cytotoxicity. J Med Chem, 39(10), 1991-2007.
9. Ghosh, A. K., Thompson, W. J., McKee, S. P., Duong, T. T., Lyle, T. A., Chen, J. C., Emini, E. A. (1993). 3-Tetrahydrofuran and pyran urethanes as high-affinity P2-ligands for HIV-1 protease inhibitors. J Med Chem, 36(2), 292-294.
10. Ghosh, A. K., Kincaid, J. F., Walters, D. E., Chen, Y., Chaudhuri, N. C., Thompson, W. J., Huff, J. R. (1996). Nonpeptidal P2 ligands for HIV protease inhibitors: structure-based design, synthesis, and biological evaluation. J Med Chem, 39(17), 3278-3290.
11. Hagen, S., Prasad, J. V., & Tait, B. D. (2000). Nonpeptide inhibitors of HIV protease. Adv Med Chem, 5, 159-195.
12. Hagen, S. E., Domagala, J., Gajda, C., Lovdahl, M., Tait, B. D., Wise, E., Brodfuehrer, J. (2001). 4-Hydroxy-5, 6-dihydropyrones as inhibitors of HIV protease: the effect of heterocyclic substituents at C-6 on antiviral potency and pharmacokinetic parameters. J Med Chem, 44(14), 2319-2332.
13. Wilkerson, W. W., Dax, S., & Cheatham, W. W. (1997). Nonsymmetrically substituted cyclic urea HIV protease inhibitors. J Med Chem, 40(25), 4079-4088.
14. Dorsey, B. D., McDonough, C., McDaniel, S. L., Levin, R. B., Newton, C. L., Hoffman, J. M., Vacca, J. P. (2000). Identification of MK-944a: a second clinical candidate from the hydroxylaminepentanamide isostere series of HIV protease inhibitors. J Med Chem, 43(18), 3386-3399.
15. De Lucca, G. V., Liang, J., Aldrich, P. E., Calabrese, J., Cordova, B., Klabe, R. M., Chang, C. H. (1997). Design, synthesis, and evaluation of tetrahydropyrimidinones as an example of a general approach to nonpeptide HIV protease inhibitors. J Med Chem, 40(11), 1707-1719.
16. Reddy, A. S., Pati, S. P., Kumar, P. P., Pradeep, H. N., & Sastry, G. N. (2007). Virtual screening in drug discovery-a computational perspective. Curr Protein Pept Sci, 8(4), 329-351.
17. Srivastava, H. K., Chourasia, M., Kumar, D., & Sastry, G. N. (2011). Comparison of computational methods to model DNA minor groove binders. J Chem Inf Model, 51(3), 558-571.
18. Karelson, M., Lobanov, V. S., & Katritzky, A. R. (1996). Quantum-chemical descriptors in QSAR/QSPR studies. Chem Rev, 96(3), 1027-1044.
19. Parr, R. G. (1983). Density functional theory. Annu Rev Phys Chem, 34(1), 631-656.
20. Padmanabhan, J., Parthasarathi, R., Subramanian, V., & Chattaraj, P. K. (2007). Electrophilicity-based charge transfer descriptor. J Phys Chem A, 111(7), 1358-1361.
21. Srivastava, H. K., Pasha, F. A., Mishra, S. K., & Singh, P. P. (2009). Novel applications of atomic softness and QSAR study of testosterone derivatives. Med Chem Res, 18(6), 455-466.
22. Srivani, P., Usharani, D., Jemmis, E. D., & Sastry, G. N. (2008). Subtype selectivity in phosphodiesterase 4 (PDE4): a bottleneck in rational drug design. Curr Pharma Des, 14(36), 3854-3872.
23. Chattaraj, P. K., & Maiti, B. (2001). Reactivity dynamics in atom-field interactions: a quantum fluid density functional study. J Phys Chem A, 105(1), 169-183.
24.
25. Katritzky, A.R; Lobanov, V.S; Karelson, M. Copyright 1994−1996, CODESSA 2.0, Comprehensive Descriptors for Structural and Statistical Analysis; University of Florida, U.S.A.
26. Parr, R. G., & Zhou, Z. (1993). Absolute hardness: unifying concept for identifying shells and subshells in nuclei, atoms, molecules, and metallic clusters. Acc Chem Res, 26(5), 256-258.
27. Jones, G., Willett, P., Glen, R. C., Leach, A. R., & Taylor, R. (1997). Development and validation of a genetic algorithm for flexible docking. J Mol Biol, 267(3), 727-748.
28. Glide, version 5.0, Schrödinger, LLC, New York, NY, 2008.
29. Scigress Explorer version 7.7; Fujitsu: Tokyo, Japan, 2008.


Chapter 7 The Structural and Functional Diversities of Hexadecahydro-1H-Cyclopenta[a] Phenanthrene Framework

[Chapter overview figure: HHCPF skeleton with atoms numbered 1 to 17. Scaffolds are defined by the presence of double bonds in the HHCPF (where? how many?) and by the substituents at different positions (where? what? how many?; acceptor, donor, hydrophobic, charged, ring, halogen), and are related to target selectivity and ADMET profiles. Scaffolds shown: S0 (14); S1.I1 (33), S1.I2 (9), S1.I3 (1), S1.I4 (1), S1.I5 (1); S2.I1 (20), S2.I2 (2), S2.I3 (2), S2.I4 (1), S2.I5 (1); S3.I1 (20), S3.I2 (2).]

Unity of plan everywhere lies hidden under the mask of diversity of structure; the complex is everywhere evolved out of the simple.
Thomas Henry Huxley, 'A Lobster; or, the Study of Zoology' (1861). In Collected Essays (1894). Vol. 8, 205-6.


7.1 Background

NPs have been one of the most popular sources of lead compounds for drug development for centuries [1-4]. Not only have they been a major source in drug discovery, providing lead compounds approved for clinical use, but they have also inspired the synthesis of new chemical entities [5, 6]. NPs have led to many advances in methodologies for synthesizing analogues of the original lead compound with improved pharmacological or pharmaceutical properties [7-9]. Tremendous effort is being made to modify existing NPs to enhance their drug-like properties [10-15]. NP scaffolds are also well recognized as ‘privileged’ structures because of their ability to bind to multiple targets. Such scaffolds can be used as the basic skeletons of compound libraries created by combinatorial methods; examples include libraries based on alkaloids, polyketides, terpenoids and flavonoids [16-22]. Evans and coworkers originally coined the term ‘privileged structure’ for benzodiazepines and their derivatives, which were found to bind to peripheral benzodiazepine receptors and cholecystokinin receptors apart from their own target benzodiazepine receptors [23]. Most NPs consist of privileged structures, because they need to bind to the proteins that synthesize or metabolize them and also to other proteins to carry out their regular function. Waldmann et al. discuss in their review that NPs probably bind to particular human proteins because of the evolutionary similarity of human proteins to their natural targets [24]. Hence, if the suitable molecular scaffolds for certain families of targets are known, these scaffolds can be used as biologically validated starting points for designing libraries of new compounds against those targets [25].

Computational techniques are being employed in NP research to further advance the identification of new drugs from the NP privileged scaffolds [26]. One of the areas of NP research assisted by computational techniques is database development. Yongye and Medina-Franco recently compiled five NP databases containing 560 and 89000 compounds [27]. The ZINC database contains over 19 million molecules, including major subsets of NPs [28]. Many computational studies report structural diversity analyses of NPs by different approaches, which help to evaluate the structural uniqueness of a group of compounds [29]. One approach employs structural fingerprints such as molecular fragments or pharmacophoric features, and a second approach uses molecular frameworks or scaffolds that represent chemical structures in an intuitive way [30]. Clemons et al., in their 2010 study on the experimental activity of around 15 000 compounds tested on a set of diverse targets, showed that the structural complexity of compounds is associated with increased structural specificity and enhanced target binding [31]. Many chemoinformatics approaches also address the profiling of physicochemical properties of natural compound datasets with respect to rules defining drug- and lead-likeness. Several studies have analysed the distribution of physicochemical properties in different NP databases, for example those by Feher and Schmidt [32], Singh et al. [33] and Medina-Franco et al. [34]. A chemoinformatics analysis by Yongye and Medina-Franco reports quantitative measures that can predict the changes in the binding profiles of compounds as a function of their structural diversity [34]. Such studies are expected to help synthetic chemists design drug-like compound libraries.

The HHCPF considered in this study occurs in most steroidal hormones, such as progesterone, estrogen, testosterone and cortisone, and has been identified as one of the five most common unique privileged scaffolds occurring in the NP space [22]. This tetracyclic molecular framework is also present in cholesterol and cortisol. Compounds with a wide range of substitutions on the HHCPF have been shown to bind to different drug targets, producing different pharmacological effects. Examples include ICI 182,780, a steroidal estrogen antagonist developed at AstraZeneca (Cheshire, United Kingdom) [35, 36]. This compound was found to inhibit the growth of breast and endometrial carcinoma without crossing the blood-brain barrier, and to be neutral with respect to lipids and bone. Banday et al. reported the synthesis of 21-triazolyl derivatives of pregnenolone and their potential antitumor activities on human prostate, colon, liver, lung and CNS cancer cell lines [37]. Yue et al. reported that ginsenosides and Panax ginseng extracts show protective effects on vascular dysfunctions such as hypertension, atherosclerotic disorders and ischemic injury by binding to a group of nuclear steroid hormone receptors [38]. A detailed review by Auci et al. reports the therapeutic effects of androstene hormones and their synthetic derivatives, such as anti-inflammatory effects, anti-glucocorticoid activity, hematopoietic effects, immune regulation, thermogenesis, anti-aging effects and anti-cancer activities [39]. Plesiat et al. performed assays coupled to the influx of hydrophobic probes (3-oxosteroids) to assess the permeabilities of the outer membranes of Gram-negative bacteria [40], suggesting the potential of HHCPF compounds as antibacterials.

Two proteins of M. Tb (PDB IDs: 1NFQ and 1X8V) and one protein of HIV (PDB ID: 1EX4) reported in the Protein Data Bank were also found to bind compounds with the HHCPF scaffold. Hence this framework is a topic of interest in this thesis. In this study, an attempt has been made to analyze the structural and functional diversity of the HHCPF in approved drugs. The effect of the position and number of double bonds in the HHCPF, and of the types of substitutions at each carbon atom, on the target selectivity of the drugs has been studied. This analysis is expected to help the design of new chemical entities that show high selectivity and specificity towards M. Tb and HIV targets.


7.2 Methodology

7.2.1 Preparation of dataset

DrugBank [41, 42] was thoroughly searched for all the drugs that contain the HHCPF. This framework has 17 C atoms and 17 C-C bonds. We generated 17 different structures by systematically replacing each of these 17 C-C bonds, one at a time, by a double bond. Likewise, 136 structures were generated for two double bonds placed at different positions in various combinations. Similarly, all the possible structures containing three, four, five and six double bonds were generated. All these structures were then subjected to a substructure search (exact matches) in DrugBank to find drugs that contain them. All categories of drugs reported in DrugBank, viz., approved, experimental, nutraceutical, illicit and withdrawn, were considered for the study. No hits were obtained when the substructure search was done with query structures having six double bonds. Hence, substructure searches with query structures having more than six double bonds were not performed. The hits obtained were first classified based on the number of double bonds present, and each class was then further categorized based on the positions of the double bonds.
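The enumeration of query structures described above amounts to choosing which bond positions carry a double bond. The sketch below reproduces the counts of 17 and 136 quoted in the text. It takes the number of candidate bond positions (17) directly from the text and assumes that every combination of positions is allowed, with no filtering for valence or symmetry; the names `N_BONDS` and `double_bond_patterns` are illustrative, and the actual structures were of course built as molecules rather than as index tuples.

```python
from itertools import combinations
from math import comb

N_BONDS = 17  # number of C-C bond positions, as counted in the text

def double_bond_patterns(k, n_bonds=N_BONDS):
    """All ways of choosing k of the n_bonds positions to carry a double bond."""
    return list(combinations(range(1, n_bonds + 1), k))

# Number of query patterns for one to six double bonds (unfiltered combinations)
counts = {k: len(double_bond_patterns(k)) for k in range(1, 7)}

one_double_bond = counts[1]    # 17 patterns
two_double_bonds = counts[2]   # 136 patterns
```

Each tuple returned by `double_bond_patterns` lists the bond positions that would be drawn as double bonds in one query structure.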

7.2.2 Targets

Information on the targets (proteins to which a drug binds to produce its pharmacological effect) of all these drugs was collected from DrugBank. Other proteins to which each drug binds, such as carriers, enzymes and transporters, were also collected from DrugBank. The 3D structures of these targets were obtained from the Protein Data Bank (PDB) [43]. Their structural classification was collected from the CATH database. The families to which these targets belong were identified from UniProt [44].


7.2.3 Physicochemical and ADMET properties

Various ADMET properties such as human intestinal absorption (HIA), blood-brain barrier (BBB) penetration, Caco-2 permeability (Caco-2 P), P-glycoprotein substrate (PGPS), renal organic cation transporter (ROCT), CYP450 inhibitory promiscuity (CYP450 IP), Ames test (AT), carcinogenicity (Carcin.), biodegradation (Biodegr.), rat acute toxicity (RAT, LD50) and hERG inhibition (hERG), as well as physicochemical properties such as water solubility (WS), logP, logS, pKa (strongest acidic), pKa (strongest basic), polar surface area (PSA), rotatable bond count (#RB), refractivity (Refr.) and polarizability (Polarz.), were collected from DrugBank.

7.2.4 Docking

Seven different targets, to each of which at least five drugs bind, were chosen for the docking study. These are glucocorticoid receptor (PDB ID: 1M2Z), estrogen receptor (PDB ID: 1A52), androgen receptor (PDB ID: 1E3G), progesterone receptor (PDB ID: 1A28), estrogen receptor-β (PDB ID: 1QKM), steroid delta-isomerase (PDB ID: 3NHX), estradiol 17-β-dehydrogenase 1 (PDB ID: 1A27) and mineralocorticoid receptor (PDB ID: 2AA5). The Glide module of the Schrödinger molecular modeling suite [45] was used for docking the NP/ND drugs to their respective targets.

The crystal structures of these proteins were downloaded from the PDB and prepared for docking using the Protein Preparation Wizard of the Schrödinger molecular modeling suite. Water molecules beyond 5 Å of the active site ligand were removed. Bond orders were corrected, disulfide bridges were assigned and hydrogen atoms were added to the crystal structure. Missing residues and side chains in the crystal structures were filled in using the Prime module. The hydrogen atoms were then minimized using the OPLS 2005 force field. A grid was then generated to define the active site as a cubic box of 12 × 12 × 12 Å³ around the co-crystallized ligand. Molecules were sketched in 3D format with Maestro, and the LigPrep module [46] was used with all default parameters to produce low-energy conformers with the OPLS 2005 force field. The optimized ligands were then docked into the respective active sites of the targets using Glide, first with the default parameters of the standard precision (SP) mode and then in extra-precision (XP) mode. The ligands were kept flexible by sampling ring conformations and by penalizing non-planar amide bond conformations, whereas the receptors were kept rigid throughout the docking studies. Three best energy poses were generated per molecule, and the pose with the lowest energy was used for the analysis of the ligand interactions with the target protein. To validate the pose prediction ability of Glide, the ligands present in the crystal structures were re-docked into the same active sites, and the RMSD between the conformation of each ligand in the original crystal structure and its top scored docking pose was calculated.
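The re-docking validation described above reduces to an RMSD between two sets of atomic coordinates. A minimal sketch of that calculation follows, assuming that both poses list the same atoms in the same order and share the receptor coordinate frame, so that no superposition is applied. The function name `rmsd` and the toy coordinates are hypothetical, and ligand symmetry is not taken into account here.

```python
import math

def rmsd(coords_a, coords_b):
    """In-place RMSD (no superposition) between two poses with identical atom order."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    sq_sum = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(sq_sum / len(coords_a))

# Hypothetical three-atom ligand: crystal pose and a docked pose shifted by 1 Å along x
crystal_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked_pose = [(x + 1.0, y, z) for x, y, z in crystal_pose]

redocking_rmsd = rmsd(crystal_pose, docked_pose)
```

In practice the coordinates would be read from the crystal structure and from the docking output, and usually only heavy atoms are included in the comparison.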

7.2.5 Generation and validation of e-Pharmacophore models

The drug-target complexes, along with the XP energy terms obtained in the previous step, were then subjected to the e-Pharmacophore [47] generation tool of Schrödinger to generate energy based pharmacophore models. HHCPF compounds bound to each of these targets that are absent from our dataset of 110 drugs were collected from the PDB. These compounds were mapped onto the e-Pharmacophore models of the respective targets to validate the performance of the models.

7.3 Results and discussion

The objectives of the study are to understand i) the structural diversity and frequency of occurrence of the HHCPF in the approved drugs, ii) the target binding preferences of scaffolds with different numbers of double bonds present at different positions of the HHCPF, iii) the preferences of different targets for binding certain HHCPF scaffolds, and iv) the effect of a particular type of substitution at a particular position of the scaffold on the target selectivity and on the ADMET and physicochemical properties of the drugs. DrugBank was chosen as the source of the dataset because all the compounds reported in DrugBank are either approved drugs or have at least been shown experimentally to bind specific proteins in mammals, bacteria, viruses, fungi or parasites.

7.3.1 Classification of the NP/ND drugs based on scaffold

Figure 7.1 Classification of HHCPF drugs into various scaffolds based on the number and position of double bonds in the HHCPF skeleton. The 110 drugs fall into the following classes (number of drugs, number of scaffolds): no double bond, S0 (14, 1); one double bond, S1 (45, 5); two double bonds, S2 (26, 5); three double bonds, S3 (22, 2); four double bonds, S4 (1, 1); five double bonds, S5 (2, 1). Drugs per scaffold: S0 (14); S1.I1 (33), S1.I2 (9), S1.I3 (1), S1.I4 (1), S1.I5 (1); S2.I1 (20), S2.I2 (2), S2.I3 (2), S2.I4 (1), S2.I5 (1); S3.I1 (20), S3.I2 (2); S4 (1); S5 (2).

The DrugBank was searched for all the drugs containing HHCPF. HHCPF has 17 C atoms and 17 C-C bonds, which can be single or double bonds (Figure 1). All the possible structures were generated with one double bond placed at different positions of HHCPF.

Similarly all structures with two, three, four, five and six double bonds placed at different positions of the tetra cycle in different combinations were also generated. Then substructure

161

Chapter 7 searches (exact match) were performed against the DrugBank with all these generated structures.

A total of 110 drugs were obtained which contained the query structures as sub-structures.

Among these 110 drugs, 25 are NPs, 79 are ND drugs and 6 are synthetic drugs. The 79 ND drugs were mostly derived from, mimetics of, or inspired by the NPs estradiol, progesterone, testosterone, androstanedione, cortisol (hydrocortisone), cortisone, cholesterol, corticosterone, equilenin and digoxin. These 110 drugs were then classified into groups based on the number and positions of double bonds in the framework. Figure 7.1 shows the 15 scaffolds and the number of drugs containing each of them. First, the drugs were divided into six groups, viz., S0, S1, … S5, based on the number of double bonds present in the HHCPF, and then each of these classes was further grouped as S1.I1, S1.I2 etc. based on the positions of the double bonds in the framework. Thus a total of 15 groups of structures were obtained, each group having a common HHCPF, and we refer to these frameworks as scaffolds. One type of scaffold with no double bonds (S0), five types with one double bond (S1.I1-S1.I5), five types with two double bonds (S2.I1-S2.I5), two types with three double bonds (S3.I1 and S3.I2), one type with four double bonds (S4) and one type with five double bonds (S5) were obtained after the classification. No compounds having six or more double bonds in the HHCPF were found in DrugBank.

Scaffolds with one double bond were found to be the most frequent, constituting 45 out of 110, i.e., 41% of the total drugs. Beyond one double bond, the frequency of occurrence of the scaffolds decreases as the number of double bonds increases, indicating a preference for more flexible scaffolds over rigid ones with a higher number of double bonds in the HHCPF. It was observed that although a large number of positional isomers are possible for scaffolds with the same number of double bonds (all of which were used as queries for the substructure search), only very few of them occur in the NP/ND drugs. For example, 17 positional isomers are possible when one of the 17 bonds of the HHCPF is replaced by a double bond, but only five of them occur in the NP/ND drugs. Similarly, there are 136 possible isomers for scaffolds with two double bonds, but only 5 of them are present in the drugs. As the number of double bonds in the scaffold increases, the number of possible positional isomers increases, but the number of isomers that actually occur in the NP/ND drugs decreases. Even among the different positional isomers with the same number of double bonds, one isomer occurs at a much higher frequency than the others. This shows that nature selects only a few positional isomers among many possibilities.
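The isomer counts quoted above (17 for one double bond, 136 for two) are simple binomial coefficients over the 17 bond positions stated in the text. The following minimal sketch reproduces them and sets them against the scaffold counts transcribed from Figure 7.1. It assumes, as the text does, that any combination of bond positions counts as a possible isomer, so valence or symmetry restrictions are ignored; the names `drugs_per_scaffold`, `n_double_bonds` and `summarize` are illustrative only.

```python
from math import comb

N_BONDS = 17  # bond positions in the HHCPF, as stated in the text

# Number of drugs per scaffold, transcribed from Figure 7.1.
drugs_per_scaffold = {
    "S0": 14,
    "S1.I1": 33, "S1.I2": 9, "S1.I3": 1, "S1.I4": 1, "S1.I5": 1,
    "S2.I1": 20, "S2.I2": 2, "S2.I3": 2, "S2.I4": 1, "S2.I5": 1,
    "S3.I1": 20, "S3.I2": 2,
    "S4": 1,
    "S5": 2,
}


def n_double_bonds(scaffold: str) -> int:
    """'S2.I1' -> 2: the digit after 'S' is the number of double bonds."""
    return int(scaffold.split(".")[0][1:])


def summarize(counts):
    """Per double-bond class: (naive possible isomers, observed isomers, drugs)."""
    summary = {}
    for scaffold, n_drugs in counts.items():
        k = n_double_bonds(scaffold)
        possible, observed, drugs = summary.get(k, (comb(N_BONDS, k), 0, 0))
        summary[k] = (possible, observed + 1, drugs + n_drugs)
    return summary


summary = summarize(drugs_per_scaffold)
total = sum(drugs_per_scaffold.values())
for k in sorted(summary):
    possible, observed, drugs = summary[k]
    print(f"{k} double bonds: {observed}/{possible} isomers observed, "
          f"{drugs} drugs ({100 * drugs / total:.0f}%)")
```

The naive count grows quickly with the number of double bonds, while the number of observed isomers stays at five or fewer, which is the contrast the paragraph above describes.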

7.3.2 Classification of the targets

We obtained the target information for each drug from DrugBank. The 110 drugs bind to a total of 47 targets belonging to 20 protein families. The complete list of these targets and their families, with their short forms, is given in Table 7.1.

Table 7.1 List of targets of the tetracyclic drugs reported in DrugBank with the number of drugs binding to them.

Sl. No. | Target | Family | #
1 | Glucocorticoid Receptor (T19) | Nuclear Hormone Receptor Family (F15) | 25
2 | Estrogen Receptor (T17) | Nuclear Hormone Receptor Family (F15) | 19
3 | Androgen Receptor (T10) | Nuclear Hormone Receptor Family (F15) | 14
4 | Progesterone Receptor (T42) | Nuclear Hormone Receptor Family (F15) | 11
5 | Estrogen Receptor-β (T18) | Nuclear Hormone Receptor Family (F15) | 7
6 | Steroid Delta-Isomerase (T46) | 3-Beta-HSD Family (F1) | 6
7 | Estradiol 17-Beta-Dehydrogenase 1 (T16) | Short-Chain Dehydrogenases/Reductases (SDR) Family (F16) | 5
8 | Sodium/Potassium-Transporting ATPase Alpha-1 Chain (T45) | Cation Transport ATPase (P-Type) (TC 3.A.3) Family (F6) | 4
9 | Mineralocorticoid Receptor (T28) | Nuclear Hormone Receptor Family (F15) | 4
10 | Neuronal Acetylcholine Receptor Subunit Alpha-2 (T32) | Nuclear Hormone Receptor Family (F15) | 4
11 | Muscarinic Acetylcholine Receptor M2 (T29) | G-Protein Coupled Receptor 1 Family (F8) | 3
12 | Nuclear Receptor Coactivator 2 (T36) | SRC/P160 Nuclear Receptor Coactivator Family (F17) | 3
13 | Sulfotransferase Family Cytosolic 2B Member 1 (T47) | Sulfotransferase 1 Family (F18) | 3
14 | Cytochrome P450 19A1 (T15) | Cytochrome P450 Family (F7) | 2
15 | Muscarinic Acetylcholine Receptor M3 (T30) | G-Protein Coupled Receptor 1 Family (F8) | 2
16 | Nuclear Receptor ROR-Alpha (T38) | Nuclear Hormone Receptor Family (F15) | 2
17 | Nuclear Receptor Subfamily 1 Group I Member 3 (T39) | Nuclear Hormone Receptor Family (F15) | 2
18 | Orphan Nuclear Receptor (T40) | Nuclear Hormone Receptor Family (F15) | 2
19 | 3 Beta-Hydroxysteroid Dehydrogenase/Delta 5-->4-Isomerase Type I (T2) | 3-Beta-HSD Family (F1) | 1
20 | Adenylate Cyclase (T7) | Adenylyl Cyclase Class-4/Guanylyl Cyclase Family (F2) | 1
21 | 3-Oxo-5-Alpha-Steroid 4-Dehydrogenase 1 (T3) | Aldo/Keto Reductase Family (F3) | 1
22 | 3-Oxo-5-Beta-Steroid 4-Dehydrogenase (T4) | Aldo/Keto Reductase Family (F3) | 1
23 | Aldo-Keto Reductase Family 1 Member C1 (T8) | Aldo/Keto Reductase Family (F3) | 1
24 | Aldo-Keto Reductase Family 1 Member C3 (T9) | Aldo/Keto Reductase Family (F3) | 1
25 | Carbonic Anhydrase 2 (T1) | Alpha-Carbonic Anhydrase Family (F4) | 1
26 | Annexin A1 (T11) | Annexin Family (F5) | 1
27 | 6-Deoxyerythronolide B Hydroxylase (T6) | Cytochrome P450 Family (F7) | 1
28 | Aromatase (T12) | Cytochrome P450 Family (F7) | 1
29 | Cytochrome P450 17A1 (T14) | Cytochrome P450 Family (F7) | 1
30 | Ig Gamma-1 Chain C Region (T20) | Immunoglobulin (F9) | 1
31 | Ig Gamma-2 Chain C Region (T21) | Immunoglobulin (F9) | 1
32 | Ig Kappa Chain C Region (T22) | Immunoglobulin (F9) | 1
33 | Ig Kappa Chain V-II Region RPMI 6410 (T23) | Immunoglobulin (F9) | 1
34 | 5-Hydroxytryptamine 3 Receptor (T5) | Ligand-Gated Ion Channel (TC 1.A.9) Family (F10) | 1
35 | Neuronal Acetylcholine Receptor Subunit Alpha-3 (T33) | Ligand-Gated Ion Channel (TC 1.A.9) Family (F10) | 1
36 | Microtubule-Associated Protein 1A (T26) | MAP1 Family (F11) | 1
37 | Microtubule-Associated Protein 2 (T27) | MAP2 Family (F12) | 1
38 | Neocarzinostatin (T31) | Neocarzinostatin Family (F13) | 1
39 | Nitric Oxide Synthase, Inducible (T34) | NOS Family (F14) | 1
40 | Nuclear Receptor 0B1 (T35) | Nuclear Hormone Receptor Family (F15) | 1
41 | Pentaerythritol Tetranitrate Reductase (T41) | Nuclear Hormone Receptor Family (F15) | 1
42 | Putative Uncharacterized Protein (T43) | Nuclear Hormone Receptor Family (F15) | 1
43 | Retinoic Acid Receptor RXR-Alpha (T44) | Nuclear Hormone Receptor Family (F15) | 1
44 | Corticosteroid 11-Beta-Dehydrogenase Isozyme 1 (T13) | Short-Chain Dehydrogenases/Reductases (SDR) Family (F16) | 1
45 | Nuclear Receptor Coactivator 5 (T37) | SRC/P160 Nuclear Receptor Coactivator Family (F17) | 1
46 | Lanosterol Synthase (T24) | Terpene Cyclase/Mutase Family (F20) | 1
47 | Lipase 3 (T25) | Type-B Carboxylesterase/Lipase Family (F19) | 1

#: The number of tetracyclic drugs bound to the target


The families of each of the 47 targets were obtained from UniProt. UniProt uses a number of sources, such as protein family databases (Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY, TIGRFAMs etc.), sequence analysis and search tools, and the scientific literature, to assign families to proteins involved in similar functions, considering factors such as end-to-end similarity with other family members and shared organization such as common domain order and topology. Of the 47 targets, 14 belong to the nuclear hormone receptor family (F15). The number of targets belonging to each of the other families is much smaller, e.g., the aldo/keto reductase (F3), cytochrome P450 (F7) and immunoglobulin (F9) families have four targets each. This shows the preferential binding of the HHCPF drugs to a particular family of proteins performing similar functions.

Apart from classifying the targets based on their function, the preference of the HHCPF drugs for targets having a certain type of architecture was also analyzed. Out of the 47 targets, 3D structures have been reported in the PDB for 33. The CATH classification of these 33 targets reveals that 17 (more than 50%) belong to the αβ class, 11 to the mainly α class and five to the mainly β class. Within the αβ class, the targets show all types of architectures, but preferably the 3-layer (αβα) sandwich and αβ barrel architectures. The α proteins are mostly orthogonal bundles and the β proteins have the sandwich architecture. This analysis shows that compounds with the HHCPF bind to targets of all types of secondary structural architecture, but mostly prefer the αβ class of proteins. Figure 7.2 shows the structural (CATH) and family classifications of all the targets of the HHCPF drugs.
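The family and class proportions quoted above can be re-derived by tallying the family column of Table 7.1 and the CATH class counts given in the text. The sketch below does this with the standard library only; the list `FAMILY_BY_ROW` is a transcription of the table's family IDs in row order, and the helper `shares` is an illustrative name rather than part of the original workflow.

```python
from collections import Counter

# Family ID of each of the 47 targets, in the row order of Table 7.1.
FAMILY_BY_ROW = (
    ["F15"] * 5
    + ["F1", "F16", "F6", "F15", "F15", "F8", "F17", "F18", "F7", "F8"]
    + ["F15"] * 3
    + ["F1", "F2"]
    + ["F3"] * 4
    + ["F4", "F5"]
    + ["F7"] * 3
    + ["F9"] * 4
    + ["F10"] * 2
    + ["F11", "F12", "F13", "F14"]
    + ["F15"] * 4
    + ["F16", "F17", "F20", "F19"]
)

# CATH class counts for the 33 targets with a PDB structure (from the text).
CATH_CLASS_COUNTS = {"alpha-beta": 17, "mainly alpha": 11, "mainly beta": 5}


def shares(counts):
    """Convert a mapping of counts into a mapping of fractions of the total."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


family_counts = Counter(FAMILY_BY_ROW)
family_shares = shares(family_counts)
cath_shares = shares(CATH_CLASS_COUNTS)

for family, n in family_counts.most_common(4):
    print(f"{family}: {n} targets ({100 * family_shares[family]:.0f}%)")
for cath_class, share in cath_shares.items():
    print(f"{cath_class}: {100 * share:.0f}% of targets with a PDB structure")
```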

[Figure 7.2: panel A) CATH classes and architectures of the targets (legible labels: mainly beta, sandwich, 5; orthogonal bundle); panel B) distribution of the targets over the functional families F1 to F20.]

Figure 7.2 A) Structural classification (CATH) of the targets of the HHCPF drugs (only the targets that have a 3D structure reported in the PDB), B) Classification of the functional families of the targets. Full names of the families are given in Table 7.1.

7.3.3 Promiscuity of HHCPF scaffolds

In this section, the targets and target families of the drugs containing a particular scaffold were analyzed to understand how specific or how promiscuous a given scaffold is. The frequencies with which the different scaffolds bind to different targets and families of targets are shown in Figure 7.3.

[Figure 7.3: one panel per scaffold, with the number of distinct targets in parentheses where legible: S0 (15), S1.I1 (18), S1.I2 (6), S1.I3 (1), S1.I4 (1), S1.I5 (1), S2.I1 (6), S2.I2 (2), S2.I3 (2), S2.I4, S2.I5, S3.I1 (11), S3.I2 (3), S4 (1), S5 (1).]

Figure 7.3 Frequencies of binding of the 15 HHCPF scaffolds to different targets and families of targets.

It was found that the S1.I1 scaffold binds to 18 different targets, most of which (~75%) belong to the nuclear hormone receptor family (F15). Based on the structural classification (CATH) of the 18 targets, it was observed that the S1.I1 scaffold binds to all three structural classes, i.e., αβ, mainly α and mainly β. At a deeper level of classification, this scaffold binds to the 2-layer and 3-layer (αβα) sandwich, αβ barrel and roll architectures within the αβ class, most frequently to the 3-layer (αβα) sandwich architecture, while among the mainly α proteins it prefers the orthogonal bundle architecture. It also binds to mainly β proteins with the sandwich arrangement. Hence, the S1.I1 scaffold binds to a wide range of targets belonging to various functional families as well as structurally diverse classes, and is the most privileged scaffold among all the scaffolds considered in this study. The S0 scaffold binds to 15 different targets belonging to 7 different families, following a nearly uniform distribution, unlike the S1.I1 scaffold, which has a higher preference for one family. This scaffold also binds to the αβ, mainly α and mainly β types of proteins. The S3.I1 scaffold binds to 11 different targets belonging to 7 functionally diverse families, but ~75% of them are from the F15 family. Analysis of the structural diversity of the targets of the S3.I1 drugs shows that the scaffold mostly binds to the αβ and mainly α proteins. It binds to the 2-layer and 3-layer sandwich architectures in the αβ proteins and to the orthogonal bundle architecture in the mainly α class. For this scaffold, the functional and structural diversity of the targets is lower than for S0 and S1.I1.

The S1.I2 and S2.I1 scaffolds bind to 6 different targets each. The targets of the S1.I2 scaffold belong to 5 different families, with a uniform distribution. This scaffold binds mostly to the αβ and mainly α proteins, with the 3-layer (αβα) sandwich and orthogonal bundle architectures respectively. The targets of the S2.I1 scaffold mostly belong to the F15 family and structurally they are mainly α orthogonal bundles. Hence, the major observation in this section is that the introduction of double bonds at different positions and in different numbers makes the scaffolds more specific for a certain target or target family, and that a scaffold is more selective for a group of targets that perform similar functions than for a particular structural class.
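The distinction drawn above between a scaffold whose targets are concentrated in one family and one whose targets are spread nearly uniformly can be put in numbers. The sketch below is one possible way to do so, not the procedure used in this study: it reports the share of the dominant family and a Shannon evenness score. The function name `family_profile` and both input lists are hypothetical toy data for illustration only.

```python
from collections import Counter
from math import log


def family_profile(families):
    """Summarize how a scaffold's targets are spread over protein families.

    Returns (n_targets, n_families, dominant_family, dominant_share, evenness).
    Evenness is the Shannon entropy divided by its maximum, log(n_families):
    1.0 means a uniform spread, values near 0 mean one family dominates.
    """
    counts = Counter(families)
    n_targets = len(families)
    dominant_family, dominant_n = counts.most_common(1)[0]
    entropy = -sum((n / n_targets) * log(n / n_targets) for n in counts.values())
    evenness = entropy / log(len(counts)) if len(counts) > 1 else 0.0
    return n_targets, len(counts), dominant_family, dominant_n / n_targets, evenness


# Toy inputs (hypothetical, for illustration only; not the thesis data).
focused = ["F15"] * 6 + ["F1", "F16"]                        # one dominant family
spread = ["F15", "F15", "F1", "F1", "F3", "F3", "F7", "F7"]  # uniform over four

for name, families in (("focused", focused), ("spread", spread)):
    n_t, n_f, dom, share, even = family_profile(families)
    print(f"{name}: {n_t} targets, {n_f} families, "
          f"{dom} holds {100 * share:.0f}%, evenness {even:.2f}")
```

A scaffold like S1.I1, with most targets in F15, would resemble the focused case, while S0, with its nearly uniform distribution, would score closer to the spread case.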

7.3.4 Interaction of the scaffolds and substituents of HHCPF drugs with the targets

Apart from understanding the preference of a scaffold for a certain target or target class, the compounds binding to each target were listed to understand whether a target prefers a certain scaffold. Figure 7.4 depicts the scaffold preferences of the individual targets.

[Figure 7.4: one panel per target, each showing the number of bound drugs per scaffold. Panels: glucocorticoid receptor, estrogen receptor, androgen receptor, progesterone receptor, estrogen receptor-β, steroid delta-isomerase, estradiol 17-β-dehydrogenase 1, mineralocorticoid receptor, sodium/potassium-transporting ATPase alpha-1 chain, neuronal acetylcholine receptor subunit alpha-2, nuclear receptor coactivator 2, muscarinic acetylcholine receptor M2, sulfotransferase family cytosolic 2B member 1.]

Figure 7.4 Important targets binding various scaffolds. Only the targets that bind at least 3 compounds are shown. The figure shows the preference of the targets for particular scaffolds.


Out of the 47 targets, 13 were found to bind at least 3 drugs. The glucocorticoid receptor was found to be the major target, binding 25 of the 110 drugs, followed by the estrogen receptor, androgen receptor and progesterone receptor, binding 18, 14 and 11 drugs respectively. It was observed that each of these targets preferably binds a certain scaffold. For example, the glucocorticoid receptor binds a total of 5 different scaffolds but prefers the S2.I1 scaffold, while estrogen receptor-β and nuclear receptor coactivator 2 preferably bind drugs with the S3.I1 scaffold. Similarly, the S0 scaffold is preferred by the sodium/potassium-transporting ATPase alpha-1 chain, neuronal acetylcholine receptor subunit alpha-2 and muscarinic acetylcholine receptor M2 targets.

The presence of certain substituents at different positions of the HHCPF also plays an important role in target binding. Here, the interactions of the 7 important targets, viz., glucocorticoid receptor, estrogen receptor, androgen receptor, progesterone receptor, estrogen receptor-β, steroid delta isomerase and estradiol 17-β-dehydrogenase 1 (which bind at least 5 drugs), with their respective binders are discussed. Figure 7.5 shows the interactions of representative HHCPF drugs with the active sites of the corresponding targets. As observed from the docking results, all the glucocorticoid receptor binders have a substituent at position 11 of the scaffold, which is not observed for the other binders. Most of these substituents are –OH or keto groups, which interact with the N564 residue of the glucocorticoid receptor. The presence of H-bond donor and acceptor groups at position 17 was found to be essential for H-bonding to residues such as N564, Q642, M560, L563, Q570 and W600. In all the glucocorticoid receptor binders except (3β,7β)-cholest-5-ene-3,7-diol, position 3 carries a ketone group, which acts as an H-bond acceptor for the residue R611.

[Figure 7.5: docked poses, one panel per target-drug pair: Glucocorticoid Receptor - Paramethasone; Estrogen Receptor - Fluoxymesterone; Androgen Receptor - 17-Hydroxy-18a-Homo-19-Nor-17alpha-Pregna-4,9,11-Trien-3-One; Progesterone Receptor - Methyltrienolone; Steroid Delta Isomerase - 17-Methyl-17-Alpha-Dihydroequilenin; Estrogen Receptor-β - Estriol; Estradiol 17-β-Dehydrogenase 1 - Equilin.]

Figure 7.5 Interactions of the HHCPF drugs with the active sites of the corresponding HHCPF drug targets. One representative drug is shown for each of the 7 important HHCPF drug targets.


In the case of the estrogen receptor binders, the preference for the S3.I1 scaffold is probably due to the π-π interaction of the aromatic ring of the main scaffold with the residue F404. Diverse substituents were observed at position 3, in contrast to the glucocorticoid receptor binders, where this position is conserved. Drugs with –OH or C=O groups at the 3rd carbon mostly make H-bond interactions with the polar residues E353 and R394. Some compounds, such as conjugated estrogens and estropipate, have sulphate groups at position 3, which make H-bonds with the R394 residue. The H-bond donors and acceptors at position 17 interact with H524 and G521.

The androgen receptor binders show a conserved interaction pattern. As mentioned earlier, the androgen receptor shows a preference for the S1.I1 scaffold. A C=O group is present in all the binders as a conserved feature. This group interacts with the R752 residue as an H-bond acceptor. The –OH group at position 17 is observed to be essential for binding to the residues N705, as an H-bond donor, and T877, as an H-bond acceptor. In fluoxymesterone, an additional –OH group at position 11 interacts with the L704 residue. The addition of an aromatic ring at position 17 in nandrolone phenpropionate was also found to be beneficial, as this ring contributes to ligand binding through π-π interactions with the residue F876.

The progesterone receptor binders also have a conserved C=O group at the third position, which interacts with the R766 and Q725 residues, showing the importance of this substituent. The –OH group present in most of the progesterone receptor binders makes H-bond interactions with the backbone C=O group of the L887 residue in a conserved manner. Hence, this group at position 17 is also important. The estrogen receptor-β binders mostly have –OH groups at the third position, which make H-bonds with the residues R346, L339 and E305. The aromatic rings in most of the drugs (S3.I1 scaffold) interact with the F356 residue. The –OH/C=O groups attached at position 17 of most of the drugs interact with the H475 residue as H-bond donors.

The steroid delta isomerase binders were found to bind to the target by interacting with the Y14, N99 and D38 residues through H-bonds at either position 3 or position 17. Binders of estradiol 17-β-dehydrogenase 1 interact with the target mostly by making H-bonds with the C185, S142 and Y155 residues. In this case also, the H-bond donors/acceptors at the 3rd position play a crucial role in target binding. Overall, we see that the binding of the HHCPF drugs to the targets is mostly governed by the type of scaffold (i.e., the number and position of double bonds in the HHCPF) and the substitutions at positions 3, 11 and 17. The binding is also influenced by the presence or absence of H-bond donors, H-bond acceptors and aromatic rings.
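The position-wise contacts described in this section can be collected into a small lookup table, which makes the recurring role of positions 3 and 17 easy to check. The sketch below covers only the four receptors for which the text assigns residues to specific positions; the dictionary layout and the helper names `shared_positions` and `targets_using` are illustrative assumptions, not part of the original analysis.

```python
# Residues reported in the text as contacting substituents at given positions
# of the HHCPF, for four of the nuclear receptor targets.
CONTACTS = {
    "glucocorticoid receptor": {
        3: ["R611"],
        11: ["N564"],
        17: ["N564", "Q642", "M560", "L563", "Q570", "W600"],
    },
    "estrogen receptor": {3: ["E353", "R394"], 17: ["H524", "G521"]},
    "progesterone receptor": {3: ["R766", "Q725"], 17: ["L887"]},
    "estrogen receptor-beta": {3: ["R346", "L339", "E305"], 17: ["H475"]},
}


def shared_positions(contacts):
    """Scaffold positions that make contacts in every listed target."""
    position_sets = [set(by_position) for by_position in contacts.values()]
    return set.intersection(*position_sets)


def targets_using(contacts, position):
    """Targets for which a contact at the given scaffold position is listed."""
    return sorted(t for t, by_position in contacts.items() if position in by_position)


print("positions used by all four targets:", sorted(shared_positions(CONTACTS)))
print("targets with a position-11 contact:", targets_using(CONTACTS, 11))
```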

7.3.5 e-Pharmacophore models for the common HHCPF drugs

As found in the last section, the HHCPF drugs bind to their respective targets in particular patterns, mostly governed by the type of scaffold and the nature of the substituents present at different positions of the scaffold. Even within a group of closely related protein domains, the shape and orientation of chemical features in the active sites vary substantially, making some of the scaffolds selective for binding. Therefore, the stereochemistry and substitution pattern of the scaffolds chosen for library design play a crucial role in target specificity [24]. We superposed the binding sites of the seven common targets (discussed in the previous section) pairwise, based on their backbones, and found that the RMSDs range from 0.51 to 3.22 Å. Although these targets share significant sequence similarity and overall secondary structure, the spatial arrangements of the chemical features in the active sites required for drug binding differ, as revealed by our docking studies. A drug should have certain types of chemical features as substituents at certain positions of the HHCPF scaffold in order to bind to its target. So, in this section an attempt has been made to capture the target-specific interactions of the HHCPF drugs in the form of e-Pharmacophore models.
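For orientation, comparing seven binding sites with each other gives 21 pairs, and each comparison reduces to an RMSD over matched backbone atoms once the two sites have been superposed. The sketch below shows only that final step, under the assumption that the coordinates are already superposed and listed in matching order; the superposition itself is not reproduced here, and the coordinates in the example are toy values.

```python
from itertools import combinations
from math import sqrt

TARGETS = [
    "glucocorticoid receptor",
    "estrogen receptor",
    "androgen receptor",
    "progesterone receptor",
    "estrogen receptor-beta",
    "steroid delta-isomerase",
    "estradiol 17-beta-dehydrogenase 1",
]


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumes the two point sets are already superposed and that the i-th point
    of one corresponds to the i-th point of the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))


pairs = list(combinations(TARGETS, 2))
print(f"{len(pairs)} pairwise comparisons among {len(TARGETS)} targets")

# Toy coordinates (hypothetical): two atoms, each displaced by 1 Å along z.
site_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
site_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(f"toy RMSD: {rmsd(site_a, site_b):.2f} Å")
```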

[Figure 7.6: one e-Pharmacophore model per target, annotated with the active-site residues associated with each feature. Panels: glucocorticoid receptor, estrogen receptor, androgen receptor, progesterone receptor, steroid delta-isomerase, estrogen receptor-β, estradiol 17-β-dehydrogenase 1.]

Figure 7.6 e-Pharmacophore models proposed for the seven most important HHCPF drug targets. The important active site residues associated with the features are shown.


e-Pharmacophore models were generated for the seven most common HHCPF targets, based on the drug-target interactions obtained from the docking results discussed in the previous section. Figure 7.6 shows the e-Pharmacophore models for these seven targets along with the active site residues associated with each feature. The models consist of six different chemical features, viz., H-bond acceptor (A), H-bond donor (D), hydrophobic sites (H), negative ionizable sites (N), positive ionizable sites (P) and aromatic rings (R). H-bond donors were represented as projected points located at the corresponding H-bond acceptor positions in the binding site. The inter-feature distances can be used as measures to identify target-specific HHCPF compounds. These models would also help to identify the most suitable target(s) for a new set of molecules with HHCPF scaffolds. The predictive abilities of these models were validated by mapping the co-crystallized ligands of the respective targets reported in the PDB that have HHCPF scaffolds and were not included in the model generation. The spatial arrangements and inter-feature distances of each of these models differ from one another in spite of the overall structural similarities among the targets. Hence, these models would be helpful in predicting the target specificity of HHCPF compounds.
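To illustrate how inter-feature distances could serve as a target-specific fingerprint, the sketch below computes all pairwise distances between labelled pharmacophore features and checks whether a candidate reproduces a model's distances within a tolerance. This is a simplified stand-in, not the e-Pharmacophore procedure itself: the feature labels, coordinates and the 1.0 Å tolerance are hypothetical, and features are assumed to be matched by label in advance.

```python
from itertools import combinations
from math import dist


def inter_feature_distances(features):
    """Map each pair of feature labels to the distance between their sites.

    `features` maps a label such as 'A1' to an (x, y, z) coordinate in Å.
    """
    return {
        (a, b): dist(features[a], features[b])
        for a, b in combinations(sorted(features), 2)
    }


def matches(model, candidate, tolerance=1.0):
    """True if the candidate has every model feature and reproduces all of the
    model's inter-feature distances within `tolerance` Å."""
    if not set(model) <= set(candidate):
        return False
    model_d = inter_feature_distances(model)
    candidate_d = inter_feature_distances({k: candidate[k] for k in model})
    return all(abs(model_d[p] - candidate_d[p]) <= tolerance for p in model_d)


# Hypothetical three-feature model: acceptor A1, donor D1, aromatic ring R1.
model = {"A1": (0.0, 0.0, 0.0), "D1": (3.0, 4.0, 0.0), "R1": (0.0, 0.0, 5.0)}
# Same geometry translated by (1, 1, 1): distances are unchanged.
shifted = {k: (x + 1.0, y + 1.0, z + 1.0) for k, (x, y, z) in model.items()}
# Same labels, but the donor sits twice as far from the acceptor.
stretched = {"A1": (0.0, 0.0, 0.0), "D1": (6.0, 8.0, 0.0), "R1": (0.0, 0.0, 5.0)}

print(matches(model, shifted), matches(model, stretched))
```

Because only distances are compared, the check is independent of how the candidate is positioned or rotated in space, which is why a distance-based comparison is convenient for screening.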

7.3.6 Correlation between the number/nature of substituents and the ADMET properties

In this section, the effect of substitution at a given position of a scaffold on the physicochemical and ADMET properties of the drug is analyzed. The ADMET and physicochemical properties of these drugs were collected from DrugBank. The substituents at different positions of the main scaffolds were represented as 6 different features: H-bond acceptors (Acceptor), H-bond donors (Donor), hydrophobic groups (Hydrophobic), charged species (Charged), aromatic rings (Ring) and halogens (Halogen). (A different nomenclature is used here for the substituents to avoid confusion with the features of the e-Pharmacophores described in the previous section.) The number of these features present at each of the 17 positions was determined for each of the HHCPF drugs. Correlation coefficients (R) were calculated between the number of each feature present at each position of the HHCPF and the ADMET and physicochemical properties, for sets of drugs having a common HHCPF scaffold. The R values for the different series of drugs are shown in Table 7.2.

Table 7.2 Correlation between the number of a certain chemical feature at a particular position and the ADMET properties, grouped by scaffold.

Scaffold | Feature | Position | Property | R
S0 | Acceptor | 3 | HIA | -0.93
S0 | Acceptor | 3 | Biodegr. | -0.87
S0 | Acceptor | 3 | PSA | 0.96
S0 | Acceptor | 3 | Refr. | 0.83
S0 | Acceptor | 3 | Polarz. | 0.80
S0 | Acceptor | 17 | HIA | -0.93
S0 | Acceptor | 17 | Caco-2 P | -0.98
S0 | Acceptor | 17 | PGPS | 0.95
S0 | Acceptor | 17 | ROCT | -0.82
S0 | Acceptor | 17 | AT | -0.99
S0 | Acceptor | 17 | Carcin. | 0.86
S0 | Acceptor | 17 | RAT (LD50) | 0.86
S0 | Acceptor | 17 | hERG | 0.93
S0 | Acceptor | 17 | WS (g/l) | -0.80
S0 | Acceptor | 17 | logP | -0.83
S0 | Acceptor | 17 | PSA | 0.96
S0 | Acceptor | 17 | Refr. | 0.97
S0 | Acceptor | 17 | Polarz. | 0.98
S0 | Charged | 2 | HIA | -0.88
S0 | Charged | 2 | BBB | -0.95
S0 | Charged | 2 | Caco-2 P | -0.91
S0 | Charged | 2 | PGPS | 0.90
S0 | Charged | 2 | AT | -0.88
S0 | Charged | 2 | logP | -0.93
S0 | Charged | 2 | PSA | 0.93
S0 | Charged | 2 | Refr. | 0.93
S0 | Charged | 2 | Polarz. | 0.91
S0 | Charged | 16 | HIA | -0.88
S0 | Charged | 16 | BBB | -0.95
S0 | Charged | 16 | Caco-2 P | -0.91
S0 | Charged | 16 | PGPS | 0.90
S0 | Charged | 16 | AT | -0.88
S0 | Charged | 16 | logP | -0.93
S0 | Charged | 16 | PSA | 0.93
S0 | Charged | 16 | Refr. | 0.93
S0 | Charged | 16 | Polarz. | 0.91
S1.I1 | Hydrophobic | 10 | CYP450 IP | 0.70
S1.I2 | Acceptor | 3 | Caco-2 P | -0.92
S1.I2 | Acceptor | 3 | PGPS | -0.85
S1.I2 | Acceptor | 3 | RAT (LD50) | -0.92
S1.I2 | Donor | 3 | Caco-2 P | 0.92
S1.I2 | Donor | 3 | PGPS | 0.85
S1.I2 | Donor | 3 | RAT (LD50) | 0.92
S2.I1 | Acceptor | 17 | HIA | -0.84
S2.I1 | Acceptor | 17 | PSA | 0.89
S2.I1 | Hydrophobic | 16 | Biodegr. | 0.83
S2.I1 | Donor | 17 | logP | -0.89
S2.I1 | Donor | 17 | PSA | 0.88
S3.I1 | Acceptor | 17 | HIA | -0.97
S3.I1 | Donor | 17 | HIA | -0.90
S3.I1 | Donor | 3 | PGPS | -0.82
S3.I1 | Donor | 3 | Carcin. | -0.91

Only correlations with |R| ≥ 0.8 are shown (except that for the S1.I1 scaffold). Full descriptions of the properties are given in the ‘Materials and methods’ section.
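As a reminder of what each R value in Table 7.2 represents, the sketch below computes a correlation coefficient between the number of a given feature at one position and a property value across a series of drugs. It assumes that R is the Pearson product-moment coefficient, which the text does not state explicitly, and the two data series are hypothetical toy values rather than DrugBank data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical toy series for five drugs sharing one scaffold:
# number of Acceptor features at one position vs. a predicted property score.
acceptor_counts = [0, 1, 1, 2, 3]
property_values = [0.9, 0.7, 0.8, 0.5, 0.2]

r = pearson_r(acceptor_counts, property_values)
print(f"R = {r:.2f}")
```

With only a handful of drugs per scaffold, a large |R| can arise from very few data points, which is consistent with the text's own caution that these values are rough estimates.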

It was found that many of the properties correlate strongly (|R| > 0.8) with the number of a certain feature at a particular position. Although this is a very rough estimate, Table 7.2 shows that substitutions, especially Acceptors and Donors at the 3rd and 17th positions of the scaffolds, have a strong effect on the ADMET and physicochemical properties for almost all HHCPF scaffolds.

7.4 Conclusions

This chapter reports a systematic study of the structural and functional diversity of the HHCPF drugs collected from DrugBank. The drugs were classified into different scaffolds based on the number and positions of double bonds in the HHCPF, and the preferences of these scaffolds for different targets were analyzed. As the number of double bonds in the scaffold increases, the number of possible positional isomers increases, but the number of isomers that actually occur in the NP/ND drugs decreases. Even among the different isomers with the same number of double bonds, one isomer occurs at a much higher frequency than the others, indicating nature’s selection of only a few positional isomers among many possibilities. The introduction of double bonds at different positions and in different numbers makes the scaffolds more specific for a certain target or target family, and a scaffold is more selective for a group of targets that perform similar functions than for a particular structural class. The binding is also influenced by the presence or absence of H-bond donors, H-bond acceptors and aromatic rings as substituents at different positions of the HHCPF scaffolds. The number and nature of the substituents present at positions 3 and 17 were found to influence the ADMET and physicochemical properties of the drugs. The overall analyses show that the scaffolds determine the preference of the drugs for a target, while the substituents at various positions of the scaffold are responsible for the binding strength, physicochemical properties and ADMET profiles of the HHCPF drugs.


References

1. Newman, D. J., & Cragg, G. M. (2012). Natural products as sources of new drugs over the 30 years from 1981 to 2010. J Nat Prod, 75(3), 311-335.
2. Newman, D. J., Cragg, G. M., & Snader, K. M. (2000). The influence of natural products upon drug discovery. Nat Prod Rep, 17(3), 215-234.
3. Ganesan, A. (2008). The impact of natural products upon modern drug discovery. Curr Opin Chem Biol, 12(3), 306-317.
4. Harvey, A. L. (2008). Natural products in drug discovery. Drug Discov Today, 13(19), 894-901.
5. Harvey, A. (2000). Strategies for discovering drugs from previously unexplored natural products. Drug Discov Today, 5(7), 294-300.
6. Wilson, R. M., & Danishefsky, S. J. (2006). Small molecule natural products in the discovery of therapeutic agents: the synthesis connection. J Org Chem, 71(22), 8329-8351.
7. Newman, D. J. (2008). Natural products as leads to potential drugs: an old process or the new hope for drug discovery? J Med Chem, 51(9), 2589-2599.
8. Sunazuka, T., Hirose, T., & Ōmura, S. (2008). Efficient total synthesis of novel bioactive microbial metabolites. Acc Chem Res, 41(2), 302-314.
9. Bade, R., Chan, H. F., & Reynisson, J. (2010). Characteristics of known drug space. Natural products, their derivatives and synthetic drugs. Eur J Med Chem, 45(12), 5646-5652.
10. Butler, M. S. (2008). Natural products to drugs: natural product-derived compounds in clinical trials. Nat Prod Rep, 25(3), 475-516.
11. Riva, S. (2001). Biocatalytic modification of natural products. Curr Opin Chem Biol, 5(2), 106-111.
12. Clough, J., Chen, S., Gordon, E. M., Hackbarth, C., Lam, S., Trias, J., Jacobs, J. W. (2003). Combinatorial modification of natural products: synthesis and in vitro analysis of derivatives of thiazole peptide antibiotic GE2270 A: A-ring modifications. Bioorg Med Chem Lett, 13(20), 3409-3414.
13. Shu, Y. Z. (1998). Recent natural products based drug development: a pharmaceutical industry perspective. J Nat Prod, 61(8), 1053-1071.
14. Ortholand, J. Y., & Ganesan, A. (2004). Natural products and combinatorial chemistry: back to the future. Curr Opin Chem Biol, 8(3), 271-280.
15. Boldi, A. M. (2004). Libraries from natural product-like scaffolds. Curr Opin Chem Biol, 8(3), 281-286.
16. Yao, N., Song, A., Wang, X., Dixon, S., & Lam, K. S. (2007). Synthesis of flavonoid analogues as scaffolds for natural product-based combinatorial libraries. J Comb Chem, 9(4), 668-676.
17. Atuegbu, A., Maclean, D., Nguyen, C., Gordon, E. M., & Jacobs, J. W. (1996). Combinatorial modification of natural products: preparation of unencoded and encoded libraries of Rauwolfia alkaloids. Bioorg Med Chem, 4(7), 1097-1106.
18. Gordeev, M. F., Luehr, G. W., Hui, H. C., Gordon, E. M., & Patel, D. V. (1998). Combinatorial chemistry of natural products: Solid phase synthesis of D- and L-cycloserine derivatives. Tetrahedron, 54(52), 15879-15890.
19. Aberle, N. S., Ganesan, A., Lambert, J. N., Saubern, S., & Smith, R. (2001). Parallel modification of tropane alkaloids. Tetrahedron Lett, 42(10), 1975-1977.
20. Nicolaou, K. C., Cho, S. Y., Hughes, R., Winssinger, N., Smethurst, C., Labischinski, H., & Endermann, R. (2001). Solid- and Solution-Phase Synthesis of Vancomycin and Vancomycin Analogues with Activity against Vancomycin-Resistant Bacteria. Chem Eur J, 7(17), 3798-3823.
21. Nicolaou, K. C., Hughes, R., Cho, S. Y., Winssinger, N., Labischinski, H., & Endermann, R. (2001). Synthesis and Biological Evaluation of Vancomycin Dimers with Potent Activity against Vancomycin-Resistant Bacteria: Target-Accelerated Combinatorial Synthesis. Chem Eur J, 7(17), 3824-3843.
22. Grabowski, K., Baringhaus, K. H., & Schneider, G. (2008). Scaffold diversity of natural products: inspiration for combinatorial library design. Nat Prod Rep, 25(5), 892-904.
23. Evans, B. E., Rittle, K. E., Bock, M. G., DiPardo, R. M., Freidinger, R. M., Whitter, W. L., Anderson, P. S. (1988). Methods for drug discovery: development of potent, selective, orally effective cholecystokinin antagonists. J Med Chem, 31(12), 2235-2246.
24. Breinbauer, R., Vetter, I. R., & Waldmann, H. (2002). From protein domains to drug candidates: natural products as guiding principles in the design and synthesis of compound libraries. Angew Chem, 41(16), 2878-2890.
25. Welsch, M. E., Snyder, S. A., & Stockwell, B. R. (2010). Privileged scaffolds for library design and drug discovery. Curr Opin Chem Biol, 14(3), 347-361.
26. Medina-Franco, J. L. (2013). Advances in computational approaches for drug discovery based on natural products. Revista Latinoamericana de Química, 41(2), 95-110.
27. Yongye, A. B., & Medina-Franco, J. L. (2012). Data mining of protein-binding profiling data identifies structural modifications that distinguish selective and promiscuous compounds. J Chem Info Model, 52(9), 2454-2461.
28. Irwin, J. J., & Shoichet, B. K. (2005). ZINC-a free database of commercially available compounds for virtual screening. J Chem Info Model, 177-182.
29. Medina-Franco, J. L. (2012). Interrogating novel areas of chemical space for drug discovery using chemoinformatics. Drug Devel Res, 73(7), 430-438.
30. Brown, N., & Jacoby, E. (2006). On scaffolds and hopping in medicinal chemistry. Mini-Rev Med Chem, 6(11), 1217-1229.
31. Clemons, P. A., Bodycombe, N. E., Carrinski, H. A., Wilson, J. A., Shamji, A. F., Wagner, B. K., Schreiber, S. L. (2010). Small molecules of different origins have distinct distributions of structural complexity that correlate with protein-binding profiles. Proc Nat Acad Sci, 107(44), 18787-18792.
32. Feher, M., & Schmidt, J. M. (2003). Property distributions: differences between drugs, natural products, and molecules from combinatorial chemistry. J Chem Info Comp Sci, 43(1), 218-227.
33. Singh, N., Guha, R., Giulianotti, M. A., Pinilla, C., Houghten, R. A., & Medina-Franco, J. L. (2009). Chemoinformatic analysis of combinatorial libraries, drugs, natural products, and molecular libraries small molecule repository. J Chem Info Model, 49(4), 1010-1024.
34. Medina-Franco, J. L., & Waddell, J. (2012). Towards the bioassay activity landscape modeling in compound databases. J Mex Chem, 56(2), 163-168.
35. Howell, A., Osborne, C. K., Morris, C., & Wakeling, A. E. (2000). ICI 182,780 (Faslodex™). Cancer, 89, 817-825.
36. Howell, A., DeFriend, D. J., Robertson, J. F., Blamey, R. W., Anderson, L., Anderson, E., Walton, P. (1996). Pharmacokinetics, pharmacological and anti-tumour effects of the specific anti-oestrogen ICI 182780 in women with advanced breast cancer. Brit J Cancer, 74(2), 300.
37. Banday, A. H., Shameem, S. A., Gupta, B. D., & Kumar, H. S. (2010). D-ring substituted 1,2,3-triazolyl 20-keto pregnenanes as potential anticancer agents: Synthesis and biological evaluation. Steroids, 75(12), 801-804.
38. Yue, P. Y., Mak, N. K., Cheng, Y. K., Leung, K. W., Ng, T. B., Fan, D. T., Wong, R. N. (2007). Pharmacogenomics and the Yin/Yang actions of ginseng: anti-tumor, angiomodulating and steroid-like activities of ginsenosides. Chinese Med, 2(1), 6-26.
39. Auci, D. L., Ahlem, C., Li, M., Trauger, R., Dowding, C., Paillard, F., Reading, C. L. (2003). The immunobiology and therapeutic potential of androstene hormones and their synthetic derivatives: novel anti-inflammatory and immune regulating steroid hormones. Mod Asp Immunobiol, 3, 64-70.
40. Plesiat, P., & Nikaido, H. (1992). Outer membranes of Gram-negative bacteria are permeable to steroid probes. Mol Microbiol, 6(10), 1323-1333.
41. Wishart, D. S., Knox, C., Guo, A. C., Shrivastava, S., Hassanali, M., Stothard, P., Woolsey, J. (2006). DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res, 34(suppl 1), D668-D672.
42. DrugBank [http://www.drugbank.ca]
43. Protein Data Bank [http://www.rcsb.org/pdb]
44. UniProt [http://www.uniprot.org/]
45. Glide, version 5.8, Schrödinger, LLC, New York, NY, 2012.
46. LigPrep, version 2.5, Schrödinger, LLC, New York, NY, 2012.


Chapter 8 Summary

It is on record that when a young aspirant asked Faraday the secret of his success as a scientific investigator, he replied, 'The secret is comprised in three words— Work, Finish, Publish.' — Michael Faraday J. R. Gladstone, Michael Faraday (1872), 122.


This thesis applies diverse computational methods, including MD simulations, pharmacophore modeling, docking, ADMET property calculations, QSAR and exhaustive data analysis, to explore important targets of M. Tb. and HIV and to develop new strategies for screening potential inhibitors of M. Tb. The work is divided into eight chapters. The thesis begins with an introduction to the existing challenges in anti-TB drug discovery and the co-existence of M. Tb. and HIV. This is followed by an explanation of the computational methods employed in the thesis.

The third chapter delineates the structural and energetic properties of an important M. Tb. drug target, CmaA1, by studying in detail the conformational changes in its active sites during the cyclopropanation reaction. Five representative models of CmaA1, corresponding to different stages of the cyclopropanation process, were studied using MD simulations. Analyses of the MD trajectories provide a detailed account of the structural changes in the active sites of CmaA1. The results show that the apo state of CmaA1 corresponds to a closed conformation in which the CBS is inaccessible because of an H-bond between Pro202 of loop10 (L10) and Asn11 of the N-terminal α1 helix. Cofactor binding breaks this H-bond, which is therefore absent in the holo form. Upon cofactor binding, the hydrophobic side chains orient towards the inner side of the ASBS, creating a hydrophobic environment for the substrate. Opening of L10 allows the cofactor and substrate to approach each other so that the methyl group can be transferred from the cofactor to the substrate. The MD study also revealed that the system tends to regain the apo conformation within 40 ns of releasing the product.
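As an illustration of how the presence or absence of an H-bond such as Pro202-Asn11 can be quantified over a trajectory, the following minimal sketch computes the fraction of frames in which a donor-acceptor pair satisfies simple geometric criteria. The cutoffs (3.5 Å and 120°), the function name and the per-frame values are assumptions made for this example; they are not the criteria or the data used in the thesis.

```python
from typing import Sequence


def hbond_occupancy(
    distances: Sequence[float],
    angles: Sequence[float],
    max_dist: float = 3.5,    # assumed donor-acceptor distance cutoff (Angstrom)
    min_angle: float = 120.0,  # assumed donor-H-acceptor angle cutoff (degrees)
) -> float:
    """Fraction of frames in which both geometric criteria are met."""
    if len(distances) != len(angles):
        raise ValueError("distances and angles must have one value per frame")
    if not distances:
        return 0.0
    formed = sum(
        1 for d, a in zip(distances, angles) if d <= max_dist and a >= min_angle
    )
    return formed / len(distances)


# Made-up per-frame values, for illustration only (not data from the thesis).
apo_dist = [2.9, 3.0, 3.1, 3.4]
apo_angle = [160.0, 155.0, 150.0, 130.0]
holo_dist = [5.2, 6.0, 3.3, 7.1]
holo_angle = [100.0, 90.0, 140.0, 80.0]

print(hbond_occupancy(apo_dist, apo_angle))    # 1.0
print(hbond_occupancy(holo_dist, holo_angle))  # 0.25
```

A high occupancy in the apo trajectory and a low one in the holo trajectory would correspond to the closed and open states described above.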

In the fourth chapter, the MD trajectories of the model systems were used to generate structure based and ligand based pharmacophore models. The performance of these pharmacophore models was validated by mapping 23 molecules previously reported to inhibit CmaA1. The models were further validated by comparing the pharmacophore mapping results with those obtained from docking these molecules to the respective protein structures. On the basis of screening ability and consistency with the docking results, five models are proposed. The models generated from the MD trajectories performed better than the one generated from the crystal structure, demonstrating the importance of incorporating receptor flexibility in drug design.

In the fifth chapter, five structure based and five ligand based pharmacophore models generated from the MD snapshots were used in a virtual screening protocol. With the aim of repurposing existing drugs to inhibit CmaA1, 6583 drugs reported in DrugBank were considered for screening. To find compounds that inhibit multiple targets of M. Tb as well as HIV, we also chose 701 and 11089 compounds, collected from the ChEMBL database, that show activity in the range below 1 μM on M. Tb and HIV cell lines, respectively. Thus, a total of 18 239 compounds were screened against CmaA1 using four levels of screening, i.e., ligand based pharmacophore screening, structure based pharmacophore screening, docking and ADMET filters. Twelve compounds were identified as potential hits for CmaA1 at the end of the fourth step. These compounds were found to interact with the key active site residues of CmaA1.
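The four-level screening described above can be pictured as a sequential filter funnel in which each stage only sees the survivors of the previous one. The sketch below is a schematic illustration only: the `Compound` fields, the threshold values and the example compounds are hypothetical and do not reproduce the scoring functions or cutoffs used in the thesis.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Compound:
    name: str
    ligand_fit: float      # hypothetical ligand based pharmacophore fit (0-1)
    structure_fit: float   # hypothetical structure based pharmacophore fit (0-1)
    docking_score: float   # hypothetical docking score (more negative is better)
    admet_ok: bool         # hypothetical outcome of the ADMET filters


Stage = Tuple[str, Callable[[Compound], bool]]

# Four stages in the order given in the text; all thresholds are assumptions.
STAGES: List[Stage] = [
    ("ligand based pharmacophore", lambda c: c.ligand_fit >= 0.5),
    ("structure based pharmacophore", lambda c: c.structure_fit >= 0.5),
    ("docking", lambda c: c.docking_score <= -7.0),
    ("ADMET filters", lambda c: c.admet_ok),
]


def run_funnel(
    compounds: Sequence[Compound], stages: Sequence[Stage] = STAGES
) -> Tuple[List[Compound], List[Tuple[str, int]]]:
    """Apply the stages in order and record how many compounds survive each."""
    survivors = list(compounds)
    counts = [("input", len(survivors))]
    for label, keep in stages:
        survivors = [c for c in survivors if keep(c)]
        counts.append((label, len(survivors)))
    return survivors, counts


# Made-up library, for illustration only.
library = [
    Compound("A", 0.9, 0.8, -8.5, True),
    Compound("B", 0.3, 0.9, -9.0, True),
    Compound("C", 0.7, 0.4, -8.0, True),
    Compound("D", 0.8, 0.9, -5.0, True),
    Compound("E", 0.6, 0.7, -7.5, False),
]

hits, counts = run_funnel(library)
for label, n in counts:
    print(f"{label}: {n}")
print([c.name for c in hits])  # ['A']
```

Ordering the cheaper pharmacophore filters before docking keeps the number of compounds passed to the more expensive step small, which is the usual motivation for a funnel of this kind.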

The sixth chapter employs an analogue based approach to critically examine the role of conceptual DFT descriptors and docking scores on a diverse set of 156 HIV protease inhibitors. Six QSAR models were developed based on available experimental IC50 values (HIV-I and HIV-IIIB infected MT4 and CEMSS cells and HIV-I infected C8166 cells). B3LYP/6-31G(d) optimizations were carried out on all the protease inhibitors considered, and the results were compared with those from the more economical semi-empirical SCF AM1 method in order to find the best and most efficient way of calculating descriptors. Interestingly, the semi-empirical results appear to be satisfactory for this class of inhibitors. Selected QSAR models were validated by placing about 20% of the inhibitors in the test sets. Models based on 3-4 orthogonal descriptors were selected as the optimum ones to avoid over-correlation. A systematic comparison of conventional descriptors generated by CODESSA, conceptual DFT descriptors and docking scores was carried out. Their ability to generate statistically significant QSAR models reveals the prominence of conventional and C-DFT descriptors compared to docking scores.
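The hold-out validation mentioned above (about 20% of the compounds kept aside as a test set) can be sketched as follows. For brevity the example fits a single-descriptor least-squares line rather than the 3-4 descriptor models of the thesis; the splitting rule (every fifth compound goes to the test set), the function names and the data are assumptions made purely for illustration.

```python
from typing import List, Sequence, Tuple


def holdout_split(items: Sequence, every: int = 5) -> Tuple[List, List]:
    """Put every `every`-th item in the test set (about 20% when every=5)."""
    train = [x for i, x in enumerate(items) if i % every != every - 1]
    test = [x for i, x in enumerate(items) if i % every == every - 1]
    return train, test


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination of predictions against observations."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up (descriptor, activity) pairs, for illustration only.
data = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 7.0), (4, 8.8),
        (5, 11.1), (6, 13.0), (7, 14.9), (8, 17.2), (9, 19.0)]

train, test = holdout_split(data)
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
test_pred = [slope * x + intercept for x, _ in test]
print(len(train), len(test))                      # 8 2
print(r_squared([y for _, y in test], test_pred))  # predictive r^2 on the test set
```

A model that fits the training set well but gives a low value on the held-out compounds would be a sign of the over-correlation that the restriction to a few orthogonal descriptors is meant to avoid.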

The seventh chapter presents an exhaustive analysis of the 110 NP and ND drugs reported in DrugBank that contain the HHCPF scaffold, to understand how substitutions at different positions of the scaffold influence their target binding and ADMET/physicochemical properties. The substituents present at 17 different positions of the scaffold were classified into six features, viz., H-bond donors, H-bond acceptors, aromatic rings, hydrophobic, charged and halogen groups. Good correlations (R > 0.8) exist between the number of such features present at certain positions of the scaffold and the ADMET/physicochemical properties of the HHCPF drugs.
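A correlation of the kind quoted above can be checked with the Pearson coefficient between a per-drug feature count and a computed property. The sketch below is illustrative only: the feature counts and property values are made up, and the thesis does not state that this exact procedure was used.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Made-up example: number of hydrophobic substituents at one scaffold position
# versus a hypothetical lipophilicity value for five drugs.
hydrophobic_count = [0, 1, 2, 3, 4]
lipophilicity = [1.0, 2.1, 2.9, 4.2, 5.0]

r = pearson_r(hydrophobic_count, lipophilicity)
print(r > 0.8)  # True for this made-up data
```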

Discovering new chemical entities that tackle the drug resistant mutants of M. Tb and its co-occurrence with HIV is a topic of profound academic and social interest. Detailed structural investigation of new targets and exhaustive drug design strategies are extremely important, and in this regard the effective integration of in silico approaches with experiment is indispensable. CmaA1 is one of the most important drug targets of M. Tb. The current study provides details on the precise conformational changes occurring in the active site residues of CmaA1 during multiple stages of the cyclopropanation cycle. Knowledge of the key residues responsible for cofactor binding is essential for designing new inhibitors of CmaA1. Further, the importance of considering the flexibility of the receptor active site while screening new inhibitors has been delineated by generating and validating dynamic pharmacophore models. The subsequent chapter employs a novel VS strategy to identify twelve potential inhibitors of CmaA1. The identified compounds include FDA approved drugs that can be considered for repositioning, compounds that can inhibit multiple targets of M. Tb, and compounds that would inhibit CmaA1 along with HIV targets. The robust QSAR study presented in chapter 6, which includes about 1000 models, investigates the role of a wide range of descriptor classes in predicting the bioactivities of HIV protease inhibitors. HHCPF is one of the important and privileged NP scaffolds that bind to a wide range of targets. Compounds with HHCPF scaffolds bound to some of the M. Tb and HIV targets have been reported in the PDB. An exhaustive analysis of the structural and functional diversity of the HHCPF drugs and their target binding specificities has been performed and is presented in chapter 7. The observations of this study are expected to help the design of new HHCPF compounds that show high selectivity towards the targets of M. Tb and HIV.

Overall, the thesis presents a novel VS strategy that considers receptor flexibility, drug repurposing and polypharmacology, along with the discovery of dual inhibitors of M. Tb and HIV, in line with the current need of anti-TB drug discovery to deal with drug resistance. The active site dynamics analysis presented here can be applied to explore a wide range of M. Tb. and HIV drug targets. We expect that the current study will trigger prospective experimental investigations of M. Tb inhibitory activities. We have been in active consultation with experimental groups to synthesize and screen the designed compounds.


List of Publications

Journal articles

1. Srivastava H. K., Choudhury C., Sastry G. N., The efficacy of conceptual DFT descriptors and docking scores on the QSAR models of HIV protease inhibitors Med. Chem. (2012) 8, 811-825.

2. Kurumurthy C., Sambasiva Rao P., Veeraswamy B., Santhosh Kumar G., Shanthan Rao P., Choudhury C., Narsaiah B., A facile and single pot strategy for the synthesis of novel naphthyridine derivatives under microwave irradiation conditions using ZnCl2 as catalyst, evaluation of AChE inhibitory activity, and molecular modeling studies Med. Chem. Res. (2012) 21, 1785-1795.

3. Choudhury C., Priyakumar, U. D., Sastry G. N., Molecular Dynamics Investigation of the Active Site Dynamics of Mycobacterial Cyclopropane Synthase during Various Stages of the Cyclopropanation Process. J. Struct. Biol. (2014) 187, 38-48.

4. Choudhury C., Priyakumar, U. D., Sastry G. N., Dynamics Based Pharmacophore Models for Screening Potential Inhibitors of Mycobacterial Cyclopropane Synthase. J. Chem. Inf. Model. (2015) 55, 848-860.

5. Choudhury C., Sastry G. N., The Structural and Functional Diversities of Hexadecahydro-1H-Cyclopenta[a]Phenanthrene Framework. Communicated to Mol. Info. (Under review)

6. Choudhury C., Priyakumar, U. D., Sastry G. N., Dynamic Ligand Based Pharmacophore Modeling and Virtual Screening to Identify Mycobacterial Cyclopropane Synthase Inhibitors. Communicated to J. Mol. Modell. (Under review)

7. Reddy K. B., Kumar B. V., Rajan K. S., Choudhury C., Sastry G. N., Synthesis and Evaluation of Novel Benzimidazol-4-Carboxamides as Potent Inhibitors of poly (ADP-ribose) polymerase-1 (PARP-1) enzyme (communicated)

Book Chapter

8. Badrinarayan P., Choudhury C., Sastry G. N., Molecular modeling in the book entitled "Systems and Synthetic Biology (S2B2)" Ed. by P. K. Dhar and V. Singh, Springer Press, (2014) 93-128.

Poster and oral presentations

1. “Structure and analogue based approaches on HIV protease inhibitors”; Chinmayee Choudhury, H. K. Srivastava and G. N. Sastry in “A.P. Science Congress and Annual convention of A. P. Akademi of Sciences”, 18 – 20 November 2010, JNTU, Hyderabad.

2. “The efficacy of conceptual DFT descriptors and docking scores on the QSAR models of HIV protease inhibitors”; Chinmayee Choudhury, H. K. Srivastava and G. N. Sastry in “Applied Theory on Molecular Systems (ATOMS)” 2-5 November, 2011, IICT, Hyderabad.

3. “QSAR Models: The Game and Gambling of Choosing Descriptors”; Chinmayee Choudhury, H. K. Srivastava and G. N. Sastry in “Modelling Chemical and Biological (Re)activity (MCBR3)”, 26 February – 3 March, 2013, NIPER and IISER-Mohali, Mohali.

4. “Insights into the Active Site Dynamics of Mycobacterial Cyclopropane Synthase through Molecular Dynamics Simulations” Chinmayee Choudhury, U. D. Priyakumar and G. N. Sastry in International Conference on Chemical Biology: Disease Mechanisms and Therapeutics (ICCB), 6-8 February, 2014, IICT, Hyderabad.

5. “Active Site Dynamics of Mycobacterial Cyclopropane Synthase” Chinmayee Choudhury, U. D. Priyakumar and G. N. Sastry in “Molecular Modeling and Informatics in Drug Design (M2ID2)” 3-6 November, 2014, NIPER-Mohali, Mohali. (Best poster award)