A New Ligand-Based Approach to Virtual Screening and Profiling of Large Chemical Libraries
Elisabet Gregori Puigjané

Dissertation submitted for the degree of Doctor in Biology at the Universitat Pompeu Fabra. This doctoral thesis was carried out under the supervision of Dr. Jordi Mestres at the Departament de Ciències Experimentals i de la Salut of the Universitat Pompeu Fabra.

Jordi Mestres
Elisabet Gregori Puigjané

Barcelona, May 2008

The research in this thesis has been carried out at the Chemogenomics Laboratory (CGL) within the Unitat de Recerca en Informàtica Biomèdica (GRIB) at the Parc de Recerca Biomèdica de Barcelona (PRBB).

The research carried out in this thesis has been supported by Chemotargets S. L.

Table of contents

Acknowledgements
Abstract
Objectives
List of publications
Part I – INTRODUCTION
    Chapter I.1. Drug discovery
        I.1.1. Obtaining a drug candidate
            I.1.1.1. Hit identification
            I.1.1.2. Hit-to-lead
            I.1.1.3. Lead optimization
        I.1.2. Beyond the one drug – one target paradigm
    Chapter I.2. In silico drug discovery
        I.2.1. Classification schemes
        I.2.2. Annotated chemical libraries
        I.2.3. Molecular descriptors
        I.2.4. Quantitative structure-activity relationships
        I.2.5. Virtual ligand screening
        I.2.6. Virtual target profiling
        I.2.7. Chemical library design
        I.2.8. Network pharmacology
    Chapter I.3. Conclusions and Outlook
Part II – DISCUSSION
    Chapter II.1. Future directions of research
        II.1. Chemical structure hopping
        II.2. Systems chemical biology
Part III – PUBLICATIONS
    Chapter III.1. Molecular descriptors and ligand-based virtual screening
    Chapter III.2. Virtual target profiling
    Chapter III.3. Targeted library design
    Chapter III.4. Network pharmacology
    Chapter III.5. ViSCA
Conclusions
References

Acknowledgements

I would like to thank all those who contributed professionally to this thesis, and especially my supervisor Jordi Mestres, who has been a great teacher and very supportive throughout. I would also like to thank my colleagues in the Chemogenomics Lab (Rut, Lulla, Montse, Ricard, Xavi, Ferran, Miguel Ángel, Praveena and Núria) and in Chemotargets (David). I also thank Ramon Aragüés and Fabien Fontaine, who were very patient and helped me a lot at the beginning of my thesis. Last, but not least, many thanks to those who helped me at the beginning of my scientific career and made me realize I wanted to do research: Jordi Villà, Isma Zamora and Manolo Pastor.

Abstract

The representation of molecules by means of molecular descriptors is the basis of most computational tools for drug design. These methods abstract the chemical structure to summarize its relevant features while remaining efficient enough to compare large molecule libraries. A very important property of these descriptors is their ability to capture the information relevant for the interaction with any target independently of the scaffold of the compound. This allows any two compounds that have the same features arranged in the same way around essentially different scaffolds to be detected as similar, a property referred to as scaffold hopping. With this in mind, a new set of descriptors, called SHED, has been developed. They are based on the distribution of atom-centred pharmacophoric feature pairs, quantified by means of the information theory concept of Shannon entropy [1]. These descriptors have been successfully used in a number of applications important in the drug discovery process.
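As an illustration of the general idea behind entropy-based descriptors of this kind, the following is a minimal Python sketch and not the SHED implementation itself. It assumes a toy molecule given as an adjacency list, exactly one pharmacophoric feature label per atom, bond-count (topological) distances, and the natural logarithm; the function names, feature labels and the four-atom chain are all hypothetical and chosen only for illustration.

```python
from collections import Counter, deque
from itertools import combinations
from math import log


def shortest_path_lengths(adjacency, start):
    """Breadth-first search: bond-count distance from `start` to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a histogram given as a list of counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def feature_pair_entropies(adjacency, features):
    """Entropy of the path-length distribution for each unordered feature pair."""
    dists = {atom: shortest_path_lengths(adjacency, atom) for atom in adjacency}
    histograms = {}
    for a, b in combinations(sorted(adjacency), 2):
        if b not in dists[a]:
            continue  # atoms in disconnected fragments have no path length
        pair = tuple(sorted((features[a], features[b])))
        histograms.setdefault(pair, Counter())[dists[a][b]] += 1
    return {pair: shannon_entropy(list(h.values())) for pair, h in histograms.items()}


# Hypothetical four-atom chain 0-1-2-3 with one assumed feature label per atom.
toy_adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
toy_features = {0: "donor", 1: "hydrophobic", 2: "hydrophobic", 3: "acceptor"}

if __name__ == "__main__":
    for pair, entropy in sorted(feature_pair_entropies(toy_adjacency, toy_features).items()):
        print(pair, round(entropy, 4))
```

In this sketch, a feature pair whose occurrences all sit at a single path length gets an entropy of zero, while a pair spread evenly over several path lengths gets a higher value. The resulting vector of per-pair entropies depends on how features are distributed rather than on the identity of the scaffold, which is the intuition the abstract appeals to.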
After the implementation of novel in vitro technologies like high-throughput screening and combinatorial chemistry, the capacity to synthesize and test compounds increased exponentially, but the need for a rational selection of the compounds arose as well. The prioritisation of compounds in terms of their predicted chances of displaying the targeted activity is thus one of the first applications of ligand-based virtual screening with SHED descriptors. This application has shown very good results, both in terms of enrichment of actives in the hit list and in terms of scaffold hopping ability, i.e. the novelty of the scaffolds of the actives found among the top-ranked compounds. This methodology can also be extended to a chemogenomics view of the drug discovery process, using the descriptors to build ligand-based models of all the proteins for which any ligand information is available. This broader approach, virtual target profiling, is a step towards completing the activity matrix between all possible chemical compounds and all relevant targets. Moreover, a deeper analysis of the complete matrix generated by virtual target profiling can lead to a network pharmacology perspective of the drug discovery process. This direction can be pursued further by adding pathway and systems-level information to the ligand-target information, leading to a systems chemical biology approach that could help in understanding biological processes as a whole and in identifying novel and promising drug targets more rationally.

Objectives

The main objectives of this PhD thesis can be summarized as follows:

1. To develop a new set of topological, feature-based descriptors (SHED).
2. To develop a ligand-based approach to in silico chemical screening and target profiling by exploiting pharmacological data extracted from bibliographic sources and stored in annotated chemical libraries.
3. To implement a new approach to design chemical libraries directed to entire protein families.
4. To further exploit ligand – protein information to analyse and understand biological processes in a systems-directed approach.

The first objective was addressed by implementing a set of descriptors based on Shannon entropy called SHED (SHannon Entropy Descriptors) (see Chapter III.1). The second objective was pursued by developing a ligand-based method for small molecule – protein activity prediction (see Chapter III.1 and Chapter III.2). The third objective was tackled by applying SHED-based models to predict the target family profile of large compound libraries and to select the compounds with the most appropriate profile for the project of interest (see Chapter III.3). The fourth objective consists of using all the previously generated information to provide a global picture that will help in understanding the modulation of biological pathways and processes by small molecules (see Chapter III.4).

List of publications

Articles:

• Gregori-Puigjané E, Mestres J: SHED: Shannon entropy descriptors from topological feature distributions. J Chem Inf Model 2006, 46:1615–1622.
• Mestres J, Martin-Couce L, Gregori-Puigjané E, Cases M, Boyer S: Ligand-based approach to in silico pharmacology: nuclear receptor profiling. J Chem Inf Model 2006, 46:2725–2736.
• Gregori-Puigjané E, Mestres J: Coverage and bias in chemical library design. Curr Opin Chem Biol 2008, doi:10.1016/j.cbpa.2008.03.015.
• Gregori-Puigjané E, Mestres J: A ligand-based approach to mining the chemogenomic space of drugs. Comb Chem High Throughput Screen 2008 (In press).
• Mestres J, Gregori-Puigjané E, Valverde S, Solé RV: The effect of data completeness on drug-target interaction networks. Nature Biotech 2008 (In press).

Book chapters:

• Gregori-Puigjané E, Mestres J: Designing chemical libraries directed to