Molecular Similarity and Xenobiotic Metabolism
Samuel Edward Adams
Trinity College
University of Cambridge

This dissertation is submitted for the degree of Doctor of Philosophy

Preface

This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text. The dissertation does not exceed the word limit for the Degree Committee.

Copyright © 2010 Samuel Edward Adams

This work is licensed under a Creative Commons Attribution-Share Alike 2.0 UK: England & Wales License. This means that you are free:
  to copy, distribute, display, and perform the work
  to make derivative works
Under the following condition:
  Attribution. You must give the original author credit.
  Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a licence identical to this one.
For any reuse or distribution, you must make clear to others the licence terms of this work. Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights. To view the full text of this license, visit http://www.creativecommons.org; or, send a letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco, California, 94105, USA.

Summary

MetaPrint2D, a new software tool implementing a data-mining approach for predicting sites of xenobiotic metabolism, has been developed. The algorithm is based on a statistical analysis of the occurrences of atom-centred circular fingerprints in both substrates and metabolites.
This approach has undergone extensive evaluation and been shown to be of comparable accuracy to current best-in-class tools, but is able to make much faster predictions, for the first time enabling chemists to explore the effects of structural modifications on a compound’s metabolism in a highly responsive and interactive manner. MetaPrint2D is able to assign a confidence score to the predictions it generates, based on the availability of relevant data and the degree to which a compound is modelled by the algorithm.

In the course of the evaluation of MetaPrint2D, a novel metric for assessing the performance of site-of-metabolism predictions has been introduced. This overcomes the bias introduced by molecule size and the number of sites of metabolism, which is inherent to the metrics most commonly used to evaluate site-of-metabolism predictions.

This data-mining approach to site-of-metabolism prediction has been augmented by a set of reaction type definitions to produce MetaPrint2D-React, enabling prediction of the types of transformations a compound is likely to undergo and the metabolites that are formed. This approach has been evaluated against both historical data and metabolic schemes reported in a number of recently published studies. Results suggest that the ability of this method to predict metabolic transformations is highly dependent on the relevance of the training set data to the query compounds.

MetaPrint2D has been released as an open-source software library, and both MetaPrint2D and MetaPrint2D-React are available for chemists to use through the Unilever Centre for Molecular Science Informatics’ website.

Acknowledgements

Firstly, I would like to thank my supervisor Professor Robert Glen for giving me the opportunity to undertake these studies, and for all of his help and support throughout the course of my research.
My thanks go to Dr Scott Boyer and the members of his Computational Toxicology group at AstraZeneca, Mölndal, for their welcome and the help they have given me, in particular Lars Carlsson. I would also like to thank Ola Spjuth of Uppsala University for his assistance in working with Bioclipse.

I am grateful to all the members of the Unilever Centre for Molecular Science Informatics for making my time there so interesting and enjoyable. Particular thanks have to go to Charlotte and Phil for keeping the computers working and to Susan and Emma for keeping the centre running!

Finally, I would like to express my gratitude to those who have supported me and borne with me during the writing of this thesis.

This work was funded by Boehringer Ingelheim and Unilever.

Contents

Preface
Summary
Acknowledgements
Contents
1. Introduction
    1.1 The drug discovery process
    1.2 The role of computational methods
    1.3 Virtual screening methods
    1.4 Current challenges and developments
2. Prediction of xenobiotic metabolism
    2.1 Introduction
    2.2 Effects of metabolism
    2.3 Mechanisms of metabolism
    2.4 Predicting xenobiotic metabolism
    2.5 Conclusion
3. Development of MetaPrint2D: a tool for predicting sites of xenobiotic metabolism
    3.1 Substrate/Product Occurrence Ratio Calculator
    3.2 Development of MetaPrint2D
    3.3 The Symyx® Metabolite database
    3.4 MetaPrint2D’s implementation
    3.5 Software availability
4. Evaluation and optimization of MetaPrint2D
    4.1 Reaction centre identification
    4.2 Pre-processing of Symyx® Metabolite data
    4.3 Evaluating metabolic site predictions
    4.4 Evaluation of MetaPrint2D and the effects of data pre-processing options
    4.5 Analysis of MetaPrint2D’s performance
    4.6 Speed of predictions
    4.7 Parameterization of MetaPrint2D
    4.8 Isoform specific models
    4.9 Comparison with other tools
    4.10 Accuracy of the test data
    4.11 Conclusions
5. Extension of MetaPrint2D to the prediction of transformation types and the generation of metabolites
    5.1 Introduction
    5.2 Identifying transformations
    5.3 Predicting transformations
    5.4 Generating product structures
    5.5 User interface
    5.6 Evaluation