Methods to Improve Virtual Screening of Potential Drug Leads for Specific Pharmacodynamic and Toxicological Properties
Total Page:16
File Type:pdf, Size:1020Kb
METHODS TO IMPROVE VIRTUAL SCREENING OF POTENTIAL DRUG LEADS FOR SPECIFIC PHARMACODYNAMIC AND TOXICOLOGICAL PROPERTIES LIEW CHIN YEE (B.Sc. (Pharm.) (Hons.), NUS) A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY DEPARTMENT OF PHARMACY NATIONAL UNIVERSITY OF SINGAPORE 2011 Acknowledgments My deepest appreciation to my graduate advisor, Asst. Prof. Yap Chun Wei, for his patience, encouragement, assistance, and counsel throughout my Ph.D. study. To my dearest, Peter Lau, thank you for your insightful discussions, strength and care. I thank Prof. Chen Yu Zong, BIDD group members, and the Centre for Computational Science & Engineering for the resources provided. I am very grateful to the National University of Singapore for the reward of research scholarship, and to Assoc. Prof. Chan Sui Yung, Head of Pharmacy Department, for the kind provision of opportunities, resources and facilities. I am also appreciative of my Ph.D. commit- tee members and examiners for their insights and recommendations to improve my research. In addition, I acknowledge the financial assistance of the NUS start-up grant (R-148-000-105-133). My appreciation to Yen Ching for her help in the hepatotoxicity project. Also to Pan Chuen, Andre Tan, Magneline Ang, Hui Min, Xiong Yue, and Xiaolei for their contributions to the projects on ensemble of mixed features, it was fun and enlightening being their mentor. To my family, thank you for the support and understanding. Thank you PHARMily mem- bers and friends for the company and advice. – Chin Yee i Contents Acknowledgmenti Contents ii Summary vii List of Tables viii List of Figuresx List of Publications xii Glossary xiii 1 Introduction1 1.1 Drug Discovery & Development . .1 1.2 Complementary Alternative . .2 1.3 Current Challenges . .3 1.3.1 Small Data Set and Lack of Applicability Domain . .4 1.3.2 OECD QSAR Guidelines . .6 1.3.3 Unavailability of Model for Use . .7 1.4 Objectives . .8 1.5 Significance of Projects . .9 1.6 Thesis Structure . 10 2 Methods and Materials 12 2.1 Introduction to QSAR . 12 2.2 Data Set . 13 2.2.1 Data curation . 14 2.2.2 Sampling . 15 2.2.3 Description of Molecules . 15 2.2.4 Feature Selection . 16 2.2.5 Determination of Structural Diversity . 17 2.3 Modelling . 17 2.3.1 k-Nearest Neighbour . 18 2.3.2 Logistic Regression . 19 ii CONTENTS 2.3.3 Na¨ıve Bayes . 19 2.3.4 Random Forest and Decision Trees . 20 2.3.5 Support Vector Machine . 22 2.4 Applicability Domain . 24 2.5 Model Validation . 25 2.5.1 Internal and External Validation . 25 2.6 Performance Measures . 26 I Data Augmentation 28 3 Introduction to Putative Negatives 29 4 Lck Inhibitor 32 4.1 Summary of Study . 32 4.2 Introduction to Lck Inhibitors . 32 4.3 Materials and Methods . 34 4.3.1 Training Set . 34 4.3.2 Modelling . 35 4.3.3 Model Validation . 35 4.3.4 Evaluation of Prediction Performance . 36 4.4 Results . 37 4.4.1 Data Set Diversity and Distribution . 37 4.4.2 Applicability Domain . 38 4.4.3 Model Performances . 38 4.5 Discussions . 40 4.5.1 Cutoff Value for Lck Inhibitory Activity . 40 4.5.2 Putative Negative Compounds . 41 4.5.3 Predicting Positive Compounds Unrepresented in Training Set . 42 4.5.4 Evaluation of SVM Model Using MDDR . 42 4.5.5 Comparison of SVM Model with Logistic Regression Model . 43 4.5.6 Challenges of Using Putative Negatives . 43 4.5.7 Application of SVM model for Novel Lck Inhibitor Design . 46 4.6 Conclusion . 47 5 PI3K Inhibitor 48 5.1 Summary of Study . 48 5.2 Introduction to PI3Ks . 48 5.3 Materials and Methods . 49 5.3.1 Training Set . 49 5.3.2 Modelling . 51 5.3.3 Model Validation . 51 5.4 Results . 52 5.4.1 Data Set Diversity and Distribution . 52 iii CONTENTS 5.4.2 Model Performances . 53 5.5 Discussions . 53 5.6 Conclusion . 55 II Ensemble Methods 57 6 Introduction to Ensemble Methods 58 7 Ensemble of Algorithms 61 7.1 Combining Base Classifiers . 61 7.2 Materials and Methods . 61 7.2.1 Training Set . 61 7.2.2 Modelling . 61 7.2.3 Applicability Domain . 62 7.2.4 Model Validation and Screening . 62 7.2.5 Evaluation of Prediction Performance . 62 7.2.6 Identification of Novel Potential Inhibitors . 62 7.3 Results . 63 7.3.1 Data Set Diversity and Distribution . 63 7.3.2 Applicability Domain . 64 7.3.3 Model Performances . 64 7.3.4 Inhibitors versus Noninhibitors: Molecular Descriptors . 65 7.4 Discussions . 67 7.4.1 The Model . 67 7.4.2 Application of Model for Novel PI3K Inhibitor Design . 68 7.5 Conclusion . 70 8 Ensemble of Features 71 8.1 Summary of Study . 71 8.2 Introduction to Reactive Metabolites . 71 8.3 Materials and Methods . 73 8.3.1 Training Set . 73 8.3.2 Molecular Descriptors . 74 8.3.3 Modelling . 75 8.4 Results . 76 8.4.1 Effects of Performance Measure for Ranking . 76 8.4.2 Effects of Consensus Modelling . 77 8.5 Discussions . 79 8.5.1 Quality of Base Classifiers . 79 8.5.2 Performance Measure for Ranking . 80 8.5.3 Ensemble Compared with Single Classifier . 80 8.5.4 Model for Use . 81 8.6 Conclusion . 84 iv CONTENTS 9 Ensemble of Algorithms and Features 85 9.1 Summary of Study . 85 9.2 Introduction to DILI . 85 9.3 Materials and Methods . 87 9.3.1 Training Set . 87 9.3.2 Validation Sets . 88 9.3.3 Molecular Descriptors . 90 9.3.4 Performance Measures . 90 9.3.5 Modelling . 90 9.3.6 Base Classifiers Selection . 92 9.3.7 Y-randomization . 94 9.4 Results . 95 9.4.1 Hepatic Effects Prediction . 95 9.4.2 Applicability Domain . 100 9.4.3 Y-randomization . ..