Development of Virtual Screening and in Silico Biomarker Identification Model for Pharmaceutical Agents
DEVELOPMENT OF VIRTUAL SCREENING AND IN SILICO BIOMARKER IDENTIFICATION MODEL FOR PHARMACEUTICAL AGENTS

ZHANG JINGXIAN
(B.Sc. & M.Sc., Xiamen University)

A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF PHARMACY
NATIONAL UNIVERSITY OF SINGAPORE
2012

Declaration

I hereby declare that this thesis is my original work and it has been written by me in its entirety. I have duly acknowledged all the sources of information which have been used in the thesis. This thesis has also not been submitted for any degree in any university previously.

Zhang Jingxian

Acknowledgements

First and foremost, I would like to express my sincere and deep gratitude to my supervisor, Professor Chen Yu Zong, who has given me excellent guidance and invaluable advice and suggestions throughout my PhD study at the National University of Singapore. Prof. Chen gave me a lot of help and encouragement in my research as well as in my job hunting in the final year. His inspiration, enthusiasm and commitment to scientific research greatly encouraged me to become a research scientist. I would like to express my appreciation to him and give my best wishes to him and his loving family.

I am grateful to our BIDD group members for their insightful suggestions and collaboration in my research work: Dr. Liu Xianghui, Dr. Ma Xiaohua, Dr. Jia Jia, Dr. Zhu Feng, Dr. Liu Xin, Dr. Shi Zhe, Mr. Han Bucong, Ms Wei Xiaona, Mr. Guo Yangfang, Mr. Tao Lin, Mr. Zhang Chen, Ms Qin Chu and other members. I sincerely thank them for their support of my research. It is a great honor to be a member of BIDD, which is like a big family. The great passion and success of our BIDD group inspire me the most.

I would also like to thank Prof. Yap Chun Wei and Prof. Guo Meiling for devoting their time as my QE examiners. I would like to thank Prof.
Ji Zhiliang, my Master's supervisor, for his great encouragement and help during my study in Xiamen and for his continued support during my PhD study and job hunting. I would like to thank Dr. Liu Xianghui for his great effort in guiding my research and for his warm invitations to his home. I would like to give my best wishes to him and his happy family. I would like to thank Dr. Wei Xiaona and Dr. Han Bucong for their continuing encouragement and help in my research; I would also like to give my best wishes for their future. I would also like to thank Mr. Wang Li, Mr. Li Fang, Mr. Wang Zhe and Mr. Patel Dhaval Kumar for their help in my study in pharmacy; I wish them a great future after graduation.

Lastly, I would like to thank my parents and my wife Gao Shizhen for their great care for me all the time.

Zhang Jingxian, 2012

Table of Contents

Acknowledgements
Table of Contents
Summary
List of Tables
List of Figures
List of Acronyms

Chapter 1 Introduction
  1.1 Cheminformatics in drug discovery
  1.2 Cheminformatics and bioinformatics resources
  1.3 Virtual screening of pharmaceutical agents
    1.3.1 Structure-based and ligand-based virtual screening
    1.3.2 Machine learning methods for virtual screening
    1.3.3 Virtual screening for subtype-selective pharmaceutic agents
  1.4 Bioinformatics tools in biomarker identification
  1.5 Objectives and outline

Chapter 2 Methods
  2.1 Datasets
    2.1.1 Data Collection
    2.1.2 Quality analysis
  2.2 Molecular descriptors
    2.2.1 Definition and generation of molecular descriptors
    2.2.2 Scaling of molecular descriptors
  2.3 Statistical machine learning methods in ligand-based virtual screening
    2.3.1 Support vector machines method
    2.3.2 K-nearest neighbor method
    2.3.3 Probabilistic neural network method
    2.3.4 Tanimoto similarity searching methods
    2.3.5 Combinatorial SVM method
    2.3.6 Two-step Binary relevance SVM method
  2.4 Statistical machine learning methods model evaluations
    2.4.1 Model validation and parameters optimization
    2.4.2 Performance evaluation methods
    2.4.3 Overfitting
  2.5 Feature reduction methods in biomarker identification
    2.5.1 Data normalization
    2.5.2 Recursive features elimination SVM

Chapter 3 A two-step Target Binding and Selectivity Support Vector Machines Approach for Virtual Screening of Dopamine Receptor Subtype-Selective Ligands
  3.1 Introduction
  3.2 Method
    3.2.1 Datasets
    3.2.2 Molecular representations
    3.2.3 Support vector machines
    3.2.4 Combinatorial SVM method
    3.2.5 Two-step Binary relevance SVM method
    3.2.6 Multi-label K nearest neighbor method
    3.2.7 The random k-labelsets decision tree method
    3.2.8 Virtual screening model development, parameter determination and performance evaluation
    3.2.9 Determination of similarity level of a compound against dopamine ligands in a dataset
    3.2.10 Determination of dopamine receptor subtype selective features by feature selection method
  3.3 Results and discussion
    3.3.1 5-fold cross-validation tests
    3.3.2 Applicability domains of the developed SVM VS models
    3.3.3 Prediction performance on dopamine receptor subtype selective and multi-subtype ligands
    3.3.4 Virtual screening performance in searching large chemical libraries
    3.3.5 Dopamine receptor subtype selective features
    3.3.6 Virtual screening performance of the two-step binary relevance SVM method in searching estrogen receptor subtype selective ligands
  3.4 Conclusion