High-Throughput Prediction and Analysis of Drug-Protein Interactions in the Druggable Human Proteome
Virginia Commonwealth University
VCU Scholars Compass
Theses and Dissertations, Graduate School
2018

High-throughput prediction and analysis of drug-protein interactions in the druggable human proteome

Chen Wang
Virginia Commonwealth University

Part of the Bioinformatics Commons
© The Author
Downloaded from https://scholarscompass.vcu.edu/etd/5509

This Dissertation is brought to you for free and open access by the Graduate School at VCU Scholars Compass. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of VCU Scholars Compass.

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Virginia Commonwealth University.

by Chen Wang
Ph.D., Department of Computer Science
May 17, 2018

Supervisor: Lukasz Kurgan
Ph.D., Qimonda Endowed Professor, Department of Computer Science

Virginia Commonwealth University, Richmond, Virginia
May 2018

Acknowledgment

First, I would like to express my deep gratitude to my supervisor Dr. Lukasz Kurgan for all his continuous motivation, guidance, support, encouragement, enthusiasm, knowledge, and time that have helped me over the last few years to become a better researcher. Dr. Kurgan is the best advisor for directing my development in work and also the best mentor for answering my concerns in life.

I am grateful to the members of my dissertation committee, Dr. Tomasz Arodz, Dr. Michal Brylinski, Dr. Krzysztof Cios, and Dr. Vladimir Uversky. Their constructive suggestions and inspiring comments helped me to clarify my understanding of the project and improve the quality of this work.

I would like to thank other members and visitors of the BioMine lab for their collaboration and support.
Thanks to my friends and all the people who helped me to complete my PhD journey.

Last, but not least, I would like to thank my family for their unconditional love and endless encouragement. Thanks to our parents for taking care of their kids and our kid. Thanks to my marvelous wife Xiao for everything. Also thanks to our son Jonathan for being cute and bringing happiness and surprise to our life.

Dedication

This dissertation is dedicated to my wife, our parents, and our son.

Preface

This dissertation is an original work conducted by Chen Wang. It includes materials and results from the following publications and submitted works.

[1] Wang C, Hu G, Wang K, Brylinski M, Xie L, Kurgan LA. PDID: database of molecular-level putative protein-drug interactions in the structural human proteome. Bioinformatics, 32(4): 579-586, 2016.
[2] Wang C, Kurgan LA. Survey of similarity-based prediction of drug-protein interactions. Revision submitted to the Current Medicinal Chemistry journal on April 15, 2018.
[3] Wang C, Kurgan LA. Review and comparative assessment of similarity-based methods for prediction of drug-protein interactions in the druggable human proteome. Submitted to the Briefings in Bioinformatics journal on May 1, 2018.

Chapter 3 of this dissertation is based on Ref [1]. Materials from Chapters 4 and 5 have been submitted for publication as Ref [2] and [3], respectively.

Table of Contents

Table of Contents
List of Tables
List of Figures
Abstract
Chapter 1 Introduction
  1.1 Goals of the dissertation
Chapter 2 Background on drug-protein interactions
  2.1 Protein sequence and structure
  2.2 Drug and drug-protein interaction
  2.3 Protein structure-based methods
  2.4 Similarity-based methods
Chapter 3 Evaluation of protein structure-based predictions of drug-protein interactions and development and deployment of protein-drug interaction database
  3.1 Motivation to develop a novel database
  3.2 Considered protein structure-based predictors
    3.2.1 FINDSITE and eFindSite
    3.2.2 SMAP
    3.2.3 ILbind
  3.3 Assessment of predictive quality
  3.4 Protein-drug interaction database
    3.4.1 Contents and availability
    3.4.2 User interface
  3.5 Conclusions
Chapter 4 Review of the similarity-based predictions of drug-protein interactions
  4.1 Overview of recent surveys
  4.2 Selection and overview of similarity-based DPI predictors
  4.3 Source databases
    4.3.1 Timeline and impact
    4.3.2 Data contents
    4.3.3 Relationships between source databases
    4.3.4 Other drug-target interaction databases
  4.4 Similarity-based predictors
    4.4.1 Categorization of predictors
    4.4.2 Timeline, impact, and availability
    4.4.3 Internal databases
    4.4.4 Predictive models
    4.4.5 Timeline of the use of similarities and their ensembles
    4.4.6 Critique of the current assessments of predictive performance
  4.5 Conclusions
Chapter 5 Empirical assessment and comparative analysis of similarity-based methods for prediction of drug-protein interactions
  5.1 Experimental setup
    5.1.1 Computation of similarities
    5.1.2 Prediction of drug-protein interactions
    5.1.3 Benchmark database of drug-protein interactions
    5.1.4 Assessment of predictive performance
  5.2 Empirical analysis of predictive performance
    5.2.1 Assessment of predictive performance at the drug-protein interaction level
    5.2.2 Assessment of predictive performance at drug level
  5.3 Sensitivity of predictive performance to characteristics of predictors
    5.3.1 Sensitivity to intrinsic characteristics
    5.3.2 Sensitivity to extrinsic