Predicting the Structure and Energetics of Protein-Ligand Interaction
A Dissertation presented to the Faculty of the Graduate School at the University of Missouri
In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy
by
Zhiwei Ma
Dr. Xiaoqin Zou, Dissertation Supervisor
DECEMBER 2019

The undersigned, appointed by the Dean of the Graduate School, have examined the dissertation entitled:
Predicting the Structure and Energetics of Protein-Ligand Interaction
presented by Zhiwei Ma,
a candidate for the degree of Doctor of Philosophy,
and hereby certify that, in their opinion, it is worthy of acceptance.

Chair: Dr. Xiaoqin Zou, Physics
Dr. Jianlin Cheng, Computer Science
Dr. Shi-jie Chen, Physics
Dr. Ioan Kosztin, Physics
Dr. Gavin M. King, Physics

DEDICATION
This dissertation is dedicated to my parents.

ACKNOWLEDGMENTS
First, I would like to express my deepest gratitude to my Ph.D. advisor, Dr. Xiaoqin Zou, who has mentored and supported me throughout my Ph.D. Dr. Zou has devoted a great deal of time and energy to helping me grow from a new graduate student with little research experience in biophysics into a researcher. Dr. Zou is my role model not only in academia but also as a kind and decent person. She is always full of energy in her work, patient with her students, and enthusiastic in everyday life. I still remember the cold winter of 2014 when I arrived in Columbia; since then, Dr. Zou has supported me through all the hard times. What has encouraged me most is her trust and confidence in my ability. Her warmth made this place a second home, and I will carry it with me in my dreams wherever I am.

Second, I would like to thank my committee members, Dr. Jianlin Cheng, Dr. Shi-jie Chen, Dr. Ioan Kosztin, and Dr. Gavin M. King, for their precious time and support. In our critical discussions, they showed me the discerning ways in which scientists think.
Their passion for science and their advice will continue to encourage me in my research for the rest of my life.

I extend my appreciation to all my current and previous lab members, Dr. Sheng-You Huang, Dr. Sam Grinter, and Benjamin Merideth, for their help and advice. I thank Dr. Chengfei Yan and Erica Hroblak for their great and selfless help when I first arrived; I treasure and appreciate their friendship. I thank Dr. Xianjin Xu, Dr. Rui Duan, and Dr. Liming Qiu for their guidance and help with my research; I have also enjoyed fruitful collaborations with them. I thank Dr. Shuang Zhang and Dr. Zhe Wang for their company in the last year of my Ph.D. study; they helped me enjoy simple pleasures, as in my undergraduate days. I would like to thank all my good friends in Columbia, Jialu Yan, Dr. Milica Utjesanovic, Chenhan Zhao, and Yuanzhe Zhou. I am really lucky to have met them in this beautiful and peaceful town.

Last but not least, I thank Dr. Bao-xing Li of Hangzhou Normal University for his inspiration, and my parents and my whole family for their understanding and support.

TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
1 Introduction
1.1 Physical Basis of Molecular Docking
1.1.1 Protein-ligand interaction
1.1.2 Enthalpy-Entropy Compensation
1.1.3 Molecular recognition models
1.2 Key Components in Ligand-Protein Docking
1.2.1 Conformational sampling
1.2.2 Sampling algorithms
1.2.3 Scoring functions
1.3 Docking Programs
1.4 Structure-based virtual screening and inverse virtual screening
1.5 Challenges
Bibliography
2 Drug Discovery against Diverse Kinases: Predicting Selectivity and Virtual Screening
2.1 Introduction
2.2 Materials and Methods
2.2.1 Ensemble docking algorithm to simultaneously address multiple targets
2.2.2 Scoring function
2.2.3 Construction of the reference protein
2.2.4 Docking algorithm
2.2.5 Test data sets
2.2.6 Database preparation
2.3 Results
2.3.1 Evaluation of the dock algorithm and scoring function
2.3.2 The ligand selectivity
2.3.3 Virtual Database Screening
2.3.4 Computational efficiency
2.4 Discussion and Conclusion
Bibliography
3 EDockMS: An Efficient Docking Platform for Multiple Target Screening
3.1 Introduction
3.2 Implementation
3.3 Discussion and perspectives
3.4 ACKNOWLEDGEMENT
Bibliography
4 Predicting Protein Ligand Binding Modes for CELPP and GC3: Workflows and Insight
4.1 Introduction
4.2 Materials and Methods
4.2.1 The query protein-ligand complexes
4.2.2 Overview of our binding mode prediction methods
4.2.3 Protein preparation
4.2.4 Ligand preparation and similarity calculation
4.2.5 Binding mode prediction
4.3 Results and Discussion
4.3.1 CELPP
4.4 Conclusion
4.5 ACKNOWLEDGEMENT
Bibliography
Supplementary
.1 EDockMS Implementation Details Related to Chapter 3
.1.1 Server interface
.1.2 Database preparation
.1.3 Docking Protocol
.1.4 Clustering procedure
.1.5 Example and validation of the approach
Bibliography
VITA

LIST OF TABLES
1.1 Example scoring functions
2.1 The 14 protein kinases for testing the ensemble docking algorithm. The PDB codes of the corresponding ligand-bound complexes are listed in the right column, with the ligand IDs in parentheses. For each kinase, the structure in bold font is the one used to represent it in the ensemble docking calculations.
2.2 A ranked list of the 14 protein kinases according to the binding energies computed by ensemble docking for seven typical inhibitors. The true targets of each inhibitor are colored gray.
2.3 The energy scores of Gleevec binding to 4 homologous tyrosine kinases and to the 14 protein kinases in our ensemble docking test set. The true targets are colored gray.
4.1 The results of binding mode prediction for the CELPP targets
S1 Table S1, related to Chapter 3. EDockMS results of progesterone.
S2 Table S2, related to Chapter 3. Target entries in MDTD.
S3 Table S3, related to Chapter 3. Benchmark for validation.
S4 Table S4, related to Table 4.1. The results of binding mode prediction for the CELPP targets based on the Vina score. The first row (Bound) lists the results of docking with the bound protein structures, the last row (hiSHAFTS) gives the results of docking with the user-specified protein structures, and the other five rows show the results of docking with the candidate protein structures provided by CELPP. The error of each value was estimated by bootstrapping with 1000 replicates and is reported in parentheses.

LIST OF FIGURES
1.1 Protein Data Bank growth statistics [14]: number of structures deposited per year vs. the total number of available structures.
1.2 Illustration of ligand-protein docking. Docking algorithms generate a variety of complex configurations. The right panel shows the surface representation of a protein binding pocket, with the ligands in different orientations.
1.3 Illustration of the three conceptual protein-ligand interaction models together with one extended interaction model: (a) the lock-and-key model, in which both protein and ligand are rigid; (b) the induced-fit model, in which the protein undergoes a conformational change; (c) the conformational selection model, in which the ligand binds to the most suitable conformation among an ensemble of protein conformations; (d) the extended conformational selection model, in which the ligand first binds to one protein conformer and the protein then undergoes a further conformational change. To focus on protein flexibility, the figure does not show ligand flexibility.
1.4 Stick representation of a sugar, with elements in different colors: C (olive), O (red), N (blue). The glycosidic torsion angles φ, ψ, and ω are annotated; C and O denote carbon and oxygen, respectively. The labeled atoms define the torsion angles φ (O5-C1-Ox-Cx), ψ (C1-Ox-Cx-Cx−1), and ω (O6-C6-C5-O5).
1.5 Illustration of the dihedral angles in one amino acid residue of a protein: the two backbone dihedral angles φ and ψ, and one side-chain dihedral angle χ. Elements are shown in different colors: C (grey), O (red), and N (blue). Hydrogen atoms are not shown. The backbone of this amino acid is colored green.
2.1 The success rates of binding mode prediction for 200 complexes under different criteria when the top-ranked conformation was considered: RMSD < 1.0, 1.5, 2.0, 2.5, and 3.0 Å, respectively.
2.2 The 2D structures of selected protein kinase inhibitors for ligand selectivity analysis. The ligand IDs and their corresponding PDB codes are shown in the lower-left corner of each cell. This figure was drawn with Marvinsketch, ChemAxon Ltd., http://www.chemaxon.com/.
2.3 (A) The 2D structure of staurosporine. (B) The 2D structure of Gleevec. (C) The binding pocket conformations of PDK1 kinase (sea green) and GSK3β (forest green) bound with staurosporine. (D) Superimposition of the conformations of Gleevec bound to the tyrosine kinase domains of Abl (red), c-Kit (orange), Lck (green), and c-Src (blue). Gleevec is shown in ball-and-stick representation.
2.4 The efficient factors of the ligands in discriminating the true targets from the non-targets predicted by the ensemble docking algorithm.