Uncovering Hidden Patterns of Molecular Recognition
Total Page:16
File Type:pdf, Size:1020Kb
UNCOVERING HIDDEN PATTERNS OF MOLECULAR RECOGNITION By Sebastian Raschka A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Biochemistry and Molecular Biology – Doctor of Philosophy Quantitative Biology – Dual Major 2017 ABSTRACT UNCOVERING HIDDEN PATTERNS OF MOLECULAR RECOGNITION By Sebastian Raschka It happened in 1958 that John Kendrew’s group determined the three-dimensional structure of myoglobin at a resolution of 6 Å. This first view of a protein fold was a breakthrough at that time. Now, more than half a century later, both experimental and computational techniques have substantially improved as well as our understanding of how proteins and ligands interact. Yet, there are many unanswered questions to be addressed and patterns to be uncovered. One of the most pressing needs in structural biology is the prediction of protein-ligand complexes in aiding inhibitor and drug discovery, ligand design, and studies of catalytic mechanisms. Throughout the past few decades, improvements in computational technologies and insights from experimental data have converged into numerous protein-ligand docking and scoring algorithms. However, these methods are still far from being perfect, and only minimal improvements have been made in the past few years. That might be because current scoring functions regard individual intermolecular interactions as independent events in a binding interface. This thesis addresses existing shortcomings in the conventional view of protein-ligand recog- nition by characterizing interactions as patterns. Finding that binding rigidifies protein-ligand complexes has led to our design of a robust scoring function that predicts native protein-ligand complexes through the coupling of interactions that rigidifies the protein-ligand interface. Also, the analysis of a non-homologous set of protein-ligand complexes has revealed that binding interfaces are polarized – surprisingly, proteins donate twice as many hydrogen bonds to ligands as they accept, on average, and the opposite is true for ligands. A more in-depth analysis of atom type distributions among H-bond donor and acceptor atoms showed that the discovered trends contain surprisingly strong patterns that are also predictive of native protein-ligand binding. Both the coupling of interactions as well as the distribution of hydrogen bond patterns are currently not captured by other methods and provide new information for the prediction and design of ligands. In the absence of the protein receptor structure, our results show that data from experimental assays can be mined to identify functional group patterns on ligands that are predictive of biological activity. Additionally, we present methods to use functional group patterns to improve the success rate of ligand-based virtual screening. Applied to G protein-coupled receptor inhibitor discovery, this approach has led to the discovery of a potent inhibitor that nullifies the biological response and presents the first instance where virtual screening has been used for aquatic invasive species control. Finally, to overcome current challenges in drug discovery for protein-protein interfaces, a new method for identifying small molecules that block protein-protein interactions is presented. We developed and applied an epitope-based virtual screening workflow to find inhibitors of focal adhesion kinase interactions involved in cancer metastasis. In sum, this work presents both novel insights into the coupling among and trends in intermolec- ular interactions as well as methods to predict the biological activity of ligands based on patterns of functional groups. Along with the insights gained in this work, computational tools and software for measuring the rigidification that is characteristic of native protein-ligand complexes, analyzing H-bond patterns rigorously, and screening millions of small molecules in hypothesis-driven ligand discovery have been developed and are now being made available to other scientists. Copyright by SEBASTIAN RASCHKA 2017 This thesis is dedicated to my mom, dad, and sister. Thank you for always believing in me. v ACKNOWLEDGEMENTS First and foremost, I would like to thank my advisor, Dr. Leslie A. Kuhn, for all her guidance throughout my Ph.D. studies and the opportunity to work on many interesting and exciting projects in her lab. My Ph.D. studies have been the biggest adventure of my life thus far – something I will always remember as one of the most profound experiences of my life. I would also like to thank my guidance committee, Dr. Michael Feig, Dr. David N. Arnosti, Dr. C. Titus Brown, and Dr. Jian Hu. Having such a great committee really helped me with staying on track throughout the last five years, and I appreciate all the helpful advice, constructive critique, and encouraging words. Special thanks go to Jessica Lawrence, Becky Conat Mansel, and Jeannine Lee, who helped me navigate through grad school. Knowing that someone was always there to help with various forms and paperwork was invaluable. I would also like to say thanks to Dr. John J. LaPres, Dr. Jon M. Kaguni, and Dr. R. Michael Garavito, who recruited me into the BMS, BMB, and Quantitative Biology programs, and who have done a remarkable job making those programs a very worthwhile experience. A warm thanks goes to Vahid Mirjalili for being such a good friend throughout these years at MSU: exploring the Michigan rivers on kayaks, traveling to conferences, being witness to my friend’s wedding, and collaborating on countless different projects – it was always fun! I am grateful to my grandparents, uncle, sister, mom, and dad, who have always been there for me with moral support. Although I am sad and very sorry that I missed countless birthday parties and other celebrations, I am very grateful to have such a great family who always understood and always believed in me. Last but not least, I would like to thank all my friends and fellow students at MSU. Thank you for five amazing years! vi TABLE OF CONTENTS LIST OF TABLES ....................................... xi LIST OF FIGURES ....................................... xii CHAPTER 1 INTRODUCTION ............................... 1 CHAPTER 2 PROTEIN-LIGAND INTERFACES ARE POLARIZED: DISCOVERY OF A STRONG TREND FOR INTERMOLECULAR HYDROGEN BONDS TO FAVOR DONORS ON THE PROTEIN SIDE WITH IMPLICATIONS FOR PREDICTING AND DESIGNING LIGAND COMPLEXES ...... 5 2.1 Abstract . 6 2.2 Introduction . 6 2.3 Methods . 9 2.3.1 Dataset . 9 2.3.2 Protonation . 13 2.3.3 Optimization of proton orientation . 14 2.3.4 Influence of partial charges . 15 2.3.5 Hbind software . 15 2.3.6 Identification of ligand H-bonding patterns . 16 2.3.7 Software for statistical analyses . 19 2.3.8 Visualization and plotting software . 19 2.4 Results and Discussion . 19 2.4.1 Are donor groups on proteins preferred in H-bonding to biological ligands? 19 2.4.2 Can the observed trends in interfacial polarity, with H-bonds tending to be formed by donors on the protein side of the interface interacting with acceptors on the ligand side, be explained by the prevalence of binding-site protons versus lone pairs? . 22 2.4.3 Do certain residues predominate in the observed preference for proteins to donate H-bonds to ligands? . 23 2.4.4 When protein and ligand atoms are categorized according to their chem- istry, are H-bonding preferences between proteins and ligands fundamen- tally similar or different? . 25 2.4.5 Do different classes of ligand differ in their tendency to accept versus donate H-bonds? . 27 2.4.6 Can orientational selectivity of the biological ligand explain the prefer- ence for proteins to donate H-bonds to ligands? . 30 2.4.7 Do protein-bound metal ions contribute significantly to ligand binding, and how does their bond chemistry relate to observed trends in H-bonding? 33 2.4.8 How do these results relate to Lipinski’s rule of 5 for drug-likeness in small molecules? . 34 vii 2.4.9 Can the observed H-bonding trends be used to predict protein-ligand interactions? . 35 2.5 Conclusions . 39 2.6 Acknowledgements . 42 CHAPTER 3 DETECTING THE NATIVE LIGAND ORIENTATION BY INTERFA- CIAL RIGIDITY: SITEINTERLOCK ..................... 43 3.1 Abstract . 44 3.2 Introduction . 44 3.2.1 Stabilization of protein complexes by ligand binding . 44 3.2.2 Computational probes of protein rigidity and flexibility . 46 3.2.3 Computational detection of protein-ligand interfacial rigidification . 47 3.3 Materials and Methods . 49 3.3.1 Protein-ligand complexes analyzed . 49 3.3.2 Sampling complexes by molecular docking . 51 3.3.3 Evaluating correlation between scoring functions . 53 3.3.4 Rigidity analysis . 53 3.3.5 SiteInterlock interfacial rigidity score . 58 3.3.6 Other scoring functions . 59 3.4 Results and Discussion . 60 3.4.1 Detecting structural rigidification upon protein-ligand complex formation . 60 3.4.2 Interfacial rigidity as a signature of native protein-ligand interaction . 66 3.5 Conclusions . 70 3.6 Acknowledgements . 71 CHAPTER 4 ENABLING THE HYPOTHESIS-DRIVEN PRIORITIZATION OF LIG- AND CANDIDATES IN BIG DATABASES: SCREENLAMP AND ITS APPLICATION TO GPCR INHIBITOR DISCOVERY FOR INVASIVE SPECIES CONTROL ............................. 72 4.1 Abstract . 73 4.2 Introduction . 73 4.2.1 Virtual screening for inhibitor discovery . 73 4.2.2 Pioneering aquatic invasive species control and GPCR inhibitor discovery through virtual screening . 76 4.2.3 G-protein coupled receptors and olfactory receptors . 77 4.3 Methods . 78 4.3.1 Driving structure-activity hypothesis development by structural modeling of 3kPZS-receptor interactions . 78 4.3.2 Development of Screenlamp, a hypothesis-based screening toolkit . 79 4.3.3 Preparation of millions of drug-like molecules for ligand-based screening . 82 4.3.4 Identification of incorrect steroid substructures in molecular database . 84 4.3.5 Step 1: Hypothesis-based molecular filtering . 84 4.3.6 Step 2: Sampling favorable molecular conformations . 86 4.3.7 Generation of overlays to compare molecular shape and charge distribu- tion with a known ligand .