Computational Studies of Charge in G Protein Coupled Receptors
A thesis submitted to the University of Manchester for the degree of MPhil Bioinformatics in the Faculty of Life Sciences

2013

Spyros Charonis

Contents

Abstract
Declaration
Copyright
Acknowledgements
Abbreviations
1 Introduction
    1.1 Biology in the Silicon Era
    1.2 Structural Biology and Bioinformatics
    1.3 G Protein Coupled Receptors
        1.3.1 GPCR Classification and Nomenclature
        1.3.2 Structural modularity of GPCRs
        1.3.3 GPCR Functional Mechanisms
        1.3.4 GPCRs as Drug Targets
    1.4 Electrostatics
        1.4.1 pH and pKa
        1.4.2 pH dependence of charge state for amino acids
        1.4.3 Electrostatics in Protein Interactions
        1.4.4 Modeling Electrostatics
            1.4.4.1 Finite Difference Poisson Boltzmann
            1.4.4.2 Debye-Hückel Theory
    1.5 Bioinformatics Tools and Methodologies
        1.5.1 Sequence Analysis Methods
            1.5.1.1 BLAST and PSI-BLAST
        1.5.2 Structure Prediction
            1.5.2.1 Homology Modeling
        1.5.3 GPCR Information Repositories
    1.6 Aims and Objectives
2 Methods
    2.1 Sequence Analysis Methodologies
        2.1.1 Detecting Low-Complexity Regions
        2.1.2 PSI-BLAST
    2.2 Structural Analysis Methodologies
    2.3 GPCR Dataset Generation
    2.4 PDB File Processing
    2.5 pKa Calculations
    2.6 Molecular Visualization
3 Results
    3.1 Empirically Defined GPCR Topology
    3.2 GPCR Sequence Dataset
        3.2.1 The distribution of ionizable groups
        3.2.2 Characterizing major peaks
    3.3 Shortlisting charged residues for pKa calculations
    3.4 pKa predictions
        3.4.1 Database and Literature Searches
        3.4.2 Predicted pKa Values and Charge States for Residues
4 Discussion
    4.1 7TM Charge Distribution
    4.2 Assessment of the Hypothesis
    4.3 Conclusions
References
Appendix A GPCR Dataset Sequence Names
Appendix B Unix text filtering commands

Final Word Count: 19,454

Figures and Tables

Figure 1.1 Layout of a GPCR
Figure 1.2 Hierarchy of the GRAFS Classification Scheme
Figure 1.3 Generic Architecture of GPCRs
Figure 1.4 GPCR Family Structural Coverage
Figure 1.5 Overview of GPCR Functional Mechanism
Figure 1.6 Electric field created by two oppositely charged bodies
Figure 1.7 Deprotonation of carboxylic side-chain group
Figure 1.8 Protonation of amino side-chain group
Figure 1.9 Dielectric environment of protein-solvent systems
Figure 1.10 Debye Length vs. Counterionic Concentration
Figure 1.11 The steps involved in homology modeling
Figure 2.1 Dotplots of template sequence and control sequence
Figure 2.2 Cylinder of pseudo-atoms enclosing GPCR model
Figure 2.3 Methods for Charge Studies on GPCRs
Figure 2.4 Structure of human β2 adrenergic receptor
Figure 3.1 GPCR Dataset Composition
Figure 3.2 Frequency of Ionizable Groups vs. Cartesian Location
Figure 3.3 Sample output of histogram analysis script
Figure 3.4 Filtering residues for pKa calculations
Figures 3.5 – 3.12 Residue Locations

Table 1.1 Comparing the GRAFS and A-F Classification Systems
Table 1.2 Solved GPCR Structures
Table 1.3 pKa values for charged side-chains
Table 1.4 pH-dependent variation of charge states
Table 2.1 PSI-BLAST search parameters
Table 2.2 Charged atom indicators for ionizable side-chains
Table 3.1 Delineation of GPCR Domains in Cartesian Coordinates
Table 3.2 GPCR Dataset Sequence Composition
Table 3.3 Peak groups selected for charge frequency distribution
Table 3.4 Matching Histogram Peaks to Conserved Residues
Table 3.5 Minor Peak Shortlisted Residues
Table 3.6 pKa Predictions
Table 3.7 Mapping model residues onto wild-type GPCRs
Table 3.8 pH-dependent variation of charge states
Table 3.9 Predicted charge states of residues

Abstract

Electrostatic interactions play significant roles in the functioning of almost all proteins, and thus have a significant impact on biological phenomena at the molecular level. At the epicenter of these interactions are electrically charged amino acids. The G protein-coupled receptors (GPCRs) comprise the largest known class of transmembrane receptors and are important in mediating an extremely diverse array of signal transduction pathways. A dataset of GPCRs was created using homology modeling and used to study the distribution of residues carrying ionizable groups, with particular emphasis on the transmembrane elements of GPCRs, in an attempt to correlate locations carrying significant amounts of charge with functional sites.
Calculations of pKa were used to assess the functionality of such residues; there was no conclusive evidence that these amino acids are functional. Ionizable groups localized to the transmembrane region of GPCRs were divided into two groups: high-frequency residues and low-frequency residues. After verifying that the high-frequency residues represented well-known charged residues conserved throughout GPCRs, the low-frequency residues were selected for pKa calculations. The Finite Difference Poisson Boltzmann (FDPB) and Finite Difference Debye-Hückel (FD/DH) methods were used to calculate shifts in pKa values for a series of residues, so that their charge states could be predicted, and each residue location was queried in the literature for functional annotations. Few of the selected residues had annotations, suggesting perhaps that the scale of the dataset should be increased.

Declaration

No portion of the work referred to in the thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Copyright Statement

i. The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the “Copyright”) and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

ii. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.

iii.
The ownership of certain Copyright, patents, designs, trade marks and other intellectual property (the “Intellectual Property”) and any reproductions of copyright works in the thesis, for example graphs and tables (“Reproductions”), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and commercialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://www.campus.manchester.ac.uk/medialibrary/policies/intellectual-property.pdf), in any relevant Thesis restriction declarations deposited in the University Library, The University Library’s regulations (see http://www.manchester.ac.uk/library/aboutus/

Acknowledgements

I would firstly like to thank my supervisor, Dr. Jim Warwicker, for his continuous and expert advice, support and guidance throughout my time working on this project. Many special thanks go to my parents for their moral and financial support throughout all my years of education.

Abbreviations

BLAST       Basic Local Alignment Search Tool
DH          Debye-Hückel
ECL         Extracellular Loop
FD          Finite Difference
FD/DH       Finite Difference/Debye-Hückel
FDPB        Finite Difference Poisson Boltzmann
GPCR        G protein coupled receptor
ICL         Intracellular Loop
MSA         Multiple Sequence Alignment
PDB         Protein Data Bank
pH          potential Hydrogen
pKa         negative base-10 logarithm of the acid dissociation constant (Ka)
PSI-BLAST   Position Specific Iterated Basic Local Alignment Search Tool
TM          Transmembrane

Chapter 1. Introduction

The past fifty years have seen an unprecedented transformation in the field of biology from a purely experimental