Computational Methods for Prediction and Classification of G Protein-Coupled Receptors Khodeza Begum University of Texas at El Paso, [email protected]

Computational Methods for Prediction and Classification of G Protein-Coupled Receptors Khodeza Begum University of Texas at El Paso, Kbegum@Miners.Utep.Edu

University of Texas at El Paso DigitalCommons@UTEP Open Access Theses & Dissertations 2017-01-01 Computational methods for prediction and classification of G protein-coupled receptors Khodeza Begum University of Texas at El Paso, [email protected] Follow this and additional works at: https://digitalcommons.utep.edu/open_etd Part of the Bioinformatics Commons, Computer Sciences Commons, and the Mathematics Commons Recommended Citation Begum, Khodeza, "Computational methods for prediction and classification of G protein-coupled receptors" (2017). Open Access Theses & Dissertations. 408. https://digitalcommons.utep.edu/open_etd/408 This is brought to you for free and open access by DigitalCommons@UTEP. It has been accepted for inclusion in Open Access Theses & Dissertations by an authorized administrator of DigitalCommons@UTEP. For more information, please contact [email protected]. COMPUTATIONAL METHODS FOR PREDICTION AND CLASSIFICATION OF G PROTEIN-COUPLED RECEPTORS KHODEZA BEGUM Master’s Program in Computational Science APPROVED: Ming-Ying Leung, Ph.D., Chair Rachid Skouta, Ph.D. Xiaogang Su, Ph.D. Charlotte M. Vines, Ph.D. Charles Ambler, Ph.D. Dean of the Graduate School Copyright © by Khodeza Begum 2017 COMPUTATIONAL METHODS FOR PREDICTION AND CLASSIFICATION OF G PROTEIN-COUPLED RECEPTORS by KHODEZA BEGUM, B.S. THESIS Presented to the Faculty of the Graduate School of The University of Texas at El Paso in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE COMPUTATIONAL SCIENCE PROGRAM THE UNIVERSITY OF TEXAS AT EL PASO December 2017 Acknowledgements I would first like to thank my advisor Dr. Ming-Ying Leung for the continuous support and encouragement in my study and thesis. Her guidance and immense knowledge has always helped me to give my best effort and be successful. I would also like to thank the rest of my thesis committee: Dr. Charlotte M. Vines, Dr. Rachid Skouta, and Dr. Xiaogang Su, for their encouragement, insightful comments, and questions. I would like to thank my mother, Monowara Begum and my brother, Maksudul Alam for always supporting me and encouraging me to achieve my goals. I thank My Husband, Richard E. Mitchell for his tremendous care and support, and always giving me importance for anything I needed, also my son, Austin Mitchell, who always helped me in every possible way to work harder and encouraging me with his sweet words. I am blessed to have wonderful friends like Ifeanyi H. Nwigboji, Bethuel Khamala and Sharmin Abdullah for always being there for me, helping me and treating me as a family. I thank Osei Tweneboah, Julio H. Solis, M Masum Bhuiyan, Dr. Andrew Pownuk and Kamal Nyaupane for being magnificent friends and classmates. In conclusion, my sincere thanks to Jon Mohl for always helping me with everything and encouraging me to do better with my work. I thank Anastasia Kellog and Kyle Long for the initial data collection and constructing the framework of the database, as well as Gerardo Cardenas for the help in utilizing the bioinformatics computing facilities. iv Abstract G protein-coupled receptors (GPCRs) constitute the largest group of membrane receptor proteins in eukaryotes. Due to their significant roles in many physiological processes such as vision, smell, and inflammation, GPCRs are the targets of many prescribed drugs. However, the functional and structural diversity of GPCRs has kept their prediction and classification based on amino acid sequence data as a challenging bioinformatics problem. As existing computational methods to predict and classify GPCRs are focused on mammalian (mostly human) data, the ultimate goal of our project is to establish an ensemble approach and implement a web-based software that can be used reliably on a wider range of organisms. As a first step, we have constructed a searchable MySQL database with experimentally confirmed GPCRs and non- GPCRs along with protein features for distinguishing them. This database currently contains 2887 GPCR and 1614 non-GPCR sequences collected from the UniProtKB/Swiss-Prot protein database, covering over 300 species including arthropods, fungi, nematode, etc. Each protein in the database is assigned a unique identification number and linked to information about its source organism, sequence lengths, and other features including amino acid and dipeptide composition. For the GPCRs, family classifications according to the popular GRAFS and IUPHAR systems are also included. This database will provide the training and testing data for subsequent steps in our ongoing work to evaluate existing computational tools, incorporate them into our ensemble, and apply them to identify potential GPCRs in several fly, mosquito, and tick species that are of biomedical or agricultural importance. v Table of Contents Abstract ..................................................................................................................................... iv Table of Contents ....................................................................................................................... vi List of Tables .......................................................................................................................... viii List of Figures ............................................................................................................................ ix Chapter 1: Introduction ............................................................................................................... 1 1.1 GPCRs and their biomedical significance ..................................................................... 1 1.2 Prediction and classification of GPCRs ........................................................................ 1 1.3 Objectives of the research ............................................................................................. 4 Chapter 2: Literature Review ...................................................................................................... 8 2.1 Computational tools for GPCR prediction and classification ........................................ 8 2.2 Combination of tools ................................................................................................. 12 2.3 Databases ................................................................................................................... 15 2.4 Web-servers................................................................................................................ 16 2.5 Comparison and accuracy ........................................................................................... 24 Chapter 3: Materials and Methods ............................................................................................. 27 3.1 Construction of the GPCR-PEnDB database ............................................................... 27 3.2 Collection of other datasets ......................................................................................... 28 3.3 Feature collections for GPCRs and Non-GPCRs ......................................................... 29 Chapter 4: Results ..................................................................................................................... 31 4.1 The GPCR-PEnDB database ....................................................................................... 31 4.2 Results of protein features and datasets analyzed ........................................................ 32 Chapter 5: Future work ............................................................................................................. 37 5.1 Research goals ............................................................................................................ 37 5.2 Proposed methods ....................................................................................................... 38 5.3 Expected results and possible pitfalls .......................................................................... 39 5.4 Timeline ..................................................................................................................... 40 vi References ................................................................................................................................ 42 Appendices ............................................................................................................................... 46 Appendix A – Percentages of amino acids in Python ........................................................ 46 Appendix B – Non-GPCR IDs, Protein name in Python ................................................... 48 Appendix C – Dipeptide count of amino acids in Python .................................................. 51 Appendix D – Taxonomy, IDs for Organisms in Python ................................................... 53 Appendix E – Length using TMHMM2.0 in Python ......................................................... 56 Appendix F – MySQL commands for creating tables and populating data ....................... 58 Appendix G – R code for generating plots ....................................................................... 62 Curriculum Vita ........................................................................................................................ 64 vii List of Tables Table 2.1: Summary of datasets, algorithms and accuracy measurements. ................................. 24 Table 4.1: Five number summary of lengths .............................................................................. 35 Table 5.1: Timeline for the completion

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    74 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us