A Deep Learning-Based Predictor for Surface Accessibility of Transmembrane Protein Residues

Total Page:16

File Type:pdf, Size:1020Kb

A Deep Learning-Based Predictor for Surface Accessibility of Transmembrane Protein Residues crystals Article TMP-SSurface: A Deep Learning-Based Predictor for Surface Accessibility of Transmembrane Protein Residues Chang Lu 1,2 , Zhe Liu 1,2, Bowen Kan 1,2, Yingli Gong 1,2, Zhiqiang Ma 1,2,3,* and Han Wang 1,2,3,* 1 School of Information Science and Technology, Northeast Normal University, Changchun 130117, China; [email protected] (C.L.); [email protected] (Z.L.); [email protected] (B.K.); [email protected] (Y.G.) 2 Institute of Computational Biology, Northeast Normal University, Changchun 130117, China 3 Department of Computer Science, College of Humanities & Sciences of Northeast Normal University, Changchun 130117, China * Correspondence: [email protected] (Z.M.); [email protected] (H.W.) Received: 4 November 2019; Accepted: 29 November 2019; Published: 1 December 2019 Abstract: Transmembrane proteins (TMPs) play vital and diverse roles in many biological processes, such as molecular transportation and immune response. Like other proteins, many major interactions with other molecules happen in TMPs’ surface area, which is important for function annotation and drug discovery. Under the condition that the structure of TMP is hard to derive from experiment and prediction, it is a practical way to predict the TMP residues’ surface area, measured by the relative accessible surface area (rASA), based on computational methods. In this study, we presented a novel deep learning-based predictor TMP-SSurface for both alpha-helical and beta-barrel transmembrane proteins (α-TMP and β-TMP), where convolutional neural network (CNN), inception blocks, and CapsuleNet were combined to construct a network framework, simply accepting one-hot code and position-specific score matrix (PSSM) of protein fragment as inputs. TMP-SSurface was tested against an independent dataset achieving appreciable performance with 0.584 Pearson correlation coefficients (CC) value. As the first TMP’s rASA predictor utilizing the deep neural network, our method provided a referenceable sample for the community, as well as a practical step to discover the interaction sites of TMPs based on their sequence. Keywords: transmembrane protein; surface accessibility; deep learning 1. Introduction Transmembrane protein (TMP) is one of the most important types of membrane proteins (MPs) that span the entire biological membranes in the whole molecular life cycle as a gateway or receptor. They involve in diverse biological processes, such as cell mechanics regulation [1], signal transduction [2], molecule transport [3], etc. Special interest in TMPs also arises from the fact that they associate with many types of diseases, such as autism [4], dyslipidemia [5], epilepsy [6], and various types of cancers [7–9]. Since TMPs play numerous roles in basic physiology and pathophysiology, TMPs are major targets for more than one-third of known drugs on the current therapeutics market [10]. On one side of the membrane, TMPs INTERACT with ligands, including protons, metal ions, enzyme, drug-like compound, etc. On the other side, they interact with proteins, RNAs, or other molecules to trigger a series of molecular reactions and eventually control the cell functions. The interaction interface is always located on the surface areas of TMPs, according to the statistics [11]. The surface accessibility of the residues in the protein can be measured by the relative accessible surface area (rASA), Crystals 2019, 9, 640; doi:10.3390/cryst9120640 www.mdpi.com/journal/crystals Crystals 2019, 9, 640 2 of 13 which refers to the relative surface area of the residues exposed to the environment surrounding the protein [12]. As a valuable structural property, predicting rASA based on primary sequence is a rewarding task for TMPs in structure prediction, function annotation, and drug discovery [13]. In recent years, several sequence-based methods have been developed to predict the surface accessibility of residues for TMPs. Thijs B. et al. firstly published a knowledge-based method (ProperTM) to predict and analyzed the burial state (burial or exposure) of transmembrane residues within TMPs [14]. After that, several machine learning-based methods have been developed. Depending on their functionality, these methods can be roughly grouped into two categories: burial state identifier and surface area predictor. Identifying the burial state of residues is a binary classification problem; it predicts whether residues are exposed to the surface or buried inside of the TMP, such as TMX [15], Yao et al. (2011) [16], TMexpoSVC [17]. Predicting the real value of surface area is a regression problem; these tools predict the accessible surface area (ASA) value or the relative accessible surface area (rASA) value of residues, such as ASAP [18], MPRAP [19], Yao et al. (2012) [20], TMexpoSVR [17], and MemBrain-Rasa [21,22]. A summary of these methods is listed in Table1 in chronological order. Table 1. The summaries of existing methods for predicting surface accessibility of transmembrane protein (TMP) residues. Method Year Samples Algorithm TMP Type Seq Region Measure ProperTM [14] 2004 59 knowledge α-TMP TM region Burial state ASAP [18] 2006 73 SVR all TMP TM region ASA TMX [15] 2007 43 SVC α-TMP TM region Burial state MPRAP [19] 2010 80 SVR α-TMP full sequence rASA Yao et al. (2011) [16] 2011 53 SVM α-TMP TM region Burial state Yao et al. (2012) [20] 2012 122 RF all TMP TM region ASA TMexpoSVR [17] 2013 110 SVR α-TMP TM region rASA TMexpoSVC [17] 2013 110 SVC α-TMP TM region Burial state MenBrain-Rasa [21,22] 2015 80 SVR α-TMP full sequence rASA Although considerable achievements have been made in the field of TMP surface accessibility prediction, there are still several issues that deserved to be further improved. First of all, none of the mentioned methods could predict the rASA of the whole sequence of all kinds of TMPs. On the one hand, except for MPRAP and MenBrain-Rasa, most predictors can only be applied within transmembrane regions of TMPs, which focus only on the lipid-accessible surface while ignoring the water-accessible surface. It is worth pointing out that the prediction of rASA on the full sequence is more challenging than those that only apply to transmembrane residues. On the other hand, most methods only focus on α-helical TMPs while ignoring β-barrel TMPs—including the only two full sequence predictors. Although β-barrel TMPs just account for a small proportion of TMPs, it is also essential to be studied and should not be ignored. Up to now, ASAP and Yao et al. (2012) are the only two predictors that can be applied to both α-helical and β-barrel TMPs, but it is a pity that they can only be used to predict transmembrane regions of TMPs. Thus, it is meaningful to design a more powerful full sequence predictor to predict rASA for all kinds of TMPs. Besides, previous predictors relied heavily on the features derived from third-party tools, such as position-specific score matrix (PSSM) [23], Z-coordinate, secondary structure [24], and so on. Although these features contribute to the improvement of the predictor performance [25–27], their weakness cannot be ignored. On the one hand, using these third-party tool-derived features will make the predictor slow and may lead to uncontrollable failure. For example, MemBrain-Rasa uses six types of features, four of whom relied on the third-part tools, and seven out of 50 proteins cannot get a reliable prediction result from MemBrain-Rasa on the independent test. On the other hand, expertise in TMPs is always required to successfully use the previous methods, which may confuse the non-professional users and hinder the exploration of the biological significance of the prediction process. Since most Crystals 2019, 9, 640 3 of 13 previous predictors can only be applied in the transmembrane regions of TMPs, the topology structure of TMPs must be known before using them. However, it is difficult for non-professional researchers to determine the topology of TMPs. Based on this consideration, we tried to describe the protein fragment with features as concise as possible to make the predictor simpler and more efficient. After a series of experiments, we selected two types of encoding schemes to represent the protein fragment: one-hot code [28–30] and a position-specific scoring matrix (PSSM), where the former one encodes the residues arrangement and the latter one reflect the evolutionary profile. However, reducing the number of features will inevitably result in less information that may get by the predictor and cause performance deterioration. As a promising solution, decreasing the dependency on sophisticated features, a deep learning-based method was introduced in this study for its ability to discover the structural features from the sequence. The proposed method was a deep learning network that combines a convolutional neural network (CNN), inception network, and CapsuleNet. In this study, we proposed a sequence-based rASA predictor (TMP-SSurface) for the full sequence of all types of TMPs, that achieved considerable performance while simplifying the input features as much as possible. Only one-hot code and PSSM were used as the input features of a new proposed deep learning-based regression method, which combined the inception network with CapsuleNet. The experimental result showed that the performance of TMP-SSurface achieved a Pearson correlation coefficient (CC) of 0.581 on the independent validation, which was slightly better than the results of today’s best predictor, but much more simple than it. TMP-SSurface is accessible freely in http://icdtools.nenu.edu.cn/tmp_ssurface. The datasets used in the experiment and project of the predictor could be downloaded from the web-server. 2. Results and Discussion 2.1. Feature Analysis We tried several features, such as topology structure, physicochemical properties, and Z-coordinate. Although these features contributed to the predictor more or less, they were not as significant as one-hot code and PSSM.
Recommended publications
  • The TIM Barrel Fold Nagarajan D
    The TIM barrel fold Nagarajan D. and Nanajkar N. Comments and corrections: Line 10: fix “αhelices” in “α-helices”. Lines 11-12: C-terminal loops are important for catalytic activity, while N-terminal loops are important for the stability of the TIM-barrels. This should be mentioned. Line 14: The reference #7 is not related to the statement. Line 14: There is a new EC classe (EC.7, translocases). Change “5 of 6” in “5 of 7”. Lines 26-27: It is not correct to state that the shear number of 8 for the TIM-barrels is due to “their staggered nature”. Most of the β-barrels have a staggered nature, but their shear number is not 8. Line 27: The reference #2 is imprecise. Wierenga did not defined himself the shear number of TIM-barrel proteins. Please check the 2 papers of Murzin AG, 1994, “Principle determining the structure of β-sheet barrels in proteins,” I and II, and the paper of Liu W, 1998, “Shear numbers of protein β-barrels: definition refinements and statistics”. Line 29: Again, it is not correct to state that the 4-fold geometric symmetry depends on the stagger. Since the number of strands (n) is equal to the Shear number (S), side-chains point alternatively towards the pore and the core, giving a 4-fold symmetry. Line 37: “historically” is a bit exaggerated for a reference dated 2015, especially if it comes from the author itself. Find a true historic reference, or just mention that you defined the regions “core” and “pore”. Line 43: “Consequently” is misleading.
    [Show full text]
  • Smurflite: Combining Simplified Markov Random Fields With
    SMURFLite: combining simplified Markov random fields with simulated evolution improves remote homology detection for beta-structural proteins into the twilight zone Noah M. Daniels 1, Raghavendra Hosur 2, Bonnie Berger 2∗, and Lenore J. Cowen 1∗ 1Department of Computer Science, Tufts University, Medford, MA 02155 2Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 ABSTRACT are limited in their power to recognize remote homologs because of Motivation: One of the most successful methods to date for their inability to model statistical dependencies between amino-acid recognizing protein sequences that are evolutionarily related has residues that are close in space but far apart in sequence (Lifson and been profile Hidden Markov Models (HMMs). However, these models Sander (1980); Zhu and Braun (1999); Olmea et al. (1999); Cowen do not capture pairwise statistical preferences of residues that are et al. (2002); Steward and Thorton (2002)). hydrogen bonded in beta sheets. These dependencies have been For this reason, many have suggested (White et al. (1994); partially captured in the HMM setting by simulated evolution in the Lathrop and Smith (1996); Thomas et al. (2008); Liu et al. (2009); training phase and can be fully captured by Markov Random Fields Menke et al. (2010); Peng and Xu (2011)) that more powerful (MRFs). However, the MRFs can be computationally prohibitive when Markov Random Fields (MRFs) be used. MRFs employ an auxiliary beta strands are interleaved in complex topologies. dependency graph which allows them to model more complex We introduce SMURFLite, a method that combines both simplified statistical dependencies, including statistical dependencies that Markov Random Fields and simulated evolution to substantially occur between amino-acid residues that are hydrogen bonded in beta improve remote homology detection for beta structures.
    [Show full text]
  • Modeling and Predicting Super-Secondary Structures of Transmembrane Beta-Barrel Proteins Thuong Van Du Tran
    Modeling and predicting super-secondary structures of transmembrane beta-barrel proteins Thuong van Du Tran To cite this version: Thuong van Du Tran. Modeling and predicting super-secondary structures of transmembrane beta-barrel proteins. Bioinformatics [q-bio.QM]. Ecole Polytechnique X, 2011. English. NNT : 2011EPXX0104. pastel-00711285 HAL Id: pastel-00711285 https://pastel.archives-ouvertes.fr/pastel-00711285 Submitted on 23 Jun 2012 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. THESE` pr´esent´ee pour obtenir le grade de DOCTEUR DE L’ECOLE´ POLYTECHNIQUE Sp´ecialit´e: INFORMATIQUE par Thuong Van Du TRAN Titre de la th`ese: Modeling and Predicting Super-secondary Structures of Transmembrane β-barrel Proteins Soutenue le 7 d´ecembre 2011 devant le jury compos´ede: MM. Laurent MOUCHARD Rapporteurs Mikhail A. ROYTBERG MM. Gregory KUCHEROV Examinateurs Mireille REGNIER M. Jean-Marc STEYAERT Directeur Laboratoire d’Informatique UMR X-CNRS 7161 Ecole´ Polytechnique, 91128 Plaiseau CEDEX, FRANCE Composed with LATEX !c Thuong Van Du Tran. All rights reserved. Contents Introduction 1 1Fundamentalreviewofproteins 5 1.1 Introduction................................... 5 1.2 Proteins..................................... 5 1.2.1 Aminoacids............................... 5 1.2.2 Properties of amino acids .
    [Show full text]
  • Development and Characterization of Novel Bioluminescent Reporters of Cellular Activity by Derrick C. Cumberbatch Dissertation
    Development and Characterization of Novel Bioluminescent Reporters of Cellular Activity By Derrick C. Cumberbatch Dissertation Submitted to the Faculty of the Graduate School of Vanderbilt University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY in Biological Sciences May 10, 2019 Nashville, Tennessee Approved: C. David Weaver, Ph.D. Douglas McMahon, Ph.D. Qi Zhang, Ph.D. Carl Johnson, Ph.D. To my beloved and supportive wife Alicia, and to my parents Cameron and Marcia Cumberbatch. ii ACKNOWLEDGEMENTS This work was made possible by financial support from the NIMH grants MH107713 and MH116150 awarded to Carl Johnson, Ph.D. as well as funds provided by the Vanderbilt University Dissertation Enhancement Grant, graciously awarded to me by the Graduate School. I appreciate Dr. Carl Johnson for taking me into his lab and providing me with ample tools that aided in the successful completion of my Ph.D. I would like to express gratitude to my committee members Drs. David Weaver, Douglas McMahon, Qi Zhang, and the late Dr. Donna Webb for guiding me through the process of becoming a competent researcher. I would also like to make special mention of Jie Yang, Ph.D. whose persistent efforts and sage advice were an ever-present help during my graduate studies. His one-on-one training provided me with many skills that will serve me well as a molecular biologist. Meaningful contributions from the other current and past members of the Johnson lab group, Yao Xu, Ph.D., Tetsuya Mori, Ph.D., Shuqun Shi, Ph.D., Chi Zhao, Ph.D., Peijun Ma, Ph.D., He Huang, Ph.D., Kathryn Campbell, Briana Wyzinski, Kevin Kelly, Maria Luisa Jabbur, Carla O’Neale and Ian Dew deserve to be highlighted here as well.
    [Show full text]
  • The Structure of Small Beta Barrels
    bioRxiv preprint doi: https://doi.org/10.1101/140376; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. The Structure of Small Beta Barrels Philippe Youkharibache*, Stella Veretnik1, Qingliang Li, Philip E. Bourne*1 National Center for Biotechnology Information, The National Library of Medicine, The National Institutes of Health, Bethesda Maryland 20894 USA. *To whom correspondence should be addressed at [email protected] and [email protected] 1 Current address: Department of Biomedical Engineering, The University of Virginia, Charlottesville VA 22908 USA. 1 bioRxiv preprint doi: https://doi.org/10.1101/140376; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. Abstract The small beta barrel is a protein structural domain, highly conserved throughout evolution and hence exhibits a broad diversity of functions. Here we undertake a comprehensive review of the structural features of this domain. We begin with what characterizes the structure and the variable nomenclature that has been used to describe it. We then go on to explore the anatomy of the structure and how functional diversity is achieved, including through establishing a variety of multimeric states, which, if misformed, contribute to disease states.
    [Show full text]
  • Folding-TIM Barrel
    Protein Folding Practical September 2011 Folding up the TIM barrel Preliminary Examine the parallel beta barrel that you constructed, noting the stagger of the strands that was needed to connect the ends of the 8-stranded parallel beta sheet into the 8-stranded beta barrel. Notice that the stagger dictates which side of the sheet is on the inside and which is on the outside. This will be key information in folding the complete TIM linear peptide into the TIM barrel. Assembling the full linear peptide 1. Make sure the white beta strands are extended correctly, and the 8 yellow helices (with the green loops at each end) are correctly folded into an alpha helix (right handed with H-bonds to the 4th ahead in the chain). 2. starting with a beta strand connect an alpha helix and green loop to make the blue-red connecting peptide bond. Making sure that you connect the carbonyl (red) end of the beta strand to the amino (blue) end of the loop-helix-loop. Secure the just connected peptide bond bond with a twist-tie as shown. 3. complete step 2 for all beta strand/loop-helix-loop pairs, working in parallel with your partners 4. As pairs are completed attach the carboxy end of the strand- loop-helix-loop to the amino end of the next strand-loop-helix-loop module and secure the new peptide bond with a twist-tie as before. Repeat until the full linear TIM polypeptide chain is assembled. Make sure all strands and helices are still in the correct conformations.
    [Show full text]
  • New Mesh Headings for 2018 Single Column After Cutover
    New MeSH Headings for 2018 Listed in alphabetical order with Heading, Scope Note, Annotation (AN), and Tree Locations 2-Hydroxypropyl-beta-cyclodextrin Derivative of beta-cyclodextrin that is used as an excipient for steroid drugs and as a lipid chelator. Tree locations: beta-Cyclodextrins D04.345.103.333.500 D09.301.915.400.375.333.500 D09.698.365.855.400.375.333.500 AAA Domain An approximately 250 amino acid domain common to AAA ATPases and AAA Proteins. It consists of a highly conserved N-terminal P-Loop ATPase subdomain with an alpha-beta-alpha conformation, and a less-conserved C- terminal subdomain with an all alpha conformation. The N-terminal subdomain includes Walker A and Walker B motifs which function in ATP binding and hydrolysis. Tree locations: Amino Acid Motifs G02.111.570.820.709.275.500.913 AAA Proteins A large, highly conserved and functionally diverse superfamily of NTPases and nucleotide-binding proteins that are characterized by a conserved 200 to 250 amino acid nucleotide-binding and catalytic domain, the AAA+ module. They assemble into hexameric ring complexes that function in the energy-dependent remodeling of macromolecules. Members include ATPASES ASSOCIATED WITH DIVERSE CELLULAR ACTIVITIES. Tree locations: Acid Anhydride Hydrolases D08.811.277.040.013 Carrier Proteins D12.776.157.025 Abuse-Deterrent Formulations Drug formulations or delivery systems intended to discourage the abuse of CONTROLLED SUBSTANCES. These may include physical barriers to prevent chewing or crushing the drug; chemical barriers that prevent extraction of psychoactive ingredients; agonist-antagonist combinations to reduce euphoria associated with abuse; aversion, where controlled substances are combined with others that will produce an unpleasant effect if the patient manipulates the dosage form or exceeds the recommended dose; delivery systems that are resistant to abuse such as implants; or combinations of these methods.
    [Show full text]
  • Tertiary Structure
    Comments Structural motif v sequence motif polyproline (“PXXP”) motif for SH3 binding “RGD” motif for integrin binding “GXXXG” motif within the TM domain of membrane protein Most common type I’ beta turn sequences: X – (N/D/G)G – X Most common type II’ beta turn sequences: X – G(S/T) – X 1 Putting it together Alpha helices and beta sheets are not proteins—only marginally stable by themselves … Extremely small “proteins” can’t do much 2 Tertiary structure • Concerns with how the secondary structure units within a single polypeptide chain associate with each other to give a three- dimensional structure • Secondary structure, super secondary structure, and loops come together to form “domains”, the smallest tertiary structural unit • Structural domains (“domains”) usually contain 100 – 200 amino acids and fold stably. • Domains may be considered to be connected units which are to varying extents independent in terms of their structure, function and folding behavior. Each domain can be described by its fold, i.e. how the secondary structural elements are arranged. • Tertiary structure also includes the way domains fit together 3 Domains are modular •Because they are self-stabilizing, domains can be swapped both in nature and in the laboratory PI3 kinase beta-barrel GFP Branden & Tooze 4 fluorescence localization experiment Chimeras Recombinant proteins are often expressed and purified as fusion proteins (“chimeras”) with – glutathione S-transferase – maltose binding protein – or peptide tags, e.g. hexa-histidine, FLAG epitope helps with solubility, stability, and purification 5 Structural Classification All classifications are done at the domain level In many cases, structural similarity implies a common evolutionary origin – structural similarity without evolutionary relationship is possible – but no structural similarity means no evolutionary relationship Each domain has its corresponding “fold”, i.e.
    [Show full text]
  • Molecular Modeling 2021 Lecture 3 -- Tues Feb 2
    Molecular Modeling 2021 lecture 3 -- Tues Feb 2 Protein classification SCOP TOPS Contact maps domains Domains To a cell biologist a domain is a sequential unit within a gene, usually with a specific function. To a structural biologist a domain is a compact globular unit within a protein, classified by its 3D structure. 2 A domain is... • ... an autonomously-folding substructure of a protein. • ... > 30 residues, but typically < 200. May be bigger. • ...usually has a single hydrophobic core • ... usually composed of one chain (occasionally composed of multiple chains) • ...is usually composed on one contiguous segment (occasionally made of discontiguous segments of the same chain) 3 SARS-CoV-2 spike protein — a multi domain protein 4 SCOPe -- classification of domains !http://scop.berkeley.edu similar secondary structure (1) class content (2) fold vague structural homology (3) superfamily Clear structural homology (4) family increasing structural similarity structural increasing (5) protein Clear sequence homology (6) species nearly identical sequences individual structures SCOPe -- class 1. all α (289) classes of domains 2. all β (178) 3. α/β (148) 4. α+β (388) 5. multidomain (71) 6. membrane (60) 7. small (98) Not true classes of globular 8. coiled coil (7) protein domains 9. low-resolution (25) 10. peptides (148) 11. designed proteins (44) 12. artifacts (1) Proteins of the same class conserve secondary structure content SCOPe -- fold level within α/β proteins -- Mainly parallel beta sheets (beta-alpha-beta units) TIM-barrel (22) swivelling beta/beta/alpha domain (5) Many folds have historical spoIIaa-like (2) names. “TIM” barrel was flavodoxin-like (10) first seen in TIM.
    [Show full text]
  • Beta-Barrel Scaffold of Fluorescent Proteins: Folding, Stability and Role in Chromophore Formation
    Author's personal copy CHAPTER FOUR Beta-Barrel Scaffold of Fluorescent Proteins: Folding, Stability and Role in Chromophore Formation Olesya V. Stepanenko*, Olga V. Stepanenko*, Irina M. Kuznetsova*, Vladislav V. Verkhusha**,1, Konstantin K. Turoverov*,1 *Institute of Cytology of Russian Academy of Sciences, St. Petersburg, Russia **Department of Anatomy and Structural Biology and Gruss-Lipper Biophotonics Center, Albert Einstein College of Medicine, Bronx, NY, USA 1Corresponding authors: E-mails: [email protected]; [email protected] Contents 1. Introduction 222 2. Chromophore Formation and Transformations in Fluorescent Proteins 224 2.1. Chromophore Structures Found in Fluorescent Proteins 225 2.2. Autocatalytic and Light-Induced Chromophore Formation and Transformations 228 2.3. Interaction of Chromophore with Protein Matrix of β-Barrel 234 3. Structure of Fluorescent Proteins and Their Unique Properties 236 3.1. Aequorea victoria GFP and its Genetically Engineered Variants 236 3.2. Fluorescent Proteins from Other Organisms 240 4. Pioneering Studies of Fluorescent Protein Stability 243 4.1. Fundamental Principles of Globular Protein Folding 244 4.2. Comparative Studies of Green and Red Fluorescent Proteins 248 5. Unfolding–Refolding of Fluorescent Proteins 250 5.1. Intermediate States on Pathway of Fluorescent Protein Unfolding 251 5.2. Hysteresis in Unfolding and Refolding of Fluorescent Proteins 258 5.3. Circular Permutation and Reassembly of Split-GFP 261 5.4. Co-translational Folding of Fluorescent Proteins 263 6. Concluding Remarks 267 Acknowledgments 268 References 268 Abstract This review focuses on the current view of the interaction between the β-barrel scaf- fold of fluorescent proteins and their unique chromophore located in the internal helix.
    [Show full text]
  • 101 Appendix 1. List of Candidate Genes in the Mapped QTL Intervals
    Appendix 1. List of candidate genes in the mapped QTL intervals (for Chapter 3) Chromosome Gene Gene PFAM Gene Name Name Start (bp) End (bp) ID PFAM Description Glyma01g21480 Gm01 26542533 26543925 Glyma01g21500 Gm01 26596310 26599861 PF02617 ATP -dependent Clp protease adaptor protein ClpS Glyma01g21510 Gm01 26672736 26675940 PF01490 Transmembrane amino acid transporter protein Glyma01g21530 Gm01 26738538 26739700 Glyma01g21540 Gm01 26740712 26741148 Glyma01g21570 Gm01 26771298 26774049 Glyma01g21580 Gm01 26814404 26816960 Glyma01g21590 Gm01 26883579 26885938 Glyma01g21620 Gm01 26918445 26921076 Glyma01g21660 Gm01 27034015 27042113 PF08154 NLE (NUC135) domain Glyma01g21680 Gm01 27083934 27086891 Glyma01g21710 Gm01 27124069 27134363 PF00076 RNA recognition motif. (a.k.a.
    [Show full text]
  • Evidence for the Preferential Reuse of Sub‐Domain Motifs in Primordial Protein Folds
    Received: 4 January 2021 Revised: 15 April 2021 Accepted: 28 April 2021 DOI: 10.1002/prot.26089 RESEARCH ARTICLE Evidence for the preferential reuse of sub-domain motifs in primordial protein folds Leonhard Heizinger | Rainer Merkl Institute of Biophysics and Physical Biochemistry, University of Regensburg, Abstract Regensburg, Germany A comparison of protein backbones makes clear that not more than approximately Correspondence 1400 different folds exist, each specifying the three-dimensional topology of a pro- Rainer Merkl, Institute of Biophysics and tein domain. Large proteins are composed of specific domain combinations and many Physical Biochemistry, University of Regensburg, 93040 Regensburg, Germany. domains can accommodate different functions. These findings confirm that the reuse Email: [email protected] of domains is key for the evolution of multi-domain proteins. If reuse was also the Funding information driving force for domain evolution, ancestral fragments of sub-domain size exist that Deutsche Forschungsgemeinschaft, Grant/ are shared between domains possessing significantly different topologies. For the Award Numbers: ME 2259/4-1, SFB 960 fully automated detection of putatively ancestral motifs, we developed the algorithm Fragstatt that compares proteins pairwise to identify fragments, that is, instantiations of the same motif. To reach maximal sensitivity, Fragstatt compares sequences by means of cascaded alignments of profile Hidden Markov Models. If the fragment sequences are sufficiently similar, the program determines and scores the structural concordance of the fragments. By analyzing a comprehensive set of proteins from the CATH database, Fragstatt identified 12 532 partially overlapping and structurally similar motifs that clustered to 134 unique motifs. The dissemination of these motifs is limited: We found only two domain topologies that contain two different motifs and generally, these motifs occur in not more than 18% of the CATH topologies.
    [Show full text]