New Methods for the Prediction and Classification of Protein Domains

Total Page:16

File Type:pdf, Size:1020Kb

New Methods for the Prediction and Classification of Protein Domains New Methods for the Prediction and Classi¯cation of Protein Domains Jan Erik Gewehr MÄunchen2007 New Methods for the Prediction and Classi¯cation of Protein Domains Jan Erik Gewehr Dissertation an der FakultÄatfÄurMathematik und Informatik der Ludwig{Maximilians{UniversitÄat MÄunchen vorgelegt von Jan Erik Gewehr aus LÄubeck MÄunchen, den 01.10.2007 Erstgutachter: Prof. Dr. Ralf Zimmer Zweitgutachter: Prof. Dr. Dmitrij Frishman Tag der mÄundlichen PrÄufung:19.12.2007 Contents 1 Motivation and Overview 1 1.1 The Bene¯t of Protein Structure Prediction ................. 1 1.2 Thesis Outline .................................. 3 2 Introduction to Protein Structure Prediction 7 2.1 Protein Structures and Related Databases .................. 7 2.1.1 From Primary to Tertiary Structure .................. 7 2.1.2 Structure-Related Databases ...................... 8 2.2 Assignment of Protein Domains ........................ 10 2.2.1 An Old Problem ............................ 10 2.2.2 Comparison of Domain Assignments . 11 2.3 Structure Prediction Categories ........................ 13 2.3.1 Comparative Modeling ......................... 13 2.3.2 Fold Recognition ............................ 14 2.3.3 Ab Initio ................................. 15 2.4 Community-Wide E®orts ............................ 15 2.4.1 Community-Wide Experiments .................... 15 2.4.2 Structural Genomics and Structure Prediction . 16 2.5 Alignment Methods ............................... 16 2.5.1 Sequence Alignment .......................... 16 2.5.2 Alignment Methods used in this Work . 19 3 Selection of Fold Classes based on Secondary Structure Elements 23 3.1 Introduction ................................... 23 3.2 Material ..................................... 25 3.2.1 Training and Test Data ......................... 25 3.2.2 Quoted Methods ............................ 27 3.3 Preselection of Fold Classes .......................... 28 3.3.1 Secondary Structure Element Alignment (SSEA) . 28 3.3.2 Selection Strategies based on SSEA . 29 3.4 Re¯nement with Pro¯le-Pro¯le Alignment . 31 3.5 Results ...................................... 31 3.5.1 Preselection Performance on ASTRAL25 . 32 vi CONTENTS 3.5.2 Fold Recognition Accuracy ....................... 33 3.5.3 Speed-Up Evaluation .......................... 34 3.6 Discussion .................................... 34 4 AutoSCOP: Unique Mapping of Patterns to SCOP 37 4.1 Introduction ................................... 38 4.2 Material ..................................... 40 4.2.1 InterPro and its Member Databases . 40 4.2.2 ASTRAL Asteroids and Family HMMs . 41 4.2.3 Training Data .............................. 41 4.2.4 Test Data ................................ 41 4.3 The AutoSCOP Approach ........................... 42 4.3.1 Motivation ................................ 42 4.3.2 Unique Patterns ............................. 42 4.3.3 Extension: Pattern Combinations ................... 44 4.3.4 AutoSCOP¤: Inclusion of Further Data Sources . 45 4.4 Results ...................................... 45 4.4.1 Mapping of Training Domains ..................... 45 4.4.2 Prediction of SCOP 1.67 Domains ................... 47 4.4.3 Comparison of InterPro Entries and AutoSCOP Mappings . 51 4.4.4 Fold Prediction of CAFASP4 Targets . 52 4.4.5 Performance in the Sequence Twilight Zone . 54 4.4.6 Using AutoSCOP¤ as a Filter ..................... 54 4.5 AutoPSI DB ................................... 55 4.5.1 AutoPSI Database Content and Methods . 55 4.5.2 Towards Large-Scale Protein Domain Prediction . 59 4.6 Conclusion .................................... 61 5 SSEP-Domain: Template-Based Protein Domain Prediction 65 5.1 Introduction ................................... 66 5.2 The Domain Prediction Pipeline ........................ 67 5.2.1 Preliminaries .............................. 69 5.2.2 Step 1: Finding Potential Domain Boundaries . 69 5.2.3 Step 2: Scoring of Domain Regions . 72 5.2.4 Step 3: Combining Multiple Domain Regions . 75 5.3 Results ...................................... 76 5.3.1 CAFASP 4 and CASP 6 Results .................... 76 5.3.2 Current Version under CAFASP Conditions . 77 5.3.3 Evaluation of InterPro and Combination with AutoSCOP . 86 5.3.4 SSEP-Align: An Extension towards Structure Prediction . 87 5.3.5 Other Possible Extensions ....................... 89 5.4 Two Years After: CASP 7 and its Lessons . 90 5.4.1 Analysis of CASP 7 Results ...................... 90 CONTENTS vii 5.4.2 Using Alternative De¯nitions for SCOP Domains . 91 5.4.3 Discontinuous Domains ......................... 94 5.5 Independent Applications and Evaluations . 95 5.6 Discussion .................................... 96 6 Environment-Speci¯c Alignment Computation and Scoring 99 6.1 The QUASAR Framework . 100 6.1.1 Methods ................................. 100 6.1.2 Use Cases ................................ 102 6.2 Optimized Score Combinations . 103 6.2.1 Preliminaries .............................. 103 6.2.2 Generation of Linear Combinations . 105 6.2.3 Results .................................. 106 6.3 Optimized Matrices for Alignment Ranking . 108 6.3.1 Comparison Matrices . 108 6.3.2 Range-Adaptive Genetic Algorithm . 109 6.3.3 Results .................................. 111 6.4 Optimized Pro¯le-Pro¯le Alignments . 112 6.4.1 Training and Test Data . 113 6.4.2 Modi¯ed Genetic Algorithm . 113 6.4.3 Results .................................. 114 6.5 Discussion .................................... 115 7 Additional Tools 117 7.1 Vorolign: Structural Alignment and SCOP Classi¯cation Prediction . 117 7.2 Representation of Protein Information in ProML . 119 7.3 BioWeka: Extending the Weka Framework for Bioinformatics . 121 7.3.1 Motivation ................................ 121 7.3.2 The Weka Framework . 122 7.3.3 The BioWeka Library . 124 7.3.4 Example Applications . 126 7.3.5 Discussion ................................ 128 8 Concluding Remarks 129 Acknowledgements 146 viii CONTENTS List of Figures 1.1 Introduction: Thesis Overview ......................... 6 2.1 Background: Homology-based Protein Structure Prediction . 14 3.1 Preselection: Overview ............................. 26 3.2 Preselection: Preselection Performance .................... 30 3.3 Preselection: Fold Recognition Accuracy on Three Evaluation Sets. 35 4.1 AutoSCOP: Pattern-Class Graph. ....................... 43 4.2 AutoSCOP: Pattern Combinations. ...................... 43 4.3 AutoSCOP: Screenshot of the AutoPSI Database. 56 4.4 AutoSCOP: Annotation Process for the AutoPSI Database. 57 4.5 AutoSCOP Example: 1a0p. .......................... 60 4.6 AutoSCOP Example: 1jwlc. .......................... 60 4.7 AutoSCOP Example: 1a79a. .......................... 60 5.1 SSEP-Domain: Overview ............................ 68 5.2 SSEP-Domain: Domain Length Histogram . 70 5.3 SSEP-Domain: Alignment Score Histogram . 74 5.4 SSEP-Domain: Overlap Score Example .................... 83 5.5 SSEP-Domain: Results Plot .......................... 85 6.1 QUASAR: Overview .............................. 101 6.2 Comparison of GA Matrices with Well-Known Matrices. 112 7.1 Vorolign: Similarity Computation . 118 7.2 BioWeka: Overview ............................... 124 x List of Figures List of Tables 4.1 AutoSCOP: Quantitative Analysis of Unique InterPro Patterns. 45 4.2 AutoSCOP: Coverage Analysis on Training Data. 46 4.3 AutoSCOP: Influence of InterPro databases. 46 4.4 AutoSCOP: Sensitivity and Speci¯city .................... 49 4.5 AutoSCOP: Reduced Training Data ...................... 51 4.6 AutoSCOP: Non-Trivial Targets ........................ 53 5.1 SSEP-Domain: Correctly Predicted Targets . 80 5.2 SSEP-Domain: Speci¯city ........................... 81 5.3 SSEP-Domain: Average Overlap Score .................... 84 5.4 SSEP-Domain: SSEP-Align Results ...................... 88 5.5 SSEP-Domain: Comparison with SSEP-Domain* on CAFASP 4 and CASP 7 94 5.6 SSEP-Domain: Algorithmic Ingredients .................... 96 6.1 Evaluated Scoring Schemes and Corresponding Parameter Settings . 107 6.2 Results of Optimized PPA Parameters . 114 xii List of Tables Summary Proteins play a central role in organisms as they perform many important tasks in their cells. Accordingly, the better we understand how proteins are built, the better we can deal with many common diseases. In particular, information on structural properties of proteins can give insight into the way they work and how mutations, for instance, may a®ect their operability. Such knowledge can therefore help and influence modern medicine and drug development. This work is situated in the ¯eld of protein structure prediction. Here, the aim is to determine a three-dimensional structure from a protein's amino acid sequence. Depending on their quality, the resulting structure models can be used for a variety of purposes. At the moment there are millions of known protein sequences but only tens of thousands of known protein structures available. As it cannot be expected that it will be possible to experimentally determine a structure for each protein in the near future, protein structure prediction is an important task of current bioinformatics research. In particular, so-called template-based modeling can result in very good predicted struc- tures. Here, given a target sequence with unknown structure, a simple modeling approach (1) searches for similar sequences with known structures (so-called templates), (2) com- putes mappings from the target sequence to the template sequences (so-called alignments), (3) takes the atom coordinates from the mapped parts
Recommended publications
  • A Stochastic Point Cloud Sampling Method for Multi-Template Protein Comparative Modeling
    www.nature.com/scientificreports OPEN A Stochastic Point Cloud Sampling Method for Multi-Template Protein Comparative Modeling Received: 05 November 2015 Jilong Li1 & Jianlin Cheng1,2 Accepted: 21 April 2016 Generating tertiary structural models for a target protein from the known structure of its homologous Published: 10 May 2016 template proteins and their pairwise sequence alignment is a key step in protein comparative modeling. Here, we developed a new stochastic point cloud sampling method, called MTMG, for multi-template protein model generation. The method first superposes the backbones of template structures, and the Cα atoms of the superposed templates form a point cloud for each position of a target protein, which are represented by a three-dimensional multivariate normal distribution. MTMG stochastically resamples the positions for Cα atoms of the residues whose positions are uncertain from the distribution, and accepts or rejects new position according to a simulated annealing protocol, which effectively removes atomic clashes commonly encountered in multi-template comparative modeling. We benchmarked MTMG on 1,033 sequence alignments generated for CASP9, CASP10 and CASP11 targets, respectively. Using multiple templates with MTMG improves the GDT-TS score and TM-score of structural models by 2.96–6.37% and 2.42–5.19% on the three datasets over using single templates. MTMG’s performance was comparable to Modeller in terms of GDT-TS score, TM-score, and GDT-HA score, while the average RMSD was improved by a new sampling approach. The MTMG software is freely available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/mtmg.html. The tertiary structure of a protein, which can be represented by the coordinates of its atoms, is important for understanding the function and activity of the protein1,2.
    [Show full text]
  • Emodel-BDB: a Database of Comparative Structure Models Of
    GigaScience, 7, 2018, 1–9 doi: 10.1093/gigascience/giy091 Advance Access Publication Date: 24 July 2018 Research RESEARCH eModel-BDB: a database of comparative structure models of drug-target interactions from the Binding Database Misagh Naderi 1, Rajiv Gandhi Govindaraj 1 and Michal Brylinski 1,2,* 1Department of Biological Sciences, Louisiana State University, 202 Life Sciences Bldg, Baton Rouge, LA 70803, USA and 2Center for Computation & Technology, Louisiana State University, 2054 Digital Media Center, Baton Rouge, LA 70803, USA ∗Correspondence address. Michal Brylinski, Louisiana State University, 202 Life Sciences Bldg, Baton Rouge, LA 70803, USA; Tel: +(225) 578-2791; Fax: +(225) 578-2597; E-mail: [email protected] http://orcid.org/0000-0002-6204-2869 Abstract Background: The structural information on proteins in their ligand-bound conformational state is invaluable for protein function studies and rational drug design. Compared to the number of available sequences, not only is the repertoire of the experimentally determined structures of holo-proteins limited, these structures do not always include pharmacologically relevant compounds at their binding sites. In addition, binding affinity databases provide vast quantities of information on interactions between drug-like molecules and their targets, however, often lacking structural data. On that account, there is a need for computational methods to complement existing repositories by constructing the atomic-level models of drug-protein assemblies that will not be determined experimentally in the near future. Results: We created eModel-BDB, a database of 200,005 comparative models of drug-bound proteins based on 1,391,403 interaction data obtained from the Binding Database and the PDB library of 31 January 2017.
    [Show full text]
  • (12) United States Patent (10) Patent No.: US 8,715,988 B2 Arnold Et Al
    US008715988B2 (12) United States Patent (10) Patent No.: US 8,715,988 B2 Arnold et al. (45) Date of Patent: May 6, 2014 (54) ALKANE OXIDATION BY MODIFIED 2003, OO77795 A1 4, 2003 Wilson HYDROXYLASES 2003/OO77796 A1 4/2003 Crotean 2003. O100744 A1 5/2003 Farinas 2005, 0037411 A1 2/2005 Arnold et al. (75) Inventors: Frances Arnold, Pasadena, CA (US); 2005/00591.28 A1 3, 2005 Arnold et al. Peter Meinhold, Pasadena, CA (US); 2005/0202419 A1 9, 2005 Cirino et al. Matthew W. Peters, Pasadena, CA 2008/0293928 A1 11/2008 Farinas et al. (US); Rudi Fasan, Brea, CA (US); Mike M. Y. Chen, Pasadena, CA (US) FOREIGN PATENT DOCUMENTS EP O 505 198 A1 9, 1992 (73) Assignee: California Institute of Technology, EP O 752 008 2, 1997 Pasadena, CA (US) EP O932 670 8, 2000 WO WO 89.03424 A1 4f1989 (*) Notice: Subject to any disclaimer, the term of this WO WO95/22625 A1 8, 1995 patent is extended or adjusted under 35 WO WO97, 16553 A1 5, 1997 WO WO97/20078 A1 6, 1997 U.S.C. 154(b) by 999 days. WO WO 97.35957 A1 Of 1997 WO WO97/35966 A1 10, 1997 (21) Appl. No.: 11/697,404 WO WO 98.27 230 A1 6, 1998 WO WO 98,31837 A1 7, 1998 (22) Filed: Apr. 6, 2007 WO WO98/41653 A1 9, 1998 WO WO 98.42832 A1 10, 1998 Prior Publication Data WO WO99,60096 A2 11/1999 (65) WO WOOOOO632 A1 1, 2000 US 2008/0057577 A1 Mar.
    [Show full text]
  • From Directed Evolution to Computational Enzyme Engineering—A Review
    Received: 19 August 2019 Revised: 14 October 2019 Accepted: 19 October 2019 DOI: 10.1002/aic.16847 TRIBUTE TO FOUNDERS: FRANCES ARNOLD. BIOMOLECULAR ENGINEERING, BIOENGINEERING, BIOCHEMICALS, BIOFUELS, AND FOOD From directed evolution to computational enzyme engineering—A review Ratul Chowdhury | Costas D. Maranas Department of Chemical Engineering, The Pennsylvania State University, University Park, Abstract Pennsylvania Nature relies on a wide range of enzymes with specific biocatalytic roles to carry out Correspondence much of the chemistry needed to sustain life. Enzymes catalyze the interconversion of a Costas D. Maranas, Department of Chemical vast array of molecules with high specificity—from molecular nitrogen fixation to the syn- Engineering, The Pennsylvania State University, University Park, PA 16802. thesis of highly specialized hormones and quorum-sensing molecules. Ever increasing Email: [email protected] emphasis on renewable sources for energy and waste minimization has turned enzymes Funding information into key industrial workhorses for targeted chemical conversions. Modern enzymology is Division of Chemical, Bioengineering, central to not only food and beverage manufacturing processes but also finds relevance in Environmental, and Transport Systems, Grant/ Award Number: CBET-1703274 countless consumer product formulations such as proteolytic enzymes in detergents, amy- lases for excess bleach removal from textiles, proteases in meat tenderization, and lactoperoxidases in dairy products. Herein, we present an overview of enzyme science and engineering milestones and the emergence of directed evolution of enzymes for which the 2018 Nobel Prize in Chemistry was awarded to Dr. Frances Arnold. KEYWORDS biocatalysis, computational protein design, directed evolution, enzyme engineering, mutagenesis 1 | BACKGROUND O:S to be approximately 1:1.58:0.28:0.3:0.01 (experiments performed by Johannes Mulder)3 and reported that some of the proteins are cat- 1.1 | Background of enzymes, enzymology, and alytic.
    [Show full text]
  • A Domain Based Protein Structural Modelling Platform Applied in the Analysis of Alternative
    A domain based protein structural modelling platform applied in the analysis of alternative splicing Su Datt Lam A thesis submitted for the degree of Doctor of Philosophy January 2018 Institute of Structural and Molecular Biology University College London Declaration I, Su Datt Lam, confirm that the work presented in this thesis is my own except where otherwise stated. Where the thesis is based on work done by myself jointly with others or where information has been derived from other sources, this has been indicated in the thesis. Su Datt Lam January 2018 Abstract Functional families (FunFams) are a sub-classification of CATH protein domain su- perfamilies that cluster relatives likely to have very similar structures and functions. The functional purity of FunFams has been demonstrated by comparing against ex- perimentally determined Enzyme Commission annotations and by checking whether known functional sites coincide with highly conserved residues in the multiple se- quence alignments of FunFams. We hypothesised that clustering relatives into Fun- Fams may help in protein structure modelling. In the first work chapter, we demonstrate the structural coherence of domains in FunFams. We then explore the usage of FunFams in protein monomer modelling. The FunFam based protocol produced higher percentages of good models compared to an HHsearch (the state-of-the-art HMM based sequence search tool) based protocol for both close and remote homologs. We developed a modelling pipeline that, utilises the FunFam protocol, and is able to model up to 70% of domain sequences from human and fly genomes. In the second work chapter, we explore the usage of FunFams in protein complex modelling.
    [Show full text]
  • An Ultra-Fast Approach for Large-Scale Protein Structure Similarity Searching Lei Deng1, Guolun Zhong1, Chenzhe Liu1, Judong Luo2* and Hui Liu3*
    Deng et al. BMC Bioinformatics 2019, 20(Suppl 19):662 https://doi.org/10.1186/s12859-019-3235-1 METHODOLOGY Open Access MADOKA: an ultra-fast approach for large-scale protein structure similarity searching Lei Deng1, Guolun Zhong1, Chenzhe Liu1, Judong Luo2* and Hui Liu3* From International Conference on Bioinformatics (InCoB 2019) Jakarta, Indonesia. 10–12 Septemebr 2019 Abstract Background: Protein structure comparative analysis and similarity searches play essential roles in structural bioinformatics. A couple of algorithms for protein structure alignments have been developed in recent years. However, facing the rapid growth of protein structure data, improving overall comparison performance and running efficiency with massive sequences is still challenging. Results: Here, we propose MADOKA, an ultra-fast approach for massive structural neighbor searching using a novel two-phase algorithm. Initially, we apply a fast alignment between pairwise structures. Then, we employ a score to select pairs with more similarity to carry out a more accurate fragment-based residue-level alignment. MADOKA performs about 6–100 times faster than existing methods, including TM-align and SAL, in massive alignments. Moreover, the quality of structural alignment of MADOKA is better than the existing algorithms in terms of TM-score and number of aligned residues. We also develop a web server to search structural neighbors in PDB database (About 360,000 protein chains in total), as well as additional features such as 3D structure alignment visualization. The MADOKA web server is freely available at: http://madoka.denglab.org/ Conclusions: MADOKA is an efficient approach to search for protein structure similarity. In addition, we provide a parallel implementation of MADOKA which exploits massive power of multi-core CPUs.
    [Show full text]
  • Probabilistic Protein Homology Modeling
    Probabilistic Protein Homology Modeling Armin Jonathan Olaf Meier M¨unchen2014 Dissertation zur Erlangung des Doktorgrades der Fakult¨atf¨urChemie und Pharmazie der Ludwig-Maximilians-Universit¨atM¨unchen Probabilistic Protein Homology Modeling Armin Jonathan Olaf Meier aus Memmingen, Deutschland 2014 Erkl¨arung: Diese Dissertation wurde im Sinne von x7 der Promotionsordnung vom 28. November 2011 von Herrn Dr. Johannes S¨odingbetreut. Eidesstattliche Versicherung: Diese Dissertation wurde eigenst¨andigund ohne unerlaubte Hilfe erarbeitet. M¨unchen, am 17. Juni 2014 Armin Meier Dissertation eingereicht am 1. Gutachter: Dr. Johannes S¨oding 2. Gutachter: PD Dr. Dietmar Martin M¨undliche Pr¨ufungam 27. Juni 2014 Acknowledgments The completion of this thesis could not have been accomplished without the help of several people. First and foremost, I would like to thank Dr. Johannes S¨oding for having given me the opportunity to work in his group. With his passion and enthusiasm for science he was an outstanding mentor for me, and I am very happy to have profited from his wealth of knowledge and his approach to tackle scientific problems. I would like to thank PD Dr. Dietmar Martin for agreeing to be my second PhD examiner. I am also very grateful to Prof. Dr. Patrick Cramer, Prof. Dr. Klaus F¨orstemann,Prof. Dr. Mario Halic and Prof. Dr. Ulrike Gaul for offering their time as members of my committee. Next, I owe a debt of gratitude to Jessica Andreani for having critically read parts of this thesis. Also thanks to Stefan Seemayer who has made valuable remarks on paper drafts and was very helpful with all kinds of technical questions.
    [Show full text]
  • Engineering Enzyme Systems by Recombination
    Engineering Enzyme Systems by Recombination Thesis by Devin Lee Trudeau In Partial Fulfillment of the Requirements for the degree of Doctor of Philosophy CALIFORNIA INSTITUTE OF TECHNOLOGY Pasadena, California 2014 (Defended December 11, 2013) 2014 Devin L. Trudeau All Rights Reserved iii ACKNOWLEDGEMENTS I’d like to thank my advisor Frances Arnold for seeing me through four years of failure and success. Frances has continually challenged my ideas and work, which has taught me how to be a better scientist, thinker, and bioengineer. In her lab I’ve had the freedom to pursue many different ideas, and interact with people of diverse personalities and research interests. The Arnold lab has been my home these last few years. I’d like to thank Florence Mingardon and Yousong Ding for being my two mentors as I was learning the art of protein engineering. Toni Lee has been a great collaborator on our endoglucanase project, and I’m happy that our efforts were as synergistic as our enzymes. Matthew Smith and Indira Wu have both been invaluable sources of advice on cellulase engineering. Sabrina Sun was a pleasure to work with over the course of two summers and a fourth-year project. Joseph Meyerowitz was my collaborator on the aldehyde tolerance project. Sabine Bastian kept the lab running smoothly, and has provided excellent advice on directed evolution methodology. Jackson Cahn, Ardemis Boghossian, Phil Romero, Ryan Lauchli, John McIntosh, Martin Engqvist, Chris Farwell, and Jane Wang have been great feedback for my (mostly crazy) ideas— and good friends. I’d like to thank Frank Chang and the rest of my biofuels collaborators in the Ho and Chao labs at Academia Sinica.
    [Show full text]
  • On the Engineering of Proteins: Methods and Applications for Carbohydrate-Active Enzymes
    On the engineering of proteins: methods and applications for carbohydrate-active enzymes Fredrika Gullfot Doctoral Thesis in Biotechnology Stockholm, Sweden 2010 © Fredrika Gullfot School of Biotechnology Royal Institute of Technology AlbaNova University Centre SE-106 91 Stockholm Sweden Printed at US-AB Universitetsservice TRITA-BIO Report 2010:14 ISSN 1654-2312 ISBN 978-91-7415-709-3 ii ABSTRACT This thesis presents the application of different protein engineering methods on enzymes and non-catalytic proteins that act upon xyloglucans. Xyloglucans are polysaccharides found as storage polymers in seeds and tubers, and as cross-linking glucans in the cell wall of plants. Their structure is complex with intricate branching patterns, which contribute to the physical properties of the polysaccharide including its binding to and interaction with other glucans such as cellulose. One important group of xyloglucan-active enzymes is encoded by the GH16 XTH gene family in plants, including xyloglucan endo-transglycosylases (XET) and xyloglucan endo-hydrolases (XEH). The molecular determinants behind the different catalytic routes of these homologous enzymes are still not fully understood. By combining structural data and molecular dynamics (MD) simulations, interesting facts were revealed about enzyme-substrate interaction. Furthermore, a pilot study was performed using structure-guided recombination to generate a restricted library of XET/XEH chimeras. Glycosynthases are hydrolytically inactive mutant glycoside hydrolases (GH) that catalyse the formation of glycosidic linkages between glycosyl fluoride donors and glycoside acceptors. Different enzymes with xyloglucan hydrolase activity were engineered into glycosynthases, and characterised as tools for the synthesis of well-defined homogenous xyloglucan oligo- and polysaccharides with regular substitution patterns.
    [Show full text]
  • An XML Based Semantic Protein Map
    An XML based semantic protein map A. S. Sidhu, T. S. Dillon & H. Setiawan Faculty of Information Technology, University of Technology, Sydney Abstract From the nature of the algorithms for data mining we note that an XML framework can be represented using graph matching algorithms. Various techniques currently exist for graph matching of data structures such as the Adjacency Matrix or Algebraic Representation of Graphs. The Graph Representation can be easily converted to a string representation. Both Graph and String Representations miss semantic relationships that exist in the data. These relationships can be captured by using semi-structured XML as a representation format. We already have an approach to integrate different data formats into a Unified Database. The technique is successfully applied to diverse Protein Databases in a Bioinformatics Domain. An XML representation of this comprehensive database preserving order and semantic relationships is already generated. In this paper we propose an approach to a Semantic Protein Map (PMAP) by building a shared ontology on our structured database model. This ontology can be used by various Bioinformatics researchers from one single site. This site will host mirrors of Protein Databases along with BIODB and have tools on Similarity Searching. Keywords: bioinformatics, protein structures, biomedical ontologies, data integration, data semantics, semantic web. 1 Introduction Bioinformatics is the field of science in which, biology, computer science, and information technology merge to form a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.
    [Show full text]
  • STRUCTURAL MODELING of PROTEIN-PROTEIN INTERACTIONS USING MULTIPLE-CHAIN THREADING and FRAGMENT ASSEMBLY by SRAYANTA MUKHERJEE
    STRUCTURAL MODELING OF PROTEIN-PROTEIN INTERACTIONS USING MULTIPLE-CHAIN THREADING AND FRAGMENT ASSEMBLY By SRAYANTA MUKHERJEE Submitted to the graduate degree program in Bioinformatics and the Graduate Faculty of the University of Kansas in partial fulfillment of the requirements for the degree of Doctorate of Philosophy ________________________________ Chairperson Ilya A. Vakser, PhD ________________________________ Co-Chairperson Yang Zhang, PhD ________________________________ John Karanicolas, PhD ________________________________ Eric Deeds, PhD ________________________________ Mark Richter, PhD Date Defended: The Dissertation Committee for Srayanta Mukherjee certifies that this is the approved version of the following dissertation: STRUCTURAL MODELING OF PROTEIN-PROTEIN INTERACTIONS USING MULTIPLE-CHAIN THREADING AND FRAGMENT ASSEMBLY ________________________________ Chairperson Ilya Vaker, PhD ________________________________ Co-Chairperson Yang Zhang, PhD Date approved: ii Abstract Since its birth, the study of protein structures has made progress with leaps and bounds. However, owing to the expenses and difficulties involved, the number of known protein structures has not been able to catch up with the number of protein sequences and in fact has steadily lost ground. This necessitated the development of high-throughput, but accurate, computational algorithms capable of predicting the three dimensional structure of proteins from its amino acid sequence. While progress has been made in the realm of protein tertiary structure prediction, the advancement in protein quaternary structure prediction has been limited by the fact that the degree of freedom for protein complexes is even larger and fewer number of protein complex structures are present in the PDB library. In fact, protein complex structure prediction has largely remained a docking problem where automated algorithms aim to predict the complex structures starting from the unbound crystal structure of its component subunits and has remained largely limited in scope.
    [Show full text]
  • How Significant Is a Protein Structure Similarity with TM-Score = 0.5?
    Vol. 26 no. 7 2010, pages 889–895 BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btq066 Structural bioinformatics Advance Access publication February 17, 2010 How significant is a protein structure similarity with TM-score = 0.5? Jinrui Xu1,2 and Yang Zhang1,2,∗ 1Department of Medical School, Center for Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109 and 2Department of Molecular Biosciences, Center for Bioinformatics, University of Kansas, 2030 Becker Drive, Lawrence, KS 66047, USA Downloaded from Associate Editor: Anna Tramontano ABSTRACT commonly used means to compare protein structures is to calculate Motivation: Protein structure similarity is often measured by root the root mean squared deviation (RMSD) of all the equivalent atom mean squared deviation, global distance test score and template pairs after the optimal superposition of the two structures (Kabsch, modeling score (TM-score). However, the scores themselves cannot 1978). However, because all atoms in the structures are equally http://bioinformatics.oxfordjournals.org provide information on how significant the structural similarity weighted in the calculation, one of the major drawbacks of RMSD is. Also, it lacks a quantitative relation between the scores and is that it becomes more sensitive to the local structure deviation than conventional fold classifications. This article aims to answer two to the global topology when the RMSD value is big. For example, questions: (i) what is the statistical significance
    [Show full text]