Fast, Sensitive Protein Sequence Searches Using Iterative Pairwise Comparison of Hidden Markov Models


Dissertation for the award of the doctoral degree of the Faculty of Chemistry and Pharmacy of the Ludwig–Maximilians–Universität München

Fast, sensitive protein sequence searches using iterative pairwise comparison of hidden Markov models

Michael Remmert
from Cologne
2011

Declaration: This dissertation was supervised by Professor Dr. Patrick Cramer in accordance with §13 para. 3 or 4 of the doctoral degree regulations (Promotionsordnung) of 29 January 1998 (as amended by the sixth amending statute of 16 August 2010).

Declaration of honour: This dissertation was prepared independently and without unauthorized help.

München, 16 September 2011
Michael Remmert

Dissertation submitted on: 16 September 2011
1st reviewer: Prof. Dr. Patrick Cramer
2nd reviewer: Prof. Dr. Dmitrij Frishman
Oral examination on: 23 November 2011

Acknowledgements

I owe my gratitude to all those who have made this dissertation possible by supporting and encouraging me throughout the last years, and I can only try to acknowledge them here. First and foremost, my deepest gratitude goes to my advisor Dr. Johannes Söding for giving me the opportunity to work in his group and to contribute to many fascinating projects. Furthermore, I would like to thank Johannes for all the fruitful discussions, the constant support, and for making the last years such a great experience. I would like to thank Prof. Dr. Patrick Cramer for being my doctoral supervisor, and Prof. Dr. Dmitrij Frishman for being my second PhD examiner. I am also very grateful to Prof. Dr. Klaus Förstemann, Prof. Dr. Roland Beckmann, Prof. Dr. Ulrike Gaul and Prof. Dr. Karl-Peter Hopfner for offering their time as members of my dissertation committee. Furthermore, I deeply appreciate the critical reading of this thesis by Dr. Johannes Söding and Theresa Niederberger.
I would also like to thank all members of the Söding and Tresch groups for the great atmosphere in our office, for all their help and discussions, and for all the fun at the social activities. Thanks to Andreas, Maria and Andy for their help in integrating their tools into this project. Last but not least, I am deeply grateful to my parents, my sister Ulrike with Jürgen, my nieces Christiane and Caroline, and all my good friends for all their support, patience, trust and encouragement.

Summary

Most sequence-based methods for protein structure or function prediction construct a multiple sequence alignment (MSA) of homologs as a first step. The standard search tool to generate multiple sequence alignments is PSI-BLAST (> 25 000 citations), an extension of BLAST to profile-sequence comparison. PSI-BLAST owes its sensitivity to its use of sequence profiles and to its iterative search scheme. Significant sequence hits are added to the evolving multiple alignment, from which a sequence profile is generated for the next search iteration. HMMER3 is similar to PSI-BLAST, but uses a profile hidden Markov model (HMM) instead of a sequence profile to represent the evolving query multiple alignment. The gain in sensitivity over PSI-BLAST is paid for by a 3- to 4-fold reduction in speed.

In my thesis work, I have developed HHblits, the first method for iterative sequence searching based on the comparison of profile HMMs. It is faster than PSI-BLAST and HMMER3, more sensitive, and constructs multiple alignments of significantly better quality. The method builds on the HHsearch algorithm for pairwise comparison of profile HMMs, to which it owes its high sensitivity and alignment quality. In parallel to HHblits, our group developed a fast clustering method that can generate a covering set of HMMs for the entire UniProt database in a few weeks' time.
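
The iterative search scheme described in this summary (significant hits are added to the growing alignment, and a new profile is built from it for the next iteration) can be illustrated with a minimal Python sketch. Everything here is a toy assumption made for illustration: sequences are ungapped and of equal length, the background distribution is uniform, hits are accepted by a raw log-odds threshold rather than an E-value, and the names `build_profile`, `score` and `iterative_search` are hypothetical. It is not the algorithm used by PSI-BLAST, HMMER3 or HHblits.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(aligned, pseudocount=1.0):
    """Return per-column log-odds scores against a uniform background."""
    background = 1.0 / len(ALPHABET)
    total = len(aligned) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(len(aligned[0])):
        counts = Counter(seq[col] for seq in aligned)
        profile.append({
            aa: math.log((counts[aa] + pseudocount) / total / background)
            for aa in ALPHABET
        })
    return profile


def score(profile, seq):
    """Sum the column scores of an ungapped sequence of profile length."""
    return sum(column[aa] for column, aa in zip(profile, seq))


def iterative_search(query, database, threshold, max_iterations=3):
    """Grow the hit list by re-searching with a profile of all hits so far."""
    hits = [query]
    for _ in range(max_iterations):
        profile = build_profile(hits)
        new_hits = [s for s in database
                    if s not in hits and score(profile, s) >= threshold]
        if not new_hits:
            break  # converged: no new significant hits
        hits.extend(new_hits)
    return hits


db = ["ACDEY", "ACDWY", "WWWWW"]
print(iterative_search("ACDEF", db, threshold=2.0, max_iterations=1))
# ['ACDEF', 'ACDEY']
print(iterative_search("ACDEF", db, threshold=2.0, max_iterations=3))
# ['ACDEF', 'ACDEY', 'ACDWY']
```

In this toy run, the more distant sequence ACDWY falls below the threshold when scored against the query alone, but is picked up in the second iteration once ACDEY has been added to the profile. This is the sensitivity gain that iteration is meant to provide.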
To speed up the search by a factor of ∼2000, I developed a prefilter based on a novel algorithm for profile-profile comparison designed for maximum speed. The algorithm effectively reduces profile-profile alignment to profile-sequence alignment by encoding the database profiles as "column state sequences", in which profile columns are represented by an alphabet of 219 discretized column states. This permits a fast implementation of the profile-profile comparison that employs the SIMD (single instruction, multiple data) instruction sets available on modern CPUs. In this way, HHblits performs 16 byte operations in parallel in a single clock cycle. Several further filtering steps reduce the number of slow, full-blown HMM-HMM comparisons to a fraction of < 1/1000. Our tests show that the loss of sensitivity due to the prefilter is negligible.

On a standard SCOP20 ROC benchmark (SCOP 1.73 proteins filtered to 20% maximum sequence identity), HHblits detects twice as many true positives as PSI-BLAST and 54% more than HMMER3 at 1% error rate in the first iteration. Two search iterations of HHblits detect significantly more true positives than five PSI-BLAST iterations. Alignment quality is likewise improved significantly. Furthermore, we are able to make confident fold predictions (E-value < 10⁻³) and build structural models for 394 Pfam domains for which no fold prediction has been possible.

HHblits is a robust, general-purpose protein sequence search tool that has the potential to replace PSI-BLAST as the state-of-the-art method for the generation of MSAs. Due to their high quality, HHblits alignments are able to improve secondary structure prediction methods such as PSIPRED. Furthermore, these better alignments facilitate function and structure prediction for proteins about which nothing is yet known.

In the second part of this thesis we study the evolution of outer membrane β-barrels (OMBBs).
These proteins are the major class of outer membrane proteins (OMPs) from Gram-negative bacteria, mitochondria and plastids. Their transmembrane domains consist of 8 to 24 β-strands forming a closed, barrel-shaped β-sheet around a central pore. Despite their obvious structural regularity, evidence for an origin by duplication or for a common ancestry had not been found. We use three complementary approaches to show that all OMBBs from Gram-negative bacteria evolved from a single, ancestral ββ hairpin. First, we link almost all families of known single-chain bacterial OMBBs with each other through transitive profile searches. Second, we identify a clear repeat signature in the sequences of many OMBBs, in which the repeating sequence unit coincides with the structural ββ hairpin repeat. Third, we show that the observed sequence similarity between OMBB hairpins cannot be explained by structural or membrane constraints on their sequences. The third approach addresses a longstanding problem in protein evolution: how to distinguish between a very remotely homologous relationship and the opposing scenario of "sequence convergence". The origin of a diverse group of proteins from a single hairpin module supports the hypothesis that, around the time of transition from the RNA to the protein world, proteins arose by amplification and recombination of short peptide modules that had previously evolved as cofactors of RNAs. This research provides the basis for the identification and classification of outer membrane β-barrels and explains the evolutionary origin of this important bacterial protein class.

Contents

Acknowledgements
Summary
1. Motivation and overview

I. HHblits - Iterative HMM-based homology searches
2. Introduction to remote homology searches
  2.1. Scoring models and gap penalties
  2.2. Pairwise sequence alignment
  2.3. Profile alignment
  2.4. HMM-HMM alignment
    2.4.1. Log-sum-of-odds score
    2.4.2. Pairwise alignment of HMMs
    2.4.3. HHsearch
  2.5. Iterative sequence search methods
    2.5.1. PSI-BLAST
    2.5.2. HMMER3
3. Material and methods
  3.1. Workflow of HHblits
  3.2. Generation of HHblits databases with kClust
  3.3. Fast prefiltering of HHblits
    3.3.1. Column state alphabet for sequence profile encoding
    3.3.2. Generation of the column state alphabet
    3.3.3. Translation of sequence profiles in column state sequences
    3.3.4. SSE2
    3.3.5. Fast prefilter algorithms
    3.3.6. Gapless local alignment with SSE2
    3.3.7. Smith-Waterman with SSE2
    3.3.8. Smith-Waterman with backtrace
  3.4. Additional filter steps
    3.4.1. Early stopping
    3.4.2. Tube shading
    3.4.3. Avoid aligning the same templates in multiple iterations
  3.5. P-value calculation with neural networks
  3.6. Realignment of matches by MAC algorithm
    3.6.1. Consecutive merging of best alignments
  3.7. Multi-threading support
  3.8. Compact database by FFINDEX
4. Results
  4.1. Benchmarks
    4.1.1. Homology detection benchmark
    4.1.2. Homology detection benchmark with multi-domain proteins
    4.1.3. Alignment quality
    4.1.4. Runtime
    4.1.5. Quality of E-values
    4.1.6. Comparison of HHblits to HHsearch
    4.1.7. Comparison of profiles built by HHblits or buildali.pl
    4.1.8. UniProt20 versus NR20 database
    4.1.9. Using HMMER3-formatted databases
    4.1.10. Prefilter parameters
    4.1.11. Parameter optimization
    4.1.12. Things that did not work
  4.2. Improved PSIPRED secondary structure prediction with HHblits profiles
  4.3. Structure prediction for selected Pfam domains
    4.3.1. CTK3 - a possible new CID
    4.3.2. PIP49 - a possible Ca²⁺-activated protein kinase
    4.3.3. The HAP complex
  4.4. Web server
5. Conclusion and Outlook

II. Evolution of outer membrane β-barrels
6. Introduction to the evolution of outer membrane β-barrels
  6.1. Cell envelope of Gram-negative bacteria
  6.2. Outer membrane β-barrels
  6.3. Protein evolution