Unbiased Protein Interface Prediction Based on Ligand Diversity Quantification∗

Total Page:16

File Type:pdf, Size:1020Kb

Unbiased Protein Interface Prediction Based on Ligand Diversity Quantification∗ Unbiased Protein Interface Prediction Based on Ligand Diversity Quantification∗ Reyhaneh Esmaielbeiki and Jean-Christophe Nebel Faculty of Science, Engineering and Computing, Kingston University, London Kingston-Upon-Thames, Surrey, KT1 2EE, UK {r.esmaielbeiki,J.Nebel}@kingston.ac.uk Abstract Proteins interact with each other to perform essential functions in cells. Consequently, identific- ation of their binding interfaces can provide key information for drug design. Here, we introduce Weighted Protein Interface Prediction (WePIP), an original framework which predicts protein interfaces from homologous complexes. WePIP takes advantage of a novel weighted score which is not only based on structural neighbours’ information but, unlike current state-of-the-art methods, also takes into consideration the nature of their interaction partners. Experimental validation demonstrates that our weighted schema significantly improves prediction performance. In par- ticular, we have established a major contribution to ligand diversity quantification. Moreover, application of our framework on a standard dataset shows WePIP performance compares favour- ably with other state of the art methods. 1998 ACM Subject Classification J.3 Life and Medical Sciences (Biology and genetics) Keywords and phrases Protein-protein interaction, protein interface prediction, homology mod- eling Digital Object Identifier 10.4230/OASIcs.GCB.2012.119 1 Introduction Protein-protein interaction (PPIs) is essential for the functionality of living cells. Alterations of these interactions affect biochemical processes which may lead to critical diseases such as cancer [31]. Therefore, knowledge about protein interactions and their resulting 3D complexes can provide key information for drug design. A number of experimental techniques are available to identify residues involved in PPIs [31]. Although they provide valuable contribution to PPI knowledge, their cost in terms of time and expense limits their practical use [10]. Consequently, computational methods have been proposed to identify protein interfaces. They can be broadly divided in sequence only and structure based approaches. Sequence based methodologies usually rely on a sliding window which allows calculating specific features associated to each amino acid according to its neighbours [24, 28, 35, 8]. Then, a classifier discriminates between interface and non-interface residues according to residue scores. Those approaches differ mainly in their selection of amino acid properties, such as physico-chemical properties [8, 7], residues distribution [24] or conservation degree [34, 25], machine learning algorithm and scoring functions [38]. When the 3D structure of the query protein (QP) is available, integration of structural information, e.g. residues secondary structure or solvent-accessible surface area, allows ∗ This work was in part supported by grant 6435/B/T02/2011/40 of the Polish National Centre for Science. © Reyhaneh. Esmaielbeiki and Jean-Christophe Nebel; licensed under Creative Commons License ND German Conference on Bioinformatics 2012 (GCB’12). Editors: S. Böcker, F. Hufsky, K. Scheubert, J. Schleicher, S. Schuster; pp. 119–130 OpenAccess Series in Informatics Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany 120 Unbiased Protein Interface Prediction better predictions [25, 32]. Three approaches taking advantage of these properties are seen as state of the art [38]. ProMate combines 13 different properties, such as chemical component, geometric properties and information from relevant crystal structures, to generate a quantitative measure [23]. Protein interface residues are then predicted using a clustering process relying on mutual information. Alternatively, Cons-PPISP discriminates between residues using neural networks trained with protein’s surface sequence profiles and solvent accessibility of neighbouring residues [6]. Finally, PINUP addresses the problem using an empirical energy function which is based on a linear combination of side chain energy score, interface propensity and residue conservation [22]. Despite fundamental differences, these three approaches display very similar performance [38]. However, each of them seems to capture different important aspects of residue interactions. As a result, a meta-predictor, Meta-PPISP, combining their scores using a linear regression analysis, is able to outperform each of these individual methods in terms of accuracy [27]. With the increasing number of experimentally determined protein 3D structures, they have become the main source of interface prediction methods. First, structurally homologous proteins tend to display similar interaction sites [1, 9]. Secondly, protein’s binding sites are evolutionarily conserved among structurally similar proteins (or structural neighbours) [37, 18, 19, 4, 36, 5]. Even remote structural neighbours have been shown to display a significant level of interface conservation [37]. Consequently, structure based methods for interface prediction rely on analysing proteins which are structurally similar to the query protein. Initial approaches focused on detecting conserved areas among homologous structural neighbours. Carl et al. use graph representation of surface residues [18, 19] to describe homo- logous binding sites [4]. They then refine their technique by using local structural similarity instead of global similarities of the query protein to detect the structural neighbours [5]. A more general method, PredUs, maps interacting residues from structural neighbours onto QP even if they do not display any homology [36]. Whereas PredUs still dependents on the existence of structural neighbours of QP, PrISE proposes to deal with this limitation by predicting interface residues from local structural similarity only [15]. This is achieved using a repository of structural elements (SE) generated from the Protein Data Bank (PDB) [3]. For each SE of the query protein (consisting of a central residues and it neighbours) a set of similar SEs are extracted from the repository and a weight is assigned to them based on their similarity to QP. The central residues of the query protein’s SE are predicted as interface if a weighted majority of its similar SEs are interface residues. Although more general, PrISE displays comparable performance to PredUs [15]. Those two approaches have significantly improved the ability of predicting interface residues; see Table 3. However, they do not deal satisfactorily with the very heterogeneous nature of the PDB. First, the presence of complex duplicates, or homologs, biases predictions towards specific configurations, which can affect negatively performance. Secondly, confidence in the information provided by the interface of a structural neighbour should depend on its degree of homology with QP. Although PrISE acknowledges both issues, it does not address the first one [15]. PredUs deals with these matters in a binary fashion. Complexes involving structural neighbours are clustered and a 40% similarity cut-off is used to choose the representatives which will inform interface prediction. Here, we address those limitations of structure based methods by quantifying, first, homology between QP and its structural neighbours and,second, ligand diversity between the partners, or ligands, of the structural neighbours. In this study we introduce Weighted Protein Interface Prediction (WePIP) framework, a novel PIP approach based on structural neighbours’ information. Its main contribution R. Esmaielbeiki and J.-C. Nebel 121 Figure 1 Interface prediction framework for single (A) and pair protein queries (B). A) For a single query, if at least one homologous complex exists, interfaces are predicted using WePIP otherwise PredUs is used. B) For a pair of proteins, depending on the existence of homologous complexes, interfaces are predicted by one or a combination of the WePIPc, WePIP and PredUs methods. is the weighted score assigned to each residue of QP, which takes into account not only the degree of homology of structural neighbours, but also the nature of their interacting partners. After description of the methodology and validation of the proposed weighted schema, WePIP is evaluated against state of art protein interface prediction methods using a standard benchmark dataset. 2 Methods 2.1 Interface Prediction Principles When two protein chains form a dimer, they bind through their interaction interfaces. We propose a novel methodology predicting the amino acids which are involved in binding interactions based on the 3D structures of the dimer partners. In this study dimer refers to any two protein chains involved in interaction. Not only does our approach predict the locations of interfaces when both binding partners are known, but it also infers the most likely binding interface of a single protein. Figure 1.A and 1.B describe those interface prediction pipelines for single and pair protein queries respectively. Our methodology relies on discovering interaction patterns from the analysis of the 3D structures of complexes involving homologs of the dimer partners, called ‘homologous complexes’. In this work, proteins are defined as homologous if their sequence similarity is expressed by an E-value ≤ 10−2. First, Blast [2] is used to classify each query partner G C B 2 0 1 2 122 Unbiased Protein Interface Prediction according to the availability of homologous complexes in PDB [3]. When both interaction partners are known and common homologous complexes exist, the WePIPc method exploits
Recommended publications
  • RECENT ADVANCES in BIOLOGY, BIOPHYSICS, BIOENGINEERING and COMPUTATIONAL CHEMISTRY
    RECENT ADVANCES in BIOLOGY, BIOPHYSICS, BIOENGINEERING and COMPUTATIONAL CHEMISTRY Proceedings of the 5th WSEAS International Conference on CELLULAR and MOLECULAR BIOLOGY, BIOPHYSICS and BIOENGINEERING (BIO '09) Proceedings of the 3rd WSEAS International Conference on COMPUTATIONAL CHEMISTRY (COMPUCHEM '09) Puerto De La Cruz, Tenerife, Canary Islands, Spain December 14-16, 2009 Recent Advances in Biology and Biomedicine A Series of Reference Books and Textbooks Published by WSEAS Press ISSN: 1790-5125 www.wseas.org ISBN: 978-960-474-141-0 RECENT ADVANCES in BIOLOGY, BIOPHYSICS, BIOENGINEERING and COMPUTATIONAL CHEMISTRY Proceedings of the 5th WSEAS International Conference on CELLULAR and MOLECULAR BIOLOGY, BIOPHYSICS and BIOENGINEERING (BIO '09) Proceedings of the 3rd WSEAS International Conference on COMPUTATIONAL CHEMISTRY (COMPUCHEM '09) Puerto De La Cruz, Tenerife, Canary Islands, Spain December 14-16, 2009 Recent Advances in Biology and Biomedicine A Series of Reference Books and Textbooks Published by WSEAS Press www.wseas.org Copyright © 2009, by WSEAS Press All the copyright of the present book belongs to the World Scientific and Engineering Academy and Society Press. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the Editor of World Scientific and Engineering Academy and Society Press. All papers of the present volume were peer reviewed
    [Show full text]
  • Current Topics in Computational Molecular Biology Edited by Tao Jiang Ying Xu Michael Q. Zhang a Bradford Book the MIT Press
    Current Topics in Computational Molecular Biology edited by Tao Jiang Ying Xu Michael Q. Zhang A Bradford Book The MIT Press Cambridge, Massachusetts London, England Contents Preface vii I INTRODUCTION 1 1 The Challenges Facing Genomic Informatics 3 Temple F. Smith H COMPARATIVE SEQUENCE AND GENOME ANALYSIS 9 2 Bayesian Modeling and Computation in Bioinformatics Research 11 Jun S. Liu 3 Bio-Sequence Comparison and Applications 45 Xiaoqiu Huang 4 Algorithmic Methods for Multiple Sequence Alignment 71 Tao Jiang and Lusheng Wang 5 Phylogenetics and the Quartet Method 111 Paul Kearney 6 Genome Rearrangement 135 David Sankoff and Nadia El-Mabrouk 7 Compressing DNA Sequences 157 Ming Li HI DATA MINING AND PATTERN DISCOVERY 173 8 Linkage Analysis of Quantitative Traits 175 Shizhong Xu , 9 Finding Genes by Computer: Probabilistic and Discriminative Approaches 201 Victor V. Solovyev 10 Computational Methods for Promoter Recognition 249 Michael Q. Zhang 11 Algorithmic Approaches to Clustering Gene Expression Data 269 Ron Shamir and Roded Sharan 12 KEGG for Computational Genomics 301 Minoru Kanehisa and Susumu Goto vi Contents 13 Datamining: Discovering Information from Bio-Data 317 Limsoon Wong IV COMPUTATIONAL STRUCTURAL BIOLOGY 343 14 RNA Secondary Structure Prediction 345 Zhuozhi Wang and Kaizhong Zhang 15 Properties and Prediction of Protein Secondary Structure 365 Victor V. Solovyev and Ilya N. Shindyalov 16 Computational Methods for Protein Folding: Scaling a Hierarchy of Complexities 403 Hue Sun Chan, Huseyin Kaya, and Seishi Shimizu 17 Protein Structure Prediction by Comparison: Homology-Based Modeling 449 Manuel C. Peitsch, Torsten Schwede, Alexander Diemand, and Nicolas Guex 18 Protein Structure Prediction by Protein Threading and Partial Experimental Data 467 Ying Xu and Dong Xu 19 Computational Methods for Docking and Applications to Drug Design: Functional Epitopes and Combinatorial Libraries 503 Ruth Nussinov, Buyong Ma, and Haim J.
    [Show full text]
  • Bonnie Berger Named ISCB 2019 ISCB Accomplishments by a Senior
    F1000Research 2019, 8(ISCB Comm J):721 Last updated: 09 APR 2020 EDITORIAL Bonnie Berger named ISCB 2019 ISCB Accomplishments by a Senior Scientist Award recipient [version 1; peer review: not peer reviewed] Diane Kovats 1, Ron Shamir1,2, Christiana Fogg3 1International Society for Computational Biology, Leesburg, VA, USA 2Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel 3Freelance Writer, Kensington, USA First published: 23 May 2019, 8(ISCB Comm J):721 ( Not Peer Reviewed v1 https://doi.org/10.12688/f1000research.19219.1) Latest published: 23 May 2019, 8(ISCB Comm J):721 ( This article is an Editorial and has not been subject https://doi.org/10.12688/f1000research.19219.1) to external peer review. Abstract Any comments on the article can be found at the The International Society for Computational Biology (ISCB) honors a leader in the fields of computational biology and bioinformatics each year with the end of the article. Accomplishments by a Senior Scientist Award. This award is the highest honor conferred by ISCB to a scientist who is recognized for significant research, education, and service contributions. Bonnie Berger, Simons Professor of Mathematics and Professor of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology (MIT) is the 2019 recipient of the Accomplishments by a Senior Scientist Award. She is receiving her award and presenting a keynote address at the 2019 Joint International Conference on Intelligent Systems for Molecular Biology/European Conference on Computational Biology in Basel, Switzerland on July 21-25, 2019. Keywords ISCB, Bonnie Berger, Award This article is included in the International Society for Computational Biology Community Journal gateway.
    [Show full text]
  • ISMB 2008 Toronto
    ISMB 2008 Toronto The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Linial, Michal, Jill P. Mesirov, B. J. Morrison McKay, and Burkhard Rost. 2008. ISMB 2008 Toronto. PLoS Computational Biology 4(6): e1000094. Published Version doi:10.1371/journal.pcbi.1000094 Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:11213310 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA Message from ISCB ISMB 2008 Toronto Michal Linial1,2, Jill P. Mesirov1,3, BJ Morrison McKay1*, Burkhard Rost1,4 1 International Society for Computational Biology (ISCB), University of California San Diego, La Jolla, California, United States of America, 2 Sudarsky Center, The Hebrew University of Jerusalem, Jerusalem, Israel, 3 Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America, 4 Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America the integration of students, and for the of ISMB. One meeting in South Asia support of young leaders in the field. (InCoB; http://incob.binfo.org.tw/) has ISMB has also become a forum for already been sponsored by ISCB, and reviewing the state of the art in the many another one in North Asia is going to fields of this growing discipline, for follow. ISMB itself has also been held in introducing new directions, and for an- Australia (2003) and Brazil (2006).
    [Show full text]
  • BIOINFORMATICS Doi:10.1093/Bioinformatics/Btq499
    Vol. 26 ECCB 2010, pages i409–i411 BIOINFORMATICS doi:10.1093/bioinformatics/btq499 ECCB 2010 Organization CONFERENCE CHAIR B. Comparative Genomics, Phylogeny, and Evolution Yves Moreau, Katholieke Universiteit Leuven, Belgium Martijn Huynen, Radboud University Nijmegen Medical Centre, The Netherlands PROCEEDINGS CHAIR Yves Van de Peer, Ghent University & VIB, Belgium Jaap Heringa, Free University of Amsterdam, The Netherlands C. Protein and Nucleotide Structure LOCAL ORGANIZING COMMITTEE Anna Tramontano, University of Rome ‘La Sapienza’, Italy Jan Gorodkin, University of Copenhagen, Denmark Yves Moreau, Katholieke Universiteit Leuven, Belgium Jaap Heringa, Free University of Amsterdam, The Netherlands D. Annotation and Prediction of Molecular Function Gert Vriend, Radboud University, Nijmegen, The Netherlands Yves Van de Peer, University of Ghent & VIB, Belgium Nir Ben-Tal, Tel-Aviv University, Israel Kathleen Marchal, Katholieke Universiteit Leuven, Belgium Fritz Roth, Harvard Medical School, USA Jacques van Helden, Université Libre de Bruxelles, Belgium Louis Wehenkel, Université de Liège, Belgium E. Gene Regulation and Transcriptomics Antoine van Kampen, University of Amsterdam & Netherlands Jaak Vilo, University of Tartu, Estonia Bioinformatics Center (NBIC) Zohar Yakhini, Agilent Laboratories, Tel-Aviv & the Tech-nion, Peter van der Spek, Erasmus MC, Rotterdam, The Netherlands Haifa, Israel STEERING COMMITTEE F. Text Mining, Ontologies, and Databases Michal Linial (Chair), Hebrew University, Jerusalem, Israel Alfonso Valencia, National
    [Show full text]
  • ISMB 99 August 6 – 10, 1999 Heidelberg, Germany the Seventh
    ______________________________________ Welcome to ISMB 99 August 6 – 10, 1999 Heidelberg, Germany The Seventh International Conference on Intelligent Systems for Molecular Biology ______________________________________ Final Program and Detailed Schedule Friday, August 6, 1999 Tutorial Day The tutorials will take place in the following rooms: 8:30 – 12:30 (Coffee break around 10:30) Tutorial #1 Trübnersaal Piere Baldi Probabilistic graphical models Tutorial #2 Robert-Schumann-Zimmer Douglas L. Brutlag Bioinformatics and Molecular Biology Tutorial #3 Ballsaal Martin Reese The challenge of annotating a complete eukaryotic genome: A case study in Drosophila melanogaster Tutorial #4 Gustav-Mahler-Zimmer Tandy Warnow Computational and statistical Junhyong Kim challenges involved in reconstructing evolutionary trees Tutorial #5 Sebastian-Münster-Saal Thomas Werner The biology and bioinformatics of regulatory regions in genomes Lunch (on this day served in "Grosser Saal" on the ground floor) 13:30 – 17:30 (Coffee break around 15:30) Tutorial #6 Sebastian-Münster-Saal Rob Miller EST Clustering Alan Christoffels Winston Hide Tutorial #7 Trübnersaal Kevin Karplus Getting the most out of hidden Markov Melissa Cline models Christian Barrett Tutorial #8 Robert-Schumann-Zimmer Arthur Lesk Sequence-structure relationships and evolutionary structure changes in proteins Tutorial #9 Gustav-Mahler-Zimmer David States PERL abstractions for databases and Brian Dunford distributed computing Shore Tutorial # 10 Ballsaal Zoltan Szallasi Genetic network analysis
    [Show full text]
  • The 4Th Bologna Winter School: Hot Topics in Structural Genomics†
    Comparative and Functional Genomics Comp Funct Genom 2003; 4: 394–396. Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.314 Conference Report The 4th Bologna Winter School: hot topics in structural genomics† Rita Casadio* Department of Biology/CIRB, University of Bologna, Via Irnerio 42, 40126 Bologna, Italy *Correspondence to: Abstract Rita Casadio, Department of Biology/CIRB, University of The 4th Bologna Winter School on Biotechnologies was held on 9–15 February Bologna, Via Irnerio 42, 40126 2003 at the University of Bologna, Italy, with the specific aim of discussing recent Bologna, Italy. developments in bioinformatics. The school provided an opportunity for students E-mail: [email protected] and scientists to debate current problems in computational biology and possible solutions. The course, co-supported (as last year) by the European Science Foundation program on Functional Genomics, focused mainly on hot topics in structural genomics, including recent CASP and CAPRI results, recent and promising genome- Received: 3 June 2003 wide predictions, protein–protein and protein–DNA interaction predictions and Revised: 5 June 2003 genome functional annotation. The topics were organized into four main sections Accepted: 5 June 2003 (http://www.biocomp.unibo.it). Published in 2003 by John Wiley & Sons, Ltd. Predictive methods in structural Predictive methods in functional genomics genomics • Contemporary challenges in structure prediction • Prediction of protein function (Arthur Lesk, and the CASP5 experiment (John Moult, Uni- University of Cambridge, Cambridge, UK). versity of Maryland, Rockville, MD, USA). • Microarray data analysis and mining (Raf- • Contemporary challenges in structure prediction faele Calogero, University of Torino, Torino, (Anna Tramontano, University ‘La Sapienza’, Italy).
    [Show full text]
  • Parallelization of Dynamic Programming Recurrences in Computational Biology Arpith Jacob Washington University in St
    Washington University in St. Louis Washington University Open Scholarship All Theses and Dissertations (ETDs) January 2010 Parallelization of dynamic programming recurrences in computational biology Arpith Jacob Washington University in St. Louis Follow this and additional works at: https://openscholarship.wustl.edu/etd Recommended Citation Jacob, Arpith, "Parallelization of dynamic programming recurrences in computational biology" (2010). All Theses and Dissertations (ETDs). 169. https://openscholarship.wustl.edu/etd/169 This Dissertation is brought to you for free and open access by Washington University Open Scholarship. It has been accepted for inclusion in All Theses and Dissertations (ETDs) by an authorized administrator of Washington University Open Scholarship. For more information, please contact [email protected]. WASHINGTON UNIVERSITY IN ST. LOUIS School of Engineering and Applied Science Department of Computer Science and Engineering Thesis Examination Committee: Jeremy Buhler, Chair Michael Brent Ron Cytron Mark Franklin Robert Morley Bill Smart David Taylor PARALLELIZATION OF DYNAMIC PROGRAMMING RECURRENCES IN COMPUTATIONAL BIOLOGY by Arpith Chacko Jacob A dissertation presented to the Graduate School of Arts and Sciences of Washington University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY December 2010 Saint Louis, Missouri copyright by Arpith Chacko Jacob 2010 ABSTRACT OF THE THESIS Parallelization of dynamic programming recurrences in computational biology by Arpith Chacko Jacob Doctor of Philosophy in Computer Science Washington University in St. Louis, 2010 Research Advisor: Professor Jeremy Buhler The rapid growth of biosequence databases over the last decade has led to a per- formance bottleneck in the applications analyzing them. In particular, over the last five years DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted.
    [Show full text]
  • PLOS Computational Biology Associate Editor Handbook Second Edition Version 2.6
    PLOS Computational Biology Associate Editor Handbook Second Edition Version 2.6 Welcome and thank you for your support of PLOS Computational Biology and the Open Access movement. PLOS Computational Biology is one of the four community run, Open Access journals published by the Public Library of Science. The journal features works of exceptional significance that further our understanding of living systems at all scales—from molecules and cells, to patient populations and ecosystems—through the application of computational methods. Readers include life and computational scientists, who can take the important findings presented here to the next level of discovery. Since its launch in 2005, PLOS Computational Biology has grown not only in the number of papers published, but also in recognition as a quality publication and community resource. We have every reason to expect that this trend will continue and our team of Associate Editors play an invaluable role in this process. Associate Editors oversee the peer review process for the journal, including evaluating submissions, selecting reviewers, assessing reviewer comments, and making editorial decisions. Together with fellow Editorial Board members and the PLOS CB Team, Associate Editors uphold journal policies and ethical standards and work to promote the PLOS mission to provide free public access to scientific research. PLOS Computational Biology is one of a suite of influential journals published by PLOS. Information about the other journals, the PLOS business model, PLOS innovations in scientific publishing, and Open Access copyright and licensure can be found in Appendix IX. Table of Contents PLOS Computational Biology Editorial Board Membership ................................................................................... 2 Manuscript Evaluation .....................................................................................................................................
    [Show full text]
  • Predicting RNA Binding Proteins Using Discriminative Methods
    Predicting RNA binding proteins using discriminative methods by Shweta Bhandare M.S. Department of Interdisciplinary Telecommunications, University of Colorado at Boulder 2003 B.E., College of Engineering, University of Goa, 1995 A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Doctor of Philosophy Department of Computer Science 2017 This thesis entitled: Predicting RNA binding proteins using discriminative methods written by Shweta Bhandare has been approved for the Department of Computer Science Prof. Robin Dowell Dr. Debra S. Goldberg Prof. Larry E. Hunter Dr. Daniel Weaver Date The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above mentioned discipline. iii Bhandare, Shweta (Ph.D., Computer Science) Predicting RNA binding proteins using discriminative methods Thesis directed by Prof. Robin Dowell and Dr. Debra S. Goldberg This thesis examines the role of computational methods in identifying the motifs utilized by RNA-binding proteins (RBPs). RBPs play an important role in post-transcriptional regulation and identify their targets in a highly specific fashion through recognition of primary sequence and/or secondary structure hence making the prediction a complex problem. I applied the existing k-spectrum kernel method to a support vector machine and verified the published binding sites of two RBPs: Human antigen R (HuR) and Tristetraprolin (TTP). These RBPs exhibit opposing effects to the bound messenger RNA (mRNA) transcript but have simi- lar binding preferences.
    [Show full text]
  • Research in Computational Molecular Biology
    Terry Speed Haiyan Huang (Eds.) Research in Computational Molecular Biology 1 lth Annual International Conference, RECOMB 2007 Oakland, CA, USA, April 21-25, 2007 Proceedings Sprin ger Table of Contents QNet: A Tool for Querying Protein Interaction Networks 1 Banu Dost, Tomer Shlomi, Nitin Gupta, Eytan Ruppin, Vineet Bafna, and Roded Sharan Pairwise Global Alignment of Protein Interaction Networks by Matching Neighborhood Topology 16 Rohit Singh, Jinbo Xu, and Bonnie Berger Reconstructing the Topology of Protein Complexes 32 Allister Bernard, David S. Vaughn, and Alexander J. Hartemink Network Legos: Building Blocks of Cellular Wiring Diagrams 47 T.M. Murali and Corban G. Rivera An Efficient Method for Dynamic Analysis of Gene Regulatory Networks and in silico Gene Perturbation Experiments 62 Abhishek Garg, Ioannis Xenarios, Luis Mendoza, and Giovanni DeMicheli A Feature-Based Approach to Modeling Protein-DNA Interactions 77 Eilon Sharon and Eran Segal Network Motif Discovery Using Subgraph Enumeration and Symmetry-Breaking 92 Joshua A. Grochow and Manolis Kellis Nucleosome Occupancy Information Improves de novo Motif Discovery 107 Leelavati Narlikar, Raluca Gordan, and Alexander J. Hartemink Framework for Identifying Common Aberrations in DNA Copy Number Data 122 Amir Ben-Dor, Doron Lipson, Anya Tsalenko, Mark Reimers, Lars O. Baumbusch, Michael T. Barrett, John N. Weinstein, Anne-Lise B0rresen-Dale, and Zohar Yakhini Estimating Genome-Wide Copy Number Using Allele Specific Mixture Models 137 Wenyi Wang, Benilton Carvalho, Nate Miller, Jonathan Pevsner, Aravinda Chakravarti, and Rafael A. Irizarry GIMscan: A New Statistical Method for Analyzing Whole-Genome Array CGH Data 151 Yanxin Shi, Fan Guo, Wei Wu, and Eric P.
    [Show full text]
  • Safe and Complete Algorithms for Dynamic Programming Problems
    Safe and Complete Algorithms for Dynamic Programming Problems, with an Application to RNA Folding Niko Kiirala Department of Computer Science and Helsinki Institute for Information Technology HIIT, University of Helsinki, Finland [email protected].fi Leena Salmela1 Department of Computer Science and Helsinki Institute for Information Technology HIIT, University of Helsinki, Finland leena.salmela@helsinki.fi Alexandru I. Tomescu1 Department of Computer Science and Helsinki Institute for Information Technology HIIT, University of Helsinki, Finland alexandru.tomescu@helsinki.fi Abstract Many bioinformatics problems admit a large number of solutions, with no way of distinguishing the correct one among them. One approach of coping with this issue is to look at the partial solutions common to all solutions. Such partial solutions have been called safe, and an algorithm outputting all safe solutions has been called safe and complete. In this paper we develop a general technique that automatically provides a safe and complete algorithm to problems solvable by dynamic programming. We illustrate it by applying it to the bioinformatics problem of RNA folding, assuming the simplistic folding model maximizing the number of paired bases. Our safe and complete algorithm has time complexity O(n3M(n)) and space complexity O(n3) where n is the length of the RNA sequence and M(n) ∈ Ω(n) is the time complexity of arithmetic operations on O(n)-bit integers. We also implement this algorithm and show that, despite an exponential number of optimal solutions, our algorithm is efficient in practice. 2012 ACM Subject Classification Theory of computation → Dynamic programming; Applied computing → Life and medical sciences; Applied computing → Molecular structural biology; Theory of computation → Design and analysis of algorithms Keywords and phrases RNA secondary structure, RNA folding, Safe solution, Safe and complete algorithm, Counting problem Digital Object Identifier 10.4230/LIPIcs.CPM.2019.8 Funding Leena Salmela: Supported by Academy of Finland (grants 308030 and 314170).
    [Show full text]