Comparative Modelling

Protein Prediction Part 1: Structure
Comparative modelling
Andrea Schafferhans (built on slides from Burkhard Rost), 2012/05/10

Using structure to predict function
[Figure: ConSurf, F Glaser, T Pupko, I Paz, RE Bell, D Bechor-Shental, E Martz and N Ben-Tal (2003) Bioinformatics 19, 163-4. Fig. 1: Conservation pattern in the Bcl-XL/Bak complex (PDB ID: 1bx1). The Bcl-XL protein is represented as a spacefill model, with residue conservation scores color-coded onto its Van der Waals surface; the Bak peptide (residues 72–87) is shown as a yellow backbone model. Color coding: conserved amino acids are bordeaux, residues of average conservation are white, and variable amino acids are turquoise.]
[Figure: Chymotrypsin (5cha) and subtilisin (5sic)]

Universe of protein structures
[Figure: Christine Orengo et al. 1997, Structure 5, 1093-1108]

Protein structure (prediction) in numbers
[Diagram, estimate for 1999: 3D, 1D, HoMo (homology modelling), FoRc (fold recognition)]
Number of entries, 05/2012:
• UniProt/TrEMBL: 21,552,793
• UniProt/Swiss-Prot: 535,698
• PDB: 81,369

Redundancy in the PDB
[Plot: number of proteins in PDB]

Protein structure (prediction) in numbers (continued)
• PDB unique sequences: 46,166

Goal of structure prediction
Epstein & Anfinsen, 1961: sequence uniquely determines structure
INPUT: sequence
OUTPUT: 3D structure and function

Structure and sequence similarity
WHAT IF: EEEE B B B B EEEEEE EEEEEE EEEEEEEEHHHEEE
1shf 100% VTLFVALYDYEARTEDDLSFHKGEKFQILNSSEGDWWEARSLTTGETGYIPSNYVAPVD
1srm  78% VTTFVALYDYESRTETDLSFKKGERLQIVNNTEGDWWLAHSLTTGQTGYIPSNYVAPSD
1sem  39% ....VAEHDFQAGSPDELSFKRGNTLKVLNKDEDPHWYKAEL.DGNEGFIPSNYIRMTE

Comparative modeling
[Diagram: sequence U; structure H from the PDB; significant sequence identity between U and H]
• assumption: H and U have homologous 3D structures
• strategy: modelling of U based on H

Sequence conservation of protein structure
[Figure: conservation of protein structure for related vs. unrelated structures; B Rost 1999 Prot Engin 12, 85-94]

Distance from new HSSP-curve
[Figure: HSSP curve; B Rost 1999 Prot Engin 12, 85-94]

Protein structure (prediction) in numbers (continued)
UniRef clusters, 05/2012:
            Total        1 member
UniRef100   16,973,591   15,225,290
UniRef90    10,657,899    8,261,540
UniRef50     4,927,947    3,136,923

Bridging potential vs. sequence identity
[Plot: percentage of proteins in SWISS-PROT (0–30%) vs. percentage of pairwise sequence identity (98% down to 34%)]

Sequence conservation of protein structure
Twilight zone = false positives explode!!
[Plots: percentage sequence identity vs. number of residues aligned (0–400) for related and unrelated structures; number of protein pairs vs. distance from the HSSP threshold; B Rost 1999 Prot Engin 12, 85-94]

Evolution into the Midnight zone
[Plot: number of structure pairs vs. percentage pairwise sequence identity (0–100%); B Rost 1997 Folding & Design 2, S19-S24]

Protein structure (prediction) in numbers
[Diagram, estimate for 1999: 3D, 1D, HoMo, FoRc]

Protein structure prediction in reality
[Diagram, estimate for 1999: SWISS-PROT view vs. genome view; 3D, 1D, HoMo, FoRc]
…the art of being humble

Improving prediction by waiting it out …
[Figure: 1991, 1995, 1999]

Jinfeng Liu
• 1995-2003 MS, Rutgers Univ.
• 1998-2004 PhD, Columbia Univ., Pharmacology (PhD with 16 publications!)
• 2004-2007 Sr. Research Assistant, Columbia Univ., Biochemistry & Molecular Biophysics
• 2007-now Genentech, CA

Homology modeling for entire genomes
[Bar chart: number of ORFs and number of ORFs with a PDB hit (0–20,000) for H sapiens (chr. 22), H sapiens, D melanogaster, C elegans, S cerevisiae, U urealyticum, T pallidum, S PCC6803, R prowazekii, N meningitidis, M tuberculosis, M pneumoniae, M genitalium, H pylori, H influenzae, E coli, C trachomatis, C pneumoniae, C jejuni, B burgdorferi, B subtilis, A aeolicus, T maritima, D radiodurans, P horikoshii, P abyssi, M thermoautotrophicum, M jannaschii, A fulgidus, A pernix]

Homology for protein universe
[Bar chart: percentage of all ORFs in genome (0–60%) for the same organisms]

Comparative modeling: terms
• Comparative modeling vs. homology modeling
• Target: protein to model
• Template: protein to model from

Comparative modeling: steps
• Identify template(s) through database search
  – PSI-BLAST / HHblits
• Align target/template
• Build model
• Assess model
• (Refine)

Extending modelling reach: threading
[Figure: percentage of pairwise identical residues, from 100% to 0% with marks at 25% and <10%: region of homology modelling (sequence alignment suffices); remote homology modelling (3D); fold recognition]
Accuracy of automatic fold recognition:
• correct first hit: 20-30%
• alignment correct to some extent: 10-25%

Comparative modelling: quality
[Figure: percentage of pairwise identical residues, 0% to 100%]
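
As a worked example of the identity thresholds discussed in these slides, here is a minimal Python sketch that recomputes the percent identities of the 1shf/1srm/1sem alignment and measures each pair's distance from the HSSP curve. Assumptions: identity is counted over columns where neither sequence has a gap ('.'), which reproduces the 78% and 39% on the slide; L is taken as the number of such columns; and `hssp_threshold` implements the new HSSP curve as we recall it from Rost (1999): 100% for L ≤ 11, 480·L^(−0.32(1+e^(−L/1000))) for 11 < L ≤ 450, and 19.5% beyond that. Verify the constants against the paper before relying on them. All function names are hypothetical.

```python
import math

# Alignment from the "Structure and sequence similarity" slide; '.' marks a gap.
ALIGNMENT = {
    "1shf": "VTLFVALYDYEARTEDDLSFHKGEKFQILNSSEGDWWEARSLTTGETGYIPSNYVAPVD",
    "1srm": "VTTFVALYDYESRTETDLSFKKGERLQIVNNTEGDWWLAHSLTTGQTGYIPSNYVAPSD",
    "1sem": "....VAEHDFQAGSPDELSFKRGNTLKVLNKDEDPHWYKAEL.DGNEGFIPSNYIRMTE",
}
GAP = "."


def aligned_identity(a: str, b: str) -> tuple[int, float]:
    """Return (L, percent identity) over columns without a gap in either sequence.

    Assumption: gapped columns are excluded from both numerator and denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        return 0, 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return len(pairs), 100.0 * identical / len(pairs)


def hssp_threshold(length: int) -> float:
    """Identity threshold (%) of the new HSSP curve.

    Assumed form, recalled from Rost (1999); verify against the paper.
    """
    if length <= 11:
        return 100.0
    if length <= 450:
        exponent = -0.32 * (1.0 + math.exp(-length / 1000.0))
        return 480.0 * length ** exponent
    return 19.5


def hssp_distance(a: str, b: str) -> float:
    """Percentage points above (+) or below (-) the HSSP threshold."""
    length, identity = aligned_identity(a, b)
    return identity - hssp_threshold(length)


if __name__ == "__main__":
    ref = ALIGNMENT["1shf"]
    for name, seq in ALIGNMENT.items():
        length, identity = aligned_identity(ref, seq)
        print(f"{name}: L={length}, identity={identity:.0f}%, "
              f"HSSP distance={hssp_distance(ref, seq):+.1f}")
```

Run as a script, it prints 100%, 78% and 39% identity for the three rows, matching the slide. A positive HSSP distance places a pair above the curve; a negative one places it below, towards the twilight zone where false positives explode.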