Comparative Modelling

Protein Prediction Part 1: Structure
Comparative modelling
Andrea Schafferhans (built on slides from Burkhard Rost)
2012/05/10

Using structure to predict function
ConSurf: F Glaser, T Pupko, I Paz, RE Bell, D Bechor-Shental, E Martz and N Ben-Tal (2003) Bioinformatics 19 163-4
[Figure: Chymotrypsin (5cha) and Subtilin (5sic)]
[Figure, from the ConSurf paper: Fig. 1. Conservation pattern in the Bcl-XL/Bak complex (PDB ID: 1bx1). The Bcl-XL protein is represented as a spacefill model, where the residue conservation scores are color-coded onto its Van der Waals surface. The Bak peptide (residues 72–87) is shown as a yellow backbone model. The color-coding bar shows the coloring scheme; conserved amino acids are colored bordeaux, residues of average conservation are white, and variable amino acids are turquoise.]

Universe of protein structures
[Figure: Christine Orengo et al. 1997 Structures 5 1093-1108]

Protein structure (prediction) in numbers
[Figure: Estimate for 1999; labels: 3D, 1D, HoMo, FoRc]
Number of entries 05/2012
• Uniprot/TREMBL: 21,552,793
• Uniprot/Swissprot: 535,698
• PDB: 81,369

Redundancy in the PDB
[Figure; axis: Number of proteins in PDB]

Protein structure (prediction) in numbers
• PDB: 81,369 entries, 46,166 unique sequences

Goal of structure prediction
Epstein & Anfinsen, 1961: sequence uniquely determines structure
INPUT: sequence
OUTPUT: 3D structure and function

Structure and sequence similarity
WHAT IF
          EEEE B B B B EEEEEE EEEEEE EEEEEEEEHHHEEE
1shf 100% VTLFVALYDYEARTEDDLSFHKGEKFQILNSSEGDWWEARSLTTGETGYIPSNYVAPVD
1srm  78% VTTFVALYDYESRTETDLSFKKGERLQIVNNTEGDWWLAHSLTTGQTGYIPSNYVAPSD
1sem  39% ....VAEHDFQAGSPDELSFKRGNTLKVLNKDEDPHWYKAEL.DGNEGFIPSNYIRMTE

Comparative modeling
[Diagram labels: PDB, U (sequence), H, significant sequence identity, 3D structures]
• assumption: H and U homologous
• strategy: modelling of U based on H

Sequence conservation of protein structure
[Figure: Related structures vs. Unrelated structures; B Rost 1999 Prot Engin 12, 85-94]
[Figure: Distance from new HSSP-curve; B Rost 1999 Prot Engin 12, 85-94]
© Burkhard Rost (TU Munich)

Zones f

Protein structure (prediction) in numbers
             Total        1 member
UniRef 100   16,973,591   15,225,290
UniRef 90    10,657,899    8,261,540
UniRef 50     4,927,947    3,136,923

Bridging potential vs. sequence identity
[Figure; x axis: Percentage of pairwise sequence identity (98 down to 34); y axis: Percentage of proteins in SWISS-PROT (0 to 30)]

Sequence conservation of protein structure
Twilight zone = false positives explode!!
[Figure; panel 1: Percentage sequence identity vs. Number of residues aligned; panel 2: Number of protein pairs vs. Distance from HSSP threshold, for Related structures and Unrelated structures; B Rost 1999 Prot Engin 12, 85-94]

Evolution into the Midnight zone
[Figure; x axis: Percentage pairwise sequence identity; y axis: Number of structure pairs; B Rost 1997 Folding & Design 2, S19-S24]

Protein structure (prediction) in numbers
[Figure: Estimate for 1999; labels: 3D, 1D, HoMo, FoRc]

Protein structure prediction in reality
[Figure: Estimate for 1999; SWISS-PROT view and Genome view; labels: 3D, 1D, HoMo, FoRc]
… the art of being humble

Improving prediction by waiting it out …
[Figure; labels: 1991, 1995, 1999]

Jinfeng Liu
• 1995-2003 MS Rutgers Univ.
• 1998-2004 PhD Columbia Univ. Pharmacology (PhD with 16 publications!)
• 2004-2007 Sr. Research Assistant Columbia Univ. Biochemistry & Molecular Biophysics
• 2007-now Genentech, CA

Homology modeling for entire genomes
[Figure: bar chart per organism; series: Number of ORFs, Number of ORFs with PDB hit; x axis: Number of proteins (0 to 20,000)]
Organisms: H sapiens (chr. 22), H sapiens, D melanogaster, C elegans, S cerevisiae, U urealyticum, T pallidum, S PCC6803, R prowazekii, N meningitidis, M tuberculosis, M pneumoniae, M genitalium, H pylori, H influenzae, E coli, C trachomatis, C pneumoniae, C jejuni, B burgdorferi, B subtilis, A aeolicus, T maritima, D radiodurans, P horikoshii, P abyssi, M thermoautotrophicu, M jannaschii, A fulgidus, A pernix

Homology for protein universe
[Figure: bar chart for the same organisms; x axis: Percentage of all ORFs in genome (0 to 60)]

Comparative modeling: terms
• Comparative modeling vs. Homology modeling
• Target: protein to model
• Template: protein to model from

Comparative modeling: steps
• Identify template(s) through database search
  – PSI-Blast / HHblits
• Align target/template
• Build model
• Assess model
• (refine)

Extending modelling reach: threading
Percentage of pairwise identical residues (scale: 100%, 25%, 0%)
• Region of homology modelling (sequence alignment suffices)
• remote homology modelling
• Fold recognition
Accuracy of automatic fold recognition
• correct first hit: 20-30%
• alignment correct to some extent: 10-25%
• (3D) correct: < 10%

Comparative modelling: quality
[Figure; axis: Percentage of pairwise identical residues (100%, 75%, 50%, 25%, 0%); trailing label: in homology]
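The percentages quoted next to the SH3 alignment (1shf, 1srm, 1sem) and the "distance from HSSP threshold" axis can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the slides: the function names are hypothetical, percent identity is computed over columns where neither row has a gap (one of several possible conventions), and the HSSP-curve formula is written from memory of B Rost 1999 Prot Engin 12, 85-94, so it should be checked against that paper before being relied on.

```python
import math

# Alignment rows copied from the "Structure and sequence similarity" slide;
# '.' marks a gap.
ALIGNMENT = {
    "1shf": "VTLFVALYDYEARTEDDLSFHKGEKFQILNSSEGDWWEARSLTTGETGYIPSNYVAPVD",
    "1srm": "VTTFVALYDYESRTETDLSFKKGERLQIVNNTEGDWWLAHSLTTGQTGYIPSNYVAPSD",
    "1sem": "....VAEHDFQAGSPDELSFKRGNTLKVLNKDEDPHWYKAEL.DGNEGFIPSNYIRMTE",
}
GAPS = {".", "-"}


def identity(row_a, row_b):
    """Return (percent identity, aligned length) over columns without gaps.

    Assumption: identity is normalised by the number of gap-free columns.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(a, b) for a, b in zip(row_a, row_b)
             if a not in GAPS and b not in GAPS]
    if not pairs:
        return 0.0, 0
    same = sum(a == b for a, b in pairs)
    return 100.0 * same / len(pairs), len(pairs)


def hssp_threshold(length, n=0.0):
    """Length-dependent identity threshold (HSSP-curve).

    Assumption: formula as recalled from Rost (1999); n shifts the curve
    up or down in percentage points.
    """
    if length <= 11:
        return n + 100.0
    if length <= 450:
        exponent = -0.32 * (1.0 + math.exp(-length / 1000.0))
        return n + 480.0 * length ** exponent
    return n + 19.5


def hssp_distance(percent_identity, length):
    """Positive values lie above the curve, negative values below it."""
    return percent_identity - hssp_threshold(length)


if __name__ == "__main__":
    reference = ALIGNMENT["1shf"]
    for name, row in ALIGNMENT.items():
        pid, aligned = identity(reference, row)
        dist = hssp_distance(pid, aligned)
        print(f"{name}: {pid:.0f}% identity over {aligned} columns, "
              f"HSSP distance {dist:+.1f}")
```

Under this convention the identities to 1shf round to 78% for 1srm and 39% for 1sem, matching the values on the slide. Because the threshold depends on alignment length, the same percentage identity can fall on either side of the curve for a short or a long alignment.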
