Homology Modeling Lausanne, February 22, 2007

Swiss Institute of Bioinformatics
EMBnet course: Introduction to Protein Structure Bioinformatics
Homology Modeling
Lausanne, February 22, 2007

Torsten Schwede
Biozentrum - Universität Basel
Swiss Institute of Bioinformatics
Klingelbergstr 50-70
CH - 4056 Basel, Switzerland
Tel: +41-61 267 15 81

How many structures do we know?
http://www.wwpdb.org/

[Figure: Growth of the Protein Data Bank (PDB), total and yearly number of structures. Source: PDB, http://www.pdb.org]

[Figure: Number of entries in TrEMBL, SwissProt and PDB, 1986 to 2006 (y axis: 100 to 10,000,000). Sources: PDB, EBI, SIB]
⇒ No experimental structure for most protein sequences
In the near future, no experimental structure will be available for most of the known protein sequences.

Can we predict protein structures from genome sequences?

MNIFEMLRID EGLRLKIYKD TEGYYTIGIG HLLTKSPSLN AAKSELDKAI GRNCNGVITK DEAEKLFNQD VDAAVRGILR NAKLKPVYDS LDAVRRCALI NMVFQMGETG VAGFTNSLRM LQQKRWDEAA VNLAKSRWYN QTPNRAKRVI TTFRTGTWDA YKNL

• Many proteins fold spontaneously to their native structure
• Protein folding is relatively fast (nsec to sec)
• Chaperones speed up folding, but do not alter the structure
The protein sequence contains all information needed to create a correctly folded protein.

Can we predict the folding process of a protein from its sequence (ab initio)?
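The cost of answering this question by brute-force simulation can be checked with a back-of-envelope calculation. The sketch below multiplies out the figures quoted in this lecture's ab initio protein folding simulation estimate (100 μsec of physical time, a 1 fsec time step, about 10^9 interactions per force calculation, 1000 machine instructions each, and a 1 petaflop machine). It assumes one machine instruction per floating point operation and sustained peak performance, which are simplifications; the function name folding_cost is illustrative only.

```python
# Back-of-envelope cost of an ab initio folding simulation.
# Integer arithmetic is used so the powers of ten stay exact.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def folding_cost(physical_time_fs=10**11,        # 100 microseconds in femtoseconds
                 time_step_fs=1,                 # typical MD time step: 1 fs
                 interactions_per_step=10**9,    # approximate interactions per force calculation
                 instructions_per_interaction=10**3,
                 machine_rate_per_s=10**15):     # 1 petaflop
    """Return (MD steps, total instructions, wall-clock seconds, wall-clock years)."""
    n_steps = physical_time_fs // time_step_fs
    total_instructions = n_steps * interactions_per_step * instructions_per_interaction
    wall_clock_s = total_instructions // machine_rate_per_s
    wall_clock_years = wall_clock_s / SECONDS_PER_YEAR
    return n_steps, total_instructions, wall_clock_s, wall_clock_years


if __name__ == "__main__":
    steps, instructions, seconds, years = folding_cost()
    print(f"MD time steps:        10^{len(str(steps)) - 1}")
    print(f"Machine instructions: 10^{len(str(instructions)) - 1}")
    print(f"Wall-clock time:      {seconds} s, about {years:.1f} years")
```

Under these idealized assumptions the estimate comes out at roughly three years of machine time for 100 μsec of folding.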
Molecular Dynamics

\[
\mathcal{V} = \sum_{\text{bonds}} \frac{k_i}{2}\,(l_i - l_{i,0})^2
 + \sum_{\text{angles}} \frac{k_i}{2}\,(\theta_i - \theta_{i,0})^2
 + \sum_{\text{torsions}} \frac{V_n}{2}\,\bigl(1 + \cos(n\omega - \gamma)\bigr)
 + \sum_{i=1}^{N} \sum_{j=i+1}^{N} \left( 4\varepsilon_{ij} \left[ \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6} \right] + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}} \right)
\]

Ab initio protein folding simulation

Physical time for simulation                                  10^-4 seconds
Typical time-step size                                        10^-15 seconds
Number of MD time steps                                       10^11
Atoms in a typical protein and water simulation               32,000
Approximate number of interactions in force calculation       10^9
Machine instructions per force calculation                    1000
Total number of machine instructions                          10^23
Petaflop capacity computer (floating point operations/sec)    1 petaflop (10^15)

⇒ Blue Gene will need 1-3 years to simulate 100 μsec.
[ http://www.research.ibm.com/bluegene/ ]

[Figure: Growth of the Protein Data Bank (PDB), "old" folds per year and new folds per year. Source: PDB, http://www.pdb.org]

CATH - Protein Structure Classification
• Class (C): derived from secondary structure content; assigned automatically.
• Architecture (A): describes the gross orientation of secondary structures, independent of connectivity.
• Topology (T): clusters structures according to their topological connections and numbers of secondary structures.
• Homologous Superfamily (H): groups together protein domains which are thought to share a common ancestor and can therefore be described as homologous.
[ http://www.biochem.ucl.ac.uk/bsm/cath_new/ ]

Sequence similarity implies structural similarity?
[Figure: pairwise sequence identity (0 to 100%) versus number of residues aligned. Labelled regions: "Sequence identity implies structural similarity" and "Don't know region". (B. Rost, Columbia, New York)]
[Figure: percentage sequence identity/similarity (0 to 100%) versus number of residues aligned (0 to 250), with curves for identity and similarity. Labelled regions: "Sequence identity implies structural similarity" and "Don't know region". (B. Rost, Columbia, New York)]

Fold recognition / Threading
Find a compatible fold for a given sequence.
>Protein XY
MSTLYEKLGGTTAVDLAV DKFYERVLQDDRIKHFFA DVDMAKQRAHQKAFLTYA FGGTDKYDGRYMREAHKE LVENHGLNGEHFDAVAED LLATLKEMGVPEDLIAEV AAVAGAPAHKRDVLNQ
[Figure: is the sequence of Protein XY compatible (≈ ?) with a known fold?]

The number of protein folds that occur in nature is limited.
Fold recognition can be used to:
• Identify templates for comparative modeling
• Assign protein function

Fold recognition / Threading
The "biological" perspective: Homologous proteins have evolved by molecular evolution from a common ancestor. If we can establish homology, we can predict aspects of structure and function of a new protein by analogy.
The "physical" perspective: The native conformation of a protein corresponds to a global free energy minimum of the protein / solvent system. To identify a compatible fold, the protein sequence is "threaded" through a library of folds, and empirical energy calculations are used to evaluate compatibility.

No single method is perfect. Consensus methods often perform better:
• MetaPP: http://cubic.bioc.columbia.edu/predictprotein/
• http://bioinfo.pl/meta/

Further reading: Adam Godzik, "Fold Recognition Methods", in: "Structural Bioinformatics", Bourne & Weissig, Eds.
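The "physical" perspective on threading can be illustrated with a toy compatibility score. The sketch below is not the method of any server mentioned in this lecture: the fold library, its buried/exposed patterns and the hydrophobic residue set are hypothetical, a simple match count stands in for an empirical energy function, and gaps are not allowed. The sequence is slid along each fold pattern and the best placement per fold is reported.

```python
# Toy threading: slide a sequence over burial patterns of known folds.
# All data here are hypothetical and chosen only for illustration.

HYDROPHOBIC = set("AVLIMFWC")  # assumed hydrophobic residue set

# Hypothetical fold library: 'B' = buried position, 'E' = exposed position.
FOLD_LIBRARY = {
    "fold_alpha": "BEEBBEEBEEBBEEB",
    "fold_beta": "BEBEBEBEEEBEBEB",
}


def compatibility(segment, pattern):
    """Score how well residue types suit their structural environment.

    A hydrophobic residue at a buried position, or a non-hydrophobic residue
    at an exposed position, scores +1; any other pairing scores -1.
    """
    score = 0
    for residue, environment in zip(segment, pattern):
        suits = (residue in HYDROPHOBIC) == (environment == "B")
        score += 1 if suits else -1
    return score


def thread(sequence, library):
    """Return a list of (fold name, best score, best offset), best fold first."""
    results = []
    for name, pattern in library.items():
        best = None
        # Ungapped threading: try every placement of the pattern on the sequence.
        for offset in range(len(sequence) - len(pattern) + 1):
            window = sequence[offset:offset + len(pattern)]
            score = compatibility(window, pattern)
            if best is None or score > best[0]:
                best = (score, offset)
        if best is not None:
            results.append((name, best[0], best[1]))
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    query = "MSTLYEKLGGTTAVDLAVDKFYERVLQDDRIKHFFA"  # start of the Protein XY sequence
    for name, score, offset in thread(query, FOLD_LIBRARY):
        print(f"{name}: score {score} at offset {offset}")
```

Real fold recognition methods replace the match count with profile scores or empirical potentials and allow gaps in the sequence-structure alignment.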
Protein Structure / Fold Databases
• PDB: http://www.pdb.org
• EBI-MSD: http://www.ebi.ac.uk/msd/
• SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/
• CATH: http://www.biochem.ucl.ac.uk/bsm/cath_new/

Fold Recognition Servers
• Meta server: http://bioinfo.pl/meta/
• 3DPSSM / Phyre: http://www.sbg.bio.ic.ac.uk/servers/3dpssm/ and http://www.sbg.bio.ic.ac.uk/~phyre/
• GenTHREADER: http://bioinf.cs.ucl.ac.uk/psipred/
• FUGUE2: http://www-cryst.bioc.cam.ac.uk/~fugue/prfsearch.html
• SAM: http://www.cse.ucsc.edu/research/compbio/HMM-apps/T99-query.html
• FOLD: http://fold.doe-mbi.ucla.edu/
• FFAS/PDBBLAST: http://bioinformatics.burnham-inst.org/

Evolution of protein structure families
[Figure: Evolution of the globin family]
[Figure: Rmsd of backbone atoms in core (0.0 to 2.5) versus percent identical residues in core (100 to 0). Chothia & Lesk (1986)]
Common core = all residues that can be superposed in 3D
For proteins with more than 60% identical residues, the core contains more than 90% of all residues, deviating by less than 1.0 Å.

Similar Sequence ⇒ Similar Structure

Homology modeling = Comparative protein modeling = Knowledge-based modeling
Idea: Use experimental 3D structures of related family members (templates) to calculate a model for a new sequence (target).

Comparative Modeling
[Workflow diagram: Known structures (templates) and target sequence; Template selection; Alignment template - target; Structure modeling; Structure evaluation & assessment; Homology model(s)]

Known structures (templates):
• Protein Data Bank PDB http://www.pdb.org
⇒ Database of templates
• Separate into single chains
• Remove bad structures (models)
• Create BLASTable database or fold library (profiles, HMMs)

Template selection:
1. Sequence similarity / fold recognition
2. Structure quality (resolution, experimental method)
3. Experimental conditions (ligands and cofactors)

Alignment template - target:
• Multiple sequence alignment for pairs > 40% identity
or
• Use structural alignment of templates to guide sequence alignment of target
or
• Use separate profiles for template and targets

Structure evaluation & assessment:
• Errors in template selection or alignment result in bad models
⇒ Iterative cycles of alignment, modeling and evaluation
⇒ Build many models, choose the best.

Structure modeling:
I. Manual model building
II. Template based fragment assembly
   - Composer (Sybyl, Tripos)
   - SWISS-MODEL
III. Satisfaction of spatial restraints
   - Modeller (Insight II, MSI)
   - CPH-Models

I. Manual Modeling
[ http://www.expasy.org/spdbv/ ]

II. Template based fragment assembly
Find structurally conserved core regions.

Build model core
• ... by averaging core template backbone atoms (weighted by local sequence similarity with the target sequence). Leave non-conserved regions (loops) for later.

Loop (insertion) modeling
• Use the "spare part" algorithm to find compatible fragments in a loop database, or "ab initio" rebuilding (e.g. Monte Carlo, MD, GA, etc.) to build missing loops.

Side chain placement
• Find the most probable side chain conformation, using
   - homologous structure information
   - backbone-dependent rotamer libraries
   - energetic and packing criteria

Rotamer libraries
• Only a small fraction of all possible side chain conformations is observed in experimental structures
• Rotamer libraries provide an ensemble of likely conformations
• The propensity of rotamers depends on the backbone geometry

Energy minimization
• The modeling method will produce unfavorable contacts and bonds
• Energy minimization is used to
   - regularize local bond and angle geometry
   - relax close contacts and geometric strain
• Extensive energy minimization will move coordinates away from the real structure ⇒ keep it to a minimum
• SWISS-MODEL uses the GROMOS 96 force field for a steepest descent minimization

III. Satisfaction of spatial restraints
[Figure: example sequence M Q T S A F G T A E]
Alignment of target sequence
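For the core-building step of template based fragment assembly described earlier, the weighted averaging of template backbone atoms can be sketched as follows. This is a minimal sketch under stated assumptions, not SWISS-MODEL's implementation: the templates are assumed to be already superposed, the coordinates and weights are made-up numbers, and the function name average_core_atom is hypothetical.

```python
# Weighted average of one backbone atom position across superposed templates.


def average_core_atom(positions, weights):
    """Average (x, y, z) positions of the same atom in several templates.

    Each template is weighted by its local sequence similarity to the target.
    The weights are normalized here, so they need not sum to 1.
    """
    if not positions or len(positions) != len(weights):
        raise ValueError("need one weight per template position")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return tuple(
        sum(w * p[axis] for p, w in zip(positions, weights)) / total
        for axis in range(3)
    )


if __name__ == "__main__":
    # Hypothetical C-alpha positions (in Å) of one core residue in two templates.
    template_positions = [(1.0, 2.0, 3.0), (2.0, 2.0, 1.0)]
    # Hypothetical local sequence similarity of each template to the target.
    local_similarity = [0.75, 0.25]
    print(average_core_atom(template_positions, local_similarity))
```

With these numbers the averaged atom lies three quarters of the way towards the more similar template, so the model core follows the closest relative most strongly where the local alignment is best.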

