Protein Prediction Part 1: Structure

Comparative modelling

Andrea Schafferhans (built on slides from Burkhard Rost) 2012/05/10

2 Using structure to predict function

[Figure: ConSurf conservation pattern in the Bcl-XL/Bak complex (PDB ID: 1bx1); further examples: Chymotrypsin (5cha), Subtilisin (5sic)]

F Glaser, T Pupko, I Paz, RE Bell, D Bechor-Shental, E Martz and N Ben-Tal (2003) Bioinformatics 19, 163-4

Universe of protein structures

Christine Orengo et al. 1997 Structure 5 1093-1108

3 Protein structure (prediction) in numbers

Number of entries 05/2012:
• Uniprot/TREMBL: 21,552,793
• Uniprot/Swissprot: 535,698
• PDB: 81,369

[Figure: estimate for 1999; labels: 3D, HoMo, FoRc, 1D]

4 Redundancy in the PDB

[Figure: number of proteins in PDB]

5 Protein structure (prediction) in numbers

Number of entries 05/2012:
• Uniprot/TREMBL: 21,552,793
• Uniprot/Swissprot: 535,698
• PDB: 81,369 (unique sequences: 46,166)

[Figure: estimate for 1999; labels: 3D, HoMo, FoRc, 1D]

6 Goal of structure prediction

Epstein & Anfinsen, 1961: sequence uniquely determines structure

INPUT: sequence
OUTPUT: 3D structure and function

7 Structure and sequence similarity

WHAT IF

Secondary structure (column positions lost in extraction): EEEE B B B B EEEEEE EEEEEE EEEEEEEEHHHEEE

1shf 100% VTLFVALYDYEARTEDDLSFHKGEKFQILNSSEGDWWEARSLTTGETGYIPSNYVAPVD
1srm  78% VTTFVALYDYESRTETDLSFKKGERLQIVNNTEGDWWLAHSLTTGQTGYIPSNYVAPSD
1sem  39% ....VAEHDFQAGSPDELSFKRGNTLKVLNKDEDPHWYKAEL.DGNEGFIPSNYIRMTE
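The percentages in the alignment can be reproduced with a short Python sketch. It assumes that identity is counted only over columns in which both sequences have a residue ("." marks a gap); the function name percent_identity is chosen here for illustration and does not come from WHAT IF.

```python
def percent_identity(a: str, b: str, gaps: str = ".-") -> float:
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    aligned = [(x, y) for x, y in zip(a, b) if x not in gaps and y not in gaps]
    identical = sum(1 for x, y in aligned if x == y)
    return 100.0 * identical / len(aligned)


SHF = "VTLFVALYDYEARTEDDLSFHKGEKFQILNSSEGDWWEARSLTTGETGYIPSNYVAPVD"
SRM = "VTTFVALYDYESRTETDLSFKKGERLQIVNNTEGDWWLAHSLTTGQTGYIPSNYVAPSD"
SEM = "....VAEHDFQAGSPDELSFKRGNTLKVLNKDEDPHWYKAEL.DGNEGFIPSNYIRMTE"

for name, seq in [("1srm", SRM), ("1sem", SEM)]:
    print(name, round(percent_identity(SHF, seq)))
```

With this counting convention the sketch gives 78 for 1srm and 39 for 1sem, matching the values on the slide. Counting gap columns in the denominator would lower the 1sem value, so the convention matters at low identity.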

8 Comparative modeling

[Figure: target U (sequence only) shows significant sequence identity to H, a protein of known structure in the PDB]

• assumption: H and U have homologous 3D structures
• strategy: modelling of U based on H

9 Sequence conservation of protein structure

[Figure: conservation of protein structure for related and unrelated structures; distance from new HSSP curve; zones]

B Rost 1999 Prot Engin 12, 85-94 © Burkhard Rost (TU Munich)

12 Protein structure (prediction) in numbers

Number of entries 05/2012:
• Uniprot/TREMBL: 21,552,793
• Uniprot/Swissprot: 535,698
• PDB: 81,369 (unique sequences: 46,166)

              Total        1 member
UniRef 100    16,973,591   15,225,290
UniRef 90     10,657,899    8,261,540
UniRef 50      4,927,947    3,136,923

[Figure: estimate for 1999; labels: 3D, HoMo, FoRc, 1D]
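A small Python sketch of the arithmetic behind these numbers: the share of UniRef clusters that have a single member, and the share of PDB entries that are unique sequences. Only the counts come from the slide; the variable and function names are illustrative.

```python
# Counts copied from the slide (05/2012): (total clusters, clusters with 1 member).
UNIREF = {
    "UniRef 100": (16_973_591, 15_225_290),
    "UniRef 90": (10_657_899, 8_261_540),
    "UniRef 50": (4_927_947, 3_136_923),
}
PDB_ENTRIES = 81_369
PDB_UNIQUE_SEQUENCES = 46_166


def fraction(part: int, whole: int) -> float:
    """Return part / whole as a float."""
    return part / whole


for name, (total, singletons) in UNIREF.items():
    print(f"{name}: {100 * fraction(singletons, total):.1f}% single-member clusters")

print(f"PDB: {100 * fraction(PDB_UNIQUE_SEQUENCES, PDB_ENTRIES):.1f}% unique sequences")
```

The single-member share drops as the clustering threshold is lowered from 100% to 50% identity, which is the sense in which more sequences find relatives at lower identity.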

13 Bridging potential vs. sequence identity

[Figure: percentage of proteins in SWISS-PROT (0 to 30) vs. percentage of pairwise sequence identity (98 down to 34)]

14 Twilight zone = false positives explode!!

[Figure: sequence conservation of protein structure. Left panel: % sequence identity (0 to 100) vs. number of residues aligned (50 to 400), with curves at -20 to +25 around the HSSP threshold. Right panel: number of protein pairs (log scale) vs. distance from HSSP threshold (-15 to 10), for related and unrelated structures; top axis: percentage sequence identity (10 to 35)]

B Rost 1999 Prot Engin 12, 85-94 © Burkhard Rost (TU Munich)
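The "distance from HSSP threshold" on the axis can be sketched in Python. The slides do not state the curve itself; the formula below is the length-dependent identity threshold as recalled from the cited paper (B Rost 1999 Prot Engin 12, 85-94) and should be treated as an assumption to be checked against that paper. The function names are illustrative.

```python
import math


def hssp_threshold(length: int, n: float = 0.0) -> float:
    """Percentage identity threshold for an alignment of `length` residues.

    Assumed form (recalled from Rost 1999, not given on the slides):
      100                                 for length <= 11
      480 * L ** (-0.32 * (1 + e^(-L/1000)))   for 11 < length <= 450
      19.5                                for length > 450
    `n` shifts the whole curve up or down.
    """
    if length <= 11:
        return n + 100.0
    if length > 450:
        return n + 19.5
    return n + 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))


def hssp_distance(percent_identity: float, length: int) -> float:
    """Signed distance of an alignment from the threshold curve (positive = above)."""
    return percent_identity - hssp_threshold(length)


for length in (50, 100, 200, 500):
    print(length, round(hssp_threshold(length), 1))
```

Under this assumed formula, short alignments need a much higher identity than long ones to land above the curve, which is why a single fixed cutoff such as "25% identity" is only a rough guide.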

15 into the Midnight zone

[Figure: number of structure pairs (0 to 1600) vs. percentage pairwise sequence identity (0 to 100)]

B Rost 1997 Folding & Design 2, S19-S24

Protein structure (prediction) in numbers

[Figure: estimate for 1999; labels: 3D, 1D, HoMo, FoRc]


18 Protein structure prediction in reality

[Figure: estimate for 1999, SWISS-PROT view vs. genome view; labels: 3D, HoMo, FoRc, 1D]

… the art of being humble

19 Improving prediction by waiting it out …

[Figure: panels for 1991, 1995 and 1999]

Jinfeng Liu
• 1995-2003 MS Rutgers Univ.
• 1998-2004 PhD Columbia Univ. Pharmacology
  – PhD with 16 publications!
• 2004-2007 Sr. Research Assistant Columbia Univ. Biochemistry & Molecular Biophysics
• 2007-now Genentech, CA

21 Homology modeling for entire genomes

[Figure: bar chart per organism of the number of ORFs and the number of ORFs with PDB hit; x-axis: number of proteins (0 to 20,000)]

Organisms: H sapiens (chr. 22), H sapiens, D melanogaster, C elegans, S cerevisiae, U urealyticum, T pallidum, S PCC6803, R prowazekii, N meningitidis, M tuberculosis, M pneumoniae, M genitalium, H pylori, H influenzae, E coli, C trachomatis, C pneumoniae, C jejuni, B burgdorferi, B subtilis, A aeolicus, T maritima, D radiodurans, P horikoshii, P abyssi, M thermoautotrophicu, M jannaschii, A fulgidus, A pernix

22 Homology for protein universe

[Figure: bar chart per organism (same organisms as on slide 21); x-axis: percentage of all ORFs in genome (0 to 60)]

23 Comparative modeling: terms

• Comparative modeling vs. Homology modeling

• Target: protein to model
• Template: protein to model from

24 Comparative modeling: steps

• Identify template(s) through database search
  – PSI-Blast / HHblits
• Align target/template
• Build model
• Assess model
• (refine)

25 Extending modelling reach: threading

[Figure: scale of the percentage of pairwise identical residues, 100% to 0%. Above 25%: region of homology modelling (sequence alignment suffices). Below 25%: fold recognition]

Accuracy of automatic fold recognition:
• correct first hit: 20-30%
• alignment correct to some extent: 10-25%
• remote homology modelling (3D) correct: < 10%

26 Comparative modelling: quality

[Figure: limiting factor in homology modelling as a function of the percentage of pairwise identical residues (100%, 75%, 50%, 25%, 0%). From high to low identity: SPEED of modelling, QUALITY of model, ALIGNMENT accuracy, DETECTION of homology. Vertical labels: increasing accuracy, increasing coverage]

27 Comparative modeling: steps

• Identify template(s) through database search
  – PSI-Blast / HHblits
  – threading
• Align target/template
• Build model
• Assess model
• (refine)

28 Comparative modeling: steps

• Identify template(s) through database search
  – PSI-Blast / HHblits
  – threading
• Align target/template
  – dynamic programming
  – structural alignment of targets to guide
  – threading-like
  – HMM / profile-profile
• Build model
• Assess model
• (refine)
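The "dynamic programming" option for the alignment step can be illustrated with a minimal global alignment in Python (Needleman-Wunsch with a linear gap penalty). This is a teaching sketch only: the scores (match +1, mismatch -1, gap -2) are arbitrary assumptions, and real modelling pipelines use substitution matrices or profiles and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```

The same recurrence underlies profile-based and HMM-based aligners; they differ mainly in how the per-column substitution score and the gap costs are computed.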

29 Shapers and Shakers

Andrej Sali - UCSF
• CV
  – PhD Birkbeck - Tom Blundell
  – PD Harvard - Martin Karplus
  – Rockefeller Univ.
  – UCSF
• Publications (2011/06)
  – > 400 publications
  – 1x ~4,000
  – 45x >100
  – H-index > 64

30 Comparative modeling: MODELLER
• Marc A Marti-Renom, A Stuart, Andras Fiser, Roberto Sanchez, Francisco Melo, Andrej Sali (2000) Annu. Rev. Biophys. Biomol. Struct. 29:291-325
• Andrej Sali & Tom L Blundell (1993) JMB 234:779-815

Andrej Sali, UCSF
Marc Marti-Renom, CIPF Barcelona

31 Comparative modeling: MODELLER
www.salilab.org/modeller/
• MA Marti-Renom, A Stuart, A Fiser, R Sanchez, F Melo, A Sali (2000) Annu. Rev. Biophys. Biomol. Struct. 29:291-325
• A Sali & TL Blundell (1993) JMB 234:779-815

32 MODELLER: constraint satisfaction

Features of a protein structure:
• resolution of X-ray experiment
• amino acid residue type
• main chain angles
• secondary structure class
• main chain conformation class
• side chain dihedral angle class
• residue solvent accessibility
• residue neighborhood difference
• Cα-Cα distance
• difference between two Cα-Cα distances

Source: Modeller manual

33 MODELLER: overview

N Eswar et al. & A Sali (2008) Methods Mol Biol 426: 145-59 (Fig. 1)

34 MODELLER: constraint derivation

III. Satisfaction of spatial restraints

• Feature properties can be associated with
  – a protein (e.g. X-ray resolution)
  – residues (e.g. solvent accessibility)
  – pairs of residues (e.g. Cα-Cα distance)
  – other features (e.g. main chain classes)

• How can we derive modeling restraints from this data?
  – A restraint is defined as a probability density function (pdf) p(x):

    p(x_1 \le x < x_2) = \int_{x_1}^{x_2} p(x)\,dx \quad \text{with} \quad \int p(x)\,dx = 1, \quad p(x) \ge 0
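To make the pdf definition concrete, here is a minimal Python sketch of a restraint on a single Cα-Cα distance. It assumes a Gaussian pdf, which is only one possible functional form, and the mean of 5.0 Å and standard deviation of 0.5 Å are invented for illustration. The function names are not MODELLER's API.

```python
import math


def gaussian_cdf(x: float, mean: float, sigma: float) -> float:
    """Integral of the Gaussian pdf from -infinity to x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def probability(x1: float, x2: float, mean: float, sigma: float) -> float:
    """p(x1 <= x < x2): the integral of the pdf between x1 and x2."""
    return gaussian_cdf(x2, mean, sigma) - gaussian_cdf(x1, mean, sigma)


def restraint_penalty(x: float, mean: float, sigma: float) -> float:
    """Negative log of the pdf; minimising it maximises the probability."""
    pdf = math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    return -math.log(pdf)


MEAN, SIGMA = 5.0, 0.5  # hypothetical restraint in Angstrom

print(probability(-math.inf, math.inf, MEAN, SIGMA))          # normalisation
print(probability(MEAN - SIGMA, MEAN + SIGMA, MEAN, SIGMA))   # within one sigma
print(restraint_penalty(5.0, MEAN, SIGMA) < restraint_penalty(6.0, MEAN, SIGMA))
```

Working with the negative logarithm turns "find the model with the highest probability" into a minimisation problem, which is the form an optimiser needs.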

35 MODELLER: Constraint satisfaction
• Find the protein model with the highest probability
• Variable target function:
  – Start model close to the template conformation
  – First only local restraints
  – Minimize using conjugate gradient optimization
  – Repeat, introduce more and more long range restraints

MODELLER: objective function
• Molecular dynamics (MD)
• Langevin dynamics (LD)
• Self-guided MD & LD
• Rigid bodies
• Rigid molecular dynamics
• Rigid minimization
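The variable target function strategy (local restraints first, then progressively longer-range ones) can be sketched on a toy problem in Python. Everything here is a simplifying assumption: the "structure" is a set of points on a line, restraints are harmonic penalties on signed coordinate differences taken from a template, and plain gradient descent stands in for the conjugate gradient optimiser named on the slide. It is not MODELLER's implementation.

```python
import random


def optimize_variable_target(template, noise=0.3, steps=200, seed=0):
    """Toy variable target function optimisation on 1D coordinates."""
    rng = random.Random(seed)
    n = len(template)
    rate = 0.5 / n  # step size small enough to be stable for this quadratic objective
    # Start close to the template conformation.
    x = [t + rng.uniform(-noise, noise) for t in template]
    # Stage k uses only restraints between points at most k apart in sequence.
    for max_sep in range(1, n):
        pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if j - i <= max_sep]
        for _ in range(steps):
            grad = [0.0] * n
            for i, j in pairs:
                violation = (x[j] - x[i]) - (template[j] - template[i])
                grad[j] += 2.0 * violation
                grad[i] -= 2.0 * violation
            x = [xi - rate * g for xi, g in zip(x, grad)]
    return x


def max_violation(x, template):
    """Largest absolute restraint violation over all pairs."""
    n = len(x)
    return max(
        abs((x[j] - x[i]) - (template[j] - template[i]))
        for i in range(n) for j in range(i + 1, n)
    )


TEMPLATE = [0.0, 1.5, 3.0, 4.5, 6.0, 7.5]  # hypothetical template coordinates
model = optimize_variable_target(TEMPLATE)
print(max_violation(model, TEMPLATE))
```

In this toy case all restraints are mutually consistent, so staging is not strictly needed; in a real model the restraints conflict, and introducing long-range terms late helps avoid getting trapped early in a poor local minimum.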

37 MODELLER: multiple models
• Run optimisation several times
• Starting point: template coordinates with random fluctuations
➔ Explore different local minima

N Eswar et al. & A Sali (2008) Methods Mol Biol 426: 145-59 (Fig. 3)

38 Typical errors
• side chain packing
• alignment shift
• no template (loop wrong)
• mis-alignment
• wrong template

N Eswar et al. & A Sali (2006) Current Protocols in Bioinformatics : Chapter 5 - Unit 5.6.1-30 (Fig. 5.6.12)

39 MODELLER – Identify best models

DOPE score: Discrete Optimized Protein Energy
• Based on knowledge-based pair potentials

[Figure: page 2518 of the cited paper, showing its Figure 5 (score-error correlation for DOPE on three targets from the moulder decoy set: 1bbh, r = 0.92; 1eaf, r = 0.84; 1cew, r = 0.68) and its Figure 6 (assessment of target 1bbh with a reference sphere radius of 16 Å vs. 23 Å)]

Shen,M.-Y. and Sali,A. (2006) Statistical potential for the assessment and prediction of protein structures. Protein Science, 15, 2507-24.
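The idea of a knowledge-based pair potential can be sketched in Python as an inverse Boltzmann relation: the energy of a distance bin is the negative log ratio of how often it is observed in known structures versus how often a reference state would predict it. This is a generic sketch and not the DOPE formula; DOPE's specific reference state (non-interacting atoms in a finite sphere) is not implemented, and the counts below are invented.

```python
import math


def pair_potential(observed, reference, kT=1.0, pseudocount=1.0):
    """Energy per distance bin: -kT * ln(p_observed / p_reference).

    `observed` and `reference` are counts per distance bin; a pseudocount
    avoids taking the logarithm of zero for empty bins.
    """
    if len(observed) != len(reference):
        raise ValueError("both histograms need the same number of bins")
    obs_total = sum(observed) + pseudocount * len(observed)
    ref_total = sum(reference) + pseudocount * len(reference)
    energies = []
    for o, r in zip(observed, reference):
        p_obs = (o + pseudocount) / obs_total
        p_ref = (r + pseudocount) / ref_total
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies


# Hypothetical counts for three distance bins of one atom-type pair.
OBSERVED = [10, 60, 30]
REFERENCE = [30, 40, 30]
print([round(e, 2) for e in pair_potential(OBSERVED, REFERENCE)])
```

Bins that occur more often than the reference predicts get a negative (favourable) energy. Summing such terms over all atom pairs of a model gives a score that can be used to rank alternative models, with the choice of reference state being the main difference between published potentials.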

Loop modelling

Andras Fiser, Richard Kinh Gian Do & Andrej Sali (2000) Protein Science 9:1753-73: Fig. 9

41 Comparative modeling methods

• MODELLER
  – lots of whistles and bells
  – downloadable
  – very accurate

42 Shapers and Shakers

Torsten Schwede - Biocenter Basel
• CV
  – PhD in X-ray crystallography
  – SIB
  – Biocenter Basel
• Publications (2011/06)
  – > 45 publications
  – 1x ~2,000
  – 5x >100
  – H-index > 19
• Web presence
  – 1000s of accesses / day

43 Shapers and Shakers

SWISS-MODEL
• Nicolas Guex & Manuel C Peitsch (1997) Electrophoresis 18:2714-23
• Torsten Schwede, J Kopp, Nicolas Guex & Manuel C Peitsch (2003) NAR 31:3381-5
• F Kiefer, K Arnold, M Kuenzli, L Bordoli & T Schwede (2009) NAR 37:D387-92

Nicolas Guex, Vital-IT
photo: http://www.vital-it.ch/vitalit_images/Guex.jpg
Manuel Peitsch, Philip Morris Internatl.
photo: http://www.inria.fr/actualites/colloques/2008/lscc2008/images/manuel-peitsch.jpg
Torsten Schwede, BioZentrum Basel
photo: http://www.unibas.ch/mediaDB/schwede_torsten_big.jpg

44 SWISS-MODEL http://swissmodel.expasy.org/

Torsten Schwede, J Kopp, Nicolas Guex & Manuel C Peitsch (2003) NAR 31:3381-5 F Kiefer, K Arnold, M Kuenzli, L Bordoli & T Schwede (2009) NAR 37:D387-92

45 SWISS-MODEL

• Underlying “philosophy”:
  – fully automated
  – for non-expert users/experimental biologists
  – do less -> you make fewer mistakes

• Original:
  1. alignment by BLAST/PSI-BLAST
  2. copy co-ordinates
  3. end


46 SWISS-MODEL – workflow

Lorenza Bordoli, Florian Kiefer, Konstantin Arnold, Pascal Benkert, James Battey & Torsten Schwede (2009) Nature Protocols doi:10.1038/nprot.2008.197; Fig. 2

47 SWISS-MODEL: template search

• Blast
• PSI-Blast
• HH-Search

48 SWISS-MODEL: template selection
• Template quality
• Bound substrate
• InterPro annotations
• Secondary structure prediction
• Transmembrane helices

49 SWISS-MODEL: building structure

Template-based fragment assembly:
• Find structurally conserved core regions
• Build model core
• Loop (insertion) modeling
• Side chain placement, using
  – homologous structure information
  – backbone-dependent rotamer libraries
  – energetic and packing criteria
• Energy minimisation (GROMOS 96, steepest descent)

SWISS-MODEL: result after many steps

© Chris Wilton (Helsinki)

51 SWISS-MODEL: model assessment

Sum scores:
• QMEAN
• DFIRE (statistical potential)

Local scores:
• ANOLEA (mean force potential)
• GROMOS (empirical force field)
• ProQres (neural network)

52 SWISS-MODEL: interpreting scores

• Long stretches above zero probably loops
• Most-negative regions well buried in core
• Functional residues above zero!
• Majority of residues must be below zero
• Compare your model to pdb template!
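These rules of thumb can be applied programmatically to a per-residue score profile. The Python sketch below assumes a plain list of local energies in which negative values are favourable, as for the local scores listed on the previous slide; the example scores, the minimum stretch length of 3 and the function names are invented for illustration.

```python
def stretches_above_zero(scores, min_length=3):
    """Return (start, end) index pairs of runs of at least min_length scores > 0."""
    stretches = []
    start = None
    for i, s in enumerate(scores):
        if s > 0:
            if start is None:
                start = i  # a new run of unfavourable residues begins
        else:
            if start is not None and i - start >= min_length:
                stretches.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the profile.
    if start is not None and len(scores) - start >= min_length:
        stretches.append((start, len(scores) - 1))
    return stretches


def fraction_below_zero(scores):
    """Fraction of residues with a favourable (negative) score."""
    return sum(1 for s in scores if s < 0) / len(scores)


# Hypothetical per-residue scores for a ten-residue model.
SCORES = [-1.2, -0.8, 0.4, 0.9, 1.1, 0.3, -0.5, -2.0, 0.2, -0.7]
print(stretches_above_zero(SCORES))
print(fraction_below_zero(SCORES))
```

Flagged stretches are candidates for loops or alignment problems and are worth comparing against the same region in the template, keeping in mind that functional residues can legitimately score above zero.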

53 Comparative modeling methods

• MODELLER: lots of whistles and bells, downloadable, very accurate
• SWISS-MODEL: automated, increasingly comprehensive and flexible

54 Shapers and Shakers

HHpred / HHsearch
• J Soeding (2005) Bioinformatics 21:951-60
• J Soeding, A Biegert & AN Lupas (2005) NAR 33:W244-8
• A Hildebrand et al. (2009) Proteins, 77 Suppl 9, 128-32

Johannes Soeding, LMU
photo: http://www.lmb.uni-muenchen.de/soeding/images/soeding.jpg
Andrei Lupas, MPI Tuebingen
photo: http://www.mph.tuebingen.mpg.de/pix/perspics/pic444.jpg

55 Comparative modeling: HHpred

• Build target sequence profile with PSI-Blast (HHblits?)
• Make HMM of target
• Search template HMM database (HMM-HMM alignment)
• Re-rank potential templates
• Reduce diversity of multiple sequence alignments
• Re-rank alternative alignments
• Pick best template for each region
• Run MODELLER

56 Comparative modeling methods

• MODELLER: lots of whistles and bells, downloadable, very accurate
• SWISS-MODEL: automated, increasingly comprehensive and flexible
• HHpred/HHsearch: very accurate, automated

57 Shapers and Shakers

Comparative modeling: 3D-JIGSAW

PA Bates, LA Kelley, RM MacCallum & MJE Sternberg (2001) Proteins 5:39-46

Paul A Bates, London Res Inst
photo: http://www.bmm.icnet.uk/~bates03/
Michael JE Sternberg
photo: http://www3.imperial.ac.uk/pls/portallive/docs/1/63011698.JPG

58 3D-JIGSAW

http://bmm.cancerresearchuk.org/~3djigsaw/

59 3D-JIGSAW

Fragment-based approach:
1. identify all similar fragments in PDB
2. assemble fragments to structure

60 Comparative modeling methods

• MODELLER: lots of whistles and bells, downloadable, very accurate
• SWISS-MODEL: automated, increasingly comprehensive and flexible
• HHpred/HHsearch: very accurate, automated
• 3D-JIGSAW: automated, accurate

61 Shapers and Shakers

WHAT IF G Vriend (1990) J Mol Graph 8:52-6

Gert Vriend, CMBI Nijmegen
photo: http://swift.cmbi.ru.nl/gv/start/IMAGE/VRIEND.jpg

62 Shapers and Shakers

YASARA
E Krieger, JE Nielsen, C Spronk & G Vriend (1990) J Mol Graph 8:52-6

Elmar Krieger, YASARA Biosciences, Vienna
photo: http://www.yasara.org/ekrieger_small.jpg

63 Comparative modeling methods

• MODELLER: lots of whistles and bells, downloadable, very accurate
• SWISS-MODEL: automated, increasingly comprehensive and flexible
• HHpred/HHsearch: very accurate, automated
• 3D-JIGSAW: automated, accurate
• WHAT IF: expert users, does anything incl. chess

64 COMA

M Margelevicius, M Laganeckas & Ceslovas Venclovas (2010) COMA server for protein distant homology search. Bioinformatics, 26, 1905-1906

Ceslovas Venclovas, Inst Biotechnology, Vilnius, Lithuania
photo: http://www.ibt.lt/uploads/images/bioinfo/ceslovas.jpg

65 COMA – method

• Comparison Of Multiple Alignments
• mostly through good profile-profile alignments
• special features:
  – position specific gap penalty
  – global score

66 COMA – performance

global: entire 3D domain

local

M Margelevicius & Ceslovas Venclovas (2010) BMC Bioinformatics 11:89, 9-14; Fig. 3

67 Comparative modeling methods
• MODELLER: lots of whistles and bells, downloadable, very accurate
• SWISS-MODEL: automated, increasingly comprehensive and flexible
• HHpred/HHsearch: very accurate, automated
• 3D-JIGSAW: automated, accurate
• WHAT IF: expert users, does anything incl. chess
• COMA: reaches deep into twilight zone, automated

68 i-Tasser

http://zhanglab.ccmb.med.umich.edu/I-TASSER/
Yang Zhang, University of Michigan

• Ambrish Roy, Alper Kucukural, Yang Zhang. Nature Protocols, vol 5, 725-738 (2010).
• Ambrish Roy, Dong Xu, Jonathan Poisson, Yang Zhang. Journal of Visualized Experiments, vol 57, e3259 (2011).
• Yang Zhang. BMC Bioinformatics, vol 9, 40 (2008).

i-Tasser – workflow

• Template identification: threading
• Structure assembly: put together contiguous fragments
• Knowledge-based force field:
  – general knowledge-based statistics terms
  – spatial restraints from threading templates
  – sequence-based contact predictions from SVMSEQ
• Multiple domain proteins: dock models of individual domains
• Special:
  – Multiple templates to assemble models
  – Optimise global and local structural packing

Models and reality

• “A Model must be wrong, in some respects, else it would be the thing itself. The trick is to see where it is right.” (Henry A. Bent)
• “A model is a tool that helps to interpret biochemical data.” (Torsten Schwede)

Next session (May 22nd): Assessing protein structure prediction
