Comparative Modeling: Homology Modeling and Protein Threading

Some slides modified from Kristen Huber, UMass, and Charles Yan, Utah State University

Types of Structure Prediction
• De novo protein structure prediction
  – Methods seek to build three-dimensional protein models "from scratch".
  – Example: Rosetta
• Comparative protein structure prediction
  – Modeling uses previously solved structures as starting points, or templates.
  – Example: protein threading

What is comparative modeling?
In general, comparative modeling consists of:
– Selection of one or more templates from a database.
  • BLAST (for closely related sequences).
  • PSI-BLAST (for distantly related sequences).
  • A single template rarely provides a complete model. Alternative template structures may provide additional structural features.
– Alignment to the target sequence.
  • Requires a correct alignment of the target and template sequences. This is not trivial, especially when the similarity is not very high.
– Refinement of side chain geometry and regions of low sequence identity.

Comparative modeling
• Homology modeling
• Threading

Comparative modeling
[Figure: decision flowchart. Starting from the sequence, sequence homology to a known fold is checked: 30-40% homology leads to homology modeling, <30% leads to threading. Match found? Yes: model. No: ab initio.]

Challenges
– Aligning the target sequence onto the template structure or structures is difficult and typically results in very significant errors.
– Generally, a significant fraction of residues in a target will have no structural equivalent in an available template. Reliably building regions of the structure not present in a template remains a challenge.
– Side chain accuracy of these approximate models is poor.
– Refinement remains the principal bottleneck to progress.

Sequence comparison improves fast
Improved sequence comparison techniques have broadened the scope of comparative modeling.
While 30% sequence similarity was once considered the threshold for successful comparative modeling, predictions for targets with as little as 17% sequence similarity were made during the CASP4 experiment, and 6% during CASP5. The importance of comparative modeling will continue to grow as the number of experimentally determined structures grows steadily and, with it, the number of sequences that can be related to a known structure.

Little progress in refining templates (until 2018)
• Comparative modeling methods hardly differ with respect to template selection and alignment.
• Little progress has been made in refining templates. Early hopes that molecular dynamics methods would allow refinement have not been fulfilled. The reasons are a matter of hot debate within the field, with three suggested interrelated explanations: inadequate sampling of alternative conformations, insufficiently accurate description of the interatomic forces, and trajectories that are too short.

Homology Modeling Defined
– A homolog of a protein is related to it by divergent evolution from a common ancestor.
– Homology modeling is based on the reasonable assumption that two homologous proteins will share very similar structures.
– Given the amino acid sequence of an unknown structure and the solved structure of a homologous protein, each amino acid in the solved structure is mutated computationally into the corresponding amino acid from the unknown structure.
– The accuracy of predictions by homology modeling strongly depends on the degree of sequence similarity.

Why Homology Modeling?
• Value in structure-based drug design
• Find common catalytic sites/molecular recognition sites
• Use as a guide to planning and interpreting experiments
• 70-80% chance that a protein has a similar fold to the target protein, based on known structures from X-ray crystallography or NMR spectroscopy
• Sometimes it's the only option or the best guess

Similarity of primary sequences matters
• If the target and the template share more than 50% sequence identity, predictions usually are of high quality and have been shown to be as accurate as low-resolution X-ray structures.
• For 30–50% sequence identity, more than 80% of the Cα atoms can be expected to be within 3.5 Å of their true positions.
• For less than 30% sequence identity, the prediction is likely to contain significant errors.

Factors affecting the quality of homology modeling
The quality of the homology model depends on the quality of the sequence alignment and of the template structure. The approach can be complicated by alignment gaps (commonly called indels), which indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure.

The quality of homology modeling
Model quality declines with decreasing sequence identity; a typical model has ~1-2 Å root mean square deviation between the matched Cα atoms at 70% sequence identity but only 2-4 Å agreement at 25% sequence identity.
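
The root mean square deviation quoted above can be computed directly from matched Cα coordinates. The following is a minimal sketch, assuming the model and the reference structure have already been superimposed (the superposition step is not shown) and that coordinates are given in Å as (x, y, z) tuples. The function name `ca_rmsd` and the toy coordinates are hypothetical, chosen only for illustration.

```python
import math


def ca_rmsd(model_coords, reference_coords):
    """Root mean square deviation between matched C-alpha positions.

    Both arguments are equal-length lists of (x, y, z) tuples in angstroms.
    Assumption: the two structures are already superimposed.
    """
    if len(model_coords) != len(reference_coords) or not model_coords:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xm, ym, zm), (xr, yr, zr) in zip(model_coords, reference_coords):
        # Squared distance between one matched pair of atoms
        squared_sum += (xm - xr) ** 2 + (ym - yr) ** 2 + (zm - zr) ** 2
    return math.sqrt(squared_sum / len(model_coords))


# Toy example (hypothetical coordinates): every atom is displaced by 2 A along z
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(ca_rmsd(model, reference))
```

Because the deviation is averaged over all matched atoms, a model with a well-built core and badly placed loops can still report a moderate overall value, so per-region values are often more informative.
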
However, the errors are significantly higher in the loop regions, where the amino acid sequences of the target and template proteins may be completely different.

Homology Modeling Limitations
• Cannot study conformational changes
• Cannot find new catalytic/binding sites
• Large bias towards the structure of the template
• Models cannot be docked together

Comparative modeling
[Figure: decision flowchart. Starting from the sequence, sequence homology to a known fold is checked: 30-40% homology leads to homology modeling, <30% leads to threading. Match found? Yes: model. No: ab initio.]

Protein Threading
Protein threading, also known as fold recognition, is a method of computational protein structure prediction. It is used to model proteins that have the same fold as proteins of known structure but have no homologous proteins of known structure.

Different from homology modeling
Protein threading differs from homology modeling in that it is used for proteins that do not have homologous structures deposited in the PDB. Protein threading predicts structures by using statistical knowledge of the relationship between the structures deposited in the PDB and the sequence of the protein to be modeled.

Protein Threading
• The word threading implies that one drags the sequence (ACDEFG...) step by step through each location on each template.

Why Threading (I)
• While similar sequence implies similar structure, the converse is in general not true.
• In contrast, similar structures are often found for proteins for which no sequence similarity to any known structure can be detected.
• As a consequence, the repertoire of different folds is more limited than suggested by sequence diversity.

Why Threading (II)
• Fold recognition methods are motivated by the notion that structure is evolutionarily more conserved than sequence.
• Fold recognition methods are one class of comparative modeling methods that aim to predict the three-dimensional folded structure of amino acid sequences for which homology methods provide no reliable prediction.
• Since the number of sequences is much larger than the number of folds, fold recognition methods attempt to identify a model fold for a given target sequence among the known folds, even if no sequence similarity can be detected.

Threading
• Threading-based methods are known to be computationally expensive.
• Globally optimal protein threading is known to be NP-hard.
• Several threading methods ignore pairwise interactions between residues. In doing so, the threading problem is simplified considerably, and the simplified problem can be solved with dynamic programming.

Threading
• In early methods of this kind, a one-dimensional string of features was recorded for known folds and compared to the target sequence.
• The recorded features comprise attributes such as buried side chain area, side chain area covered by polar atoms (including water), and the local secondary structure.
• In this manner, the three-dimensional structure of known proteins is converted into a one-dimensional sequence of descriptors, and fold recognition is reduced to seeking the most favorable sequence alignment between the query sequence and a database of sequences.
• Recent approaches take into account pairwise residue interaction potentials that describe a mean force derived from a database of known structures.
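
The simplified problem described above, in which pairwise interactions are ignored and each template is reduced to a one-dimensional string of descriptors, can be sketched as a global alignment solved by dynamic programming. This is only an illustration under stated assumptions: the two environment classes ("B" for buried, "E" for exposed), the `HYDROPHOBIC` residue set, the +1/-1 scores and the gap penalty are all hypothetical and are not the scoring tables of any published method.

```python
# Hypothetical residue classification, used only for this sketch
HYDROPHOBIC = set("AVLIMFWC")


def env_score(residue, env):
    """+1 if the residue suits the environment class, -1 otherwise.

    Assumption: hydrophobic residues prefer buried ("B") positions and
    all other residues prefer exposed ("E") positions.
    """
    prefers_buried = residue in HYDROPHOBIC
    return 1 if prefers_buried == (env == "B") else -1


def thread_score(sequence, profile, gap=-2):
    """Best global alignment score of a sequence against a 1-D profile."""
    n, m = len(sequence), len(profile)
    # dp[i][j] = best score aligning sequence[:i] with profile[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + env_score(sequence[i - 1], profile[j - 1]),
                dp[i - 1][j] + gap,  # residue aligned to a gap
                dp[i][j - 1] + gap,  # profile position left unfilled
            )
    return dp[n][m]


def best_template(sequence, library):
    """Return (name, score) of the best-scoring profile in a fold library."""
    scored = ((name, thread_score(sequence, prof)) for name, prof in library.items())
    return max(scored, key=lambda item: item[1])


# Toy fold library (hypothetical profiles)
library = {"foldA": "BEBE", "foldB": "EBEB"}
print(best_template("LKVE", library))
```

Each table cell depends only on three neighbouring cells, so the cost per template is proportional to the product of sequence and profile lengths. Adding pairwise interaction terms breaks this independence, which is why the full problem is much harder.
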

Threading Methods
• Bowie, Lüthy and Eisenberg (1991)
• Two approaches to recognition methods:
• Derive a 1-D profile for each structure in the fold library and align the target sequence to these profiles
  – Identify amino acids based on core or external positions
  – Part of secondary structure
• Consider the full 3-D structure of the protein template
  – Modeled as a set of interatomic distances
  – NP-hard (if interactions of multiple residues are included)

Threading based on secondary structure
• One approach to fold recognition is based on secondary structure prediction and comparison.
• This subclass of methods is based on the observation that secondary structure similarity can exceed 80% for sequences that exhibit less than 10% sequence similarity.
• Clearly, any such approach can only be as good as the underlying secondary structure prediction method.

Accuracy
Accuracy of secondary structure predictions:
– 60% (1990s)
– 76% (current)

Some Threading Programs
• 3D-PSSM (ICNET). Based on sequence profiles, solvation potentials and secondary structure.
• TOPITS (PredictProtein server) (EMBL). Based on coincidence of secondary structure and accessibility.
• UCLA-DOE Structure Prediction Server (UCLA).
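
The observation behind secondary-structure-based threading, that two proteins can agree closely in secondary structure while sharing little sequence, can be illustrated with a simple per-position comparison. This is a minimal sketch, assuming the two proteins are already aligned, that secondary structure is written as a three-state string (H = helix, E = strand, C = coil), and that "-" marks a gap. The function name `fraction_identical` and all four strings are hypothetical.

```python
def fraction_identical(a, b):
    """Fraction of aligned, non-gap positions at which two strings agree."""
    if len(a) != len(b):
        raise ValueError("aligned strings must have equal length")
    # Ignore positions where either string has a gap
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


# Hypothetical aligned sequences and their three-state secondary structure
seq_a = "MKVLAAGIVE"
seq_b = "TRILSEGLAD"
ss_a = "CHHHHHHEEC"
ss_b = "CHHHHHCEEC"

print("sequence identity:", fraction_identical(seq_a, seq_b))
print("secondary structure agreement:", fraction_identical(ss_a, ss_b))
```

In this toy case the sequences agree at 2 of 10 positions while the secondary structure strings agree at 9 of 10. In practice the target's secondary structure would be predicted, so the comparison inherits the accuracy limits listed above.
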
