Protein Structure Comparison: Implications for the Nature of 'Fold

Protein structure comparison: implications for the nature of ‘fold space’, and structure and function prediction Rachel Kolodny, Donald Petrey and Barry Honig The identification of geometric relationships between protein terms. In contrast, there is considerable ambiguity in how structures offers a powerful approach to predicting the to describe a geometric relationship between two pro- structure and function of proteins. Methods to detect such teins, resulting in the large number of approaches to this relationships range from human pattern recognition to a variety problem described in the literature. of mathematical algorithms. A number of schemes for the classification of protein structure have found widespread use One effective but qualitative approach is based on man- and these implicitly assume the organization of protein ual pattern recognition. Richardson’s [1] classical review structure space into discrete categories. Recently, an of structural motifs in proteins was a striking example that alternative view has emerged in which protein fold space is has evolved over the years into manually curated struc- seen as continuous and multidimensional. Significant ture classification schemes, as epitomized by the SCOP relationships have been observed between proteins that [2] and CATH [3] databases. Implicit in SCOP and belong to what have been termed different ‘folds’. There has CATH is a hierarchical view whereby ‘structure space’ been progress in the use of these relationships in the prediction is divided into isolated, non-overlapping ‘islands’ that are of protein structure and function. denoted by categories such as folds. It is perhaps surpris- ing that the concept of a fold has entered the vocabulary Addresses of structural biology in the complete absence of a clear Howard Hughes Medical Institute, Department of Biochemistry and quantitative measure of how such an entity should be Molecular Biophysics, Center for Computational Biology and described. Implicit in the hierarchical view is that protein Bioinformatics, Columbia University, 1130 St Nicholas Avenue, Room 815, New York, NY 10032, USA structure space is discrete, in the sense that if a particular protein belongs to one category it does not belong to some Corresponding author: Honig, Barry ([email protected]) other category. Does the use of inherently rigid classification schemes Current Opinion in Structural Biology 2006, 16:393–398 limit our recognition of important relationships that exist This review comes from a themed issue on between proteins that have been segregated into differ- Sequences and topology ent categories? In principle, one could consider overlap- Edited by Nick V Grishin and Sarah A Teichmann ping classifications, whereby each object is assigned to Available online 4th May 2006 multiple classes; unfortunately, there are no overlapping classifications of protein structure space. Indeed, there is 0959-440X/$ – see front matter growing evidence that protein structure space is contin- # 2006 Elsevier Ltd. All rights reserved. uous, in the sense that there are meaningful structural DOI 10.1016/j.sbi.2006.04.007 relationships between proteins that are classified very differently. In this review, we discuss these alternative perspectives, and argue that both hierarchical and continuous views have ranges of validity. We suggest that the Introduction development of computational tools and algorithms that Proteins have complex three-dimensional shapes that, by recognize both descriptions of structure space can eye, often bear striking similarity to one another over their enhance our ability to predict protein structure and entire lengths or over shorter regions. In parallel to what function. can be deduced from pure sequence relationships, structural similarities also suggest the possibility of evolution- Protein structure alignment ary relationships between proteins. Indeed, because it is Structural alignment programs define scoring functions widely accepted that structure is better conserved than that measure the geometric similarity between proteins sequence (at least given our current ability to detect and use various algorithms to search for two substructures sequence relationships), the identification of structural such that these functions are optimal. Most existing relationships between proteins can provide important similarity measures can be classified into two main types structural and functional information not available from depending on what they compare: the distances between sequence analysis alone. However, detecting geometric corresponding pairs of atoms in the two structures (e.g. relationships between proteins is a far more uncertain [4–6]); and the relative positions of the corresponding process than the identification of pure sequence relation- atoms of two proteins that have been superimposed (e.g. ships, as the latter can be clearly defined in statistical [7,8,9,10,11]). It had been expected that the structural www.sciencedirect.com Current Opinion in Structural Biology 2006, 16:393–398 394 Sequences and topology alignment problem, under either of these formulations, is creation of a ‘best-of-all’ method, which returns, for every NP-hard [12]; however, Kolodny and Linial [13] recently pair of structures, the best alignment found by several reported a polynomial time algorithm that guarantees programs; this ‘joint effort’ outperforms all the individual finding an (approximate) optimal solution for a whole methods that it uses. class of scoring functions of the second type. Their main conclusion is that any efficient solution to the structural The nature of fold space alignment problem must search the ‘superposition space’ SCOP [2] and CATH [3] describe fold space in very of the two structures being compared or, equivalently, similar ways. In SCOP’s manual classification, the first optimize a scoring function of the second type. two levels, ‘class’ and ‘fold’, are defined based purely on structure; the next level, ‘superfamily’, takes into account Several recent studies have introduced new structure both structure and function, and the level below accounts similarity measures that are quite different from those for sequence as well, thus grouping proteins with clear used in traditional approaches. Rogen and Fain [14] evolutionary relationships. CATH combines manual clas- suggest describing the shape of a protein backbone by sification with the automatic structural alignment pro- a vector of 30 values inspired by mathematical knot gram SSAP [6]: the topmost level, ‘class’, is based on theory and define the similarity between two structures secondary structure composition; the second level, ‘archi- as the (Euclidean) distance between their corresponding tecture’, is classified manually; the third level, ‘topology vectors. Calculating the similarity of two structures under (fold family)’, depends on the shape and connectivity of this measure is instantaneous. More importantly, it is a the secondary structures, and is classified using SSAP; and pseudo-metric and hence satisfies the triangle inequality, the last level, ‘homology’, uses sequence information. which is paramount to automatic clustering, or visualiza- Ultimately, the presence or absence of a structural relation, of protein structure space. Note that any similarity tionship between two proteins is determined by the measure between two proteins that is defined on sub- category to which they are assigned. structures of these two proteins cannot satisfy the triangle inequality [14]. Erdmann [15] suggests another knot- FSSP [24] is a database that does not use a classification theory-inspired similarity measure and provides algo- scheme. All-on-all alignments are available and a contin- rithms to calculate it. Ye and Godzik [16], and Shatsky uous measure of structural similarity is provided. Isolated et al.[17] suggest flexible structural alignment algorithms, examples of relationships between proteins that would be whereby one of the two proteins being compared is bent treated as unrelated based on hierarchical protein classi- at several hinge points; the similarity is measured on fication schemes have been observed for some time [25]. corresponding rigid parts. This approach is especially However, that this might be a more general feature of important given the large conformational changes pro- protein structure space has only recently been widely teins can undergo. Friedberg and Godzik [18] suggest a recognized. This issue was discussed extensively by Yang similarity measure for protein folds, which is a normalized and Honig [8], who carried out an all-on-all alignment of count of the number of fragment pairwise alignments proteins in the PDB using the PrISM program. As empha- between proteins populating those folds. sized in that work, there is no unambiguous way of clustering proteins into discrete groups, as a significant The availability of so many structural alignment programs number of overlaps and ambiguities will inevitably exist. makes it difficult to establish common standards as to how structural similarity should be described. Some groups Recent applications of structure alignment that do not have carried out comparisons of different programs, using incorporate categorizations from the hierarchical databases receiver operating characteristic (ROC) curves to evaluate and rely only on objective measures of similarity have how well the similarities found by a structural alignment provided further examples of cross-fold similarities. A method imitate a gold standard classification [19].

Protein Structure Comparison: Implications for the Nature of 'Fold

AI and Bioinformatics

Are You an Invited Speaker? a Bibliometric Analysis of Elite Groups for Scholarly Events in Bioinformatics

Dear Delegates,History of Productive Scientiﬁc Discussions of New Challenging Ideas and Participants Contributing from a Wide Range of Interdisciplinary ﬁelds

Improved Prediction of Protein Secondary Structure by Use Of

ISMB/ECCB 2007: the Premier Conference on Computational Biology Thomas Lengauer, B

A Novel Approach for Predicting Protein Functions by Transferring Annotation Via Alignment Networks Warith Djeddi, Sadok Ben Yahia, Engelbert Mephu Nguifo

Basel Computational Biology Conference. from Information to Simulation

Janga-Phd-Thesis.Pdf (PDF, 9Mb)

In Systems Physiology

CV Burkhard Rost

Fast and Free Software for Protein Contact Prediction from Residue Co-Evolution

Ensemble of Bidirectional Recurrent Networks and Random Forests for Protein Secondary Structure Prediction