Proteins Terminology

Proteins Terminology

Bourne and Weissig Structural Bioinformatics Proteins Wiley, 2003 Classification, Databases, and Visualization Terminology • Primary Structure • Secondary Structure • Tertiary Structure • Quatenary Structure • Supersecondary Structure • Folds • Domains 1 What is a domain? Richardson (1981) Within a single subunit [polypeptide chain], contiguous portions of the polypeptide chain frequently fold into compact, local semi-independent units called domains. More about domains Why look for domains? • Independent folding units. • Lots of within contacts, few outside. • Domains create their own hydrophobic core. • Regions usually conserved during recombination. • Different domains of the same protein can have different functions. • Domains of the same protein may or may not interact. Domains are the currency of protein function! Domain size • Domains can be between 25 and 500 residues long. • Most are less than 200 residues. • Domains can be smaller than 50 residues, but these need to be stabilized. 2 • Homology: Similarity in characteristics resulting from shared ancestry. • Homologous structures: Structures in different species that are similar because of common ancestry. • Analogy: The similarity of structure between two species that are not closely related; attributable to convergent evolution. SCOP Levels Structural Classification of Proteins • Family: Clear evolutionarily relationship. In general • Proteins are classified (manually!) taking both >30% pairwise residue identities between the structural and evolutionary relationship into account. proteins. • There are 7 classes of proteins, the main ones being • Superfamily: Probable common evolutionary origin. all α, all β, α/β, and α+β. Proteins have low sequence identities, but structural and functional features suggest that a common • The principle levels in the hierarchy are fold, evolutionary origin is probable. superfamily, family. • Fold: Major structural similarity. Proteins have the same major secondary structures in same arrangement and with the same topological connections. Hubbard, Murzin, Brenner and Chothia (1997) 3 5NLL flavodoxin 1AMO cytochrome reductase 1CHN signal transduction protein CHEY 1FNB spinach ferredoxin reductase CATH Protein Structure Classification • The CATH database is a hierarchical domain classification of protein structures in the Brookhaven protein databank. Only NMR structures and crystal structures solved to resolution better than 3.0 angstroms are considered. • There are four major levels in this hierarchy: Class, Architecture, Topology (fold family) and Homologous superfamily. • Multidomain proteins are subdivided into their domains using a consensus procedure. All the classification is performed on individual protein domains. Orengo, Michie, Jones, Jones, Swindells, Thornton (1997) SCOP versus CATH 4 DALI Distance Matrix Alignment • DALI generates alignments of structural fragments, and is able to find alignments involving chain reversals and different topologies. • The algorithm uses distance matrices to represent each structure to be compared. • Application of DALI to the entire PDB produces two classifications of structures: FSSP and DDD (3D). Holm and Sander (1993) FSSP and DDD References: Holm and Sander • The families of structurally similar proteins • Protein Structure Comparison by Alignment of Distance (FSSP) is a database of structural alignments Matrices, of proteins in the protein data bank (PDB). It Journal of Molecular Biology 233, pp 123-138, 1993. presents the results of applying DALI to (almost) • The FSSP Database of Structurally Aligned Protein Fold Families, all chains of proteins in the PDB. Nucleic Acids Research 22 (17), pp 3600-3609, 1994. • The DALI domain dictionary (DDD) is a • Mapping the Universe, corresponding classification of recurrent Science 273 (5275), pp 595-602, 1996. domains automatically extracted from known • Touring Protein Fold Space with Dali/FSSP, proteins. Nucleic Acids Research 26 (1), pp 316-319, 1998. 5 Other Algorithms for Domain Decomposition • The Protein Domain Parser (PDP) uses compactness as a chief principle. http://123d.ncifcrf.gov/pdp.html • DomainParser is graph theory based. The underlying principle used is that residue-residue contacts are denser within a domain than between domains. http://compbio.ornl.gov/structure/domainparser/ Parsing sequence into domains Growth of PDB • Look for internal duplication. • Look for low complexity segments. • Look for transmembrane segments. Why is that important? • Functional insights. • Improved database searching. • Fold recognition. • Structure determination. PRODOM: http://protein.touluse.inra.fr/prodom/current/html/home.php PFAM: http://www.sanger.ac.uk/Software/Pfam/ 6.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    6 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us