Biochem 218 Protein Structural Motifs
Computational Molecular Biology
Biochem 218 – BioMedical Informatics 231
http://biochem218.stanford.edu/
Protein Structural Motifs
Doug Brutlag
Professor Emeritus
Biochemistry & Medicine (by courtesy)

Homework 5: Phylogenies
• For this homework assignment take 20 to 30 protein sequences that are at least 30% similar and:
  o 1) make a multiple sequence alignment with them using ClustalW and
  o 2) make two phylogenies, one using the UPGMA method and the other using the Neighbor Joining method
• Describe the resulting alignments and include graphic images of the phylogenies in a message to [email protected]
• Mention whether the trees seem reasonable biologically or taxonomically by comparison with standard taxonomies
• Do the two trees have the same topology?
• Do the trees have the same branch lengths?
• If the two trees do not have the same topology or branch lengths, describe the differences and indicate why you think the two trees differ. Are the differences significant?
• Do the trees show evidence of paralogous evolution? Which nodes are orthologous and which are paralogous bifurcations?
• Do the trees show evidence of either gene conversion or horizontal gene transfer?

Final Projects Due March 12
• Examples of Previous Final Projects
  o http://biochem218.stanford.edu/Projects.html
• Critical review of any area of computational molecular biology.
  o Area from the lectures but in more depth
  o Any other area of bioinformatics or genomics focused on computational approaches
• Proposed improvement or novel approach
• Can be a combined experimental/computational method.
• Could be an implementation or just pseudocode.
• Please do a MeSH literature search for Reviews on your topic. Some useful MeSH terms include:
  o Algorithms
  o Statistics
  o Molecular Sequence Data
  o Molecular Structure etc.
• Please send a proposed final project topic to [email protected] by next Friday

Protein Structure Computational Goals
• Compare all known structures to each other
• Compute distances between protein structures
• Classify and organize all structures in a biologically meaningful way
• Discover conserved substructural domains
• Discover conserved substructural motifs
• Find common folding patterns and structural/functional motifs
• Discover relationships between structure and function
• Study interactions between proteins and other proteins, ligands and DNA (Protein Docking)
• Use known structures and folds to infer structure from sequence (Protein Threading)
• Use known structural motifs to infer function from structure
• Many more…

Structural Classification of Proteins (SCOP)
http://scop.berkeley.edu/
• Class
  o Similar secondary structure content
  o All α, all β, alternating α/β etc.
• Fold (Architecture)
  o Major structural similarity
  o SSEs in similar arrangement
• Superfamily (Topology)
  o Probable common ancestry
  o HMM family membership
• Family
  o Clear evolutionary relationship
  o Pairwise sequence similarity > 25%

Classes of Protein Structures
• Mainly α
• Mainly β
• α/β alternating
  o Parallel β sheets, β-α-β units
• α+β
  o Anti-parallel β sheets, segregated α and β regions
  o Helices mostly on one side of sheet
• Others
  o Multi-domain, membrane and cell surface, small proteins, peptides and fragments, designed proteins

Folds / Architectures
• Mainly α
  o Bundle
  o Non-Bundle
• Mainly β
  o Single sheet
  o Roll
  o Barrel
  o Clam
  o Sandwich
  o Prism
  o 4/6/7/8 Propeller
  o Solenoid
• α/β and α+β
  o Closed
    . Barrel
    . Roll, ...
  o Open
    . Sandwich
    . Clam, ...

The TIM Barrel Fold

A Conceptual Problem ... Fold versus Topology

Another example: Globin vs. Colicin

PDB Protein Database
http://www.rcsb.org/pdb/
• Protein Data Bank
  o Multiple Structure Viewers
  o Sequence & Structure Comparison Tools
  o Derived Data
    . SCOP
    . CATH
    . Pfam
    . GO Terms
  o Education on Protein Structure
  o Download Structures and Entire Database

PDB Advanced Search for UniProt Entry
http://www.rcsb.org/pdb/

PDB Search Results
http://www.rcsb.org/pdb/

PDB E. coli Hu Entry
http://www.rcsb.org/pdb/explore/explore.do?structureId=2O97

PDB SimpleViewer
http://www.rcsb.org/pdb/

PDB Protein Workshop View
http://www.rcsb.org/pdb/

PDB Derived Data
http://www.rcsb.org/pdb/

Molecule of the Month: Enhanceosome
http://www.rcsb.org/pdb/static.do?p=education_discussion/molecule_of_the_month/current_month.html

NCBI Structure Database
http://www.ncbi.nlm.nih.gov/Structure/
• Macromolecular Structures
• Related Structures
• View Aligned Structures & Sequences
• Cn3D: Downloadable Structure & Sequence Viewer
• CDD: Conserved Domain Database
  o CD-Search: Protein Sequence Queries
  o CD-TREE: Protein Classification Downloadable Application
  o CDART: Conserved Domain Architecture Tool
• PubChem: Small Molecules and Biological Activity
• Biological Systems: BioCyc, KEGG and Reactome Pathways
• MMDB: Molecular Modeling Database
• CBLAST: BLAST sequence against PDB and Related Structure Database
• IBIS: Inferred Biomolecular Interaction Server
• VAST Search: Structure Alignment Tool

NCBI Cn3D Viewer
http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml

PyMOL PDB Structure Viewer
http://www.pymol.org/

Databases of Protein Folds
• SCOP (http://scop.berkeley.edu/)
  o Structural Classification of Proteins
  o Class-Fold-Superfamily-Family
  o Manual assembly by inspection
• Superfamily (http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/)
  o HMM models for each SCOP fold
  o Fold assignments to all genome ORFs
  o Assessment of specificity/sensitivity of structure prediction
  o Search by sequence, genome and keywords
• CATH (http://www.biochem.ucl.ac.uk/bsm/cath/)
  o Class - Architecture - Topology - Homologous Superfamily
  o Manual classification at Architecture level
  o Automated topology classification using SSAP (Orengo & Taylor)
• FSSP (http://www2.embl-ebi.ac.uk/dali/fssp/)
  o Fully automated using the DALI algorithm (Holm & Sander)
  o No internal node annotations
  o Structural similarity search using DALI

SCOP Database of Protein Folds
http://scop.berkeley.edu/

SCOP Hierarchy
http://scop.berkeley.edu/data/scop.b.html

SCOP Alpha and Beta Proteins
http://scop.berkeley.edu/data/scop.b.d.html

SCOP TIM Barrels
http://scop.berkeley.edu/data/scop.b.d.b.html

SCOP Thiamin Phosphate Synthase
http://scop.berkeley.edu/data/scop.b.d.b.d.A.html

SCOP Thiamin Phosphate Synthase Entry
http://scop.berkeley.edu/

SuperFamily HMM Fold Library
http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/

SuperFamily Major Features
http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/

Genome Assignments by Superfamily
http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/

CATH Protein Structure Classification
http://www.biochem.ucl.ac.uk/bsm/cath/

CATH Protein Structure Hierarchy
http://www.biochem.ucl.ac.uk/bsm/cath/

CATH Protein Class Level
http://www.biochem.ucl.ac.uk/bsm/cath/

CATH Orthogonal Bundle
http://www.biochem.ucl.ac.uk/bsm/cath/

CATH Protein Summary
http://www.biochem.ucl.ac.uk/bsm/cath/

FSSP Database
http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+LibInfo+-lib+FSSP

Dali Server
http://www.ebi.ac.uk/dali/

DALI Database (Liisa Holm)
http://ekhidna.biocenter.helsinki.fi/dali/start

Protein Fold Prediction: Swiss Model
http://swissmodel.expasy.org/
• Amos Bairoch, Swiss Institute of Bioinformatics (SIB)
• Threading and Template Discovery
• Workspace for saving Template Results
• Domain Annotation
• Structure Assessment
• Template Library
• Structures & Models
• Documentation and Tutorials

Automatic Protein Fold Prediction
http://swissmodel.expasy.org/

Automatic Protein Fold Prediction Results
http://swissmodel.expasy.org/
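
Worked sketch: a distance between two structures

One of the computational goals listed earlier is to compute distances between protein structures, and FSSP is built with DALI, which compares intramolecular distance matrices. As a minimal illustration only, and not DALI's actual scoring function, the Python sketch below computes a distance-matrix RMSD (dRMSD) between two sets of Cα coordinates. The function names `pairwise_distances` and `drmsd` and the toy coordinates are assumptions made for this example. The sketch also assumes the two structures have the same number of residues and that residue i in one corresponds to residue i in the other.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Return the intramolecular distances for every residue pair i < j."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def drmsd(coords_a, coords_b):
    """Distance-matrix RMSD between two equivalenced coordinate lists.

    Because only internal distances are compared, the value does not
    change when either structure is rotated or translated.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of residues")
    if len(coords_a) < 2:
        raise ValueError("at least two residues are required")
    dist_a = pairwise_distances(coords_a)
    dist_b = pairwise_distances(coords_b)
    squared = sum((x - y) ** 2 for x, y in zip(dist_a, dist_b))
    return math.sqrt(squared / len(dist_a))


if __name__ == "__main__":
    # Hypothetical toy coordinates (in Angstroms), not taken from any PDB entry.
    square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
    # Same shape, rotated 90 degrees about z and translated.
    moved = [(-y + 10.0, x - 5.0, z + 2.0) for x, y, z in square]
    # Same shape scaled by two, so every internal distance doubles.
    stretched = [(2 * x, 2 * y, 2 * z) for x, y, z in square]
    print("rigid-body copy:", drmsd(square, moved))
    print("stretched copy:", drmsd(square, stretched))
```

The rigid-body copy scores essentially zero, while the stretched copy scores well above zero. Note that a measure based only on internal distances cannot tell a structure from its mirror image, and that real comparison tools must also find the residue equivalences that this sketch takes as given.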