Data Resources and Tools

Introductory Bioinformatic Techniques 2012
Wits Bioinformatics
Shaun Aron
6/22/2012
Introductory Bioinformatics 2012 - Shaun Aron

Sequence Structure Function

A variety of protein resources online
• Several websites/portals dedicated to providing a single interface to multiple resources
• Important to differentiate between databases, websites and portals

• Evolutionary relationships between proteins
• Detection of local similarities between proteins to detect functional domains
• Functional predictions
• Protein structural predictions
• Protein-protein interactions
• Protein-nucleotide interactions
• Protein engineering and design

Domain identification
• Multiple domains in eukaryotic proteins
• Domain prediction
  ▪ Sequence based methods
  ▪ Structure based methods

Protein classification
• Families
• Sequence based
• Structure based: SCOP, CATH

Experimental determination of structures
• X-ray crystallography (requires crystallisation of protein)
• NMR (for smaller proteins)
• EM (for large complex structures)

Structural comparisons

Protein folding
• Folding simulations

Structure prediction
• Secondary structure
• Tertiary structure
• Comparative modeling
• De novo predictions

Sequence and information databases
• NCBI Entrez Protein Database – contains protein sequences from GenBank and RefSeq, as well as records from SwissProt, PIR, PRF and PDB
• UniProtKB – the “Protein knowledgebase”, a comprehensive set of protein sequences. Functional information on proteins, with accurate, consistent and rich annotation: the amino acid sequence, protein name or description, taxonomic data and citation information. Divided into two parts: Swiss-Prot and TrEMBL
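The two UniProtKB sections can also be told apart programmatically. In UniProtKB FASTA files, each header starts with `sp|` for reviewed Swiss-Prot entries or `tr|` for unreviewed TrEMBL entries. The header then gives the accession, the entry name, the protein name and `KEY=value` fields such as `OS=` (organism) and `OX=` (taxonomy identifier). Below is a minimal sketch that assumes exactly that header layout. `parse_uniprot_header` and `UniProtHeader` are hypothetical names, and the TrEMBL header in the usage lines is made up.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed header layout:
# >db|Accession|EntryName ProteinName OS=... OX=... [GN=...] PE=... SV=...
# db is "sp" (Swiss-Prot, reviewed) or "tr" (TrEMBL, unreviewed).
_HEADER = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s+(.*)$")
_KEYS = r"(?:OS|OX|GN|PE|SV)"
# Each KEY=value runs until the next " KEY=" token or the end of the line.
_FIELD = re.compile(rf"\b({_KEYS})=(.*?)(?=\s+{_KEYS}=|$)")


@dataclass
class UniProtHeader:
    section: str              # "Swiss-Prot" or "TrEMBL"
    accession: str
    entry_name: str
    protein_name: str
    organism: Optional[str]   # OS= field
    taxon_id: Optional[int]   # OX= field
    gene: Optional[str]       # GN= field, absent for some entries

    @property
    def reviewed(self) -> bool:
        return self.section == "Swiss-Prot"


def parse_uniprot_header(line: str) -> UniProtHeader:
    match = _HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"not a UniProtKB FASTA header: {line!r}")
    db, accession, entry_name, rest = match.groups()
    # Everything before the first KEY= token is the protein name.
    protein_name = re.split(rf"\s+{_KEYS}=", rest, maxsplit=1)[0]
    fields = dict(_FIELD.findall(rest))
    ox = fields.get("OX")
    return UniProtHeader(
        section="Swiss-Prot" if db == "sp" else "TrEMBL",
        accession=accession,
        entry_name=entry_name,
        protein_name=protein_name,
        organism=fields.get("OS"),
        taxon_id=int(ox) if ox and ox.isdigit() else None,
        gene=fields.get("GN"),
    )


# Usage with an illustrative Swiss-Prot header and a hypothetical TrEMBL one.
h = parse_uniprot_header(
    ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"
)
print(h.section, h.accession, h.gene, h.reviewed)  # Swiss-Prot P69905 HBA1 True
t = parse_uniprot_header(
    ">tr|Q0XXX0|Q0XXX0_9BACT Uncharacterized protein OS=Example bacterium OX=12345 PE=4 SV=1"
)
print(t.section, t.gene)  # TrEMBL None
```

Filtering on `reviewed` is one simple way to restrict an analysis to the high-quality, manually annotated subset.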
• UniProt Swiss-Prot – the manually annotated, reviewed protein sequences in UniProtKB. High quality.
• UniProt TrEMBL – the automatically annotated, unreviewed set of proteins (translated EMBL-Bank). Varying quality.

PROSITE
• Contains documentation entries describing protein domains, families and functional sites, together with the associated patterns and profiles for identifying them (single motif method)

Pfam
• Collection of protein multiple sequence alignments and profile hidden Markov models (HMMs)
• Uses libraries of HMMs to define domains in protein sequences (full domain method)

SMART
• Simple Modular Architecture Research Tool – used for the identification and annotation of protein domains and domain architecture
• Makes use of hand-curated models for the prediction of protein domains
• Full domain method

BLOCKS
• Ungapped multiple alignments corresponding to the most conserved regions of proteins
• No longer updated

PANTHER
• Predicts protein function based on phylogenetic analyses, by comparison to proteins of known function

Single motif methods
• Regular expressions (PROSITE)
Full domain alignment methods
• Profiles (Profile Library)
• HMMs (Pfam, SMART, etc.)
Multiple motif methods
• Identity matrices (PRINTS)
(Slide reproduced from a presentation by Alex Mitchell, University of Manchester)

PRINTS
• Houses a collection of protein family fingerprints
• A fingerprint is a collection of motifs (multiple motif method)
• Can be used to predict functional families in uncharacterised sequences
• Hierarchical classification of protein superfamilies
• Underpins the BLOCKS database
• BLAST, fingerprint and text search
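PROSITE patterns, the regular-expression (single motif) method listed above, map closely onto ordinary regular expressions. In a pattern, `x` is any residue, `[..]` lists allowed residues, `{..}` lists forbidden ones, `(n)` or `(n,m)` gives a repeat count, and `<` / `>` anchor the match to the N- or C-terminus. Below is a minimal Python sketch that handles only this common subset of the syntax. `prosite_to_regex` and `find_sites` are hypothetical helpers and are not part of ScanProsite or any PROSITE tool. The example pattern is the N-glycosylation site `N-{P}-[ST]-{P}` (PROSITE PS00001), and the test sequence is made up.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern into a Python regular expression.

    Handles '-' separators, 'x' (any residue), [..] (allowed residues),
    {..} (forbidden residues), (n) / (n,m) repetition, and '<' / '>'
    terminal anchors. Rarer forms such as '>' inside brackets are not handled.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        start = "^" if element.startswith("<") else ""
        end = "$" if element.endswith(">") else ""
        element = element.lstrip("<").rstrip(">")
        # Split an element such as "x(2,4)" into its core and repeat bounds.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{") and core.endswith("}"):
            regex = f"[^{core[1:-1]}]"  # forbidden residues -> negated class
        else:
            regex = core  # single residue or [..] class, same meaning in regex
        if low:
            regex += f"{{{low},{high}}}" if high else f"{{{low}}}"
        parts.append(start + regex + end)
    return "".join(parts)


def find_sites(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based position, matched text) for every, possibly overlapping, hit."""
    regex = prosite_to_regex(pattern)
    # A zero-width lookahead lets successive matches overlap.
    return [(m.start() + 1, m.group(1)) for m in re.finditer(f"(?=({regex}))", sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))          # N[^P][ST][^P]
print(find_sites("MKNGSAVNPTL", "N-{P}-[ST]-{P}."))  # [(3, 'NGSA')]
```

The lookahead is used because motif sites can overlap, and a plain `finditer` would skip the second of two overlapping hits. The sketch does not cover PROSITE's profiles, which score whole domains rather than matching a single motif.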
PDB
• Protein Data Bank – the single worldwide archive of structural data of biological macromolecules
• Experimentally validated

SCOP
• Structural Classification of Proteins
• All alpha proteins
• All beta proteins
• Alpha and beta proteins (a/b)
  ▪ Mainly parallel beta sheets (beta-alpha-beta units)
• Alpha and beta proteins (a+b)
  ▪ Mainly antiparallel beta sheets (segregated alpha and beta regions)
• Multi-domain proteins (alpha and beta)
  ▪ Folds consisting of two or more domains belonging to different classes
• Membrane and cell surface proteins and peptides
• Small proteins
• Coiled coil proteins
• Designed proteins

SCOP classification of 1dlw
1. Root: scop
2. Class: All alpha proteins
3. Fold: Globin-like
4. Superfamily: Globin-like
5. Family: Truncated hemoglobin
6. Protein: Protozoan/bacterial hemoglobin
7. Species: Ciliate (Paramecium caudatum)

InterPro
• Integrated documentation resource of protein families, domains and functional sites
• Collection of data from PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs and PIR
• Integrated into one single resource for protein information

PIR
• Protein Information Resource
• Protein ontology
• ProClass: reports for UniProtKB
• ProLink: literature, text mining

PDB
• Portal to the PDB database
• Tools for searching the PDB and related data

EBI
• European Bioinformatics Institute
• Tools and databases for primary sequence search, structural databases and UniProt

Expasy (Swiss Institute of Bioinformatics)
• UniProt, PROSITE, homology modelling, docking and many other tools covering protein sequences and identification, mass spectrometry and 2-DE data, protein characterisation and function, families, patterns and profiles, post-translational modification, protein structure, protein-protein interaction, similarity search/alignment, drug design and molecular modelling

[Figure: selected sequences used to build the PROSITE profile for the Zn(2)-C6 fungal-type DNA-binding domain, with a sequence logo indicating the level of conservation of each residue in the alignment]
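A sequence logo such as the one mentioned above scales each alignment column by how conserved it is. Below is a minimal sketch of one common way to measure that, assuming the information-content definition R = log2(20) - H. Here H = -Σ p·log2(p) is the Shannon entropy of the residues observed in the column. Gaps are ignored and no small-sample correction is applied, and logo tools differ in both of these details. `column_information` is a hypothetical helper. The four-sequence alignment is a made-up toy, not the Zn(2)-C6 sequences from the figure.

```python
import math
from collections import Counter

PROTEIN_ALPHABET_SIZE = 20  # standard amino acids


def column_information(alignment: list[str]) -> list[float]:
    """Information content (bits) of each column of a protein alignment.

    Uses R = log2(20) - H, where H is the Shannon entropy of the residues
    observed in the column. Gap characters ('-') are ignored and no
    small-sample correction is applied.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_entropy = math.log2(PROTEIN_ALPHABET_SIZE)
    bits = []
    for i in range(width):
        residues = [seq[i].upper() for seq in alignment if seq[i] != "-"]
        if not residues:
            bits.append(0.0)  # an all-gap column is treated as uninformative
            continue
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in Counter(residues).values())
        bits.append(max_entropy - entropy)
    return bits


# Toy alignment: columns 1 and 3 are fully conserved, column 2 is split A/G.
for i, r in enumerate(column_information(["CAC", "CGC", "CAC", "CGC"]), start=1):
    print(i, round(r, 2))
# 1 4.32
# 2 3.32
# 3 4.32
```

A fully conserved column reaches the maximum of log2(20) ≈ 4.32 bits. Each extra equally frequent residue lowers the height, which is why conserved cysteines in a zinc-binding domain would stand out in a logo.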
