6/22/2012

Introductory Bioinformatic Techniques 2012 Wits Bioinformatics Shaun Aron

Sequence Structure Function

Introductory Bioinformatics 2012 - Shaun Aron

- A variety of protein resources online
- Several websites/portals dedicated to providing a single interface to multiple resources
- Important to differentiate between databases, websites and portals

- Evolutionary relationships between proteins
- Detection of local similarities between proteins to detect functional domains
- Functional predictions
- Protein structural predictions
- Protein-protein interactions
- Protein-nucleotide interactions
- Protein engineering and design

 Domain identification  Protein classification  Multiple domains in eukaryotic proteins . Families  Sequence based methods  Domain Prediction  Structure based methods . Sequence based . SCOP . Structure based . CATH

- Experimental determination of structures
  - X-ray crystallography (requires crystallisation of the protein)
  - NMR (for smaller proteins)
  - EM, electron microscopy (for large complex structures)
- Structural comparisons
- Protein folding
  - Folding simulations
- Structure prediction
  - Secondary structure
  - Tertiary structure
  - Comparative modeling
  - De novo predictions

- Sequence and information databases
  - NCBI Entrez Protein Database: contains protein sequences from GenBank and RefSeq, as well as records from SwissProt, PIR, PRF and PDB
  - UniProtKB, the "Protein Knowledgebase": a comprehensive set of protein sequences with accurate, consistent and rich functional annotation, including the amino acid sequence, protein name or description, taxonomic data and citation information. Divided into two parts: Swiss-Prot and TrEMBL
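In FASTA downloads from UniProt, the section an entry belongs to is visible in the first field of the header line: `sp` for Swiss-Prot and `tr` for TrEMBL. The Python sketch below assumes headers of the form `>db|Accession|EntryName description`; `uniprot_section` is a hypothetical helper, not part of any UniProt tool.

```python
def uniprot_section(header: str) -> tuple[str, str]:
    """Return (section, accession) for a UniProtKB FASTA header line.

    Assumes the layout '>db|Accession|EntryName description', where db is
    'sp' (Swiss-Prot, reviewed) or 'tr' (TrEMBL, unreviewed).
    """
    db, accession, _rest = header.lstrip(">").split("|", 2)
    sections = {"sp": "Swiss-Prot (reviewed)", "tr": "TrEMBL (unreviewed)"}
    if db not in sections:
        raise ValueError(f"not a UniProtKB header: {header!r}")
    return sections[db], accession


# Header of the reviewed human haemoglobin alpha entry
print(uniprot_section(">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens"))
# → ('Swiss-Prot (reviewed)', 'P69905')
```

The same check can be used to filter a download down to reviewed entries only.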

  - UniProt Swiss-Prot: the manually annotated, reviewed protein sequences in UniProtKB. High quality.
  - UniProt TrEMBL: the automatically annotated, unreviewed set of proteins (translated EMBL-Bank entries). Varying quality.

- PROSITE
  - Contains documentation entries describing protein domains, families and functional sites, together with the associated patterns and profiles for identifying them (single motif method)
- Pfam
  - Collection of protein multiple sequence alignments and profile hidden Markov models (HMMs)
  - Uses libraries of HMMs to define domains in protein sequences (full domain method)
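PROSITE patterns use a compact syntax: elements are separated by `-`, `x` stands for any residue, `[...]` lists allowed residues, `{...}` lists forbidden ones, `x(2,4)` gives a repeat range, and `<` / `>` anchor the N- and C-terminus. A minimal sketch that translates the common cases into a Python regular expression; `prosite_to_regex` is a hypothetical helper, and the example pattern and sequence are illustrative only.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern into a Python regular expression.

    Covers '-' separators, 'x' (any residue), [ABC] (any of), {ABC}
    (none of), (n) / (n,m) repeats and the '<' / '>' terminal anchors.
    Edge cases such as '>' inside square brackets are NOT handled.
    """
    regex = ""
    for element in pattern.rstrip(".").split("-"):
        if element.startswith("<"):          # N-terminal anchor
            regex += "^"
            element = element[1:]
        anchor_end = element.endswith(">")   # C-terminal anchor
        if anchor_end:
            element = element[:-1]
        # Split off an optional repeat count such as (2) or (2,4)
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        regex += core + ("{" + repeat + "}" if repeat else "")
        if anchor_end:
            regex += "$"
    return regex


regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
print(regex)  # → C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
```

The resulting string can be passed to `re.search` to scan a sequence for the motif.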

 SMART  BLOCKS . Simple Modular Architect Research Tool - used for . Ungapped multiple alignments corresponding to the identification and annotation of protein the most conserved regions of proteins domains and domain architecture. . No longer updated . Makes use of hand curated models for the  Panther prediction of protein domains . Predicts protein function based on phylogenetic . Full domain method analyses by comparison to proteins of known functions

Single motif  PRINTS methods Regular expressions (PROSITE) . Houses a collection of fingerprints Full domain alignment methods . Fingerprint is a collection of motifs (multiple motif Profiles method) (Profile Library) . Can be used to predict functional families in HMMs uncharacterised sequences (Pfam, SMART, etc) . Hierarchical classification of protein superfamilies Identity matrices . Underpins the BLOCKS database Multiple motif (PRINTS) methods . BLAST, fingerprint and text search Slide duplicated from presentation by Alex Mitchell University of Manchester Introductory Bioinformatics 2012 - Shaun Aron Introductory Bioinformatics 2012 - Shaun Aron

- PDB
  - Protein Data Bank: single worldwide archive of structural data of biological macromolecules
- Experimentally validated structures
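Coordinate files in the legacy PDB format are fixed-column text, so ATOM and HETATM records can be read by slicing each line at the column positions given in the wwPDB format specification. A minimal sketch; it ignores the newer PDBx/mmCIF format, alternate locations and hybrid-36 serial numbers, and the `Atom` class is a hypothetical container.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines):
    """Yield Atom objects from ATOM/HETATM records; other records are skipped.

    Slices follow the fixed columns of the PDB format (1-based columns noted).
    """
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        yield Atom(
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        )


record = "ATOM      1  N   MET A   1      38.198  19.582  28.217  1.00 30.00           N"
atoms = list(parse_atoms([record]))
print(atoms[0].res_name, atoms[0].chain, atoms[0].x)  # → MET A 38.198
```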

- SCOP
  - Structural Classification of Proteins
  - Classes:
    - All alpha proteins
    - All beta proteins
    - Alpha and beta proteins (a/b)
      - Mainly parallel beta sheets (beta-alpha-beta units)
    - Alpha and beta proteins (a+b)
      - Mainly antiparallel beta sheets (segregated alpha and beta regions)
    - Multi-domain proteins (alpha and beta)
      - Folds consisting of two or more domains belonging to different classes
    - Membrane and cell surface proteins and peptides
    - Small proteins
    - Coiled coil proteins
    - Designed proteins

Example, PDB entry 1dlw:
1. Root: scop
2. Class: All alpha proteins
3. Fold: Globin-like
4. Superfamily: Globin-like
5. Family: Truncated hemoglobin
6. Protein: Protozoan/bacterial hemoglobin
7. Species: Ciliate (Paramecium caudatum)
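Because SCOP is a strict hierarchy, how closely two domains are related can be read off the deepest level at which their classifications still agree (in SCOP, a shared family implies a clear evolutionary relationship, a shared superfamily a probable common origin, and a shared fold only major structural similarity). The sketch uses the 1dlw lineage listed above; the second lineage is hypothetical.

```python
SCOP_LEVELS = ["Root", "Class", "Fold", "Superfamily", "Family", "Protein", "Species"]

# Lineage of PDB entry 1dlw, as listed on the slide
LINEAGE_1DLW = [
    "scop",
    "All alpha proteins",
    "Globin-like",
    "Globin-like",
    "Truncated hemoglobin",
    "Protozoan/bacterial hemoglobin",
    "Ciliate (Paramecium caudatum)",
]


def deepest_shared_level(a: list[str], b: list[str]) -> str:
    """Return the deepest SCOP level at which two lineages still agree.

    Comparison stops at the first mismatch, so agreement at a level
    always implies agreement at every level above it.
    """
    shared = "none"
    for level, (x, y) in zip(SCOP_LEVELS, zip(a, b)):
        if x != y:
            break
        shared = level
    return shared


# Hypothetical domain in the same superfamily but a different family
other = LINEAGE_1DLW[:4] + ["Other globin family (hypothetical)", "Protein X", "Species Y"]
print(deepest_shared_level(LINEAGE_1DLW, other))  # → Superfamily
```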

 InterPro  PIR . Integrated documentation resource of protein . Protein Information Resources families, domains and functional sites . Protein ontology . Collection of data from PROSITE, Pfam, PRINTS, . ProClass: Reports for UniProtKB ProDom, SMART, TIGRFAMs, PIR . ProLink: Literature, Text Mining . Integrated into one single resources for protein information

 PDB  Expasy (Swiss Institute of Bioinformatics) . Portal to PDB database . UniProt, PROSITE, homology modelling, docking, . Tools for searching PDB and related data many many other tools doing protein sequences  EBI and identication, mass spectrometry and 2-DE data, protein characterisation and function . European Bioinformatics Institute families, patterns and profiles, post-translational . Tools and databases for primary sequence search, modication, protein structure, protein-protein structural databases and Uniprot interaction, similarity search/alignment, drug design, molecular modelling

- Selected sequences used to build the PROSITE profile for the Zn(2)-C6 fungal-type DNA-binding domain
- The sequence logo indicates the level of conservation of each residue position in the alignment
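Letter heights in a sequence logo are conventionally derived from Shannon entropy: for proteins, a column's information content is log2(20) minus its entropy, and each residue's letter is scaled by its frequency in the column. The sketch below omits the small-sample correction that logo tools usually apply, and the toy alignment is hypothetical rather than the Zn(2)-C6 sequences.

```python
import math
from collections import Counter


def logo_heights(alignment: list[str], alphabet_size: int = 20) -> list[dict[str, float]]:
    """Per-column letter heights for a sequence logo (no small-sample correction).

    Column information R = log2(alphabet_size) - H, where H is the Shannon
    entropy of the residue frequencies; each letter's height is its
    frequency times R. Gap characters ('-') are ignored.
    """
    heights = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        freqs = {res: n / total for res, n in counts.items()}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        info = math.log2(alphabet_size) - entropy
        heights.append({res: f * info for res, f in freqs.items()})
    return heights


# Hypothetical alignment: column 0 fully conserved, column 1 fully variable
for i, column in enumerate(logo_heights(["CAC", "CGC", "CTC", "CCC"])):
    print(i, {res: round(h, 2) for res, h in column.items()})
```

A fully conserved column, such as the cysteines in this toy example, reaches the maximum height of log2(20) bits.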
