Protein Sequence Databases …And Your Mass Spectrometry-Based Proteomics Experiment

© 2015 Regents of the University of Minnesota. All rights reserved. Center for Mass Spectrometry and Proteomics | Phone | (612)625-2280 | (612)625-2279

Outline
• Protein Database (DB)
  • Origin
  • Sources
  • Format
  • Size
  • Composition
• Selecting a database for mass spec search
• Effect of DB on mass spec search results
• Post MS analysis: protein annotation, ontology, alignment

Terminology
• FASTA
• Database repository
• NCBI database
• UniProtKB
• Swiss-Prot
• RefSeq (reference sequence)
• Homology
• Contaminants DB
• Ontology

FASTA Protein Sequence: Name and Origin
• FASTA (pronounced ‘fast-aye’)
• ORIGIN: named for a sequence similarity alignment tool (1985)
• REF: DJ Lipman, WR Pearson (1985) PMID: 2983426
  "The algorithm has been implemented in a computer program designed to search protein databases very rapidly. For example, comparison of a 200-amino-acid sequence to the 500,000 residues in the National Biomedical Research Foundation library would take less than 2 minutes on a minicomputer, and less than 10 minutes on a microcomputer (IBM PC)."
• Stands for “fast all” – the file format worked with ‘all’ alphabets (amino acid and nucleotide)

FASTA Protein Sequence Format
• Structure: TEXT file
  • Line 1: description line with sequence identifier (begins with ">")
  • Line 2: protein sequence in single-letter amino acid code, 80 characters wide
• Allowed characters:
  • AMINO ACID ONE-LETTER CODE
  • X
  • *
  • -
  • Custom one-letter amino acid codes
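The two-part record layout described above (a description line followed by sequence lines) can be sketched in Python. This is a minimal illustration, not the parser used by any particular search program. The function names `read_fasta` and `unexpected_characters`, the demo entries, and the choice of "expected" characters (the 20 standard amino acid letters plus X, * and -) are assumptions made for this example.

```python
import io
from typing import Iterator, Set, TextIO, Tuple

# Assumption for this sketch: 20 standard amino acid letters plus X, * and -.
# Custom one-letter codes are reported so a person can review them.
EXPECTED_CHARACTERS = set("ACDEFGHIKLMNPQRSTVWY" + "X*-")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted text."""
    description = None
    chunks = []
    for raw_line in handle:
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                yield description, "".join(chunks)
            description, chunks = line[1:], []
        elif description is not None:
            # Sequence lines (often wrapped at 80 characters) are joined.
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)


def unexpected_characters(sequence: str) -> Set[str]:
    """Return the characters in a sequence that are outside EXPECTED_CHARACTERS."""
    return set(sequence.upper()) - EXPECTED_CHARACTERS


# Hypothetical two-entry database; the identifiers are made up.
demo_text = (
    ">demo_1 hypothetical entry\n"
    "SELEEQLTPV\n"
    "AEETR\n"
    ">demo_2 another hypothetical entry\n"
    "EQVAEVR*\n"
)
records = list(read_fasta(io.StringIO(demo_text)))
for description, sequence in records:
    print(description, len(sequence), sorted(unexpected_characters(sequence)))
```

Flagging unexpected characters rather than rejecting them leaves room for the custom one-letter codes that the format allows.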

FASTA Format Header Line: Sequence Identifiers
• Line 1: description line with sequence identifier

FASTA Protein Sequence from NCBI: example
[Figure: NCBI FASTA entry with Line 1 (description line) and Line 2 (sequence) labeled]
NOTE: In Sept 2016, gi numbers were replaced with accession.version identifiers

Selecting a Protein Sequence Database
• Public repositories, such as
  • NCBI
  • UniProtKB
    • Swiss-Prot: manually annotated and reviewed
    • TrEMBL: automatically annotated and not reviewed
• Custom (from customer)
  • NOTE: format is important!
• Represent species (1 or more) from which protein sample originated
  • Example: Mouse protein expressed in E. coli
• Ideal size range ~ 2000 to < 1 million entries

Selecting a Protein Database: UniProtKB repository

Selecting a Protein Database: NCBI RefSeq repository

Choose Your Taxonomy or Taxonomies
NOTES:
• If a recombinant protein is expressed in a host cell, include host proteins & expressed protein(s)
• If the protein database for your species has <2000 proteins, merge with another protein database (yeast) for statistical reasons
• Protein sequence headers must be parsed correctly
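The notes above on merging databases and parsing headers can be sketched as follows. This is a hedged illustration only: the helper names (`accession`, `merge_databases`, `is_large_enough`) are hypothetical, the accessions are placeholders, and the three header layouts handled here (UniProtKB `sp|…|` or `tr|…|`, legacy NCBI `gi|…|`, and plain accession.version) are assumed to cover the input. Real search programs apply their own parsing rules.

```python
import re
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (description line without ">", sequence)

# Assumed header layouts; the accessions used below are placeholders.
_UNIPROT = re.compile(r"^(?:sp|tr)\|([^|]+)\|")
_LEGACY_GI = re.compile(r"^gi\|\d+\|[a-z]+\|([^|]+)\|")

MIN_ENTRIES = 2000  # lower bound of the size range quoted in the slides


def accession(description: str) -> str:
    """Extract an accession from a description line (best effort)."""
    for pattern in (_UNIPROT, _LEGACY_GI):
        match = pattern.match(description)
        if match:
            return match.group(1)
    parts = description.split()
    return parts[0] if parts else ""


def merge_databases(*databases: Iterable[Record]) -> List[Record]:
    """Concatenate databases, keeping the first record seen for each accession."""
    seen = set()
    merged = []
    for database in databases:
        for description, sequence in database:
            key = accession(description)
            if key in seen:
                continue
            seen.add(key)
            merged.append((description, sequence))
    return merged


def is_large_enough(database: List[Record]) -> bool:
    """True when the database has at least MIN_ENTRIES records."""
    return len(database) >= MIN_ENTRIES


# Hypothetical example: one expressed mouse protein plus two host entries.
expressed = [("sp|P00000|DEMO_MOUSE Demo protein OS=Mus musculus", "SELEEQLTPVAEETR")]
host = [
    ("NP_000000.1 demo host protein [Escherichia coli]", "EQVAEVR"),
    ("gi|12345|ref|NP_000000.1| same entry, legacy header", "EQVAEVR"),
]
combined = merge_databases(expressed, host)
print(len(combined), is_large_enough(combined))
```

The same merge step would apply when appending a contaminants database. A combined database that is still too small is the case where the slides suggest adding another proteome such as yeast.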

Taxonomy specification - UniProtKB (19996)

Taxonomy specification - NCBI

Protein Database repository content for Thirteen-lined Ground Squirrel

Database Source               Number of Proteins
Swiss-Prot* (reviewed)        20
TrEMBL* (unreviewed)          20,076
UniProt Reference Proteome    19,966
NCBI (‘non-redundant’)        30,130
NCBI Reference Sequence       29,842
* From UniProt

Protein Database Characteristics
…related to your mass spectrometry experiment

Splice form variants
Sequence alignments: Protein Cytochrome P450 2D6

Protein Sequence Variants
Natural variants: SNPs (single nucleotide polymorphisms)
https://hive.biochemistry.gwu.edu

In silico trypsin digest, ‘native’ protein

In silico trypsin digest, with VARIANTS (peptide examples 1 and 2 marked)
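To make the effect of a variant on an in silico digest concrete, here is a minimal Python sketch. It assumes a simple trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages) and uses standard monoisotopic residue masses; the function names `trypsin_digest` and `mh_plus` are hypothetical. The peptides are the native and Q -> K forms of SELEEQLTPVAEETR from these slides: the substitution creates a new cleavage site, so one peptide becomes two.

```python
from typing import List

# Standard monoisotopic residue masses (Da).
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the peptide termini
PROTON = 1.007276   # charge carrier for [M+H]+


def trypsin_digest(sequence: str) -> List[str]:
    """Cleave after K or R unless followed by P (assumed rule, no missed cleavages)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def mh_plus(peptide: str) -> float:
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(MONOISOTOPIC[residue] for residue in peptide) + WATER + PROTON


native = "SELEEQLTPVAEETR"
variant = "SELEEKLTPVAEETR"  # Q -> K introduces a tryptic site

for label, sequence in (("native", native), ("variant", variant)):
    for peptide in trypsin_digest(sequence):
        print(f"{label:8s} {peptide:16s} {mh_plus(peptide):.4f}")
```

The masses computed this way agree with the values tabulated under "Effect of Variant on Peptide Mass" to within about 0.001 Da. Small differences in the last digit come from rounding of the residue masses.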

Effect of Variant on Peptide Mass

Peptide example         Peptide Mass*   Peptide Sequence
1 – native              1730.8443       SELEEQLTPVAEETR
1 – variant (Q -> K)    1730.8806       SELEEKLTPVAEETR
1 – variant (Q -> K)    734.3566        SELEEK
1 – variant (Q -> K)    1015.5418       LTPVAEETR
2 – native              830.4366        EQVAEVR
2 – variant (V -> E)    860.4108        EQEAEVR
* Monoisotopic [M + H]+

Proteomics Search Program Meets Protein Sequence Database
• Protein sequence file is downloaded to local computer
• Merge with common lab contaminants (keratins and more) database
  • http://www.thegpm.org/crap/
• Protein database is imported or indexed in the proteomics search program (sequence format is critical)
• REVERSED sequences are generated for False Discovery Rate (FDR) calculations
• Protein sequences are digested with enzymes in silico

Database search > Protein List
• Database search algorithm matches spectrum > peptide > protein
• RESULTS: List of protein identifications with accession numbers
• POST Database search options (outside CMSP):
  1. Protein annotation
  2. Sequence alignment
  3. Obtain related Gene Ontology information

POST Database search options
What you can do with your protein list.

1) Protein Annotation from UniProtKB

2) Sequence alignment with UniProt alignment tool

2) Sequence alignment with UniProt alignment tool: numerous amino acid labeling options
* (asterisk) indicates positions which have a single, fully conserved residue.
: (colon) indicates conservation between groups of strongly similar properties, scoring > 0.5 in the Gonnet PAM 250 matrix.
. (period) indicates conservation between groups of weakly similar properties, scoring ≤ 0.5 in the Gonnet PAM 250 matrix.

2) Sequence alignment with NCBI BLAST

3) Link Gene Ontology information to Proteins
• Define: “The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products across databases.”
• Ontologies/Vocabularies
  • molecular function: molecular activities of gene products
  • cellular component: where gene products are active
  • biological process: pathways and larger processes made up of the activities of multiple gene products
  (http://geneontology.org/page/documentation)

Molecular Function Pie Chart for a List of 96 Protein Identifiers (gi numbers) submitted to PANTHER (http://www.pantherdb.org/)
Protein list from Supplemental data
REF: Thu TM et al (2016) Cell Reports,
