
Mass Fingerprinting and MS/MS Fragment Ion Analysis with MASCOT

Gary Van Domselaar
University of Alberta
Edmonton, AB
[email protected]

Laboratory 2.4

Review: Peptide Mass Fingerprinting

[Figure: peptide mass fingerprinting workflow. Experimental side: complex mixture, 2D gel separation, purified protein, proteolysis, peptide digest, MALDI mass spec (337 nm UV laser, cyano-hydroxycinnamic acid matrix), experimental MS. Theoretical side: protein database, in silico digestion, theoretical MS. Both spectra are plotted as %T against m/z and compared.]

Review: MS/MS Fragment Ion Analysis

[Figure: MS/MS fragment ion analysis workflow. Experimental side: complex protein mixture, proteolysis, peptide digest, HPLC, MS/MS, experimental fragmentation spectrum. Theoretical side: protein database, in silico digestion, in silico fragmentation (example: the peptide PYMVLLSLDR cleaved at each successive bond: P|YMVLLSLDR, PYM|VLLSLDR, PYMV|LLSLDR, and so on), theoretical fragmentation spectrum. Both spectra are plotted as %T against m/z and compared.]

MASCOT

MOWSE

• MOlecular Weight SEarch
• Scoring based on the peptide frequency distribution from the OWL non-redundant database

Pappin DJC, Hojrup P, and Bleasby AJ (1993) Rapid identification of proteins by peptide-mass fingerprinting. Curr. Biol. 3:327-332.

MOWSE

Protein 1
  Sequence:          acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
  Mass (M+H):        4842.05
  Tryptic fragments: acedfhsak, dfqeasdfpk, ivtmeeewendadnfek, qwfe

Protein 2
  Sequence:          acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
  Mass (M+H):        4842.05
  Tryptic fragments: acek, dfhsadfqeasdfpk, ivtmeeewenk, dadnfeqwfe

Protein 3
  Sequence:          MASMGTLAFDEYGRPFLIIKDQDRKSRLMGLEALKSHIMAAKAVANTMRTSLGPNGLDKMMVDKDGDVTVTNDGATILSMMDVDHQIAKLMVELSKSQDDEIGDGTTGVVVLAGALLEEAEQLLDRGIHPIRIAD
  Mass (M+H):        14563.36
  Tryptic fragments: SQDDEIGDGTTGVVVLAGALLEEAEQLLDR, DGDVTVTNDGATILSMMDVDHQIAK, MASMGTLAFDEYGRPFLIIK, TSLGPNGLDK, LMGLEALK, LMVELSK, AVANTMR, SHIMAAK, GIHPIR, MMVDK, DQDR

MOWSE

1. Group proteins into 10 kDa 'bins'.

0-10 kDa
  Protein 1   4954.13    acedfhsakdfqeasdfpkivtmeeewendadnfekqwfel
  Protein 2   5672.48    acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfekqwfei

10-20 kDa
  Protein 3   14563.36   MASMGTLAFDEYGRPFLIIKDQDRKSRLMGLEALKSHIMAAKAVANTMRTSLGPNGLDKMMVDKDGDVTVTNDGATILSMMDVDHQIAKLMVELSKSQDDEIGDGTTGVVVLAGALLEEAEQLLDRGIHPIRIAD
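The grouping in step 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `mw_interval` and the dictionary layout are hypothetical, and the molecular weights are the three values listed on the slide.

```python
from collections import defaultdict

# Molecular weights (Da) of the three example proteins from the slide.
PROTEIN_MW = {"Protein 1": 4954.13, "Protein 2": 5672.48, "Protein 3": 14563.36}


def mw_interval(mw_da, width_da=10_000):
    """Return the (lower, upper) bounds in kDa of the interval containing mw_da.

    Hypothetical helper: the 10 kDa width follows the slide.
    """
    lower = int(mw_da // width_da) * width_da
    return (lower // 1000, (lower + width_da) // 1000)


# Collect protein names per 10 kDa interval.
groups = defaultdict(list)
for name, mw in PROTEIN_MW.items():
    groups[mw_interval(mw)].append(name)

for (low, high), names in sorted(groups.items()):
    print(f"{low}-{high} kDa: {', '.join(names)}")
```

Proteins 1 and 2 land in the 0-10 kDa interval and Protein 3 in the 10-20 kDa interval, matching the grouping above.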

MOWSE

2. For each protein, place fragments into 100 Da bins.

Protein 1 (acedfhsakdfqeasdfpkivtmeeewendadnfekqwfel)
  Mol. Wt.    Fragment
  2098.8909   IVTMEEEWENDADNFEK
  1183.5266   DFQEASDFPK
  1007.4251   ACEDFHSAK
   722.3508   QWFEL

Protein 2 (acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfekqwfei)
  Mol. Wt.    Fragment
  1740.7500   DFHSADFQEASDFPK
  1407.6460   IVTMEEEWENK
  1456.6127   DADNFEQWFEK
   722.3508   QWFEI

Bin         Fragment
2000-2100   IVTMEEEWENDADNFEK
1900-2000
1800-1900
1700-1800   DFHSADFQEASDFPK
1600-1700
1500-1600
1400-1500   IVTMEEEWENK, DADNFEQWFEK
1300-1400
1200-1300
1100-1200   DFQEASDFPK
1000-1100   ACEDFHSAK
900-1000
800-900
700-800     QWFEL, QWFEI
600-700
500-600
400-500
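As a rough sketch of step 2, the following Python digests Protein 1 in silico and assigns each fragment to a 100 Da bin. The assumptions are mine, not the slide's: trypsin is modelled as cleaving after K or R unless the next residue is P, with no missed cleavages; masses are monoisotopic (M+H) values computed from standard residue masses; and the names `tryptic_digest`, `mh_mass` and `bin_100` are hypothetical.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide
PROTON = 1.00728   # singly protonated (M+H) ion


def tryptic_digest(seq):
    """Cleave after K or R unless followed by P (assumed rule, no missed cleavages)."""
    seq = seq.upper()
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def mh_mass(peptide):
    """Monoisotopic (M+H) mass of a peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def bin_100(mass):
    """Return the (lower, upper) bounds of the 100 Da bin containing mass."""
    lower = int(mass // 100) * 100
    return (lower, lower + 100)


protein_1 = "acedfhsakdfqeasdfpkivtmeeewendadnfekqwfel"
for pep in tryptic_digest(protein_1):
    low, high = bin_100(mh_mass(pep))
    print(f"{mh_mass(pep):10.4f}  {pep:20s} {low}-{high}")
```

Under these assumptions the four Protein 1 fragments come out within 0.01 Da of the masses listed on the slide, and QWFEL (722.35) falls in the 700-800 bin.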

MOWSE

The MOWSE frequency distribution plot looks like this:

[Figure: MOWSE frequency distribution plot]

MOWSE

3. Divide the number of fragments in each bin by the total number of fragments in the 10 kDa protein interval.

Bin         Fragment                    Total   Frequency
2000-2100   IVTMEEEWENDADNFEK           1       0.125
1900-2000                               0       0.000
1800-1900                               0       0.000
1700-1800   DFHSADFQEASDFPK             1       0.125
1600-1700                               0       0.000
1500-1600                               0       0.000
1400-1500   IVTMEEEWENK, DADNFEQWFEK    2       0.250
1300-1400                               0       0.000
1200-1300                               0       0.000
1100-1200   DFQEASDFPK                  1       0.125
1000-1100   ACEDFHSAK                   1       0.125
900-1000                                0       0.000
800-900                                 0       0.000
700-800     QWFEL, QWFEI                2       0.250
600-700                                 0       0.000
500-600                                 0       0.000
400-500                                 0       0.000

MOWSE

4. For each 10 kDa interval, normalize to the largest bin value.

Bin         Fragment                    Total   Frequency   Normalized
2000-2100   IVTMEEEWENDADNFEK           1       0.125       0.5
1900-2000                               0       0.000       0
1800-1900                               0       0.000       0
1700-1800   DFHSADFQEASDFPK             1       0.125       0.5
1600-1700                               0       0.000       0
1500-1600                               0       0.000       0
1400-1500   IVTMEEEWENK, DADNFEQWFEK    2       0.250       1
1300-1400                               0       0.000       0
1200-1300                               0       0.000       0
1100-1200   DFQEASDFPK                  1       0.125       0.5
1000-1100   ACEDFHSAK                   1       0.125       0.5
900-1000                                0       0.000       0
800-900                                 0       0.000       0
700-800     QWFEL, QWFEI                2       0.250       1
600-700                                 0       0.000       0
500-600                                 0       0.000       0
400-500                                 0       0.000       0
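Steps 3 and 4 amount to a count, a division and a rescaling. The sketch below reproduces the Frequency and Normalized columns from the eight fragment masses listed for Proteins 1 and 2; the variable names are illustrative only, and each bin is keyed by its lower bound in Da. Binning by mass places the two 722.3508 fragments in the 700-800 bin.

```python
from collections import Counter

# Fragment masses (M+H) as listed on the slide for the 0-10 kDa interval.
fragment_masses = {
    "IVTMEEEWENDADNFEK": 2098.8909,
    "DFQEASDFPK": 1183.5266,
    "ACEDFHSAK": 1007.4251,
    "QWFEL": 722.3508,
    "DFHSADFQEASDFPK": 1740.7500,
    "IVTMEEEWENK": 1407.6460,
    "DADNFEQWFEK": 1456.6127,
    "QWFEI": 722.3508,
}

# Step 2: count fragments per 100 Da bin (key = lower bound of the bin).
counts = Counter(int(mass // 100) * 100 for mass in fragment_masses.values())

# Step 3: divide each bin count by the total number of fragments in the interval.
total = sum(counts.values())
frequency = {low: n / total for low, n in counts.items()}

# Step 4: normalize to the largest bin value in the interval.
largest = max(frequency.values())
normalized = {low: f / largest for low, f in frequency.items()}

for low in sorted(counts, reverse=True):
    print(f"{low}-{low + 100}: total={counts[low]} "
          f"frequency={frequency[low]:.3f} normalized={normalized[low]:g}")
```

Bins that contain no fragment simply do not appear in `counts`; they correspond to the zero rows of the table.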

MOWSE

5. Compare spectrum masses against the fragment mass list for each protein in the database. Retrieve the frequency score for each match and multiply.

Bin         Fragment                    Total   Frequency   Normalized   Matched spectrum mass
2000-2100   IVTMEEEWENDADNFEK           1       0.125       0.5
1900-2000                               0       0.000       0
1800-1900                               0       0.000       0
1700-1800   DFHSADFQEASDFPK             1       0.125       0.5          1740.7500
1600-1700                               0       0.000       0
1500-1600                               0       0.000       0
1400-1500   IVTMEEEWENK, DADNFEQWFEK    2       0.250       1            1456.6127
1300-1400                               0       0.000       0
1200-1300                               0       0.000       0
1100-1200   DFQEASDFPK                  1       0.125       0.5
1000-1100   ACEDFHSAK                   1       0.125       0.5
900-1000                                0       0.000       0
800-900                                 0       0.000       0
700-800     QWFEL, QWFEI                2       0.250       1            722.3508
600-700                                 0       0.000       0
500-600                                 0       0.000       0
400-500                                 0       0.000       0

0.5 x 1 x 1 = 0.5

MOWSE

6. Invert and multiply, and normalize to an 'average' protein of 50 000 Da (50 kDa):

PN = product of distribution frequency scores = 0.5 x 1 x 1 = 0.5
H = 'hit' protein MW = 5672.48

Score = 50 000 / (PN x H) = 50 000 / (0.5 x 5672.48) ≈ 17.63
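The score arithmetic in steps 5 and 6 can be checked with a short Python sketch. The function name `mowse_score` and its parameters are hypothetical; the formula is the one given above, Score = 50 000 / (PN x H), applied to the three matched bins and the Protein 2 molecular weight from the worked example.

```python
import math


def mowse_score(normalized_freqs, hit_mw_da, average_mw_da=50_000.0):
    """Score = average_mw / (PN x H), with PN the product of matched bin scores.

    Hypothetical helper that follows the slide's formula.
    """
    pn = math.prod(normalized_freqs)
    return average_mw_da / (pn * hit_mw_da)


# Matched bins 1700-1800, 1400-1500 and 700-800 score 0.5, 1 and 1.
score = mowse_score([0.5, 1.0, 1.0], hit_mw_da=5672.48)
print(f"{score:.3f}")  # 17.629, i.e. 17.62 truncated or 17.63 rounded
```

Because H sits in the denominator, a larger hit protein needs more (or rarer) matching fragments to reach the same score, which is how protein size is compensated for.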

MOWSE

• Takes into account the relative abundance of peptides in the database when calculating scores.
• Protein size is compensated for.
• The model consists of numerous bins spaced 100 Da apart (roughly the average amino acid mass).
• Does not provide a measure of confidence for the prediction.

• MOWSE: http://www.hgmp.mrc.ac.uk/Bioinformatics/Webapp/mowse/
• MS-Fit: http://prospector.ucsf.edu/ucsfhtml3.2/msfit.htm

MASCOT

• Probability-based MOWSE
• For each protein in the sequence database, Mascot calculates the approximate probability that the observed match between the experimental data and the protein sequence is a random event.
• Probability model details not published.

Perkins DN, Pappin DJC, Creasy DM, and Cottrell JS (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20:3551-3567.

Mascot/Mowse Scoring

• The Mascot score is given as S = -10*log10(P), where P is the probability that the observed match is a random event.

Mascot Scoring

• The significance of a result depends on the size of the database being searched. Mascot shades in green the insignificant hits using a P = 0.05 cutoff.

In this example, scores less than 74 are insignificant.

A Mascot score of 120 corresponds to P = 1 x 10^-12.
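The relation between score and probability, S = -10*log10(P), is easy to invert. The sketch below converts in both directions; the function names are hypothetical, and it models only this formula, not Mascot's unpublished probability model.

```python
import math


def mascot_score(p):
    """S = -10 * log10(P), with P the probability that the match is random."""
    return -10.0 * math.log10(p)


def probability_from_score(score):
    """Invert the relation: P = 10 ** (-S / 10)."""
    return 10.0 ** (-score / 10.0)


print(mascot_score(1e-12))            # score for P = 1 x 10^-12
print(probability_from_score(120.0))  # P for a score of 120
print(round(mascot_score(0.05), 1))   # score for a raw P of 0.05
```

A raw P of 0.05 maps to a score of only about 13. That is consistent with the point that the significance threshold (74 in the example) depends on the size of the database searched rather than on P = 0.05 alone.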
