Mass Spectrometry and Proteomics - Lecture 5
Matthias Trost, Newcastle University

Previously
• Proteomics
• Sample prep

Lecture 5
• Quantitation techniques
• Search algorithms
• Proteomics software

Current limitations of MS-based proteomics
• Cellular proteins span a wide range of expression levels, and current mass spectrometric technologies typically sample only a fraction of all the proteins present in a sample.
• Due to limited data quality, only a fraction of all identified proteins can also be reliably quantified.
Bantscheff et al, Anal Bioanal Chem, 2007

Limitations of proteomics: concentration of proteins in plasma
Anderson & Anderson, MCP, 2002

Quantitation techniques
Label-free
• Ion intensity
• Spectral counting
Chemical isotopic labeling
• ICAT
• iTRAQ/TMT
• mTRAQ
• Formaldehyde label
• Enzymatic label
Metabolic isotopic labeling
• SILAC
• 15N

The three different spectral sources of quantitative information
Wilm, Proteomics, 2010

Quantitation methods
[Figure: isotope label (SILAC, ICAT, dimethyl label etc.), fragmentation-based label (iTRAQ) and label-free quantitation; panel labels: X Da, MS, MS/MS]

Quantitation strategies
Bantscheff et al, Anal Bioanal Chem, 2007

Characteristics of quantitative MS methods
Bantscheff et al, Anal Bioanal Chem, 2007

Label-free quantitation
[Figure: workflow for Condition A and Condition B: peak detection (in triplicate) for each condition; MS/MS with MASCOT for identification-driven peptide assignment; hierarchical clustering]

Label-free proteomics
Advantages and disadvantages
+ Lower complexity
+ Lower cost
+ Primary tissue possible
(+) Repetitions increase identification rates
- High LC reproducibility necessary
- Good clustering depends on high mass accuracy
- Several peptides required for reliable quantitation
[Figure: peptide RLEIpSPDpSpSPER in Cond. A and Cond. B]
Stdev Cond. A: 0.089
Stdev Cond. B: 0.067
Ratio Cond. A/Cond. B: 0.49

Another label-free quantitation: spectral counting
• The number of spectra matched to peptides from a protein is used as a surrogate measure of protein abundance.
• As the sampling of peptides in a mass spectrometer usually depends on the peptides' intensities, spectral counting has reasonable statistical significance.
• Spectral counting is cheaper, easier to implement and does not require highly reproducible data.
• However, it still requires thorough computational and statistical analysis.
• Modern mass spectrometers are becoming too sensitive and fast for this type of quantitation.

Isobaric tag for relative and absolute quantitation (TMT or iTRAQ)
• Reacts with N-termini and other primary amines of peptides.
• Uses a reporter group for quantification that can be identified in MS/MS spectra.
• Another labeled group serves as a balancer.
https://www.thermofisher.com/

Isobaric tag for relative and absolute quantitation (TMT or iTRAQ)
• Quantification is done in MS/MS mode (low intensity!).
• Once labeled with TMT or iTRAQ, the 4/6/8/10 individual samples are pooled for further processing and analysis.
• During subsequent MS/MS of the peptides, each isobaric tag produces a unique reporter ion that identifies which sample the peptide originated from and its relative abundance.
Gingras et al, Nat Rev Mol Cell Biol, 2007

Isobaric tag for relative and absolute quantitation (iTRAQ or TMT)
www.thermo.com
+ Up to 11 samples (11-plex) can be quantified at the same time.
+ Saves instrument time.
- Quite expensive.
- Low dynamic range.
- Cannot be performed in most ion-trap instruments as they do not reach this low mass range.
- Non-changing peptides are favored for identification.
- Large mass addition to peptides.
- High ratios are suppressed by co-eluting other peptides.
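
The last disadvantage listed above, suppression of high ratios by co-eluting peptides, can be illustrated with a little arithmetic. The sketch below is a toy model and not taken from the lecture: it assumes that a fraction `interference` of the reporter-ion signal comes from co-isolated background peptides that do not change between samples (ratio 1:1), while the rest comes from the target peptide. The function name `observed_ratio` is hypothetical.

```python
def observed_ratio(true_ratio: float, interference: float) -> float:
    """Return the reporter-ion ratio A/B measured under co-isolation.

    Toy model (an assumption, not from the lecture): a fraction
    `interference` of the reporter signal comes from co-eluting background
    peptides that do not change (1:1); the rest comes from the target
    peptide, whose true A/B ratio is `true_ratio`.
    """
    if not 0.0 <= interference <= 1.0:
        raise ValueError("interference must be between 0 and 1")
    # Each channel is a weighted mix of target signal and 1:1 background.
    channel_a = (1.0 - interference) * true_ratio + interference * 1.0
    channel_b = (1.0 - interference) * 1.0 + interference * 1.0
    return channel_a / channel_b


# A true 10-fold change measured with increasing amounts of interference.
for fraction in (0.0, 0.1, 0.3, 0.5):
    ratio = observed_ratio(10.0, fraction)
    print(f"interference {fraction:.0%}: observed ratio {ratio:.1f}")
```

In this model a true 10-fold change with 30% interference is reported as a ratio of about 7.3, while a peptide that does not change (ratio 1) is unaffected. Large ratios are therefore pulled towards 1.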

Ratio compression in TMT experiments
Ow, J Prot Res, 2009; Ting et al, Nature Methods, 2011

Reducing ratio compression by using Synchronous Precursor Selection (SPS)

Formaldehyde/dimethyl label
• Samples are labeled with heavy and light formaldehyde on their primary amines (N-termini, Lys).
• Relatively cheap and simple.
• Can be used on virtually any sample.
• Quite large mass difference between samples.
• Problematic retention time shifts in long LC runs due to deuterium.
Chen et al, Anal Chem, 2003; Boersema et al, Proteomics, 2008

Formaldehyde/dimethyl label
Chen et al, Anal Chem, 2003

Enzymatic isotope label
• Further disadvantage: introduction of 18O at acidic side chains.
• Often incomplete incorporation of the label.
Miyagi et al, Mass Spec Rev, 2006

Stable isotope labeling with amino acids in cell culture (SILAC)
commons.wikimedia.org
• Cells are grown with "normal" and heavy isotope amino acids.
+ The isotopically labeled peptides are chemically (almost) identical (retention time etc.).
+ The different samples are mixed at a very early step during sample preparation.
- Labeled amino acids (Lys/Arg) might be metabolized to other amino acids.
- Expensive for large amounts of cells.
- Not for primary tissue.
- Increases complexity of the sample.
- Some cell types do not grow well in dialysed serum.

Neutron encoding (NeuCode) SILAC
• Makes use of the subtle mass differences caused by nuclear binding energy variation in stable isotopes ("mass defect").
• For example, labelling with lysine carrying 2H8 (+8.0502 Da) and lysine carrying 13C6 and 15N2 (+8.0142 Da).
• Can only be resolved with very high resolution (>200,000).
• In a low-resolution (<15,000) MS/MS scan, the peaks overlap and are indistinguishable, so both peaks add to the intensity.
• Theoretically, up to 39 isotopologues of lysine are possible.
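
To get a feel for why NeuCode needs such high resolution, the mass difference between the two heavy lysines can be converted into a nominal resolving power. This is a rough sketch under stated assumptions: resolving power is taken as R = (m/z) / Δ(m/z), the spacing in m/z is the mass difference divided by the charge, and the precursor at m/z 827 with charge 2+ is an illustrative choice. The function name `nominal_resolving_power` is hypothetical.

```python
# Mass shifts of the two heavy lysines quoted on the slide, in Da.
SHIFT_2H8 = 8.0502
SHIFT_13C6_15N2 = 8.0142


def nominal_resolving_power(mz: float, charge: int, delta_mass_da: float) -> float:
    """Resolving power R = (m/z) / delta(m/z) for two peaks delta_mass_da apart."""
    if charge < 1 or delta_mass_da <= 0:
        raise ValueError("charge must be >= 1 and delta_mass_da must be positive")
    delta_mz = delta_mass_da / charge  # peak spacing shrinks with charge state
    return mz / delta_mz


delta = SHIFT_2H8 - SHIFT_13C6_15N2  # about 0.036 Da, i.e. 36 mDa
print(f"mass difference: {delta * 1000:.1f} mDa")
print(f"nominal R at m/z 827, 2+: {nominal_resolving_power(827.0, 2, delta):,.0f}")
```

The nominal value comes out at roughly 46,000 for this example. The >200,000 quoted above is several times higher, presumably because the two peaks must be separated far below half height before they can be quantified individually.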

Neutron encoding (NeuCode) SILAC
[Figure caption] (a) Mass calculations of the 39 isotopologues for a +8-Da lysine. Shown in solid black are the isotopologues used for the experiments presented here. (b) Theoretical calculations depicting the percentage of peptides that are resolved (full width at 1% maximum peak height) when spaced 12, 18 or 36 mDa apart for resolving powers (R) of 15,000–1,000,000. (c) Top, MS1 scan collected with typical 30,000 resolving power. Center, a selected precursor with m/z at 827 collected with 30,000 resolving power (black) and the signal recorded in a high-resolution MS1 scan (480,000 resolving power).
Herbert et al, Nature Methods, 2013; Rose et al, Anal Chem, 2013

Protein identification
• Either "de novo" (thus no database) or from genomic data.
• When genomic data is available, the software performs an in silico digestion of the whole database using the specific protease.
• The mass of the peptide and the MS/MS spectrum are compared to the theoretical mass and the theoretical spectrum.

Search engines
• Good search engines take common rules (e.g. high peaks after P) into account.
• The engine calculates a score from the number of matched peaks compared to the peaks present in the spectrum.
• This score is usually linked to a probability.
• Lately, search engines using spectral libraries have emerged. They are much faster and more accurate. However, good spectra for each peptide are required, ideally acquired on different kinds of instruments.

Peptide ID & matching
For large-scale proteomics, identification of peptides becomes a complex matching problem.
[Figure: the proteome database (UniProt) is digested in silico into peptides (Peptide A, Peptide B, ...) with their peptide masses, and each peptide is fragmented in silico into fragment masses. The database search compares these with the observed mass (1000 ± 0.010 Da) and the corresponding MS2 data (intensity vs. m/z).]

The database search
1. MS1 filter
2. MS2 scoring
3. Probabilistic analysis

Database search: MS1 filter
Observed mass: 1000 ± 0.010 Da
Peptide A mass: 999.980
Peptide B mass: 999.993
Peptide C mass: 1000.005
Peptide D mass: 1000.010
Peptide E mass: 1000.025
Only peptides B, C and D fall within the ±0.010 Da window and are passed on to MS2 scoring.

Database search: theoretical MS/MS spectra
The theoretical spectra of the remaining candidates are scored against the observed spectra (observed mass 1000 ± 0.010 Da).
Peptide B, mass 999.993: score 9
Peptide C, mass 1000.005: score 80
Peptide D, mass 1000.010: score 1

Database search: scoring
Peptide evidence: the theoretical spectra of Peptide C (mass 1000.005) match the observed spectra (observed mass 1000 ± 0.010 Da) with a score of 80.

Search constraints
• "Classic"
  – Peptide/precursor mass accuracy
  – MS/MS/fragment mass accuracy
  – Fixed and variable modifications
  – Enzyme (specificity)
  – Instrument/type of ions generated
• Proposed
  – Retention time

Commonly used search engines
• Mascot
• Sequest
• OMSSA
• X!Tandem
• Andromeda (within MaxQuant)
• …

Decoy/target strategy to determine FDR
• Probability that a match of score 10 is incorrect: ~90%
• Probability that a match of score 100 is incorrect: ~0
PEP = (# hits in decoy database at a given score) / (# hits)

Decoy/target strategy to determine FDR
>Ubiquitin
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG
Target database peptide: MQIFVK
Decoy database peptide: VFIQMK

False-discovery rate
• Peptide/protein identification by mass spectrometry is a statistical analysis with false negatives and false positives.
• The false-discovery rate (FDR) is estimated by searching the data against a combined forward and reversed database. The number of hits from the reversed database is assumed to be equivalent to the number of false hits in the forward database.
• Please note that the FDR applies to the identification level only, not to the quantitation level.
• Commonly accepted FDRs are <1%.
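
The decoy/target strategy described above can be sketched in a few lines. This is a minimal illustration and not the algorithm of any particular search engine. `make_decoy` reverses a peptide while keeping its C-terminal residue in place, which reproduces the MQIFVK to VFIQMK example. `estimate_fdr` divides the number of decoy hits by the number of target hits at or above a score threshold. Both function names and the scores in `example_hits` are made up for illustration.

```python
def make_decoy(peptide: str) -> str:
    """Reverse a peptide but keep its last residue (e.g. the tryptic K/R) in place."""
    if len(peptide) < 2:
        return peptide
    return peptide[:-1][::-1] + peptide[-1]


def estimate_fdr(hits, threshold: float) -> float:
    """Estimate FDR as (# decoy hits) / (# target hits) with score >= threshold.

    `hits` holds (score, is_decoy) pairs from a search against a combined
    target + decoy database. Decoy hits stand in for false target hits.
    """
    hits = list(hits)  # allow generators, since we iterate twice
    targets = sum(1 for score, is_decoy in hits if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in hits if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


# Hypothetical search results: (score, matched a decoy sequence?)
example_hits = [
    (100, False), (80, False), (60, False), (40, True),
    (35, False), (20, False), (10, True), (9, True),
]

print(make_decoy("MQIFVK"))
print(estimate_fdr(example_hits, threshold=30))
print(estimate_fdr(example_hits, threshold=50))
```

Raising the score threshold removes decoy hits faster than target hits, so in practice the threshold is raised until the estimated FDR drops below the accepted level (commonly 1%).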

Considerations
• We accept that a very small proportion of peptide identifications (usually