Mass Spectrometry and Proteomics - Lecture 5 -

Matthias Trost, Newcastle University

Previously

• Proteomics
• Sample prep

Lecture 5

• Quantitation techniques
• Search Algorithms
• Proteomics software

Current limitations of MS-based Proteomics

• Cellular proteomes span a wide range of expression, and current mass spectrometric technologies typically sample only a fraction of all the proteins present in a sample.
• Due to limited data quality, only a fraction of all identified proteins can also be reliably quantified.

Bantscheff et al, Anal Bioanal Chem, 2007

Limitations of Proteomics – concentration of proteins in plasma

Anderson & Anderson, MCP, 2002

Quantitation techniques

Label-free
• Ion intensity
• Spectral counting

Chemical isotopic labeling
• ICAT
• iTRAQ/TMT
• mTRAQ
• Formaldehyde label
• Enzymatic label

Metabolic isotopic labeling
• SILAC
• 15N

The three different spectral sources of quantitative information

Wilm, Proteomics, 2010

Quantitation methods

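Figure: three quantitation methods compared at the MS and MS/MS level.
• Isotope label (SILAC, ICAT, dimethyl label etc.): labeled peptide forms are X Da apart
• Fragmentation-based label (iTRAQ)
• Label-free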

Quantitation strategies

Bantscheff et al, Anal Bioanal Chem, 2007

Characteristics of quantitative MS methods

Bantscheff et al, Anal Bioanal Chem, 2007

Label-free quantitation

Figure: label-free workflow for Condition A and Condition B.
• MS/MS: MASCOT, identification-driven peptide assignment
• Peak detection (in triplicate) for each condition
• Hierarchical clustering

Label-free proteomics

Advantages and Disadvantages

+ Lower complexity
+ Lower cost
+ Primary tissue possible
(+) Repetitions increase identification rates

- High LC-reproducibility necessary
- Good clustering dependent on high mass accuracy
- Several peptides required for reliable quantitation

Example peptide (figure): RLEIpSPDpSpSPER
Stdev Cond. A 0.089
Stdev Cond. B 0.067
Ratio Cond. A/Cond. B 0.49
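The spread and ratio values quoted for the example peptide can be illustrated with a minimal sketch. The triplicate intensities below are hypothetical, and it is an assumption that the spread is expressed as a relative standard deviation; the slide does not say how its Stdev values were computed.

```python
from statistics import mean, stdev

# Hypothetical triplicate peak intensities for one peptide (arbitrary units).
cond_a = [2.0e6, 2.2e6, 2.4e6]
cond_b = [4.0e6, 4.4e6, 4.8e6]

def summarise(intensities):
    """Return the mean and the relative standard deviation of replicate intensities."""
    avg = mean(intensities)
    return avg, stdev(intensities) / avg

mean_a, rsd_a = summarise(cond_a)
mean_b, rsd_b = summarise(cond_b)

# Label-free ratio: mean ion intensity in condition A over condition B.
ratio_a_over_b = mean_a / mean_b

print(f"RSD A = {rsd_a:.3f}, RSD B = {rsd_b:.3f}, ratio A/B = {ratio_a_over_b:.2f}")
```

In practice the ratio of a protein would be built from several such peptide ratios, which is why several peptides are needed for reliable quantitation.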

Another label-free quantitation: Spectral counting

• The number of spectra matched to peptides from a protein is used as a surrogate measure of protein abundance.
• As the sampling of peptides in a mass spectrometer usually depends on the peptides’ intensities, spectral counting has reasonable statistical significance.
• Spectral counting is cheaper, easier to implement and does not require highly reproducible data.
• However, it still requires thorough computational and statistical analysis.
• Modern mass spectrometers are becoming too sensitive and fast for this kind of quantitation.

Isobaric tag for relative and absolute quantitation (TMT or iTRAQ)

• Reacts with N-termini and other primary amines of peptides.
• Uses a reporter group for quantification that can be identified in MS/MS spectra.
• Another labeled group serves as a balancer.

https://www.thermofisher.com/

Isobaric tag for relative and absolute quantitation (TMT or iTRAQ)

• Quantification is done in MS/MS mode (low intensity!)
• Once labeled with TMT or iTRAQ, the 4/6/8/10 individual samples are pooled for further processing and analysis.
• During subsequent MS/MS of the peptides, each isobaric tag produces a unique reporter ion that identifies the sample from which the peptide originated and gives its relative abundance.

Gingras et al, Nat Rev Mol Biol, 2007
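How reporter ions translate into relative abundance can be sketched as follows. The intensities are hypothetical, and the channel labels assume the nominal reporter masses of a 4-plex iTRAQ experiment (114 to 117); real software also corrects for isotopic impurities, which this sketch ignores.

```python
def relative_abundance(reporter_intensities):
    """Convert reporter-ion intensities into fractions of the summed signal."""
    total = sum(reporter_intensities.values())
    if total <= 0:
        raise ValueError("no reporter-ion signal in this spectrum")
    return {channel: value / total for channel, value in reporter_intensities.items()}

# Hypothetical reporter-ion intensities read from one MS/MS spectrum,
# keyed by nominal reporter m/z (assumed 4-plex iTRAQ labels).
spectrum = {"114": 1000.0, "115": 2000.0, "116": 1000.0, "117": 4000.0}

fractions = relative_abundance(spectrum)

# Ratio of one sample against a chosen reference channel.
ratio_117_vs_114 = spectrum["117"] / spectrum["114"]

for channel, fraction in fractions.items():
    print(channel, round(fraction, 3))
```

Each pooled sample contributes one channel, so a single MS/MS spectrum yields the relative abundance of the peptide in all samples at once.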

Isobaric tag for relative and absolute quantitation (iTRAQ or TMT)

+ Up to 11 samples (11-plex) can be quantified at the same time.
+ Saves instrument time.
- Quite expensive.
- Low dynamic range.
- Cannot be performed in most ion-trap instruments as they do not reach this low mass range.
- Non-changing peptides are favored to be identified.
- Large mass addition to peptides.
- High ratios are suppressed by co-eluting other peptides.

www.thermo.com

Ratio compression in TMT experiments
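The suppression of high ratios by co-eluting peptides can be illustrated with simple arithmetic. This is a toy model, not a description of any vendor algorithm: it assumes the co-isolated peptides do not change between samples and therefore add the same reporter signal to both channels. All numbers are hypothetical.

```python
def observed_ratio(signal_a, signal_b, interference):
    """Reporter ratio A/B when co-isolated peptides add equal signal to both channels."""
    return (signal_a + interference) / (signal_b + interference)

# Hypothetical reporter signals of the target peptide: a true 10:1 change.
true_ratio = observed_ratio(1000.0, 100.0, 0.0)

# The same peptide with co-isolated background adding 200 units per channel.
compressed_ratio = observed_ratio(1000.0, 100.0, 200.0)

print(true_ratio, compressed_ratio)
```

The background pulls every ratio towards 1:1, and the effect is strongest for the largest changes.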

Ow, J Prot Res, 2009; Ting et al, Nature Methods, 2011

Reducing ratio compression by using Synchronous Precursor Selection (SPS)

Formaldehyde/dimethyl label

• Samples are labeled with heavy and light formaldehyde on their primary amines (N-termini, Lys).
• Relatively cheap and simple.
• Can be used on virtually any sample.
• Quite large mass difference between samples.
• Problematic retention time shifts in long LC runs due to deuterium.

Chen et al, Anal Chem, 2003; Boersema et al, Proteomics, 2008

Formaldehyde/dimethyl label

Chen et al, Anal Chem, 2003

Enzymatic label

• Further disadvantage: Introduction of 18O at acidic side chains

• often incomplete incorporation of the label

Miyagi et al, Mass Spec Rev, 2006

Stable isotope labeling with amino acids in cell culture (SILAC)

• Cells are grown with “normal” and heavy isotope amino acids.
+ The isotopically labeled peptides are chemically (almost) identical (retention time etc.).
+ The different samples are mixed at a very early step during sample preparation.
- Labeled amino acids (Lys/Arg) might be metabolized to other amino acids.
- Expensive for large amounts of cells.
- Not for primary tissue.
- Increases complexity of the sample.
- Some cell types do not grow well in dialysed serum.

commons.wikimedia.org

Neutron encoding (NeuCode) SILAC

• Makes use of the subtle mass differences caused by nuclear binding energy variation in stable isotopes (“mass defect”).
• For example, labelling with lysine with 2H8 (+8.0502 Da) and lysine with 13C6 and 15N2 (+8.0142 Da).
• Can only be resolved with very high resolution (>200,000).
• In a low-resolution (<15,000) MS/MS scan, the peaks overlap and are indistinguishable, so both contribute to the intensity.
• Theoretically, up to 39 isotopologues of lysine are possible.
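A rough feel for why such high resolution is needed comes from the definition R = (m/z) / Δ(m/z). The sketch below uses the two mass offsets given above; the precursor m/z of 827, the charge state of 2 and a single lysine per peptide are assumptions chosen for illustration. The result is only a lower bound: separating the peaks cleanly enough to quantify them needs considerably more resolving power than this estimate.

```python
# Mass offsets (Da) of the two +8 Da lysine isotopologues named above.
LYS_2H8 = 8.0502
LYS_13C6_15N2 = 8.0142

def min_resolving_power(mz, charge, delta_mass_da):
    """Lower-bound estimate R = (m/z) / delta(m/z) for two peaks delta_mass_da apart."""
    delta_mz = delta_mass_da / charge
    return mz / delta_mz

# Mass difference between the two labels: about 36 mDa.
delta = LYS_2H8 - LYS_13C6_15N2

# Assumed precursor: m/z 827, doubly charged, one lysine.
r = min_resolving_power(mz=827.0, charge=2, delta_mass_da=delta)

print(f"delta = {delta * 1000:.1f} mDa, minimum R of roughly {r:,.0f}")
```

Higher charge states and heavier peptides push the requirement up further, since the spacing on the m/z axis shrinks with charge.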

Herbert et al, Nature Methods, 2013; Rose et al, Anal Chem, 2013

Neutron encoding (NeuCode) SILAC

(a) Mass calculations of the 39 isotopologues for a +8-Da lysine. Shown in solid black are the isotopologues used for the experiments presented here.
(b) Theoretical calculations depicting the percentage of peptides that are resolved (full width at 1% maximum peak height) when spaced 12, 18 or 36 mDa apart for resolving powers (R) of 15,000–1,000,000.
(c) Top, MS1 scan collected with typical 30,000 resolving power. Center, a selected precursor with m/z at 827 collected with 30,000 resolving power (black) and the signal recorded in a high-resolution MS1 scan (480,000 resolving power).

Herbert et al, Nature Methods, 2013

Protein Identification

• Either “de novo” (thus no database) or from genomic data.
• When genomic data is available, the software performs an in silico digestion of the whole database using the specific protease.
• The mass of the peptide and the MS/MS spectrum are compared to the theoretical mass and the theoretical spectrum.
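The in silico digestion step can be sketched for trypsin as follows. The cleavage rule (after K or R, but not before P), the minimum peptide length and the function names are assumptions made for this illustration; real search engines also handle missed cleavages and modifications. The sequence is the ubiquitin example used later in this lecture.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini

UBIQUITIN = ("MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGK"
             "QLEDGRTLSDYNIQKESTLHLVLRLRGG")

def tryptic_digest(sequence, min_length=6):
    """Cleave C-terminal to K or R (not before P); keep peptides of min_length or more."""
    peptides = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in peptides if len(p) >= min_length]

def peptide_mass(peptide):
    """Theoretical monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

peptides = tryptic_digest(UBIQUITIN)
for p in peptides:
    print(p, round(peptide_mass(p), 4))
```

The resulting list of theoretical masses is what the observed precursor masses are compared against.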

Search Engines

• Good search engines take common rules (high peaks after P) into account.
• The engine calculates a score from the number of matched peaks compared to the peaks present in the spectrum.
• This score is usually linked to a probability.
• Lately, search engines using spectral libraries have emerged. They are much faster and more accurate. However, good spectra for each peptide are required, ideally acquired on different kinds of instruments.
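The idea of scoring by matched peaks can be sketched in a few lines. This is a deliberately naive score (matched peaks divided by observed peaks), not the scoring function of Mascot or any other engine. The theoretical values are singly charged y ions computed for MQIFVK; the observed peak list and the 0.02 Da tolerance are hypothetical.

```python
# Theoretical singly charged y-ion m/z values (y1 to y5) for the peptide MQIFVK.
THEORETICAL_Y_IONS = [147.1128, 246.1812, 393.2496, 506.3337, 634.3923]

# Hypothetical observed fragment peaks (m/z) from one MS/MS spectrum.
OBSERVED_PEAKS = [147.113, 246.181, 300.150, 393.251, 506.334, 550.200]

def count_matched_peaks(theoretical_mz, observed_mz, tol_da=0.02):
    """Count observed peaks that lie within tol_da of any theoretical fragment."""
    return sum(
        1
        for obs in observed_mz
        if any(abs(obs - theo) <= tol_da for theo in theoretical_mz)
    )

def simple_score(theoretical_mz, observed_mz, tol_da=0.02):
    """Fraction of observed peaks explained by the candidate peptide."""
    if not observed_mz:
        return 0.0
    return count_matched_peaks(theoretical_mz, observed_mz, tol_da) / len(observed_mz)

matched = count_matched_peaks(THEORETICAL_Y_IONS, OBSERVED_PEAKS)
score = simple_score(THEORETICAL_Y_IONS, OBSERVED_PEAKS)
print(matched, round(score, 3))
```

Real engines go further, for example by weighting peak intensities and by converting the raw count into a probability.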

Peptide ID & matching

For large scale proteomics, identification of peptides becomes a complex matching problem.

Figure: database search workflow. The proteome (UniProt database) is digested in silico into peptides (Peptide A, Peptide B) with known masses; in silico fragmentation gives the fragment masses of each peptide, which are compared with the corresponding MS2 data.

Figure: observed spectrum (intensity vs m/z), observed mass 1000 ± 0.010 Da.

The Database Search
1. MS1 filter
2. MS2 scoring
3. Probabilistic analysis

Database Search – MS1 filter

Observed Mass 1000 ± 0.010 Da

Peptide A Mass 999.980
Peptide B Mass 999.993
Peptide C Mass 1000.005
Peptide D Mass 1000.010
Peptide E Mass 1000.025

Database Search – theoretical MS/MS spectra

Observed Mass 1000 ± 0.010 Da, observed spectra scored against each candidate:

Peptide      Mass        Score
Peptide B    999.993     9
Peptide C    1000.005    80
Peptide D    1000.010    1

Database Search – scoring

Theoretical spectra: Peptide C Mass 1000.005
Observed spectra: Observed Mass 1000 ± 0.010 Da
Score (Peptide Evidence): 80

Search constraints

• “Classic”
  – Peptide/precursor mass accuracy
  – MS/MS (fragment) mass accuracy
  – Fixed and variable modifications
  – Enzyme (specificity)
  – Instrument/type of ions generated
• Proposed
  – Retention time
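The precursor mass accuracy constraint is what drives the MS1 filter in the database search example. A minimal sketch with the masses and scores from those slides follows; the function names and the small floating-point guard are assumptions of this illustration.

```python
# Candidate peptide masses (Da) and MS2 scores from the database search example.
CANDIDATES = {
    "Peptide A": 999.980,
    "Peptide B": 999.993,
    "Peptide C": 1000.005,
    "Peptide D": 1000.010,
    "Peptide E": 1000.025,
}
SCORES = {"Peptide B": 9, "Peptide C": 80, "Peptide D": 1}

def ms1_filter(candidates, observed_mass, tol_da):
    """Keep candidates whose theoretical mass lies within the precursor tolerance."""
    eps = 1e-9  # guard against floating-point noise exactly at the boundary
    return [
        name
        for name, mass in candidates.items()
        if abs(mass - observed_mass) <= tol_da + eps
    ]

def tolerance_in_ppm(tol_da, mass):
    """Express an absolute tolerance (Da) as parts per million of the mass."""
    return tol_da / mass * 1e6

passing = ms1_filter(CANDIDATES, observed_mass=1000.0, tol_da=0.010)
best = max(passing, key=lambda name: SCORES[name])

print(passing, best, tolerance_in_ppm(0.010, 1000.0))
```

A tighter precursor tolerance shrinks the candidate list before any MS2 scoring happens, which is why high mass accuracy speeds up and sharpens the search.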

Commonly used Search Engines

• Mascot
• Sequest
• OMSSA
• X!Tandem
• Andromeda (within MaxQuant)
• …

Decoy/target strategy to determine FDR

• Probability that a match of score 10 is incorrect: ~90%
• Probability that a match of score 100 is incorrect: ~0

PEP = (# hits in decoy database @ a given score) / (# hits)

Decoy/target strategy to determine FDR

>Ubiquitin
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG

Target Database: MQIFVK
Decoy Database: VFIQMK

False-Discovery Rate

• Peptide/protein identification by mass spectrometry is a statistical analysis with false-negatives and false-positives.
• False-discovery rate (FDR) is estimated by searching the data against a combined forward and reversed database. The number of hits from the reversed database is assumed to equal the number of false hits in the forward database.
• Please note that the FDR is on the identification level only, not on the quantitation level.
• Commonly accepted FDRs are <1%.
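The target/decoy estimate can be sketched as follows. The decoy rule assumes, from the MQIFVK to VFIQMK example, that each peptide is reversed while its C-terminal residue stays in place. The peptide-spectrum matches and their scores are hypothetical.

```python
def pseudo_reverse(peptide):
    """Reverse a peptide but keep its C-terminal residue in place."""
    return peptide[-2::-1] + peptide[-1]

def estimate_fdr(psms, score_threshold):
    """Estimate FDR as decoy hits divided by target hits at or above the threshold."""
    targets = sum(1 for score, is_decoy in psms if score >= score_threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= score_threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets

# Hypothetical matches as (score, is_decoy) pairs from a combined search.
psms = [
    (95, False), (90, False), (80, False), (70, False),
    (60, True), (55, False), (40, True), (30, False),
]

print(pseudo_reverse("MQIFVK"))
print(estimate_fdr(psms, score_threshold=50))
print(estimate_fdr(psms, score_threshold=65))
```

Raising the score threshold until the estimate drops below the accepted level (commonly 1%) is how the final list of identifications is cut.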

Considerations

• We accept that a very small proportion of peptide identifications (usually set to 1%) will likely be false discoveries

• Hence, having multiple supporting peptides per protein is important for confident identification and quantitation

Considerations

• FDR estimation is challenging when using small databases or when most of the database is identified. Always use bigger databases (for example, search the human database together with a bacterial database).

Considerations

• Choose your PTMs wisely

• Too many PTMs lead to combinatorial explosion and long database search times

• Common chemical modifications
  – Deamidation (NQ)
  – Gln → PyroGlu
  – Oxidation (M)
  – Carbamidomethylation (C)
  – Acetyl (N-terminus)

Considerations

The vast majority of MS identification and quantitation is performed on peptides; information on proteins is obtained through inference.

The peptide to protein relationship is a “many to many” match.

Figure: peptides (VFIQMK, VFIQMK, TLSDYNIQK, ESTLHLVLR, EGIPPDQQR, MQIFVK) linked to Protein A, Protein B and Protein C.

Considerations

Assigning non-unique peptides:

“Occam’s Razor” Accept the simplest explanation that fits the observations

Non-unique peptides are assigned to proteins that have the most unique peptides.

Figure: peptides (VFIQMK, VFIQMK, TLSDYNIQK, ESTLHLVLR, EGIPPDQQR, MQIFVK) linked to Protein A, Protein B and Protein C.

Check your data: histograms

• Evaluate distribution of data

• Normalise data

• Calculate standard deviation to set cutoffs

Check your data: scatter plots

• Intensities vs intensities

• Reproducibility

Check your data: volcano plots

• Evaluate experimental reproducibility (0.05 is usual p-value cutoff)

• Appropriate fold change cutoff depends on standard deviation

Databases

• UniProt databases are the standard for mouse, human and most other organisms.
• Ideally, they should be non-redundant.
• Can/should contain splice variants.
• The database should not be too small (a problem for bacteria), as the FDR calculation might be wrong.
• A common set of contaminants (keratin, BSA, milk proteins…) should be added to the searched database.

Software for MS ID and Quant

ID only
• Mascot
• Sequest
• OMSSA
• Morpheus

de novo sequencing
• PEAKS

Software Platforms
• COMPASS
• MaxQuant
• Proteome Discoverer
• PEAKS
• Scaffold

TMT quantitation
• MaxQuant
• Trans Proteomic Pipeline (TPP)
• Proteome Discoverer

SRM/Targeted
• Skyline