
Quantifying Proteomes Using the Open-source Trans-Proteomic Pipeline

Michael Hoopmann, Jason Winget, Luis Mendoza, Robert Moritz

Shotgun Mass Spectrometry

Figure: Sample → Digestion / Separation (HPLC) → Instrument (Mass Spectrometer) → Spectra → Data Analysis → Discovery

A More Detailed Look at Shotgun Data Analysis

Identification:
1. Data Conversion
2. Spectrum ID
3. Validation
4. Protein Inference

5. Quantification
6. Visualization
7. Dissemination

The Trans-Proteomic Pipeline (aka The TPP)

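Figure: TPP workflow (Raw Mass Spec Data → Peptide Identification → Peptide Validation → Quantitation → Protein Assignment → Protein List), with file formats mzML, pepXML, and protXML.
• Data conversion: msconvert
• Peptide identification: Comet, X!Tandem, SpectraST, Kojak, SEQUEST*, Mascot*
• Peptide validation: PeptideProphet, iProphet, PTMProphet
• Quantitation: ASAPRatio, XPRESS, Libra, StPeter
• Protein assignment: ProteinProphet
• Other tools shown: SBEAMS, ProteoGrapher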

• Simple set of inputs/outputs readable by all tools.
• Modular flow: can swap algorithms into and out of the pipeline.
• Expandable: can add or remove tools as necessary.
• Simple GUI: operates in a web browser, multi-platform.

Versatility of The TPP

Figure labels: Data From Major Vendors; Instrument Vendor & 3rd Party Applications; Data Converters; Open, Standardized Data Formats (mzML, mzXML, mzIdentML, pepXML, protXML); The Trans-Proteomic Pipeline Suite of Software; Multi-Platform Web Interface; TPP Applications; Cloud; Search, Validate, Infer, Quant., Visualize, Organize, Create Reports, Publish, More!

The Trans-Proteomic Pipeline

• All tools accessible from a drop-down menu interface.
• Interface has a few common pipelines pre-built.
• Multiple user accounts.
• Maintain independent projects and data storage.

• All stages of the pipeline are accessible at any time, ordered for optimal performance.

• Stages allow for customization.

• Applications within the pipeline can be re-run with different parameters to refine analyses.

StPeter: An application for label-free quantitation

StPeter: Label-free Quantitation in The TPP

• Historically, quantitation in the TPP focused on labeled methods, e.g. iTRAQ & SILAC.

• Label-free methods are often less work at the bench.

• More recently, label-free approaches have become more robust.

Figure: TPP quantitation tools. Labeled: ASAPRatio, XPRESS, Libra. Label-free: StPeter (New!).

• StPeter is a new tool in the TPP for label-free quantitation.

StPeter: MS/MS-based Quantitation

Spectral Counting

• Contains multiple MS/MS counting-centric methods.

• Produces relative protein quantitation within a sample.

• All results are normalized to facilitate comparisons across samples.

StPeter: The NSAF Model

• Normalized Spectral Abundance Factor (NSAF)
  • Large proteins produce many distinct peptides, appearing more abundant by total MS/MS count.
  • Small proteins produce fewer distinct peptide molecules.
• Contains two normalizations:
  • Protein length
  • Sample-to-sample variation
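A minimal Python sketch of the NSAF calculation, assuming the standard definition from Zybailov et al., NSAF_k = (SpC_k / L_k) / Σ_i (SpC_i / L_i). Dividing by length L applies the protein-length normalization, and dividing by the sample total applies the sample-to-sample normalization. The function name and dictionary inputs are illustrative only, not StPeter's API.

```python
def nsaf(spectral_counts: dict[str, int], lengths: dict[str, int]) -> dict[str, float]:
    """Normalized Spectral Abundance Factor: (SpC_k / L_k) / sum_i (SpC_i / L_i)."""
    # Protein-length normalization: spectral abundance factor (SAF) per protein.
    saf = {protein: spectral_counts[protein] / lengths[protein] for protein in spectral_counts}
    # Sample normalization: scale so the NSAF values of one sample sum to 1.
    total = sum(saf.values())
    return {protein: value / total for protein, value in saf.items()}


if __name__ == "__main__":
    # Hypothetical equimolar pair: the 4x longer protein yields 4x the spectra.
    print(nsaf({"LONG": 40, "SHORT": 10}, {"LONG": 800, "SHORT": 200}))
    # {'LONG': 0.5, 'SHORT': 0.5}
```

Raw spectral counts would rank LONG four times higher than SHORT; after length correction the two come out equal.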

Zybailov, B.; Mosley, A. L.; Sardiu, M. E.; Coleman, M. K.; Florens, L.; Washburn, M. P. J. Proteome Res. 2006, 5, 2339–2347.

StPeter: The SIN Model

• Spectral counts are integers: a scan from a low-abundance peptide carries the same weight as a scan from a high-abundance peptide.

• SIN adds fragment ion intensity to NSAF.

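• Normalized Spectral Index (SIN) incorporates MS/MS peak height.
  • More abundant peptides produce more abundant fragment ion peaks.

$$\mathrm{SI}_N = \left( \frac{\sum_{n=1}^{\mathrm{SpC}} i_n}{\sum_{j=1}^{N} \left( \sum_{n=1}^{\mathrm{SpC}} i_n \right)_j} \right) \bigg/ L$$

where $i_n$ is the summed intensity of the fragment ions in spectrum $n$, SpC is the protein's spectral count, $N$ is the number of proteins, and $L$ is the protein length.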
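As a sketch of the SI_N formula, the Python below sums fragment-ion intensities within each spectrum (i_n), sums those over a protein's SpC spectra, divides by the total over all N proteins, and then divides by length L. Spectra are represented as plain lists of fragment-ion intensities; the function names and input layout are hypothetical, and in practice the spectra would come from mzML/mzXML files matched through pepXML.

```python
def spectral_index(spectra: list[list[float]]) -> float:
    """SI for one protein: sum over its SpC spectra of i_n, the summed fragment-ion intensity."""
    return sum(sum(fragment_intensities) for fragment_intensities in spectra)


def sin(protein_spectra: dict[str, list[list[float]]], lengths: dict[str, int]) -> dict[str, float]:
    """SI_N = (SI / sum_j SI_j) / L for every protein in the sample."""
    si = {protein: spectral_index(spectra) for protein, spectra in protein_spectra.items()}
    total = sum(si.values())  # sum over all N proteins
    return {protein: (si[protein] / total) / lengths[protein] for protein in si}


if __name__ == "__main__":
    # Hypothetical data: A has two weak spectra, B has one intense spectrum.
    spectra = {"A": [[100.0, 50.0], [30.0, 20.0]], "B": [[600.0, 200.0]]}
    result = sin(spectra, {"A": 100, "B": 400})
    print({protein: round(value, 6) for protein, value in result.items()})
    # {'A': 0.002, 'B': 0.002}
```

On the same toy data, NSAF would rank A eight times higher than B (2/100 versus 1/400), while intensity weighting places them level.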

Griffin NM, et al. Nat Biotechnol. 2010 Jan;28(1):83-9

StPeter: The Distributed NSAF (dNSAF) Model

• Modifies the NSAF model to account for shared peptides, i.e., peptides that map to multiple protein sequences.

• Utilizes the fraction of non-shared peptides to split the spectral counts of the shared peptides.

Figure: Protein A has 5 unique SpC and Protein B has 2 unique SpC; the two proteins share 2 SpC. Distributed SpC = 6.4 (Protein A) and 2.6 (Protein B).
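The worked example above can be reproduced with a few lines of Python, assuming each protein receives a share of the shared spectral counts proportional to its unique spectral counts (5/7 of 2 for A, 2/7 of 2 for B), as in Zhang et al. This sketch handles a single pool of shared spectra between one group of proteins; the function names are hypothetical.

```python
def distribute_shared(unique_spc: dict[str, float], shared_spc: float) -> dict[str, float]:
    """Distributed SpC: unique SpC plus a share of the shared SpC proportional to unique SpC."""
    total_unique = sum(unique_spc.values())
    if total_unique == 0:
        raise ValueError("at least one protein needs unique spectra to set the split")
    return {p: u + shared_spc * (u / total_unique) for p, u in unique_spc.items()}


def dnsaf(distributed_spc: dict[str, float], lengths: dict[str, int]) -> dict[str, float]:
    """dNSAF: the NSAF formula applied to distributed spectral counts."""
    saf = {p: distributed_spc[p] / lengths[p] for p in distributed_spc}
    total = sum(saf.values())
    return {p: v / total for p, v in saf.items()}


if __name__ == "__main__":
    dspc = distribute_shared({"A": 5, "B": 2}, shared_spc=2)
    print({p: round(v, 1) for p, v in dspc.items()})  # {'A': 6.4, 'B': 2.6}
```

The split conserves the total (6.43 + 2.57 = 9 spectra), so no shared evidence is discarded; passing the distributed counts to `dnsaf` then applies the usual length and sample normalizations.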

Zhang Y, et al. Anal Chem. 2010 Mar 15;82(6):2272-81

StPeter: The Distributed SIN (dSIN) Model

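• Modifies the SIN model to account for shared peptides, i.e., peptides that map to multiple protein sequences.

• Utilizes the fraction of non-shared peptides to split the spectral counts of the shared peptides.

$$\mathrm{dSI}_N = \left( \frac{\mathrm{dSI}}{\sum_{j=1}^{N} \mathrm{dSI}_j} \right) \bigg/ L$$

where dSI is the protein's spectral index after distributing shared-peptide spectra, $N$ is the number of proteins, and $L$ is the protein length.

Hoopmann MR, Winget JM, Mendoza L, Moritz RL. J Proteome Res. 2018 Mar 2;17(3):1314-1320.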
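A hedged Python sketch of the dSI_N calculation. It assumes the shared fragment-ion intensity is split using unique-evidence weights of the kind used for dNSAF (here, unique spectral counts), then normalized by the sample total and by length as in the formula. The weighting choice, the function name, and the inputs are assumptions for illustration, not StPeter's exact implementation.

```python
def dsin(unique_si: dict[str, float], shared_si: float,
         split_weights: dict[str, float], lengths: dict[str, int]) -> dict[str, float]:
    """dSI_N = (dSI / sum_j dSI_j) / L, with shared intensity distributed by split_weights.

    unique_si:     protein -> summed fragment-ion intensity of its unique PSMs
    shared_si:     summed fragment-ion intensity of the PSMs these proteins share
    split_weights: protein -> weight for the split (assumed: unique spectral counts)
    """
    total_weight = sum(split_weights.values())
    dsi = {p: unique_si[p] + shared_si * split_weights[p] / total_weight for p in unique_si}
    total_dsi = sum(dsi.values())  # sum over all proteins in this toy sample
    return {p: (dsi[p] / total_dsi) / lengths[p] for p in dsi}


if __name__ == "__main__":
    # Hypothetical intensities for a two-protein group like the dNSAF example.
    result = dsin({"A": 5000.0, "B": 2000.0}, shared_si=2000.0,
                  split_weights={"A": 5, "B": 2}, lengths={"A": 100, "B": 100})
    print(result)
```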

• Modernizes spectral index analysis with the methods optimized for dNSAF.

Adding Quantitation to the Pipeline

Figure: Spectral Search → ID Validation → Protein Inference → StPeter → Visualization

• Designed to be seamlessly integrated into existing TPP pipelines.

• Execution is FAST: typically seconds to a few minutes.

• Cassette model ensures addition or removal without disrupting the pipeline.

How StPeter Works

Output is in the same format as the input (protXML), with protein quantities appended, so downstream tools operate the same.
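The per-protein processing StPeter performs (check the FDR threshold, walk the protein's peptides and their PSMs, pull each spectrum, accumulate fragment-ion intensity, then normalize) can be sketched in Python. This is an illustration, not StPeter's code: the `Protein` dataclass, the in-memory dictionaries standing in for parsed pepXML and mzML/mzXML content, and the simple non-distributed SI_N normalization are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    name: str
    length: int
    passes_fdr: bool      # hypothetical: outcome of the FDR threshold check
    peptides: list[str]   # peptide sequences assigned by protein inference
    spectral_index: float = 0.0


def build_spectral_indexes(proteins: list[Protein],
                           psms_by_peptide: dict[str, list[int]],
                           spectra: dict[int, list[float]]) -> dict[str, float]:
    """Accumulate fragment-ion intensity per protein, then normalize (SI_N form).

    psms_by_peptide: peptide -> scan numbers of its PSMs (stand-in for pepXML)
    spectra:         scan number -> fragment-ion intensities (stand-in for mzML/mzXML)
    """
    accepted = [p for p in proteins if p.passes_fdr]
    for protein in accepted:
        for peptide in protein.peptides:
            for scan in psms_by_peptide.get(peptide, []):
                # Shared peptides are counted in full here; the distributed models split them.
                protein.spectral_index += sum(spectra[scan])
    total = sum(p.spectral_index for p in accepted)
    if total == 0:
        return {p.name: 0.0 for p in accepted}
    # Normalize: share of the sample total, divided by protein length.
    return {p.name: (p.spectral_index / total) / p.length for p in accepted}


if __name__ == "__main__":
    proteins = [
        Protein("P1", 100, True, ["PEPA"]),
        Protein("P2", 200, True, ["PEPB"]),
        Protein("P3", 150, False, ["PEPC"]),  # fails the threshold: skipped
    ]
    psms = {"PEPA": [1, 2], "PEPB": [3], "PEPC": [4]}
    spectra = {1: [10.0, 20.0], 2: [30.0], 3: [140.0], 4: [500.0]}
    print(build_spectral_indexes(proteins, psms, spectra))
```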

Figure: StPeter flowchart. Read pepXML; for each protein, check whether it passes the FDR threshold; if so, read its peptide list and extract the PSM list from pepXML; for each PSM, extract the spectrum from mzML / mzXML and add its fragment ions to the spectral index; repeat while more PSMs and more proteins remain; then normalize and export the spectral indexes.

A look at StPeter in action

The Distributed Models

Figure: six homologous albumins (Cow, Rat, Human, Rabbit, Mouse, Pig) at 10, 4, 1.6, 0.63, 0.25, and 0.1 pmol.

• Mix six homologous albumins in a constant yeast background.

• Acquire 12 replicate injections on an LTQ.

Zhang Y, et al. Anal Chem. 2010 Mar 15;82(6):2272-81
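Linearity of a dilution series such as this one is commonly summarized by a least-squares fit of log2(estimate) on log2(loaded amount): a slope near 1 indicates a proportional response, and R² measures how tightly the points follow the line. The Python sketch below is a generic fit, not necessarily how the authors computed their R² values; the loadings come from the figure, but the estimates in the usage example are synthetic.

```python
import math


def log2_linear_fit(amounts: list[float], estimates: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of log2(estimate) on log2(amount); returns (slope, intercept, R²)."""
    x = [math.log2(a) for a in amounts]
    y = [math.log2(e) for e in estimates]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation for a simple linear fit
    return slope, intercept, r_squared


if __name__ == "__main__":
    amounts_pmol = [10, 4, 1.6, 0.63, 0.25, 0.1]  # albumin loadings from the figure
    # Synthetic estimates: proportional to the loading, times made-up noise factors.
    noise = [1.1, 0.9, 1.05, 0.95, 1.2, 0.85]
    estimates = [a * 2 ** -15 * f for a, f in zip(amounts_pmol, noise)]
    slope, intercept, r2 = log2_linear_fit(amounts_pmol, estimates)
    print(f"slope={slope:.3f}  R²={r2:.4f}")
```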

• All spectra are used, including identifications to multiple proteins.

• Quantitation maintains linearity over 3 orders of magnitude.

Figure: Log2(dNSAF) and Log2(dSIN) versus Log2(Protein Quantity, pmol); fitted R² values of 0.9719 and 0.9871.

Quantifying Proteomes

Cox J, et al. Mol Cell Proteomics. 2014 Sep;13(9):2513-26

• Create two conditions:
  • 60 µg HeLa + 10 µg E. coli
  • 60 µg HeLa + 30 µg E. coli

• 24 OFFGEL fractions, analyzed in triplicate (3x) on an LTQ-Orbitrap
  • Total of 144 data files
  • Approximately 1 million spectra used for quantitation

• Compare ratios for every protein observed in 2 of 3 replicates in both conditions.
  • Over 5000 proteins

Artifacts of Spectral Counts

• Decreased accuracy at low quantities due to the limited number of spectra per protein.

• Tracks of discrete quantities due to proteins represented by a single MS/MS spectrum.
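Why low spectral counts produce tracks: with integer counts, a protein seen in only a few spectra per condition can take only a handful of ratio values. The short Python sketch below enumerates the log2 ratios reachable with 1 to `max_count` spectra per condition, for comparison with the expected E. coli log2 ratio for a 1:3 mix, log2(3) ≈ 1.585. The function is illustrative only.

```python
import math
from itertools import product


def possible_log2_ratios(max_count: int) -> list[float]:
    """Distinct log2(b / a) values when each condition has 1..max_count spectra."""
    counts = range(1, max_count + 1)
    return sorted({math.log2(b / a) for a, b in product(counts, repeat=2)})


if __name__ == "__main__":
    print(possible_log2_ratios(2))       # [-1.0, 0.0, 1.0]
    print(len(possible_log2_ratios(3)))  # 7
    print(round(math.log2(3), 3))        # 1.585 (expected E. coli log2 ratio)
```

With at most two spectra per condition, only three ratio values exist, and none of them is the expected 1.585, so such proteins pile up on discrete lines instead of scattering around the true ratio.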

Cox J, et al. Mol Cell Proteomics. 2014 Sep;13(9):2513-26

StPeter Analysis of Proteomes

• E. coli proteome ratio much closer to 1:3 than with spectral counting alone.

• Protein distributions show lower standard deviation.

• Fewer artifacts among low-abundance proteins.

Figure: Log2(dNSAF) and Log2(dSIN) versus Log2(ratio) for Human and E. coli proteins, with histograms of relative frequency versus Log2(ratio). Annotated values: dNSAF, Human -0.60 and E. coli 1.08 (difference 1.68), σ = 0.55 and 0.69; dSIN, Human -0.56 and E. coli 1.17 (difference 1.73), σ = 0.60 and 0.72.

Comparison to Other Methods

• Comparison to precursor ion analysis (MS signals).

• Nearly identical protein ratios.

• Protein distributions similar, slightly better accuracy with MS signal analysis.

Figure: Log2(dSIN) versus Log2(ratio) for Human and E. coli proteins, with a histogram of relative frequency versus Log2(ratio) (annotated values: Human -0.56, E. coli 1.17, difference 1.73; σ = 0.60 and 0.72).

StPeter in The TPP

Running StPeter in The TPP

• Select one or more protein inference data sets.
  • Batch Analysis!

• Set parameters in a simple user interface.

• Analysis typically takes seconds to a few minutes.

Visualizing StPeter Results

Screenshot labels: Protein Descriptions; Search Results & Statistics; Quantities.

• Tabbed windows allow for filtering and sorting.

• Extract only quantified proteins.

• Sort results.

• Expand protein details to the peptide level.

Summary

• The Trans-Proteomic Pipeline is a free, open-source suite of tools for shotgun MS data analysis.

• The TPP is multi-platform, modular, and supports open formats, enabling integration with major platforms and 3rd party solutions.

• StPeter offers fast, label-free quantitation of entire proteomes analyzed using shotgun MS.

https://tppms.org

Acknowledgements

Moritz Lab (circa 2015): Robert Moritz, Eric Deutsch, Luis Mendoza, David Shteynberg, Jason Winget

Funding Sources:

2P50 GM076547 / Center for Systems Biology

R01 GM087221

HL133135