Quantifying Proteomes Using the Open-source Trans-Proteomic Pipeline
Michael Hoopmann, Jason Winget, Luis Mendoza, Robert Moritz

Shotgun Mass Spectrometry
Figure: shotgun workflow from the sample, through digestion / separation (HPLC) and the mass spectrometer (instrument), to spectra, data analysis, and discovery.

More Detailed Look at Shotgun Data Analysis
Identification (steps 1 to 4):
1. Data Conversion
2. Spectrum ID
3. Validation
4. Protein Inference
Followed by:
5. Quantification
6. Visualization
7. Dissemination

The Trans-Proteomic Pipeline (aka The TPP)
Figure: pipeline stages and their tools, with data passed between stages as mzML, pepXML, and protXML files.
• Raw mass spec data: msconvert
• Peptide identification: Comet, X!Tandem, SpectraST, Kojak, SEQUEST*, Mascot*
• Peptide validation: PeptideProphet, iProphet, PTMProphet
• Quantitation: ASAPRatio, XPRESS, Libra, StPeter
• Protein assignment: ProteinProphet
• Protein list: SBEAMS, ProteoGrapher

• Simple set of inputs/outputs readable by all tools.
• Modular flow: algorithms can be swapped into and out of the pipeline.
• Expandable: tools can be added or removed as necessary.
• Simple GUI: operates in a web browser, multi-platform.

Versatility of The TPP
Figure: data from major instrument vendors, 3rd party applications, and the cloud pass through data converters into open, standardized formats (mzML, mzXML, mzIdentML, pepXML, protXML). TPP applications then search, validate, infer, quantify, visualize, organize, create reports, publish, and more, all through the multi-platform web interface of The Trans-Proteomic Pipeline suite of software.

The Trans-Proteomic Pipeline
• All tools are accessible from a drop-down menu interface.
• The interface has a few common pipelines pre-built.
• Multiple user accounts.
• Users maintain independent projects and data storage.

The Trans-Proteomic Pipeline
• All stages of the pipeline are accessible at any time, ordered for optimal performance.
• Stages allow for customization.
• Applications within the pipeline can be re-run with different parameters to refine analyses.

StPeter: An application for label-free quantitation

StPeter: Label-free Quantitation in The TPP
• Historically, quantitation in the TPP focused on labeled methods.
• Label-free methods are often less work at the bench.
• More recently, label-free approaches have become more robust.
Figure: TPP quantitation tools. Labeled methods, e.g. iTRAQ & SILAC: ASAPRatio, XPRESS, Libra. New! Label-free: StPeter.
• StPeter is a new tool in the TPP for label-free quantitation.

StPeter: MS/MS-based Quantitation
Spectral Counting
• Contains multiple MS/MS counting-centric methods.
• Produces relative protein quantitation within a sample.
• All results are normalized to facilitate comparisons across samples.

StPeter: The NSAF Model
Normalized Spectral Abundance Factor (NSAF)
• Large proteins produce many distinct peptides, so they appear more abundant by total MS/MS count; small proteins produce fewer distinct peptide molecules.
• Contains two normalizations:
  • Protein length
  • Sample-to-sample variation
Zybailov, B.; Mosley, A. L.; Sardiu, M. E.; Coleman, M. K.; Florens, L.; Washburn, M. P. J. Proteome Res. 2006, 5, 2339–2347.

StPeter: The SIN Model
• Spectral counts are integral, i.e., a scan from a low-abundance peptide carries the same weight as a scan from a high-abundance peptide.
• More abundant peptides produce more abundant fragment ion peaks.
• The Normalized Spectral Index (SIN) incorporates MS/MS peak height, adding fragment ion intensity to NSAF:

$$SI_N = \left( \frac{\sum_{n=1}^{SpC} i_n}{\sum_{j=1}^{N} \left( \sum_{n=1}^{SpC} i_n \right)_j} \right) \Big/ L$$

where $i_n$ is the summed intensity of the fragment ions in spectrum $n$, $SpC$ is the protein's spectral count, $N$ is the number of proteins, and $L$ is the protein length.
Griffin NM, et al. Nat Biotechnol. 2010 Jan;28(1):83-9

StPeter: The Distributed NSAF (dNSAF) Model
• Modifies the NSAF model to account for shared peptides, i.e., peptides that map to multiple protein sequences.
• Utilizes the fraction of non-shared peptides to split the spectral counts of the shared peptides.
Example: Protein A has 5 unique SpC and Protein B has 2 unique SpC; they share 2 SpC. Distributed SpC = 6.4 for Protein A and 2.6 for Protein B.
Zhang Y, et al. Anal Chem. 2010 Mar 15;82(6):2272-81

StPeter: The Distributed SIN (dSIN) Model
• Modifies the SIN model to account for shared peptides, i.e., peptides that map to multiple protein sequences:

$$dSI_N = \left( \frac{dSI}{\sum_{j=1}^{N} dSI_j} \right) \Big/ L$$

• Utilizes the fraction of non-shared peptides to split the spectral counts of the shared peptides.
• Modernizes spectral index analysis with the methods optimized for dNSAF.
Hoopmann MR, Winget JM, Mendoza L, Moritz RL. J Proteome Res. 2018 Mar 2;17(3):1314-1320.

Adding Quantitation to the Pipeline
Figure: Spectral Search, ID Validation, Protein Inference, StPeter, Visualization.
• Designed to be seamlessly integrated into existing TPP pipelines.
• Execution is FAST: typically seconds to a few minutes.
• A cassette model allows StPeter to be added or removed without disrupting the pipeline.

How StPeter Works
• Output is the same format as input (protXML), with protein quantities appended; downstream tools operate the same.
Figure (flowchart): read protXML and pepXML. For each protein that passes the FDR threshold, extract its peptide list and the matching PSM list from pepXML. For each PSM, read the spectrum from mzML / mzXML and add its fragment ions to the protein's spectral index. When no proteins remain, normalize the spectral indexes and export them.

A look at StPeter in action.

The Distributed Models
• Mix six homologous albumins (cow, rat, human, rabbit, mouse, pig) at 10, 4, 1.6, 0.63, 0.25, and 0.1 pmol in a constant yeast background.
• Acquire 12 replicate injections on an LTQ.
Zhang Y, et al. Anal Chem. 2010 Mar 15;82(6):2272-81

The Distributed Models
• All spectra are used, including identifications to multiple proteins.
• Quantitation maintains linearity over 3 orders of magnitude.
Figure: Log2(dNSAF) and Log2(dSIN) versus Log2(Protein Quantity, pmol); R² = 0.9719 (dNSAF) and R² = 0.9871 (dSIN).

Quantifying Proteomes
Cox J, et al. Mol Cell Proteomics. 2014 Sep;13(9):2513-26
• Create two conditions:
  • 60 µg HeLa + 10 µg E. coli
  • 60 µg HeLa + 30 µg E. coli
• 24 OFFGEL fractions, each analyzed in triplicate (3x) on an LTQ-Orbitrap.
• Total of 144 data files.
• Approximately 1 million spectra used for quantitation.
• Compare ratios for every protein observed in 2 of 3 replicates in both conditions (over 5000 proteins).

Artifacts of Spectral Counts
• Decreased accuracy at low quantities due to the limited number of spectra per protein.
• Tracks of discrete quantities due to single MS/MS protein representation.
Cox J, et al. Mol Cell Proteomics. 2014 Sep;13(9):2513-26

StPeter Analysis of Proteomes
• The E. coli proteome ratio is much closer to 1:3 than with spectral counting alone.
• Protein distributions show lower standard deviation.
• Fewer artifacts among low-abundance proteins.
Figure: Log2(dNSAF) and Log2(dSIN) versus Log2(ratio) for human and E. coli proteins, with histograms of relative frequency versus Log2(ratio).

Comparison to Other Methods
• Comparison to precursor ion analysis (MS signals).
• Nearly identical protein ratios.
• Protein distributions are similar, with slightly better accuracy from MS signal analysis.
Figure: Log2(dSIN) versus Log2(ratio) for human and E. coli proteins, with a histogram of relative frequency versus Log2(ratio).

StPeter in The TPP

Running StPeter in The TPP
• Select one or more protein inference data sets (batch analysis!).
• Set parameters in a simple user interface.
• Analysis typically takes seconds to a few minutes.

Visualizing StPeter Results
Figure: results view showing protein descriptions, search results & statistics, and quantities.

Visualizing StPeter Results
• Tabbed windows allow for filtering and sorting:
  • Extract only quantified proteins.
  • Sort results.
  • Expand protein details to the peptide level.

Summary
• The Trans-Proteomic Pipeline is a free, open-source suite of tools for shotgun MS data analysis.
• The TPP is multi-platform, modular, and supports open formats, enabling integration with major platforms and 3rd party solutions.
• StPeter offers fast, label-free quantitation of entire proteomes analyzed using shotgun MS.

https://tppms.org

Acknowledgements
Moritz Lab (circa 2015): Robert Moritz, Eric Deutsch, Luis Mendoza, David Shteynberg, Jason Winget
Funding Sources: 2P50 GM076547/Center for Systems Biology; R01 GM087221; HL133135
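The NSAF and dNSAF models described in the StPeter slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not StPeter's implementation: the functions `nsaf` and `distribute_shared` and their dictionary inputs are hypothetical, and the shared-count split handles only a single group of proteins that share the same peptides. The usage lines reproduce the dNSAF slide's worked example (5 and 2 unique SpC, 2 shared SpC); the protein lengths are made-up values.

```python
# Minimal sketch of the NSAF and dNSAF spectral-counting models.
# Hypothetical helpers and inputs; not StPeter's actual code.


def nsaf(spc: dict[str, float], length: dict[str, int]) -> dict[str, float]:
    """NSAF_k = (SpC_k / L_k) / sum_j (SpC_j / L_j)."""
    # Normalization 1: divide each protein's spectral count by its length.
    saf = {p: spc[p] / length[p] for p in spc}
    # Normalization 2: divide by the sample total so values sum to 1,
    # which makes samples comparable.
    total = sum(saf.values())
    return {p: v / total for p, v in saf.items()}


def distribute_shared(unique: dict[str, int], shared: float) -> dict[str, float]:
    """Distributed SpC for one group of proteins that share `shared` counts.

    Each protein keeps its unique counts and receives a share of the shared
    counts proportional to its fraction of the group's unique counts.
    """
    total_unique = sum(unique.values())
    if total_unique == 0:
        raise ValueError("no unique spectral counts to weight the split")
    return {p: u + shared * u / total_unique for p, u in unique.items()}


# Worked example from the dNSAF slide: A has 5 unique SpC, B has 2,
# and the two proteins share 2 SpC.
dspc = distribute_shared({"A": 5, "B": 2}, shared=2)
print({p: round(v, 1) for p, v in dspc.items()})  # {'A': 6.4, 'B': 2.6}

# dNSAF is NSAF computed on the distributed counts (lengths are made up).
dnsaf = nsaf(dspc, {"A": 600, "B": 400})
print({p: round(v, 3) for p, v in dnsaf.items()})  # {'A': 0.625, 'B': 0.375}
```

Protein A receives 5/7 of the shared counts (5 + 2 × 5/7 ≈ 6.4) and Protein B receives 2/7 (2 + 2 × 2/7 ≈ 2.6), matching the slide.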
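The SIN and dSIN formulas from the StPeter slides can likewise be sketched. Two assumptions are made here and should be read as such: each protein is represented by a list of per-spectrum summed fragment-ion intensities (the $i_n$ of the SIN formula), and shared-peptide intensity is split in proportion to each protein's unique spectral counts, mirroring the dNSAF split. The slides say only that dSIN uses the fraction of non-shared peptides, so StPeter's exact weighting may differ. All function names and numbers are hypothetical.

```python
# Minimal sketch of SIN and dSIN; hypothetical helpers, not StPeter's code.


def spectral_index(intensities: list[float]) -> float:
    """SI = sum of i_n over the protein's SpC spectra, where i_n is the
    summed fragment-ion intensity of spectrum n."""
    return sum(intensities)


def sin(si: dict[str, float], length: dict[str, int]) -> dict[str, float]:
    """SI_N = (SI / sum_j SI_j) / L; gives dSI_N when fed dSI values."""
    total = sum(si.values())
    return {p: (v / total) / length[p] for p, v in si.items()}


def distribute_shared_si(
    unique_si: dict[str, float], unique_spc: dict[str, int], shared_si: float
) -> dict[str, float]:
    """dSI: each protein's unique intensity plus a share of the shared intensity.

    ASSUMPTION: the share is weighted by unique spectral counts, as in dNSAF.
    """
    total = sum(unique_spc.values())
    if total == 0:
        raise ValueError("no unique spectral counts to weight the split")
    return {p: unique_si[p] + shared_si * unique_spc[p] / total for p in unique_si}


# Hypothetical proteins: A has 3 unique spectra, B has 2, and 2 spectra are shared.
si_a = spectral_index([4.0, 3.0, 3.0])
si_b = spectral_index([2.0, 2.0])
shared = spectral_index([1.5, 2.0])
dsi = distribute_shared_si({"A": si_a, "B": si_b}, {"A": 3, "B": 2}, shared)
dsin = sin(dsi, {"A": 500, "B": 250})  # dSI_N per protein
print(dsi)
print(dsin)
```

Because dSIN has the same shape as SIN, the sketch reuses `sin` on the distributed indexes rather than defining a separate normalization.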
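The ratio comparison used in the proteome benchmark (proteins observed in 2 of 3 replicates in both conditions) can be sketched as follows. This is an illustrative reading of the slide, not the authors' analysis code: the data layout (three replicate quantities per protein per condition, `None` when unobserved), treating zero as unobserved, averaging replicates with the mean, and all names and values are assumptions. With 10 µg versus 30 µg of E. coli, the expected E. coli log2 ratio is log2(3) ≈ 1.58.

```python
import math
import statistics
from typing import Optional

# Hypothetical layout: protein -> three replicate quantities (None = not observed).
Replicates = list[Optional[float]]


def observed(reps: Replicates) -> list[float]:
    """Replicate quantities that were measured (ASSUMPTION: 0 counts as unobserved)."""
    return [q for q in reps if q is not None and q > 0]


def log2_ratios(
    cond_a: dict[str, Replicates],
    cond_b: dict[str, Replicates],
    min_obs: int = 2,
) -> dict[str, float]:
    """log2(mean B / mean A) for proteins seen in >= min_obs replicates of both."""
    ratios: dict[str, float] = {}
    for protein in cond_a.keys() & cond_b.keys():
        a, b = observed(cond_a[protein]), observed(cond_b[protein])
        if len(a) >= min_obs and len(b) >= min_obs:
            ratios[protein] = math.log2(statistics.mean(b) / statistics.mean(a))
    return ratios


# Made-up quantities: condition A = 10 ug E. coli, condition B = 30 ug E. coli.
cond_a = {
    "ecoli_1": [1.0, 1.2, None],
    "ecoli_2": [2.0, 2.2, 1.8],
    "human_1": [5.0, None, None],
}
cond_b = {
    "ecoli_1": [3.0, 3.3, 3.6],
    "ecoli_2": [6.1, 5.9, 6.0],
    "human_1": [5.0, 5.1, 4.9],
}

ratios = log2_ratios(cond_a, cond_b)
print(sorted(ratios))  # ['ecoli_1', 'ecoli_2'] (human_1 seen in only 1 of 3 in A)
values = list(ratios.values())
print(statistics.median(values), statistics.stdev(values))
```

The median and standard deviation of the per-protein log2 ratios correspond to the center and spread (σ) shown in the ratio histograms.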