Introduction to Label-Free Quantification
SeqAn and OpenMS Integration Workshop
Temesgen Dadi, Julianus Pfeuffer, Alexander Fillbrunn
The Center for Integrative Bioinformatics (CIBI)

Mass-spectrometry data analysis in KNIME
Julianus Pfeuffer, Alexander Fillbrunn

OpenMS
• OpenMS: an open-source C++ framework for computational mass spectrometry
• Jointly developed at ETH Zürich, FU Berlin and the University of Tübingen
• Open source: BSD 3-clause license
• Portable: available on Windows, OS X and Linux
• Vendor-independent: supports all standard formats, and vendor formats through ProteoWizard
• OpenMS TOPP tools (The OpenMS Proteomics Pipeline tools)
  – Building blocks: one application for each analysis step
  – All applications share identical user interfaces
  – Use PSI standard formats
• Can be integrated into various workflow systems
  – Galaxy
  – WS-PGRADE/gUSE
  – KNIME
Kohlbacher et al., Bioinformatics (2007), 23:e191

OpenMS Tools in KNIME
• OpenMS tools are wrapped in KNIME via GenericKNIMENodes (GKN)
• Every tool writes its CommonToolDescription (CTD) via its command-line parser
• GKN generates the Java source code needed for the nodes to show up in KNIME
• GKN wraps the C++ executables and provides file-handling nodes

Installation of the OpenMS plugin
• Available from the KNIME Community Contributions update site (stable and trunk), under Bioinformatics & NGS
• Provides more than 180 OpenMS TOPP tools as community nodes
  – SILAC, iTRAQ, TMT, label-free, SWATH, SIP, …
  – Search engines: OMSSA, MASCOT, X!TANDEM, MSGFplus, …
  – Protein inference: FIDO

Data Flow in Shotgun Proteomics
[Figure: Sample -> HPLC/MS -> raw data -> signal processing -> peak maps -> data reduction -> identification -> annotated maps -> differential quantification -> differentially expressed proteins. The data volume shrinks from about 100 GB of raw data, through intermediate data of about 1 GB and 50 MB, to about 50 kB for the final list of proteins.]

Quantification Strategies
[Figure: Quantitative proteomics divides into relative and absolute quantification. Relative quantification is either labeled (in vivo: 14N/15N, SILAC; in vitro: iTRAQ, TMT, 16O/18O) or label-free (spectral counting, feature-based). Absolute quantification: AQUA, SISCAPA. MRM also appears in the diagram. After: Lau et al., Proteomics, 2007, 7, 2787]

Quantitative Data – LC-MS Maps
• Spectra are acquired at rates of up to dozens per second
• Stacking the spectra yields maps
• Resolution:
  – Up to millions of points per spectrum
  – Tens of thousands of spectra per LC run
• Huge 2D datasets of up to hundreds of GB per sample
• MS intensity follows the chromatographic concentration
[Figure: LC-MS data (map) -> quantification (15 nmol/µl, 3x over-expressed, …)]

Label-Free Quantification (LFQ)
• Label-free quantification is probably the most natural way of quantifying
  – No labeling required, which removes further sources of error; no restriction on sample generation; cheap
  – Different samples are acquired in different measurements, so higher reproducibility is needed
  – Manual analysis is difficult
  – Scales very well with the number of samples: basically no limit, and no difference in the analysis between 2 and 100 samples

LFQ – Analysis Strategy
1. Find features in all maps
2. Align maps
3. Link corresponding features
4. Identify features (e.g. peptide GDAFFGMSCK)
5. Quantify (e.g. GDAFFGMSCK: 1.0 : 1.2 : 0.5)

Feature-Based Alignment
• LC-MS maps can contain millions of peaks
• Retention times of peptides and metabolites can shift between experiments
• In label-free quantification, maps therefore need to be aligned in order to identify corresponding features
• Alignment can be done on the raw maps (where it is usually called ‘dewarping’) or on already detected features
• The latter is simpler, as it does not require aligning millions of peaks, but just tens of thousands of features
• Disadvantage: it relies on accurate feature finding
[Figure: a map region with ~350,000 peaks reduced to ~700 features]

Feature Finding
• Identify all peaks belonging to one peptide
• Key idea:
  – Identify promising seed regions (e.g. the highest peaks)
  – Fit a model to that region and identify the peaks explained by it
• Extension: collect all data points close to the seed
• Refinement: remove peaks that are not consistent with the model
• Fit an optimal model to the reduced set of peaks
• Iterate until no further improvement can be achieved

Multiple Alignment
• Dewarp k maps onto a comparable coordinate system
• Choose one map (usually the one with the largest number of features) as the reference map (here: map 2 -> T2 = 1, i.e. the identity transformation)
[Figure: maps 1 … k (axes: m/z, rt) are transformed by T1, T2, …, Tk onto a consensus map (axes: m/z, rt)]

LFQ with OpenMS in KNIME
• Preprocessing of single maps
  – Identification
  – Feature finding and mapping
• Combining information of maps
  – Map alignment
  – Feature linking
• Statistical post-processing and visualization
  – Statistical analysis with R Snippets
  – Visualization with KNIME plotting nodes
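The "Quantitative Data – LC-MS Maps" slide states that stacking spectra yields a map and that MS intensity follows the chromatographic concentration. The Python sketch below illustrates both ideas under simplifying assumptions: `Spectrum` and `LCMSMap` are hypothetical classes (not the OpenMS or pyOpenMS API), and the extracted ion chromatogram (XIC) is taken to be the summed intensity within an m/z tolerance in each spectrum.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Spectrum:
    """One MS1 scan (hypothetical structure): retention time plus (m/z, intensity) peaks."""
    rt: float
    peaks: List[Tuple[float, float]]


@dataclass
class LCMSMap:
    """A 2D LC-MS map: spectra stacked along the retention-time axis."""
    spectra: List[Spectrum] = field(default_factory=list)

    def add(self, spectrum: Spectrum) -> None:
        # Keep spectra ordered by retention time so the map reads as RT x m/z.
        self.spectra.append(spectrum)
        self.spectra.sort(key=lambda s: s.rt)

    def xic(self, mz: float, tol: float) -> List[Tuple[float, float]]:
        """Extracted ion chromatogram: summed intensity within mz +/- tol per spectrum."""
        return [
            (s.rt, sum(i for m, i in s.peaks if abs(m - mz) <= tol))
            for s in self.spectra
        ]


if __name__ == "__main__":
    lcms = LCMSMap()
    lcms.add(Spectrum(11.0, [(500.26, 400.0)]))
    lcms.add(Spectrum(10.0, [(500.25, 100.0), (600.30, 5.0)]))
    lcms.add(Spectrum(12.0, [(500.24, 150.0), (700.10, 9.0)]))
    # The trace rises and falls with the peptide's elution profile.
    print(lcms.xic(500.25, 0.02))  # [(10.0, 100.0), (11.0, 400.0), (12.0, 150.0)]
```

Sorting on every insert keeps the sketch short; a real map reader would receive spectra already in acquisition order.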
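The seed, extension, refinement and refit loop from the "Feature Finding" slides can be sketched as follows. This is a minimal illustration, assuming peaks are (rt, m/z, intensity) tuples and the model is a single Gaussian elution profile fitted by intensity-weighted moments; the function names, tolerances and the moment-based fit are hypothetical choices, and production feature finders typically also model isotope patterns.

```python
import math
from typing import List, Tuple

Peak = Tuple[float, float, float]          # (rt, m/z, intensity)
Model = Tuple[float, float, float, float]  # (rt centre, rt sigma, height, m/z)


def fit_model(peaks: List[Peak]) -> Model:
    """Fit a Gaussian elution profile by intensity-weighted moments (a simplification)."""
    total = sum(p[2] for p in peaks)
    mu = sum(p[0] * p[2] for p in peaks) / total
    var = sum(p[2] * (p[0] - mu) ** 2 for p in peaks) / total
    sigma = max(math.sqrt(var), 1e-6)  # avoid division by zero for single-scan regions
    mz = sum(p[1] * p[2] for p in peaks) / total
    height = max(p[2] for p in peaks)
    return mu, sigma, height, mz


def predicted(model: Model, rt: float) -> float:
    """Intensity the current model expects at retention time rt."""
    mu, sigma, height, _ = model
    return height * math.exp(-((rt - mu) ** 2) / (2 * sigma ** 2))


def find_feature(peaks: List[Peak], mz_tol: float = 0.01, rt_window: float = 30.0,
                 rel_tol: float = 0.5, max_iter: int = 10) -> Tuple[List[Peak], Model]:
    # Seeding: start from the most intense peak.
    seed = max(peaks, key=lambda p: p[2])
    # Extension: collect all data points close to the seed in m/z and RT.
    region = [p for p in peaks
              if abs(p[1] - seed[1]) <= mz_tol and abs(p[0] - seed[0]) <= rt_window]
    for _ in range(max_iter):
        model = fit_model(region)
        # Refinement: drop peaks whose intensity the current model does not explain.
        kept = [p for p in region
                if abs(p[2] - predicted(model, p[0])) <= rel_tol * model[2]]
        if not kept or len(kept) == len(region):
            break  # no further improvement (or nothing left): stop iterating
        region = kept
    # Fit the model once more on the final, reduced set of peaks.
    return region, fit_model(region)
```

On a synthetic Gaussian profile with one interfering peak inside the extension window, the first refit drops the interferer, and the second pass leaves the region unchanged, which ends the loop.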
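For the "Multiple Alignment" slide, the sketch below assumes each map's retention-time warp is a straight line, rt_ref = a * rt + b, fitted by ordinary least squares on feature pairs matched by nearest m/z. As on the slide, the map with the most features becomes the reference and receives the identity transformation. The naive matching and the linear model are stand-ins chosen for brevity, not the method of any particular OpenMS aligner; real aligners need more robust matching because features are rarely one-to-one across maps.

```python
from typing import List, Tuple

Feature = Tuple[float, float]    # (rt, m/z) of a detected feature
Transform = Tuple[float, float]  # (slope a, intercept b): rt_ref = a * rt + b


def choose_reference(maps: List[List[Feature]]) -> int:
    # Heuristic from the slide: the map with the most features is the reference.
    # Ties go to the first such map.
    return max(range(len(maps)), key=lambda i: len(maps[i]))


def match_by_mz(ref: List[Feature], other: List[Feature],
                mz_tol: float) -> List[Tuple[float, float]]:
    """Naive matching: pair each feature with the reference feature closest in m/z."""
    pairs = []
    for rt, mz in other:
        best = min(ref, key=lambda f: abs(f[1] - mz))
        if abs(best[1] - mz) <= mz_tol:
            pairs.append((rt, best[0]))  # (rt in this map, rt in reference map)
    return pairs


def fit_linear(pairs: List[Tuple[float, float]]) -> Transform:
    """Ordinary least squares fit of rt_ref = a * rt + b."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = sxy / sxx if sxx > 0 else 1.0
    return a, mean_y - a * mean_x


def align(maps: List[List[Feature]], mz_tol: float = 0.01) -> Tuple[int, List[Transform]]:
    ref = choose_reference(maps)
    transforms: List[Transform] = []
    for i, fmap in enumerate(maps):
        if i == ref:
            transforms.append((1.0, 0.0))  # the reference transform is the identity
            continue
        pairs = match_by_mz(maps[ref], fmap, mz_tol)
        # With fewer than two anchors no line can be fitted: fall back to the identity.
        transforms.append(fit_linear(pairs) if len(pairs) >= 2 else (1.0, 0.0))
    return ref, transforms


def apply_transform(t: Transform, fmap: List[Feature]) -> List[Feature]:
    """Map retention times into the reference coordinate system; m/z is unchanged."""
    a, b = t
    return [(a * rt + b, mz) for rt, mz in fmap]
```

Only retention time is warped here, reflecting the slide's point that RT shifts between runs while m/z stays comparable.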
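Steps 3 and 5 of the "LFQ – Analysis Strategy" (link corresponding features, then quantify) can be illustrated by greedily grouping aligned features into consensus features and then expressing each group's intensities relative to the first map. The greedy strategy, the tolerances and the helper names are assumptions for illustration, not the algorithm of OpenMS's feature linkers. The made-up intensities in the usage lines were chosen so that one consensus feature reproduces the slide's 1.0 : 1.2 : 0.5 ratio.

```python
from typing import Dict, List, Optional, Tuple

Feature = Tuple[float, float, float]  # (aligned rt, m/z, intensity)
Consensus = Dict[int, Feature]        # map index -> feature from that map


def link_features(maps: List[List[Feature]], rt_tol: float = 5.0,
                  mz_tol: float = 0.01) -> List[Consensus]:
    """Greedy linking of aligned features into consensus features (illustrative only)."""
    groups: List[Consensus] = []
    for map_idx, fmap in enumerate(maps):
        for feat in sorted(fmap, key=lambda f: -f[2]):  # most intense features first
            best: Optional[Consensus] = None
            best_dist = float("inf")
            for group in groups:
                if map_idx in group:
                    continue  # a consensus feature holds at most one feature per map
                rt_c = sum(f[0] for f in group.values()) / len(group)
                mz_c = sum(f[1] for f in group.values()) / len(group)
                d_rt, d_mz = abs(feat[0] - rt_c), abs(feat[1] - mz_c)
                if d_rt <= rt_tol and d_mz <= mz_tol:
                    dist = d_rt / rt_tol + d_mz / mz_tol  # normalized distance to centroid
                    if dist < best_dist:
                        best, best_dist = group, dist
            if best is None:
                groups.append({map_idx: feat})  # start a new consensus feature
            else:
                best[map_idx] = feat
    return groups


def ratios(group: Consensus, n_maps: int) -> Optional[List[Optional[float]]]:
    """Intensities relative to map 0, or None if map 0 lacks this feature."""
    if 0 not in group:
        return None
    base = group[0][2]
    return [round(group[i][2] / base, 3) if i in group else None for i in range(n_maps)]


if __name__ == "__main__":
    # Made-up, already aligned feature maps for three samples.
    m0 = [(100.0, 500.0, 2.0e6), (200.0, 600.0, 1.0e6)]
    m1 = [(101.0, 500.002, 2.4e6)]
    m2 = [(99.5, 499.999, 1.0e6), (300.0, 700.0, 5.0e5)]
    groups = link_features([m0, m1, m2])
    print(" : ".join(str(r) for r in ratios(groups[0], 3)))  # 1.0 : 1.2 : 0.5
```

Features missing from some maps simply leave gaps (None), which is where the statistical post-processing step would have to decide how to treat missing values.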