Introduction to Label-Free Quantification


SeqAn and OpenMS Integration Workshop
Temesgen Dadi, Julianus Pfeuffer, Alexander Fillbrunn
The Center for Integrative Bioinformatics (CIBI)

Mass-spectrometry data analysis in KNIME
Julianus Pfeuffer, Alexander Fillbrunn

OpenMS
• OpenMS – an open-source C++ framework for computational mass spectrometry
• Jointly developed at ETH Zürich, FU Berlin, and the University of Tübingen
• Open source: BSD 3-clause license
• Portable: available on Windows, macOS, and Linux
• Vendor-independent: supports all standard formats, and vendor formats through ProteoWizard
• OpenMS TOPP tools – The OpenMS Proteomics Pipeline tools
  – Building blocks: one application for each analysis step
  – All applications share identical user interfaces
  – Use PSI standard formats
• Can be integrated into various workflow systems
  – Galaxy
  – WS-PGRADE/gUSE
  – KNIME
Kohlbacher et al., Bioinformatics (2007), 23:e191

OpenMS Tools in KNIME
• OpenMS tools are wrapped in KNIME via GenericKNIMENodes (GKN)
• Every tool writes its Common Tool Description (CTD) via its command-line parser
• GKN generates the Java source code for the nodes that show up in KNIME
• Wraps the C++ executables and provides file-handling nodes

Installation of the OpenMS plugin
• Community-contributions update site (stable & trunk) – Bioinformatics & NGS
• Provides > 180 OpenMS TOPP tools as Community nodes
  – SILAC, iTRAQ, TMT, label-free, SWATH, SIP, …
  – Search engines: OMSSA, MASCOT, X!TANDEM, MSGF+, …
  – Protein inference: FIDO

Data Flow in Shotgun Proteomics
[Diagram: sample → HPLC/MS → raw data → signal processing → peak maps → data reduction → feature maps → identification and differential quantification → annotated maps → differentially expressed proteins; the data volume shrinks from ~100 GB of raw data to ~50 kB of results along the way]

Quantification Strategies
[Diagram: quantitative proteomics divides into relative and absolute quantification (AQUA, SISCAPA); relative quantification divides into labeled approaches (in vivo: 14N/15N, SILAC; in vitro: iTRAQ, TMT, 16O/18O) and label-free approaches (spectral counting, feature-based, MRM). After: Lau et al., Proteomics, 2007, 7, 2787]

Quantitative Data – LC-MS Maps
• Spectra are acquired at rates of up to dozens per second
• Stacking the spectra yields maps
• Resolution:
  – Up to millions of points per spectrum
  – Tens of thousands of spectra per LC run
• Huge 2D datasets of up to hundreds of GB per sample
• MS intensity follows the chromatographic concentration

LC-MS Data (Map) Quantification
[Example LC-MS map with quantitative annotations such as "15 nmol/µl" or "3x over-expressed"]

Label-Free Quantification (LFQ)
• Label-free quantification is probably the most natural way of quantifying
  – No labeling required, which removes further sources of error; no restrictions on sample generation; cheap
  – Data on the different samples are acquired in separate measurements – higher reproducibility is needed
  – Manual analysis is difficult
  – Scales very well with the number of samples; there is basically no limit and no difference in the analysis between 2 or 100 samples

LFQ – Analysis Strategy
1. Find features in all maps
2. Align maps
3. Link corresponding features
4. Identify features (e.g. the peptide GDAFFGMSCK)
5. Quantify (e.g. intensity ratios 1.0 : 1.2 : 0.5 across samples)
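To make these steps concrete, here is a minimal, self-contained Python sketch of steps 2, 3, and 5 (align, link, quantify) on toy feature lists. The (m/z, RT, intensity) tuples, the tolerances, and the simple median RT-shift model are illustrative assumptions for this sketch; they are not the models or parameters used by the OpenMS tools.

```python
def align_rt(reference, other):
    """Dewarp 'other' onto 'reference' with a single RT shift, estimated as the
    median RT offset of naively m/z-matched features (toy model, assumption)."""
    offsets = []
    for mz_r, rt_r, _ in reference:
        best = min(other, key=lambda f: abs(f[0] - mz_r))
        if abs(best[0] - mz_r) < 0.01:          # 10 mDa match window (assumption)
            offsets.append(rt_r - best[1])
    shift = sorted(offsets)[len(offsets) // 2] if offsets else 0.0
    return [(mz, rt + shift, inten) for mz, rt, inten in other]

def link_features(maps, mz_tol=0.02, rt_tol=30.0):
    """Greedily link features that agree within the m/z and RT tolerances,
    using the first (reference) map as the seed for each consensus feature."""
    consensus = []
    for mz0, rt0, int0 in maps[0]:
        intensities = [int0]
        for other in maps[1:]:
            matches = [f for f in other
                       if abs(f[0] - mz0) < mz_tol and abs(f[1] - rt0) < rt_tol]
            intensities.append(max(matches, key=lambda f: f[2])[2] if matches else None)
        consensus.append(((mz0, rt0), intensities))
    return consensus

# Toy example: one peptide feature measured in three samples.
map_a = [(445.23, 1200.0, 1.0e6)]
map_b = [(445.23, 1215.0, 1.2e6)]
map_c = [(445.23, 1190.0, 0.5e6)]
maps = [map_a, align_rt(map_a, map_b), align_rt(map_a, map_c)]
for (mz, rt), intensities in link_features(maps):
    base = intensities[0]
    ratios = [round(i / base, 2) if i is not None else None for i in intensities]
    print(f"feature at m/z {mz}, RT {rt:.0f} s -> ratios {ratios}")
```

Running the sketch links the single toy feature across the three maps and prints the 1.0 : 1.2 : 0.5 intensity ratios used as the example above.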
Feature-Based Alignment
• LC-MS maps can contain millions of peaks
• The retention time of peptides and metabolites can shift between experiments
• In label-free quantification, maps therefore need to be aligned in order to identify corresponding features
• Alignment can be done on the raw maps (where it is usually called 'dewarping') or on already detected features
• The latter is simpler, as it does not require the alignment of millions of peaks, but only of tens of thousands of features
• Disadvantage: it relies on accurate feature finding
[Example: a raw map of ~350,000 peaks reduces to ~700 features]

Feature Finding
• Identify all peaks belonging to one peptide
• Key idea:
  – Identify suspicious regions (e.g. the highest peaks) as seeds
  – Fit a model to each region and identify the peaks explained by it
• Extension: collect all data points close to the seed
• Refinement: remove peaks that are not consistent with the model
• Fit an optimal model for the reduced set of peaks
• Iterate until no further improvement can be achieved

Multiple Alignment
• Dewarp the k maps onto a comparable coordinate system
• Choose one map (usually the one with the largest number of features) as the reference map (here: map 2, i.e. T2 = 1)
[Diagram: maps 1…k are transformed by T1…Tk in the rt/m/z plane and combined into a consensus map]

LFQ with OpenMS in KNIME
• Identification
• Feature finding and mapping
• Map alignment
• Feature linking
• Statistical analysis with R Snippets
• Visualization with KNIME plotting nodes
The workflow is organized into three stages: preprocessing of single maps, combining the information of all maps, and statistical post-processing and visualization.
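For orientation, the same LFQ workflow can also be run with the OpenMS TOPP tools directly on the command line; each call below roughly corresponds to one KNIME node. This is a sketch, not a ready-made pipeline: the tool names and the generic -in/-out/-id parameters follow the usual TOPP conventions, but the file names are placeholders (ids.idXML would come from a separate search-engine step) and all algorithm parameters are left at their defaults, so check each tool's -help output or the corresponding KNIME node dialog before use.

```python
import subprocess

samples = ["sample1", "sample2"]

# 1. Feature finding on every centroided LC-MS map
for s in samples:
    subprocess.run(["FeatureFinderCentroided",
                    "-in", f"{s}.mzML", "-out", f"{s}.featureXML"], check=True)

# 2. Map alignment (dewarping) of the feature maps onto a common RT axis
subprocess.run(["MapAlignerPoseClustering",
                "-in", *[f"{s}.featureXML" for s in samples],
                "-out", *[f"{s}_aligned.featureXML" for s in samples]], check=True)

# 3. Link corresponding features across maps into a consensus map
subprocess.run(["FeatureLinkerUnlabeledQT",
                "-in", *[f"{s}_aligned.featureXML" for s in samples],
                "-out", "linked.consensusXML"], check=True)

# 4./5. Annotate consensus features with peptide identifications (placeholder
#       idXML from a search engine) and export quantities for statistical
#       post-processing, e.g. in an R Snippet node
subprocess.run(["IDMapper", "-id", "ids.idXML",
                "-in", "linked.consensusXML", "-out", "annotated.consensusXML"], check=True)
subprocess.run(["ProteinQuantifier",
                "-in", "annotated.consensusXML", "-out", "protein_quant.csv"], check=True)
```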