Label Free Quantitative Proteomics

Mi-Youn Brusniak, Ph.D.

Outline

• Principles of quantitative proteomics
• Labeling vs. non-labeling approaches for quantitative proteomics
• SpecArray: software tool for quantitative proteomics without isotopic labeling
• Corra: Framework to generate candidate biomarkers using non-labeling methods

Summary of LC-ESI-MS/MS

• Protein mixtures are digested into peptides
• Peptides are concentrated and fractionated by separation technologies such as SCX, IEF, RP, etc.
• While eluting from the RP column, peptides are ionized by ESI and analyzed by MS/MS
• Peptides are identified from CID spectra
• Peptides are mostly quantified from MS signatures, except in the case of iTRAQ

Electrospray Ionization

• Multiple charge states: from +1 to +4

$M + z\,\mathrm{H}^{+} = M(\mathrm{H}^{+})_{z}$

$m/z = (M + z \cdot H)/z$
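As a quick numeric check of the relation above, the following Python sketch evaluates m/z for charge states +1 to +4. It assumes that M is the neutral monoisotopic peptide mass and H the proton mass (about 1.007276 Da); the peptide mass of 2000.0 Da and the function names are hypothetical, chosen only for illustration.

```python
PROTON_MASS = 1.007276  # Da; assumed value for H in m/z = (M + z*H)/z


def mz(neutral_mass: float, z: int) -> float:
    """m/z of a peptide with neutral mass M that carries z protons."""
    if z < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + z * PROTON_MASS) / z


def neutral_mass_from_mz(mz_value: float, z: int) -> float:
    """Invert the relation: M = z * (m/z) - z * H."""
    if z < 1:
        raise ValueError("charge must be a positive integer")
    return z * mz_value - z * PROTON_MASS


# Hypothetical peptide of 2000.0 Da observed at charge states +1 to +4.
for charge in range(1, 5):
    print(f"z=+{charge}: m/z = {mz(2000.0, charge):.4f}")
```

The same peptide therefore appears at several m/z values in one spectrum, one per charge state.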

[Figure: LC-ESI schematic with high voltage (+HV); charge states +1, +3, and +4 shown over time]

Reversed-Phase Chromatography

• Separate peptides by hydrophobicity
• Reproducible
• Automated, coupled online with MS

[Figure: online RP-HPLC setup. Labels: HPLC, 6-way divert valve, precolumn, waste, PEEK micro-cross (grounded), analytical column (100 µm i.d.) with tapered frit, 2 to 4 kV, UV detector. Plots: % AcN and relative abundance (R.A., %) vs. time (min), 10 to 70 min; peptide eluting profile vs. time]

Single Ion Chromatogram

2D view: m/z, intensity
3D view: m/z, intensity, time

[Figure: MS scans stacked along time (scan #) with axes m/z, intensity, and time; Single Ion Current (SIC) trace of intensity vs. scan # at m/z = 1000.2]
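A SIC trace like the one at m/z = 1000.2 can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool: it assumes each scan is a list of (m/z, intensity) pairs, uses a hypothetical m/z tolerance window, and integrates the trace with the trapezoidal rule. The scan data are invented.

```python
from typing import List, Sequence, Tuple

Scan = Sequence[Tuple[float, float]]  # (m/z, intensity) pairs of one MS scan


def extract_sic(scans: Sequence[Scan], target_mz: float, tol: float = 0.2) -> List[float]:
    """Sum the intensity within target_mz +/- tol in every scan."""
    return [
        float(sum(inten for mz, inten in scan if abs(mz - target_mz) <= tol))
        for scan in scans
    ]


def sic_area(times: Sequence[float], trace: Sequence[float]) -> float:
    """Area under the SIC trace by the trapezoidal rule."""
    if len(times) != len(trace):
        raise ValueError("times and trace must have the same length")
    return sum(
        0.5 * (trace[k] + trace[k + 1]) * (times[k + 1] - times[k])
        for k in range(len(times) - 1)
    )


# Invented data: five scans, one per time unit.
scans = [
    [(500.1, 5.0), (1000.2, 0.0)],
    [(1000.2, 10.0)],
    [(800.0, 3.0), (1000.3, 20.0)],
    [(1000.2, 10.0)],
    [(1000.2, 0.0)],
]
times = [0.0, 1.0, 2.0, 3.0, 4.0]

trace = extract_sic(scans, target_mz=1000.2)
area = sic_area(times, trace)
print(trace, area)
```

The tolerance window matters in practice: too narrow and part of the peak is lost, too wide and neighboring ions are summed in.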

Peptide Quantification

• Area of SIC is proportional to peptide abundance
• Ionization efficiency of each peptide is different
  – Depends on the peptide molecular properties (e.g. number of basic residues)
• MS Technology is NOT quantitative

Outline

• Samples labeled with different stable isotopes
• Chemically identical
• Distinguishable by MS through their mass shift
• Peptide abundance ratio measured by ratio of SIC areas
• Peptides are identified before quantification

ASAPRatio

d0: Area = $1.21 \times 10^{9}$
d8: Area = $1.01 \times 10^{9}$
d0/d8 = 1/0.83

Typical LC-MS/MS Analysis

Features: 2720

CIDs: 1633

IDs: 363

ID/CID: 22%
ID/feature: 13%

Limitations of LC-MS/MS Approach to Large-Scale Protein Profiling

• Sample size limited:
  – ICAT (2), iTRAQ (4), …
• Difficult to trace protein abundance across a large number of samples
• Most peptides cannot be identified
• Difficult to identify & quantify low-abundance proteins

LC-MS Approach

[Figure: LC-MS approach (axis ticks 1 and 10)]

LC-MS Platform for Large-Scale Protein Profiling

• Samples are NOT labeled
• Samples are analyzed under identical settings
• Peptide abundance is evaluated by MS signal intensity in different runs
• Reproducibility in LC-MS analysis is critical
• Peptide alignment is crucial
• Followed by targeted LC-MS/MS

Challenges in LC-MS Platform

• Highly reproducible LC-MS analysis
  – Retention time shift
  – Fluctuations in MS signal intensity
  – Peptide identification in separate MS/MS runs
• Complex samples
  – Overlapping signals
  – Misaligned peptides
• Large sample size
  – Column degradation

Outline

• Software Suite for the Generation and Comparison of Peptide Arrays from Sets of Data Collected by Liquid Chromatography-Mass Spectrometry
• Xiao-Jun Li et al., Molecular & Cellular Proteomics 4.9, 2005
• SpecArray v1.2.0 is available at http://sourceforge.net/projects/sashimi or http://tools.proteomecenter.org/software.php

SpecArray Software Suite

• Collect LC-ESI-MS data
• Pep3D: Evaluate data quality
• mzXML2dat: Process raw data
• PepList: Extract peptide features
• PepMatch: Align peptides across samples
• PepArray: Generate peptide array

Collect LC-MS Data

Data collection:
• LC-MS in profiling mode
• No or minimal MS/MS
• Peptide abundance evaluated by MS signal intensity (after proper normalization)
• Converted into mzXML file format

Evaluate Data Quality

Pep3D: Evaluate data quality

LC-MS Data of Same Sample

[Figure: Pep3D images of the same sample, labeled "good" and "bad"; instrument labels: MicroTOF, QTOF]

Process Raw Data

mzXML2dat:
• Smoothing, denoising, centroiding, subtracting background, estimating S/N
• Data size: ~2 GB to ~5 MB

Pep3D Images

[Figure: Pep3D images labeled "Before" and "After"]

Extract Peptide Features

PepList:
• Peptide isotopic distribution
• Peptide elution profile
• Data range: m/z, time

[Figure: isotopic peak intensities I0, I1, I2, I3, I4]

Peptide Feature Detection

Mass: 2177.8 Time: 78 min Charge: +2
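One common way to arrive at a charge and mass for a feature is to look at the spacing of its isotopic peaks, which is roughly 1.00335/z in m/z. The Python sketch below illustrates that idea only; it is not taken from PepList. The peak positions are hypothetical, constructed to be consistent with the mass (2177.8) and charge (+2) shown here under the assumption that the mass is the neutral monoisotopic mass, and the tolerance and function names are assumptions.

```python
from typing import Optional, Sequence

ISOTOPE_SPACING = 1.00335  # Da; approximate 13C - 12C mass difference
PROTON_MASS = 1.007276     # Da


def infer_charge(isotope_mzs: Sequence[float], max_charge: int = 4,
                 tol: float = 0.02) -> Optional[int]:
    """Pick the charge whose expected peak spacing best matches the mean gap."""
    if len(isotope_mzs) < 2:
        return None
    peaks = sorted(isotope_mzs)
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_gap = sum(gaps) / len(gaps)
    best = min(range(1, max_charge + 1),
               key=lambda z: abs(mean_gap - ISOTOPE_SPACING / z))
    # Reject the assignment if even the best charge fits poorly.
    if abs(mean_gap - ISOTOPE_SPACING / best) > tol:
        return None
    return best


def neutral_mass(monoisotopic_mz: float, z: int) -> float:
    """Neutral mass from the monoisotopic m/z: M = z * (m/z - H)."""
    return z * (monoisotopic_mz - PROTON_MASS)


# Hypothetical isotopic peaks of one feature (m/z values).
peaks = [1089.907, 1090.409, 1090.911]
charge = infer_charge(peaks)
mass = neutral_mass(min(peaks), charge) if charge else None
print(charge, mass)
```

Real data add noise and overlapping peptides, so production tools also compare the observed peak intensities against an expected isotopic distribution.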

[Figure panels: peptide isotopic distribution; peptide elution profile]

Align Peptides Across Samples

PepMatch:
• Align peptides by mass, charge, retention time
• Reproducibility
  – m/z: Highly reproducible
  – retention time: Relative order reproducible
  – signal intensity: Relative intensity reproducible
• Align multiple samples
• Combine aligned peptides

Peptide Alignment Between Two Samples

[Figure: alignment panels labeled "retention time" and "intensity"]

Generating Peptide Expression Array

PepArray:
• Sample-dependent normalization
• Search missed features
• User specified information
• Minimal sample size of each group

Outline

• Principles of quantitative proteomics
• Labeling vs. non-labeling approaches for quantitative proteomics
• Introduction to ASAPRatio: software tool for quantitative proteomics using isotopic labeling
• Introduction to SpecArray: software tool for quantitative proteomics without isotopic labeling
• Corra: Framework to generate candidate biomarkers

What is Corra?

• Corra is a Scottish Goddess of prophecy.
• Corra is a framework to generate candidate biomarkers in a high-throughput MS data processing environment.
• Corra's goal is to detect differentially expressed features by maximizing the number of such features while controlling the false discovery rate.
• Corra uses LC-MS (and LC-MS/MS) and experiment design information
• Work in progress
  – Validation of workflow using Latin Square data
  – Validation of reproducibility

Corra: MS1 Data Analysis (A Discovery-Based Approach)

[Workflow diagram]
• LC-MS of sample pools of cases and controls
• Align LC-MS data
• Differential analysis, PCA and clustering
• Peak inclusion list of discriminatory features
• Peptide/protein identification by MS/MS
• Candidate validation in larger sample pool

Discriminatory peptide features are initially determined just from MS1 data analysis. Follow-up MS/MS subsequently determines peptide identity.

Corra Framework Overview

[Framework diagram]

Input: LC-MS and MS/MS (mzXML)

MS/MS (TPP processes):
• mzXML
• Database search (SEQUEST/Mascot)
• INTERACT
• PeptideProphet
• ProteinProphet

MS (LC-MS):
• mzXML
• Pep3D
• LC-MS feature picking
• APML (feature lists)
• Annotation of picked peptide features with sequence assignments
• LC-MS alignment
• APML (aligned feature lists)
• Bioconductor annotated data set
• R: Statistical analysis
• Validation and candidates for targeted analysis

APML: Annotated Peptide Markup Language
Database: LC-MS and LC-MS/MS DB

Corra Framework Overview

Inputs:
• mzXML
• Search Result Files
• Sample Information

[Figure: "Quick Monitoring" bar chart of the number of features per sample (one bar per pepList file; Series1 and Series2)]

[Figure: Histogram of cor(run1all, use = "pairwise.complete.obs"); x-axis from 0.70 to 1.00, y-axis: frequency]
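The histogram label is an R call: `cor(..., use = "pairwise.complete.obs")` computes, for each pair of columns, the correlation over only those rows where both values are present. A rough Python equivalent is sketched below, assuming each run is a list of feature intensities with `None` for features missing in that run; the run names and values are invented, and this is not Corra's own code.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Values = Sequence[Optional[float]]


def pearson_pairwise_complete(x: Values, y: Values) -> Optional[float]:
    """Pearson correlation over positions where both x and y are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant series
    return sxy / math.sqrt(sxx * syy)


def run_correlations(runs: Dict[str, List[Optional[float]]]
                     ) -> Dict[Tuple[str, str], Optional[float]]:
    """Correlation for every pair of runs, keyed by sorted run names."""
    return {
        (a, b): pearson_pairwise_complete(runs[a], runs[b])
        for a, b in combinations(sorted(runs), 2)
    }


# Invented intensities for four aligned features in three runs.
runs = {
    "run1": [1.0, 2.0, None, 4.0],
    "run2": [2.0, 4.0, 6.0, 8.0],
    "run3": [4.0, None, 2.0, 1.0],
}
corr = run_correlations(runs)
print(corr)
```

A histogram of these pairwise values gives a quick view of run-to-run reproducibility: replicate runs should cluster near 1.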

Corra:
• Quick Quality Control on the fly
• LC-MS or LC-MS/MS Process
• Annotation of Samples and aligned Features
• Simple and Basic Statistical Analysis (PCA, Volcanoplot)
• Simple Cross-validation
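The statistical analysis step reports adjusted p-values (the adj.P.Val column of the results table), in line with the stated goal of controlling the false discovery rate. A widely used adjustment of this kind is the Benjamini-Hochberg procedure; that Corra applies exactly this procedure is an assumption here. The sketch below uses invented p-values.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * n / rank over this rank and all larger ranks.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values for four features.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print(adj)
```

Features whose adjusted p-value falls below a chosen threshold (for example 0.05) would then be kept as candidate discriminatory features.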

Outputs:
• APML for Feature Lists
• APML for Aligned Feature Lists
• Candidate Target Analysis Lists (feature or peptide or protein)
• Weighted Panels of Proteins which have a certain statistical predictability

        m.z        rt        charge   M          t          P.Value    adj.P.Val   B
343     439.468    35.666    4        1.309641   5.747842   5.92E-06   0.010247    3.929592
285     1164.557   121.915   4        -1.55887   -5.54884   9.78E-06   0.010247    3.480909
742     734.352    65.715    3        -2.03654   -5.30905   2.34E-05   0.016372    2.687205
152     974.18     80.755    4        -1.41841   -4.82714   6.15E-05   0.032238    1.825438
421     992.613    16.686    4        -1.26019   -4.53599   0.00013    0.040325    1.150351
255     981.175    45.484    4        -1.32148   -4.52185   0.000135   0.040325    1.11754
1753    669.88     90.033    2        -2.03223   -4.39905   0.000293   0.071341    0.442843
1367    861.883    47.297    2        -3.3339    -6.34749   0.000107   0.040325    0.299392
453     1104.57    112.804   4        -1.17287   -4.15356   0.000346   0.071341    0.264205
680     758.056    67.722    3        -1.21069   -4.10977   0.000387   0.071341    0.163092
332     739.094    92.16     4        -1.07081   -4.06259   0.000437   0.071341    0.05428
326     987.174    45.883    4        -1.25356   -3.99535   0.000518   0.071341    -0.10052
183     726.855    82.543    4        -1.12979   -3.94417   0.00059    0.071341    -0.2181
1252    905.114    81.87     3        -1.18671   -3.93147   0.00061    0.071341    -0.24723
324     923.705    131.279   4        -1.10673   -3.9304    0.000611   0.071341    -0.24969
668     889.396    67.162    3        -1.31687   -3.90338   0.000655   0.071341    -0.31164
1179    448.137    25.178    3        -1.25939   -3.8901    0.000677   0.071341    -0.34206
234     531.773    102.71    4        -1.05553   -3.88587   0.000684   0.071341    -0.35175
370     448.18     24.596    4        -1.05587   -3.88262   0.00069    0.071341    -0.35918
644     892.412    71.75     3        -1.13239   -3.91765   0.000715   0.071341    -0.36621

[Figure: PCA on all common features; PC1 (x-axis, -20 to 20) vs. PC2 (y-axis, -15 to 15); samples labeled NGT, IGT, and DB]

Corra Application Examples (LC-MS: Diabetes Pilot Study)

24 runs = 8 individuals × 3 replicate runs

Corra Application Examples (LC-MS: NCI Mouse Skin Cancer)

Summary

• Label-free quantitative LC-MS and LC-MS/MS is a powerful method for identifying and quantifying proteins in complex samples
• The Corra framework is very promising for large-scale protein profiling
• Software suites have been developed for both LC-MS and LC-MS/MS platforms