
Fundamentals of Biological Mass Spectrometry and Proteomics

Steve Carr, Broad Institute of MIT and Harvard

Modern Mass Spectrometry (MS) Systems

• Orbitrap Q-Exactive: Discovery/Global Experiments
• Triple Quadrupole: Targeted MS

MS systems used for proteomics have 4 tasks:
• Create ions from analyte
• Separate the ions based on charge and mass
• Detect ions and determine their mass-to-charge ratio (m/z)
• Select and fragment ions of interest to provide structural information (MS/MS)

Electrospray MS: ease of coupling to liquid-based separation methods has made it the key technology in proteomics

Possible sample inlets: pump, sample injection loop, liquid autosampler, HPLC, capillary

[Figure: expansion of the ion formation and sampling regions. Liquid leaves a needle held at 3-5 kV; with nebulizing gas and drying gas it forms droplets containing solvated ions, and the ions pass from atmosphere into vacuum.]

Most elements have more than one stable isotope. For example, most carbon atoms have a mass of 12 Da, but in nature, 1.1% of C atoms have an extra neutron, making their mass 13 Da.

Why do we care? Mass spectrometers “see” the isotope peaks provided the resolution is high enough. If an MS instrument has resolution high enough to resolve these isotopes, better mass accuracy is achieved.
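A rough sketch of why isotope peaks matter more as molecules get larger. It treats each carbon atom as an independent draw with a 1.11% chance of being 13C (a binomial model) and ignores the other elements. The function name and the carbon counts are illustrative choices, not part of the slides.

```python
from math import comb

C13_ABUNDANCE = 0.0111  # natural abundance of 13C (1.11%)

def carbon_isotope_pattern(n_carbons, max_heavy=3, p=C13_ABUNDANCE):
    """Probability that a molecule holds exactly k 13C atoms, for k = 0..max_heavy."""
    return [
        comb(n_carbons, k) * p**k * (1 - p) ** (n_carbons - k)
        for k in range(max_heavy + 1)
    ]

# Small molecule, mid-sized peptide, larger peptide (carbon counts are examples).
for n in (10, 62, 150):
    pattern = carbon_isotope_pattern(n)
    print(n, [round(x, 3) for x in pattern])
```

With about 60 carbons, only about half of the molecules are all-12C. With 150 carbons, the peak containing one 13C is already taller than the all-12C peak.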

Stable isotopes of most abundant elements of peptides

Element   Mass       Abundance (%)
H          1.0078    99.985
           2.0141     0.015
C         12.0000    98.89
          13.0034     1.11
N         14.0031    99.64
          15.0001     0.36
O         15.9949    99.76
          16.9991     0.04
          17.9992     0.20

Monoisotopic mass and isotopes

We use instruments that resolve the isotopes, enabling us to accurately measure the monoisotopic mass.

[Figure: isotope cluster of the angiotensin I (M+H)+ ion. Peak labels: monoisotopic mass (all 12C, no 13C atoms), which corresponds to the lowest mass peak; one 13C atom; two 13C atoms.]

Angiotensin I (MW = 1295.6), (M+H)+ = C62 H90 N17 O14
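As a worked example, the monoisotopic mass of the angiotensin I (M+H)+ composition can be summed from the masses of the most abundant isotopes in the isotope table. This is a minimal sketch: the four-decimal masses come from that table, the function name is made up for illustration, and the small electron-mass correction for the positive ion is ignored.

```python
# Masses (Da) of the most abundant isotope of each element, from the isotope table.
MONOISOTOPIC_MASS = {"C": 12.0000, "H": 1.0078, "N": 14.0031, "O": 15.9949}

def monoisotopic_mass(composition):
    """Sum of (monoisotopic element mass x atom count) over the composition."""
    return sum(MONOISOTOPIC_MASS[element] * count for element, count in composition.items())

angiotensin_mh = {"C": 62, "H": 90, "N": 17, "O": 14}  # (M+H)+ = C62 H90 N17 O14
mass = monoisotopic_mass(angiotensin_mh)
print(f"{mass:.2f}")  # prints 1296.68
```

Subtracting one hydrogen gives about 1295.68 for the neutral peptide, in line with the MW = 1295.6 quoted on the slide.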

The monoisotopic mass of a molecule is the sum of the accurate masses for the most abundant isotope of each element present. As the number of atoms of any given element increases, the percentage of the population of molecules having one or more atoms of a heavier isotope of this element also increases. The most significant contributors to the isotopic peak pattern for peptides are the 13C isotope of carbon (1.1%) and the 15N isotope of nitrogen (0.36%).

When the isotopes are clearly resolved, the monoisotopic mass is used, as it is the most accurate measurement.

Example of electrospray mass spectrum of mixture of 3 peptides

Peptide 1 (nociceptin 1-6): F-G-G-F-T-G, MW = 584.2
  Isotope spacing = 1.0: ion is singly charged, (M+H)1+
Peptide 2 (nociceptin 1-11): F-G-G-F-T-G-A-R-K-S-A, MW = 1096.2
  Isotope spacing = 0.5: doubly charged, (M+2H)2+
Peptide 3 (nociceptin 1-17): F-G-G-F-T-G-A-R-K-S-A-R-K-L-A-N-Q, MW = 1806.6
  Isotope spacing = 0.3: triply charged, (M+3H)3+
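The link between isotope spacing and charge can be sketched in a few lines. Neighbouring isotope peaks differ by about 1 Da in mass, so they sit about 1/z apart on the m/z axis. The function names and the rounded proton mass below are illustrative assumptions, not values taken from the slides.

```python
PROTON_MASS = 1.00728  # Da, mass added per charge in an (M+zH)z+ ion

def charge_from_isotope_spacing(spacing):
    """Isotope peaks are ~1 Da apart in mass, hence ~1/z apart in m/z."""
    return round(1.0 / spacing)

def expected_mz(mass, charge):
    """m/z of the (M+zH)z+ ion of a molecule of mass M."""
    return (mass + charge * PROTON_MASS) / charge

def neutral_mass(mz, charge):
    """Molecular mass M recovered from the m/z of an (M+zH)z+ ion."""
    return charge * (mz - PROTON_MASS)

# Isotope spacings and molecular weights of the three peptides in the mixture.
for spacing, mw in [(1.0, 584.2), (0.5, 1096.2), (0.3, 1806.6)]:
    z = charge_from_isotope_spacing(spacing)
    print(f"spacing {spacing}: z = {z}, expected m/z = {expected_mz(mw, z):.2f}")
```

Once z is known, `neutral_mass` converts any observed m/z back to the molecular mass, which is how the three peptides can be told apart in one spectrum.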

[Figure: electrospray mass spectrum of the nociceptin peptide mixture. Peak labels include m/z 586.2, m/z 603.5 and a sodium adduct (-H, +Na).]

How we sequence peptides: MS/MS

Peptides eluting from the HPLC column are analyzed in two modes:

MS (mass spectrum of intact peptide parent ions)
  MS 1: scan m/z 350-1200
  Collision cell (off): pass all ions
  MS 2: pass all ions

MS/MS (MS/MS spectrum of doubly charged ion at m/z 836.5)
  MS 1: pass m/z 834-838
  Collision cell (on): fragment all ions
  MS 2: pass all ions

MS/MS means using two mass analyzers (combined in one instrument) to select an analyte (ion) from a mixture, then generate fragments from it to give structural information.

Dominant fragment ions observed by collision-induced dissociation (CID) of peptides

• b ions: direct cleavage of peptide bond
• y ions: rearrangement of mobile proton

Example electrospray MS/MS spectrum of a peptide

F-I-I-V-G-Y-V-D-D-T-Q-F-V-R
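The mass differences annotated on the spectrum of this peptide (Δ=57 Gly, Δ=99 Val and so on) can be read as a sequence ladder. In the sketch below, the peak list is my reading of the labeled peaks as the consecutive y ions y3 to y11, which is an assumption. The lookup table holds only the nominal masses annotated on the slide. At this mass accuracy Δ=128 could also be Lys, and Leu and Ile (Δ=113) can never be told apart.

```python
# Nominal residue masses (Da) for the mass differences annotated on the slide.
RESIDUE_BY_NOMINAL_MASS = {57: "G", 99: "V", 101: "T", 115: "D", 128: "Q", 147: "F", 163: "Y"}

# Labeled peaks read here as consecutive y ions y3..y11 (an assumed pairing).
Y_SERIES = [421.33, 549.46, 650.44, 765.40, 880.46, 979.49, 1142.53, 1199.53, 1298.60]

def read_ladder(peaks):
    """Translate the gaps between consecutive fragment ions into residues."""
    residues = []
    for low, high in zip(peaks, peaks[1:]):
        delta = round(high - low)
        residues.append(RESIDUE_BY_NOMINAL_MASS.get(delta, "?"))
    return "".join(residues)

c_to_n = read_ladder(Y_SERIES)  # each larger y ion adds a residue toward the N-terminus
sequence_tag = c_to_n[::-1]     # reverse to write the tag N- to C-terminus
print(sequence_tag)
```

The recovered tag can be checked against the known sequence F-I-I-V-G-Y-V-D-D-T-Q-F-V-R; a b-ion series would be read in the opposite direction.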

[Figure: electrospray MS/MS spectrum of F-I-I-V-G-Y-V-D-D-T-Q-F-V-R, relative abundance (%) versus m/z (300-1600), shown first with mass differences only and then with fragment ion assignments (y ions 2-11, b ions 2-4 and 6-13). Labeled peaks: 261.30, 421.33, 473.15, 549.46, 650.44, 693.26, 706.62, 765.40, 792.35, 818.64, 880.46, 907.26, 979.49, 1124.44, 1142.53, 1199.53 (100%), 1251.46, 1298.60, 1398.48, 1497.46. Annotated mass differences: Δ=57 Gly, Δ=99 Val, Δ=101 Thr, Δ=115 Asp, Δ=128 Gln, Δ=147 Phe, Δ=163 Tyr.]

Discovery Proteomics: differential expression profiling by MS

Biological Samples (case vs. control)
Protein mixtures:
• Biofluids
• Tissue lysates
• digest to peptides
• fractionate peptides

LC-MS/MS
Separate and analyze peptides by LC-MS/MS:
• m/z and intensity of peptides
• rich pattern
• Fragment ions for sequence

Data Analysis
Search DB using peptide m/z and sequence:
• Peptide identity
• Protein identity
• Relative abundance

Most analyses of proteomes are done by digestion of proteins to peptides (“bottom-up” proteomics)

Advantages:
• Data acquisition easily automated
• Fragmentation of tryptic peptides well understood
• Reliable software available for analysis
• Separation of peptides to create less complex subsets of the proteome for MS analysis is far easier than for proteins (relates to breadth and depth of coverage)

Disadvantages:
• Simple relationship between peptide and protein lost
• Took highly complex mixture and made it 20-100x more complex
• Puts high analytical demands on instrumentation

Obtaining sequence information on intact proteins: “top-down” proteomics

• Most useful for single proteins or relatively simple mixtures (1)
• Can distinguish sequence variants
• Enables deciphering of combinatorial modification “codes” on proteins like histones (2)
• While useful, it's not suitable for most biomedical applications yet:
  – requires highly specialized instrumentation
  – cannot be easily applied to complex biological samples
  – data interpretation is far more difficult and less automated
  – breadth and depth of coverage of the proteome is orders of magnitude less than for bottom-up proteomics

1. Tran et al. Nature 2011, 480(7376): 254-8
2. Tian et al. Genome Biology 2012, 13:R86

A selective look into the proteomic “tool chest”

Reduction and Alkylation
• Routinely done prior to enzymatic digestion to break disulfide bonds, unfolding proteins to make them more susceptible to enzymatic cleavage

Highly Specific Proteases
• Trypsin: C-terminal to Arg and Lys
• Lys-C: C-terminal to Lys
• Staph. V8: C-terminal to Glu and Asp
• Asp-N: N-terminal to Asp

Non-Specific Proteases
• Chymotrypsin: C-terminal to aromatic, aliphatic (e.g., Tyr, Trp, Phe, Leu)
• Proteinase K, Thermolysin: C-terminal to aromatic, aliphatic
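The cleavage rules for the specific proteases are simple enough to sketch as an in-silico digest. The function below is a hypothetical helper that cuts C-terminal to any residue in `cleave_after`. It assumes complete digestion and does not model missed cleavages or the reduced cleavage often seen before proline.

```python
def digest(sequence, cleave_after="KR"):
    """Split a protein sequence C-terminal to each residue in cleave_after."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide without a cleavage residue
        peptides.append(sequence[start:])
    return peptides

protein = "FGGFTGARKSARKLANQ"  # nociceptin 1-17 from the peptide mixture example
print(digest(protein))                    # trypsin rule: after Arg and Lys
print(digest(protein, cleave_after="K"))  # Lys-C rule: after Lys only
```

For this sequence the trypsin rule gives FGGFTGAR, K, SAR, K and LANQ, while the Lys-C rule gives the longer peptides FGGFTGARK, SARK and LANQ.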

Peptide Sequencing by LC/MS/MS

[Figure: sequencing a peptide from a protein of interest. Proteins are reduced, alkylated and digested with trypsin; the peptides are separated by 1D or 2D LC; MS-1 selects the mass of the peptide of interest; the collision cell fragments it; MS-2 gives the sequence of the peptide of interest from its fragment ladder (example ladder: Y, Y-G, Y-G-G, Y-G-G-F, Y-G-G-F-L).]

Automated Peptide Sequencing by LC/MS/MS

[Figure: total ion current trace produced during LC-MS/MS (base peak, full MS, m/z 300-2000; retention time 0-90 min). Two full MS scans, A and B, are expanded; each is followed by MS/MS spectra of four precursor ions selected from it. Precursors shown: m/z 403.9, 493.6, 500.6, 535.8, 616.6, 684.6, 713.0 and 750.0.]
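Picking the most intense precursors from each full MS scan for MS/MS can be sketched as below. This is an illustration of the idea only, not how any instrument's control software is written. The function name is hypothetical and the scan intensities are invented. Real methods typically add rules such as dynamic exclusion of precursors that were already fragmented.

```python
def top_n_precursors(ms1_peaks, n=4, min_intensity=0.0):
    """Return the m/z values of the n most intense peaks in a full MS scan."""
    candidates = [(mz, inten) for mz, inten in ms1_peaks if inten >= min_intensity]
    ranked = sorted(candidates, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in ranked[:n]]

# Hypothetical full MS scan as (m/z, relative intensity) pairs. Some m/z values
# echo the figure, but every intensity here is invented for illustration.
scan = [(403.9, 80.0), (455.2, 12.0), (493.6, 50.0),
        (535.8, 100.0), (616.6, 45.0), (702.3, 8.0)]

for mz in top_n_precursors(scan, n=4):
    print(f"select m/z {mz} for MS/MS")
```

Raising `n` trades a longer cycle time for more MS/MS spectra per full scan.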

Cycle time: 1-2 sec

“Top 4 Method” (modern MS systems can do up to “top 20”)

MS/MS Search Engines: looking up the answer in the back of the book

The acquired MS/MS spectrum is compared with theoretical spectra of peptides from a sequence database (translation of the genome).
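A toy version of the comparison: predict singly charged b and y ions for each candidate peptide and count how many have an observed peak nearby. This shared peak count is a deliberately simple stand-in and is not the scoring function of any named search engine. The residue masses are standard monoisotopic values, the 0.5 Da tolerance is an arbitrary choice, and the observed list is the set of labeled peaks from the example spectrum of F-I-I-V-G-Y-V-D-D-T-Q-F-V-R.

```python
RESIDUE_MASS = {  # monoisotopic residue masses (Da)
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728

def fragment_ions(peptide):
    """Singly charged b and y ion m/z values of an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions + y_ions

def shared_peak_count(observed, peptide, tol=0.5):
    """Number of theoretical fragments with an observed peak within tol Da."""
    return sum(
        any(abs(mz - peak) <= tol for peak in observed)
        for mz in fragment_ions(peptide)
    )

# Labeled peaks of the example MS/MS spectrum (the "acquired" spectrum).
observed = [261.30, 421.33, 473.15, 549.46, 650.44, 693.26, 706.62, 765.40,
            792.35, 818.64, 880.46, 907.26, 979.49, 1124.44, 1142.53, 1199.53,
            1251.46, 1298.60, 1398.48, 1497.46]

candidates = ["ISLLDAQSAPLR", "GDAVFVIDALNR", "FIIVGYVDDTQFVR"]
scores = {pep: shared_peak_count(observed, pep) for pep in candidates}
best = max(scores, key=scores.get)
print(scores)
print("best match:", best)
```

Searching a reversed copy of the database with the same scoring gives a set of known-false matches from which a false discovery rate can be estimated.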

[Figure: the acquired spectrum is correlated with the theoretical spectrum of each candidate database peptide (ISLLDAQSAPLR, VVEELCPTPEGK, DLLLQWCWENGK, ECDVVSNTIIAEK, GDAVFVIDALNR, VPTPNVSVVDLTNR, SYLFCMENSAEK, PEQSDLRSWTAK) to give a similarity score; the result is the best matching database peptide.]

• Determine peptide FDR by searching reversed DB
• Algorithms: MaxQuant, SpectrumMill, X-Tandem…

Rolling peptides up to the protein level

[Figure: Peptides 1-10 linked to the proteins they map to (Prot A, Prot B, Prot D, Prot E, Prot X, Prot Y, Prot Z).]

Slide courtesy of Alexey Nesvizhskii

Examples of a Protein Centric Table (MaxQuant)

Protein   Gene            Unique Peptides   Sequence       Mol. Weight   Ratio   Ratio H/L   p-value
IDs       Names           Rep01             Coverage [%]   [kDa]         H/L     Count
A0ELI5    Edc3            3                 10.2            55.9         1.2      4          0.96
A0MNP4    mCG_9684        1                  3              33.7         1.0      1          0.01
A1A549    Tcf3            3                  5.2            64.0         1.0      5          0.03
A1L013    2510012J08Rik   5                  7.8            90.6         0.78     8          0.54
A1L329    mCG_20206       9                 10.9           109.9         0.86    16          0.78
A1L3B6    mCG_19432       8                 37.4            28.9         0.73    19          0.66

Protein X
Peptide               Ratio H/L
APEPTIDEK             1.0
YKPSTELLIR            1.2
EWERTHEFAASLR         1.6
IAMAPEPTIDER          0.9
GWQIMNCSTYK           0.5
YHTLSSVTYEHLK         1.5
ISEEALARGEPEPTIDEK    1.2
Protein Median        1.2

• Table of values organized around proteins
• A ratio that indicates a fold-change vs. a control condition
• We generate a false discovery rate or p-value statistic for each protein ratio to indicate how different it is from the null hypothesis (unchanged)
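The protein-level ratio for Protein X is the median of its peptide ratios. The sketch below reproduces that roll-up with the peptide values from the slide; only the variable names are assumptions.

```python
from statistics import mean, median

# Peptide H/L ratios listed for Protein X on the slide.
peptide_ratios = {
    "APEPTIDEK": 1.0,
    "YKPSTELLIR": 1.2,
    "EWERTHEFAASLR": 1.6,
    "IAMAPEPTIDER": 0.9,
    "GWQIMNCSTYK": 0.5,
    "YHTLSSVTYEHLK": 1.5,
    "ISEEALARGEPEPTIDEK": 1.2,
}

protein_ratio = median(peptide_ratios.values())
print(protein_ratio)  # prints 1.2, matching the protein median on the slide
print(round(mean(peptide_ratios.values()), 2))  # the mean, for comparison
```

A median is less sensitive than a mean to a single outlying peptide such as GWQIMNCSTYK (0.5).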

• A prioritized list of candidates for follow-up studies

Protein Inference Problem

Shared peptides: map to more than a single entry in protein database

[Figure: one peptide linked to both Prot A and Prot B. Does it come from protein A, protein B, or both?]

In bottom-up proteomics the connectivity between peptides and proteins is lost

Shared peptides are more prevalent with databases of higher eukaryotes due to the presence of:
• related protein family members
• alternative splice forms
• partial sequences

Slide courtesy of Alexey Nesvizhskii

Peptide Quant to Protein Quant

• A peptide could belong to more than one protein
• Go with preponderance of the evidence to assign peptide
  – Occam’s razor principle
• In this case, peptide is assigned to Protein X because there are more peptides supporting it

Peptide               Log2 SILAC Ratio
Protein X
APEPTIDEK             0.12
YKPSTELLIR            0.15
EWERTHEFAASLR         0.07
IAMAPEPTIDER          0.21
GWQIMNCSTYK           0.14
YHTLSSVTYEHLK         0.29
ISEEALARGEPEPTIDEK    0.23
Shared peptide
EITHERWAYK            0.22
Protein Y
SIMPLESEQK            0.77
LITTLEPEPTIDER        0.99

Quantitative Data Drives Modern Proteomics

• Analyze biological replicates of state comparisons
• The end result is always a ratio:
  – WT expression vs. mutant
  – Drug vs. no drug
  – Bait vs. control
• T-statistics or Gaussian modeling drives calling of regulation or specificity

[Figure: scatter plot of replicate 1 (+drug/-drug) versus replicate 2 (+drug/-drug) ratios for unique proteins, with regions labeled “Non-specific” and “Specific, upregulated”.]
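One simple way to turn replicate ratios into a call is a one-sample t statistic on the log2 ratios, testing against zero (no change). This is a sketch of the idea rather than the statistical model used by any particular pipeline. The replicate values are invented and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(log2_ratios, null_value=0.0):
    """t statistic for H0: the mean log2 ratio equals null_value (unchanged)."""
    n = len(log2_ratios)
    standard_error = stdev(log2_ratios) / sqrt(n)
    return (mean(log2_ratios) - null_value) / standard_error

# Hypothetical log2(+drug/-drug) ratios from three biological replicates.
regulated = [1.0, 1.2, 0.8]
unchanged = [0.1, -0.2, 0.1]

print(round(one_sample_t(regulated), 2))
print(round(one_sample_t(unchanged), 2))
```

A large absolute t value means the ratio is consistently away from zero across replicates. With only a few replicates the corresponding p-values are coarse, which is one reason moderated or model-based statistics are often preferred.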

Relative Quantification Methods for Discovery Proteomics

Label-free quantification (1 sample at a time)
• State A and State B are analyzed separately, then identified and quantified
• Need multiple replicates
• Less precise at low abund.

Chemical labeling (up to 10 samples at a time)
• State A and State B are labeled, then combined
• Compression (fractionate)
• Cost

Metabolic labeling (SILAC) (up to 3 samples at a time)
• State A (light) and State B (heavy) are labeled, then combined
• Limited plex-level
• Humans can’t be labeled

An arrow across the three panels indicates increasing precision.

Label-free quantification: spectral counting or peak area

Label-free quantification (State A and State B are run separately): one spectrum with peptide ID that can be linked to a protein = 1 count for that protein. This is the basis of spectral counting.
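Spectral counting amounts to tallying identified spectra per protein in each state. The sketch below uses invented peptide-spectrum matches; the peptide and protein names are placeholders borrowed from the examples in this deck, and the function name is hypothetical.

```python
from collections import Counter

def spectral_counts(psms):
    """Count identified MS/MS spectra per protein: one spectrum = one count."""
    return Counter(protein for _peptide, protein in psms)

# Hypothetical peptide-spectrum matches: (identified peptide, protein it maps to).
state_a = [("APEPTIDEK", "Protein X"), ("YKPSTELLIR", "Protein X"),
           ("APEPTIDEK", "Protein X"), ("SIMPLESEQK", "Protein Y")]
state_b = [("APEPTIDEK", "Protein X"), ("SIMPLESEQK", "Protein Y"),
           ("LITTLEPEPTIDER", "Protein Y"), ("SIMPLESEQK", "Protein Y")]

counts_a, counts_b = spectral_counts(state_a), spectral_counts(state_b)
for protein in sorted(set(counts_a) | set(counts_b)):
    print(protein, counts_a[protein], counts_b[protein])
```

With counts this small a single extra spectrum changes the ratio substantially, which illustrates why the approach is only dependable for moderately to highly abundant proteins.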

• Detection likelihood is tied to abundance
  – Results vary depending on instrument settings and number of peptides in protein
• Only reliable for moderate to highly abundant proteins
  – Lots of missing data, especially for lower abundance proteins
  – Poor precision leads to high FDR
• Low throughput
  – Every sample run separately
  – Triplicate analyses required for stat. confidence
  – Instrument time = $; not inexpensive!

SILAC: Stable Isotope Labeling by Amino acids in Cell culture

Metabolic labeling (SILAC) (up to 3 samples at a time): State A (light) and State B (heavy) are labeled, then combined.

Pros
• Deep, highly precise quant.
• Works well in most cell lines
• Works with all PTMs
• Relatively inexpensive

Cons
• Limited plex level (3 max)
• Not practical for most model systems
• Can’t label humans

6-7 doublings in media depleted of light (12C6)
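A back-of-the-envelope view of why several doublings are needed. It assumes the pre-existing light protein is diluted only by cell growth, with no protein turnover and no remaining source of light amino acid. Real incorporation also depends on turnover and media purity, so this is a lower-bound style sketch with a hypothetical function name.

```python
def residual_light_fraction(doublings):
    """Fraction of pre-existing light protein left after n doublings,
    assuming it is only diluted by growth (each doubling halves it)."""
    return 0.5 ** doublings

for n in (1, 3, 6, 7):
    heavy = 1 - residual_light_fraction(n)
    print(f"{n} doublings: {100 * heavy:.1f}% heavy")
```

Under this model 6 doublings leave about 1.6% light protein and 7 doublings leave about 0.8%.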

• Time course of activation
• Mixing samples improves data and saves instrument time
• ID of p-sites requires MS/MS
• Detects some proteins associated with pY-proteins

Blagoev, Ong et al. (2004) Nature Biotech. 22: 1139

Chemical Labeling of Peptides: Multiplexed Quantification with Isobaric Mass Tag Reagents

[Figure: isobaric mass tag reagent, mass = 145, and an MS/MS spectrum (relative abundance versus m/z 100-950) with an inset of the reporter ion region, m/z 112-118.]

• Mix peptides from all 4 samples: analyze by MS
• MS: same peptide from 4 different samples; observed precursor intensity = Σ of all labeled versions
• MS/MS: reporter ions (labeled peaks at 114.1108, 116.1111 and 117.1145) plus sequence informative fragment ions
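Relative quantification from reporter ions reduces to dividing each channel by a reference channel. The sketch below uses invented intensities; the channel names follow the “114” to “117” labels of the 4-plex example, and taking “114” as the reference is an assumption made for illustration.

```python
def reporter_ratios(intensities, reference="114"):
    """Ratio of each reporter ion channel to the reference channel."""
    ref = intensities[reference]
    return {channel: value / ref for channel, value in intensities.items()}

# Hypothetical reporter ion intensities for one peptide (arbitrary units).
peptide = {"114": 100.0, "115": 40.0, "116": 95.0, "117": 35.0}

ratios = reporter_ratios(peptide)
for channel, ratio in ratios.items():
    print(channel, round(ratio, 2))
```

In this made-up case channels 115 and 117 are reduced relative to 114, while 116 is essentially unchanged.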

iTRAQ Experimental Example

Samples: DMSO (“114”), Kinase Inhib 1 (“115”), Kinase Inhib 2 (“116”), Kinase Inhib 3 (“117”)

Workflow: Lyse and Digest, Label, Pool, Phosphopeptide Enrichment, LCMS

Reporter ion patterns (m/z 112-118):
• Peptide #1: No effect (labeled reporter ions: 116.1111, 114.1108)
• Peptide #2: Sensitive to all inhibitors (labeled reporter ions: 114.1107, 115.1077, 117.1146)
• Peptide #3: Sensitive to inhibitors 1 & 3 (labeled reporter ions: 116.1117, 114.1112, 117.1146)

Isobaric tag reagents with higher multiplex levels now available: increased sample throughput with high sensitivity and good quantitative fidelity

• 3x increased throughput
• Highly consistent quantification results

[Figure: log2 basal/luminal tumor ratios for 9 tumor samples (4 basal; 4 luminal; 1 reference) measured with iTRAQ4, TMT6 and TMT10.]

Analytical challenges of proteomics differ in important ways from transcriptional analysis

Transcriptional Profiling
• All possible features known
• Sample is static during analysis
• All features measured
• Robust means to amplify low numbers of DNA or RNA (PCR)
• Signal not detected means feature not present

MS-based Proteomics
• All possible features not known
• Sample is dynamic during analysis
• 20-50% of features measured
• No protein PCR (analytics have to deal with enormous dynamic range)
• Signal not detected means either that feature not present or feature present but not detected

[Figure: example MS/MS spectrum (Lysozyme, FTMS, m/z 390-2000).]

Discovery defines a reduced set of “sentinel” marks that need to be repeatedly measured in a range of perturbations

Discovery:
• Development
• Drug
• KO/KI
Not all proteins and (especially) PTMs observed in all experiments

Assays:
• Currently: Westerns, immunoassays
• Future: Targeted MS, ImmunoMS
• Highly specific
• Sensitive
• Highly precise
• Multiplexed
• Interference-free
Precisely measure selected analytes in all experiments – no missing data

How do you start a proteomics project?

We meet to discuss your project ([email protected])
• Project proposals are reviewed for scientific merit, technical feasibility and alignment with our interests and the Broad mission
• Discussion of the science and experimental design
• Discussed in detail: what, how and by whom
• All projects are collaborative

Funding:
• Platforms are largely self-supporting and must charge for the work performed. If projects are reviewed favorably but lack funding, we will help investigators explore options for support, including consideration for collaborative funding through the Broad.

www.broadinstitute.org/proteomics

The Broad Institute Proteomics Group

Suggested additional reading

Carr and Annan, 2001. Overview of Peptide and Protein Analysis by Mass Spectrometry. Current Protocols in Molecular Biology 10: 10.21.1–10.21.27.

Aebersold, R. and Mann, M. 2003. Mass spectrometry-based proteomics. Nature 422:198-207.

Cravatt et al. 2007. The biological impact of mass-spectrometry-based proteomics. Nature 450: 991-1000.

Additional resources for MS data interpretation

• Manual de novo tutorials
  – Don Hunt and Jeff Shabanowitz: http://www.ionsource.com/tutorial/DeNovo/DeNovoTOC.htm
  – Rich Johnson: http://www.abrf.org/ResearchGroups/MassSpectrometry/EPosters/ms97quiz/SequencingTutorial.html
• Automated de novo - PEAKS
  – http://www.bioinformaticssolutions.com/peaks/tutorials/denovo.html
  – http://www.youtube.com/watch?v=lyhpRu6s7Ro
• De Novo Sequencing and Homology Searching - Tutorial
  – Ma B, Johnson R. Mol Cell Proteomics 11: O111.014902, 1–16, 2012.
• Modification Site Localization Scoring: Strategies and Performance - Review
  – Chalkley, RJ and Clauser, KR. Mol Cell Proteomics 11, 3-14, 2012.
• Target/Decoy FDR - Tutorial
  – Elias & Gygi, Nature Methods, 4, 207-214, 2007.
• Protein Inference - Tutorial
  – Nesvizhskii, Mol Cell Proteomics, 4, 1419-1440, 2005.

BACKUPS

Example of electrospray mass spectrum of intact protein (beta-Casein)

[Figure: ESI MS spectrum (relative abundance, %, versus m/z 800-2000) showing a series of multiply charged ions with labeled charge states from 8+ to 18+ and an expansion of m/z 1400-1420, next to the mass-scale spectrum (20000-26000). Masses by ESI MS: 23,984 +/- 2, 24,024 +/- 2 and 24,093 +/- 2. By MALDI MS: (M+H)1+ = 24,010 +/- 30.]

M. Mann, C. K. Meng and J. B. Fenn, Anal. Chem. 61, 1702 (1989).
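The charge and mass of an intact protein can be recovered from two adjacent peaks of its charge-state series. For neighbouring ions with charges z and z + 1, m/z(z) = M/z + p and m/z(z+1) = M/(z+1) + p, so (m/z(z) - p) / (m/z(z) - m/z(z+1)) = z + 1. The sketch below applies that relation. The peak positions are computed from a mass of 23,984 rather than read from the figure, and the function names are hypothetical.

```python
PROTON_MASS = 1.00728  # Da

def mz_of(mass, charge):
    """m/z of the (M+zH)z+ ion of a molecule of mass M."""
    return (mass + charge * PROTON_MASS) / charge

def deconvolute_pair(mz_high, mz_low):
    """Charge and mass from two adjacent charge states of the same protein.

    mz_high is the peak with charge z; mz_low is its neighbour with charge z + 1.
    Returns the charge of the mz_low peak and the molecular mass.
    """
    charge = round((mz_high - PROTON_MASS) / (mz_high - mz_low))
    return charge, charge * (mz_low - PROTON_MASS)

# Simulated 12+ and 13+ peaks for a protein of mass 23,984 (not measured values).
peak_12, peak_13 = mz_of(23984.0, 12), mz_of(23984.0, 13)
charge, mass = deconvolute_pair(peak_12, peak_13)
print(f"{peak_12:.2f} and {peak_13:.2f} -> charge {charge}+, mass {mass:.1f}")
```

Averaging the mass obtained from every adjacent pair in the series is what gives an uncertainty as small as +/- 2 Da on a 24 kDa protein.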