Post-translational Modification Identification by

Liwen Zhang Mass Spectrometry and Proteomics Facility The Ohio State University

Summer Workshop 2015 PTM: Chemical Modifications on Increases the Functional Diversity of the Proteome Regulate Activity, Localization and Interaction

Image from https://www.lifetechnologies.com/overview‐post‐translational‐modification.html Histone Modifications

Arginine Methylation Lysine Methylation

Acetylation Sumoylation

K‐GGXXX

Phosphorylation Ubiquinylation

Citrullination

K‐GG Post-translational Modifications

Modified Modification Mass Shift (Da) Function Residue

Disulphide bond C ‐2.0157 formation stability

Methylation K/R 14.0157 regulation of gene expression

Hydroxylation P 15.9949 Protein stability and protein-ligand interaction

Oxidation of Met M 15.9949 usually introduced during digestion

Acetylation K 42.0106 Protein stability, protection of N terminus, regualtion of protein-DNA interactions oxidative stress; preventing irreversible oxidation of protein thiol; control of cell- 305.0682 S‐Glutathionylation C signaling pathways by modulating protein function Signaling, activation/inactivation of activities, modulation of molecular Phosphorylation S/T/Y 79.9663 interactions

Nitration Y 44.9851 oxidative damage during inflammation

Ubiquitination K 114.0429 destruction signal

Oxidation of Cys C 31.9721/47.9847 oxidative/nitrosative stress Nitrosylation C 28.9902 oxidative/nitrosative stress Excreted proteins, cell-cell recognition/signaling O-GlcNAc, reversible, regulatory Glycosylation N/S/T ‐‐‐‐‐‐‐‐ functions Method for PTM Detections

• Specific Fluorescence Dye: Pro-Q Diamond---Phosphorylation Pro-Q Emerald---Glycosylation • Specific Antibodies: Nitration; Phosphorylation; Methylation; Ubiquitination; Acetylation • Mass Shift/PI Change: Gel Approaches

Pros: High sensitivity, Favor the modification Cons: Lack of accuracy Interference from other residues Lack the ability to locate the actual modification sites

Mass Spectrometry!!!!! Modifications Identification by MS/MS

• Virtually anything that shifts the mass can be determined by MS

• Using MS/MS allows for identification of the modification AND location

• You must have an idea of what modification you are looking for

• First use protein stain to examine various modifications

• Western analysis VS MS analysis

Cons: Detection based on ionization efficiency and abundance of the , thus may not favor modified peptides.

• LC-MS/MS allows for the identification of low abundant modifications MSMS Fragments and Different Fragmentation Technique

xn-1 yn-1 zn-1 xn-2 yn-2 zn-2 x1 y1 z1 R O R O R O R O 1 2 n-1 n H2N CCN CC … N CC N CCOH H H H H H H

a1 b1 c1 a2 b2 c2 an-1 bn-1 cn-1

CID: a, b and y ion, loss of H2O, NH3, side-chain PSD: a, b and y ion IRMPD: b and y ion, loss of side-chain ECD: c, y and z ions ETD: c, y and z ions Sequence Nomenclature for Mass Ladder

X b2‐b1=56Da + R2 b3‐b2=56Da + R2 X b1 b2 b3

Mass=56Da+R

y3 y2 y1

y3‐y2=56Da + R2 y2‐y1=56Da + R2 1598

723 965 1166 529 1424 401 852 586 1052 1295 QGH E L S N EER G L L E D L G Y D V V V K

200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 m/z b12 G L L E D L G Y D V V V K y2

1273.72 b12

246.00 y2

200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 m/z b11 b12 G L L E D L G Y D V V V K y3 y2

1174.68 1273.72 b11 b12

99.04 Val

246.00 345.32 y2 y3

99.32 Val

200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 m/z b10 b11 b12 G L L E D L G Y D V V V K y4y3 y2

1075.51 1174.68 1273.72 b10 b11 b12

99.17 99.04 Val Val

246.00 345.32 444.29 y2 y3 y4

99.32 98.97 Val Val

200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 m/z b9 b10 b11 b12 G L L E D L G Y D V V V K y5 y4y3 y2

976.45 1075.51 1174.68 1273.72 b9 b10 b11 b12

99.06 99.17 99.04 Val Val Val

246.00 345.32 444.29 559.27 y2 y3 y4 y5

99.32 98.97 114.98 Val Val Asp

200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 m/z b3 b4 b5 b6b7 b8 b9 b10 b11 b12 G L L E D L G Y D V V V K y11 y10 y9 y8 y7 y6 y5 y4y3 y2

283.82 413.06 528.07 641.08 698.27 861.35 976.45 1075.51 1174.68 1273.72 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12

129.24 115.01 113.01 57.19 163.08 115.10 99.06 99.17 99.04 Glu Asp Leu Gly Tyr Asp Val Val Val

246.00 345.32 444.29 559.27 722.22 779.16 892.36 1007.39 1136.48 1249.63 y2 y3 y4 y5 y6 y7 y8 y9 y10 y11

99.32 98.97 114.98 162.95 56.94 113.20 115.03 129.09 113.19 Val Val Asp Tyr Gly Leu Asp Glu Leu

200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 m/z Database Searching

Mascot •16 node cluster for high-speed data processing

Protein sequences are digested and fragmented In Silico which produces an enormous peak list

Raw MS/MS data is converted to a peak list and compared against the In Silico peak list.

Critical point of database searching •The enzyme must work properly (no non-specific cleavage and missed cuts) •The mass accuracy limits must be set appropriately •Any modifications must be accounted for (modified cysteine) •The database must contain the protein Mascot Database Search Engine

Enzyme Used

20ppm for orbitrap 1.5 Da for LTQ MASCOT Results

Protein Mol. Weight

Protein Mascot # of spectra matched Mowse Score # of peptides matched * Protein Name

The Exponentially Modified Protein Abundance Index (emPAI)

* Example 1 — Phosphorylation Identification by MS/MS

b9 b11 b12 b13 b14 b4 b5 b6 b7 b9* b10 b11* b12* b13* b14* b15* 48 41T S S D N N V S(PO4) V T N V S V A K56 y13 y12 y11 y10y9 y8 y7 y6 y5 y4 y3 y12* y11* y9* bn*=bn-H3PO4 2+ yn*=yn-H3PO4 m/z=850.54 802.432+ 391.12 505.27 619.29 718.37 M-H3PO4 984.39 1085.46 1199.51 1298.46 1385.54 1484.46 b4 b5 b6 b7 266.02 b9 b10 b11 b12 b13 b14 (87+80+99) Asn Asn Val ThrAsn Val Ser Val Ser(PO4)-Val

886.36 1200.51 1386.63 b9* b12* b14* 1287.76 1101.51 b13* 1457.78 b11* b15*

317.28 404.24 503.32 617.36 718.37 817.37 984.39 1083.45 1197.51 1311.59 1426.56 y3 y4 y5 y6 y7 y8 y9 y10 y11 y12 y13 167.02 (87+80) Ser Val Asn Thr Val Val Asn Asn Asp Ser(PO4) 886.36 y9* 1099.56 1213.70 y11* y12*

300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 m/z Example 2 — Nitration Identification by MS/MS

b3 b4 b7 b8b9 b10 b11 b12 246 241E A F P A Y (N2O) R Q P P E S L253 y11 y10 y9 y8 y7 y6 y5 y4 y3 y2

Measured m/z = 775.612+ Theoretical m/z = 775.372+

348.10 445.22 880.42 1008.45 1105.51 1202.57 1331.59 b3 b4 b7 b8 b9 b10 b11 601.992+ b10 Pro Gln Pro Pro Glu

219.15 348.10 445.22 542.31 670.43 826.20 1034.50 1105.51 1202.57 1349.69 y2 y3 y4 y5 y6 y7 y8 y9 y10 y11 675.552+ y11 208.3 (163+45) Glu Pro Pro Gln Arg Ala Pro Phe Tyr(N2O)

666.502+ b11 709.872+ b12

200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 m/z Example 3 — Identification of Cysteine Oxidation by MS/MS y17 y16 y15 y14 y13 y12 y11 y10 y9 y8 y7 y6 y5 y4 529 548 V G S V L Q E G C(SO3) E K I S S L Y G D L R b6 b7 b8 b9 b10 b11 b12 b13 b14 b15 b16 b17 b18 b19

584.26 713.38 1050.21 1178.34 1291.39 1465.42 1578.48 1741.58 1913.59 b6 b7 b10 b11 b12 b14 b15 b16 b18 770.40 921.28 1378.34 1798.75 b8 b9 b13 b17 150.88 (103+48) EGC(SO3) EKISSL Y GD

1014.182+ b19

460.24 623.30 736.40 910.39 1023.55 1151.56 1280.54 1431.51 1617.48 1745.66 1858.62 y4 y5 y6 y8 y9 y10 y11 y12 y14 y15 y16 823.41 1488.48 y7 y13 150.97 (103+48) Y L S S I K E C(SO3) G E QL

873.542+ 930.162+ y15 y16

979.482+ y17

400 600 800 1000 1200 1400 1600 1800 2000 m/z Example 4 — Identification of DMPO Adduct by MS/MS

b4 b5 b6 b7 b8 b10 b11 907 900E E W K W F R C (DMPO) P T L L911 y11 y9 y7

Theoretical m/z=859.942+ 2+ 794.642+ Observed m/z=859.89 2+ 738.12 b11 b10

DMPO=C6H9NO (111.0684 Da)

573.11 759.32 906.56 1062.49 1276.67 1474.67 b4 b5 b6 b7 b8 b10

186.21 97.24 155.93 214.18(103+111) 198.00 (97+101) Trp Phe Arg C-DMPO Pro-Thr

638.892+ b8 638.312+ y9 795.572+ 1274.57 y11 y9

960.47 y7

500 600 700 800 900 1000 1100 1200 1300 1400 1500 m/z PTM Identification: when modification is on the terminus of the peptide

b2 b3 b4 b5 b6b7 b8 b9 b10 b11 b12 b13 b14 44X K E G V V H G V A T V A E K60 y14 y13 y12 y11 y10 y9 y8 y7 y6 y5 y4y3 y2

753.582+ 543.30 b14 935.48 1105.51 b4 b8 b10 357.19 486.30 642.31 741.36 878.49 1034.45 1206.44 1305.59 1376.51 1505.59 b2 b3 b5 b6 b7 b9 b11 b12 b13 b14

129.11 57.00 99.01 99.05 137.13 56.99 97.97 71.06 100.93 99.15 70.92 129.98 E G V V H G V A T V A E

276.20 347.18 446.27 547.27 618.41 717.41 911.36 1010.47 1166.53 1295.49 1423.55 y2 y3 y4 y5 y6 y7 774.38 y9 y10 1109.47y12 y13 y14 y8 y11

70.98 99.09 101.00 71.13 99.00 56.97 136.98 99.11 99.00 57.06 128.96 128.06 A V T A V G H V V G E K 712.632+ y14

200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 m/z PTM Identification: when modification is on the terminus of the peptide

b2 b3 b4 b5 b6b7 b8 b9 b10 b11 b12 b13 b14 44X K E G V V H G V A T V A E K60 y14 y13 y12 y11 y10 y9 y8 y7 y6 y5 y4y3 y2

Mass of X=(M+H)-y14=826.45*2-1-1423.55=228.35 Mass of X=b2-Mass of Lysine-H=357.19-128.02-1.01=228.16

753.582+ 543.30 b14 935.48 1105.51 b4 b8 b10 357.19 486.30 642.31 741.36 878.49 1034.45 1206.44 1305.59 1376.51 1505.59 b2 b3 b5 b6 b7 b9 b11 b12 b13 b14

129.11 57.00 99.01 99.05 137.13 56.99 97.97 71.06 100.93 99.15 70.92 129.98 E G V V H G V A T V A E

276.20 347.18 446.27 547.27 618.41 717.41 911.36 1010.47 1166.53 1295.49 1423.55 y2 y3 y4 y5 y6 y7 774.38 y9 y10 1109.47y12 y13 y14 y8 y11

70.98 99.09 101.00 71.13 99.00 56.97 136.98 99.11 99.00 57.06 128.96 128.06 A V T A V G H V V G E K 712.632+ y14

200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 m/z PTM Identification: when modification is on the terminus of the peptide

b2 b3 b4 b5 b6b7 b8 b9 b10 b11 b12 b13 b14 44X K E G V V H G V A T V A E K60 y14 y13 y12 y11 y10 y9 y8 y7 y6 y5 y4y3 y2

Modification occurs on lysine residue Mass of modification =Mass of X-Mass of Lysine=228.16-128.09=100.05Da Succinylation: Addition of –COCH2CH2CO- to lysine residue (+100.0115Da)

Theoretical m/z 826.45432+ Observed m/z 826.45732+ Mass Error 3.63 ppm

High mass accuracy instrument needed Use high mass accuracy for MS/MS too MSn can be performed to resolve the structure of modification if necessary Identification of PTMs:

Know the Modification (Stability, Mass Shift, possible modification site)

High Samples Purity

High Concentration

High Mass Accuracy/Resolution Difficulties for PTMs Identification

 Generating Peptides containing modified residue --Different /Digestion Condition

 Low population of modifications (usually less than 1%) --Enrichment --Targeted Method

 Modification not stable --Experiment Condition (reducing reagents, enzymes) --Fresh sample

 Modification labile in MS analysis --Neutral Loss Method --Different Fragmentation Method Enzymes for Proteome Research

Trypsin K-X and R-X

Endoprotease Lys-C K-X except when X = P

Endoprotease Arg-C R-X except when X = P

Endoprotease Asp-N X-D

Endoprotease Glu-C E-X except when X = P

Chymotrypsin L-X, F-X, Y-X and W-X

Cyanogen Bromide X-M Using the CORRECT Enzyme

1 MAALKLLSSG LRLGASARSS RGALHKGCVC YFSVSTRHHT KFYTDPVEAV 51 KDIPNGATLL VGGFGLCGIP ENLIGALLKT GVKDLTAVSN NAGVDNFGLG 101 LLLRSKQIKR MISSYVGENA EFERQFLSGE LEVELTPQGT LAERIRAGGA 151 GVPAFYTSTG YGTLVQEGGS PIKYNKDGSV AIASKPREVR EFNGQHFILE 201 EAITGDFALV KAWKADRAGN VIFRKSARNF NLPMCKAAGT TVVEVEEIVD 251 IGSFAPEDIH IPKIYVHRLI KGEKYEKRIE RLSLRKEGDG KGKSGKPGGD 301 VRERIIKRAA LEFEDGMYAN LGIGIPLLAS NFISPNMTVH LQSENGVLGL Phosphorylation 351 GPYPLKDEAD ADLINAGKET VTVLPGASFF SSDESFAMIR GGHVNLTMLG 401 AMQVSKYGDL ANWMIPGKMV KGMGGAMDLV SSSKTKVVVT MEHSAKGNAH 451 KIMEKCTLPL TGKQCVNRII TEKGVFDVDK KNGLTLIELW EGLTVDDIKK 501 STGCDFAVSP NLMPMQQIST

Tryspin: 309AALEFEDGMYANLGIGIPLLASNFISPNMTVHLQSENGVLGLGPYPLK356 Chymotrypsin: 329ASNF332, GluC: 315DGMYANLGIGIPLLASNFISPNMTVHLQSE344 Optimize Digestion Conditions 1 MADESSDAAG EPQPAPAPVRRRSSANYRAY ATEPHAKKKS KISASRKLQL 51 KTLMLQIAKQ EMEREAEER R GEKGRVLRTR CQPLELDGLG FEELQDLCRQ 101 LHARVDKVDE ERYDVEAKVT KNITEIADLT QKIYDLRGKF KRPTLRRVRI 151 SADAMMQALL GTRAKESLDL RAHLKQVKKE DIEKENREVG DWRKNIDALS 201 GMEGRKKKFE G Sequence coverage: 97.2%

Mr(calc Start Observed Mr(expt) ppm M Score Peptide ) AA2-20 932.9369 1863.8592 1863.9 -0.34 0 104 M.ADESSDAAGEPQPAPAPVR.R AA2-21 674.3265 2019.9576 2020 -1.67 1 95 M.ADESSDAAGEPQPAPAPVRR.R AA2-22 726.3605 2176.0598 2176.1 -1.06 2 66 M.ADESSDAAGEPQPAPAPVRRR.S AA23-37 555.9385 1664.7936 1664.8 1.78 1 55 R.SSANYRAYATEPHAK.K AA23-39 641.3364 1920.9874 1921 3.59 3 44 R.SSANYRAYATEPHAKKK.S AA29-37 494.2502 986.4858 986.48 3.69 0 26 R.AYATEPHAK.K AA29-39 622.3464 1242.6782 1242.7 4.97 2 44 R.AYATEPHAKKK.S AA29-41 486.9404 1457.7994 1457.8 0.25 3 35 R.AYATEPHAKKKSK.I ; 1:200; RT; 20 min AA29-46 658.37 1972.0882 1972.1 1.43 4 34 R.AYATEPHAKKKSKISASR.K AA47-59 382.7495 1526.9691 1527 7.14 2 35 R.KLQLKTLMLQIAK.Q Digestion AA52-64 530.952 1589.8341 1589.8 4.51 1 85 K.TLMLQIAKQEMER.E AA52-69 735.7083 2204.1029 2204.1 4.57 2 49 K.TLMLQIAKQEMEREAEER.R AA60-69 436.1964 1305.5675 1305.6 4.31 1 25 K.QEMEREAEER.R AA76-99 730.1263 2916.476 2916.5 5.98 2 31 R.VLRTRCQPLELDGLGFEELQDLCR.Q AA79-99 850.4134 2548.2183 2548.2 5.24 1 95 R.TRCQPLELDGLGFEELQDLCR.Q AA81-99 1146.5381 2291.0616 2291.1 2.36 0 115 R.CQPLELDGLGFEELQDLCR.Q AA81-104 725.1068 2896.3981 2896.4 0.75 1 69 R.CQPLELDGLGFEELQDLCRQLHAR.V AA105-118 565.6145 1693.8217 1693.8 3.43 2 41 R.VDKVDEERYDVEAK.V AA105-121 1012.028 2022.0415 2022 7.19 3 39 R.VDKVDEERYDVEAKVTK.N AA119-132 787.4489 1572.8833 1572.9 7 1 63 K.VTKNITEIADLTQK.I AA119-137 745.4208 2233.2405 2233.2 3.91 2 65 K.VTKNITEIADLTQKIYDLR.G AA119-139 605.5953 2418.3522 2418.3 1.68 3 54 K.VTKNITEIADLTQKIYDLRGK.F AA122-132 623.3372 1244.6599 1244.7 -1.06 0 79 K.NITEIADLTQK.I AA122-137 953.5199 1905.0252 1905 2.38 1 66 K.NITEIADLTQKIYDLR.G AA122-139 697.7211 2090.1414 2090.1 2.03 2 48 K.NITEIADLTQKIYDLRGK.FQ ALLGTR.A AA164-171 466.2661 930.5176 930.51 4.52 1 58 R.AKESLDLR.A AA138-146 368.2345 1101.6816 1101.7 4.08 2 26 R.GKFKRPTLR.R AA138-147AA176-187 420.2665 505.9445 1257.7776 1514.8118 1257.8 1514.8 -0.46 4.36 3 3 26 R.GKFKRPTLRR.V 59 K.QVKKEDIEKENR.E AA148-163AA188-205 866.964 678.3385 1731.9134 2031.9937 1731.9 2032 0.61 6.93 1 2 79 R.VRISADAMMQALLGTR.A 43 R.EVGDWRKNIDALSGMEGR.K AA150-163AA194-205 739.3765 645.8306 1476.7385 1289.6466 1476.7 1289.6 -2.94 5.29 0 107 1 R.ISADAMM 78 R.KNIDALSGMEGR.K AA195-205 581.7822 1161.5499 1161.5 4.39 0 95 K.NIDALSGMEGR.K AA195-206 645.8309 1289.6472 1289.6 5.77 1 54 K.NIDALSGMEGRK.K AA195-207 709.878 1417.7414 1417.7 4.74 2 46 K.NIDALSGMEGRKK.K AA195-211 627.329 1878.9651 1879 1.58 4 29 K.NIDALSGMEGRKKKFEG.- Difficulties for PTMs Identification

 Generating Peptides containing modified residue --Different Enzymes/Digestion Condition

 Low population of modifications --Enrichment --Targeted Method (For better MSMS) PTM Enrichment Techniques (Start with mg of samples)

Acetylation: Anti-acetyl lysine polyclonal antibody

Phosphopeptides: Immobilized Metal Affinity Chromatography (Fe3+) Metal Oxide Affinity Chromatography (TiO2, ZrO2) Reversible Covalent Binding (Techniques for phosphopeptide enrichment prior to analysis by mass spectrometry. Mass Spectrom Rev. 2010 Jan- Feb;29(1):29-54.) Phospho-Threonine Antibody (P-Thr-Polyclonal)

Nitration: Anti‐Nitrotyrosine polyclonal antibody

Ubiquitination: K-ε-GG–specific antibody (enrichment has to be done prior to other treatment on lysine)

Glycopeptides: Lectin affinity enrichment Covalent Interactions Chromatographic separation (Glycopeptide enrichment and separation for protein glycosylation analysis., J Sep Sci. 2012 Sep;35(18):2341-72).

Methylation: Anti-methyl lysine/araginine antibody Enrichment for Phosphorylation

extract cells Cell Culture

cell lysis

TiO2 beads peptides

In-solution digestion protein

2D LC-MS/MS Phosphopeptides/proteins Identification

Slide courtesy of Nilini S. Ranbaduge NL: 4.05E7 Base Peak m/z= 516.7690- 516.7896 MS 17625_1_rer un

NL: 5.98E5 Example---Targeted MS Scan Base(for Peak Known Modifications) m/z= 479.2634- RT: 19.61 - 39.93 27.80 100 479.2826 90 27.74 MS 2+ 80 Full Scan 27.69 27.93 27.96 SIC of 516.7793 (SLVLC(CAM)TPSR) 70 17625_1_rer 60 28.01 un 50 27.61 28.06 e c

n 100:1 a d

n 27.55 u b A

e 40 tiv la

e 27.50 R 30 28.13 27.44 20 27.38 28.21 10 27.30 28.36 20.9021.45 22.89 23.82 25.54 30.13 25.95 27.19 28.56 29.07 31.48 33.6734.17 36.04 38.15 38.65 100 0 27.80 90 80 27.74 70 27.66 27.82 27.96 60 2+ 50 27.61 SIC of 479.2730 (SLVLfGlyTPSR ) 27.55 28.01 40 28.06 30 27.50 35.91 39.60 27.44 28.13 36.09 20 27.36 39.55 10 27.25 28.21 31.90 32.09 39.49 20.02 21.30 21.75 23.65 24.59 27.0635.59 33.86 28.64 25.85 31.76 33.41 36.25 38.68 0 20 22 24 26 28 30 32 34 36 38 Time (min)

2+ 2+ MS/MS of 516.7793 (SLVLC(CAM)TPSR) MS/MS of 479.2730 (SLVLfGlyTPSR )

17625_1_rerun # 2715-9589 RT: 18.73-51.92 AV: 99 NL: 5.71E3 T: Average spectrum MS2 516.79 (2715-9589) 733.45 100 95 90 85 620.39 80 507.76 NOT Observed!!!! 75 70 65 60

55 832.54 50 45 Relative Abundance 40 Cys(CAM)462.29 Leu Val 35 30 416.76 25 201.05 487.72

20 359.31 544.28 575.37 173.20 Thr 15 Val Leu Cys(CAM)674.26 Thr 10 255.07 423.26 347.30 367.27300.06 704.44 407.78 413.05 602.00 5 288.20 319.31 656.23 886.49 213.06 804.48 773.42 0 150 200 250 300 350 400 450 500 550 600 650 700 750 800 850 900 m/z NL: 2.47E8 Base Peak m/z= 85.0000- 2000.0000 MS 17625_1_rer un

Example---Targeted MS Scan (for Known Modifications) NL: 8.17E8 RT: 19.94 - 50.13 100 Base Peak 35.88 90 Full Scan MS 17625_1_Ta 42.95 49.23 80 41.90 43.66 49.07 70 rget_2 44.67 40.05 35.99 60 36.06 40.1540.20 36.27 45.51

e 50 c n a

d 28.36 n u b A e 40 tiv la e R 30 28.13 28.54 28.21 34.20 36.46 38.26 46.39 20 28.67 32.04 33.80 46.58 28.08 31.93 32.09 36.57 47.16 10 21.16 23.85 24.09 27.55 28.90 47.47 0 29.16 41.95 100 2+ 2+ 90 Targeted @ 479.2730 (SLVLfGlyTPSR )/516.7793 (SLVLC(CAM)TPSR) 80 70 60 50 40 44.90 30 44.81 20 35.82 40.13 35.9541.86 43.30 49.38 10 36.3928.69 40.32 46.00 45.16 48.12 24.70 25.82 28.26 32.14 32.44 36.59 38.21 39.95 34.25 0 20.00 22.08 28.92 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 Time (min)

42.09 MS/MS of 479.27302+ (SLVLfGlyTPSR ) 2+ MS/MS of 516.7793 (SLVLC(CAM)TPSR) 17625_1_Target_2 #667-2166 RT: 19.74-39.25 AV: 264 NL: 3.92E4 17625_1_Target_2 # 799-1295 RT: 22.03-28.48 AV: 40 NL: 1.03E3 T: ITMS + c ESI d Full ms2 [email protected] [130.00-1045.00] T: ITMS + c ESI d Full ms2 [email protected] [120.00-970.00] 733.44 581.28 100 100

95 95 359.31 90 620.39 90 85 85

80 80

75 75

70 70

65 65 832.51 60 60 55 55 507.74 50 50 45 45 40 Relative Abundance 416.77

40 Relative Abundance 470.22 35 35 Cys(CAM)Leu Val 30 460.28 30 460.36 201.05 25 757.45 25 Thr fGly 545.38 Leu 658.34 Val 20 359.31 413.05 20 173.20 300.06 Thr 573.18 15 402.85 15 413.10 299.98 328.66 10 201.02 631.30 674.27 446.40 255.09 173.19 Val Leu 594.32 10 Val Leu Cys(CAM) Thr 388.76 517.26 5 255.10 671.51 fGly Thr 730.39 555.09 229.03 781.32 5 656.23 715.43Pro 702.58 213.08 407.74 498.76 545.24 602.35 814.46Ser 858.31 0 262.26 310.81 430.13 689.49 771.31 886.44 150 200 250 300 350 400 450 500 550 600 650 700 750 800 0 m/z 150 200 250 300 350 400 450 500 550 600 650 700 750 800 850 900 m/z Difficulties for PTMs Identification

 Generating Peptides containing modified residue --Different Enzymes/Digestion Condition

 Low population of modifications --Enrichment --Targeted Method

 Modification not stable --Experiment Condition (reducing reagents, enzymes) --Fresh sample Adjust Experiment Condition to Preserve Modifications

y12Ay11A y10A y9A y8A y6A y5A y4A y3A y2A 185 199 Chain A N A C187 G S G Y D F D V F V V R S S y5B y3B 138L V E G C L V G G R146 Chain B b3B 142 A 274.25 373.31 520.42 619.48 734.51 996.57 1159.62 1216.57 1303.641360.66 y2A y3A y4A y5A y6A y8A y9A y10A y11A y12A

VFVD FDYGSG

B 408.17 465.20 552.31 609.30 772.30 1034.35 1149.39 1248.41 1395.53 1494.50 1593.59

G S G Y FD D V F V V

289.23342.33 501.10 y3B b3B y5B

250 400 600 800 1000 1200 1400 1600 m/z Using Different Enzyme to Protect Modifications

y12 y10 y9 y8 y7 y6 y5 y4 y3 202 194A G R P G M G V O G P E T S L208 b3 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 Theoretical m/z =783.40892+, Observed m/z =783.40802+, Mass Error =1.15ppm O=Pyrrolysine

285.21 439.23 570.30 627.43 726.35 963.39 1020.49 1117.55 1246.54 1347.71 b3 b5 b6 b7 b8 b9 b10 b11 b12 b13 154.13 (97+57) 130.97 57.13 98.92 237.04 57.10 97.06 128.99 101.17 P‐G M G V O G P E T 674.472+ b13 718.062+ b14

623.822+ b12

320.13 449.05 546.24 603.21 840.38 939.24 996.30 1127.37 1281.45 y3 y4 y5 y6 y7 y8 y9 y10 y12

128.92 97.19 56.97 237.17 98.86 57.06 131.07 154.08 (57+97) E P G O V G M G‐P

200 300 400 500 600 700 800m/z 900 1000 1100 1200 1300 1400 LTQ DATA—Fresh Sample

1 MSDKSELKAE LERKKQRLAQ IREEKKRKEE ERKKKETDQK KEAVAPVQEE 51 SDLEKKRREA EALLQSMGLT PESPIVFSEY WVPPPMSPSS KSVSTPSEAG 101 SQDSGDGAVG SRTLHWDTDP SVLQLHSDSD LGRGPIKLGM AKITQVDFPP 151 REIVTYTKET QTPVMAQPKE DEEEDDDVVA PKPPIEPEEE KTLKKDEEND 201 SKAPPHELTE EEKQQILHSE EFLSFFDHST RIVERALSEQ INIFFDYSGR 251 DLEDKEGEIQ AGAKLSLNRQ FFDERWSKHR VVSCLDWSSQ YPELLVASYN 301 NNEDAPHEPD GVALVWNMKY KKTTPEYVFH CQSAVMSATF AKFHPNLVVG 351 GTYSGQIVLW DNRSNKRTPV QRTPLSAAAH THPVYCVNVV GTQNAHNLIS 401 ISTDGKICSW SLDMLSHPQD SMELVHKQSK AVAVTSMSFP VGDVNNFVVG 451 SEEGSVYTAC RHGSKAGISE MFEGHQGPIT GIHCHAAVGA VDFSHLFVTS 501 SFDWTVKLWT TKNNKPLYSF EDNADYVYDV MWSPTHPALF ACVDGMGRLD 551 LWNLNNDTEV PTASISVEGN PALNRVRWTH SGREIAVGDS EGQIVIYDVG 601 EQIAVPRNDE WARFGRTLAE INANRADAEE EAATRIPA Sequence Coverage 39%; T95 was phosphorylated Orbitrap XL Data‐‐‐Two Days old, Stored at 4C

1 MSDKSELKAE LERKKQRLAQ IREEKKRKEE ERKKKETDQK KEAVAPVQEE 51 SDLEKKRREA EALLQSMGLT PESPIVFSEY WVPPPMSPSS KSVSTPSEAG 101 SQDSGDGAVG SRTLHWDTDP SVLQLHSDSD LGRGPIKLGM AKITQVDFPP 151 REIVTYTKET QTPVMAQPKE DEEEDDDVVA PKPPIEPEEE KTLKKDEEND 201 SKAPPHELTE EEKQQILHSE EFLSFFDHST RIVERALSEQ INIFFDYSGR 251 DLEDKEGEIQ AGAKLSLNRQ FFDERWSKHR VVSCLDWSSQ YPELLVASYN 301 NNEDAPHEPD GVALVWNMKY KKTTPEYVFH CQSAVMSATF AKFHPNLVVG 351 GTYSGQIVLW DNRSNKRTPV QRTPLSAAAH THPVYCVNVV GTQNAHNLIS 401 ISTDGKICSW SLDMLSHPQD SMELVHKQSK AVAVTSMSFP VGDVNNFVVG 451 SEEGSVYTAC RHGSKAGISE MFEGHQGPIT GIHCHAAVGA VDFSHLFVTS 501 SFDWTVKLWT TKNNKPLYSF EDNADYVYDV MWSPTHPALF ACVDGMGRLD 551 LWNLNNDTEV PTASISVEGN PALNRVRWTH SGREIAVGDS EGQIVIYDVG 601 EQIAVPRNDE WARFGRTLAE INANRADAEE EAATRIPA

Sequence Coverage 53%; No Phosphorylation Detected

Detected 30 times!!! Difficulties for PTMs Identification

 Generating Peptides containing modified residue --Different Enzymes/Digestion Condition

 Low population of modifications --Enrichment --Targeted Method

 Modification not stable --Experiment Condition (reducing reagents, enzymes) --Fresh sample

 Modification labile in MS analysis --Neutral Loss Method --Different Fragmentation Method Neutral Loss Scan ScanDissociate Scan

http://www.biotechniques.com/BiotechniquesJournal/2006/June/Analysis-of-posttranslational-modifications-of- proteins-by-tandem-mass-spectrometry YKVPQLEIVPNsAEER Alpha Casein

NL of Phosphorylation: 98 (+1), 49 (+2)…

1070.62 1561.73 b9 y13 1777.84 391.23 504.24 b15 b3 y4

1463.75 y13* 784.36 y7* • Help Identify the Sequence but don’t Give Information about the Location of the

1070.62 Modification if Multiple Sites 883.43 b9 y8* Coexisting. • Sensitive not Ideal 391.23 996.51 b3 y9* 1125.55 1238.64 y10* y11* 292.16 1562.82 b2 y14*

Hydrothermal synthesis of α-Fe2O3@SnO2 core–shell nanotubes for highly selective enrichment of phosphopeptides for mass spectrometry analysis Nanoscale, 2010,2, 1892-1900 ECD/ETD: Keep the Phosphorylation Intact!!!

H3PO4 Group may Leave upon CID Dissociation; Actual Location of H3PO4 Group May not be Clear if >2 Sites in the Sequence

xn-1 yn-1 zn-1 xn-2 yn-2 zn-2 x1 y1 z1

R O R O R O R O 1 2 n-1 n H2N CCN CC … N CC N CCOH H H H H H H

a1 b1 c1 a2 b2 c2 an-1 bn-1 cn-1

CID: a, b and y ion, loss of H2O, NH3, side-chain ECD/ETD: c, y and z ions

Electron Capture Dissociation (ECD, ICR)/ Electron Transfer Dissociation (ETD, Traps) will Generate Peptide Fragments without Losing the Sidechain Phosphorylation

Sweet, Anal. Chem. 2006, 78, 7563-7569 Leann M. Mikesh et al. Biochimica et Biophysica Acta 1764 (2006) 1811–1822 http://openi.nlm.nih.gov/detailedresult.php?img=2849712_zjw0031035580003&req=4 Choose a Good Enzyme to Generate The GG Tag —the Identification of Ubiquitination by MS/MS

MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ QRLIFAGKQL EDGRTLSDYN IQKESTLHLV LRLRGG Trypsin!!!

+3Ub +2Ub +1Ub Unmodified Choose a Good Enzyme to Generate Signature Modification Group —the Identification of Ubiquitination by MS/MS

A proteomics approach to understanding protein ubiquitination Nature Biotechnology 21, 921 - 926 (2003) Unknown Truncation Sites Identification

Separate Suspected Truncation Product(s) from Intact Protein

Separation on 1D SDS PAGE Intact MS Measurement Cut the band (truncated protein) Digestion Top down Sequencing LC/MSMS

Peptide Sequencing MASCOT Protein sequence

Suggested truncation sites

Search against new sequences

Validation Example---Unknown Truncation Sites Identification

MASCOT RESULT: 1 MVLWILWRPF GFSGRFLKLE SHSITESKSL IPVAWTSLTQ MLLEAPGIFL 51 LGQRKRFSTM PETETHERET ELFSPPSDVR GMTKLDRTAF KKTVNIPVLK 101 VRKEIVSKLM RSLKRAALQR PGIRRVIEDP EDKESRLIML DPYKIFTHDS 151 FEKAELSVLE QLNVSPQISK YNLELTYEHF KSEEILRAVL PEGQDVTSGF 201 SRIGHIAHLN LRDHQLPFKH LIGQVMIDKN PGITSAVNKI NNIDNMYRNF 251 QMEVLSGEQN MMTKVRENNY TYEFDFSKVY WNPRLSTEHS RITELLKPGD 301 VLFDVFAGVG PFAIPVAKKN CTVFANDLNP ESHKWLLYNC KLNKVDQKVK 351 VFNLDGKDFL QGPVKEELMQ LLGLSKERKP SVHVVMNLPA KAIEFLSAFK 401 WLLDGQPCSS EFLPIVHCYS FSKDANPAED VRQRAGAVLG ISLEACSSVH 451 LVRNVAPNKE MLCITFQIPA SVLYKNQTRN PENHEDPPLK RQRTAEAFSD 501 EKTQIVSNT ~260AA =~27‐28kDa Conservation of structure and mechanism by Trm5 enzymes RNA 2013 Sep; 19(9): 1192–1199 Example---Unknown Truncation Sites Identification

Constructed new sequences (a total of 24) and searched the data against these sequences: >16502 Sequence 1 (241-509) NNIDNMYRNFQMEVLSGEQNMMTKVRENNYTYEFDFSKVYWNPRLSTEHSRITELLKPGDVLFDVFAGVGPFAIP VAKKNCTVFANDLNPESHKWLLYNCKLNKVDQKVKVFNLDGKDFLQGPVKEELMQLLGLSKERKPSVHVVMNLPA KAIEFLSAFKWLLDGQPCSSEFLPIVHCYSFSKDANPAEDVRQRAGAVLGISLEACSSVHLVRNVAPNKEMLCITFQI PASVLYKNQTRNPENHEDPPLKRQRTAEAFSDEKTQIVSNT < >16502 Sequence 2 (242-509) NIDNMYRNFQMEVLSGEQNMMTKVRENNYTYEFDFSKVYWNPRLSTEHSRITELLKPGDVLFDVFAGVGPFAIPVA KKNCTVFANDLNPESHKWLLYNCKLNKVDQKVKVFNLDGKDFLQGPVKEELMQLLGLSKERKPSVHVVMNLPAKAI EFLSAFKWLLDGQPCSSEFLPIVHCYSFSKDANPAEDVRQRAGAVLGISLEACSSVHLVRNVAPNKEMLCITFQIPAS VLYKNQTRNPENHEDPPLKRQRTAEAFSDEKTQIVSNT < …..

>16502 Sequence 23 (263-509) TKVRENNYTYEFDFSKVYWNPRLSTEHSRITELLKPGDVLFDVFAGVGPFAIPVAKKNCTVFANDLNPESHKWLLYN CKLNKVDQKVKVFNLDGKDFLQGPVKEELMQLLGLSKERKPSVHVVMNLPAKAIEFLSAFKWLLDGQPCSSEFLPI VHCYSFSKDANPAEDVRQRAGAVLGISLEACSSVHLVRNVAPNKEMLCITFQIPASVLYKNQTRNPENHEDPPLKRQ RTAEAFSDEKTQIVSNT < >16502 Sequence 24 (264-509) KVRENNYTYEFDFSKVYWNPRLSTEHSRITELLKPGDVLFDVFAGVGPFAIPVAKKNCTVFANDLNPESHKWLLYNC KLNKVDQKVKVFNLDGKDFLQGPVKEELMQLLGLSKERKPSVHVVMNLPAKAIEFLSAFKWLLDGQPCSSEFLPIV HCYSFSKDANPAEDVRQRAGAVLGISLEACSSVHLVRNVAPNKEMLCITFQIPASVLYKNQTRNPENHEDPPLKRQR TAEAFSDEKTQIVSNT <

MASSMATRIX Results: hit# score decoy% protein description hit1 3533 0.00% 16502 Sequence 12 (252-509) hit2 3522 0.00% 16502 Sequence 23 (263-509) hit3 3519 0.00% 16502 Sequence 14 (254-509) hit4 3512 0.00% 16502 Sequence 21 (261-509) b2 b3 b4 b5b6 b8 b9 b10 b11 b12 M E V L S G E Q N M M T K 252 y12 y11 y10 y9 y8 y5 y4 y3 y2 264 Observed m/z =749.34412+ 2+ Theoretical m/z =749.3409 1025.27 Mass Error=4.27ppm y9

1400 248.21 379.20 510.34 624.17 938.20 1138.37 1237.34 1366.52 y2 y3 y4 y5 y8 y10 y11 y12 1200 130.99 131.14 113.83 314.03 (57+129+128) 87.07 113.10 98.97 129.18 Met Met Asn Gly-Glu-Gln Ser Leu Val Leu

800

Intensity 260.92 360.13 473.13 560.08 617.17 874.34 988.20 1119.23 1250.21 1351.26 b2 b3 b4 b5 b6 b8 b9 b10 b11 b12

400 99.21 113.00 86.95 57.09 257.17(128+129) 113.86 131.03 130.98 101.05 Val Leu Ser Gly Glu-Gln Asn Met Met Thr

200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 m/z Conservation of structure and mechanism by Trm5 enzymes RNA. 2013 Sep; 19(9): 1192–1199 b2 b3 b5b7 b8 b9 b10 V L S G E Q N M M T K 254 y10 y9 y8 y7 y6 y5 y4 y3 y2 264 Observed m/z =619.30162+ Theoretical m/z =619.29942+ Mass Error=3.55ppm 1025.23 y9 610.232+ M‐H2O 5000

248.11 379.17 510.11 624.08 752.17 881.22 938.30 1138.25 y2 y3 y4 y5 y6 y7 y8 y10

4000 131.06 130.94 113.97 128.09 129.05 57.08 86.93 113.02 Met Met Asn Gln Glu Gly Ser Leu

3000 213.01 300.13 486.24 728.26 859.15 990.43 1091.04 Intensity b2 b3 b5 b7 b8 b9 b10

2000

87.12 186.11 (57+129) 242.02 (128+114) 130.89 131.28 100.61 1000 Ser Gly-Glu Gln-Asn Met Met Thr

200 300 400 500 600 700 800 900 1000 1100 1200 m/z

Conservation of structure and mechanism by Trm5 enzymes RNA. 2013 Sep; 19(9): 1192–1199 b2 b3 b4 b6 b7 b9 b10 b11 b13 b14 b15 b17 261M M T K V R E N N Y T Y E F D F S K278 y17 y16 y15 y14 y13 y12 y11 y10 y9 y8 y7 y6 y5 y4 y3 y2

906.412+ 1020.812+ Observed m/z =768.35733+ y14 y16 856.742+ 906.412+ 1086.462+ Theoretical m/z =768.35393+ 1350 y13 y15 y17 Mass Error=4.42ppm 1250 99.34 127.78 101.02 131.30 Val Lys Thr Met

234.08 381.14 496.40 643.26 772.16 935.14 1036.28 1199.31 1313.47 1427.34 1556.88 1000 y2 y3 y4 y5 y6 y7 y8 y9 y10 y11 y12 961.622+ b15

147.06 115.26 146.86 128.90 162.98 101.14 163.03 114.16 113.87 129.54 750 Phe Asp Phe Glu Tyr Thr Tyr Asn Asn Glu

904.472+ 2+ 2+ 2+ 2+ Intensity b14 438.92 552.89 634.302+ 684.622+ 830.87 1078.65 b7 b9 b10 b11 b13 b17 500 263.06 364.12 492.11 747.66 b2 b3 b4 b6

250 101.06 127.99 255.55 (99+156) Thr Lys Val-Arg

200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 m/z Conservation of structure and mechanism by Trm5 enzymes RNA. 2013 Sep; 19(9): 1192–1199 b4 b3 b6b7 b8 b9 b10 b11 b13 b14 b15 263T K V R E N N Y T Y E F D F S K278 y14 y13 y12 y10 y9 y8 y7 y6 y5 y4 y3

935.40 Observed m/z =680.99663+ y7 Theoretical m/z =680.99363+ 2+ 900 906.36 1036.27 y14 y8 Mass Error=4.40ppm 778.642+ 856.942+ 800 y12 y13 156.60 98.84 Arg Val

830.892+ b13 381.32 496.36 643.34 772.20 1199.22 1313.39 600 y3 y4 y5 y6 y9 y10

115.04 146.98 128.86 163.20 100.87 162.95 114.17 Asp Phe Glu Tyr Thr Tyr Asn 904.692+ 307.442+ 699.792+ 2+ Intensity b14 948.15 400 b5 b11 b15

485.56 728.46 842.88 1005.38 1106.55 1269.44 b4 b6 b7 b8 b9 b10

200 242.90 (129+114) 114.42 162.50 101.17 162.89 Glu-Asn Asn Tyr Thr Tyr

300 400 500 600 700 800 900 1000 1100 1200 1300 1400 m/z

Conservation of structure and mechanism by Trm5 enzymes RNA. 2013 Sep; 19(9): 1192–1199 Glycosylation—A Different Story

“The cell surface landscape is richly decorated with oligosaccharides anchored to proteins or lipids within the plasma membrane. Cell surface oligosaccharides mediate the interactions of cells with each other and with extracellular matrix components.” Science 291:2337

Why Glycosylation Investigation is SO HARD???

• Only Protein Modification Requiring Detailed Structural Characterization (sugar chain/branched) • Poor Ionization efficiency (large mass increase, sometimes has negative charge, hydrophobicity..) • Heterogeneity • Large Size

• Enrichment (Lectin Affinity Binding) • Derivatization (Permethylation) • LC Separation (PGC Column for Glycans, HILIC Column for Glycopeptides) • Both MALDI and ESI will Work • High Mass Accuracy Desired • Negative Mode Works Better • ETD Improve the MSMS Identification

Determination of site-specific glycan heterogeneity on glycoproteins Nature Protocols 7, 1285–1298 (2012)