Development of Methods for the Analysis of Protein Post-translational Modifications: IsoAspartic Acid and Protein Crosslinking

by Min Liu

A dissertation submitted to

The Faculty of the College of Science of Northeastern University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

September 11, 2014

Dissertation directed by

Zhaohui Sunny Zhou

Professor of Chemistry and Chemical Biology

Da Ren

Amgen, Inc.

Acknowledgments

My journey into the fascinating world of protein analysis began with my advisors, Professor Zhaohui Sunny Zhou and Dr Da Ren. It gives me immense pleasure to thank them for the opportunity to work on two challenging and interesting projects. Their passion and high standards for science will inspire me throughout my life and career. Their abundant guidance, valuable suggestions, endless encouragement, and patience are greatly appreciated. They helped me overcome my frustrations and made the past five years truly enjoyable and memorable for me.

I would like to express my deepest gratitude to Dr Janet Cheetham at Amgen for her endless support, advice, and encouragement. Without her support, it would have been impossible for me to transfer my work experience from small molecules to large biomolecules. Without her encouragement, I could not have completed the work reported here.

I would like to express my great appreciation to Dr Zhongqi Zhang at Amgen for his generous help, strong scientific guidance, and very helpful discussions. By sharing his expertise, he has been a great mentor to me in protein characterization by mass spectrometry.

My special thanks go to my committee members, Professors Paul Vouros and William Hancock, for reviewing my thesis and for their valuable suggestions.

I particularly thank my coworkers and lab mates: Chris Spahr, Drs David Brems, Judy Ostovic, Nina Cauchon, Aleksander Swietlow, Peter Zhou, Wenqin Ni, Tianzhu Zang, Eddie Zhou, Kevin Moulton, Aldina Mesic, Wanlu Qu, Nathaniel Kenton and Tianyi Bai for their help and scientific discussions. I would like to especially thank Professor Richard Duclos and Kalli Catcott for their critical review of, and helpful suggestions on, my manuscripts. I also thank Drs Bin Ma, Dan Maloney, and Cassandra Wigmore at Bioinformatics Solutions for helpful discussions on de novo sequencing.

Finally and most importantly, I would like to dedicate this thesis to my family: my husband Chester Yuan, son Michael Yuan and daughter Emily Yuan. I am sincerely grateful for their generous understanding, endless love and selfless support. Their love, belief in me, and encouragement enabled me to complete this five-year endeavor in my career and life.


Abstract

Analysis of protein post-translational modifications (PTMs) is pivotal to understanding their biological importance. Isoaspartic acid (isoAsp), the smallest PTM, is observed both in vivo and in vitro. Because isoAsp and Asp have identical masses and differ only subtly in physicochemical properties, sensitive detection and unambiguous localization of isoAsp sites in complex samples are very challenging. A novel isoAsp assay is presented that exploits the specificity of protein isoaspartate methyltransferase (PIMT) for methylating isoAsp, followed by 18O incorporation during hydrolysis of the methyl ester. The assay enabled sensitive detection and unambiguous site localization of several isoAsp residues in IgG1 (Anal Chem 2012, 84, 1056-1062). The method can be applied to biological samples to understand the process of isoAsp formation and to identify biomarkers.

Protein crosslinks are ubiquitous in biological systems and biopharmaceuticals and are reported to cause loss of bioactivity and to induce immunogenicity. However, they remain poorly characterized, especially when the crosslink chemistry is undefined, owing to their intrinsic structural complexity and the lack of a systematic analytical approach. A comprehensive methodology, XChem-Finder, has been developed to break down this analytical challenge via 18O labeling and mass spectrometry, leading to the discovery of a total of 14 cross-linked thioether peptides in IgG2, including some that had not been previously reported (Anal Chem 2013, 85, 5900-5908). Furthermore, a novel histidine-histidine (His-His) crosslink in IgG1 was discovered and characterized using XChem-Finder (Anal Chem 2014, 86, 4940-4948), which further demonstrates the broad applicability and utility of the methodology. Further improvements to XChem-Finder are discussed. XChem-Finder is expected to enable the discovery of additional novel crosslinks in proteins.


Table of Contents

Acknowledgements …………………………………………………………………………….. ii

Abstract ………………………………………………………………………………….……... iv

Table of Contents …………………………………………………………………….….……... v

List of Figures …………………………………………………………………….….….……... xi

List of Tables ………..…………………………………………………………...….….……... xvi

List of Schemes ………………………..………………………………………...... ….……... xviii

Abbreviations and Symbols ….…………...………………………………..……….….……... xix

Chapter 1: Overview of Protein Post-translational Modifications and Their Analysis .………... 1

1.1 Protein Post-translational Modifications ………………………………………...... 1

1.2 Biological and Biopharmaceutical Importance of PTMs ………………………..……...... 1

1.3 Analysis of PTMs ………………………………………………………………...……….. 2

1.4 Deamidation and Isomerization ……………………………………………………..…...... 7

1.4.1 Isoaspartic Acid Formation …..…………………………………………………….…... 7

1.4.2 Biological and Biopharmaceutical Importance of Deamidation and Isomerization …... 11

1.4.3 Methods for Detection and Characterization of Deamidation and Isomerization …...... 12

1.4.3.1 Edman Degradation …………………………………………...……………....…… 12

1.4.3.2 Protein-L-isoaspartyl Methyltransferase (PIMT) ……………………………..….... 13

1.4.3.3 Mass Spectrometry …………………………………………………………..….…. 14

1.4.3.4 18O-incorporation …………………………………………………………….…...... 17


1.4.3.5 Asp-N Protease Peptide Mapping ……………………………………………..…… 19

1.4.3.6 Chemo-enzymatic Derivatization and Affinity-based Method …………………...... 20

1.4.3.7 Protein Isoaspartate Methyltransferase-mediated 18O-Labeling ………………..….. 22

1.4.3.8 Methods for Racemization Detection …………………………………………..….. 22

1.5 Protein Crosslinking …………………………..…………………….……….………....…. 23

1.5.1 Crosslink Formation …………………….……………………………………....……... 23

1.5.1.1 Crosslinks as degradants …………………………………………..………...……... 24

1.5.2 Biological and Biopharmaceutical Importance of Crosslinks ……………………..…... 32

1.5.3 Methods for Detection and Characterization of Crosslinks …………………………..... 33

1.5.3.1 HPLC with Fluorescence Detection …………………………………………….….. 33

1.5.3.2 MS-based Method …………………………………………………………….……. 33

1.5.3.2.1 C-Terminal 18O-Labeling ………………………...………..…………………… 34

1.5.3.2.2 N-Terminal Modification ………………………………..………………….….. 36

1.5.3.2.3 Chromatographic Sample Enrichment …………………………..…..….……… 38

1.5.3.3 Antibody-based Method …………………………..………………………...……… 39

1.6 Conclusions ……………..………………………………………………………….…...… 39

1.7 References ……………………..……………………………………………...……..……. 40

Chapter 2: Protein Isoaspartate Methyltransferase-Mediated 18O-Labeling of Isoaspartic Acid for Mass Spectrometry Analysis .….……………………………………………………………..... 52

2.1 Abstract ………………………………………………………………………...……..…... 53

2.2 Introduction …………………………………………………………………….…….…… 53

2.3 Experimental Section ………….……………………………………………….…………. 58


2.3.1 Chemicals ………………………………………………………………………….…… 58

2.3.2 Generation of isoAsp …………………………………………………………….…….. 59

2.3.3 Reduction, Alkylation, and Tryptic Digestion of IgG1 …………………………..……. 59

2.3.4 Methylation Catalyzed by PIMT ……………………………………………...…….…. 60

2.3.5 18O-Labeling ……………………………………………………………………..…….. 60

2.3.6 HPLC …………………………………………………………………………….…….. 61

2.3.7 Mass Spectrometry ……………………………………………………………..……… 61

2.4 Results and Discussion …………………………………………………………...….…… 62

2.4.1 Methylation of isoAsp ………………………………………………………….…..….. 64

2.4.2 Hydrolysis and 18O Incorporation ……………………………………………………… 68

2.4.3 Screening of 18O-Labeled isoAsp by Mass Spectrometry ………………………....….. 73

2.4.4 Co-elution of isoAsp and Asp and Overlapping of Isotope Patterns ………….….…… 75

2.4.5 Identification of isoAsp Sites in 18O-Labeled Peptides …………………………….…. 75

2.5 Conclusions ……………………………………………………………………….…..….. 83

2.6 References ………………………………………………………………….….…….…… 83

Chapter 3: A Comprehensive Methodology for the Identification of Protein Crosslinks without a Prior Knowledge of Chemistry via 18O Labeling and Mass Spectrometry .….………...... 87

3.1 Abstract ………………………………………………………………………….………... 88


3.2 Introduction …………………………………………………….………………..…….… 89

3.3 Experimental Section ………………………………………………….………..….…..… 91

3.3.1 Chemicals ……………………………………………………………………..….…… 91

3.3.2 Generation of Stressed Sample ………………………………………………..…...…. 94

3.3.3 Reduction, Alkylation, Tryptic Digestion and 18O-Labeling of the IgG2 ……...….…. 94

3.3.4 HPLC …………………………………………………………………………..……... 95

3.3.5 Mass Spectrometry …………………………………………………………..…..…… 95

3.4 Results and Discussion ……………………………………………………………...…… 96

3.4.1 Stage 1: Identification of Crosslinked Peptides ………………………….…….…….. 96

3.4.2 Stage 2: Deduce Partial Sequence for Each Chain ………………………..………… 100

3.4.3 Stage 3: Inference of Full Sequence for Each Chain ……………...………………… 104

3.4.4 Stage 4: Deduction of Crosslinking Chemistry and Site …………………….……… 111

3.4.5 Final Confirmation and Additional Support ……………………….……………..… 115

3.4.6 Targeted Search Based on the Newly Established Crosslinking Chemistry ..……… 121

3.5 Formation of Thioether ………………………………………………………..………. 129

3.6 Conclusions …………………………………………..….…………………………….. 131

3.7 References ………………………………………………..…….……………………… 131

Chapter 4: Discovery and Characterization of a Novel His-His crosslink in IgG1 Utilizing 18O-labeling and Mass Spectrometry .….……………………………………………..………….. 137

4.1 Abstract ………………………………………………………….………….…………... 138

4.2 Introduction ……………………………………………………………….………..…… 139

4.3 Experimental Section ………………………………………………………….………… 140

4.3.1 Chemicals …………………………………………………………………….…….… 140

4.3.2 Generation of Stressed Sample …………………………………………….……..….. 141

4.3.3 Aggregates by Size Exclusion Chromatography ………………………………..…… 141

4.3.4 Reduction, Alkylation, Tryptic Digestion and 18O-Labeling of IgG1 ……….….…… 142

4.3.5 HPLC ……………………………………………………………………………..….. 143

4.3.6 Mass Spectrometry …………………………………………………………………… 144

4.4 Results and Discussion …………………………………………………….……….…… 145

4.4.1 Detection of Crosslinked Protein ……………………..………………………..…….. 145

4.4.2 Detection of Crosslinked Peptide ……………………………..…………..…….…… 148

4.4.3 Elucidation of Crosslinking Chemistry ……………….…………………….….….…. 149

4.4.4 Structural Confirmation by Mass Spectrometry ………………………….….….…… 151

4.4.5 Mechanism of formation for His-His crosslink ……………………….……….….…. 163


4.4.6 Other Crosslinks ………………………………………………………………...…… 172

4.5 Conclusions …………………………………………………………………….….……. 174

4.6 References ………………………………………………………………………….…… 174

Chapter 5: Conclusion and Future Directions .….……………………………………….…... 180

5.1 isoAsp Project …………………………………………………………………….…….. 181

5.2 Crosslink Project …………………………………………………………………….….. 182

5.3 References ……………………………………………………………………...…..…… 188


List of Figures

Figure 1-1 MS/MS for mapping PTMs ……………………………………………………...…. 4

Figure 1-2 Total isoAsp analysis via HPLC/UV at 260nm for SAH measurement or radioactive detection of MeOH ……………………………………………………..……………………… 14

Figure 1-3 Mechanism of fragmentation of the Asp and isoAsp peptides in ETD MS ……...… 16

Figure 1-4 Acid- and base-catalyzed deamidation ………………………………………...…... 19

Figure 1-5 Detection of deamidation and isomerization via (A) isoaspartate methyltransferase (PIMT)-catalyzed methylation of isoaspartate and hydrazine trapping of methyl ester and succinimide and (B) enrichment by hydrazide-aldehyde affinity………………...... ….…….…. 21

Figure 1-6 Disulfide scrambling under a basic condition to form crosslink degradants ………. 24

Figure 1-7 The formation of thioether and other related degradants via dehydroalanine followed by Michael addition …………………………………………………………………………..... 25

Figure 1-8 Non-disulfide crosslinking in insulin Asn deamidation followed by the reaction with the N-terminal amine ……………………………………………………………………….….. 26

Figure 1-9 (A) Dityrosine and Tyr-Cys crosslinks formed via tyrosyl radical (B) Other tyrosine-related crosslinks involving tyrosine oxidation followed by Michael addition with primary amines …………………………………………………………………………………….…...... 28

Figure 1-10 Histidine-related crosslinks via photo-oxidation ..……………………..….……… 29

Figure 1-11 Dimerization of human superoxide dismutase via a novel oxidative modification ─ ditryptophan crosslink ….………..…………………….………………………………….…… 30


Figure 1-12 Formaldehyde-mediated cross-linking ….………..…………………….………… 31

Figure 1-13 RNase dimerization by a single amide bond between Lys66 and Glu9 under vacuum and 85 °C ………………….….………..………………………………………………….…… 32

Figure 1-14 Isotopic labeling at N-termini ………………………………………………….…. 38

Figure 2-1 Isotopic distribution of a singly charged DSIP peptide with/without 18O tag (A) and a triply charged tryptic peptide LC69-108 from the IgG1 sample with/without 18O tag (B) …..... 63

Figure 2-2 Specificity of PIMT-mediated 18O-labeling shown in the Asp-DSIP and isoAsp-DSIP samples ………………………………………………………………………………….…..…. 64

Figure 2-3 The mixture of isoAsp-DSIP and DSIP peptide was analyzed by PIMT/18O-labeling method …………………………………………………….…………………………...………. 66

Figure 2-4 The effects of pH (A) and incubation time (B) on the hydrolysis of the succinimide and methyl ester ……………………………………………………………….……….………. 69

Figure 2-5 Stability of Asp-DSIP during sample treatment ………………………..………..… 70

Figure 2-6 Guanidine HCl (Gnd-HCl, 1.25 M) quenched PIMT activity during hydrolysis and thereby minimized the incorporation of two 18O-atoms into isoAsp peptides …………....…… 72

Figure 2-7 Identification of isoAsp site in a doubly charged tryptic peptide HC 271-284 from the stressed IgG1 ………………………………………………………………………….…..….... 77

Figure 2-8 Identification of isoAsp site in a triply charged tryptic peptide LC69-108 from the IgG1 sample ……………………………………………………………………………..…...... 78

Figure 2-9 Identification of isoAsp site in a doubly charged tryptic peptide HC389-405 from the IgG1 sample …………………………………………………………………………...……...... 79

Figure 2-10 Identification of isoAsp site in the stressed Asp-DSIP peptide by tandem mass spectrometry ………………………………………………………………………….……….... 82

Figure 3-1 Isotopic distributions of the cross-linked peptide HC:G118-R129/HC:C215-K240 ……………………………………………………………….……………………………....…. 97

Figure 3-2 CID MS/MS spectrum of the triply charged precursor ions at m/z 1351.33 (16O-labeled C-termini) and 1354.00 (18O-labeled C-termini) ………………………………..……. 114

Figure 3-3a MS/MS data of the cross-linked peptide HC:G118-R129/HC:KΔ214-K240 ……………………………………………………………………………………….…..…….. 117

Figure 3-3b MS3 for structure confirmation of the singly charged fragment ion m/z 1196 from the cross-linked peptide HC:G118-R129/HC:KΔ214-K240 ……………………….…………… 118

Figure 3-3c MS3 for structure confirmation of the doubly charged fragment ion m/z 1521 from the cross-linked peptide HC:G118-R129/HC:KΔ214-K240 …..………………….………...… 119

Figure 3-4 MS/MS data of the cross-linked peptide HC:C215-K240/HC:C215-K240 ….…... 123

Figure 3-5 MS/MS data of the cross-linked peptide HC:K214-K240*/HC:KΔ214-K240 ….... 124

Figure 3-6 MS/MS data of the cross-link peptide LC:T211-S218/HC:G118-R129 ……….… 125

Figure 3-7 MS/MS data of the cross-linked peptide LC:T211-S218/HC:K214-K240 ……… 126

Figure 3-8 (A) Detection of cross-links in IgG2 by reducing SDS-PAGE and (B) Aggregation analysis by size exclusion chromatography …………………………………….……….…..... 128

Figure 3-9 Major disulfide linkage isoforms in IgG2 ………………………………….……... 130

Figure 4-1 Detection of crosslinking in IgG1 by reduced SDS-PAGE and size exclusion chromatography (SEC) ……………………………………………………………….….…… 147

Figure 4-2 Isotopic distributions of the crosslinked peptide S215-K244/S215-K244 m/z 1673.54 (z=4) from tryptic digestion of IgG1 ………………………………………………….….…... 148

Figure 4-3 CID MS/MS spectra of the quadruply charged precursor ions m/z 1673.54 (16O-labeled C-termini) and 1675.54 (18O-labeled C-termini) of the crosslinked tryptic peptide S215-K244/S215-K244 …………………………………………………………………..……….… 154

Figure 4-4 MS3 spectrum of the doubly charged fragment ion m/z 1488.35 obtained from MS/MS of the precursor ion m/z 1673.54 in Figure 4-3 …………………………….….….… 156

Figure 4-5 CID MS/MS spectrum of the triply charged precursor ion m/z 1178.77 of the crosslinked S215-E229/S215-E229 peptide generated from combined trypsin and GluC digestion ………………………………………………………………………………………………… 157

Figure 4-6 CID MS/MS spectrum of the quadruply charged precursor ions m/z 1549.53 of the crosslinked peptide D217-K244/D217-K244 from Asp-N digestion ………………………… 158

Figure 4-7 CID MS/MS spectrum of the triply charged precursor ions at m/z 1013.41 of the crosslinked peptide D217-E229/D217-E229 from digestion with Asp-N and GluC ………… 159

Figure 4-8 ETD MS/MS spectrum of the precursor ion m/z 1339.70 (z=5) of the crosslinked tryptic peptide S215-K244/S215-K244 ………………………………………………………. 160

Figure 4-9 ETD MS/MS spectrum of the precursor ion m/z 1033.35 (z=6) of the crosslinked peptide D217-K244/D217-K244 from Asp-N digestion ……………………………………... 161

Figure 4-10 ETD MS/MS spectrum of the precursor ion m/z 884.33 (z=4) of the crosslinked peptide S215-E229/S215-E229 from digestion with trypsin and GluC ………………………. 162

Figure 4-11 ETD MS/MS spectrum of the precursor ion m/z 760.31 (z=4) of the crosslinked peptide D217-E229/D217-E229 from digestion with Asp-N and GluC ……………………… 163

Figure 4-12 CID MS/MS spectrum of the tryptic peptide containing the 2-oxo-His (+14 Da) intermediate …………………………………………………………………………………… 167

Figure 4-13 CID MS/MS spectrum of the tryptic peptide containing the His+32 intermediate …………………………………………………………………………………………………. 168

Figure 4-14 (A) Space filling illustration of the hinge region of IgG1 antibody (DKTHTCPPCP) (B) Three-dimensional (3D) structure of an IgG1 …………………..……..………….……… 169

Figure 4-15 ETD MS/MS spectrum of the quadruply charged precursor ion m/z 821.09 of the crosslinked peptide D217-E229/S215-E229 generated by limited Asp-N digestion of fully digested IgG1 by trypsin and GluC ……………………………………………….……..…… 171

Figure 4-16 CID MS/MS spectrum of the triply charged precursor ions at m/z 1096.09 of the crosslinked peptide D217-E229/S215-E229 from limited digestion by Asp-N of the fully digested IgG1 by GluC and trypsin ……………………………………………………..….… 172


List of Tables

Table 2-1 Representative isoAsp containing peptides detected in IgG1 ……………….….….. 74

Table 3-1 Fragmentation ions of a cross-link peptide ……………………………….…....….... 93

Table 3-2 The cross-linked peptide candidates identified by MassAnalyzer algorithm ..…...... 99

Table 3-3 Peptides in IgG2 that match with the mass of fragment ions of the triply charged ion m/z 1351.33 via FindPept (sorted in the order of primary sequence number) ……….……….. 105

Table 3-4 Peptides in IgG2 that match with the mass of fragment ions of the triply charged ion m/z 1351.33 via FindPept (grouped in the order of m/z value) ………….………...... …. 107

Table 3-5 Partial sequences determined from the mass of fragmentation ions for a triply charged precursor ion at the retention time of 91.17 min with m/z 1351.33 …………….……….……. 109

Table 3-6 De novo sequencing for sequence tag using y-ions from the cross-linked fragments (group 4 in Table 3-1) in the cross-linked peptide G118-R129/C215-K240 ……………………. 110

Table 3-7 Elemental formula with mass of 149.9987 Da …………………………………..… 112

Table 3-8 Crosslinked peptides identified in IgG2 ………………………………….…..….… 120

Table 3-9 The cross-linking peptides identified in the IgG2 via a targeted search for cysteinyl thioether ………………………………………………………………………………………. 122

Table 3-10 Quantification of the cross-linked peptides in the IgG2 …………………….……. 127

Table 4-1 Partial sequences that match the mass of fragmentation ions for the precursor ion m/z 1673.54 (z=4) eluted at 112.48 min …………………………………………………………... 150

Table 4-2 Deduction of elemental formula for the crosslinked S215-K244/S215-K244 peptide ………………………………………………………………………….……………………… 151

Table 4-3 Crosslinked peptides obtained from digestion of IgG1 by various proteases and the combination thereof ……………………………………………………….………………….. 153

Table 4-4 Peptides containing the 2-oxo-His (+14 Da) and His+32 (+32 Da) intermediates observed in the stressed IgG1 ………………………………………………….….…..……… 166

Table 4-5 Thioether crosslinks detected in IgG1…………………………………..……..…… 173


List of Schemes

Scheme 1-1 Deamidation, isomerization, racemization and PIMT-dependent methylation …... 10

Scheme 2-1 Formation of isoAsp from the isomerization of aspartic acid (Asp) or the deamidation of asparagine (Asn) …………………………………….…………….…..……..... 54

Scheme 2-2 Isotopic labeling of isoaspartic acid via protein isoaspartyl methyltransferase (PIMT)-catalyzed S-adenosyl-methionine (SAM or AdoMet)-dependent methylation and hydrolysis of the resulting methyl ester and succinimide in 18O-water …………………...…… 56

Scheme 2-3 Identification of isoAsp peptides by mass spectrometry using the mass increase of 2 Da imparted by 18O-labeling ………………………………………………………………...... 57

Scheme 3-1 Flow chart of XChem-Finder in four main stages ……………………………..… 92

Scheme 3-2 Establishment of crosslink chemistry based on formula C4H6O4S obtained from elemental composition analysis of 149.9987 Da ……………………………….…..………… 130

Scheme 4-1 Proposed mechanism for the formation of His-His crosslink via photo-oxidation intermediates ……………………………………………………………………………….…. 164

Scheme 5-1 The use of combining sample enrichment and Lys-N digestion for detection of crosslinks …………………………………………………………………………………....… 184

Scheme 5-2 Isotopic labeling at N-termini via 1) trypsin digestion; 2) protection of the ε-amino group by reductive methylation; 3) specific derivatization of the N-terminal amino group with a 1:1 mixture of DNFB (2,4-dinitrofluorobenzene) and [2H3]DNFB at pH 7.0 ….…..… 185

Scheme 5-3 N-Terminal Succinylation via two-step chemical derivatizations ………….....… 186


Abbreviations and Symbols

ACN acetonitrile

Asn (N) asparagine

Asp (D) aspartic acid

Asu aspartyl succinimide

CDR complementarity-determining region of IgG

CID collision induced dissociation

°C degree Celsius

CEX cation exchange chromatography

CHO Chinese hamster ovary

Cys cysteine

Da Dalton

DSIP β-delta sleep-inducing peptide

DTT dithiothreitol

ECD electron capture dissociation

EDTA ethylenediaminetetraacetic acid

EGFR epidermal growth factor receptor


ELISA enzyme-linked immunosorbent assay

ESI electrospray ionization

ETD electron transfer dissociation

FT-MS/MS Fourier transform tandem mass spectrometry

GndHCl guanidine hydrochloride

HC heavy chain of IgG

HCD higher-energy collisional dissociation

His (H) histidine

His-His histidine-histidine crosslink

HPLC high performance liquid chromatography

IAA iodoacetic acid

ICH The International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use

IgG immunoglobulin gamma

isoAsp (isoD) isoaspartic acid

kD kilodalton

kV kilovolt

LC-MS liquid chromatography coupled with mass spectrometry


LC-MS/MS liquid chromatography coupled with tandem mass spectrometry

LC liquid chromatography, light chain of IgG

m milli (10^-3); meter(s)

M molarity

mAb monoclonal antibody

MALDI matrix-assisted laser desorption ionization

mg milligram

mg/mL milligram per milliliter

min minute(s)

µ micro (10^-6)

µL microliter

mL milliliter

MS mass spectrometry, mass spectrum, mass spectroscopy

MS/MS (MS2) tandem mass spectrometry

MSn multiple stage fragmentation

m/z Mass-to-charge ratio

nm nanometer


% percentage

pI isoelectric point

PIMT protein isoaspartate methyltransferase

ppm parts per million

1H NMR proton nuclear magnetic resonance

PTM post-translational modification

RP-HPLC reversed phase high performance liquid chromatography

RT retention time

SAH (AdoHcy) S-adenosyl-L-homocysteine

SAM (AdoMet) S-adenosyl-L-methionine

SDS-PAGE sodium dodecyl sulfate polyacrylamide gel electrophoresis

SEC size exclusion chromatography

TFA trifluoroacetic acid

Tris tris(hydroxymethyl)aminomethane

UV ultraviolet

UPLC ultra performance liquid chromatography

v/v volume-to-volume ratio

XIC extracted ion chromatogram

Chapter 1: Overview of Protein Post-Translational Modifications and Their Analysis

1.1 Protein Post-Translational Modifications

Post-translational modifications (PTMs) are enzymatic or chemical modifications occurring on the amino acid side chains or at the amino and carboxyl termini of proteins[1]. Protein backbone cleavage, commonly referred to as proteolysis, is also considered a PTM in many cases[1]. Over 300 PTMs have been reviewed in the literature[1, 2]. Common types of PTMs include oxidation, deamidation, isomerization, protein crosslinking, phosphorylation, ubiquitination (attachment of the 76-amino acid ubiquitin protein), nitrosylation, pyroglutamic acid formation, methylation, acetylation, and lipidation. This thesis focuses on Asp isomerization and non-reducible protein crosslinks.

1.2 Biological and Biopharmaceutical Importance of PTMs

PTMs are ubiquitous in biological systems and biopharmaceuticals. PTMs can change a protein's physical or chemical properties, conformation, activity, cellular location, or stability, and serve as one of the most important regulatory mechanisms for fine-tuning protein function. PTMs therefore influence almost all aspects of normal cell biology and pathogenesis. In biopharmaceuticals, PTMs can occur during manufacturing and storage, leading to structural changes and loss of efficacy, triggering immunogenic responses, and raising safety concerns[3-5]. As examples, the biological significance of Asp isomerization and protein crosslinks is discussed in sections 1.4.2 and 1.5.2, respectively.

The potential damage to proteins caused by solution pH, temperature, and light exposure has been recognized for years, as reflected in ICH Guidelines Q1B and Q5C (the International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use, www.ICH.org) [6-10]. As such, stress testing studies must be conducted to assess product stability and to facilitate improvements in the manufacturing process, formulation, packaging, and storage conditions. Typical stress conditions include high and low pH, elevated temperature, photolysis, oxidation, freeze-thaw cycles, and shear. Photodegradation studies cover direct long-term exposure of the product to sunlight and to the common light sources used for indoor lighting and for UV sterilization in industry. In this thesis, acidic and basic conditions leading to isomerization and thioether crosslinks in IgGs, respectively, are described in Chapters Two and Three. The formation of histidine-histidine (His-His) crosslinks in IgG resulting from light exposure is presented in Chapter Four. A more detailed overview of the formation and biological importance of isoAsp and protein crosslinks is presented in sections 1.4 and 1.5, respectively.

1.3 Analysis of PTMs

Identifying and understanding PTMs is critical to the study of cell biology and to disease treatment and prevention. However, studying a specific modified form in a largely heterogeneous protein pool is a great challenge because 1) the modified form is often present at low abundance; 2) most post-translational modifications (e.g., isoAsp formation) alter the protein in subtle ways that are not easily detected; 3) a protein can carry more than one type of PTM; and 4) a protein can be modified by the same PTM at multiple residues. This challenge continues to drive the development of methods for protein separation and detection, as well as better instrumentation. Among the ongoing efforts to develop novel, highly sensitive, and sophisticated PTM identification techniques, MS-based protein analysis holds great potential. The currently reported methods for the analysis of isoAsp and protein crosslinks are reviewed in sections 1.4.3 and 1.5.3, respectively. This section gives an overview of the determination of PTMs by MS-based approaches.

The most widely used strategy for protein identification is to cleave the protein with highly specific proteases, followed by LC/MS analysis. PTMs can either increase or decrease the molecular weight of peptides and give rise to modification-specific signals in MS/MS. In other words, a modification shifts not only the molecular weight of the peptide but also the masses of all fragment ions that contain the modified amino acid residue (Figure 1-1). However, precise identification of the modification type and the modification site can be very challenging, owing to 1) the size of the mass shift: small mass shifts are harder to detect and require higher-resolution MS, whereas particularly large modifications may shift the total mass outside the range suitable for MS/MS sequencing; 2) the overall abundance of the modified peptide: most PTMs are of low abundance and/or substoichiometric; 3) the stability of the modification and the gas-phase dissociation behavior of the modified peptide: some PTMs are labile during MS and MS/MS; 4) the effect of PTMs on protease digestion: the presence of a PTM may affect the cleavage efficiency of proteases; 5) the effect of PTMs on ionization efficiency: the detectability of a peptide is a function of its sequence and modifications; 6) multiple-site PTMs, which may generate very complicated MS and MS/MS data that are difficult to interpret; 7) the effect of PTMs on the chromatographic behavior of peptides; 8) sample handling: sample preparation may introduce artificial modifications to a protein, such as isoaspartic acid formation; and 9) the complex matrix of formulation samples or biological samples (such as urine or plasma).
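The way a modification propagates into fragment ions can be illustrated with a small calculation. The following is only a minimal sketch: the peptide sequence AGMSK is hypothetical, methionine oxidation (+15.9949 Da) is chosen simply because oxidation is among the common PTMs listed earlier, and the function names are illustrative rather than part of any software used in this work. Singly charged b- and y-ion m/z values are computed from standard monoisotopic residue masses, and the ions that shift are the ones containing the modified residue.

```python
# Monoisotopic residue masses in Da (standard values) for the residues used here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203,
    "K": 128.09496, "M": 131.04049,
}
WATER = 18.010565      # H2O, carried by every y-ion
PROTON = 1.007276      # charge carrier for singly charged ions
OXIDATION = 15.994915  # +O, assumed here to sit on methionine


def fragment_ions(sequence, mods=None):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    mods maps a 0-based residue index to a mass shift in Da.
    """
    mods = mods or {}
    masses = [RESIDUE_MASS[aa] + mods.get(i, 0.0) for i, aa in enumerate(sequence)]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


def shifted(plain, modified):
    """Return the 1-based ion numbers whose m/z differs between the two series."""
    return [i + 1 for i, (p, m) in enumerate(zip(plain, modified)) if abs(m - p) > 1e-6]


PEPTIDE = "AGMSK"  # hypothetical peptide, not taken from this work
b_plain, y_plain = fragment_ions(PEPTIDE)
b_mod, y_mod = fragment_ions(PEPTIDE, {2: OXIDATION})  # modify Met (3rd residue)

print("shifted b-ions:", shifted(b_plain, b_mod))  # [3, 4]
print("shifted y-ions:", shifted(y_plain, y_mod))  # [3, 4]
```

In this sketch b1-b2 and y1-y2 are unchanged while b3-b4 and y3-y4 are shifted, which places the modification on the third residue. This is the same reasoning used to locate a modification site from an MS/MS spectrum.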


Therefore, several strategies have been developed for PTM analysis, including sample separation/enrichment to reduce sample complexity and minimize ion suppression; the use of multiple proteases with different cleavage specificities to generate complementary and redundant sets of overlapping peptides; isotope labeling; multiple-stage fragmentation (MSn); and different fragmentation techniques (e.g., collision induced dissociation, CID, versus electron transfer dissociation, ETD).

Figure 1-1. Mass spectrometry for mapping PTMs[11]. The mass shift from PTM can be detected in MS and the modification site can be located in MS/MS.


Specific proteases cleave proteins into appropriately sized peptides that can be readily identified in MS/MS experiments. It is important to note that larger peptides (>4 kDa) have poor recovery and are difficult to characterize by MS/MS on most commercial instruments (the optimal m/z range is 500-4000 for MALDI and 300-1500 for ESI), whereas very small peptides (2-3 residues) are often lost because of their poor retention on reversed-phase columns. The choice of protease is determined by the amino acid sequence of the protein being analyzed. Trypsin and endoproteinase LysC are the most commonly used proteases in proteomics because they cleave C-terminal to Arg/Lys and Lys, respectively, leaving a basic residue at the C-terminus of each peptide, which aids identification by tandem mass spectrometry. Based on the average occurrence of lysine (5.7%) and arginine (5.5%) in proteins in the Swiss-Prot database, digestion with LysC is expected to produce peptides with an average length of about 17 residues, whereas trypsin is expected to produce peptides with an average length of about 8 residues [12]. Peptides of these sizes can be well separated by reversed-phase HPLC and are well suited to mass spectrometric analysis. In some instances, however, tryptic peptides may be too small or too large for LC-MS/MS analysis. Furthermore, the proximity of a PTM to a proteolytic site can interfere with protease efficiency, which can hinder detection of the PTM. For example, missed cleavage by trypsin at a Lys residue has been reported because of its close proximity to a crosslinking site[13, 14].

As such, there are cases where other proteases (such as GluC, AspN, chymotrypsin, etc) with

different cleavage specificity are useful. A disadvantage of using GluC, and AspN is that they often yield peptides that are longer and contain one or more internal basic residues, which are poorly fragmented by CID. But the alternative fragmentation strategies such as ETD and ECD are known to improve identification of long, highly-charged peptides containing basic residues.
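The average peptide lengths quoted above can be rationalized with a back-of-the-envelope estimate. The sketch below is illustrative only: it reads the 5.5% figure as the arginine frequency, assumes cleavage sites are distributed independently along the sequence, and ignores missed cleavages and the reduced cleavage before proline. The function names and the toy sequence are hypothetical and are not taken from reference [12].

```python
import re

# Residue frequencies quoted in the text (Swiss-Prot averages).
FREQ_LYS = 0.057
FREQ_ARG = 0.055  # assumption: the 5.5% figure refers to arginine


def expected_peptide_length(cleavage_frequency):
    """Mean peptide length if each residue is a cleavage site with the
    given probability, independently of its neighbours."""
    if not 0 < cleavage_frequency <= 1:
        raise ValueError("cleavage_frequency must be in (0, 1]")
    return 1.0 / cleavage_frequency


def digest(sequence, cleave_after):
    """Idealized in-silico digest: cut C-terminal to every listed residue.
    Relies on re.split handling zero-width matches (Python 3.7 or later)."""
    pattern = f"(?<=[{cleave_after}])"
    return [peptide for peptide in re.split(pattern, sequence) if peptide]


lysc_length = expected_peptide_length(FREQ_LYS)
trypsin_length = expected_peptide_length(FREQ_LYS + FREQ_ARG)
print(f"LysC:    ~{lysc_length:.1f} residues per peptide")
print(f"Trypsin: ~{trypsin_length:.1f} residues per peptide")

# Hypothetical toy sequence, used only to show the two specificities.
TOY_SEQUENCE = "MKTAYIAKQRQISFVK"
tryptic_peptides = digest(TOY_SEQUENCE, "KR")
lysc_peptides = digest(TOY_SEQUENCE, "K")
print(tryptic_peptides)
print(lysc_peptides)
```

Under these assumptions the estimate lands in the same range as the lengths cited from the literature; a real digest deviates from it because of missed cleavages and sequence context.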


The use of non-specific proteases has also been reported[15], but these can decrease experimental reproducibility and complicate the separation and identification of proteolytic peptides.

It is important not to introduce interfering reagents or contaminants during sample preparation, since these can inhibit enzyme activity and generate artifacts such as carbamylation. Enzyme activity also depends strongly on digestion conditions such as pH, temperature, and enzyme-to-substrate ratio.

It has become common practice to use a multiple-protease strategy to generate complementary and redundant sets of overlapping peptides, improving protein identification and sequence coverage [16-19]. In Swaney’s study, trypsin, LysC, ArgC, AspN, and GluC were used with two dissociation methods (CID and ETD) in a decision tree-driven fashion for complex protein samples[16]. Each digestion was performed under conditions optimized for the respective protease. The authors observed a modest boost in protein identifications (~20%) over the use of a single protease, but a more than two-fold improvement in proteome sequence coverage. The optimum digestion pH of each protease and the carry-over of proteases into the next digestion are important considerations in a tandem approach. Buffer exchange and other clean-up steps may be necessary to simplify downstream data interpretation.

In addition, mass spectrometry instrumentation has advanced greatly in recent years. Instruments offering high resolution, good mass accuracy, high scan speed, and wide dynamic range (for example, Orbitrap and Fourier transform ion cyclotron resonance mass spectrometers) are becoming available. In this thesis, the use of 18O-labeling, complementary protease digestion, multiple-stage fragmentation (MSn), and different ion activation methods (CID and ETD) on an Orbitrap mass spectrometer to detect and characterize isomerization and protein crosslinks will be discussed in detail in Chapters Two to Four.

1.4 Deamidation and Isomerization

1.4.1 Isoaspartic Acid Formation

Isoaspartic acid (isoAsp, isoD), aspartic acid in a beta-peptide linkage, is a ubiquitous post-translational modification observed both in vivo and in vitro. IsoAsp can be generated spontaneously through the non-enzymatic deamidation of asparagine (Asn) or isomerization of aspartic acid (Asp) during manufacturing and storage, and is therefore of great concern in the development of protein pharmaceuticals. The loss of ammonia from Asn or the dehydration of Asp leads to the formation of a labile succinimide intermediate (Asu), which readily hydrolyzes to isoAsp and Asp in a ratio of about 3:1 (Scheme 1-1)[20].

Many factors can influence the rate of isoAsp formation, such as pH, temperature, protein sequence, secondary structure, and local three-dimensional structure. Basic conditions (pH>7) favor deamidation, while isomerization happens more readily under acidic conditions. In general, the half-times of aspartyl and asparaginyl peptide degradation under physiological conditions (pH 7.5, 37 °C) vary between about 1 and 1000 days[21]. In many cases, the deamidation rate is determined by the residues immediately adjacent in the peptide chain and by higher-order protein structure[22]. The fastest deamidation was reported for the asparagine-glycine (NG) sequence, followed by the asparagine-serine (NS) sequence; the slowest is the asparagine-proline (NP) sequence. Compared to Asn deamidation, Asp isomerization is about 10 times slower under physiological conditions[23]. In 1992, Kroon et al reported that OKT-3, the first marketed monoclonal antibody (MAb) product, undergoes deamidation[24]. There are a number of reports on deamidation and isomerization in MAbs and other proteins of pharmaceutical interest [25, 26].
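To give a feel for what half-times between 1 and 1000 days mean in practice, the following sketch converts a half-time into a rate constant and into the fraction of unmodified residue remaining after a given storage time. It assumes simple first-order kinetics, which is an assumption of this illustration rather than a statement from the cited work, and the 30-day time point is arbitrary.

```python
import math


def rate_constant(half_time_days):
    """First-order rate constant (per day) for a given half-time."""
    return math.log(2) / half_time_days


def fraction_remaining(half_time_days, elapsed_days):
    """Fraction of the original Asn/Asp residue left after elapsed_days,
    assuming first-order decay (an assumption of this sketch)."""
    return math.exp(-rate_constant(half_time_days) * elapsed_days)


# The two ends of the half-time range quoted in the text, after 30 days.
fast_site = fraction_remaining(half_time_days=1, elapsed_days=30)
slow_site = fraction_remaining(half_time_days=1000, elapsed_days=30)
print(f"half-time 1 day:     {fast_site:.2e} remaining")
print(f"half-time 1000 days: {slow_site:.3f} remaining")
```

A site at the fast end of the range is essentially fully converted within a month, whereas a site at the slow end loses only about 2% over the same period, which is why sequence and structural context matter so much for stability assessment.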

Spontaneous direct hydrolysis of asparagine residues, by water attack on the side-chain amide group at pH below 3, can also result in aspartyl residue formation[27, 28]. This direct hydrolysis of Asn under acidic conditions yields Asp as the only product. At neutral pH, the rate of this reaction appears to be much slower than that of the succinimide pathway. The deamidation rate reaches a minimum at approximately pH 5.

The succinimide is also racemization-prone and can generate the D-succinimidyl, D-aspartyl, and D-isoaspartyl forms (Scheme 1-1)[29, 30]. Zhang et al reported simultaneous isomerization and racemization of Asp in an Asp-Asp motif of a therapeutic protein[31]. Young et al demonstrated racemization of Asp-25 in mammalian histone H2B[32]. Asp residues do not racemize uniformly: specific Asp residues have a greater tendency to racemize than others, depending on the neighboring residues as well as the higher-order structure around the Asp residue in the protein[29, 30]. UV radiation and oxidative stress can promote or induce the racemization of Asp residues[29, 33]. Among the reaction products (including L/D-succinimidyl, L/D-Asp, and L/D-isoAsp), L-isoAsp is typically the predominant form.

Similar degradation reactions have also been reported for glutamine (Gln) and glutamic acid (Glu) residues, but these reactions, which proceed via a six-membered ring intermediate, are much slower (by about 2 orders of magnitude for Gln) than those of asparagine and aspartic acid residues, which proceed via the five-membered succinimide ring intermediate[5, 34].

Isoaspartic acid in a damaged protein can be partially repaired by protein L-isoaspartyl/D-aspartyl O-methyltransferase (PIMT, EC 2.1.1.77), a repair enzyme that initiates the conversion of L-isoAsp or D-Asp residues to L-Asp residues (Scheme 1-1)[20, 35]. PIMT recognizes L-isoAsp or D-Asp and transfers the methyl group from S-adenosyl-L-methionine to it, forming the methyl ester. The labile methyl ester is rapidly converted back to succinimide, and subsequent hydrolysis can generate Asp and isoAsp.


Scheme 1-1. Deamidation, isomerization, racemization and PIMT-dependent methylation[29, 30, 32]. This spontaneous intramolecular rearrangement occurs most readily at Asn-Gly, Asn-Ser and Asp-Gly sequences in flexible regions of polypeptides. The L-isoAsp form typically accounts for 70-85% of the succinimide hydrolysis product. Protein-L-isoaspartyl methyltransferase (PIMT) catalyzes the methylation of L-isoAsp and D-Asp in the presence of S-adenosyl-L-methionine (AdoMet).


1.4.2 Biological and Biopharmaceutical Importance of Deamidation and Isomerization

Deamidation of asparagine generates aspartate, which fundamentally changes the amino acid composition and charge of the polypeptide post-translationally[20]. Asn deamidation changes the charge at that site from neutral to negative. Formation of isoAsp from either deamidation of Asn or isomerization of Asp results in the insertion of a methylene group and the D-configuration of Asp into the protein backbone[20]. These changes may dramatically alter protein structure/conformation[36], stability[37], bioactivity[37, 38], aggregation[39, 40], and function[41, 42], leading to aging[40], cancers[43], Alzheimer’s disease[44] and immunogenicity[45-47]. The enzymatic conversion by PIMT of abnormal isoAsp residues to normal Asp residues prevents the accumulation of potentially dysfunctional proteins in vivo as cells and tissues age. Studies show increased isoAsp accumulation in tissues (e.g. brain) and in fluids (e.g. urine) of PIMT-deficient mice compared to wild-type mice[35, 48]. Furthermore, the average lifespan of PIMT-deficient mice is 42 days, much shorter than the 22-26 months of wild-type mice. These studies clearly demonstrate the harmful consequences of isoAsp accumulation[35, 48]. D-Asp residues have been detected in various proteins from diverse tissues of elderly individuals and related to age-related diseases such as cataract and Alzheimer’s disease[29, 30, 33, 49]. Fujii and coworkers reported high levels of D-isoAsp at residues 58 and 151 of αA-crystallin from aged human lenses, which undergoes abnormal aggregation, leading to reduced chaperone activity[29, 30]. The same research group reported that human skin samples exposed to UV light exhibited significant accumulation of D-Asp compared with sun-protected skin[50]. This evidence led to the proposal that an assay of D-amino acid accretion could serve as an indicator of sun-induced damage. In the next section, the detection of deamidation and isomerization will be discussed.


1.4.3 Methods for Detection and Characterization of Deamidation and Isomerization

Methods for the detection of deamidation are usually based on charge-sensitive techniques or mass spectrometry[51]. Deamidation introduces a negative charge into a protein, shifting its isoelectric point (pI). It also results in a +0.984 Da mass increase from Asn to Asp/isoAsp, which can be detected and quantified by mass spectrometry. For isomerization, on the other hand, there is no difference in charge or molecular mass between Asp and isoAsp, so the two cannot be reliably distinguished by mass alone using mass spectrometry, the method of choice for analyzing almost all other PTMs. Until recently there was no suitable method for detecting Asp isomerization; several chemical, instrumental, and enzymatic approaches have since been developed. This section briefly reviews those methods and discusses the limitations of each. The specific focus of this thesis is the determination of deamidation and isomerization using liquid chromatography-mass spectrometry (LC-MS) methods.
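The +0.984 Da shift follows directly from the elemental change on deamidation (the side-chain NH2 is replaced by OH). The short calculation below reproduces it from standard monoisotopic atomic masses and, for comparison, gives the shifts for succinimide formation by loss of ammonia from Asn or of water from Asp. The helper name is hypothetical; the atomic masses are standard values rather than numbers taken from this thesis.

```python
# Monoisotopic atomic masses in Da (standard values).
MASS = {"H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def composition_mass(composition):
    """Mass of an elemental composition; negative counts mean atoms lost."""
    return sum(MASS[element] * count for element, count in composition.items())


# Asn -> Asp or isoAsp: net change is +O, -N, -H.
deamidation_shift = composition_mass({"O": 1, "N": -1, "H": -1})

# Asn -> succinimide: loss of NH3.
asn_to_succinimide = composition_mass({"N": -1, "H": -3})

# Asp -> succinimide: loss of H2O.
asp_to_succinimide = composition_mass({"H": -2, "O": -1})

# Asp <-> isoAsp: same elemental composition, so no mass change.
isomerization_shift = 0.0

print(f"Asn -> Asp/isoAsp:  {deamidation_shift:+.4f} Da")
print(f"Asn -> succinimide: {asn_to_succinimide:+.4f} Da")
print(f"Asp -> succinimide: {asp_to_succinimide:+.4f} Da")
print(f"Asp -> isoAsp:      {isomerization_shift:+.4f} Da")
```

The last line is the crux of the analytical problem: a mass measurement can flag deamidation and succinimide formation, but it is blind to the Asp/isoAsp interconversion.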

1.4.3.1 Edman Degradation

Edman degradation stops at isoAsp residues and has been widely used to detect and identify them[52, 53]. Di Donato and coworkers used chromatographic separation of tryptic peptides from ribonuclease A (RNase A) followed by Edman sequencing to identify Asn67 as the site of deamidation[53]. Asn67 deamidation was found to affect the catalytic properties and refolding rate of RNase A[53]. Edman sequencing requires a purified sample and is not suitable when the N-terminal amino group is modified or blocked. In addition, a relatively large quantity of protein is required.


1.4.3.2 Protein-L-isoaspartyl Methyltransferase (PIMT)

Protein-L-isoaspartyl methyltransferase (PIMT, EC 2.1.1.77) specifically transfers a methyl group from S-adenosyl-L-methionine (SAM or AdoMet) to isoAsp, generating S-adenosyl-L-homocysteine (SAH or AdoHcy) and the corresponding isoaspartate methyl ester (Figure 1-2)[26, 32, 54, 55]. The methyl ester has a longer retention time than isoAsp on RP-HPLC due to its increased hydrophobicity, but it is labile and readily converts to succinimide, so it cannot be used to quantify isoAsp. Instead, the methylation by-products, SAH and methanol (MeOH), are measured to determine the total isoAsp content of protein samples in the commercially available IsoQuant kit. In the radioactive format of IsoQuant, the methyl group donor SAM is labeled with tritium and the resulting [3H]methanol by-product is used for isoAsp quantitation (Figure 1-2). In the HPLC format of IsoQuant, the SAH by-product is separated by reversed-phase HPLC and quantified against a standard by UV absorbance at 260 nm (Figure 1-2). The major limitation of this method is that only the total isoAsp content is measured; the site of deamidation/isomerization cannot be located.


Figure 1-2. Total isoAsp analysis via HPLC/UV at 260 nm for SAH measurement or radioactive detection of MeOH.

1.4.3.3 Mass Spectrometry

Over the past decade, peptide identification by CID has become the method of choice in mass spectrometry-based proteomics. In neutral and basic solution, deamidation and isomerization proceed via a succinimide intermediate (Asu) (Scheme 1-1). While the Asu intermediate can often be detected as a degradation product (a mass decrease of 17 Da relative to the Asn-containing peptide), it is readily hydrolyzed in aqueous solution to form the Asp and isoAsp products. Deamidation of Asn to isoAsp and Asp results in a 1 Da mass increase, which is readily detected by modern mass spectrometers, and the deamidation sites can be localized by tandem mass spectrometry. In addition, mass spectrometry is a sensitive technique that requires only femtomole to attomole quantities of sample.

Unlike Asn deamidation, Asp isomerization presents a significant challenge for mass spectrometry, as there is no mass difference between Asp and isoAsp. Fragment ion intensities in CID have been studied as a means of differentiating isoAsp from Asp. Lehmann et al noticed that replacement of L-Asp by L-isoAsp resulted in 1) a decrease in the intensity ratio of the complementary b and y ions generated by cleavage N- and C-terminal to the isoAsp residue, and 2) a decrease in the abundance of the Asp immonium ion at m/z 88[56]. However, the b/y ion intensity ratio and the immonium ion intensity vary considerably with peptide sequence and instrument settings, so these abundance changes are difficult to use in practice for detection of isoAsp. The development of alternative fragmentation techniques has extended the possibilities of tandem mass spectrometry for isomerization detection. More recently, electron capture dissociation (ECD) and electron transfer dissociation (ETD) have been used to differentiate isoaspartic acid from aspartic acid residues via the reporter ions c+57 and z·-57 (Figure 1-3)[57-62]. Dai et al reported the identification of major isoAsp-containing proteins in the urine of PIMT-deficient mice via ETD analysis of LysC digests[63]. The limitations of this approach are that 1) it relies on only one pair of signature ions; 2) the abundance of the reporter ions is low, typically less than 20% of that of the corresponding c and z ions; and 3) ETD fragmentation requires peptides in higher charge states.

Like other PTMs, deamidation and isomerization products are often present at low abundance, so sample enrichment to improve detection and reduce sample complexity is desirable. Since Asn deamidation to Asp/isoAsp changes the charge of peptides/proteins, cation exchange chromatography (CEX) has been used to separate intact molecules from their charge variants[64, 65]. However, CEX mobile phases are usually not compatible with mass spectrometry, so CEX fractions must be collected for subsequent mass spectrometric identification.


Asp and isoAsp have identical mass and similar pI, so their analysis remains challenging. Fortunately, the structural change induced by isomerization usually changes the retention time of the peptide in reversed-phase liquid chromatography (RPLC)[26, 66]. A typical peptide elution order is isoAsp, Asn, Asp, and succinimide[67]. However, identification based on retention time alone should be treated with caution, since the separation can vary with chromatographic conditions[61]. As such, RPLC coupled with ETD-MS provides a powerful tool for differentiating Asp from isoAsp[57, 61].

Figure 1-3. Mechanism of fragmentation of the Asp and isoAsp peptides in ETD MS. (a) formation of c and z fragment ions of the Asp peptides, which is the same for isoAsp peptide; (b) formation of the c+57 and z-57 diagnostic ions of the isoAsp peptides[61].


1.4.3.4 18O-Incorporation

As described in section 1.4.1, isoAsp can be generated from Asn deamidation or Asp isomerization via a common succinimide intermediate. Identifying the succinimide intermediate in proteins is challenging because it hydrolyzes rapidly under neutral to basic pH conditions, which are typical of protease digestion. Xiao et al developed an 18O-labeling method for the identification and quantification of succinimide in proteins[68]. The method uses 18O-water in the hydrolysis of succinimide, followed by tryptic digestion and LC/MS analysis, to unambiguously identify the sites of deamidation and isomerization via mass increases of 3 and 2 Da, respectively, compared with their 16O counterparts[31, 68, 69].

Since Asn deamidation to the isomeric products (isoAsp and Asp residues) can readily occur even under the mild conditions used to digest proteins for LC/MS analysis, the measured level often overestimates the original level of deamidation. Inherent deamidation and deamidation introduced by sample preparation can be differentiated by preparing the sample in 18O-water[70-72]. Artificial deamidation from sample preparation shows a 3 Da mass shift, while intrinsic deamidation shows a 1 Da mass shift, relative to the non-deamidated peptide. When sample preparation is conducted in 18O-water, however, the protease can simultaneously catalyze the incorporation of up to two 18O atoms at the peptide C-terminal carboxyl group, resulting in complicated mass spectra. This limitation can be overcome by a multiple-step calculation procedure[72, 73]. In this method, b ions are used to calculate the Asn deamidation that occurred prior to or during sample preparation, which eliminates the complexity introduced by protease-catalyzed C-terminal 18O-labeling.
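The 1 Da versus 3 Da distinction can be expressed as a simple lookup on the observed mass shift. The sketch below is a minimal illustration, not the multiple-step calculation procedure of references [72, 73]: it considers a single deamidation site, ignores protease-catalyzed C-terminal 18O incorporation (which is why b ions are used in practice), and uses a hypothetical matching tolerance.

```python
# Monoisotopic atomic masses in Da (standard values).
MASS_H = 1.00782503
MASS_N = 14.00307401
MASS_O16 = 15.99491462
MASS_O18 = 17.99915961

# Deamidation in ordinary water: +16O, -N, -H (about +1 Da).
INTRINSIC_SHIFT = MASS_O16 - MASS_N - MASS_H

# Deamidation in 18O-water: the incoming oxygen is 18O (about +3 Da).
ARTIFACT_SHIFT = MASS_O18 - MASS_N - MASS_H


def classify_deamidation(observed_shift, tolerance=0.02):
    """Assign an observed shift (Da, relative to the Asn peptide) to
    deamidation before or during sample preparation in 18O-water.
    The tolerance is a hypothetical value chosen for this sketch."""
    if abs(observed_shift - INTRINSIC_SHIFT) <= tolerance:
        return "intrinsic"
    if abs(observed_shift - ARTIFACT_SHIFT) <= tolerance:
        return "sample preparation artifact"
    return "unassigned"


print(f"intrinsic: {INTRINSIC_SHIFT:+.4f} Da, artifact: {ARTIFACT_SHIFT:+.4f} Da")
print(classify_deamidation(0.984))
print(classify_deamidation(2.988))
```

The 2 Da difference between the two cases is simply the mass difference between 18O and 16O, so the labels separate cleanly on any instrument that resolves the isotopic envelope.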


Recently, Wang et al took advantage of the different deamidation mechanisms under acidic and basic conditions to introduce isomer-specific mass tags into 18O-labeled aspartyl- and isoaspartyl-containing peptides (Figure 1-4)[74]. Deamidation under basic conditions generates both aspartyl- and isoaspartyl-containing peptides, while acid-catalyzed deamidation leads only to aspartyl-containing peptides. When 18O-water is used under these deamidation conditions, different levels of 18O-incorporation in aspartyl- and isoaspartyl-containing peptides can be achieved. In acid-catalyzed labeling, deamidation results in a mass increment of 4n+9 Da (4 Da from each acidic residue, 4 Da from the C-terminus, and 5 Da from the deamidation-formed Asp residue), where n is the number of acidic residues and carboxymethylated Cys residues in the peptide[74]. In contrast, only one 18O atom is incorporated during hydrolysis of the succinimide intermediate under basic conditions, resulting in a mass shift of 3 Da. These different mass shifts from 18O-labeling can be exploited for unambiguous assignment of aspartyl- and isoaspartyl-containing peptides by mass spectrometry.
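The 4n+9 rule is easy to tabulate for a given peptide. The sketch below works in nominal (integer) masses and counts Asp and Glu in a one-letter sequence; carboxymethylated Cys residues cannot be read from a plain sequence, so they are passed in explicitly. The function names and the example peptide are hypothetical illustrations and are not taken from reference [74].

```python
# Nominal mass contributions for acid-catalyzed labeling in 18O-water,
# as itemized in the text.
PER_ACIDIC_RESIDUE = 4    # each acidic residue or carboxymethylated Cys
C_TERMINUS = 4            # C-terminal carboxyl group
DEAMIDATION_FORMED_ASP = 5

BASE_CATALYZED_SHIFT = 3  # one 18O on succinimide hydrolysis, nominal Da


def count_acidic_sites(sequence, carboxymethyl_cys=0):
    """n = Asp + Glu residues plus carboxymethylated Cys residues."""
    return sequence.count("D") + sequence.count("E") + carboxymethyl_cys


def acid_catalyzed_shift(n_acidic):
    """Nominal mass increment 4n + 9 Da for acid-catalyzed labeling."""
    return PER_ACIDIC_RESIDUE * n_acidic + C_TERMINUS + DEAMIDATION_FORMED_ASP


# Hypothetical peptide with one Asp, one Glu and one deamidating Asn.
peptide = "VDNGEK"
n = count_acidic_sites(peptide)
print(f"n = {n}")
print(f"acid-catalyzed: +{acid_catalyzed_shift(n)} Da")
print(f"base-catalyzed: +{BASE_CATALYZED_SHIFT} Da")
```

Because the acid-catalyzed tag grows with the number of carboxyl groups while the base-catalyzed tag stays at 3 Da, the two labeling routes give well-separated masses even for peptides with no other acidic residues (n = 0 gives 9 Da versus 3 Da).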


Figure 1-4. Acid- and base-catalyzed deamidation[74]. Here n is the number of acidic residues and carboxymethylated Cys residues in a peptide.

1.4.3.5 Selective Cleavage of isoAsp Peptides with the Asp-N Protease

Endoproteinase Asp-N, which selectively cleaves at Asp but not at isoAsp, has been used to enrich isoAsp-containing peptides for MS analysis[61, 75]. As a result, an isoAsp peptide can be differentiated from the corresponding peptide containing Asp. Zhang et al employed this finding to identify isoAsp formation at Asp45, Asp47 and Asn47 of recombinant human interleukin-11 (rhIL-11)[76]. Rehder et al used Asp-N peptide mapping to identify isomerization of Asp92 residues of an anti-epidermal growth factor receptor (EGFR) immunoglobulin γ2 antibody, which contributed to decreased potency in binding EGFR as measured by a cell proliferation assay[75]. The limitation of this approach is that Asp-N digestion may not be suitable for some proteins, yielding very long peptides that are difficult to identify by mass spectrometry.


1.4.3.6 Chemo-enzymatic Derivatization and Affinity-based Method

Alfaro and co-workers reported a chemo-enzymatic method for detecting protein isoaspartate that uses protein isoaspartate methyltransferase (PIMT) to selectively convert isoaspartates into the corresponding methyl esters, followed by hydrazine trapping and aldehyde affinity enrichment (Figure 1-5)[77]. Hydrazides bind to aldehyde resins under mildly acidic conditions (pH 3-6), and the trapped protein isoaspartate can be released at pH 10. The mass increase of 14 Da from isoAsp to hydrazide can be readily detected by standard mass spectrometry. This method can be used not only for site identification but also for the detection of low-abundance isoAsp peptides/proteins. Its limitation is that hydrazine trapping is sub-stoichiometric because hydrolysis competes with it.
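The 14 Da increase corresponds to replacing the carboxyl OH of isoAsp with NHNH2. The calculation below derives the exact shift from standard monoisotopic masses and, for comparison, the shift of the methyl ester intermediate (addition of CH2), which is also nominally 14 Da. The comparison with the methyl ester is an observation added for this illustration and is not a point made in reference [77].

```python
# Monoisotopic atomic masses in Da (standard values).
MASS_H = 1.00782503
MASS_C = 12.0
MASS_N = 14.00307401
MASS_O = 15.99491462

# isoAsp -COOH -> hydrazide -CONHNH2: net change is +N2H2, -O.
hydrazide_shift = 2 * MASS_N + 2 * MASS_H - MASS_O

# isoAsp -COOH -> methyl ester -COOCH3: net change is +CH2.
methyl_ester_shift = MASS_C + 2 * MASS_H

# Both are nominally +14 Da; their exact masses differ slightly.
difference = hydrazide_shift - methyl_ester_shift

print(f"hydrazide:    {hydrazide_shift:+.4f} Da")
print(f"methyl ester: {methyl_ester_shift:+.4f} Da")
print(f"difference:   {difference:.4f} Da")
```

Since the methyl ester is labile and is consumed during trapping, the two species are not expected to coexist at appreciable levels, but the small exact-mass difference shows why accurate mass measurement is helpful when assigning a +14 Da product.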


Figure 1-5. Detection of deamidation and isomerization via (A) isoaspartate methyltransferase (PIMT)-catalyzed methylation of isoaspartate and hydrazine trapping of the methyl ester and succinimide and (B) enrichment by hydrazide-aldehyde affinity[77].


1.4.3.7 Protein Isoaspartate Methyltransferase-mediated 18O-Labeling

Recently, a novel approach for the detection and characterization of Asp isomerization in IgG1, based on protein isoaspartate methyltransferase-mediated 18O-labeling followed by mass spectrometry analysis, has been developed in our lab[26]. In this approach, 18O is incorporated under mildly basic conditions into the succinimide generated from PIMT-mediated methylation of isoAsp. Several isoAsp sites in IgG1 have been identified, as detailed in Chapter Two.

1.4.3.8 Methods for Racemization Detection

Racemization is much more difficult to detect because L- and D-amino acids have identical polarity, charge, and molecular weight. However, several methods exist for the detection and quantification of racemization, including chromatography, enzyme-linked immunosorbent assay (ELISA), and enzymatic assays.

In a typical chromatographic protocol, the protein of interest is acid hydrolyzed under very harsh conditions to release individual amino acids for either achiral separation after chiral derivatization or direct separation on a chiral column[25, 78]. The limitations of this method are that 1) the harsh acid hydrolysis can itself induce racemization, and 2) the site of racemization remains unknown. However, by combining Edman sequencing or peptide mapping with chiral derivatization, both the sequence and the stereo-configuration of a peptide can be determined. Fujii et al digested αA-crystallin with trypsin and then used reversed-phase HPLC-mass spectrometry to analyze the resulting peptides[79]. After the peptides were identified by mass and sequence analysis, they were hydrolyzed and derivatized with o-phthaldialdehyde (OPA) for fluorometric detection and N-tert-butyloxycarbonyl-L-cysteine (Boc-L-Cys) for chiral specificity. They found that the Asp58 and Asp151 residues in aged human αA-crystallin were highly inverted to D-isomers. Inoue et al also reported racemization and isomerization of N-terminal amyloid-β in Alzheimer’s brain tissues by UPLC-MS/MS analysis with covalent chiral derivatization[80].

As an alternative to HPLC-based techniques, racemized amino acids can be identified and quantified using a stereo-selective enzyme. Protein isoaspartyl methyltransferase (PIMT) selectively recognizes L-isoAsp and D-Asp, as described in section 1.4.3.2. Again, this method does not locate the racemization site. In addition, because it recognizes only L-isoAsp and D-Asp, not D-isoAsp, it can underestimate the total damage to a protein or overestimate one specific form of modification.

Currently, the most promising method for detection of racemization exploits the sequence specificity and stereospecificity of antibodies[81, 82]. The antibody-based method is a highly sensitive, high-throughput assay, but its development is rather arduous.

1.5 Protein Crosslinking

1.5.1 Crosslink Formation

Protein crosslinks, one class of PTMs, can arise naturally or through degradation. Only a few protein crosslinks have been reported so far. Biological crosslinks (e.g. crosslinks formed via transglutaminases, and pentosidine and glucosepane crosslinks) are covered in the literature[83-92]; therefore, the focus here will be on crosslinks formed through protein degradation[10, 24, 83].


1.5.1.1 Crosslinks as degradants

Disulfide exchange. Disulfide scrambling, especially under basic conditions, forms abnormal disulfide bonds. High pH deprotonates thiols to form thiolate anions, which initiate thiol-disulfide exchange (Figure 1-6)[93]. Therefore, lowering the pH (e.g. to a typical formulation pH of ~5) can minimize disulfide bond scrambling. A few scrambled disulfides in stressed therapeutic IgG1s, anti-HER2 and anti-CD11a, were characterized by LC/MS with ETD[94]. Disulfide scrambling is a common issue for proteins containing disulfide bonds; therefore, care should be taken during the protein manufacturing process.
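The pH dependence described above can be made semi-quantitative with the Henderson-Hasselbalch relationship, since the reactive species in thiol-disulfide exchange is the thiolate. The sketch below assumes a thiol pKa of 8.3, a typical textbook value for a free cysteine side chain; this value is an assumption of the illustration, and the pKa of a particular cysteine in a folded protein can differ substantially.

```python
def thiolate_fraction(ph, pka=8.3):
    """Fraction of a thiol present as thiolate at a given pH
    (Henderson-Hasselbalch). The default pKa is an assumed typical value."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


for ph in (5.0, 7.0, 8.3, 9.0):
    print(f"pH {ph:>4}: {thiolate_fraction(ph):.4f} thiolate")
```

Under this assumption the thiolate fraction falls by more than three orders of magnitude between pH 9 and pH 5, which is consistent with the use of mildly acidic formulation pH to suppress scrambling.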

Figure 1-6 Disulfide scrambling under a basic condition to form crosslink degradants[93]

Thioethers. Thioether crosslinks were reported to form as protein degradation products, especially under basic conditions, via a dehydroalanine intermediate followed by Michael addition (Figure 1-7)[10, 13, 14, 83, 95-99]. During thioether formation, cysteine racemization on IgG heavy and light chains was observed[100]. The light chain sequence was reported to affect the rate of thioether formation: rates were faster for IgG1 containing λ light chains than for those containing κ light chains[101]. Mozziconacci et al also observed photolytic conversion of a disulfide bond in IgG1 to a thioether crosslink via a thiyl radical-dependent mechanism[98]. As such, thioether crosslinks may be a potential issue for the production and formulation of disulfide-containing therapeutic proteins. In this thesis, thioether crosslinks in IgG2 were formed under basic conditions, and their characterization will be presented in Chapter Three.
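For mass spectrometric detection it is useful to note what the disulfide-to-thioether conversion implies for mass. Converting Cys-S-S-Cys to Cys-S-Cys amounts to the net loss of one sulfur atom; this stoichiometry is general chemistry rather than a figure quoted above. The sketch below computes the expected masses of a crosslinked peptide pair from the masses of the two free-thiol peptides; the function names and the two example masses are hypothetical.

```python
# Monoisotopic atomic masses in Da (standard values).
MASS_H = 1.00782503
MASS_S = 31.97207117


def disulfide_linked_mass(peptide_a, peptide_b):
    """Mass of two free-thiol peptides joined by a disulfide (loss of 2 H)."""
    return peptide_a + peptide_b - 2 * MASS_H


def thioether_linked_mass(peptide_a, peptide_b):
    """Mass of the same pair joined by a thioether: the disulfide-linked
    pair minus one sulfur atom."""
    return disulfide_linked_mass(peptide_a, peptide_b) - MASS_S


# Hypothetical monoisotopic masses of two reduced peptides.
mass_a, mass_b = 1000.5000, 1500.7000
disulfide = disulfide_linked_mass(mass_a, mass_b)
thioether = thioether_linked_mass(mass_a, mass_b)
print(f"disulfide-linked: {disulfide:.4f} Da")
print(f"thioether-linked: {thioether:.4f} Da")
print(f"shift:            {thioether - disulfide:+.4f} Da")
```

A shift of about -32 Da relative to the disulfide-linked pair, together with resistance to reduction, is therefore the kind of signature one would look for, although site assignment still requires MS/MS.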

Figure 1-7. The formation of thioether and other related degradants via dehydroalanine followed by Michael addition[10, 13, 14, 83, 95-98].

Succinimide-mediated intermolecular transamidation. Covalent dimer formation in insulin has been observed in both aqueous and lyophilized formulations[102-104]. Evidence suggests that it involves rate-limiting formation of a cyclic anhydride intermediate at the C-terminal AsnA-21, followed by partitioning of the intermediate to form AsnA-21-PheB-1 and AsnA-21-GlyA-1 crosslinks (Figure 1-8)[102-104]. Similar intermolecular amide-linked crosslinking was also observed in hen egg-white lysozyme[105].

Figure 1-8. Non-disulfide crosslinking in insulin, which arises from the initial formation of a cyclic anhydride intermediate at the C-terminal Asn followed by reaction with the N-terminal free amine of another insulin molecule[103].

Tyrosine-related crosslinks. The dityrosine crosslink, a biomarker of oxidative/nitrative stress, is a fluorescent species detected as a photodegradation product in many proteins (such as insulin and calmodulin) [106, 107]. The dityrosine crosslink formed by UV irradiation of bovine brain calmodulin was believed to be an intermolecular crosslink between Tyr99 and Tyr138[108]. Dityrosine formation begins with the generation of a tyrosyl radical, which then crosslinks to form dityrosine (Figure 1-9A)[106]. In addition, Mozziconacci et al studied photodegradation of recombinant human insulin in the solid state[109]. They found dithiohemiacetal and Tyr-Cys crosslinks by GluC digestion of the UV-irradiated human insulin followed by mass spectrometry analysis. UV exposure of solid human insulin results in photodissociation of the C-terminal intrachain disulfide bond, leading to the formation of a thiyl radical pair, which reacts with a proximal Tyr radical to form the Tyr-Cys crosslink (Figure 1-9A)[109]. Thus, in order to be crosslinked, the amino acid residues involved must be within a certain distance of each other.

Other tyrosine-related crosslinks were also detected in metal-catalyzed oxidized recombinant human interferon β-1a and recombinant human insulin (Figure 1-9B)[110, 111]. Tyrosine residues were first oxidized in the presence of copper(II) and ascorbate. The tyrosine oxidation products then undergo Michael addition initiated by a primary amine group (Figure 1-9B). This results in the formation of crosslinks, which are most likely responsible for aggregation[111].


Figure 1-9. (A) Dityrosine and Tyr-Cys crosslinks formed via tyrosyl radicals under oxidative conditions[106, 109]. (B) Other tyrosine-related crosslinks in metal-catalyzed oxidized interferon β-1a, involving tyrosine oxidation followed by Michael addition of primary amines of Lys side chains or N-termini[110, 111].

Histidine-histidine crosslink. A histidine-histidine crosslink formed by photo-oxidation was reported in free histidine and in histidine-containing peptides (Figure 1-10)[112, 113]. The evidence supports a role for singlet oxygen in the formation of a reactive peroxide intermediate on exposure of His-containing peptides to light, which leads to the final histidine-histidine crosslink (Figure 1-10). Most recently, histidine-histidine crosslinks in the hinge region of light-stressed IgG1 have been discovered and characterized in our lab as protein photodegradation products. The details will be discussed in Chapter Four.

Figure 1-10. Histidine-related crosslinks via photo-oxidation[112].

Ditryptophan crosslink. A non-disulfide covalent dimer of human superoxide dismutase 1 (hSod1), produced during its bicarbonate-dependent peroxidase activity in vitro, was recently isolated and characterized by coupling 18O-labeling with mass spectrometry analysis[114]. This covalent dimer was found to consist of two hSod1 subunits crosslinked by a ditryptophan, which contains a bond between C3 and N1 of the respective Trp32 residues (Figure 1-11)[114]. The carbonate radical was believed to promote ditryptophan crosslinking[114].

Figure 1-11. Dimerization of human superoxide dismutase via a novel oxidative modification, the ditryptophan crosslink[114].

Formaldehyde-mediated crosslinking. Formaldehyde-mediated crosslinking was also reported to cause significant aggregation of lyophilized tetanus and diphtheria toxoids during storage[115]. Formaldehyde, used to prepare the toxoid from the native toxin, reacts with Lys residues to generate reactive electrophiles, which react with nucleophiles of a second vaccine molecule to form intermolecular crosslinks (Figure 1-12).


Figure 1-12. Formaldehyde-mediated cross-linking in vaccines where a formaldehyde-modified

electrophile is attacked by nucleophiles to form intermolecular crosslinks[115].

Amide crosslink. A crosslinked ribonuclease A (RNase A) dimer, composed of monomeric units covalently linked by a single amide bond between the side chains of Lys66 and Glu9, was generated without chemical reagents and characterized by mass spectrometry (Figure 1-13)[116, 117]. Interestingly, this dimer shows a two-fold increase in activity over monomeric RNase A[116].


Figure 1-13. RNase dimerization by a single amide bond between Lys66 and Glu9 under vacuum

and 85 °C[117]

1.5.2 Biological and Biopharmaceutical Importance of Crosslinks

Because crosslinks are technically challenging to analyze (as described in Section 1.5.3), the few known crosslinks (such as thioethers) were discovered serendipitously and characterized with painstaking effort. As a result, biological knowledge of protein crosslinks is limited. This section briefly reviews the biological and biopharmaceutical importance of crosslinks.

Protein crosslinks play a significant role in protein structure and stability. For example, disulfide bonds between cysteine residues that are distant in the primary sequence often help to stabilize tertiary structure and thereby affect biological activity and stability[51, 118]. On the other hand, abnormal crosslinks often lead to protein stability problems and immunogenicity.

Aggregation of type I soluble tumor necrosis factor receptor upon photoirradiation was reported to occur via disulfide formation[119]. Aberrant disulfide linkages are also reported to be associated with disease. The mutation of Cys470 to Arg in recombinant human arylsulfatase A (rhASA), which disrupts a disulfide linkage, has been reported in patients with metachromatic leukodystrophy (MLD), an autosomal recessive disease[120].


Dityrosine crosslinking is likely responsible for the dimerization and decreased bioactivity of insulin[107]. Elevated levels of urinary dityrosine have been demonstrated in aging animals and in patients with systemic inflammation[106].

A thioether-crosslinked variant in commercial recombinant human growth hormone (r-hGH) exhibited significantly reduced in vivo biopotency and altered receptor-binding properties compared with a control[96].

Covalent insulin dimers formed through transamidation reactions of AsnA21 and PheB1 accumulate in the circulation of type I diabetic patients undergoing prolonged insulin therapy, accounting for significantly reduced insulin biologic activity[121].

1.5.3 Methods for Detection and Characterization of Crosslinks

1.5.3.1 HPLC with Fluorescence Detection

Dityrosine crosslinks were detected and quantified by LC with fluorescence detection after acid hydrolysis of proteins[106, 122]. This method provides no structural information on the crosslink site. In addition, it is not suitable for crosslinks that are unstable under conventional acid hydrolysis conditions. Furthermore, HPLC results can be confounded by other molecules that coelute with the target molecule, so additional characterization (e.g. by MS) of the molecule of interest is required.

1.5.3.2 MS-based Method

Despite the excellence of mass spectrometry as an analytical tool for PTMs, it is very challenging to identify crosslinked peptides after proteolytic digestion of crosslinked proteins, because crosslinked peptides are often present at substoichiometric levels and are therefore frequently missed during data-dependent LC/MS analysis. Even when crosslinked peptides have been detected, it remains challenging to assign their sequences and locate the site of crosslinking.

This is because (1) tandem mass spectra of crosslinked peptides are complicated by the presence of two sets of fragment ions; and (2) the masses of crosslinked peptides are not in the database if the crosslink chemistry is unknown. Therefore, traditional database search algorithms and de novo sequencing cannot be used to interpret their tandem mass spectra. In chemical crosslinking, which is widely used to probe protein structure and interactions, the crosslink chemistry is known. A database of the intact masses and tandem mass spectra for the possible combinations of crosslinked peptides can be computed and then correlated with the observed spectra to identify both the peptide sequences and the sites of modification. However, this approach is futile if the crosslink chemistry is unknown, which ultimately explains why so few crosslinks have been discovered. Interpretation of the tandem mass spectra is greatly aided if fragment ions with and without the crosslink site can be distinguished isotopically or chemically. The linear fragment ions can then be used for database searching as well as de novo sequencing[13]. This will be described in more detail in our XChem-Finder workflow in Chapters Three and Four. The remainder of this section briefly summarizes techniques used to facilitate MS analysis of crosslinks.
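The database-matching idea described above for a known crosslink chemistry can be illustrated with a minimal sketch. It is not the search software used in this work: the peptide list, the function names, and the 10 ppm tolerance are hypothetical, and the only chemistry assumed is a zero-length amide crosslink (loss of one water), as in the RNase A dimer of Section 1.5.1. Only intact masses are matched; fragment-ion spectra are not modeled.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # monoisotopic mass of H2O (Da)


def peptide_mass(seq):
    """Monoisotopic mass of a linear, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def crosslink_candidates(peptides, linker_delta):
    """Intact masses for every pair of peptides joined by one crosslink.

    linker_delta is the net mass change of the (known) crosslink chemistry.
    """
    table = [
        (peptide_mass(a) + peptide_mass(b) + linker_delta, a, b)
        for a, b in combinations_with_replacement(peptides, 2)
    ]
    return sorted(table)


def match(observed_mass, table, tol_ppm=10.0):
    """Return the peptide pairs whose computed mass is within tol_ppm."""
    return [
        (a, b) for mass, a, b in table
        if abs(mass - observed_mass) / mass * 1e6 <= tol_ppm
    ]


# Hypothetical digest peptides; amide crosslink = condensation, i.e. -H2O.
table = crosslink_candidates(["GK", "AE", "SR"], linker_delta=-WATER)
print(match(403.2067, table))
```

The sketch also makes the limitation stated above concrete: without a value for `linker_delta`, no candidate table can be built, so an unknown crosslink chemistry cannot be found this way.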

1.5.3.2.1 C-Terminal 18O-Labeling

Various stable isotope labeling techniques based on metabolic labeling, chemical tagging, or proteolytic 18O-labeling have been developed and used for relative quantitation of changes in protein abundance between two samples, and also for qualitative characterization of differentially labeled proteomes[123-125]. This section focuses on the simple, enzyme-catalyzed 18O-labeling.

Protease-catalyzed 18O-labeling relies on proteases (such as trypsin, Lys-C, and Glu-C) to exchange two 16O atoms for two 18O atoms at the C-terminal carboxyl group of each newly formed peptide, resulting in a mass shift of 4 Da[123, 126]. The optimum pH values for the carboxyl oxygen exchange reaction catalyzed by Lys-C and trypsin were reported to be 5 and 6, respectively[126]. Other proteases, such as endoproteinase Lys-N, were reported to incorporate only one 18O atom and to yield spectra insufficient to resolve isotope peak overlap[127]. Different peptides incorporate 18O atoms at different rates, which can complicate data analysis and limit applications in quantitative proteomics[128]. In fact, trypsin-catalyzed 18O exchange at the carboxyl terminus is in many instances inhomogeneous or incomplete[129]. In addition, back exchange of the carboxyl oxygens of an 18O-labeled peptide to 16O can occur in 16O-water. Therefore, several approaches, including a high enzyme-to-protein ratio[130], low pH[131], heating[130], and immobilized trypsin[132], were developed to optimize the exchange reaction.

Incorporation of 18O at newly created C-termini during proteolytic digestion represents a clever approach to specifically detect crosslinked peptides[133]. Four and two 18O atoms are incorporated at the C-termini of a crosslinked peptide and a linear peptide, respectively, resulting in a specific isotopic signature: a mass shift of 8 Da for crosslinked peptides and 4 Da for linear peptides. Proteolytic 18O-labeling is simple because isotope labeling occurs concurrently with the proteolytic digestion of proteins. In this thesis, incorporation of 18O at the C-termini of peptides during tryptic digestion was used to facilitate the identification of thioether and histidine-histidine crosslinks; the optimized 18O-labeling method is described in more detail in Chapters Three and Four.
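As a worked illustration of the 4 Da versus 8 Da signature, the following sketch computes the expected monoisotopic shifts and classifies a light/heavy peak pair. It assumes complete incorporation of two 18O atoms per newly created C-terminus; the function names and the 0.02 Da tolerance are hypothetical choices for illustration, not parameters from this work.

```python
# Mass difference between 18O and 16O (Da), about 2.004245 per exchanged oxygen.
O18_MINUS_O16 = 17.999160 - 15.994915


def expected_shift(new_c_termini, oxygens_per_terminus=2):
    """Expected mass shift (Da) for full labeling of each new C-terminus."""
    return new_c_termini * oxygens_per_terminus * O18_MINUS_O16


def classify_pair(light_mass, heavy_mass, tol_da=0.02):
    """Classify a peptide from its masses in 16O-water and 18O-water digests."""
    shift = heavy_mass - light_mass
    if abs(shift - expected_shift(2)) <= tol_da:
        return "crosslinked"  # two new C-termini, four 18O atoms
    if abs(shift - expected_shift(1)) <= tol_da:
        return "linear"       # one new C-terminus, two 18O atoms
    return "unassigned"


print(round(expected_shift(1), 4), round(expected_shift(2), 4))
print(classify_pair(1000.0, 1008.017))
```

Incomplete exchange or back exchange, as discussed earlier in this section, would produce intermediate shifts that this simple classifier reports as "unassigned".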

For crosslinks of large molecular weight, this 18O-labeling may not produce a sufficient mass difference between the light and heavy forms, which complicates the detection of crosslinks. Since protease-catalyzed 18O-labeling applies only to newly created C-termini, a crosslinked peptide containing the C-terminus of the protein has a mass shift of only 4 Da and can be confused with linear peptides. To overcome this, other strategies, such as using proteases with different specificity or N-terminal modification, can be explored.

1.5.3.2.2 N-Terminal Modification

Since crosslinked peptides contain two amino termini, isotopic labeling of α-amino groups with N-terminal modification reagents leads to incorporation of two isotopically coded groups in crosslinked peptides versus one in linear peptides (Figure 1-14)[134]. Modification of crosslinked peptides with an equimolar mixture of the light and heavy isotopic forms of an amine-reactive reagent, e.g. [2H3]-2,4-dinitrofluorobenzene ([2H3]DNFB), results in a specific triplet of signals separated by the mass difference between the light and heavy isotopic forms of the reagent[134]. The resulting 1:2:1 intensity ratio of these peaks reflects the possible combinations of the isotopic forms in the product (LL, LH+HL, HH, where L and H are the light and heavy forms, respectively). In contrast, modification of the single N-terminal amino group of a linear peptide results in a doublet of signals in a 1:1 ratio, corresponding to the L and H forms[134]. One complication of this approach is the possible modification of the ε-amino groups of lysine residues, which produces a false-positive isotopic signature for lysine-containing linear peptides. To avoid this, blocking lysine residues by reductive methylation, followed by selective hydrolysis to release α-amino termini for derivatization with [2H3]DNFB prior to enzymatic digestion, was proposed (Figure 1-17)[134]. The utility of this approach was demonstrated in the characterization of the unique crosslinks of polyubiquitin[134]. Lysine protection adds a step to sample preparation and also leads to tryptic crosslinked peptides of higher molecular mass because of missed cleavage sites at the modified lysine residues.
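The 1:1 and 1:2:1 patterns follow from simple binomial statistics, which the sketch below reproduces. It assumes every amino group reacts completely and independently with a light/heavy reagent mixture; the function names are hypothetical, and the peak spacing assumes a reagent carrying three deuterium atoms.

```python
from math import comb

# Mass difference between 2H and 1H (Da), about 1.006277 per substitution.
D_MINUS_H = 2.014102 - 1.007825


def label_pattern(n_amines, heavy_fraction=0.5, deuteriums=3):
    """Relative peak offsets (Da) and intensities after full derivatization.

    n_amines is the number of reactive amino groups on the peptide.
    """
    spacing = deuteriums * D_MINUS_H
    peaks = []
    for k in range(n_amines + 1):  # k = number of heavy labels incorporated
        weight = (comb(n_amines, k)
                  * heavy_fraction ** k
                  * (1 - heavy_fraction) ** (n_amines - k))
        peaks.append((k * spacing, weight))
    return peaks


def ratio(peaks):
    """Intensities normalized to the smallest peak, rounded to integers."""
    smallest = min(weight for _, weight in peaks)
    return [round(weight / smallest) for _, weight in peaks]


print(ratio(label_pattern(1)))  # linear peptide, one alpha-amino group
print(ratio(label_pattern(2)))  # crosslinked peptide, two alpha-amino groups
```

Calling `label_pattern(2)` for a linear peptide with one unblocked lysine gives the same triplet as a crosslinked peptide, which is the false-positive case that motivates lysine protection.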


Figure 1-14. Isotopic labeling at N-termini via 1) protection of the ε-amino groups of lysine; 2) limited proteolysis to generate peptides; 3) specific derivatization of the liberated α-amino groups with [2H3]DNFB[134].

1.5.3.2.3 Chromatographic Sample Enrichment

Sample enrichment can greatly improve the identification of protein crosslinks by mass spectrometry. Separation based on protein size, e.g. size exclusion chromatography (SEC), was successfully used to enrich crosslinked protein for the characterization of thioether crosslinks in a monoclonal antibody by LC-MS analysis[14, 95]. Also, crosslinked peptides, which carry higher charge states than linear peptides, can be enriched by cation exchange chromatography (CEX)[135].

1.5.3.3 Antibody-based Method

Kato and colleagues have developed rabbit polyclonal and mouse monoclonal antibodies to detect dityrosine immunohistochemically in lipofuscin granules in aged human brain and in atherosclerotic lesions in mice[136, 137]. However, immunodetection is semiquantitative, requires extensive sample preparation and antibody purification, and may be confounded by the presence of cross-reacting proteins in the sample of interest.

1.6 Conclusions

Detection and characterization of PTMs remain challenging, especially when they are present at low abundance. Highly sensitive MS-based methods are the methods of choice for the analysis of PTMs. Various techniques, as described in this chapter, have been developed to enrich PTMs and facilitate their detection. The need to detect and characterize low levels of various PTMs, in order to understand protein degradation pathways, control the quality of therapeutic proteins, and understand the biological importance of PTMs, will drive the continuing effort to develop novel, highly sensitive and sophisticated PTM identification techniques.


1.7 References

[1] Walsh CT, Garneau-Tsodikova S, Gatto GJ, Jr. Protein posttranslational modifications: the chemistry of proteome diversifications. Angew Chem Int Ed Engl 2005;44:7342-72.

[2] Walsh C. Posttranslational modifications of proteins: Expanding nature's inventory: Roberts and Company Publishers; 2006.

[3] Liu H, Gaza‐Bulseco G, Faldu D, Chumsae C, Sun J. Heterogeneity of monoclonal antibodies. J Pharm Sci 2008;97:2426-47.

[4] Zhang Z, Pan H, Chen X. Mass spectrometry for structural characterization of therapeutic antibodies. Mass Spectrom Rev 2009;28:147-76.

[5] Manning MC, Chou DK, Murphy BM, Payne RW, Katayama DS. Stability of protein pharmaceuticals: an update. Pharm Res 2010;27:544-75.

[6] Kerwin BA, Remmele RL, Jr. Protect from light: photodegradation and protein biologics. J Pharm Sci 2007;96:1468-79.

[7] Volkin DB, Mach H, Middaugh CR. Degradative covalent reactions important to protein stability. Mol Biotechnol 1997;8:105-22.

[8] Manning MC, Patel K, Borchardt RT. Stability of protein pharmaceuticals. Pharm Res 1989;6:903-18.

[9] Pattison DI, Rahmanto AS, Davies MJ. Photo-oxidation of proteins. Photochem Photobiol Sci 2012;11:38-53.

[10] Wang W, Singh S, Zeng DL, King K, Nema S. Antibody structure, instability, and formulation. J Pharm Sci 2007;96:1-26.

[11] Larsen MR, Trelle MB, Thingholm TE, Jensen ON. Analysis of posttranslational modifications of proteins by tandem mass spectrometry. Biotechniques 2006;40:790-8.

[12] Stone KL, Williams KR. Enzymatic digestion of proteins in gels for mass spectrometric identification and structural analysis. Curr Protoc Protein Sci 2004;Chapter 11:Unit 11 3.

[13] Liu M, Zhang Z, Zang T, Spahr C, Cheetham J, Ren D, et al. Discovery of undefined protein cross-linking chemistry: a comprehensive methodology utilizing 18O-labeling and mass spectrometry. Anal Chem 2013;85:5900-8.

[14] Tous GI, Wei Z, Feng J, Bilbulian S, Bowen S, Smith J, et al. Characterization of a novel modification to monoclonal antibodies: thioether cross-link of heavy and light chains. Anal Chem 2005;77:2675-82.


[15] Wang B, Malik R, Nigg EA, Korner R. Evaluation of the low-specificity protease elastase for large-scale phosphoproteome analysis. Anal Chem 2008;80:9526-33.

[16] Swaney DL, Wenger CD, Coon JJ. Value of using multiple proteases for large-scale mass spectrometry-based proteomics. J Proteome Res 2010;9:1323-9.

[17] Mohammed S, Lorenzen K, Kerkhoven R, Breukelen Bv, Vannini A, Cramer P, et al. Multiplexed proteomics mapping of yeast RNA polymerase II and III allows near-complete sequence coverage and reveals several novel phosphorylation sites. Anal Chem 2008;80:3584-92.

[18] Choudhary G, Wu S-L, Shieh P, Hancock WS. Multiple enzymatic digestion for enhanced sequence coverage of proteins in complex proteomic mixtures using capillary LC with ion trap MS/MS. J Proteome Res 2003;2:59-67.

[19] An HJ, Peavy TR, Hedrick JL, Lebrilla CB. Determination of N-glycosylation sites and site heterogeneity in glycoproteins. Anal Chem 2003;75:5628-37.

[20] Yang H, Zubarev RA. Mass spectrometric analysis of asparagine deamidation and aspartate isomerization in polypeptides. Electrophoresis 2010;31:1764-72.

[21] Brennan TV, Clarke S. Effect of adjacent histidine and cysteine residues on the spontaneous degradation of asparaginyl- and aspartyl-containing peptides. Int J Pept Protein Res 1995;45:547-53.

[22] Radkiewicz JL, Zipse H, Clarke S, Houk KN. Neighboring side chain effects on asparaginyl and aspartyl degradation: an ab initio study of the relationship between peptide conformation and backbone NH acidity. J Am Chem Soc 2001;123:3499-506.

[23] Stephenson RC, Clarke S. Succinimide formation from aspartyl and asparaginyl peptides as a model for the spontaneous degradation of proteins. J Biol Chem 1989;264:6164-70.

[24] Kroon DJ, Baldwin-Ferro A, Lalan P. Identification of sites of degradation in a therapeutic monoclonal antibody by peptide mapping. Pharm Res 1992;9:1386-93.

[25] McCudden CR, Kraus VB. Biochemistry of amino acid racemization and clinical application to musculoskeletal disease. Clin Biochem 2006;39:1112-30.

[26] Liu M, Cheetham J, Cauchon N, Ostovic J, Ni W, Ren D, et al. Protein isoaspartate methyltransferase-mediated 18O-labeling of isoaspartic acid for mass spectrometry analysis. Anal Chem 2012;84:1056-62.

[27] Robinson NE, Robinson AB. Deamidation of human proteins. Proc Natl Acad Sci U S A 2001;98:12409-13.

[28] Robinson NE. Protein deamidation. Proc Natl Acad Sci U S A 2002;99:5283-8.


[29] Fujii N, Kaji Y, Fujii N. D-Amino acids in aged proteins: analysis and biological relevance. J Chromatogr B Analyt Technol Biomed Life Sci 2011;879:3141-7.

[30] Fujii N, Kaji Y, Fujii N, Nakamura T, Motoie R, Mori Y, et al. Collapse of homochirality of amino acids in proteins from various tissues during aging. Chem Biodivers 2010;7:1389-97.

[31] Zhang J, Yip H, Katta V. Identification of isomerization and racemization of aspartate in the Asp-Asp motifs of a therapeutic protein. Anal Biochem 2011;410:234-43.

[32] Young GW, Hoofring SA, Mamula MJ, Doyle HA, Bunick GJ, Hu Y, et al. Protein L-isoaspartyl methyltransferase catalyzes in vivo racemization of Aspartate-25 in mammalian histone H2B. J Biol Chem 2005;280:26094-8.

[33] Mori Y, Aki K, Kuge K, Tajima S, Yamanaka N, Kaji Y, et al. UV B-irradiation enhances the racemization and isomerizaiton of aspartyl residues and production of Nepsilon-carboxymethyl lysine (CML) in keratin of skin. J Chromatogr B Analyt Technol Biomed Life Sci 2011;879:3303-9.

[34] Won JI, Meagher RJ, Barron AE. Characterization of glutamine deamidation in a long, repetitive protein polymer via bioconjugate capillary electrophoresis. Biomacromolecules 2004;5:618-27.

[35] Kim E, Lowenson JD, MacLaren DC, Clarke S, Young SG. Deficiency of a protein-repair enzyme results in the accumulation of altered proteins, retardation of growth, and fatal seizures in mice. Proc Natl Acad Sci U S A 1997;94:6132-7.

[36] Noguchi S. Structural changes induced by the deamidation and isomerization of asparagine revealed by the crystal structure of Ustilago sphaerogena ribonuclease U2B. Biopolymers 2010;93:1003-10.

[37] Friedman AR, Ichhpurani AK, Brown DM, Hillman RM, Krabill LF, Martin RA, et al. Degradation of growth hormone releasing factor analogs in neutral aqueous solution is related to deamidation of asparagine residues. Replacement of asparagine residues by serine stabilizes. Int J Pept Protein Res 1991;37:14-20.

[38] Huang L, Lu J, Wroblewski VJ, Beals JM, Riggin RM. In vivo deamidation characterization of monoclonal antibody by LC/MS/MS. Anal Chem 2005;77:1432-9.

[39] Nilsson MR, Driscoll M, Raleigh DP. Low levels of asparagine deamidation can have a dramatic effect on aggregation of amyloidogenic peptides: implications for the study of amyloid formation. Protein Sci 2002;11:342-9.

[40] Takata T, Oxford JT, Demeler B, Lampi KJ. Deamidation destabilizes and triggers aggregation of a lens protein, betaA3-crystallin. Protein Sci 2008;17:1565-75.


[41] Charache S, Fox J, McCurdy P, Kazazian H, Jr., Winslow R, Hathaway P, et al. Postsynthetic deamidation of hemoglobin Providence (beta 82 Lys replaced by Asn, Asp) and its effect on oxygen transport. J Clin Invest 1977;59:652-8.

[42] Curnis F, Longhi R, Crippa L, Cattaneo A, Dondossola E, Bachi A, et al. Spontaneous formation of L-isoaspartate and gain of function in fibronectin. J Biol Chem 2006;281:36466-76.

[43] Takehara T, Takahashi H. Suppression of Bcl-xL deamidation in human hepatocellular carcinomas. Cancer Res 2003;63:3054-7.

[44] Shimizu T, Watanabe A, Ogawara M, Mori H, Shirasawa T. Isoaspartate formation and neurodegeneration in Alzheimer's disease. Arch Biochem Biophys 2000;381:225-34.

[45] Doyle HA, Gee RJ, Mamula MJ. Altered immunogenicity of isoaspartate containing proteins. Autoimmunity 2007;40:131-7.

[46] Yang ML, Doyle HA, Gee RJ, Lowenson JD, Clarke S, Lawson BR, et al. Intracellular protein modification associated with altered T cell functions in autoimmunity. J Immunol 2006;177:4541-9.

[47] Mamula MJ, Gee RJ, Elliott JI, Sette A, Southwood S, Jones PJ, et al. Isoaspartyl post-translational modification triggers autoimmune responses to self-proteins. J Biol Chem 1999;274:22321-7.

[48] Lowenson JD, Kim E, Young SG, Clarke S. Limited accumulation of damaged proteins in l-isoaspartyl (D-aspartyl) O-methyltransferase-deficient mice. J Biol Chem 2001;276:20695-702.

[49] Sadakane Y, Yamazaki T, Nakagomi K, Akizawa T, Fujii N, Tanimura T, et al. Quantification of the isomerization of Asp residue in recombinant human alpha A-crystallin by reversed-phase HPLC. J Pharm Biomed Anal 2003;30:1825-33.

[50] Fujii N, Tajima S, Tanaka N, Fujimoto N, Takata T, Shimo-Oka T. The presence of D-beta-aspartic acid-containing peptides in elastic fibers of sun-damaged skin: a potent marker for ultraviolet-induced skin aging. Biochem Biophys Res Commun 2002;294:1047-51.

[51] Beck A, Wagner-Rousset E, Ayoub D, Van Dorsselaer A, Sanglier-Cianferani S. Characterization of therapeutic antibodies and related products. Anal Chem 2013;85:715-36.

[52] Edman P. Sequence determination. Mol Biol Biochem Biophys 1970;8:211-55.


[53] Di Donato A, Ciardiello MA, de Nigris M, Piccoli R, Mazzarella L, D'Alessio G. Selective deamidation of ribonuclease A. Isolation and characterization of the resulting isoaspartyl and aspartyl derivatives. J Biol Chem 1993;268:4745-51.

[54] Johnson BA, Shirokawa JM, Hancock WS, Spellman MW, Basa LJ, Aswad DW. Formation of isoaspartate at two distinct sites during in vitro aging of human growth hormone. J Biol Chem 1989;264:14262-71.

[55] Schurter BT, Aswad DW. Analysis of isoaspartate in peptides and proteins without the use of radioisotopes. Anal Biochem 2000;282:227-31.

[56] Lehmann WD, Schlosser A, Erben G, Pipkorn R, Bossemeyer D, Kinzel V. Analysis of isoaspartate in peptides by electrospray tandem mass spectrometry. Protein Sci 2000;9:2260-8.

[57] Cournoyer JJ, Lin C, O'Connor PB. Detecting deamidation products in proteins by electron capture dissociation. Anal Chem 2006;78:1264-71.

[58] Sargaeva NP, Lin C, O'Connor PB. Identification of aspartic and isoaspartic acid residues in amyloid beta peptides, including Abeta1-42, using electron-ion reactions. Anal Chem 2009;81:9778-86.

[59] Cournoyer JJ, Pittman JL, Ivleva VB, Fallows E, Waskell L, Costello CE, et al. Deamidation: Differentiation of aspartyl from isoaspartyl products in peptides by electron capture dissociation. Protein Sci 2005;14:452-63.

[60] Chan WY, Chan TW, O'Connor PB. Electron transfer dissociation with supplemental activation to differentiate aspartic and isoaspartic residues in doubly charged peptide cations. J Am Soc Mass Spectrom 2010;21:1012-5.

[61] Ni W, Dai S, Karger BL, Zhou ZS. Analysis of isoaspartic Acid by selective proteolysis with Asp-N and electron transfer dissociation mass spectrometry. Anal Chem 2010;82:7485-91.

[62] Sargaeva NP, Lin C, O'Connor PB. Differentiating N-terminal aspartic and isoaspartic acid residues in peptides. Anal Chem 2011;83:6675-82.

[63] Dai S, Ni W, Patananan AN, Clarke SG, Karger BL, Zhou ZS. Integrated proteomic analysis of major isoaspartyl-containing proteins in the urine of wild type and protein L-isoaspartate O-methyltransferase-deficient mice. Anal Chem 2013;85:2423-30.

[64] Harris RJ, Kabakoff B, Macchi FD, Shen FJ, Kwong M, Andya JD, et al. Identification of multiple sources of charge heterogeneity in a recombinant antibody. J Chromatogr B Biomed Sci Appl 2001;752:233-45.


[65] Zhang W, Czupryn MJ. Analysis of isoaspartate in a recombinant monoclonal antibody and its charge isoforms. J Pharm Biomed Anal 2003;30:1479-90.

[66] Sargaeva NP, Goloborodko AA, O'Connor PB, Moskovets E, Gorshkov MV. Sequence-specific predictive chromatography to assist mass spectrometric analysis of asparagine deamidation and aspartate isomerization in peptides. Electrophoresis 2011;32:1962-9.

[67] Chelius D, Rehder DS, Bondarenko PV. Identification and characterization of deamidation sites in the conserved regions of human immunoglobulin gamma antibodies. Anal Chem 2005;77:6004-11.

[68] Xiao G, Bondarenko PV, Jacob J, Chu GC, Chelius D. 18O labeling method for identification and quantification of succinimide in proteins. Anal Chem 2007;79:2714-21.

[69] Chu GC, Chelius D, Xiao G, Khor HK, Coulibaly S, Bondarenko PV. Accumulation of succinimide in a recombinant monoclonal antibody in mildly acidic buffers under elevated temperatures. Pharm Res 2007;24:1145-56.

[70] Li X, Cournoyer JJ, Lin C, O'Connor PB. Use of 18O labels to monitor deamidation during protein and peptide sample processing. J Am Soc Mass Spectrom 2008;19:855-64.

[71] Gaza-Bulseco G, Li B, Bulseco A, Liu HC. Method to differentiate asn deamidation that occurred prior to and during sample preparation of a monoclonal antibody. Anal Chem 2008;80:9491-8.

[72] Liu H, Wang F, Xu W, May K, Richardson D. Quantitation of asparagine deamidation by isotope labeling and liquid chromatography coupled with mass spectrometry analysis. Anal Biochem 2013;432:16-22.

[73] Du Y, Wang F, May K, Xu W, Liu H. Determination of deamidation artifacts introduced by sample preparation using 18O-labeling and tandem mass spectrometry analysis. Anal Chem 2012;84:6355-60.

[74] Wang S, Kaltashov IA. An 18O-labeling assisted LC/MS method for assignment of aspartyl/isoaspartyl products from Asn deamidation and Asp isomerization in proteins. Anal Chem 2013;85:6446-52.

[75] Rehder DS, Chelius D, McAuley A, Dillon TM, Xiao G, Crouse-Zeineddini J, et al. Isomerization of a single aspartyl residue of anti-epidermal growth factor receptor immunoglobulin gamma2 antibody highlights the role avidity plays in antibody activity. Biochemistry 2008;47:2518-30.

[76] Zhang W, Czupryn JM, Boyle PT, Jr., Amari J. Characterization of asparagine deamidation and aspartate isomerization in recombinant human interleukin-11. Pharm Res 2002;19:1223-31.


[77] Alfaro JF, Gillies LA, Sun HG, Dai S, Zang T, Klaene JJ, et al. Chemo-enzymatic detection of protein isoaspartate using protein isoaspartate methyltransferase and hydrazine trapping. Anal Chem 2008;80:3882-9.

[78] Waite ER, Collins MJ, Ritz-Timme S, Schutz HW, Cattaneo C, Borrman HI. A review of the methodological aspects of aspartic acid racemization analysis for use in forensic science. Forensic Sci Int 1999;103:113-24.

[79] Fujii N, Satoh K, Harada K, Ishibashi Y. Simultaneous stereoinversion and isomerization at specific aspartic acid residues in alpha A-crystallin from human lens. J Biochem 1994;116:663-9.

[80] Inoue K, Hosaka D, Mochizuki N, Akatsu H, Tsutsumiuchi K, Hashizume Y, et al. Simultaneous Determination of Post-Translational Racemization and Isomerization of N-Terminal Amyloid-beta in Alzheimer's Brain Tissues by Covalent Chiral Derivatized Ultraperformance Liquid Chromatography Tandem Mass Spectrometry. Anal Chem 2014;86:797-804.

[81] Sakai K, Homma H, Lee JA, Fukushima T, Santa T, Tashiro K, et al. Localization of D-aspartic acid in elongate spermatids in rat testis. Arch Biochem Biophys 1998;351:96-105.

[82] Sakai K, Homma H, Lee JA, Fukushima T, Santa T, Tashiro K, et al. D-aspartic acid localization during postnatal development of rat adrenal gland. Biochem Biophys Res Commun 1997;235:433-6.

[83] Liu H, May K. Disulfide bond structures of IgG molecules: structural variations, chemical modifications and possible impacts to stability and biological function. MAbs 2012;4:17-23.

[84] Heck T, Faccio G, Richter M, Thony-Meyer L. Enzyme-catalyzed protein crosslinking. Appl Microbiol Biotechnol 2013;97:461-75.

[85] Griffin M, Casadio R, Bergamini CM. Transglutaminases: nature's biological glues. Biochem J 2002;368:377-96.

[86] Lucero HA, Kagan HM. Lysyl oxidase: an oxidative enzyme and effector of cell function. Cell Mol Life Sci 2006;63:2304-16.

[87] Spasser L, Brik A. Chemistry and biology of the ubiquitin signal. Angew Chem Int Ed Engl 2012;51:6840-62.

[88] Baynes JW. Role of oxidative stress in development of complications in diabetes. Diabetes 1991;40:405-12.


[89] Nagaraj RH, Shipanova IN, Faust FM. Protein cross-linking by the Maillard reaction. Isolation, characterization, and in vivo detection of a lysine-lysine cross-link derived from methylglyoxal. J Biol Chem 1996;271:19338-45.

[90] Sell DR, Monnier VM. Structure elucidation of a senescence cross-link from human extracellular matrix. Implication of pentoses in the aging process. J Biol Chem 1989;264:21597-602.

[91] Sell DR, Biemel KM, Reihl O, Lederer MO, Strauch CM, Monnier VM. Glucosepane is a major protein cross-link of the senescent human extracellular matrix. Relationship with diabetes. J Biol Chem 2005;280:12310-5.

[92] Monnier VM, Mustata GT, Biemel KL, Reihl O, Lederer MO, Zhenyu D, et al. Cross-linking of the extracellular matrix by the maillard reaction in aging and diabetes: an update on "a puzzle nearing resolution". Ann N Y Acad Sci 2005;1043:533-44.

[93] Trivedi MV, Laurence JS, Siahaan TJ. The role of thiols and disulfides on protein stability. Curr Protein Pept Sci 2009;10:614-25.

[94] Wang Y, Lu Q, Wu SL, Karger BL, Hancock WS. Characterization and comparison of disulfide linkages and scrambling patterns in therapeutic monoclonal antibodies: using LC-MS with electron transfer dissociation. Anal Chem 2011;83:3133-40.

[95] Cohen SL, Price C, Vlasak J. Beta-elimination and peptide bond hydrolysis: two distinct mechanisms of human IgG1 hinge fragmentation upon storage. J Am Chem Soc 2007;129:6976-7.

[96] Lispi M, Datola A, Bierau H, Ceccarelli D, Crisci C, Minari K, et al. Heterogeneity of commercial recombinant human growth hormone (r-hGH) preparations containing a thioether variant. J Pharm Sci 2009;98:4511-24.

[97] Datola A, Richert S, Bierau H, Agugiaro D, Izzo A, Rossi M, et al. Characterisation of a novel growth hormone variant comprising a thioether link between Cys182 and Cys189. ChemMedChem 2007;2:1181-9.

[98] Mozziconacci O, Kerwin BA, Schoneich C. Exposure of a monoclonal antibody, IgG1, to UV-light leads to protein dithiohemiacetal and thioether cross-links: a role for thiyl radicals? Chem Res Toxicol 2010;23:1310-2.

[99] Fradkin AH, Mozziconacci O, Schoneich C, Carpenter JF, Randolph TW. UV photodegradation of murine growth hormone: chemical analysis and immunogenicity consequences. Eur J Pharm Biopharm 2014;87:395-402.

[100] Zhang Q, Flynn GC. Cysteine racemization on IgG heavy and light chains. J Biol Chem 2013;288:34325-35.


[101] Zhang Q, Schenauer MR, McCarter JD, Flynn GC. IgG1 thioether bond formation in vivo. J Biol Chem 2013;288:16371-82.

[102] Darrington RT, Anderson BD. Evidence for a common intermediate in insulin deamidation and covalent dimer formation: effects of pH and aniline trapping in dilute acidic solutions. J Pharm Sci 1995;84:275-82.

[103] Strickley RG, Anderson BD. Solid-state stability of human insulin. II. Effect of water on reactive intermediate partitioning in lyophiles from pH 2-5 solutions: stabilization against covalent dimer formation. J Pharm Sci 1997;86:645-53.

[104] Brange J, Hallund O, Sorensen E. Chemical stability of insulin. 5. Isolation, characterization and identification of insulin transformation products. Acta Pharm Nord 1992;4:223-32.

[105] Desfougeres Y, Jardin J, Lechevalier V, Pezennec S, Nau F. Succinimidyl residue formation in hen egg-white lysozyme favors the formation of intermolecular covalent bonds without affecting its tertiary structure. Biomacromolecules 2011;12:156-66.

[106] DiMarco T, Giulivi C. Current analytical methods for the detection of dityrosine, a biomarker of oxidative stress, in biological samples. Mass Spectrom Rev 2007;26:108-20.

[107] Correia M, Neves-Petersen MT, Jeppesen PB, Gregersen S, Petersen SB. UV-light exposure of insulin: pharmaceutical implications upon covalent insulin dityrosine dimerization and disulphide bond photolysis. PLoS One 2012;7:e50733.

[108] Malencik DA, Anderson SR. Dityrosine formation in calmodulin. Biochemistry 1987;26:695-704.

[109] Mozziconacci O, Haywood J, Gorman EM, Munson E, Schoneich C. Photolysis of recombinant human insulin in the solid state: formation of a dithiohemiacetal product at the C-terminal disulfide bond. Pharm Res 2012;29:121-33.

[110] Torosantucci R, Mozziconacci O, Sharov V, Schoneich C, Jiskoot W. Chemical modifications in aggregates of recombinant human insulin induced by metal-catalyzed oxidation: covalent cross-linking via michael addition to tyrosine oxidation products. Pharm Res 2012;29:2276-93.

[111] Torosantucci R, Sharov VS, van Beers M, Brinks V, Schoneich C, Jiskoot W. Identification of oxidation sites and covalent cross-links in metal catalyzed oxidized interferon Beta-1a: potential implications for protein aggregation and immunogenicity. Mol Pharm 2013;10:2311-22.


[112] Agon VV, Bubb WA, Wright A, Hawkins CL, Davies MJ. Sensitizer-mediated photooxidation of histidine residues: evidence for the formation of reactive side-chain peroxides. Free Radic Biol Med 2006;40:698-710.

[113] Kang P, Foote CS. Photosensitized oxidation of 13C,15N-labeled imidazole derivatives. J Am Chem Soc 2002;124:9629-38.

[114] Medinas DB, Gozzo FC, Santos LF, Iglesias AH, Augusto O. A ditryptophan cross-link is responsible for the covalent dimerization of human superoxide dismutase 1 during its bicarbonate-dependent peroxidase activity. Free Radic Biol Med 2010;49:1046-53.

[115] Schwendeman SP, Costantino HR, Gupta RK, Siber GR, Klibanov AM, Langer R. Stabilization of tetanus and diphtheria toxoids against moisture-induced aggregation. Proc Natl Acad Sci U S A 1995;92:11234-8.

[116] Simons BL, Kaplan H, Fournier SM, Cyr T, Hefford MA. A novel cross-linked RNase A dimer with enhanced enzymatic properties. Proteins 2007;66:183-95.

[117] Simons BL, King MC, Cyr T, Hefford MA, Kaplan H. Covalent cross-linking of proteins without chemical reagents. Protein Sci 2002;11:1558-64.

[118] Zhang T, Zhang J, Hewitt D, Tran B, Gao X, Qiu ZJ, et al. Identification and characterization of buried unpaired cysteines in a recombinant monoclonal IgG1 antibody. Anal Chem 2012;84:7112-23.

[119] Roy S, Mason BD, Schoneich CS, Carpenter JF, Boone TC, Kerwin BA. Light-induced aggregation of type I soluble tumor necrosis factor receptor. J Pharm Sci 2009;98:3182-99.

[120] Coulter-Mackie MB, Gagnier L. Spectrum of mutations in the arylsulfatase A gene in a Canadian DNA collection including two novel frameshift mutations, a new missense mutation (C488R) and an MLD mutation (R84Q) in cis with a pseudodeficiency allele. Mol Genet Metab 2003;79:91-8.

[121] Robbins DC, Hirshman M, Wardzala LJ, Horton ES. High-molecular-weight aggregates of therapeutic insulin. In vitro measurements of receptor binding and bioactivity. Diabetes 1988;37:56-9.

[122] Malencik DA, Anderson SR. Dityrosine as a product of oxidative stress and fluorescent probe. Amino Acids 2003;25:233-47.

[123] Miyagi M, Rao KC. Proteolytic 18O-labeling strategies for quantitative proteomics. Mass Spectrom Rev 2007;26:121-36.

[124] Wang YK, Ma Z, Quinn DF, Fu EW. Inverse 18O labeling mass spectrometry for the rapid identification of marker/target proteins. Anal Chem 2001;73:3742-50.


[125] Ye X, Luke B, Andresson T, Blonder J. 18O stable isotope labeling in MS-based proteomics. Brief Funct Genomic Proteomic 2009;8:136-44.

[126] Hajkova D, Rao KC, Miyagi M. pH dependency of the carboxyl oxygen exchange reaction catalyzed by lysyl endopeptidase and trypsin. J Proteome Res 2006;5:1667-73.

[127] Rao KC, Palamalai V, Dunlevy JR, Miyagi M. Peptidyl-Lys metalloendopeptidase-catalyzed 18O labeling for comparative proteomics: application to cytokine/lipopolysaccharide-treated human retinal pigment epithelium cell line. Mol Cell Proteomics 2005;4:1550-7.

[128] Ramos-Fernandez A, Lopez-Ferrer D, Vazquez J. Improved method for differential expression proteomics using trypsin-catalyzed 18O labeling with a correction for labeling efficiency. Mol Cell Proteomics 2007;6:1274-86.

[129] Ye X, Luke BT, Johann DJ, Jr., Ono A, Prieto DA, Chan KC, et al. Optimized method for computing (18)O/(16)O ratios of differentially stable-isotope labeled peptides in the context of postdigestion (18)O exchange/labeling. Anal Chem 2010;82:5878-86.

[130] Petritis BO, Qian WJ, Camp DG, 2nd, Smith RD. A simple procedure for effective quenching of trypsin activity and prevention of 18O-labeling back-exchange. J Proteome Res 2009;8:2157-63.

[131] Stewart, II, Thomson T, Figeys D. 18O labeling: a tool for proteomics. Rapid Commun Mass Spectrom 2001;15:2456-65.

[132] Sevinsky JR, Brown KJ, Cargile BJ, Bundy JL, Stephenson JL, Jr. Minimizing back exchange in 18O/16O quantitative proteomics experiments by incorporation of immobilized trypsin into the initial digestion step. Anal Chem 2007;79:2158-62.

[133] Back JW, Notenboom V, de Koning LJ, Muijsers AO, Sixma TK, de Koster CG, et al. Identification of cross-linked peptides for protein interaction studies using mass spectrometry and 18O labeling. Anal Chem 2002;74:4417-22.

[134] Chen X, Chen YH, Anderson VE. Protein cross-links: universal isolation and characterization by isotopic derivatization and electrospray ionization mass spectrometry. Anal Biochem 1999;273:192-203.

[135] Rinner O, Seebacher J, Walzthoeni T, Mueller LN, Beck M, Schmidt A, et al. Identification of cross-linked peptides from large sequence databases. Nat Methods 2008;5:315-8.

[136] Kato Y, Maruyama W, Naoi M, Hashizume Y, Osawa T. Immunohistochemical detection of dityrosine in lipofuscin pigments in the aged human brain. FEBS Lett 1998;439:231-4.


[137] Kato Y, Wu X, Naito M, Nomura H, Kitamoto N, Osawa T. Immunochemical detection of protein dityrosine in atherosclerotic lesion of apo-E-deficient mice using a novel monoclonal antibody. Biochem Biophys Res Commun 2000;275:11-5.


Chapter 2: Protein Isoaspartate Methyltransferase-Mediated 18O-Labeling of Isoaspartic Acid for Mass Spectrometry Analysis

Reproduced with permission from “Min Liu, Janet Cheetham, Nina Cauchon, Judy Ostovic, Wenqin Ni, Da Ren, and Zhaohui Sunny Zhou. Protein Isoaspartate Methyltransferase-Mediated 18O-Labeling of Isoaspartic Acid for Mass Spectrometry Analysis. Analytical Chemistry, 2012, 84, 1056-1062”. Copyright [2012] American Chemical Society.

Co-authors’ work in this chapter: Min Liu: experimental design and execution, data analysis, manuscript writing and revision; Janet Cheetham: manuscript writing and revision, grant support; Nina Cauchon: manuscript writing and revision; Judy Ostovic: manuscript writing and revision; Wenqin Ni: PIMT purification, manuscript writing and revision; Da Ren: idea contribution, experimental design, manuscript writing and revision; Zhaohui Sunny Zhou: idea contribution, experimental design, data analysis, manuscript writing and revision, and grant support.


2.1 Abstract

Arising from spontaneous aspartic acid (Asp) isomerization or asparagine (Asn) deamidation, isoaspartic acid (isoAsp, isoD or beta-Asp) is a ubiquitous non-enzymatic modification of proteins and peptides. Because isoaspartyl and aspartyl species have identical masses, sensitive and specific detection of isoAsp, particularly in complex samples, remains challenging. Here we report a novel assay for Asp isomerization by isotopic labeling with 18O via a two-step process: the isoAsp peptide is first specifically methylated by protein isoaspartate methyltransferase (PIMT, EC 2.1.1.77) to the corresponding methyl ester, which is subsequently hydrolyzed in 18O-water to regenerate isoAsp. The specific replacement of 16O with 18O at isoAsp leads to a mass shift of 2 Da, which can be automatically and unambiguously recognized using standard mass spectrometry, such as collision-induced dissociation (CID), and data analysis algorithms. Detection and site identification of several isoAsp peptides in a monoclonal antibody and the β-delta sleep-inducing peptide (DSIP) are demonstrated.

2.2 Introduction

The non-enzymatic post-translational formation of isoaspartic acid (isoAsp, isoD or beta-Asp) in oligopeptides (Scheme 2-1), arising from either the isomerization of aspartic acid (Asp) or the deamidation of asparagine (Asn) via a common succinimide intermediate, is one of the major chemical degradation pathways both in vivo and in vitro. The formation of isoAsp via Asp isomerization and Asn deamidation in a protein alters its structure by inserting an extra methylene group into the peptide backbone[1] and, in doing so, may change protein activity or trigger an immunologic response[2, 3]. As such, this protein post-translational modification (PTM) plays critical roles in biological processes, human diseases and protein pharmaceutical development. For example, the isoAsp level is elevated in amyloid-beta peptides in Alzheimer’s disease[4]. IsoAsp accumulates over time and thus is associated with aging, perhaps acting as a molecular clock[5-9]. In addition, significant amounts of isoAsp are commonly observed in protein pharmaceuticals and represent a major contributor to heterogeneity, particularly after long-term storage[10-12]. Typically, pH is one of the critical factors affecting Asn deamidation and Asp isomerization; both processes are also reported to depend on the primary sequence, higher-order structure and formulation[10, 11, 13-15].

[Scheme 2-1 graphic not reproduced. Asparagine (Asn) loses NH3 (deamidation) and aspartic acid (Asp) loses H2O (isomerization) to form a common succinimide intermediate, which is hydrolyzed (H2O) to isoaspartic acid (isoAsp).]

Scheme 2-1. Formation of isoAsp from the isomerization of aspartic acid (Asp) or the deamidation of asparagine (Asn).


For analysis, it is challenging to differentiate isoAsp from Asp or Asn (particularly the former), as isoAsp and Asp have identical mass and bear similar charge and structure. Current approaches for isoAsp analysis include chemical (e.g., Edman degradation), immunological, enzymatic (e.g., isoQuant), and instrumental methods (e.g., chromatography and mass spectrometry)[16, 17]. High performance liquid chromatography (HPLC) coupled with mass spectrometry/collision-induced dissociation (LC-MS/CID) is commonly used for the characterization of protein modifications; however, MS/CID often fails to differentiate isoAsp and Asp[17]. Recently, O’Connor and others have demonstrated that electron transfer dissociation (ETD)/electron capture dissociation (ECD) mass spectrometry is able to distinguish isoAsp from Asp peptides based on a pair of characteristic reporter ions of isoAsp (c•+58 and z-57)[16-20]. However, the peak intensity of this single pair of diagnostic ions may vary under different conditions, often requiring manual inspection of the spectral data and making assignment ambiguous when isoAsp is present at low abundance; moreover, multiply charged precursor ions are required, limiting the scope of this method.

Another commonly used assay is based on protein isoaspartate O-methyltransferase (PIMT or PCMT, EC 2.1.1.77). This enzyme specifically transfers a methyl group from S-adenosyl-L-methionine (SAM or AdoMet) to isoAsp, generating S-adenosyl-homocysteine (SAH or AdoHcy) and the corresponding isoaspartate methyl ester (Scheme 2-2)[5, 21-24]. As a result, the amount of isoAsp can be deduced by quantifying AdoHcy, the byproduct of methylation. The other methylation product, the isoAsp methyl ester, is labile and spontaneously cyclizes to aspartyl succinimide (Asu), which is also labile under most conditions used for the analysis of peptides and proteins[25, 26]. Therefore, the information on the specific location of isoAsp in peptides is often lost, as the methyl esters and succinimides typically hydrolyze rapidly back to isoAsp and Asp. To overcome these limitations, trapping of the labile isoaspartyl methyl esters and succinimides with hydrazines or hydroxylamines has been developed for isoAsp detection[27, 28]. However, the conversion of methyl esters to hydrazides or hydroxamic acids is not stoichiometric, as water present in the reagent solutions competes with the trapping reaction. On the other hand, as discussed below, the hydrolysis reaction in 18O-water represents an attractive method for isoAsp labeling.

Scheme 2-2. Isotopic labeling of isoaspartic acid via protein isoaspartyl methyltransferase (PIMT)-catalyzed S-adenosyl-methionine (SAM or AdoMet)-dependent methylation and hydrolysis of the resulting methyl ester and succinimide in 18O-water.

Scheme 2-3. Identification of isoAsp peptides by mass spectrometry using the mass increase of 2 Da imparted by 18O-labeling.

Stable isotope labeling combined with mass spectrometry analysis is a powerful tool for identification and quantification because isotope labeling causes no detectable change in retention time, ionization efficiency or fragmentation pattern[29-37]. For example, Fenselau and coworkers have developed a general strategy to label the C-termini of peptides using proteases and 18O-water for peptide quantification and identification[30, 35, 36]. The use of 18O to quantitate succinimide and to track deamidation during sample handling has also been reported[29, 31-34]. Herein we describe a novel isoAsp assay that couples PIMT-mediated methylation with 18O-labeling followed by LC-MS analysis (Schemes 2-2 and 2-3). In the first step, PIMT specifically methylates isoAsp to form the isoaspartate methyl ester. Subsequently, the labile methyl ester spontaneously cyclizes to the succinimide intermediate (Asu), which then hydrolyzes in 18O-water to produce 18O-labeled isoAsp[29, 31-34, 38]. The incorporation of 18O shifts the mass of the modified residue by 2 and 3 Da relative to Asp and Asn, respectively. This allows facile screening and site determination of isoAsp using standard mass spectrometry techniques (such as CID) and data analysis algorithms. Using our method, several isoAsp peptides in a recombinant monoclonal antibody and a synthetic peptide were detected and the sites of isoAsp were identified. In addition, aspartyl succinimide (Asu) and isoAsp can be distinguished by 18O-labeling in the presence and absence of PIMT. Because mass spectrometry with reasonably high resolution can distinguish isoAsp from Asn (a mass increase of 0.984 Da), our method focuses on distinguishing isoAsp from Asp (no change in mass), the smallest protein post-translational modification.
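The nominal shifts quoted above (0.984 Da for Asn to isoAsp, and 2 and 3 Da for 18O-labeled isoAsp relative to Asp and Asn) follow from monoisotopic atomic masses. The Python sketch below is illustrative only: the atomic masses are standard reference values assumed here rather than numbers from this chapter, and the function names are hypothetical.

```python
# Monoisotopic atomic masses in unified atomic mass units (standard
# reference values; assumed here, not taken from this chapter).
MASS = {"H": 1.007825, "N": 14.003074, "O16": 15.994915, "O18": 17.999160}


def deamidation_shift():
    """Mass change for Asn -> Asp/isoAsp: the side-chain NH2 becomes OH."""
    return (MASS["O16"] + MASS["H"]) - (MASS["N"] + 2 * MASS["H"])


def o18_label_shift():
    """Mass change when one 16O atom is replaced by 18O."""
    return MASS["O18"] - MASS["O16"]


# Shift of singly 18O-labeled isoAsp relative to Asp and to Asn.
shift_vs_asp = o18_label_shift()
shift_vs_asn = deamidation_shift() + o18_label_shift()

print(f"Asn -> isoAsp (unlabeled): {deamidation_shift():+.3f} Da")
print(f"18O-isoAsp vs Asp:         {shift_vs_asp:+.3f} Da")
print(f"18O-isoAsp vs Asn:         {shift_vs_asn:+.3f} Da")
```

The exact shift relative to Asp is slightly larger than 2 Da (about 2.004 Da), which matters only when comparing accurate-mass data.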

2.3 Experimental Section

2.3.1 Chemicals

All chemicals were reagent grade or above. Guanidine hydrochloride (GndHCl) and S-adenosyl-methionine hydrochloride (AdoMet or SAM) were purchased from Sigma (St. Louis, MO, USA). 18O-water (97%) was obtained from Cambridge Isotope Laboratories (Andover, MA, USA). β-Delta sleep-inducing peptides (Asp-DSIP, Trp-Ala-Gly-Gly-Asp-Ala-Ser-Gly-Glu, and isoAsp-DSIP, Trp-Ala-Gly-Gly-isoAsp-Ala-Ser-Gly-Glu) trifluoroacetate salts were purchased from Bachem America (King of Prussia, PA, USA). Immobilized trypsin was purchased from Thermo Scientific (Rockford, IL, USA). Recombinant protein L-isoaspartyl-O-methyltransferase (PIMT) was obtained as previously described[39]. Recombinant monoclonal antibody, anti-streptavidin immunoglobulin gamma 1 (IgG1), was produced in Chinese hamster ovary (CHO) cells, purified according to standard manufacturing procedures, formulated at a concentration of 20 mg/mL in 50 mM acetate buffer at pH 5.2, and stored at -70 °C at Amgen (Thousand Oaks, CA, USA).

2.3.2 Generation of isoAsp

The Asp-DSIP peptides were dissolved to a final concentration of 1 mg/mL in 0.1 M acetate buffer at pH 4.8 and stored at 50 °C for 3 days. After being exchanged into 0.1 M acetate buffer at pH 4, the antibody IgG1 was incubated at 45 °C for 1 month.

2.3.3 Reduction, Alkylation, and Tryptic Digestion of IgG1

IgG1 (20 mg/mL) was diluted to 1 mg/mL in the denaturation buffer (7.5 M guanidine HCl, 2 mM EDTA and 0.25 M Tris-HCl, pH 7.5) to a final volume of 0.5 mL. Reduction was accomplished with the addition of 3 μL of 0.5 M dithiothreitol (DTT) followed by 30 min incubation at room temperature. S-Carboxymethylation was achieved with the addition of 7 μL of 0.5 M iodoacetic acid (IAA); the reaction was carried out in the dark for 15 min at room temperature. Excess iodoacetic acid was quenched with the addition of 4 μL of 0.5 M DTT. Reduced and alkylated IgG1 samples were exchanged into the digestion buffer (0.1 M Tris-HCl at pH 7.5) using a NAP-5 size-exclusion column (GE Healthcare, Piscataway, NJ, USA). Then, 150 μL of immobilized trypsin suspension was centrifuged at 900 × g for 10 seconds and the supernatant was removed, followed by three washes, each with 1 mL of 0.1 M Tris-HCl at pH 7.5. The washed immobilized trypsin was mixed with 300 μL of the reduced, alkylated, and buffer-exchanged antibody to achieve a 1:2 (v/v) enzyme/substrate ratio before incubation at 37 °C for 45 min. Subsequently, 300 μL of acetonitrile was added and the supernatant was collected after centrifuging at 900 × g for 10 seconds. Then, 200 μL of the supernatant was dried and reconstituted into 98 μL of 18O-water for methylation as described next.

2.3.4 Methylation Catalyzed by PIMT

Methylation reactions were carried out in a final volume of 100 μL containing 0.1 M Tris-HCl at pH 7.5, 7 μM of DSIP peptides or the tryptic digest of IgG1, 120 μM of AdoMet and 15.9 μM of PIMT at 37 °C for 30 min. To quench methylation, guanidine HCl (GndHCl, 10 M) was added to a final concentration of 2.3 M. Note that the above buffer and reagents were prepared in 18O-water instead of normal water.
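As a worked illustration of the quench step, the volume of stock needed to reach a target final concentration can be estimated from a simple mass balance. The sketch below assumes additive volumes and that the 100 μL reaction initially contains no guanidine; the function name is hypothetical and the resulting volume is an estimate, not a value reported in this chapter.

```python
def stock_volume_for_final_conc(v0_ul, c_stock, c_final):
    """Volume of stock (uL) to add to v0_ul of solute-free solution so that
    the mixture reaches c_final, assuming volumes are additive.

    Mass balance: c_stock * v = c_final * (v0 + v)
    """
    if not 0 < c_final < c_stock:
        raise ValueError("final concentration must lie between 0 and the stock concentration")
    return c_final * v0_ul / (c_stock - c_final)


# Values from the text: 100 uL reaction, 10 M stock, 2.3 M final.
quench_volume_ul = stock_volume_for_final_conc(100.0, 10.0, 2.3)
print(f"Approximate GndHCl stock volume: {quench_volume_ul:.1f} uL")
```

Under these assumptions roughly 30 μL of stock would be required; the actual pipetted volume is not stated in the text.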

2.3.5 18O-Labeling

Sodium bicarbonate buffer at pH 8.5 in 18O-water was prepared by drying 100 μL of 1 M sodium bicarbonate solution at pH 8.5 followed by reconstitution into the same volume of 18O-water. To initiate the hydrolysis of succinimide, 60 μL of the 1 M sodium bicarbonate in 18O-water was added to the methylated peptide solution described above. The reaction was then conducted at 37 °C for 30 min.


2.3.6 HPLC

The separation of the DSIP peptides was carried out on an XBridge C18 column (150 × 2.1 mm, 3.5 μm, Waters, Milford, MA, USA) at a column temperature of 50 °C with a flow rate of 200 μL/min. Mobile phase A consisted of 0.1% formic acid in water while mobile phase B contained 0.085% formic acid in 90% acetonitrile. A linear gradient was applied by increasing mobile phase B from 0 to 50% in 60 min. The injection volume was 25 μL. Chromatographic profiles were monitored by UV absorption at 215 nm.

Tryptic digests of the IgG1 sample were separated on a Polaris Ether C18 column (250 × 2.1 mm, 3 μm, Varian, Palo Alto, CA, USA) at a column temperature of 50 °C with a flow rate of 200 μL/min. Mobile phase A was 0.1% trifluoroacetic acid in water while mobile phase B contained 0.085% trifluoroacetic acid in 90% acetonitrile. A linear gradient was applied by increasing mobile phase B from 0 to 50% in 195 min. The injection volume was 100 μL. Elution profiles were monitored by UV absorption at 215 nm.
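The two linear gradients can be expressed as a small function, which is convenient when relating a retention time to the approximate solvent composition at elution. This is a sketch under stated assumptions: the gradient is taken as ideal and linear with no delay volume, mobile phase A is treated as containing no acetonitrile, and the function names are hypothetical.

```python
def percent_b(t_min, duration_min, start_b=0.0, end_b=50.0):
    """Percent mobile phase B at time t_min for an ideal linear gradient.

    The value is clamped to the start and end compositions outside the ramp.
    """
    frac = min(max(t_min / duration_min, 0.0), 1.0)
    return start_b + (end_b - start_b) * frac


def percent_acetonitrile(t_min, duration_min, acn_in_b=0.90):
    """Nominal percent acetonitrile, given that B is 90% acetonitrile."""
    return percent_b(t_min, duration_min) * acn_in_b


# DSIP method: 0 to 50% B in 60 min; IgG1 digest method: 0 to 50% B in 195 min.
print(percent_b(30.0, 60.0))              # halfway through the DSIP gradient
print(percent_acetonitrile(195.0, 195.0)) # end of the IgG1 gradient
```

For example, halfway through the 60 min gradient the nominal composition is 25% B.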

2.3.7 Mass Spectrometry

LXQ and LTQ Orbitrap mass spectrometers (ThermoFisher Scientific, San Jose, CA, USA) were used in-line with HPLC systems (Agilent 1100, Palo Alto, CA, USA) for the stressed Asp-DSIP and IgG1 samples, respectively. The LXQ was operated with a full scan, zoom scan and data-independent MS/MS scan. The spray voltage was 5 kV, and the capillary temperature was 280 °C. For the LTQ Orbitrap, a high resolution full MS scan at 60,000 resolution (at m/z 400), followed by data-dependent MS/MS scans of the top three most abundant ions, was set up to acquire both the mass and the sequence information. The spray voltage was 5 kV, and the capillary temperature was 300 °C. Both instruments were tuned using the doubly charged ion of the synthetic peptide bradykinin. The MS/MS spectra were obtained using a normalized collision energy of 35%. MassAnalyzer software developed in-house was used for peptide identification and sequencing[40, 41]. Extracted ion chromatograms (XIC) were used to quantify the relative amounts of the isoAsp peptide and its Asp isomer.
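Relative quantification from extracted ion chromatograms amounts to integrating each peak and expressing one area as a fraction of the total. The following is a minimal sketch, not the in-house software: it assumes baseline-resolved peaks, equal ionization response for the two isomers, and simple trapezoidal integration; all names and the example traces are hypothetical.

```python
def trapezoid_area(times, intensities):
    """Integrate an XIC trace (intensity vs. time) with the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (intensities[i] + intensities[i - 1]) * (times[i] - times[i - 1])
    return area


def relative_percent(area_isoasp, area_asp):
    """Percentage of the isoAsp peak relative to the isoAsp + Asp total."""
    return 100.0 * area_isoasp / (area_isoasp + area_asp)


# Hypothetical triangular peaks (time in min, intensity in arbitrary units).
isoasp_area = trapezoid_area([0.0, 1.0, 2.0], [0.0, 10.0, 0.0])
asp_area = trapezoid_area([3.0, 4.0, 5.0], [0.0, 30.0, 0.0])
print(f"isoAsp: {relative_percent(isoasp_area, asp_area):.1f}% of total")
```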

2.4 Results and Discussion

As expected from the general approach outlined in Schemes 2-2 and 2-3, isoAsp peptides were completely labeled with 18O via the sequential methylation and hydrolysis in 18O-water, as evident from the mass shift of 2 Da in the isotopic distribution (see Figure 2-1). By searching for potential 2 Da mass increases at each Asp residue using a standard data mining algorithm, e.g., MassAnalyzer, several 18O-labeled isoAsp peptides from the IgG1 sample were automatically identified. Furthermore, the precise locations of isoAsp were readily and unambiguously established by tandem mass spectrometry using CID.
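Site localization by CID rests on the fact that only fragment ions containing the labeled residue carry the mass increase. The sketch below illustrates this for the DSIP sequence; it is not the MassAnalyzer algorithm. Residue masses are standard monoisotopic values assumed here, only singly charged b and y ions are considered, and the function names are hypothetical.

```python
# Standard monoisotopic residue masses (Da) for the residues in DSIP.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
           "D": 115.02694, "E": 129.04259, "W": 186.07931}
PROTON = 1.007276
WATER = 18.010565
O18_SHIFT = 2.004245  # one 16O replaced by 18O


def fragment_ions(seq, labeled_index=None):
    """Return singly charged b- and y-ion m/z lists (b1..b(n-1), y1..y(n-1)).

    If labeled_index is given, that residue (0-based) carries one 18O.
    """
    masses = [RESIDUE[aa] for aa in seq]
    if labeled_index is not None:
        masses[labeled_index] += O18_SHIFT
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


def shifted_ions(seq, labeled_index, tol=0.01):
    """List the b- and y-ion numbers whose m/z increases upon labeling."""
    b0, y0 = fragment_ions(seq)
    b1, y1 = fragment_ions(seq, labeled_index)
    b_shifted = [i + 1 for i, (u, l) in enumerate(zip(b0, b1)) if l - u > tol]
    y_shifted = [i + 1 for i, (u, l) in enumerate(zip(y0, y1)) if l - u > tol]
    return b_shifted, y_shifted


# DSIP (WAGGDASGE) with the label on residue 5 (0-based index 4).
b_shifted, y_shifted = shifted_ions("WAGGDASGE", 4)
print("shifted b ions:", b_shifted)
print("shifted y ions:", y_shifted)
```

For a label on residue 5 of this nine-residue peptide, b5 to b8 and y5 to y8 shift while b1 to b4 and y1 to y4 do not, which brackets the modification to a single residue.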


[Figure 2-1 graphic not reproduced. Mass spectra, relative abundance vs. m/z. Panel A: isoAsp-DSIP with 18O tag (most abundant peak at m/z 851.36) and Asp-DSIP without 18O tag (m/z 849.36). Panel B: isoAsp-LC69-108 with 18O tag (monoisotopic peak at m/z 1405.9721) and Asp-LC69-108 without 18O tag (m/z 1405.3058).]

Figure 2-1. Isotopic distribution of a singly charged DSIP peptide with/without 18O tag (A) and a triply charged tryptic peptide LC69-108 from the IgG1 sample with/without 18O tag (B). Their sequences with alkylated cysteines are 69SGTSASLAITGLQAEDEADYYCQSYisoDSSLSGLYVFGTGTK108 and 69SGTSASLAITGLQAEDEADYYCQSYDSSLSGLYVFGTGTK108, respectively.


2.4.1 Methylation of isoAsp

The specificity of the PIMT-catalyzed methylation of isoAsp residues has been extensively investigated, indicating that only isoAsp residues, but not Asp, are recognized[2, 5, 9, 10, 24]. In agreement with the literature, no methyl ester or succinimide was observed for the Asp-DSIP peptide (Figure 2-2).

[Figure 2-2 graphic not reproduced. Chromatograms, relative abundance vs. time (14-20 min). Top trace, Asp-DSIP sample: a single Asp peak (m/z 849.37). Bottom trace, isoAsp-DSIP sample: succinimide (m/z 831.42), 18O-isoAsp (m/z 851.45), 18O-Asp (m/z 851.43) and methyl ester (m/z 863.33).]

Figure 2-2. Specificity of PIMT-mediated 18O-labeling shown in the Asp-DSIP and isoAsp-DSIP samples. Methylation via PIMT specifically occurs at the isoAsp peptide (bottom trace), not the Asp peptide (top trace). The corresponding methyl ester is labile and converts to the succinimide intermediate spontaneously. Hydrolysis of succinimide in 18O-water forms 18O-isoAsp (major product) and 18O-Asp (minor product).


As illustrated in Scheme 2-2, the degree of methylation of isoAsp peptides dictates the overall yield of 18O incorporation into isoAsp residues. Therefore, methylation efficiency was optimized before attempting 18O-labeling. As for a typical enzymatic transformation, methylation is faster at higher PIMT concentration, so a relatively high concentration of PIMT (16 μM) was employed. In addition, PIMT is known to be sensitive to feedback inhibition by the product, S-adenosyl-homocysteine (AdoHcy or SAH)[42], so an excess of AdoMet (the methyl donor) was also used. If necessary, product inhibition can be further alleviated by the addition of AdoHcy nucleosidase (EC 3.2.2.9), as has been previously demonstrated[43-45]. In addition, pH, temperature, detergents (e.g., SDS) or chaotropic reagents (e.g., guanidine-HCl) may affect PIMT enzyme activity; therefore, caution was taken during sample preparation. Under our conditions, methylation of isoAsp was complete, as evidenced by the disappearance of the isoAsp peptide peak and the concomitant appearance of peaks for its methyl ester and succinimide peptides (Figure 2-3). Moreover, complete methylation is also supported by the near-stoichiometric incorporation of 18O into each isoAsp residue in isoAsp-DSIP and the tryptic peptides from the IgG1 antibody described next (Figure 2-1 and Figure 2-3).


[Figure 2-3 graphic not reproduced. Panel A: chromatograms, relative abundance vs. time (11-20 min), for the mixture prior to PIMT/18O labeling (81.4% Asp, 18.6% isoAsp); after methylation with no hydrolysis (75.3% Asp, 15.3% methyl ester, 9.4% succinimide); and after hydrolysis at pH 9 for 30 min (88.2% Asp, 11.8% 18O-isoAsp), 3 hrs (88.3% Asp, 11.7% 18O-isoAsp) and 24 hrs (87.9% Asp, 12.1% 18O-isoAsp). Panel B: overlaid mass spectra (m/z 847-857) of unlabeled isoAsp (m/z 849.30) and 18O-isoAsp (m/z 851.34) after hydrolysis at pH 9 for 30 min, 3 hrs and 24 hrs.]

Figure 2-3. A mixture of isoAsp-DSIP and Asp-DSIP peptides was analyzed by the PIMT/18O-labeling method. Complete conversion of isoAsp to its corresponding methyl ester and succinimide was evidenced by the disappearance of the isoAsp peak. Complete hydrolysis in the next step was supported by the disappearance of the methyl ester and succinimide peaks after 30 min in buffer at pH 9. No significant change in isoAsp was observed when the sample was exposed to buffer at pH 9, 37 °C, for 30 min, 3 hrs and 24 hrs (A). No isotopic distortion was observed when the sample was exposed for 24 hrs (B).


To ensure all isoAsp residues are accessible to PIMT, the IgG1 was first digested by trypsin prior to methylation. Immobilized trypsin was used and removed from the peptides after digestion for several reasons: first and foremost, to prevent 18O-labeling at the C-termini of peptides[30, 35, 36]; second, to prevent proteolysis of PIMT. For comparison, the reduced and alkylated protein was methylated/labeled in 18O-water first and then digested with trypsin. Similar results were found (data not shown), suggesting that all isoAsp residues in this particular protein are accessible to the PIMT enzyme.

It is worth noting that isoAsp residues at C- or N-termini are not methylated by PIMT, thus precluding their detection by our approach. On the other hand, it has been shown that isoAsp is refractory to the proteolytic digestion by most endoproteases (such as Asp-N and trypsin)[16, 46, 47], so isoAsp is unlikely to be at the termini of the digested peptides. If such peptides do exist, N-terminal isoAsp can be detected by other methods, such as ETD mass spectrometry, as recently described[48].


2.4.2 Hydrolysis and 18O Incorporation

As illustrated in Scheme 2-2, the isoaspartate methyl ester spontaneously converts to succinimide, which is hydrolyzed in 18O-water into isoAsp and Asp peptides, resulting in 18O incorporation. The mass spectra of 18O-labeled isoAsp-DSIP and IgG1 peptides showed greater than 95% isotope incorporation (Figure 2-1 and Figure 2-3), which is crucial to the subsequent mass spectrometric analysis. Considering that the 18O-water contained ~3% 16O-water, our results again indicate that isoAsp was completely methylated under these conditions.
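One way to estimate isotope incorporation from a spectrum is to model the observed isotopic envelope as a mixture of the unlabeled envelope and the same envelope shifted by two mass units, then solve for the mixing fraction by least squares. The sketch below is an assumed approach for illustration, not the procedure used in this work: it treats a singly charged ion on 1 Da bins, ignores double labeling, and uses a hypothetical envelope; all names are illustrative.

```python
def labeled_fraction(observed, unlabeled, shift=2):
    """Least-squares estimate of the 18O-labeled fraction.

    Model: observed = (1 - f) * unlabeled + f * unlabeled shifted by `shift`
    bins. Both inputs are intensity lists on the same 1 Da m/z bins, and the
    unlabeled envelope must end with at least `shift` zero-intensity bins.
    """
    n = len(observed)
    if len(unlabeled) != n:
        raise ValueError("envelopes must cover the same m/z bins")
    if any(x > 0 for x in unlabeled[n - shift:]):
        raise ValueError("pad the unlabeled envelope with trailing zeros")
    total_u, total_o = sum(unlabeled), sum(observed)
    u = [x / total_u for x in unlabeled]       # normalized unlabeled envelope
    s = [0.0] * shift + u[: n - shift]         # normalized labeled envelope
    o = [x / total_o for x in observed]        # normalized observed envelope
    num = sum((oi - ui) * (si - ui) for oi, ui, si in zip(o, u, s))
    den = sum((si - ui) ** 2 for ui, si in zip(u, s))
    return min(max(num / den, 0.0), 1.0)


# Hypothetical unlabeled envelope (M, M+1, M+2, M+3, then zero padding).
unlabeled = [100.0, 45.0, 15.0, 3.0, 0.0, 0.0]
shifted = [0.0, 0.0, 100.0, 45.0, 15.0, 3.0]
# Synthetic "observed" envelope built with 95% labeling for demonstration.
observed = [0.05 * a + 0.95 * b for a, b in zip(unlabeled, shifted)]
fraction = labeled_fraction(observed, unlabeled)
print(f"Estimated labeled fraction: {fraction:.2f}")
```

With real data the estimate is bounded above by the isotopic purity of the 18O-water.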

Buffer pH and hydrolysis time are critical parameters for completely converting succinimide to 18O-isoAsp/Asp with minimal artifacts. As shown in Figure 2-4, the hydrolysis of the methyl ester and succinimide was faster at higher pH, though we suspect that deamidation artifacts from sample treatment might also increase at higher pH. Therefore, the hydrolysis conditions were examined, and pH 8.5-9.0 at 37 °C for 30 min was found to be optimal. Under these conditions, both the isoaspartate methyl ester and succinimide peaks disappeared with the concomitant appearance of 18O-labeled isoAsp and Asp peptide peaks (see Figure 2-3). As with other peptides, the isoAsp species was the preferred hydrolysis product over its Asp counterpart, in a ratio of about 3:1. The presence of 18O-labeled Asp peptides does not affect the detection of 18O-labeled isoAsp peptides and, in fact, may provide secondary confirmation, as 18O in both isoAsp and Asp species serves as a telltale sign of the existence of isoAsp in the original samples.


[Figure 2-4 graphic not reproduced. Chromatograms, relative abundance vs. time (11-20 min). Panel A: hydrolysis for 30 min at pH 8.1, pH 8.5 and pH 9.0; at pH 8.1 residual succinimide (m/z 831.51) and methyl ester (m/z 865.57) peaks are labeled alongside 18O-isoAsp (m/z 851.47) and 18O-Asp (m/z 851.49). Panel B: hydrolysis at pH 8.5 for 15, 30 and 45 min; at 15 min residual succinimide (m/z 831.44) and methyl ester (m/z 863.58) peaks are labeled alongside 18O-isoAsp (m/z 851.53) and 18O-Asp (m/z 851.39).]

Figure 2-4. The effects of pH (A) and incubation time (B) on the hydrolysis of the succinimide and methyl ester. Complete hydrolysis was observed at pH 8.5-9.0 at 37 °C after 30 min.

Since deamidation of Asn and isomerization of Asp may happen spontaneously during the sample handling process, the degree of background reactions should be measured[31, 32]. Prolonged incubation and harsh conditions should be avoided. Under our conditions, no isoAsp from Asp was detected when the samples were incubated at pH 9, 37 °C for 30 min or for 24 hrs (Figure 2-5). Additionally, deamidation of Asn (even at the NG “hot spots”) occurred only to a small degree within 30 min, thus having little practical effect on the identification of isoAsp from Asp via tandem mass spectrometry.

[Figure 2-5 graphic not reproduced. Panel A: chromatograms, relative abundance vs. time (12-19 min), showing a marker trace (Asp, m/z 849.44; isoAsp, m/z 849.52; a peak at m/z 831.43), DSIP after pH 9 for 30 min (m/z 849.40) and DSIP after pH 9 for 24 hrs (m/z 849.47). Panel B: overlaid mass spectra (m/z 847-857) of the DSIP standard, DSIP after pH 9 for 30 min and DSIP after pH 9 for 24 hrs, with the most abundant peak at m/z 849.28.]

Figure 2-5. Stability of Asp-DSIP during sample treatment. Neither isoAsp-DSIP formation nor 18O-incorporation was observed when Asp-DSIP was exposed to pH 9 buffer at 37 °C for 24 hrs.


The 18O-labeled isoAsp peptides may be methylated by PIMT again[38], leading to the incorporation of two 18O atoms. This phenomenon is similar to the protease-catalyzed 18O-labeling of the C-termini of peptides that Fenselau’s group has developed[30, 35, 36]. This, however, does not affect the database search and site identification of labeled isoAsp. To simplify data analysis, incorporation of two 18O atoms can be minimized by quenching PIMT activity (e.g., by adding guanidine) after the methylation step, as demonstrated in Figure 2-6. Because the stoichiometry may not be precisely controlled, such analysis should be treated not as absolute but rather as semi-quantitative.


[Figure 2-6 graphic not reproduced. Panel A: chromatograms, relative abundance vs. time (11-20 min), showing a marker trace (18O-isoAsp, m/z 851.47; 18O-Asp, m/z 851.49; succinimide, m/z 833.58; methyl ester, m/z 865.41) and samples hydrolyzed in the presence of 0.24 M and 1.25 M GndHCl. Panel B: mass spectra (m/z 848-856) of unlabeled isoAsp (m/z 849.34), 18O-isoAsp with 0.24 M GndHCl (m/z 851.36, with an elevated peak at m/z 853.40) and 18O-isoAsp with 1.25 M GndHCl (m/z 851.40).]

Figure 2-6. Guanidine HCl (Gnd-HCl, 1.25 M) quenched PIMT activity during hydrolysis and thereby minimized the incorporation of two 18O-atoms into isoAsp peptides. The blue arrow indicates that two 18O-atoms were incorporated into some peptides when 0.24 M guanidine HCl was used.


2.4.3 Screening of 18O-Labeled isoAsp by Mass Spectrometry

Compared to unlabeled isoAsp, 18O-labeling at isoAsp results in a 2 Da mass increase that can be easily detected with standard mass spectrometry (for the workflow, see Scheme 2-3). As shown in Figure 2-1A, the 18O-labeled isoAsp-DSIP peptide was detected as a singly charged ion at m/z 851, 2 Da higher than that of the unlabeled Asp-DSIP peptide (m/z 849). Similarly, a shift of 0.6663 m/z was observed for the triply charged peptide LC 69-108 from IgG1 (Figure 2-1B). The high 18O-labeling efficiency (near completion) resulted in a clean shift of the isotopic pattern, thereby enabling automatic recognition of the 18O-labeled isoAsp peptides using a standard data analysis algorithm. For example, multiple isoAsp peptides in IgG1 were readily identified in this manner, some of which are listed in Table 2-1. Therefore, the method reported here is suitable for automatic, high-throughput screening of isoAsp. 17O-water can also be used for labeling, but is less desirable due to its higher cost and the smaller mass shift (1 Da) it imparts on the labeled peptides.
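The expected m/z shift scales inversely with charge state, so a simple screen can look for peak pairs of equal charge separated by the 18O shift divided by the charge. The sketch below is a hedged illustration rather than the algorithm used by the data analysis software: the 18O shift is the standard monoisotopic mass difference, the tolerance is an arbitrary assumption, retention time is ignored, and the function names are hypothetical. The example m/z values are the observed values for the doubly and triply charged peptides of the IgG1 digest.

```python
O18_SHIFT = 2.004245  # 18O minus 16O, monoisotopic (standard value, assumed)


def mz_shift(charge):
    """Expected m/z increase for one 18O label at a given charge state."""
    return O18_SHIFT / charge


def find_labeled_pairs(peaks, tol=0.005):
    """Return (light_mz, heavy_mz, charge) for peaks spaced by the 18O shift.

    `peaks` is a list of (monoisotopic m/z, charge) tuples; `tol` is an
    assumed absolute m/z tolerance.
    """
    pairs = []
    for light_mz, light_z in peaks:
        for heavy_mz, heavy_z in peaks:
            if heavy_z != light_z:
                continue
            if abs((heavy_mz - light_mz) - mz_shift(light_z)) <= tol:
                pairs.append((light_mz, heavy_mz, light_z))
    return pairs


# Observed monoisotopic m/z values and charges for Asp/isoAsp peptide pairs.
observed_peaks = [(839.4036, 2), (840.4055, 2),
                  (937.4632, 2), (938.4655, 2),
                  (1405.3058, 3), (1405.9721, 3)]
pairs = find_labeled_pairs(observed_peaks)
for light, heavy, z in pairs:
    print(f"z={z}: {light:.4f} -> {heavy:.4f} (delta {heavy - light:.4f})")
```

For a triply charged ion the expected shift is about 0.668 m/z, close to the 0.6663 observed for LC 69-108.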

Table 2-1. Representative isoAsp containing peptides detected in IgG1.

Retention time (min) | Charge | Obs. m/z | Cal. m/z | Peptide sequence | Peak ratio in the stressed sample (%) | IsoD site*
98.41 | 2 | 840.4055 | 840.4073 | 271FNWYVisoDGVEVHNAK284 | 9.2 | HC, D276
100.07 | 2 | 839.4036 | 839.4052 | 271FNWYVDGVEVHNAK284 | 90.8 | HC, D276
114.75 | 2 | 938.4655 | 938.4672 | 389TTPPVLDSisoDGSFFLYSK405 | 0.6 | HC, D397
117.12 | 2 | 937.4632 | 937.4651 | 389TTPPVLDSDGSFFLYSK405 | 99.4 | HC, D397
131.24 | 3 | 1405.9721 | 1405.9707 | 69SGTSASLAITGLQAEDEADYYCQSYisoDSSLSGLYVFGTGTK108 | 1.4 | LC, D94 (CDR)
135.25 | 3 | 1405.3058 | 1405.3026 | 69SGTSASLAITGLQAEDEADYYCQSYDSSLSGLYVFGTGTK108 | 98.6 | LC, D94 (CDR)

*Note: LC, HC, and CDR are light chain, heavy chain, and complementarity-determining region, respectively. Cysteine was alkylated.

2.4.4 Co-elution of isoAsp and Asp and Overlapping of Isotope Patterns

As shown in Table 2-1 and Figures 2-2 to 2-6, an isoAsp peptide and its Asp counterpart are fully or partially resolved by liquid chromatography. Occasionally, however, they may co-elute[16, 49]. In that case, the isotopic envelope of the unlabeled (16O) Asp peptide overlaps with that of the labeled (18O) isoAsp species, potentially complicating the analysis of the isotope pattern. Although the mixed isotope patterns can always be deconvoluted[30, 34, 50], as has been done in the analysis of succinimide and deamidation by 18O-labeling[29, 32], the practical issue is whether the intensity of the 18O-species is high enough (a concern when isoAsp is in low abundance) for the 2 Da mass shift to be recognized automatically by data analysis algorithms. In addition, the deconvolution step itself complicates isoAsp analysis. Remedies include changing the chromatographic conditions for peptide separation or using different proteases to generate isoAsp and Asp peptides with different sequences. A more direct and reliable approach, as we have reported, is to treat the sample with endoprotease Asp-N (EC 3.4.24.33), which cleaves peptides at the N-terminal side of Asp but not isoAsp residues[16, 46, 47]. Treatment with Asp-N has thus been used as an effective method to selectively remove Asp peptides from their isoAsp counterparts, enriching the isoAsp species for subsequent analysis[16].
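To make the deconvolution of a mixed isotope pattern concrete, the following is a minimal sketch, not the algorithm of the cited references. It assumes that the labeled envelope has the same shape as the unlabeled one displaced by two isotope peaks, and it fits the two contributions by linear least squares. The function name `deconvolve` and all intensities are hypothetical.

```python
def deconvolve(observed, template, shift=2):
    """Least-squares amounts (a, b) such that
    observed ~= a * template + b * (template shifted by `shift` peaks)."""
    n = len(observed)
    light = [template[i] if i < len(template) else 0.0 for i in range(n)]
    heavy = [template[i - shift] if 0 <= i - shift < len(template) else 0.0
             for i in range(n)]
    # Normal equations of the two-parameter linear fit.
    s_ll = sum(x * x for x in light)
    s_hh = sum(x * x for x in heavy)
    s_lh = sum(x * y for x, y in zip(light, heavy))
    s_lo = sum(x * y for x, y in zip(light, observed))
    s_ho = sum(x * y for x, y in zip(heavy, observed))
    det = s_ll * s_hh - s_lh * s_lh
    if det == 0:
        raise ValueError("light and heavy templates are not independent")
    a = (s_lo * s_hh - s_ho * s_lh) / det
    b = (s_ho * s_ll - s_lo * s_lh) / det
    return a, b


# Hypothetical unlabeled isotope envelope (relative intensities at M, M+1, ...).
template = [1.00, 0.45, 0.12, 0.02]
# Simulate co-elution: 95% unlabeled Asp peptide, 5% 18O-labeled isoAsp peptide.
true_light, true_heavy = 0.95, 0.05
mixed = [
    true_light * (template[i] if i < len(template) else 0.0)
    + true_heavy * (template[i - 2] if 2 <= i < len(template) + 2 else 0.0)
    for i in range(len(template) + 2)
]

a, b = deconvolve(mixed, template)
print(f"unlabeled fraction {a / (a + b):.3f}, labeled fraction {b / (a + b):.3f}")
```

With noise-free input the fit returns the simulated proportions. With real spectra, the reliability of the labeled fraction depends on how far the 18O contribution rises above the M+2 peak of the unlabeled envelope, which is the practical concern raised above for low-abundance isoAsp.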

2.4.5 Identification of isoAsp Sites in 18O-Labeled Peptides

In addition to screening for isoAsp in a given peptide, the precise location of the modification can be readily deduced by tandem mass spectrometry with high confidence, because the specific 18O incorporation imparts a 2 Da mass increase on the fragment ions (such as b and y ions in CID mode) that contain isoAsp (Figures 2-7 to 2-10). For example, the isoAsp modification site in a tryptic peptide from the IgG1 sample was identified as isoAsp276 from its MS/MS data (Figure 2-7). The 2 Da mass increment corresponding to 18O incorporation was evident in the y9-y12 and b6-b13 ion series, whose intensities were comparable to those of the corresponding peaks from the unlabeled (16O) peptide, leading to unambiguous identification of the modification site. In comparison, ETD/ECD mass spectrometry distinguishes isoAsp from Asp peptides based on only a single pair of characteristic reporter ions of isoAsp (c•+58 and z−57), whose intensities also vary under different conditions[16-20]. As another example, the third peptide in Table 2-1 is located in the light chain (LC) of IgG1 at amino acid positions 69-108 and contains 40 amino acids with three closely positioned Asp residues, posing a challenge for site identification. Nevertheless, the modification site was automatically detected and unambiguously established to be isoAsp94 based on the isotopic patterns conferred by 18O incorporation, again exemplifying the utility of our method.
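The site assignment follows from simple bookkeeping of which fragments contain the labeled residue. The sketch below is an illustration with a hypothetical helper, `labeled_fragments`; it lists the b and y ions expected to carry the 2 Da shift for a label at a given position and applies it to the HC 271-284 peptide from Table 2-1.

```python
def labeled_fragments(peptide_length: int, site: int):
    """Return (b_ions, y_ions) that contain the residue at 1-based position `site`.

    b_i spans residues 1..i, so it contains the site when i >= site.
    y_j spans the last j residues, so it contains the site when
    j >= peptide_length - site + 1.
    """
    if not 1 <= site <= peptide_length:
        raise ValueError("site must lie within the peptide")
    b_ions = [f"b{i}" for i in range(site, peptide_length)]
    y_ions = [f"y{j}" for j in range(peptide_length - site + 1, peptide_length)]
    return b_ions, y_ions


sequence = "FNWYVDGVEVHNAK"   # HC 271-284 (Table 2-1)
site = 276 - 271 + 1          # isoAsp276 is the 6th residue of the peptide
b_ions, y_ions = labeled_fragments(len(sequence), site)
print("shifted b ions:", b_ions[0], "to", b_ions[-1])
print("shifted y ions:", y_ions[0], "to", y_ions[-1])
```

For this peptide the predicted series start at b6 and y9, in line with the shifted b6-b13 and y9-y12 ions described above, while b1-b5 and y1-y8 are expected to remain unshifted.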

[Figure 2-7 spectra: panels (A) and (B), CID MS/MS spectra with b- and y-ion assignments, m/z 400-1600.]

Figure 2-7. Identification of isoAsp site in a doubly charged tryptic peptide HC 271-284 from the stressed IgG1. The MS/MS spectra with (A) and without (B) the 18O tag were obtained by collision-induced dissociation (CID) of the (M+2H)2+ precursor ions, m/z 840.41 and 839.40, for the top and bottom traces, respectively. Cysteine was alkylated.

[Figure 2-8 spectra: panels (A) and (B), CID MS/MS spectra with b- and y-ion assignments, m/z 500-1500.]

Figure 2-8. Identification of isoAsp site in a triply charged tryptic peptide LC69-108 from the IgG1 sample. The MS/MS spectra with (A) and without (B) the 18O tag were obtained by collision-induced dissociation (CID) of the (M+3H)3+ precursor ions, m/z 1406.64 and 1405.97, for the top and bottom traces, respectively. Cysteine was alkylated.

[Figure 2-9 spectra: panel (A), CID MS/MS spectra with b- and y-ion assignments, m/z 400-1800, with the 18O-tagged peptide on top and the untagged peptide below.]

Figure 2-9. Identification of isoAsp site in a doubly charged tryptic peptide HC389-405 from the IgG1 sample. (A) The MS/MS spectra with (top) and without (bottom) the 18O tag were obtained by collision-induced dissociation (CID) of the (M+2H)2+ precursor ions, m/z 938.97 and 937.97, for the top and bottom traces, respectively. (B) and (C) are zoomed-in views.

[Figure 2-9, continued: panels (B) and (C), zoomed-in views over m/z 400-800 and m/z 900-1800, each with the 18O-tagged spectrum on top and the untagged spectrum below.]

[Figure 2-10 spectra: panels (A) and (B), CID MS/MS spectra with b- and y-ion assignments, m/z 300-800.]

Figure 2-10. Identification of isoAsp site in the stressed Asp-DSIP peptide by tandem mass spectrometry. The MS/MS data of isoAsp-DSIP with 18O tag (A) and Asp-DSIP without 18O tag (B) were obtained by collision-induced dissociation (CID) of the (M+H)1+ precursor ions, m/z 851 and 849, for the top and bottom traces, respectively.

2.5 Conclusions

We present herein an approach combining chemo-enzymatic transformations to specifically label isoAsp with 18O for facile analysis by standard mass spectrometry and routine data analysis algorithms. The complete incorporation of 18O into isoAsp makes quantitative analysis feasible; this work is currently under development in our laboratories.

2.6 References

[1] Noguchi S. Structural changes induced by the deamidation and isomerization of asparagine revealed by the crystal structure of Ustilago sphaerogena ribonuclease U2B. Biopolymers 2010;93:1003-10.

[2] Doyle HA, Gee RJ, Mamula MJ. Altered immunogenicity of isoaspartate containing proteins. Autoimmunity 2007;40:131-7.

[3] Moss CX, Matthews SP, Lamont DJ, Watts C. Asparagine deamidation perturbs antigen presentation on class II major histocompatibility complex molecules. J Biol Chem 2005;280:18498-503.

[4] Shimizu T, Matsuoka Y, Shirasawa T. Biological significance of isoaspartate and its repair system. Biol Pharm Bull 2005;28:1590-6.

[5] Clarke S. Aging as war between chemical and biochemical processes: protein methylation and the recognition of age-damaged proteins for repair. Ageing Res Rev 2003;2:263-85.

[6] Curnis F, Longhi R, Crippa L, Cattaneo A, Dondossola E, Bachi A, et al. Spontaneous formation of L-isoaspartate and gain of function in fibronectin. J Biol Chem 2006;281:36466-76.

[7] Robinson NE, Robinson AB. Molecular Clocks: Deamidation of Asparaginyl and Glutaminyl Residues in Peptides and Proteins. Cave Junction, Oregon, USA: Althouse Press; 2004.

[8] Desrosiers RR, Fanelus I. Damaged proteins bearing L-isoaspartyl residues and aging: a dynamic equilibrium between generation of isomerized forms and repair by PIMT. Curr Aging Sci 2011;4:8-18.

[9] Reissner KJ, Aswad DW. Deamidation and isoaspartate formation in proteins: unwanted alterations or surreptitious signals? Cell Mol Life Sci 2003;60:1281-95.

[10] Manning MC, Chou DK, Murphy BM, Payne RW, Katayama DS. Stability of protein pharmaceuticals: an update. Pharm Res 2010;27:544-75.

[11] Wakankar AA, Borchardt RT. Formulation considerations for proteins susceptible to asparagine deamidation and aspartate isomerization. J Pharm Sci 2006;95:2321-36.

[12] Liu H, Gaza-Bulseco G, Faldu D, Chumsae C, Sun J. Heterogeneity of monoclonal antibodies. J Pharm Sci 2008;97:2426-47.

[13] Geiger T, Clarke S. Deamidation, isomerization, and racemization at asparaginyl and aspartyl residues in peptides. Succinimide-linked reactions that contribute to protein degradation. J Biol Chem 1987;262:785-94.

[14] Liu YD, van Enk JZ, Flynn GC. Human antibody Fc deamidation in vivo. Biologicals 2009;37:313-22.

[15] Rehder DS, Chelius D, McAuley A, Dillon TM, Xiao G, Crouse-Zeineddini J, et al. Isomerization of a single aspartyl residue of anti-epidermal growth factor receptor immunoglobulin gamma2 antibody highlights the role avidity plays in antibody activity. Biochemistry 2008;47:2518-30.

[16] Ni W, Dai S, Karger BL, Zhou ZS. Analysis of isoaspartic Acid by selective proteolysis with Asp-N and electron transfer dissociation mass spectrometry. Anal Chem 2010;82:7485-91.

[17] Yang H, Zubarev RA. Mass spectrometric analysis of asparagine deamidation and aspartate isomerization in polypeptides. Electrophoresis 2010;31:1764-72.

[18] Chan WY, Chan TW, O'Connor PB. Electron transfer dissociation with supplemental activation to differentiate aspartic and isoaspartic residues in doubly charged peptide cations. J Am Soc Mass Spectrom 2010;21:1012-5.

[19] Cournoyer JJ, Pittman JL, Ivleva VB, Fallows E, Waskell L, Costello CE, et al. Deamidation: Differentiation of aspartyl from isoaspartyl products in peptides by electron capture dissociation. Protein Sci 2005;14:452-63.

[20] O'Connor PB, Cournoyer JJ, Pitteri SJ, Chrisman PA, McLuckey SA. Differentiation of aspartic and isoaspartic acids using electron transfer dissociation. J Am Soc Mass Spectrom 2006;17:15-9.

[21] Aswad DW, Paranandi MV, Schurter BT. Isoaspartate in peptides and proteins: formation, significance, and analysis. J Pharm Biomed Anal 2000;21:1129-36.

[22] Schurter BT, Aswad DW. Analysis of isoaspartate in peptides and proteins without the use of radioisotopes. Anal Biochem 2000;282:227-31.

[23] McFadden PN, Clarke S. Methylation at D-aspartyl residues in erythrocytes: possible step in the repair of aged membrane proteins. Proc Natl Acad Sci U S A 1982;79:2460-4.

[24] O'Connor CM. Protein L-isoaspartyl, D-aspartyl O-methyltransferases: catalysts for protein repair. In: Clarke SG, Tamanoi F, editors. Enzymes: Protein Methyltransferases. Amsterdam: Elsevier; 2006. p. 385-433.

[25] Johnson BA, Aswad DW. Enzymatic protein carboxyl methylation at physiological pH: cyclic imide formation explains rapid methyl turnover. Biochemistry 1985;24:2581-6.

[26] Barber JR, Clarke S. Demethylation of protein carboxyl methyl esters: a nonenzymatic process in human erythrocytes? Biochemistry 1985;24:4867-71.

[27] Alfaro JF, Gillies LA, Sun HG, Dai S, Zang T, Klaene JJ, et al. Chemo-enzymatic detection of protein isoaspartate using protein isoaspartate methyltransferase and hydrazine trapping. Anal Chem 2008;80:3882-9.

[28] Zhu JX, Aswad DW. Selective cleavage of isoaspartyl peptide bonds by hydroxylamine after methyltransferase priming. Anal Biochem 2007;364:1-7.

[29] Chu GC, Chelius D, Xiao G, Khor HK, Coulibaly S, Bondarenko PV. Accumulation of succinimide in a recombinant monoclonal antibody in mildly acidic buffers under elevated temperatures. Pharm Res 2007;24:1145-56.

[30] Fenselau C, Yao X. 18O2-labeling in quantitative proteomic strategies: a status report. J Proteome Res 2009;8:2140-3.

[31] Gaza-Bulseco G, Li B, Bulseco A, Liu HC. Method to differentiate asn deamidation that occurred prior to and during sample preparation of a monoclonal antibody. Anal Chem 2008;80:9491-8.

[32] Li X, Cournoyer JJ, Lin C, O'Connor PB. Use of 18O labels to monitor deamidation during protein and peptide sample processing. J Am Soc Mass Spectrom 2008;19:855-64.

[33] Terashima I, Koga A, Nagai H. Identification of deamidation and isomerization sites on pharmaceutical recombinant antibody using H(2)(18)O. Anal Biochem 2007;368:49-60.

[34] Xiao G, Bondarenko PV, Jacob J, Chu GC, Chelius D. 18O labeling method for identification and quantification of succinimide in proteins. Anal Chem 2007;79:2714-21.

[35] Yao X, Afonso C, Fenselau C. Dissection of proteolytic 18O labeling: endoprotease-catalyzed 16O-to-18O exchange of truncated peptide substrates. J Proteome Res 2003;2:147-52.

[36] Yao X, Freas A, Ramirez J, Demirev PA, Fenselau C. Proteolytic 18O labeling for comparative proteomics: model studies with two serotypes of adenovirus. Anal Chem 2001;73:2836-42.

[37] Ye X, Luke B, Andresson T, Blonder J. 18O stable isotope labeling in MS-based proteomics. Brief Funct Genomic Proteomic 2009;8:136-44.

[38] Lindquist JA, McFadden PN. Incorporation of two 18O atoms into a peptide during isoaspartyl repair reveals repeated passage through a succinimide intermediate. J Protein Chem 1994;13:553-60.

[39] Villa ST, Xu Q, Downie AB, Clarke SG. Arabidopsis Protein Repair L-Isoaspartyl Methyltransferases: Predominant Activities at Lethal Temperatures. Physiol Plant 2006;128:581-92.

[40] Zhang Z. De novo peptide sequencing based on a divide-and-conquer algorithm and peptide tandem spectrum simulation. Anal Chem 2004;76:6374-83.

[41] Zhang Z. Prediction of low-energy collision-induced dissociation spectra of peptides. Anal Chem 2004;76:3908-22.

[42] Clarke S, Banfield K. Homocysteine in Health & Disease. In: Carmel R, Jacobsen DW, editors. New York: Cambridge University Press; 2001. p. 63-78.

[43] Cannon LM, Butler FN, Wan W, Zhou ZS. A stereospecific colorimetric assay for (S,S)-adenosylmethionine quantification based on thiopurine methyltransferase-catalyzed thiol methylation. Anal Biochem 2002;308:358-63.

[44] Dorgan KM, Wooderchak WL, Wynn DP, Karschner EL, Alfaro JF, Cui Y, et al. An enzyme-coupled continuous spectrophotometric assay for S-adenosylmethionine-dependent methyltransferases. Anal Biochem 2006;350:249-55.

[45] Hendricks CL, Ross JR, Pichersky E, Noel JP, Zhou ZS. An enzyme-coupled colorimetric assay for S-adenosylmethionine-dependent methyltransferases. Anal Biochem 2004;326:100-5.

[46] Bohme L, Bar JW, Hoffmann T, Manhart S, Ludwig HH, Rosche F, et al. Isoaspartate residues dramatically influence substrate recognition and turnover by proteases. Biol Chem 2008;389:1043-53.

[47] Kameoka D, Ueda T, Imoto T. A method for the detection of asparagine deamidation and aspartate isomerization of proteins by MALDI/TOF-mass spectrometry using endoproteinase Asp-N. J Biochem 2003;134:129-35.

[48] Sargaeva NP, Lin C, O'Connor PB. Differentiating N-terminal aspartic and isoaspartic acid residues in peptides. Anal Chem 2011;83:6675-82.

[49] Krokhin OV, Antonovici M, Ens W, Wilkins JA, Standing KG. Deamidation of -Asn-Gly- sequences during sample preparation for proteomics: Consequences for MALDI and HPLC-MALDI analysis. Anal Chem 2006;78:6645-50.

[50] Mason CJ, Therneau TM, Eckel-Passow JE, Johnson KL, Oberg AL, Olson JE, et al. A method for automatically interpreting mass spectra of 18O-labeled isotopic clusters. Mol Cell Proteomics 2007;6:305-18.

Chapter 3: Discovery of Undefined Protein Crosslinking Chemistry: A Comprehensive Methodology Utilizing 18O-labeling and Mass Spectrometry

Reproduced with permission from “Min Liu, Zhongqi Zhang, Tianzhu Zang, Chris Spahr, Janet Cheetham, Da Ren, and Zhaohui Sunny Zhou. Discovery of Undefined Protein Crosslinking Chemistry: A Comprehensive Methodology Utilizing 18O-labeling and Mass Spectrometry. Analytical Chemistry 2013, 85, 5900-5908.” Copyright [2013] American Chemical Society.

Co-authors’ work in this chapter: Min Liu: experimental design and execution, data analysis, manuscript writing and revision; Zhongqi Zhang: new 18O-screening function in MassAnalyzer algorithm, data analysis, manuscript writing and revision; Tianzhu Zang: data analysis, manuscript writing and revision; Chris Spahr: data analysis, manuscript writing and revision; Janet Cheetham: idea contribution, manuscript writing and revision, grant support; Da Ren: idea contribution, experimental design, manuscript writing and revision; Zhaohui Sunny Zhou: idea contribution, experimental design, data analysis, manuscript writing and revision, and grant support.

3.1 Abstract

Characterization of protein crosslinking, particularly without prior knowledge of the chemical nature and site of crosslinking, poses a significant challenge because of the intrinsic structural complexity of crosslinked proteins and the lack of a comprehensive analytical approach. Toward this end, we have developed a generally applicable workflow, XChem-Finder, that involves four stages. (1) Detection of crosslinked peptides via 18O-labeling at C-termini. (2) Determination of the putative partial sequences of each crosslinked peptide pair using a fragment ion mass database search against known protein sequences coupled with a de novo sequence tag search. (3) Extension to full sequences based on protease specificity, the unique combination of mass, and other constraints. (4) Deduction of crosslinking chemistry and site. The mass difference between the sum of two putative full-length peptides and the crosslinked peptide provides the formulas (elemental composition analysis) for the functional groups involved in each crosslink. Combined with sequence restraints from MS/MS data, plausible crosslinking chemistry and sites were inferred and, ultimately, confirmed by matching with all data. Applying our approach to a stressed IgG2 antibody, ten crosslinked peptides were discovered and found to be connected via thioethers originating from disulfides at locations that had not been previously recognized. Furthermore, once the crosslink chemistry was revealed, a targeted crosslink search yielded four additional crosslinked peptides that all contain the C-terminus of the light chain.

3.2 Introduction

Protein crosslinking exists in a myriad of biological systems and protein pharmaceuticals, such as ubiquitylated proteins and monoclonal antibodies[1-6]. Rich and diverse chemistry is involved as well, including disulfide[1], dityrosine[2], lysinoalanine[7, 8], and lanthionine[7, 8], among others. Additionally, chemical crosslinking is widely used to probe protein structures and interactions[9, 10]. Owing to their intrinsic structural complexity, characterization of crosslinked peptides is complex but nonetheless tractable if the crosslink chemistry is pre-defined. For example, a database of the intact masses (precursor ions) and the tandem mass spectra (fragment ions) for all possible combinations of crosslinked peptides (e.g., two cysteines forming a disulfide bond) can be generated computationally and subsequently correlated with observed spectra to identify both the sequences and the sites of modification. Such a database search strategy is the cardinal principle behind many common algorithms, including ASAP[11], X!link[12], BLink[13], Xlink-Identifier[14, 15] and MassAnalyzer[16]. Moreover, clever experimental tricks, such as judicious isotope labeling[17, 18], can markedly simplify the process and enhance the confidence level of assignment with the assistance of software tools, e.g., Pro-Cross-link[19, 20], PepLynx[21], xQuest[22], and iXLink/doXLink/XlinkViewer[23]. To date, the rapid advancements in mass spectrometers, data analysis algorithms, and computational capacity have made analyses of crosslinking with known chemistry much more accessible, if not routine (for recent reviews, see [9, 10]).

Yet the aforementioned approaches are futile if the crosslink chemistry is unknown or not pre-defined; for one thing, no theoretical mass or spectrum can be simulated. Even if crosslinked peptides have been identified, it remains a tall order to deduce the sequences and sites of crosslinking. Conceptually, de novo sequencing should provide at least partial sequences for crosslinked peptides (see review[24]). Under typical fragmentation conditions, however, a crosslinked peptide gives rise to at least five sets of b- and y-ions that are intertwined and indistinguishable. In addition, crosslinked peptides typically carry high charge states (≥3+), resulting in multiply charged fragment ions (e.g., 2+ or 3+) and further complicating data interpretation[12, 22, 25]. High-resolution mass spectrometers (e.g., Orbitrap), capable of determining fragment ion charge states, have become widely available only recently. As such, the drastically increased complexity of the tandem spectra renders de novo sequencing ineffective in most cases. Unknown or undefined crosslinks are typically discovered serendipitously, requiring isolation of the crosslinked peptides and “old-fashioned” protein chemistry. Even so, full characterization remains elusive in many cases. For instance, the non-reducible crosslinks between an IgG heavy chain and a light chain in a murine monoclonal antibody, OKT3, and between two heavy chains of IgG2 could not be elucidated even after intensive efforts[26, 27].

To facilitate systematic and unbiased discovery of unknown crosslinks, we have developed a generally applicable workflow, XChem-Finder (Scheme 3-1). First, crosslinked peptides were isotopically labeled at the C-termini to facilitate their detection[19-21, 28]. Proteins were digested in 18O-(heavy) and 16O-(light) water, respectively, followed by LC/MS/MS analysis. In the full scan, the distinct isotope pattern of the crosslinked peptides (a mass increase of 8 Da) compared to the non-crosslinked linear species (a mass increase of 4 Da) was readily detected by a spectral analysis algorithm[19-21]. The second and more challenging part is to determine the sequences, chemical nature, and site of the crosslink. The workflow breaks this challenge down into workable sub-steps. (a) The candidate ions of crosslinked peptides underwent high-resolution MS/MS analysis. Based on their isotope patterns, linear and crosslinked fragment ions were divided into different groups (Table 3-1). (b) Masses of linear fragment ions were searched against the protein sequence, yielding partial sequences (often sequence ladders) of each chain of the crosslinked peptides. In parallel, de novo sequencing of crosslinked fragment ions afforded sequence tags. (c) Combining the partial sequences and sequence tags, putative full-length sequences of each chain were deduced based on protease specificity, the unique combination of mass, and other constraints. (d) The difference between the combined mass of the two putative full-length peptides and the observed mass of a crosslinked peptide provides the formula for the functional group involved in the crosslink (mass to formula). Combined with sequence restraints from MS/MS data, the crosslink chemistry and site were inferred and, ultimately, confirmed by matching with all data.
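Sub-step (d), going from a mass difference to a formula, can be sketched as a lookup against candidate elemental losses. The example below is illustrative rather than the procedure actually used: the candidate list, the function names, and the peptide masses are hypothetical, and the chain masses are assumed to be monoisotopic masses of the native (unmodified) linear peptides.

```python
# Monoisotopic masses (Da) of the elements considered in this sketch.
MONO = {"H": 1.00782503, "C": 12.0, "N": 14.0030740,
        "O": 15.9949146, "S": 31.9720707}

# Illustrative candidates: atoms lost when two free chains form one crosslinked peptide.
CANDIDATE_LOSSES = {
    "H2 (e.g., disulfide from two thiols)": {"H": 2},
    "NH3": {"N": 1, "H": 3},
    "H2O (e.g., condensation to an amide or ester)": {"H": 2, "O": 1},
    "H2S (e.g., thioether from two thiols)": {"H": 2, "S": 1},
}


def formula_mass(formula):
    """Monoisotopic mass of an elemental formula given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def match_losses(mass_chain_a, mass_chain_b, mass_crosslinked, tol=0.005):
    """Names of candidate losses whose mass matches (A + B) - crosslinked within tol."""
    delta = mass_chain_a + mass_chain_b - mass_crosslinked
    return [name for name, formula in CANDIDATE_LOSSES.items()
            if abs(formula_mass(formula) - delta) <= tol]


# Hypothetical masses: two linear chains and a crosslinked species lighter by H2S.
chain_a, chain_b = 1000.5000, 1500.7000
crosslinked = chain_a + chain_b - formula_mass({"H": 2, "S": 1})
print(match_losses(chain_a, chain_b, crosslinked))
```

A real search would enumerate formulas rather than rely on a short fixed list, and the tolerance would be set from the mass accuracy of the instrument. The candidate formula still has to be reconciled with the residues available in both sequences before a chemistry is assigned.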

Applying our XChem-Finder approach to a stressed IgG2, ten crosslinked peptides were discovered and found to be linked via thioethers that originated from disulfides at locations that had not been reported. Furthermore, once the crosslinking chemistry was revealed, a targeted search yielded four additional crosslinked peptides that all contain the C-terminus of the light chain.

3.3 Experimental Section

3.3.1 Chemicals

All chemicals were reagent grade or above. Guanidine hydrochloride (GndHCl), dithiothreitol (DTT), iodoacetic acid (IAA), trifluoroacetic acid (TFA), acetonitrile (ACN), HPLC-grade water, and bradykinin were from Sigma-Aldrich (St. Louis, MO, USA). Sequencing grade trypsin was obtained from Roche (Indianapolis, IN, USA). 18O-water (97%) was obtained from Cambridge Isotope Laboratories (Andover, MA, USA). Recombinant monoclonal antibody anti-streptavidin immunoglobulin gamma 2 (IgG2) was produced in Chinese hamster ovary (CHO) cells (Amgen, Thousand Oaks, CA, USA), purified according to standard manufacturing procedures, formulated at a concentration of 20 mg/mL in 50 mM sodium acetate pH 5.2, and stored at -70 °C.

Stage 1: Detection of Crosslinked Peptides
    Tryptic digestion in 16O- and 18O-water -> mass shift of 8 Da

Stage 2: Determination of Partial Sequences
    Group fragment ions via mass shift:
    0 or +4 Da (linear fragment ions): match mass of fragment ions with peptide sequences (FindPept) -> partial peptide sequences, often sequence ladders
    +8 Da (crosslinked fragment ions): de novo sequencing (manual) -> sequence tag

Stage 3: Inference of Full Sequences
    Protease specificity and other constraints -> putative full sequences of both chains

Stage 4: Deduction of Crosslink Site and Chemistry
    Elemental composition analysis of mass difference (combined native linear chains vs. crosslinked peptide) -> confirmed structure (thioether)

Scheme 3-1. Flow chart of XChem-Finder in four main stages: (1) detection of crosslink by 18O-labeling; (2) determination of partial sequences of crosslinked peptides from the mass of fragmentation ions; (3) inference of full sequence of each chain; (4) determination of crosslink site and chemistry by elemental composition analysis and chemical intuition.

Table 3-1. Fragment ions of a cross-linked peptide. 18O-labeling at C-termini is shown in red. Each letter (e.g., abc and XYZ) represents one amino acid residue. The symbol (?) denotes unknown cross-link chemistry. A cross-linked peptide gives rise to five intertwined sets of b- and y-ions. Ions arising from consecutive bond dissociations and internal fragments are excluded because of their lower abundance in CID.

Group number | Structure | 18O mass shift (Da) | Linear or cross-linked | Searchable (match partial sequence) | Sequence tags (de novo sequence) | Notes
Precursor ion | [diagram] | | | | |
1 | [diagram] | 0 | Linear | Yes | Yes | b-ions, single chain
2 | [diagram] | +4 | Linear | Yes | Yes | y-ions, single chain
3 | [diagram] | +4 | Cross-linked | No | Yes | b-ions, cross-linked
4 | [diagram] | +8 | Cross-linked | No | Yes | y-ions, cross-linked
5 | [diagram] | +4 | Linear | No | No | Modified single chain from cleavage of the cross-link

3.3.2 Generation of Stressed Sample

After being buffer exchanged into 100 mM Tris at pH 8.5, the IgG2 antibody was incubated at 50 °C for 7 days in the dark.

3.3.3 Reduction, Alkylation, Tryptic Digestion and 18O-Labeling of the IgG2

Tryptic digestion of the stressed IgG2 was performed similarly to the procedure described by Ren et al.[29]. Briefly, IgG2 (20 mg/mL) was diluted to 1 mg/mL in a denaturing buffer (7.5 M GndHCl, 2 mM EDTA and 0.25 M Tris-HCl, pH 7.5) to a final volume of 0.5 mL. Reduction was accomplished by adding 3 μL of 0.5 M DTT, followed by a 30 min incubation at room temperature. S-Carboxymethylation was achieved by adding 7 μL of 0.5 M IAA; the reaction was carried out in the dark for 15 min at room temperature. Excess IAA was quenched by adding 4 μL of 0.5 M DTT. The reduced and alkylated IgG2 samples were subsequently exchanged into the digestion buffer (0.1 M Tris-HCl at pH 7.5) using a NAP-5 size-exclusion column (GE Healthcare, Piscataway, NJ, USA). Two aliquots (200 µL each) of the buffer-exchanged antibody were completely dried in a SpeedVac and reconstituted separately in the same volume of 18O-water or 16O-water; 6 µL of 1 mg/mL trypsin in 18O-water or 16O-water, respectively, was then added to achieve a 1:25 (w/w) enzyme/substrate ratio. The reaction mixtures were incubated at 37 °C for 30 min.

3.3.4 HPLC

Tryptic digests of the IgG2 (25 μL) were separated on a Jupiter C5 column (250 × 2.0 mm, 5 μm, 300 Å; Phenomenex, Torrance, CA, USA) at 50 °C with a flow rate of 200 μL/min on an HPLC system (Agilent 1100, Palo Alto, CA, USA). Mobile phase A was 0.1% TFA in water (v/v), while mobile phase B contained 0.085% TFA / 90% ACN / 10% water. A gradient was applied by holding at 2% B for 2 min, increasing to 22% B in 38 min, then to 42% B in 80 min, then to 100% B in 25 min, followed by holding at 100% B for 5 min. The column was re-equilibrated at 2% B for 30 min before the next injection.

3.3.5 Mass Spectrometry

An LTQ-Orbitrap mass spectrometer (Thermo Fisher Scientific, San Jose, CA, USA) was used in-line with the HPLC system for the analyses of the IgG2 tryptic digests. A full MS scan (60,000 resolution at m/z 400 and an automatic gain control (AGC) target value of 2 × 10^5) followed by data-dependent MS/MS scans of the three most abundant precursor ions was set up to acquire both the peptide mass and sequence information. The spray voltage was 5.5 kV, and the capillary temperature was 250 °C. The instrument was tuned using the doubly charged ion of a synthetic peptide, bradykinin. The MS/MS spectra were obtained using collision-induced dissociation (CID) with a normalized collision energy of 35%. For MS/MS with ion detection in the Orbitrap, the AGC target was set to 3 × 10^6, the resolution to 7,500, and the precursor isolation width to 4 m/z units. Peptides were identified by MassAnalyzer by comparing experimental MS/MS spectra with theoretically predicted MS/MS spectra[16, 30-32]. Peak alignment between the 16O- and 18O-digest runs was performed automatically by MassAnalyzer[33]. A new function was implemented in MassAnalyzer to calculate the level of 18O-labeling in each peptide. The number of incorporated 18O atoms in a peptide is calculated from the following equation:

$$N_{^{18}\mathrm{O}} = \frac{M_{\mathrm{labeled}} - M_{\mathrm{unlabeled}}}{2.004} \qquad \text{(Eq. 1)}$$

where Mlabeled and Munlabeled are the average masses of the 18O-labeled and unlabeled peptides,

respectively, as calculated by the centroids of their respective isotope envelopes. The value of

2.004 Da is the mass difference between an 18O atom and a 16O atom.
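As a minimal sketch of Eq. 1 (not the MassAnalyzer implementation, which is not shown in the text), the centroid of each isotope envelope can be taken as the intensity-weighted average mass. The function names and the input format, a list of (neutral mass, intensity) pairs, are assumptions made for illustration.

```python
MASS_DIFF_18O_16O = 2.004  # Da, the value used in Eq. 1


def centroid_mass(envelope):
    """Intensity-weighted average of (neutral mass, intensity) pairs."""
    total = sum(intensity for _, intensity in envelope)
    if total <= 0:
        raise ValueError("envelope has no signal")
    return sum(mass * intensity for mass, intensity in envelope) / total


def n_18o(labeled_envelope, unlabeled_envelope):
    """Number of incorporated 18O atoms according to Eq. 1."""
    delta = centroid_mass(labeled_envelope) - centroid_mass(unlabeled_envelope)
    return delta / MASS_DIFF_18O_16O
```

An envelope shifted by 8.016 Da with unchanged relative intensities gives a value of 4, i.e. two fully labeled C-termini.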

3.4 Results and Discussion

3.4.1 Stage 1: Identification of Crosslinked Peptides.

18O-labeling combined with mass spectrometry is commonly used to identify crosslinked

peptides [19-21, 28]. As shown in Scheme 3-1, newly created C-termini of tryptic peptides from

digestion in 18O-water were completely labeled by 18O. The distinct isotope pattern for the

labeled crosslinked peptides (a mass increase of 8 Da) compared to linear (non-crosslinked)

species (a mass increase of 4 Da) can be automatically detected by common spectral analysis

algorithms, such as an in-house isotopic screening algorithm (MassAnalyzer[16, 30-32]).

18O-Labeling. A general strategy to label the C-termini of peptides in 18O-water catalyzed by proteases for peptide identification and quantification is well documented[34-40]. Under our experimental conditions, near-complete incorporation of four 18O atoms into the crosslinked peptides was evident from the isotopic distributions (see Figure 3-1). The small amount of 16O-water (3%) in the 18O-water had no significant impact on the isotopic patterns or the subsequent data analysis. 18O-labeling during tryptic digestion applies only to newly created C-termini, not to the C-termini of proteins. Hence, a crosslinked tryptic peptide that contains the C-terminus of the protein has a mass shift of only 4 Da and therefore cannot be differentiated from the linear peptides. This limitation can be overcome by using proteases with different substrate specificity or by labeling N-termini (e.g., with formaldehyde-d2 and sodium cyanoborohydride, or succinic anhydride-d4)[41-43]. In this paper, the limitation was satisfactorily addressed via a targeted mass search after the crosslink chemistry was elucidated. In addition, the deamidation of asparagine and isomerization of aspartic acid could potentially introduce 18O into peptides[44-46]; under our conditions, no isoaspartic acid was detected in the candidate peptides.

[Figure 3-1 image: isotopic envelopes (relative abundance vs. m/z, range 1350-1356). Upper panel, 16O-water: labeled peaks from m/z 1350.6618 to 1352.9986, including 1351.3303. Lower panel, 18O-water: labeled peaks from m/z 1353.0002 to 1355.6718, including 1354.0024, annotated "8 Da Mass Shift".]

Figure 3-1. Isotopic distributions of the cross-linked peptide HC:G118-R129/HC:C215-K240 (RT at 91.17 min). A mass shift of 8 Da was observed, indicating incorporation of four 18O atoms into the crosslinked peptide.


Screening of Crosslinked Peptides in Full Scan. An 18O incorporation value (Eq. 1) of 4.0 ± 0.3 was set as the cut-off in our screening. The initial screening results for the stressed IgG2 are shown in Table 3-2. Each peak was evaluated for false positives. For instance, gas phase dimerization, commonly observed in mass spectrometry[47, 48], was readily recognized based on retention time (same as the monomers) and mass (exactly double that of the monomers). In addition, weak precursor ions (typically with a peak intensity of 50,000 counts or lower) with poor or no MS/MS data were excluded. Based on these criteria, ten candidates shown in red in Table 3-2 were selected for subsequent high resolution MS/MS analysis.
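The screening criteria can be sketched as two small predicates. This is an illustration rather than the MassAnalyzer code; the ppm and retention-time windows used for the dimer check are hypothetical defaults, since the text only says "exactly double" and "same" retention time.

```python
def passes_18o_screen(n18o, target=4.0, tolerance=0.3):
    """True if the 18O incorporation value lies within target +/- tolerance."""
    return abs(n18o - target) <= tolerance


def looks_like_gas_phase_dimer(candidate_mass, candidate_rt,
                               monomer_mass, monomer_rt,
                               ppm=10.0, rt_window=0.2):
    """True if the candidate has twice the monomer mass at the same RT.

    ppm and rt_window (min) are assumed tolerances, not values from the text.
    """
    mass_error_ppm = abs(candidate_mass - 2.0 * monomer_mass) / candidate_mass * 1e6
    return mass_error_ppm <= ppm and abs(candidate_rt - monomer_rt) <= rt_window
```

For example, the value of 3.85 reported for the ion at m/z 1351.33 in Table 3-2 passes the screen, whereas a linear peptide with a value near 2 does not.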


Table 3-2. The cross-linked peptide candidates identified by the MassAnalyzer algorithm. False hits (gas phase dimers) and weak ions (with poor or no MS/MS data) were excluded from the subsequent analysis. Ten cross-linked peptide candidates shown in red were selected for high resolution MS/MS analysis in the next step. For the ions with multiple charges, the one with the highest intensity was examined. N18O is the number of incorporated 18O.

| Charge | Retention Time (RT, min) | m/z | Average Mass (Da) | Monoisotopic Mass (Da) | Intensity | N18O | Comments |
|---|---|---|---|---|---|---|---|
| 3 | 39.93 | 1119.24 | 3355.58 | 3353.699 | 575968 | 3.86 | Q(-17)SVVTQPPSVSGAPGQR (dimer) |
| 3 | 60.37 | 1163.58 | 3487.76 | 3485.707 | 511960 | 3.84 | YAASSYLSLTPEQWK (dimer) |
| 3 | 76.76 | 1655.45 | 4963.26 | 4960.313 | 216540 | 3.82 | NQFSLELTSVTAADTAVYYCAR (dimer) |
| 4 | 87.91 | 1045.77 | 4179.76 | 4177.057 | 93846 | 3.78 | Same as m/z 1394.03 |
| 3 | 87.91 | 1394.03 | 4179.80 | 4177.059 | 386639 | 3.81 | Cross-link candidate |
| 3 | 88.56 | 1413.36 | 4237.87 | 4235.064 | 796803 | 3.82 | Cross-link candidate |
| 4 | 88.57 | 1060.53 | 4237.79 | 4235.065 | 175997 | 3.85 | Same as m/z 1413.36 |
| 5 | 88.59 | 848.42 | 4237.77 | 4235.065 | 85712 | 3.85 | Same as m/z 1413.36 |
| 3 | 91.17 | 1351.33 | 4051.60 | 4048.963 | 205489 | 3.85 | Cross-link candidate |
| 4 | 96.15 | 1605.54 | 6418.27 | 6414.117 | 159092 | 3.77 | Cross-link candidate, similar to m/z 1620.04 (-58) |
| 4 | 97.39 | 1620.04 | 6476.45 | 6472.120 | 168467 | 3.86 | Cross-link candidate |
| 4 | 97.88 | 1484.22 | 5932.92 | 5928.854 | 424306 | 3.79 | Cross-link candidate |
| 4 | 98.49 | 1475.73 | 5898.65 | 5894.856 | 71227 | 3.92 | Cross-link candidate |
| 4 | 98.99 | 1498.47 | 5990.95 | 5986.858 | 303560 | 3.82 | Cross-link candidate |
| 6 | 99.00 | 999.32 | 5990.99 | 5986.854 | 53191 | 3.87 | Same as m/z 1498.47 |
| 4 | 99.76 | 1452.20 | 5804.77 | 5800.765 | 401159 | 3.78 | Cross-link candidate |
| 3 | 99.78 | 1935.93 | 5804.71 | 5800.758 | 176055 | 3.80 | Same as m/z 1452.20 |
| 3 | 101.56 | 1893.23 | 5676.57 | 5673.664 | 58750 | 3.83 | Missed monoisotopic peak; same as m/z 1419.92 |
| 4 | 101.59 | 1419.92 | 5676.47 | 5672.670 | 66658 | 3.89 | Cross-link candidate |


3.4.2 Stage 2: Deduce Partial Sequence for Each Chain.

As illustrated in Scheme 3-1, this stage involves (a) grouping fragment ions based on the isotope patterns imparted by their corresponding structural features (e.g., linear or cross-linked), (b) deducing partial peptide sequences via a database search (matching masses with partial peptide sequences using FindPept) and de novo sequencing, and (c) determining the most likely candidate peptides.

Deconvolution of Fragment Ions. Most precursor ions for cross-linked peptides were highly charged (e.g., 3+ or 4+), so doubly and triply charged fragment ions abound, e.g., the ions at m/z 839.49 (2+) and 1300.47 (3+) in Figure 3-2. The high resolution of the tandem mass spectrum allowed us to measure the isotope envelope and hence determine the charge state. Taking the fragment ion type (b- vs y-ion) into account as well, the monoisotopic neutral mass of each fragment ion from a crosslinked peptide was calculated manually. For example, +17.0033 Da (the mass of OH-) and -1.0073 Da (the mass of H+) were added to a singly charged b- and y-ion, respectively, to obtain their neutral peptide mass. The high resolution of the tandem spectra was crucial in determining the correct charge state and hence the neutral mass; otherwise, an incorrect monoisotopic mass would lead to false hits and even erroneous assignments.
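The manual conversion described above generalizes to multiply charged fragments. The following is a small sketch, with a hypothetical function name, that removes the charge-carrying protons and, for b-ions, adds back one water so that the result is the mass of the corresponding neutral peptide; for a singly charged b-ion this is the +17.0033 Da correction quoted in the text.

```python
PROTON = 1.00728  # Da, mass of H+
WATER = 18.01056  # Da, monoisotopic mass of H2O


def neutral_peptide_mass(mz, charge, ion_type):
    """Monoisotopic neutral peptide mass for a b- or y-type fragment ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    uncharged = charge * (mz - PROTON)  # strip the charge-carrying protons
    if ion_type == "y":
        return uncharged                # y-ions already carry the C-terminal water
    if ion_type == "b":
        return uncharged + WATER        # b-ions lack it, so add one H2O
    raise ValueError("ion_type must be 'b' or 'y'")
```

Applied to ions listed in Table 3-3, the singly charged b-ion at m/z 769.4172 gives about 786.420 Da and the doubly charged y-ion at m/z 839.4851 gives about 1676.956 Da, in line with the "User mass" column.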

Grouping Fragment Ions by 18O Incorporation. The fragment ions containing zero, one, and two C-termini displayed a mass shift of 0, 4, and 8 Da, respectively, in the corresponding MS/MS spectra obtained from 18O-water vs 16O-water (referred to as the 18O/16O rule in this paper) and are accordingly divided into different groups (Table 3-1). For each crosslinked peptide, two sets of linear fragment ions that contain no crosslink site exist for each chain. One set is the b-ions prior to the crosslink site, which show no mass shift with 18O-labeling and thus are separated from other fragment ions (group 1 in Table 3-1). The other set is the y-ions on the C-terminal side of the crosslink site (group 2 in Table 3-1), which contain two 18O atoms and show a mass shift of 4 Da. Essentially, these linear fragments are searchable in a standard database, i.e., their masses can be matched with the corresponding peptide fragments. The freely available FindPept (web.expasy.org/findpept/) was used for the search in this study. Each observed mass value of these linear fragment ions should match a partial sequence of the crosslinked peptide, but may also match unrelated sequences (false hits). High mass accuracy (typically 10 ppm in our FT MS/MS experiments) greatly limits false positives. Furthermore, multiple fragment ions collectively, and in combination with de novo sequencing as described below, narrow the hits to a select few, if not one, candidate peptides.

The isotope pattern (8 Da mass shift with 18O-labeling) can also be readily used to isolate the set of fragment ions that contain two C-termini (y-ions containing the crosslink site, see group 4 in Table 3-1). First, these ions were excluded from the database search (which is for linear peptides), reducing false hits. Second and more importantly, this markedly simplified set of tandem spectra could be used for de novo sequencing to yield sequence tags, as was indeed the case in our study.
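The 18O/16O rule amounts to converting the observed m/z shift into Da and binning it. A minimal sketch follows; the function name and the 1 Da tolerance are assumptions, chosen loosely enough for low-resolution peak labels.

```python
def c_termini_from_shift(mz_16o, mz_18o, charge, tolerance=1.0):
    """Return 0, 1 or 2 labeled C-termini from the 18O/16O shift, else None.

    The m/z shift is multiplied by the charge to obtain the shift in Da and
    compared with the expected 0, 4 and 8 Da. tolerance (Da) is an assumption.
    """
    shift_da = (mz_18o - mz_16o) * charge
    for n_termini, expected in enumerate((0.0, 4.0, 8.0)):
        if abs(shift_da - expected) <= tolerance:
            return n_termini
    return None
```

With peak labels from Figures 3-1 and 3-2, b8 (m/z 769.48 vs 769.49, 1+) falls in the 0 Da group, y5 (566.46 vs 570.50, 1+) in the 4 Da group, and the intact precursor (1351.3303 vs 1354.0024, 3+) in the 8 Da group.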

Partial Sequence Search via FindPept. The neutral monoisotopic peptide masses (obtained from the fragment ion bins with mass shifts of 0 and 4 Da, i.e., linear peptides, as shown in Scheme 3-1) were searched against the known IgG2 sequence using FindPept with a user-defined mass error (10 ppm for the resolution of 7,500 in FT-MS/MS in our experiments). FindPept also allows users to define residue modifications, for example, alkylation at all cysteine residues (+58.005 Da for reaction with iodoacetic acid in our experiments). FindPept outputs a list of peptides that match the neutral peptide masses, and naturally, some are false hits. As such, several complementary steps (constraints) were taken to confirm the actual sequences (higher probability and confidence level) and rule out false hits. It is worth noting that this is an iterative process, so the steps can be taken in a different order depending on the individual situation.
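The kind of search performed here can be illustrated with a brute-force scan over all subsequences of a protein. This is only a sketch of the idea, not FindPept's actual algorithm or interface; the function names are hypothetical, the residue masses are standard monoisotopic values, and every cysteine is assumed to be carboxymethylated (+58.005 Da) as in the text.

```python
WATER = 18.010565
CARBOXYMETHYL = 58.005479  # added to each Cys by iodoacetic acid
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}


def peptide_mass(sequence):
    """Monoisotopic neutral mass with all Cys carboxymethylated."""
    mass = WATER + sum(RESIDUE_MASS[aa] for aa in sequence)
    return mass + CARBOXYMETHYL * sequence.count("C")


def find_matching_subsequences(protein, target_mass, ppm=10.0, first_position=1):
    """Return (start, end, sequence) for every subsequence within ppm of target."""
    upper = target_mass * (1.0 + ppm * 1e-6)
    hits = []
    for start in range(len(protein)):
        for end in range(start + 1, len(protein) + 1):
            candidate = protein[start:end]
            mass = peptide_mass(candidate)
            if mass > upper:
                break  # extending the peptide only makes it heavier
            if abs(mass - target_mass) / target_mass * 1e6 <= ppm:
                hits.append((first_position + start, first_position + end - 1, candidate))
    return hits
```

Searching the tryptic peptide GPSVFPLAPCSR (numbered from 118) for 505.251 Da and 786.420 Da returns GPSVF (118-122) and GPSVFPLA (118-125), the two b-ion matches retained in Table 3-3. Against the full antibody sequence the same masses also return unrelated hits, which is why the constraints described in the text are needed.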

As an example, the process is demonstrated using a triply charged crosslinked peptide at m/z 1351.33 (retention time of 91.17 min, G118-R129/C215-K240). The corresponding neutral monoisotopic masses of its fragment ions were searched against the IgG2 sequence. The full list of matching peptides is shown in Tables 3-3 and 3-4, and some are highlighted in Table 3-5.

A rewarding first step is to sort the peptides according to their positions in the full protein sequence. As illustrated in Tables 3-3 and 3-5, typically at least one sequence ladder could be readily identified. For example, a large number of fragment ions (eight peptides, #3-10 in Table 3-5) were found to share the C-terminal sequence of CPPCPAPPVAGPSVFLFPPKPK, essentially affirming that this is part of the true sequence. An immediate implication is that the largest observed fragment ion (2361.190 Da) sets an upper limit for the mass of the other chain (1687 Da), obtained by subtraction from the observed total crosslinked peptide mass (4048.963 Da). Based on this criterion, fourteen peptides in Table 3-3 with a mass significantly larger than 1687 Da were excluded.
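The upper-limit argument is a single subtraction followed by a filter. The sketch below uses hypothetical function names; the `margin` parameter is an assumption, because the text excludes only peptides "significantly larger" than the limit without giving a numeric threshold.

```python
def second_chain_upper_limit(crosslinked_mass, largest_fragment_mass):
    """Upper mass limit for the other chain, by subtraction."""
    return crosslinked_mass - largest_fragment_mass


def within_limit(candidate_masses, limit, margin=0.0):
    """Keep candidate masses that do not exceed limit + margin (Da)."""
    return [mass for mass in candidate_masses if mass <= limit + margin]
```

For the precursor discussed here, 4048.963 - 2361.190 gives 1687.773 Da, so theoretical masses such as 3580.817 or 2683.218 Da from Table 3-3 are dropped while 786.428 and 505.254 Da are kept.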

Another two powerful constraints that can be applied to data analysis are based on the protease specificity (referred to as the tryptic rule) and the mass shift conferred by 18O-labeling (18O/16O rule). For instance, the above-mentioned eight peptides (peptides 3-10 in Table 3-5) are likely from a tryptic peptide as they all end with lysine, and indeed, a mass shift of 4 Da was observed for all of these fragment ions between digestion in heavy and light water. Similarly, the other two overlapping partial sequences are likely the N-terminal fragments of a single tryptic peptide containing GPSVFPLA; and again, as expected, no mass shift was observed from 18O-labeling. Conversely, false hits can be ruled out; for example, the doubly charged fragment ion m/z 1101.0885 (Table 3-4) matches four peptide sequences: (W)GQGTLVTVSSASTKGPSVFPLAP(C), (T)APKLLIYGNSNRPSGVPDRF(S), (Y)WGQGTLVTVSSASTKGPSVFPL(A), and (C)PPCPAPPVAGPSVFLFPPKPK/(D). Since a mass shift from 18O-labeling was observed, an internal fragment was ruled out, which leaves the last sequence, with a C-terminal lysine, as the only plausible choice.
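The two rules can be combined into one check per database hit. The sketch below is a simplification under stated assumptions: it ignores protein termini, missed cleavages and the reduced cleavage before proline, and the function name is hypothetical.

```python
TRYPTIC_RESIDUES = {"K", "R"}


def consistent_with_rules(prev_residue, sequence, has_18o_shift):
    """Check a linear fragment hit against the tryptic and 18O/16O rules.

    A fragment with an 18O shift carries a labeled C-terminus, so it must end
    at a tryptic site (K or R). A fragment without a shift is a b-type ion and
    must start right after a tryptic site (preceded by K or R).
    """
    if has_18o_shift:
        return sequence[-1] in TRYPTIC_RESIDUES
    return prev_residue in TRYPTIC_RESIDUES
```

Of the four hits for m/z 1101.0885, which shows a shift, only (C)PPCPAPPVAGPSVFLFPPKPK passes; among the unshifted hits for m/z 470.2368, (K)/GPSVF passes while (A)GPSVF and (E)PQVY do not, matching the exclusions noted in Tables 3-3 and 3-4.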

At this point, the fragment ion mass search data indicated the peptide at m/z 1351.33 highly likely contains CPPCPAPPVAGPSVFLFPPKPK. For the second chain, although mass search of two b-ions suggests the presence of GPSVFPLA, additional data were warranted for higher confidence level in the assignment as described next.

De Novo Sequencing. De novo sequencing complements the database search and affords sequence tags[25]. As shown in Scheme 3-1, the identification of the sequence tags was conducted using the crosslinked y-ions (8 Da mass shift in the 18O-digest, group 4 in Table 3-1), which obviously would not match any single-chain peptides in the database. In Table 3-6, the observed m/z value is from the most abundant isotopic peak in each isotopic envelope because the monoisotopic peak is weak for large ions at low levels. The mass difference between a pair of adjacent y-ions was calculated and compared manually to the masses of single amino acids and dipeptides within a mass error of 0.05 Da. Matching single amino acid residues or dipeptides are shown in red. The sequence tag SVFPLA was confirmed in the crosslinked peptide chain G118-R129, lending strong support to the existence of peptide chain G118-R129 as a component of the crosslinked peptide (Table 3-6).
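The manual comparison of mass differences with residue and dipeptide masses can be sketched as a lookup. The function name is hypothetical; standard monoisotopic residue masses are used, cysteine is left unmodified in this table (an assumption), and dipeptides are reported as unordered pairs in alphabetical order.

```python
from itertools import combinations_with_replacement

RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}


def match_mass_difference(delta, tolerance=0.05):
    """Residues and unordered dipeptides whose mass is within tolerance of delta."""
    matches = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tolerance]
    for a, b in combinations_with_replacement(sorted(RESIDUE_MASS), 2):
        if abs(RESIDUE_MASS[a] + RESIDUE_MASS[b] - delta) <= tolerance:
            matches.append(a + b)
    return matches
```

For the differences in Table 3-6, 147.0739 Da matches only F, while 186.1085 Da matches W, AD, EG and SV; the ambiguity is resolved by keeping the options that occur in the putative chain, here SV.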

In summary, the peptide chains C219-K240 (219CPPCPAPPVAGPSVFLFPPKPK240) and G118-A125 (118GPSVFPLA125) were identified at the end of Stage 2 as parts of the crosslinked peptide of m/z 1351.33.

3.4.3 Stage 3: Inference of Full Sequence for Each Chain

Extension to the Putative Full Sequences. Because the peptides were generated by trypsin digestion, the putative partial sequences of the crosslinked peptide chains were extended to their corresponding full tryptic peptides (G118-R129, C215-K240, V294-K314) with masses of 1287.6282, 2911.3305, and 2502.2941 Da, respectively (Table 3-5). The mass difference between the observed intact crosslinked peptide (4048 Da) and the first tryptic peptide C215-K240 (2911 Da) is 1137 Da. This narrowed down the second crosslink chain to G118-R129 (1287.6282 Da), while the putative tryptic peptide V294-K314 (2502.2941 Da) is too large (in combined mass) to be the second chain. This leaves peptide G118-R129 as the only plausible choice to pair with C215-K240. We were mindful that mis-cleavage might happen, which would be considered if the initial inference did not yield a correct assignment.


Table 3-3. Peptides in IgG2 that match the masses of fragment ions of the triply charged ion m/z 1351.33 (RT at 91.17 min) via FindPept. All Cys are alkylated with IAA. The adjoining residues before cleavage are in parentheses. NA means not available. The peptides are sorted in the order of primary sequence number. Amino acid positions 1-439 and 440-657 are for HC and LC, respectively.

| m/z | Charge | Mass shift in 18O-water (m/z) | User mass (Da) | Theor. mass (Da) | Δmass (ppm) | Peptide | Position | Notes |
|---|---|---|---|---|---|---|---|---|
| 1181.5930 | 2 | 2 | 2361.171 | 2361.173 | 0.9 | (Q)ESGPGLVKPSGTLSLTCAVSGGSIS(S) | 6-30 | Exclude (Tryptic rule, 18O/16O rule, too large as 2nd chain) |
| 1188.6063 | 3 | NA | 3580.807 | 3580.817 | 2.9 | (W)SWVRQPPGKGLEWIGEISHSGTTNYNPSLKSR/(V) | 36-67 | Exclude (too large as 2nd chain) |
| 1188.6063 | 3 | NA | 3580.807 | 3580.776 | -8.7 | (I)GEISHSGTTNYNPSLKSRVTISGDKSKNQFSLE(L) | 50-82 | Exclude (Tryptic rule, too large as 2nd chain) |
| 1101.0885 | 2 | 2 | 2218.173 | 2218.163 | -4.3 | (Y)WGQGTLVTVSSASTKGPSVFPL(A) | 103-124 | Exclude (Tryptic rule, 18O/16O rule, too large as 2nd chain) |
| 1101.0885 | 2 | 2 | 2200.162 | 2200.174 | 5.4 | (W)GQGTLVTVSSASTKGPSVFPLAP(C) | 104-126 | Exclude (Tryptic rule, 18O/16O rule, too large as 2nd chain) |
| 1181.5930 | 2 | 2 | 2361.171 | 2361.189 | 7.4 | (W)GQGTLVTVSSASTKGPSVFPLAPC(S) | 104-127 | Exclude (Tryptic rule, 18O/16O rule, too large as 2nd chain) |
| 470.2368 | NA | 0 | 505.251 | 505.254 | 5.1 | (K)/GPSVF(P) | 118-122 | - |
| 769.4172 | 1 | 0 | 786.420 | 786.428 | 9.6 | (K)/GPSVFPLA(P) | 118-125 | - |
| 1333.6070 | 2 | 2 | 2683.210 | 2683.218 | 3.1 | (V)TVPSSNFGTQTYTCNVDHKPSNTK/(V) | 183-206 | Exclude (too large as 2nd chain) |
| 1341.1364 | 2 | NA | 2680.258 | 2680.258 | 0.1 | (T)YTCNVDHKPSNTKVDKTVERKC(C) | 194-215 | Exclude (Tryptic rule, too large as 2nd chain) |
| 769.4172 | 1 | 0 | 804.431 | 804.434 | 3.8 | (S)NTKVDKT(V) | 204-210 | Exclude (Tryptic rule) |
| 1181.5930 | 2 | 2 | 2361.171 | 2361.190 | 8.0 | (E)CPPCPAPPVAGPSVFLFPPKPK/(D) | 219-240 | - |
| 1101.0885 | 2 | 2 | 2200.162 | 2200.175 | 6.0 | (C)PPCPAPPVAGPSVFLFPPKPK/(D) | 220-240 | - |
| 1846.0661 | 1 | 4 | 1845.059 | 1845.055 | -2.0 | (C)PAPPVAGPSVFLFPPKPK/(D) | 223-240 | - |
| 923.5283 | 2 | 2 | 1845.042 | 1845.055 | 7.1 | (C)PAPPVAGPSVFLFPPKPK/(D) | 223-240 | - |
| 839.4851 | 2 | 2 | 1676.956 | 1676.965 | 5.5 | (A)PPVAGPSVFLFPPKPK/(D) | 225-240 | - |
| 1384.7951 | 1 | 4 | 1383.788 | 1383.791 | 2.4 | (V)AGPSVFLFPPKPK/(D) | 228-240 | - |
| 470.2368 | NA | 0 | 505.251 | 505.254 | 5.1 | (A)GPSVF(L) | 229-233 | Exclude (Tryptic rule) |
| 1313.7528 | 1 | 4 | 1312.745 | 1312.754 | 7.0 | (A)GPSVFLFPPKPK/(D) | 229-240 | - |
| 566.3647 | 1 | 4 | 565.357 | 565.359 | 3.1 | (F)PPKPK/(D) | 236-240 | - |
| 1688.7689 | 1 | 4 | 1705.772 | 1705.788 | 9.4 | (V)HQDWLNGKEYKCK/(V) | 302-314 | - |
| 470.2368 | NA | 0 | 505.251 | 505.254 | 5.2 | (E)PQVY(T) | 338-341 | Exclude (Tryptic rule) |
| 470.2368 | NA | 0 | 487.240 | 487.239 | -1.9 | (P)PSRE(E) | 345-348 | Exclude (Tryptic rule) |
| 1676.7706 | 2 | NA | 3351.527 | 3351.503 | -7.1 | (E)WESNGQPENNYKTTPPMLDSDGSFFLYSK/(L) | 373-401 | Exclude (too large as 2nd chain) |
| 1181.5930 | 2 | 2 | 2361.171 | 2361.175 | 1.8 | (S)DGSFFLYSKLTVDKSRWQQG(N) | 393-412 | Exclude (Tryptic rule, 18O/16O rule, too large as 2nd chain) |
| 1181.5930 | 2 | 2 | 2361.171 | 2361.183 | 4.8 | (M)HEALHNHYTQKSLSLSPGKQS(V) | 421-441 | Exclude (Tryptic rule, 18O/16O rule, too large as 2nd chain, HC-LC) |
| 1676.7706 | 2 | NA | 3369.537 | 3369.532 | -1.4 | (P)SVSGAPGQRVTISCTGSSSNIGAGYDVHWYQQ(L) | 448-479 | Exclude (Tryptic rule, too large as 2nd chain) |
| 923.5283 | 2 | 2 | 1863.053 | 1863.041 | -6.6 | (D)VHWYQQLPGTAPKLLI(Y) | 474-489 | Exclude (Tryptic rule, 18O/16O rule) |
| 1101.0885 | 2 | 2 | 2200.162 | 2200.175 | 6.0 | (T)APKLLIYGNSNRPSGVPDRF(S) | 484-503 | Exclude (Tryptic rule, 18O/16O rule, too large as 2nd chain) |
| 1188.6063 | 3 | NA | 3580.807 | 3580.801 | -1.6 | (A)TLVCLISDFYPGAVTVAWKADSSPVKAGVETTTP(S) | 576-609 | Exclude (Tryptic rule, too large as 2nd chain) |
| 470.2368 | NA | 0 | 505.251 | 505.254 | 5.1 | (F)YPGAV(T) | 585-589 | Exclude (Tryptic rule) |
| 751.4051 | NA | 0 | 786.419 | 786.412 | -8.5 | (Y)LSLTPEQ(W) | 623-629 | Exclude (Tryptic rule) |
| 769.4172 | 1 | 0 | 786.420 | 786.412 | -9.7 | (Y)LSLTPEQ(W) | 623-629 | Exclude (Tryptic rule) |


Table 3-4. Peptides in IgG2 that match the masses of fragment ions of the triply charged ion m/z 1351.33 (RT at 91.17 min) via FindPept. All Cys are alkylated with IAA. The adjoining residues before cleavage are in parentheses. NA means not available. The table is grouped in the order of m/z value.

| m/z | Charge | Mass shift in 18O-water (m/z) | User mass (Da) | Theor. mass (Da) | Δmass (ppm) | Peptide | Notes |
|---|---|---|---|---|---|---|---|
| 470.2368 | NA | 0 | 487.240 | 487.239 | -1.9 | (P)PSRE(E) | Exclude (Tryptic rule) |
| 470.2368 | NA | 0 | 505.251 | 505.254 | 5.1 | (K)/GPSVF(P) | - |
| 470.2368 | NA | 0 | 505.251 | 505.254 | 5.1 | (A)GPSVF(L) | Exclude (Tryptic rule) |
| 470.2368 | NA | 0 | 505.251 | 505.254 | 5.1 | (F)YPGAV(T) | Exclude (Tryptic rule) |
| 470.2368 | NA | 0 | 505.251 | 505.254 | 5.2 | (E)PQVY(T) | Exclude (Tryptic rule) |
| 566.3647 | 1 | 4 | 565.357 | 565.359 | 3.1 | (F)PPKPK/(D) | - |
| 751.4051 | NA | 0 | 786.419 | 786.412 | -8.5 | (Y)LSLTPEQ(W) | Exclude (Tryptic rule) |
| 769.4172 | 1 | 0 | 786.420 | 786.412 | -9.7 | (Y)LSLTPEQ(W) | Exclude (Tryptic rule) |
| 769.4172 | 1 | 0 | 786.420 | 786.428 | 9.6 | (K)/GPSVFPLA(P) | - |
| 769.4172 | 1 | 0 | 804.431 | 804.434 | 3.8 | (S)NTKVDKT(V) | Exclude (Tryptic rule) |
| 839.4851 | 2 | 2 | 1676.956 | 1676.965 | 5.5 | (A)PPVAGPSVFLFPPKPK/(D) | - |
| 923.5283 | 2 | 2 | 1845.042 | 1845.055 | 7.1 | (C)PAPPVAGPSVFLFPPKPK/(D) | - |
| 923.5283 | 2 | 2 | 1863.053 | 1863.041 | -6.6 | (D)VHWYQQLPGTAPKLLI(Y) | Exclude (Tryptic rule, 18O/16O rule) |
| 1101.0885 | 2 | 2 | 2200.162 | 2200.174 | 5.4 | (W)GQGTLVTVSSASTKGPSVFPLAP(C) | Exclude (Tryptic rule, 18O/16O rule) |
| 1101.0885 | 2 | 2 | 2200.162 | 2200.175 | 6.0 | (T)APKLLIYGNSNRPSGVPDRF(S) | Exclude (Tryptic rule, 18O/16O rule) |
| 1101.0885 | 2 | 2 | 2218.173 | 2218.163 | -4.3 | (Y)WGQGTLVTVSSASTKGPSVFPL(A) | Exclude (Tryptic rule, 18O/16O rule) |
| 1101.0885 | 2 | 2 | 2200.162 | 2200.175 | 6.0 | (C)PPCPAPPVAGPSVFLFPPKPK/(D) | - |
| 1181.5930 | 2 | 2 | 2361.171 | 2361.173 | 0.9 | (Q)ESGPGLVKPSGTLSLTCAVSGGSIS(S) | Exclude (Tryptic rule, 18O/16O rule) |
| 1181.5930 | 2 | 2 | 2361.171 | 2361.175 | 1.8 | (S)DGSFFLYSKLTVDKSRWQQG(N) | Exclude (Tryptic rule, 18O/16O rule) |
| 1181.5930 | 2 | 2 | 2361.171 | 2361.183 | 4.8 | (M)HEALHNHYTQKSLSLSPGKQS(V) | Exclude (Tryptic rule, 18O/16O rule) |
| 1181.5930 | 2 | 2 | 2361.171 | 2361.189 | 7.4 | (W)GQGTLVTVSSASTKGPSVFPLAPC(S) | Exclude (Tryptic rule, 18O/16O rule) |
| 1181.5930 | 2 | 2 | 2361.171 | 2361.190 | 8.0 | (E)CPPCPAPPVAGPSVFLFPPKPK/(D) | - |
| 1188.6063 | 3 | NA | 3580.807 | 3580.776 | -8.7 | (I)GEISHSGTTNYNPSLKSRVTISGDKSKNQFSLE(L) | Exclude (Tryptic rule, too large as 2nd chain) |
| 1188.6063 | 3 | NA | 3580.807 | 3580.801 | -1.6 | (A)TLVCLISDFYPGAVTVAWKADSSPVKAGVETTTP(S) | Exclude (Tryptic rule, too large as 2nd chain) |
| 1188.6063 | 3 | NA | 3580.807 | 3580.817 | 2.9 | (W)SWVRQPPGKGLEWIGEISHSGTTNYNPSLKSR/(V) | Exclude (too large as 2nd chain) |
| 1313.7528 | 1 | 4 | 1312.745 | 1312.754 | 7.0 | (A)GPSVFLFPPKPK/(D) | - |
| 1333.6070 | 2 | 2 | 2683.210 | 2683.218 | 3.1 | (V)TVPSSNFGTQTYTCNVDHKPSNTK/(V) | Exclude (too large as 2nd chain) |
| 1341.1364 | 2 | NA | 2680.258 | 2680.258 | 0.1 | (T)YTCNVDHKPSNTKVDKTVERKC(C) | Exclude (Tryptic rule) |
| 1384.7951 | 1 | 4 | 1383.788 | 1383.791 | 2.4 | (V)AGPSVFLFPPKPK/(D) | - |
| 1676.7706 | 2 | NA | 3351.527 | 3351.503 | -7.1 | (E)WESNGQPENNYKTTPPMLDSDGSFFLYSK/(L) | Exclude (too large as 2nd chain) |
| 1676.7706 | 2 | NA | 3369.537 | 3369.532 | -1.4 | (P)SVSGAPGQRVTISCTGSSSNIGAGYDVHWYQQ(L) | Exclude (Tryptic rule, too large as 2nd chain) |
| 1688.7689 | 1 | 4 | 1705.772 | 1705.788 | 9.4 | (V)HQDWLNGKEYKCK/(V) | - |
| 1846.0661 | 1 | 4 | 1845.059 | 1845.055 | -2.0 | (C)PAPPVAGPSVFLFPPKPK/(D) | - |


Table 3-5. Partial sequences determined from the masses of fragment ions for a triply charged precursor ion at the retention time of 91.17 min with m/z 1351.33 (molecular mass of 4048.963 Da). The longest b- and y-ions observed are underlined. All Cys are alkylated with IAA. The adjoining residues before cleavage are in parentheses. 18O-labeling indicated that ions 1 and 2 are b-ions and ions 3 to 11 are y-ions. The crosslinked peptides were determined to be HC:G118-R129/HC:C215-K240.

| # | m/z | Charge | User mass (Da) | Theor. mass (Da) | Δmass (ppm) | Peptide | Corresponding tryptic peptide: sequence | Corresponding tryptic peptide: mass (Da) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 470.2368 | 1 | 505.251 | 505.254 | 5.1 | (K)/GPSVF(P) | (K)/118GPSVFPLAPCSR129/(S) | 1287.6282 | Chain 2 |
| 2 | 769.4172 | 1 | 786.420 | 786.428 | 9.6 | (K)/GPSVFPLA(P) | same as #1 | 1287.6282 | Chain 2 |
| 3 | 566.3647 | 1 | 565.357 | 565.359 | 3.1 | (F)PPKPK/(D) | (K)/215CCVECPPCPAPPVAGPSVFLFPPKPK240/(D) | 2911.3305 | Chain 1 |
| 4 | 1313.7528 | 1 | 1312.745 | 1312.754 | 7.0 | (A)GPSVFLFPPKPK/(D) | same as #3 | 2911.3305 | Chain 1 |
| 5 | 1384.7951 | 1 | 1383.788 | 1383.791 | 2.4 | (V)AGPSVFLFPPKPK/(D) | same as #3 | 2911.3305 | Chain 1 |
| 6 | 839.4851 | 2 | 1676.956 | 1676.965 | 5.5 | (A)PPVAGPSVFLFPPKPK/(D) | same as #3 | 2911.3305 | Chain 1 |
| 7 | 1846.0661 | 1 | 1845.059 | 1845.055 | -2.0 | (C)PAPPVAGPSVFLFPPKPK/(D) | same as #3 | 2911.3305 | Chain 1 |
| 8 | 923.5283 | 2 | 1845.042 | 1845.055 | 7.1 | (C)PAPPVAGPSVFLFPPKPK/(D) | same as #3 | 2911.3305 | Chain 1 |
| 9 | 1101.0885 | 2 | 2200.162 | 2200.175 | 6.0 | (C)PPCPAPPVAGPSVFLFPPKPK/(D) | same as #3 | 2911.3305 | Chain 1 |
| 10 | 1181.5930 | 2 | 2361.171 | 2361.190 | 8.0 | (E)CPPCPAPPVAGPSVFLFPPKPK/(D) | same as #3 | 2911.3305 | Chain 1 |
| 11 | 1688.7689 | 1 | 1705.772 | 1705.788 | 9.4 | (V)HQDWLNGKEYKCK/(V) | (R)/294VVSVLTVVHQDWLNGKEYKCK314/(V) | 2502.2941 | Ruled out for chain 2, see text |


Table 3-6. De novo sequencing for sequence tags using y-ions from the cross-linked fragments (group 4 in Table 3-1) in the cross-linked peptide G118-R129/C215-K240. The observed m/z value is the most abundant isotopic peak in each isotopic envelope. The mass difference between a pair of adjacent y-ions was calculated and compared to the masses of single amino acids and dipeptides within a mass error of 0.05 Da. The amino acid residues or dipeptides found in the putative cross-linked peptide chains are shown in red. The sequence tag SVFPLA is in the cross-linked peptide chain G118-R129.

| Obs. m/z | Charge State | Mass Shift in 18O-water | Obs. Mass (Da) | Mass Diff. (Da) | Amino Acid Residue or Di-peptide (Mass error) |
|---|---|---|---|---|---|
| 1633.2705 | 2 | 4 | 3264.5265 | 17.9980 | H2O=18.0106 (0.0126) |
| 1642.2695 | 2 | 4 | 3282.5244 | 184.0776 | IA, LA=184.1212 (0.0436); PS=184.0848 (0.0072) |
| 1734.3083 | 2 | 4 | 3466.6020 | 79.0643 | P-H2O=97.0528-18.0106=79.0422 (-0.0221) |
| 1773.8405 | 2 | 4 | 3545.6663 | 18.0336 | H2O=18.0106 (-0.0230) |
| 1782.8573 | 2 | 4 | 3563.7000 | 147.0739 | F=147.0684 (-0.0055) |
| 1237.6012 | 3 | 2.5 | 3709.7818 | 186.1085 | W=186.0793 (-0.0292); AD, GE=186.0641 (-0.0444); SV=186.1004 (-0.0081) |
| 1299.6374 | 3 | 2.5 | 3895.8903 | - | - |


3.4.4 Stage 4: Deduction of Crosslinking Chemistry and Site

Toward this end, two pieces of information are particularly useful: (1) elemental composition of the functional group involved and (2) peptide fragments not observed in tandem mass spectrometry.

Deducing Crosslinking Chemistry: Mass to Formula (Elemental Composition). An example of the elemental composition calculation is illustrated in Table 3-7. Once the peptides

G118-R129 and C215-K240 were established as the components of the cross-linked peptide, the difference between the observed mass of the crosslinked peptide (4048.9600 Da) and the combined mass of the two chains devoid of modifications (1287.6282+2911.3305=4198.9587

Da) was calculated to be 149.9987 Da. Four potential elemental compositions for the mass of

149.9987 Da were obtained via Thermo-Fisher Scientific Xcalibur (Table 3-7). The last two were eliminated based on their high delta ppm relative to the FT MS mass accuracy (typically ≤

5 ppm). The high RDB (ring and double bond) value makes the second one unlikely too. This leaves the first one as the only plausible choice.
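The columns of Table 3-7 can be re-derived from the formulas themselves. The sketch below is not the Xcalibur calculation; it uses standard monoisotopic atomic masses and the usual ring-and-double-bond expression RDB = C - H/2 + N/2 + 1, and all function names are hypothetical.

```python
import re

ATOM_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
             "O": 15.9949146, "S": 31.9720707}


def parse_formula(formula):
    """Turn e.g. 'C4H6O4S' into {'C': 4, 'H': 6, 'O': 4, 'S': 1}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    return sum(ATOM_MASS[el] * n for el, n in parse_formula(formula).items())


def delta_ppm(observed, formula):
    """(observed - calculated) / calculated, in ppm."""
    calculated = monoisotopic_mass(formula)
    return (observed - calculated) / calculated * 1e6


def rdb(formula):
    """Ring and double bond equivalents for a C/H/N/O/S formula."""
    c = parse_formula(formula)
    return c.get("C", 0) - c.get("H", 0) / 2 + c.get("N", 0) / 2 + 1
```

For a target of 149.9987 Da, the four formulas give rounded deviations of 0, 5, -9 and 9 ppm and RDB values of 2.0, 11.5, 7.0 and 2.5, as listed in Table 3-7.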

The elemental composition of C4H6O4S contains sulfur, which is present only in cysteine and methionine. Each chain of the putative crosslinked peptide pair G118-R129 and C215-K240, and particularly the fragments that were not observed by tandem mass spectrometry (those not underlined in Table 3-5), contains cysteine but not methionine. During sample preparation, cysteine residues were reduced and alkylated by iodoacetic acid (IAA), so the masses of all peptides were calculated assuming cysteines are alkylated. Hence, removal of two alkyl groups (two C2H3O2) and a sulfur atom exactly matches the determined elemental composition. The mass of the observed crosslinked peptide (4048.963 Da) and that of the theoretical thioether peptide (4048.960 Da) are practically identical (with a mass error of 0.74 ppm, see Table 3-8). Altogether, we surmised that the crosslinking chemistry is a thioether originating from a pair of cysteine residues (Scheme 3-2).
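As a check on the arithmetic, using only values quoted in the text (the subscripted symbols are introduced here for illustration), the theoretical thioether mass and its deviation from the observed mass work out as:

$$M_{\mathrm{thioether}} = M_{\mathrm{G118\text{-}R129}} + M_{\mathrm{C215\text{-}K240}} - M_{\mathrm{C_4H_6O_4S}} = 1287.6282 + 2911.3305 - 149.9987 = 4048.9600\ \mathrm{Da}$$

$$\Delta = \frac{4048.963 - 4048.960}{4048.960} \times 10^{6} \approx 0.74\ \mathrm{ppm}$$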

Table 3-7. Elemental formulas with a mass of 149.9987 Da (the mass difference between the sum of the unmodified peptides and the crosslinked peptide). RDB means ring and double bond; ppm is parts per million.

| Formula | Cal. Mass (Da) | Delta ppm | RDB | Proposed Structure |
|---|---|---|---|---|
| C4H6O4S | 149.9987 | 0 | 2.0 | S+CH2COOH+CH2COOH |
| C10ON | 149.9980 | 5 | 11.5 | Excluded |
| C5H2N4S | 150.0000 | -9 | 7.0 | Excluded |
| C2H4O3N3S | 149.9973 | 9 | 2.5 | Excluded |

Locating Crosslink Site. Typically, the crosslink site can be localized by the largest b- and y-ions observed. For the crosslinked peptide m/z 1351.33, the largest b- and y-ions are GPSVFPLA/(PCSR, not observed) and (CCVE, not observed)/CPPCPAPPVAGPSVFLFPPKPK for the chains HC:G118-R129 and HC:C215-K240, respectively. This indicates that the crosslink site is in the corresponding PCSR and CCVE regions (Figure 3-2). Compared to the highly stable valine and proline, cysteine and glutamic acid are chemically reactive, so they are more likely candidates for crosslinking.

The elemental analysis described in the previous section indicates that a sulfur atom was removed, suggesting that a cysteine is involved. In addition, functional groups with a combined composition of C4H6O4 are eliminated from the theoretical peptide pairs, in which cysteinyl residues were assumed to be alkylated with IAA (C2H3O2). Taken together, our data indicated that the crosslink site is highly likely at HC:Cys127-HC:Cys215 or HC:Cys127-HC:Cys216 as shown in Figure 3-2. Because the two cysteines (Cys215 and Cys216) in the heavy chain are adjacent to each other, the exact crosslink site could not be determined unambiguously here.
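The localization logic can be sketched as follows: the residues covered by neither the longest linear b-ion nor the longest linear y-ion form the window that must contain the crosslink, and the cysteines in that window are the candidate sites. The function names are hypothetical, and restricting the candidates to cysteine relies on the thioether chemistry deduced above.

```python
def unobserved_region(chain, first_position, longest_b, longest_y):
    """Return (start position, subsequence) not covered by linear b-/y-ions."""
    start = longest_b
    end = len(chain) - longest_y
    if start > end:
        raise ValueError("b- and y-ion coverage overlaps; no unobserved region")
    return first_position + start, chain[start:end]


def candidate_cysteines(chain, first_position, longest_b, longest_y):
    """Positions of Cys residues inside the unobserved region."""
    region_start, region = unobserved_region(chain, first_position, longest_b, longest_y)
    return [region_start + i for i, aa in enumerate(region) if aa == "C"]
```

For G118-R129 (longest b-ion of 8 residues, no linear y-ion) the window is PCSR with Cys127; for C215-K240 (longest y-ion of 22 residues) it is CCVE with Cys215 and Cys216, reproducing the two candidate pairings named in the text.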


[Figure 3-2 image: CID MS/MS spectra (relative abundance vs. m/z, 600-1800) acquired in 16O-water (upper panel) and 18O-water (lower panel), with b- and y-ion assignments. For example, y5 is labeled at m/z 566.46 in 16O-water and 570.50 in 18O-water, while b8 is labeled at 769.48 and 769.49.]

Figure 3-2. CID MS/MS spectra of the triply charged precursor ions at m/z 1351.33 (16O-labeled C-termini) and 1354.00 (18O-labeled C-termini). Singly and doubly charged fragment ions that contain the individual chains of the crosslinked peptide (G118-R129/C215-K240, RT at 91.17 min) are highlighted in blue and red, respectively. The characteristic mass shift imparted by the heavier isotope 18O was observed (e.g., a mass shift of 4 Da for y5 in 16O- vs 18O-water).


3.4.5 Final Confirmation and Additional Support

Confirmation by Data Matching. Once the putative crosslinking chemistry and site had been proposed, theoretical fragmentation spectra were calculated and compared with the observed spectra. The assignment is shown in Figure 3-2 and is highly consistent with the deduced structure. A handful of fragment peaks from a few crosslinked peptides were not assigned initially and hence were subjected to further analysis as described next.

MS3 Analysis. MS3 analysis may provide additional structural information, especially for fragment ions that are difficult to assign in the MS/MS spectrum. For example, in Figure 3-3a, two high-intensity fragment ions at m/z 1196.50 (singly charged) and 1520.90 (doubly charged) were observed for the triply charged crosslinked peptide m/z 1413.36 (G118-R129/K214-K240), but could not be assigned to typical b- or y-ions. MS3 analysis of these two unassigned ions revealed the sequences to be 118GPSVFPLAPC*SR129 and 214KCC*VECPPCPAPPVAGPSVFLFPPKPK240, in which a dehydroalanine replaces C127 in peptide G118-R129 and a free cysteine replaces the thioether at 216 in peptide K214-K240, respectively (Figure 3-3b & 3-3c). These data further supported the proposed sequence and crosslink sites. Alkylation at Lys and Met as an artifact of sample preparation in peptide mapping has been reported[49]. The alkylation at K214 in the crosslinked peptide G118-R129/K214-K240 is in agreement with the literature[49].


Additional Peptides. Following the same workflow, full sequences, crosslinking chemistry and sites have been established for all ten candidate crosslinked peptides shown in red in Table 3-2. The final results for all identified crosslinked peptides are summarized in Table 3-8. To evaluate the sensitivity of our method, the peak intensity from LC-MS analysis for each crosslinked peptide and its related (not crosslinked) peptides was used to estimate the degree of crosslinking as described by Zhang[16], which ranged from 0.2% to 5.0% (half less than 1%; see Table 3-10 for details). Comparable data were observed based on reducing SDS-PAGE, which indicated about 8% of total crosslinked species (see Figure 3-8A). It is also worth noting that no enrichment or separation was performed on the IgG2 samples prior to tryptic digestion (the first step of our workflow); in other words, the crosslinked peptides were analyzed in the presence of a large excess of native peptides. Of course, considerably higher sensitivity can be achieved if the crosslinked proteins are separated or enriched prior to analysis.


[Figure 3-3a image: MS/MS spectra (relative abundance vs. m/z, 400-2000) acquired in 16O-water (upper panel) and 18O-water (lower panel), with b- and y-ion assignments. The two unassigned ions are marked "Ion 1196" (m/z 1196.50 in 16O-water, 1200.51 in 18O-water) and "Ion 1521++" (m/z 1520.90 in 16O-water, 1522.79 in 18O-water); the annotations refer to Fig. S3-4 and Fig. S3-5.]

Figure 3-3a. MS/MS data of the cross-link peptide HC:G118-R129/HC:KΔ214-K240 (RT at

88.56 min, m/z 1413.36, charge of 3). Lys214 was found alkylated by IAA in this peptide.


[Figure 3-3b: A) isotopic distribution of the fragment ion m/z 1196 (relative abundance versus m/z, 1196.5-1201.0); B) MS3 spectrum (relative abundance versus m/z, 400-1200).]

Figure 3-3b. MS3 for structure confirmation of the singly charged fragment ion m/z 1196 from the cross-linked peptide HC:G118-R129/HC:KΔ214-K240 (RT at 88.56 min, m/z 1413.36, charge of 3). Lys214 was found alkylated in this peptide. A) Isotopic distribution (observed m/z 1196.6369, calculated m/z 1196.6423, mass error of 4.5 ppm). B) MS3 spectrum.


[Figure 3-3c: A) isotopic distribution of the fragment ion m/z 1521 (relative abundance versus m/z, 1520.5-1523.5); B) MS3 spectrum (relative abundance versus m/z, 600-2000).]

Figure 3-3c. MS3 for structure confirmation of the doubly charged fragment ion m/z 1521 from the cross-linked peptide HC:G118-R129/HC:KΔ214-K240 (RT at 88.56 min, m/z 1413.36, charge of 3). Lys214 was found alkylated in this peptide. A) Isotopic distribution (observed m/z 1520.7060, calculated m/z 1520.7200, mass error of 9.2 ppm). B) MS3 spectrum.
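The mass errors quoted in the captions of Figures 3-3b and 3-3c follow from the usual parts-per-million definition, |observed - calculated| / calculated x 10^6. A minimal sketch (the function name is illustrative, not part of any software used in this work):

```python
def mass_error_ppm(observed, calculated):
    """Absolute mass error in parts per million."""
    return abs(observed - calculated) / calculated * 1e6


print(round(mass_error_ppm(1196.6369, 1196.6423), 1))  # Figure 3-3b: 4.5
print(round(mass_error_ppm(1520.7060, 1520.7200), 1))  # Figure 3-3c: 9.2
```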


Table 3-8. Crosslinked peptides identified in IgG2. Thioether crosslink sites are labeled in red and bold-face. The available data cannot distinguish between C215 and C216 as the exact site. Unless noted, all cysteine side chains are alkylated with IAA. Alkylated Lys is shown in blue. The longest b- or y-ions observed are underlined. Peptides are shown with the adjoining amino acid residues before cleavage in parentheses. Peptide #8 contains a thioether (in red) and dehydroalanine (in green and asterisk) at C215-C216, as shown in Figure S5.

#   Name                     m/z (charge)   RT (min)   Observed mass (Da)   Theoretical mass (Da)   Mass error (ppm)
      Sequence                                      Crosslinking site in heavy chain

1   G118-R129/C215-K240      1351.33 (3+)   91.17      4048.963             4048.960                0.74
      (K)118GPSVFPLAPCSR129(S)                      127
      (K)215CCVECPPCPAPPVAGPSVFLFPPKPK240(D)        215 or 216

2   G118-R129/K214-K240      1394.03 (3+)   87.91      4177.059             4177.055                0.96
      (K)118GPSVFPLAPCSR129(S)                      Same
      (R)214KCCVECPPCPAPPVAGPSVFLFPPKPK240(D)

3   G118-R129/KΔ214-K240     1413.36 (3+)   88.56      4235.064             4235.060                0.94
      (K)118GPSVFPLAPCSR129(S)                      Same
      (R)214KCCVECPPCPAPPVAGPSVFLFPPKPK240(D)

4   C215-K240/C215-K240      1419.92 (4+)   101.59     5672.670             5672.662                1.41
      (K)215CCVECPPCPAPPVAGPSVFLFPPKPK240(D)        215 or 216
      (K)215CCVECPPCPAPPVAGPSVFLFPPKPK240(D)        215 or 216

5   C215-K240/K214-K240      1452.20 (4+)   99.76      5800.765             5800.757                1.38
      (K)215CCVECPPCPAPPVAGPSVFLFPPKPK240(D)        Same
      (R)214KCCVECPPCPAPPVAGPSVFLFPPKPK240(D)

6   K214-K240/K214-K240      1484.22 (4+)   97.88      5928.854             5928.852                0.34
      (R)214KCCVECPPCPAPPVAGPSVFLFPPKPK240(D)       Same
      (R)214KCCVECPPCPAPPVAGPSVFLFPPKPK240(D)

7   K214-K240/KΔ214-K240     1498.47 (4+)   98.99      5986.858             5986.858                0.00
      (R)214KCCVECPPCPAPPVAGPSVFLFPPKPK240(D)       Same
      (R)214KCCVECPPCPAPPVAGPSVFLFPPKPK240(D)

8   K214-K240*/KΔ214-K240    1475.73 (4+)   98.49      5894.860             5894.856                0.68
      (R)214KCC*VECPPCPAPPVAGPSVFLFPPKPK240(D)      Same
      (R)214KCCVECPPCPAPPVAGPSVFLFPPKPK240(D)

9   T210-K240/K214-K240      1605.54 (4+)   96.15      6414.117             6414.112                0.78
      (K)210TVERKCCVECPPCPAPPVAGPSVFLFPPKPK240(D)   Same
      (R)214KCCVECPPCPAPPVAGPSVFLFPPKPK240(D)

10  T210-K240/KΔ214-K240     1620.04 (4+)   97.39      6472.120             6472.117                0.46
      (K)210TVERKCCVECPPCPAPPVAGPSVFLFPPKPK240(D)   Same
      (R)214KCCVECPPCPAPPVAGPSVFLFPPKPK240(D)


3.4.6 Targeted Search Based on the Newly Established Crosslinking Chemistry

After the thioether crosslink chemistry was established, a targeted search for this particular modification was performed following well-established protocols. First, a theoretical database was built for all combinations of a thioether crosslink between any two cysteinyl residues. Then, all observed precursor ions were searched against the database. When a hit was found in the targeted mass search, the corresponding MS/MS data from both the 18O-water and 16O-water digests were examined for further structural confirmation. By this approach, four additional thioether peptides were found (Table 3-9, Figures 3-6 & 3-7). All contain a light chain C-terminal peptide, so each has only one newly created C-terminus (incorporating two 18O atoms) and was therefore not discriminated from single-chain peptides in the initial screening stage. Again, these results showcase the utility of our approach for identifying crosslinks in macromolecules that arise from previously unknown crosslinking chemistry.
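The two steps of the targeted search (building the theoretical database, then matching precursor masses) can be sketched as follows. This is a minimal illustration, not the software used in this work. It assumes that a thioether-crosslinked pair has the mass of the two free-thiol peptides minus H2S, and that every cysteine not involved in the crosslink is carboxymethylated (+58.00548 Da). The function names and the 5 ppm tolerance are hypothetical. The peptide sequences and the observed masses are taken from Tables 3-8 and 3-9.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (Da)
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
H2S = 33.98772            # lost when two Cys thiols form one thioether
CARBOXYMETHYL = 58.00548  # IAA alkylation of a cysteine not in the crosslink


def peptide_mass(seq):
    """Monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def thioether_mass(seq_a, seq_b):
    """Mass of two peptides joined by one Cys-Cys thioether (assumed model)."""
    other_cys = seq_a.count("C") + seq_b.count("C") - 2
    return (peptide_mass(seq_a) + peptide_mass(seq_b)
            - H2S + other_cys * CARBOXYMETHYL)


def build_database(peptides):
    """All pairs (including homodimers) of cysteine-containing peptides."""
    cys_peptides = [p for p in peptides if "C" in p]
    return [(a, b, thioether_mass(a, b))
            for a, b in combinations_with_replacement(cys_peptides, 2)]


def search(observed_masses, database, tol_ppm=5.0):
    """Return (observed, peptide_a, peptide_b, theoretical) for each hit."""
    hits = []
    for obs in observed_masses:
        for a, b, theo in database:
            if abs(obs - theo) / theo * 1e6 <= tol_ppm:
                hits.append((obs, a, b, theo))
    return hits


peptides = ["TVAPTECS", "GPSVFPLAPCSR", "CCVECPPCPAPPVAGPSVFLFPPKPK"]
db = build_database(peptides)
for hit in search([2001.984, 4048.963], db):
    print(hit[1], hit[2], round(hit[3], 3))
```

Under these assumptions the calculated masses agree with the theoretical masses listed for peptide 11 (2001.983 Da) and peptide 1 (4048.960 Da). Alkylated lysine and dehydroalanine variants are not modeled in this sketch.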


Table 3-9. Cross-linked peptides identified in the IgG2 via a targeted search for cysteinyl thioether (labeled in red and bold-face). The available data cannot distinguish between C215 and C216 as the exact site. Unless noted, all cysteine side chains are alkylated with IAA. Alkylated Lys is shown in blue and asterisk. Peptides are shown with the adjoining amino acid residues before cleavage in parentheses.

#   Name                     m/z (charge)   RT (min)   Observed mass (Da)   Theoretical mass (Da)   Mass error (ppm)
      Sequence                                      Crosslinking site

11  T211-S218/G118-R129      1002.50 (2+)   52.25      2001.984             2001.983                0.40
      (K)211TVAPTECS218                             LC217
      (K)118GPSVFPLAPCSR129(S)                      HC127

12  T211-S218/C215-K240      1209.90 (3+)   78.29      3625.689             3625.685                1.10
      (K)211TVAPTECS218                             LC217
      (K)215CCVECPPCPAPPVAGPSVFLFPPKPK240(D)        HC215 or 216

13  T211-S218/K214-K240      1252.94 (3+)   76.12      3753.784             3753.780                1.00
      (K)211TVAPTECS218                             Same
      (R)214KCCVECPPCPAPPVAGPSVFLFPPKPK240(D)

14  T211-S218/KΔ214-K240     1272.27 (3+)   77.69      3811.789             3811.786                0.70
      (K)211TVAPTECS218                             Same
      (R)214KΔCCVECPPCPAPPVAGPSVFLFPPKPK240(D)


[Figure 3-4: MS/MS spectra from the 16O-water digest (top) and the 18O-water digest (bottom); relative abundance versus m/z (400-2000).]

Figure 3-4. MS/MS data of the cross-linked peptide HC:C215-K240/HC:C215-K240 (RT at 101.59 min, m/z 1419.92, charge of 4).


[Figure 3-5: MS/MS spectra from the 16O-water digest (top) and the 18O-water digest (bottom); relative abundance versus m/z (600-2000). Fragment ions 1429++ and 1521++ are marked.]

Figure 3-5. MS/MS data of the cross-linked peptide HC:K214-K240*/HC:KΔ214-K240 (RT at 98.49 min, m/z 1475.73, charge of 4).


[Figure 3-6: MS/MS spectra from the 16O-water digest (top) and the 18O-water digest (bottom); relative abundance versus m/z (400-2000).]

Figure 3-6. MS/MS data of the cross-linked peptide LC:T211-S218/HC:G118-R129 (RT at 52.25 min, m/z 1002.50, charge of 2).


[Figure 3-7: MS/MS spectra from the 16O-water digest (top) and the 18O-water digest (bottom); relative abundance versus m/z (400-2000).]

Figure 3-7. MS/MS data of the cross-linked peptide LC:T211-S218/HC:K214-K240 (RT at 76.12 min, m/z 1252.94, charge of 3).


Table 3-10. Quantification of the cross-linked peptides in the IgG2

#    Peptide                    Level (%)
1    G118-R129/C215-K240        0.5
2    G118-R129/K214-K240        1.0
3    G118-R129/KΔ214-K240       1.9
4    C215-K240/C215-K240        0.3
5    C215-K240/K214-K240        2.6
6    K214-K240/K214-K240        2.4
7    K214-K240/KΔ214-K240       1.7
8    K214-K240*/KΔ214-K240      0.2
9    T210-K240/K214-K240        0.6
10   T210-K240/KΔ214-K240       0.5
11   T211-S218/G118-R129        5.0
12   T211-S218/C215-K240        0.2
13   T211-S218/K214-K240        0.2
14   T211-S218/KΔ214-K240       0.4


[Figure 3-8, panel B: overlaid size exclusion chromatograms of IgG2-Control and IgG2-pH8.5/50C/1wk; absorbance (mAU, 0-500) versus time (min, 0-50); peak labels 6%, 1% and 3%. The gel image of panel A is not reproduced.]

Figure 3-8. (A) Detection of cross-links in IgG2 by reducing SDS-PAGE. Lanes 1, 2, 3, and 4 are the molecular weight marker, control, acid-stressed sample (pH 3.0 at 50 °C for 1 week), and base-stressed sample (pH 8.5 at 50 °C for 1 week), respectively. About 8% (combined) non-reducible high-molecular-weight bands were observed in the base-stressed IgG2, as quantified with the software TotalLab Quant version 12.4. (B) Aggregation analysis by size exclusion chromatography. Seven percent (7%) aggregation was observed in the base-stressed IgG2. Column: TSKgel G3000 SWXL, 7.8 x 300 mm, 5 µm (two in tandem); mobile phase: 150 mM NaCl in 100 mM sodium phosphate buffer, pH 6.9; column temperature: ambient; flow rate: 0.5 mL/min; detection: 215 nm; injection: 10 µL of 2.5 mg/mL sample.


3.5 Formation of Thioether

Thioether is a known protein modification[50-55]. For IgG1, a thioether crosslink was located at the disulfide bond of the light chain C-termini and the heavy chain hinge region[52, 55]. A generally accepted mechanism involves β-elimination of the disulfide to generate dehydroalanine, followed by Michael addition of another cysteinyl thiol[52-54, 56, 57]. Basic conditions and structural flexibility generally favor its formation[50-55]. In addition, radical intermediates have been postulated for desulfurization[57, 58]. The hinge region of IgG2 is highly flexible and solvent exposed, and is therefore very susceptible to this transformation. Indeed, our results indicate that it occurs more frequently at the light chain C-termini and in the hinge region of IgG2. Interestingly, the disulfide bonds between heavy chain C127 and heavy chain C215 (or C216) in the IgG2 A/B form (or B form) are also reactive (Figure 3-9). The thioether crosslinks at HC:Cys127-HC:Cys215 (or HC:Cys127-HC:Cys216), HC:Cys215-HC:Cys215 (or HC:Cys216-HC:Cys216), LC:Cys217-HC:Cys127, and LC:Cys217-HC:Cys215 (or LC:Cys217-HC:Cys216) originated from native disulfides, shown in red in Figure 3-9. These thioether linkages agree with the previous reports on IgG2 disulfide bond pairing[59-61]. In Table 3-8, crosslinked peptide #8 (HC:K214-K240*/HC:K214-K240) contains a thioether and a dehydroalanine. The corresponding linear peptide K214-K240* (214KCC*VECPPCPAPPVAGPSVFLFPPKPK240, in which the asterisk denotes the dehydroalanine at C215 or C216) was also found. Taken together, these data are consistent with thioether formation via dehydroalanine intermediates.


Scheme 3-2. Establishment of crosslink chemistry based on formula C4H6O4S obtained from elemental composition analysis of 149.9987 Da.
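The agreement between the formula and the measured mass quoted in the caption of Scheme 3-2 can be checked by summing monoisotopic atomic masses. The helper below is a minimal sketch; its name and the small element table are illustrative and cover only the elements needed here.

```python
import re

# Monoisotopic atomic masses (Da) for the elements used in this check
ATOM = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915, "S": 31.972071}


def formula_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C4H6O4S'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += ATOM[element] * (int(count) if count else 1)
    return mass


print(round(formula_mass("C4H6O4S"), 4))  # 149.9987
```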

Figure 3-9. Major disulfide linkage isoforms in IgG2. Those labeled in red were found to convert into thioethers in IgG2.


3.6 Conclusions

The utility of our XChem-Finder strategy for the characterization of protein crosslinking with undefined chemistry is exemplified by the discovery of fourteen thioether peptides in IgG2.

Essential to our approach is 18O-isotope labeling: it allows facile detection of crosslinked peptides and, most significantly, divides the complex tandem mass spectra into subsets that can be processed by standard database search (FindPept, which matches fragment ions with partial peptide sequences) and de novo sequencing (sequence tags). High-resolution spectral data also dramatically improve the confidence of assignment and, moreover, reveal the chemical nature of the crosslinking. While the data reported here were processed manually, most steps can be automated. Hence our XChem-Finder strategy should be generally applicable to the discovery of crosslinked proteins, without previously defined chemistry, in both biological systems and biopharmaceuticals.

3.7 References

[1] Liu H, Gaza-Bulseco G, Faldu D, Chumsae C, Sun J. Heterogeneity of monoclonal antibodies. J Pharm Sci 2008;97:2426-47.

[2] DiMarco T, Giulivi C. Current analytical methods for the detection of dityrosine, a biomarker of oxidative stress, in biological samples. Mass Spectrom Rev 2007;26:108-20.

[3] Srivastava OP, Kirk MC, Srivastava K. Characterization of covalent multimers of crystallins in aging human lenses. J Biol Chem 2004;279:10901-9.

[4] Wilhelmus MM, Grunberg SC, Bol JG, van Dam AM, Hoozemans JJ, Rozemuller AJ, et al. Transglutaminases and transglutaminase-catalyzed cross-links colocalize with the pathological lesions in Alzheimer's disease brain. Brain Pathol 2009;19:612-22.


[5] Lopez B, Gonzalez A, Hermida N, Valencia F, de Teresa E, Diez J. Role of lysyl oxidase in myocardial fibrosis: from basic science to clinical aspects. Am J Physiol Heart Circ Physiol 2010;299:H1-9.

[6] Nemes Z, Devreese B, Steinert PM, Van Beeumen J, Fesus L. Cross-linking of ubiquitin, HSP27, parkin, and alpha-synuclein by gamma-glutamyl-epsilon-lysine bonds in Alzheimer's neurofibrillary tangles. FASEB J 2004;18:1135-7.

[7] Friedman M. Chemistry, biochemistry, nutrition, and microbiology of lysinoalanine, lanthionine, and histidinoalanine in food and other proteins. J Agric Food Chem 1999;47:1295-319.

[8] Nashef AS, Osuga DT, Lee HS, Ahmed AI, Whitaker JR, Feeney RE. Effects of alkali on proteins. Disulfides and their products. J Agric Food Chem 1977;25:245-51.

[9] Leitner A, Walzthoeni T, Kahraman A, Herzog F, Rinner O, Beck M, et al. Probing native protein structures by chemical cross-linking, mass spectrometry, and bioinformatics. Mol Cell Proteomics 2010;9:1634-49.

[10] Singh P, Panchaud A, Goodlett DR. Chemical cross-linking and mass spectrometry as a low-resolution protein structure determination technique. Anal Chem 2010;82:2636-42.

[11] Young MM, Tang N, Hempel JC, Oshiro CM, Taylor EW, Kuntz ID, et al. High throughput protein fold identification by using experimental constraints derived from intramolecular cross-links and mass spectrometry. Proc Natl Acad Sci U S A 2000;97:5802-6.

[12] Lee YJ, Lackner LL, Nunnari JM, Phinney BS. Shotgun cross-linking analysis for studying quaternary and tertiary protein structures. J Proteome Res 2007;6:3908-17.

[13] Hoopmann MR, Weisbrod CR, Bruce JE. Improved strategies for rapid identification of chemically cross-linked peptides using protein interaction reporter technology. J Proteome Res 2010;9:6323-33.

[14] Du X, Chowdhury SM, Manes NP, Wu S, Mayer MU, Adkins JN, et al. Xlink-identifier: an automated data analysis platform for confident identifications of chemically cross-linked peptides using tandem mass spectrometry. J Proteome Res 2011;10:923-31.

[15] Chowdhury SM, Du X, Tolic N, Wu S, Moore RJ, Mayer MU, et al. Identification of cross-linked peptides after click-based enrichment using sequential collision-induced dissociation and electron transfer dissociation tandem mass spectrometry. Anal Chem 2009;81:5524-32.

[16] Zhang Z. Large-scale identification and quantification of covalent modifications in therapeutic proteins. Anal Chem 2009;81:8354-64.


[17] Zang T, Lee BW, Cannon LM, Ritter KA, Dai S, Ren D, et al. A naturally occurring brominated furanone covalently modifies and inactivates LuxS. Bioorg Med Chem Lett 2009;19:6200-4.

[18] Wan W, Zhao G, Al-Saad K, Siems WF, Zhou ZS. Rapid screening for S-adenosylmethionine-dependent methylation products by enzyme-transferred isotope patterns analysis. Rapid Commun Mass Spectrom 2004;18:319-24.

[19] Gao Q, Xue S, Doneanu CE, Shaffer SA, Goodlett DR, Nelson SD. Pro-CrossLink. Software tool for protein cross-linking and mass spectrometry. Anal Chem 2006;78:2145-9.

[20] Gao Q, Xue S, Shaffer SA, Doneanu CE, Goodlett DR, Nelson SD. Minimize the detection of false positives by the software program DetectShift for 18O-labeled cross-linked peptide analysis. Eur J Mass Spectrom (Chichester, Eng) 2008;14:275-80.

[21] Zelter A, Hoopmann MR, Vernon R, Baker D, MacCoss MJ, Davis TN. Isotope signatures allow identification of chemically cross-linked peptides by mass spectrometry: a novel method to determine interresidue distances in protein structures through cross-linking. J Proteome Res 2010;9:3583-9.

[22] Rinner O, Seebacher J, Walzthoeni T, Mueller LN, Beck M, Schmidt A, et al. Identification of cross-linked peptides from large sequence databases. Nat Methods 2008;5:315-8.

[23] Seebacher J, Mallick P, Zhang N, Eddes JS, Aebersold R, Gelb MH. Protein cross-linking analysis using mass spectrometry, isotope-coded cross-linkers, and integrated computational data processing. J Proteome Res 2006;5:2270-82.

[24] Seidler J, Zinn N, Boehm ME, Lehmann WD. De novo sequencing of peptides by MS/MS. Proteomics 2010;10:634-49.

[25] Singh P, Shaffer SA, Scherl A, Holman C, Pfuetzner RA, Larson Freeman TJ, et al. Characterization of protein cross-links via mass spectrometry and an open-modification search strategy. Anal Chem 2008;80:8799-806.

[26] Kroon DJ, Baldwin-Ferro A, Lalan P. Identification of sites of degradation in a therapeutic monoclonal antibody by peptide mapping. Pharm Res 1992;9:1386-93.

[27] Van Buren N, Rehder D, Gadgil H, Matsumura M, Jacob J. Elucidation of two major aggregation pathways in an IgG2 antibody. J Pharm Sci 2009;98:3013-30.

[28] Back JW, Notenboom V, de Koning LJ, Muijsers AO, Sixma TK, de Koster CG, et al. Identification of cross-linked peptides for protein interaction studies using mass spectrometry and 18O labeling. Anal Chem 2002;74:4417-22.


[29] Ren D, Pipes GD, Liu D, Shih LY, Nichols AC, Treuheit MJ, et al. An improved trypsin digestion method minimizes digestion-induced modifications on proteins. Anal Biochem 2009;392:12-21.

[30] Zhang Z. Prediction of low-energy collision-induced dissociation spectra of peptides with three or more charges. Anal Chem 2005;77:6364-73.

[31] Zhang Z. Prediction of collision-induced-dissociation spectra of peptides with post-translational or process-induced modifications. Anal Chem 2011;83:8642-51.

[32] Zhang Z. Prediction of low-energy collision-induced dissociation spectra of peptides. Anal Chem 2004;76:3908-22.

[33] Zhang Z. Retention time alignment of LC/MS data by a divide-and-conquer algorithm. J Am Soc Mass Spectrom 2012;23:764-72.

[34] Schnolzer M, Jedrzejewski P, Lehmann WD. Protease-catalyzed incorporation of 18O into peptide fragments and its application for protein sequencing by electrospray and matrix-assisted laser desorption/ionization mass spectrometry. Electrophoresis 1996;17:945-53.

[35] Ye X, Luke B, Andresson T, Blonder J. 18O stable isotope labeling in MS-based proteomics. Brief Funct Genomic Proteomic 2009;8:136-44.

[36] Fenselau C, Yao X. 18O2-labeling in quantitative proteomic strategies: a status report. J Proteome Res 2009;8:2140-3.

[37] Yao X, Afonso C, Fenselau C. Dissection of proteolytic 18O labeling: endoprotease-catalyzed 16O-to-18O exchange of truncated peptide substrates. J Proteome Res 2003;2:147-52.

[38] Yao X, Freas A, Ramirez J, Demirev PA, Fenselau C. Proteolytic 18O labeling for comparative proteomics: model studies with two serotypes of adenovirus. Anal Chem 2001;73:2836-42.

[39] Bantscheff M, Dumpelfeld B, Kuster B. Femtomol sensitivity post-digest (18)O labeling for relative quantification of differential protein complex composition. Rapid Commun Mass Spectrom 2004;18:869-76.

[40] Stewart, II, Thomson T, Figeys D. 18O labeling: a tool for proteomics. Rapid Commun Mass Spectrom 2001;15:2456-65.

[41] Koehler CJ, Arntzen MO, de Souza GA, Thiede B. An Approach for Triplex-Isobaric Peptide Termini Labeling (Triplex-IPTL). Anal Chem 2013.


[42] Koehler CJ, Arntzen MO, Strozynski M, Treumann A, Thiede B. Isobaric peptide termini labeling utilizing site-specific N-terminal succinylation. Anal Chem 2011;83:4775-81.

[43] Nakazawa T, Yamaguchi M, Okamura TA, Ando E, Nishimura O, Tsunasawa S. Terminal proteomics: N- and C-terminal analyses for high-fidelity identification of proteins using MS. Proteomics 2008;8:673-85.

[44] Liu M, Cheetham J, Cauchon N, Ostovic J, Ni W, Ren D, et al. Protein isoaspartate methyltransferase-mediated 18O-labeling of isoaspartic acid for mass spectrometry analysis. Anal Chem 2012;84:1056-62.

[45] Ni W, Dai S, Karger BL, Zhou ZS. Analysis of isoaspartic acid by selective proteolysis with Asp-N and electron transfer dissociation mass spectrometry. Anal Chem 2010;82:7485-91.

[46] Alfaro JF, Gillies LA, Sun HG, Dai S, Zang T, Klaene JJ, et al. Chemo-enzymatic detection of protein isoaspartate using protein isoaspartate methyltransferase and hydrazine trapping. Anal Chem 2008;80:3882-9.

[47] Gururaja TL, Payan DG, Anderson DC. Gas phase dimerization of neuropeptide head activator analogs useful for the noncovalent constraint of peptides. Biopolymers 2007;88:55-63.

[48] Banerjee S, Mazumdar S. Non-covalent dimers of the lysine containing protonated peptide ions in gaseous state: electrospray ionization mass spectrometric study. J Mass Spectrom 2010;45:1212-9.

[49] Gurd FR. [34a] carboxymethylation. Methods Enzymol 1972;25:424-38.

[50] Datola A, Richert S, Bierau H, Agugiaro D, Izzo A, Rossi M, et al. Characterisation of a novel growth hormone variant comprising a thioether link between Cys182 and Cys189. ChemMedChem 2007;2:1181-9.

[51] Lispi M, Datola A, Bierau H, Ceccarelli D, Crisci C, Minari K, et al. Heterogeneity of commercial recombinant human growth hormone (r-hGH) preparations containing a thioether variant. J Pharm Sci 2009;98:4511-24.

[52] Cohen SL, Price C, Vlasak J. Beta-elimination and peptide bond hydrolysis: two distinct mechanisms of human IgG1 hinge fragmentation upon storage. J Am Chem Soc 2007;129:6976-7.

[53] Florence TM. Degradation of protein disulphide bonds in dilute alkali. Biochem J 1980;189:507-20.

[54] Galande AK, Trent JO, Spatola AF. Understanding base-assisted desulfurization using a variety of disulfide-bridged peptides. Biopolymers 2003;71:534-51.


[55] Tous GI, Wei Z, Feng J, Bilbulian S, Bowen S, Smith J, et al. Characterization of a novel modification to monoclonal antibodies: thioether cross-link of heavy and light chains. Anal Chem 2005;77:2675-82.

[56] Zhao G, Zhou ZS. Vinyl sulfonium as novel proteolytic enzyme inhibitor. Bioorg Med Chem Lett 2001;11:2331-5.

[57] Wang Z, Rejtar T, Zhou ZS, Karger BL. Desulfurization of cysteine-containing peptides resulting from sample preparation for protein characterization by mass spectrometry. Rapid Commun Mass Spectrom 2010;24:267-75.

[58] Zhou ZS, Smith AE, Matthews RG. L-Selenohomocysteine: one-step synthesis from L-selenomethionine and kinetic analysis as substrate for methionine synthases. Bioorg Med Chem Lett 2000;10:2471-5.

[59] Dillon TM, Ricci MS, Vezina C, Flynn GC, Liu YD, Rehder DS, et al. Structural and functional characterization of disulfide isoforms of the human IgG2 subclass. J Biol Chem 2008;283:16206-15.

[60] Wypych J, Li M, Guo A, Zhang Z, Martinez T, Allen MJ, et al. Human IgG2 antibodies display disulfide-mediated structural isoforms. J Biol Chem 2008;283:16194-205.

[61] Zhang B, Harder AG, Connelly HM, Maheu LL, Cockrill SL. Determination of Fab-hinge disulfide connectivity in structural isoforms of a recombinant human immunoglobulin G2 antibody. Anal Chem 2010;82:1090-9.


Chapter 4: Discovery and Characterization of a Novel Photo-Oxidative Histidine-Histidine Crosslink in IgG1 Antibody Utilizing 18O-labeling and Mass Spectrometry

Reproduced with permission from “Min Liu, Zhongqi Zhang, Janet Cheetham, Da Ren, and Zhaohui Sunny Zhou. Discovery and Characterization of a Novel Photo-Oxidative Histidine-Histidine Crosslink in IgG1 Antibody Utilizing 18O-labeling and Mass Spectrometry. Analytical Chemistry 2014, 86, 4940-4948.” Copyright [2014] American Chemical Society.

Co-authors’ work in this chapter: Min Liu: experimental design and execution, data analysis, manuscript writing and revision; Zhongqi Zhang: data analysis, manuscript writing and revision; Janet Cheetham: idea contribution, manuscript writing and revision, and grant support; Da Ren: manuscript writing and revision; Zhaohui Sunny Zhou: idea contribution, data analysis, manuscript writing and revision, and grant support.


4.1 Abstract

A novel photo-oxidative crosslink between two histidines (His-His) has been discovered and characterized in an IgG1 antibody via the XChem-Finder workflow of 18O labeling and mass spectrometry (Anal Chem 2013, 85, 5900-5908). Its structure was elucidated by peptide mapping with multiple proteases of differing specificity (e.g., trypsin, Asp-N, and GluC combined with trypsin or Asp-N) and mass spectrometry with complementary fragmentation modes (e.g., collision-induced dissociation (CID) and electron-transfer dissociation (ETD)). Our data indicated that crosslinking occurred across two identical conserved histidine residues on two separate heavy chains in the hinge region, which is highly flexible and solvent accessible. Based on model studies with short peptides, it has been proposed that singlet oxygen reacts with the histidyl imidazole ring to form an endoperoxide, which is then converted to the 2-oxo-histidine (2-oxo-His) and His+32 intermediates; the latter is subject to nucleophilic attack by the unmodified histidine, and elimination of a water molecule finally gives the adduct, with a net mass increase of 14 Da. Our findings are consistent with this mechanism. The successful discovery of crosslinked His-His again demonstrates the broad applicability and utility of our XChem-Finder approach in the discovery and elucidation of protein crosslinking, particularly without a priori knowledge of the chemical nature and site of crosslinking.
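The net mass increase of 14 Da can be reproduced with simple mass bookkeeping. The sketch below assumes that the His+32 intermediate corresponds to the addition of two oxygen atoms; this is an interpretation of the "+32" label, not something stated explicitly above. The subsequent loss of one water molecule then leaves a net gain of one oxygen and a loss of two hydrogens relative to the two unmodified histidine-containing peptides.

```python
O = 15.994915  # monoisotopic mass of oxygen (Da)
H = 1.007825   # monoisotopic mass of hydrogen (Da)

plus_32 = 2 * O      # assumed composition of the +32 intermediate (O2)
water = 2 * H + O    # water eliminated in the final step
net_shift = plus_32 - water

print(round(net_shift, 4))  # 13.9793
```

Under this assumption, the expected mass of the crosslinked species is the sum of the two unmodified peptide masses plus about 13.979 Da.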


4.2 Introduction

Protein crosslinks are ubiquitous in biological systems and biopharmaceuticals. They are also involved in disease pathologies such as Alzheimer's disease[1-3] and cataractogenesis[2, 4]. As one of the post-translational modifications and degradations that occur during the production, processing and storage of biopharmaceutical proteins, crosslinks have been reported to result in aggregation, loss of bioactivity, and immunogenicity[5-7].

Despite the rapid advancements in mass spectrometry and data analysis algorithms, characterization of protein crosslinks remains challenging due to their structural complexity[8]. Whereas a limited set of crosslinked structures (e.g. thioether[7, 9-12]) have been characterized, most remain unknown; examples include the non-disulfide covalent crosslinks in crystallin[4, 13, 14], collagen[15], ubiquitylated proteins[3], ribonuclease A[16] and monoclonal antibodies[17, 18]. It is particularly challenging to characterize protein crosslinking without prior knowledge of the chemical nature and sites of crosslinking, as no theoretical mass or spectrum can be predicted.

In contrast, numerous chemical crosslinkers with well-established crosslinking chemistry have been used in the investigation of protein structures and protein-protein interactions[19-25]. Since pre-defined crosslinking chemistry is involved, various specialized algorithms have been developed for data analysis for each incorporated crosslink. Naturally, these approaches are less amenable to the identification of crosslinks with undefined crosslinking chemistry. Recently, we developed a workflow, XChem-Finder, that is generally applicable to protein crosslinking. It involves, first, the detection of crosslinked peptides via the unique isotope patterns imparted by 18O-labeling of their two C-termini (in comparison, one C-terminus for a linear peptide), and then integrated mass spectrometric and data analysis[8].
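The detection step can be illustrated with the mass shift expected between the 16O-water and 18O-water digests. The sketch below assumes complete incorporation of two 18O atoms at every proteolytically generated C-terminus, whereas real spectra show a distribution of partially labeled species. The function names and the 0.02 Da tolerance are hypothetical.

```python
O18_MINUS_O16 = 17.999161 - 15.994915  # about 2.004 Da per exchanged oxygen


def expected_shift(n_c_termini, o18_per_terminus=2):
    """Mass shift (Da) for a peptide with n labeled C-termini."""
    return n_c_termini * o18_per_terminus * O18_MINUS_O16


def count_labeled_termini(mass_16o, mass_18o, tol=0.02, max_termini=3):
    """Number of labeled C-termini implied by the observed shift, or None."""
    shift = mass_18o - mass_16o
    for n in range(1, max_termini + 1):
        if abs(shift - expected_shift(n)) <= tol:
            return n
    return None


# Hypothetical neutral masses: a linear peptide and a crosslinked pair.
print(count_labeled_termini(1000.000, 1004.0085))  # 1
print(count_labeled_termini(4000.000, 4008.0170))  # 2
```

A shift of about 4 Da therefore points to a single-chain peptide and a shift of about 8 Da to a candidate crosslinked peptide. A crosslink involving a protein C-terminal peptide has only one newly created C-terminus and shifts like a linear peptide, as discussed in Section 3.4.6.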


IgG1 and IgG2 are the most popular therapeutic monoclonal antibodies on the market[26]. Applying our XChem-Finder workflow, we have discovered and characterized a novel histidine-histidine (His-His) crosslink in an IgG1 antibody. High molecular weight species in the light-irradiated IgG1 were detected by reduced SDS-PAGE and size exclusion chromatography (SEC). Our LC-MS analysis indicated that crosslinking occurred across two identical conserved histidine residues (His220) on two separate heavy chains in the hinge region, which is highly flexible and solvent accessible. The crosslinking chemistry is consistent with the mechanism proposed from model peptides under photo-oxidative conditions (see Scheme 4-1) [16, 27-29]. The successful discovery of the His-His crosslink in IgG1 further demonstrates the general applicability and power of our XChem-Finder workflow. To the best of our knowledge, the work reported herein is the first example of such crosslinking in a protein.

4.3 Experimental Section

4.3.1 Chemicals

All chemicals were reagent grade or above. Guanidine hydrochloride (GndHCl), ethylenediaminetetraacetic acid (EDTA), dithiothreitol (DTT), iodoacetic acid (IAA), trifluoroacetic acid (TFA), acetonitrile (ACN), HPLC-grade water, and bradykinin were from Sigma-Aldrich (St. Louis, MO, USA). Sequencing grade trypsin, GluC, and Asp-N were from Roche (Indianapolis, IN, USA). 18O-water (97%) was from Cambridge Isotope Laboratories (Andover, MA, USA). Recombinant monoclonal IgG1 antibody (anti-streptavidin immunoglobulin gamma 1) was produced in Chinese hamster ovary (CHO) cells (Amgen, Thousand Oaks, CA, USA), purified according to standard manufacturing procedures, formulated at a concentration of 30 mg/mL in 50 mM sodium acetate at pH 5.2, and stored at -70 °C.

4.3.2 Generation of Stressed Sample

After being exchanged into various buffers of biopharmaceutical interest (50 mM sodium acetate at pH 4.8, 50 mM sodium phosphate at pH 7.4, 50 mM sodium bicarbonate at pH 9.0, or water), the IgG1 antibody at a concentration of 5 mg/mL in a clear 3 mL glass vial was placed in a light chamber (Atlas Suntest CPS+ with Xenon Lamp and ID65 solar filter, controlled irradiance at 300-800 nm, light intensity at 765 W/m2) and exposed to light irradiation for 7, 14, and 22 h. These conditions represent light exposures of 1x, 2x, and 3x ICH (International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use), respectively.
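As a rough cross-check of the three exposure levels, the radiant exposure at each time point is irradiance multiplied by time. The sketch below assumes the 765 W/m2 setting stayed constant for the whole exposure; the function names are illustrative only, and no conversion to the ICH visible-light or UV dose units is attempted.

```python
# Radiant exposure (dose) = irradiance x exposure time.
# Assumption: the irradiance is constant at the stated chamber setting.
IRRADIANCE_W_PER_M2 = 765       # 300-800 nm, as stated in the text
EXPOSURE_HOURS = (7, 14, 22)    # nominal 1x, 2x, and 3x ICH exposures


def radiant_exposure_wh_per_m2(irradiance_w_per_m2, hours):
    """Return the radiant exposure in Wh/m2."""
    return irradiance_w_per_m2 * hours


def wh_to_mj(wh_per_m2):
    """Convert Wh/m2 to MJ/m2 (1 Wh = 3600 J)."""
    return wh_per_m2 * 3600 / 1e6


for hours in EXPOSURE_HOURS:
    dose = radiant_exposure_wh_per_m2(IRRADIANCE_W_PER_M2, hours)
    ratio = hours / EXPOSURE_HOURS[0]
    print(f"{hours:>2} h: {dose} Wh/m2 = {wh_to_mj(dose):.1f} MJ/m2 "
          f"({ratio:.2f}x the 7 h dose)")
```

Note that 22 h corresponds to about 3.1 times the 7 h dose, so the "3x" label is nominal.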

4.3.3 Aggregates by Size Exclusion Chromatography

Size exclusion chromatography (SEC) analysis for reduced IgG was carried out as described[30]. Briefly, IgG1 was diluted to 1 mg/mL in a denaturing buffer (7.5 M GndHCl, 2 mM EDTA, and 0.25 M Tris-HCl, pH 7.5) at room temperature. Reduction was accomplished with 10 mM DTT at room temperature for 30 min. Then 50 µL of each sample was injected onto a TSKgel G3000 SWXL column (7.8 x 300 mm, 5 µm) with an isocratic mobile phase of 0.1% TFA/H2O:ACN (80:20) and a flow rate of 0.2 mL/min. The column was kept at room temperature, and UV detection was at 280 nm.

4.3.4 Reduction, Alkylation, Tryptic Digestion and 18O-Labeling of IgG1

IgG1 was digested with trypsin following a procedure similar to that described by Ren et al.[31]. Briefly, IgG1 was diluted to 1 mg/mL in a denaturing buffer (7.5 M GndHCl, 2 mM EDTA, and 0.25 M Tris-HCl, pH 7.5) to a final volume of 0.5 mL. Reduction was accomplished by adding 3 μL of 0.5 M DTT, followed by a 30 min incubation at room temperature. S-Carboxymethylation was achieved by adding 7 μL of 0.5 M IAA, and the resulting mixture was incubated at room temperature in the dark for 15 min. Excess IAA was quenched by adding 4 μL of 0.5 M DTT. The reduced and alkylated IgG1 samples were subsequently exchanged into the digestion buffer (0.1 M Tris-HCl at pH 7.5) using a NAP-5 size-exclusion column (GE Healthcare, Piscataway, NJ, USA). Next, two aliquots (200 µL each) were completely dried via SpeedVac and reconstituted separately in the same volume of 18O-water or 16O-water; then 6 µL of 1 mg/mL trypsin in 18O-water or 16O-water, respectively, was added to achieve a 1:25 (w/w) enzyme/substrate ratio. The reaction mixtures were incubated at 37 °C for 30 min.

Other proteolytic digestions of IgG1 (Asp-N, trypsin combined with GluC, and Asp-N combined with GluC) were performed in 16O-water only. Proteases were added to 100 µL of the above buffer-exchanged antibody to achieve a 1:25 (w/w) enzyme/substrate ratio. The reaction mixtures were incubated at 37 °C overnight.

Limited Asp-N digestion was performed by adding 6 µg of Asp-N to 300 µL of the digest (trypsin combined with GluC) and incubating at 37 °C for 1.5 h for LC/CID-MS analysis. An aliquot of 200 µL of the above digest was dried via SpeedVac and reconstituted in 40 µL of water for LC/ETD-MS analysis.

4.3.5 HPLC

The proteolytic digests of IgG1 (25 μL) were separated on a Jupiter C5 column (250 x 2.0 mm, 5 μm, 300 Å, Phenomenex, Torrance, CA, USA) at 50 °C with a flow rate of 200 μL/min on an HPLC system (Agilent 1100, Palo Alto, CA, USA). Mobile phase A was 0.1% TFA in water (v/v), while mobile phase B contained 0.085% TFA in 90% ACN / 10% water. A gradient was applied by holding at 2% B for 2 min, increasing to 22% B in 38 min, then to 42% B in 80 min, then to 100% B in 25 min, followed by holding at 100% B for 5 min. The column was re-equilibrated at 2% B for 30 min prior to the next injection.
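The gradient description can be turned into a cumulative time table. The sketch below is only one reading of the text: it assumes that each "in N min" is the duration of that step rather than an absolute time point, and it ignores the system dwell volume. The names STEPS, build_timetable, and percent_b_at are illustrative.

```python
# Each step is (duration in min, %B reached at the end of the step).
# Assumption: "in N min" in the text means the duration of that ramp.
START_PERCENT_B = 2
STEPS = [(2, 2), (38, 22), (80, 42), (25, 100), (5, 100)]


def build_timetable(steps, start_b):
    """Return a list of (cumulative time in min, %B) break points."""
    elapsed, table = 0, [(0, start_b)]
    for duration, end_b in steps:
        elapsed += duration
        table.append((elapsed, end_b))
    return table


def percent_b_at(table, time_min):
    """Linearly interpolate the programmed %B at a given time."""
    for (t0, b0), (t1, b1) in zip(table, table[1:]):
        if t0 <= time_min <= t1:
            return b0 + (b1 - b0) * (time_min - t0) / (t1 - t0)
    raise ValueError("time is outside the programmed gradient")


TIMETABLE = build_timetable(STEPS, START_PERCENT_B)
for time_min, percent_b in TIMETABLE:
    print(f"{time_min:>4} min  {percent_b:>3}% B")
```

Under this reading the gradient ends at 150 min, and the 30 min re-equilibration brings the cycle to 180 min per injection.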

For ETD analysis, digests of IgG1 (6 μL) were separated on a PROTO C4 column (150 x 1.0 mm, 5 μm, 300 Å, Higgins Analytical, Mountain View, CA, USA) at 50 °C with a flow rate of 60 μL/min on an HPLC system (Agilent 1100, Palo Alto, CA, USA). Mobile phase A was 0.1% FA / 0.02% TFA in water (v/v), while mobile phase B contained 0.1% FA / 0.02% TFA in 90% ACN / 10% water. The same gradient as described above was applied.

4.3.6 Mass Spectrometry

An LTQ-Orbitrap mass spectrometer (Thermo Fisher Scientific, San Jose, CA, USA) was used in-line with an HPLC system for the analyses of the IgG1 proteolytic digests. A full MS scan (with 60,000 resolution at m/z 400 and an automatic gain control (AGC) target value of 2 x 10^5) followed by data-dependent MS/MS scans of the three most abundant precursor ions was set up to acquire both the peptide mass and sequence information. The spray voltage was 5.5 kV, and the capillary temperature was 250 °C. The instrument was tuned using the doubly charged ion of a synthetic peptide, bradykinin. The MS/MS spectra were obtained using CID with a normalized collision energy of 35%. For MS/MS with ion detection in the Orbitrap, the AGC target was set to 3 x 10^6, the resolution to 7,500, and the precursor isolation width to 4 m/z units. Under our experimental conditions, the typical mass accuracy in full MS scans and FT MS/MS is 5 and 10 ppm, respectively.

ETD spectra were acquired on a Thermo-Scientific LXQ-XL mass spectrometer in centroid mode with an isolation width of 5, a reaction time of 75 ms, and a reagent target value of 1 x 10^5, using singly charged fluoranthene anions as the ETD reagent. Both CID and ETD data were analyzed for peptide identification using a custom-written algorithm, MassAnalyzer, and verified manually[32-35].

4.4 Results and Discussion

A novel His-His crosslink in proteins has been discovered via our XChem-Finder workflow, without pre-defined crosslinking chemistry. Peptide mapping with mass spectrometry has established that the crosslink occurred between the two identical His220 residues, one on each of the two heavy chains in the hinge region.

4.4.1 Detection of Crosslinked Protein.

Photo-induced non-reducible high molecular weight species were detected by reducing SDS-PAGE; their intensities increased with longer light exposure (Figure 4-1A). Their formation was pH-dependent: less favorable under acidic conditions, such as the pH ~5 typical of protein formulations (Figure 4-1B), and more favorable in neutral or basic buffers that are commonly used in protein production and purification (Figure 4-1B). The crosslinked species were also quantified by size exclusion chromatography (SEC) (Figure 4-1C and D). A mobile phase of 0.1% TFA/H2O:ACN (80:20) was used to avoid hydrophobic interaction with the stationary phase[30]. The results from SEC and SDS-PAGE were consistent. The total amounts of the early-eluting peaks were 0.2, 4.5, 9.5, and 16.5% by peak area in the control sample and the samples exposed to 1x, 2x, and 3x ICH irradiation, respectively (Figure 4-1C). The crosslinks also increased with pH, to 25.8% in 50 mM NaHCO3 at pH 9.0, 15.7% in 50 mM sodium phosphate at pH 7.4, and 6.3% in 50 mM sodium acetate at pH 4.8 (Figure 4-1D). It is interesting to note that the control sample (without light stress) already contained a small yet detectable amount of crosslinking (0.4%, Figure 4-1D), suggesting that such modifications could occur during routine protein production and processing. The chemical nature and site of crosslinking were discovered by our XChem-Finder workflow, as detailed next[8].

[Figure 4-1, panels C and D: SEC chromatograms (absorbance vs. retention time in minutes) with aggregate, heavy chain, and light chain peaks. Inset tables, total crosslinks (%): (C) control 0.2, 1x light 4.5, 2x light 9.5, 3x light 16.5; (D) control 0.4, pH 4.8 6.3, pH 7.4 15.7, pH 9.0 25.8.]

Figure 4-1. Detection of crosslinking in IgG1 by reduced SDS-PAGE (A & B) and size exclusion chromatography (SEC) (C & D). Samples, with the corresponding lane numbers in parentheses, are molecular ladder (1), IgG1-control (without stress) (2), IgG1-water-1xLight (3), IgG1-water-2xLight (4), and IgG1-water-3xLight (5); molecular ladder (6), IgG1-control (7), IgG1-pH4.8-3xLight (8), IgG1-pH7.4-3xLight (9), and IgG1-pH9.0-3xLight (10), respectively. Formation of the high molecular weight bands increased with longer light exposure and was more favorable under basic conditions than acidic conditions. His-His crosslinking of two heavy chains is likely to contribute to the band at ~100 kDa. The thioether crosslink between heavy chain and light chain probably corresponds to the band at ~92 kDa. Other high molecular weight bands could be due to other unknown crosslinking.

4.4.2 Detection of Crosslinked Peptides.

Tryptic digestion in 18O-water results in the incorporation of two 18O atoms in each of the newly generated C-termini[36, 37]; hence, two 18O atoms for a linear tryptic peptide (with one C-terminus) and four 18O atoms for a crosslinked peptide (with two C-termini)[38-40]. As shown in Figure 4-2, the isotopic distribution of the peptide at m/z 1673.54 (quadruply charged, monoisotopic mass 6687.149 Da) shows a mass shift of 8 Da (i.e., four 18O) in 18O-water compared to that from 16O-water, indicating that it contains two C-termini and is a crosslinked peptide.
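The arithmetic behind the 8 Da shift can be sketched as follows. This is a minimal illustration using standard monoisotopic masses for 16O, 18O, and the proton; it assumes complete incorporation of two 18O atoms per C-terminus, and the function names are not from the original workflow.

```python
PROTON_MASS = 1.007276                   # Da
DELTA_18O_16O = 17.999160 - 15.994915    # Da gained per 16O -> 18O exchange


def mz_from_mass(neutral_mass, charge):
    """m/z of the [M + zH]z+ ion for a neutral monoisotopic mass."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def label_shift_da(n_c_termini, oxygens_per_terminus=2):
    """Mass gained when every C-terminus carries two 18O atoms."""
    return n_c_termini * oxygens_per_terminus * DELTA_18O_16O


crosslinked_mass = 6687.149  # observed monoisotopic mass from the text
charge = 4

print(f"monoisotopic m/z (16O): {mz_from_mass(crosslinked_mass, charge):.4f}")
print(f"shift, linear peptide:      {label_shift_da(1):.3f} Da")
print(f"shift, crosslinked peptide: {label_shift_da(2):.3f} Da")
print(f"m/z shift at z = {charge}: {label_shift_da(2) / charge:.3f}")
```

The computed monoisotopic m/z (about 1672.79) is lower than the quoted m/z 1673.54, which presumably refers to one of the more abundant isotopic peaks of the envelope rather than the monoisotopic peak.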

[Figure 4-2: isotopic envelopes (relative abundance vs. m/z, 1673.0-1677.0) of the crosslinked peptide from digestion in 16O-water (top) and 18O-water (bottom); the envelope is shifted by +8 Da in 18O-water.]

Figure 4-2. Isotopic distributions of the crosslinked peptide S215-K244/S215-K244 m/z 1673.54 (z=4) from tryptic digestion of IgG1. A mass increase of 8 Da was observed when the sample was digested in 18O-water instead of regular 16O-water.

4.4.3 Elucidation of Crosslinking Chemistry.

The crosslinked peptide at m/z 1673.54 underwent FT MS/MS analysis. As described in our previous paper[8], the fragment ions obtained were searched against the amino acid sequence of the IgG1 via FindPept to match all possible peptide fragments (see Table 4-1). Based on the peptide ladders observed, a partial sequence K218-K244 (KTHTCPPCPAPELLGGPSVFLFPPKPK, see Table 4-1) was identified. Then, the partial sequence was extended to a putative full-length tryptic peptide S215-K244 (SCDKTHTCPPCPAPELLGGPSVFLFPPKPK, 3336.587 Da). Since the fragment ions only matched this single peptide, we surmised that crosslinking occurred across the two identical peptides. The combined mass of the two unmodified (native) peptides is 6673.174 Da, which also satisfies the mass limitation conferred by the observed mass of the crosslinked peptide (6687.149 Da, see Table 4-2).

In order to elucidate the crosslinking chemistry, elemental composition analysis of the crosslink was performed as illustrated in Table 4-2. The mass difference between the sum of the two native peptide chains and the observed mass of the crosslinked peptide is 13.975 Da, for which three potential formulas (O-2H, N, or CH2) were proposed. From a chemistry perspective, it is difficult to add just one nitrogen atom or a CH2 group. On the other hand, addition of one oxygen atom coupled with the loss of two hydrogen atoms (O-2H) indicates oxidation. The putative peptide chain K218-K244 contains His, of which oxidation and crosslinking have been reported[28, 29]. In addition, the formula O-2H gives the lowest mass error (0.004 Da). Therefore, a potential His-His crosslinking structure is proposed as illustrated in Table 4-2 and Figure 4-3 and verified as described next.
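The elemental composition step amounts to comparing the observed mass difference with the monoisotopic mass of each candidate formula. The following sketch reproduces that comparison with standard monoisotopic atomic masses; the candidate list is limited to the three formulas named in the text, and the helper names are illustrative rather than part of the XChem-Finder workflow.

```python
# Standard monoisotopic atomic masses (Da).
MONOISOTOPIC = {"H": 1.007825, "C": 12.000000, "N": 14.003074, "O": 15.994915}

# Candidate net compositions for the crosslink (negative = atoms lost).
CANDIDATES = {
    "O-2H": {"O": 1, "H": -2},
    "N": {"N": 1},
    "CH2": {"C": 1, "H": 2},
}


def formula_mass(composition):
    """Monoisotopic mass of a net elemental composition."""
    return sum(MONOISOTOPIC[el] * n for el, n in composition.items())


def rank_candidates(observed_delta):
    """Return (name, absolute mass error in Da), smallest error first."""
    errors = {name: abs(formula_mass(comp) - observed_delta)
              for name, comp in CANDIDATES.items()}
    return sorted(errors.items(), key=lambda item: item[1])


SINGLE_CHAIN = 3336.587        # calculated mass of S215-K244
OBSERVED_CROSSLINK = 6687.149  # observed mass of the crosslinked peptide

delta = OBSERVED_CROSSLINK - 2 * SINGLE_CHAIN
print(f"mass difference: {delta:.3f} Da")
for name, error in rank_candidates(delta):
    print(f"{name:>5}: calculated {formula_mass(CANDIDATES[name]):.3f} Da, "
          f"error {error:.3f} Da")
```

Mass error alone ranks O-2H first; the chemical plausibility argument in the text is what excludes the other two candidates.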

Table 4-1. Partial sequences that match the masses of the fragment ions for the precursor ion m/z 1673.54 (z=4) (molecular mass 6687.149 Da) eluted at 112.48 min. The longest peptide fragments for the observed b- and y-ions are underlined. All Cys are alkylated with IAA. The adjoining residues before cleavage are in parentheses. Lys443 is the C-terminus of the heavy chain. The crosslinked peptides were proposed to be HC:S215-K244/HC:S215-K244.

# | m/z | Charge | User mass (Da) | Theor. mass (Da) | Δmass (ppm) | Peptide | Corresponding tryptic peptide, sequence | Corresponding tryptic peptide, mass (Da) | Notes
1 | 566.364 | 1 | 565.356 | 565.359 | 4.8 | (F)PPKPK/(D) | (K)/215SCDKTHTCPPCPAPELLGGPSVFLFPPKPK244/(D) | 3336.587 | chain 1
2 | 1256.736 | 1 | 1255.729 | 1255.733 | 3.0 | (G)PSVFLFPPKPK/(D) | | |
3 | 1370.777 | 1 | 1369.770 | 1369.776 | 4.2 | (L)GGPSVFLFPPKPK/(D) | | |
4 | 1483.863 | 1 | 1482.856 | 1482.86 | 2.5 | (L)LGGPSVFLFPPKPK/(D) | | |
5 | 1823.042 | 1 | 1822.034 | 1822.039 | 2.8 | (A)PELLGGPSVFLFPPKPK/(D) | | |
6 | 1991.144 | 1 | 1990.136 | 1990.129 | -3.4 | (C)PAPELLGGPSVFLFPPKPK/(D) | | |
7 | 1173.626 | 2 | 2345.238 | 2345.249 | 4.8 | (C)PPCPAPELLGGPSVFLFPPKPK/(D) | | |
8 | 1304.665 | 2 | 2607.315 | 2607.312 | -1.2 | (H)TCPPCPAPELLGGPSVFLFPPKPK/(D) | | |
9 | 1487.750 | 2 | 2973.485 | 2973.513 | 9.4 | (D)KTHTCPPCPAPELLGGPSVFLFPPKPK/(D) | | |
10 | 1478.766 | 2 | 2973.528 | 2973.513 | -4.9 | (D)KTHTCPPCPAPELLGGPSVFLFPPKPK/(D) | | |
11 | 992.341 | 1 | 1009.344 | 1009.348 | 4.0 | (K)/SCDKTHTC(P) | (K)/215SCDKTHTCPPCPAPELLGGPSVFLFPPKPK244/(D) | | Same as above
12 | 1515.547 | 1 | 1532.55 | 1532.558 | 5.3 | (K)/SCDKTHTCPPCPA(P) | | | Same as above
13 | 1605.255 | 2 | 3244.517 | 3244.493 | -7.3 | (K)/SRWQQGNVFSCSVMHEALHNHYTQKSL(S) | (K)/411SRWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK443 | 3813.810 | too large as chain 2

Table 4-2. Deduction of elemental formula for the crosslinked S215-K244/S215-K244 peptide.

Name | Mass (Da)
Calculated mass of S215-K244 (single chain) | 3336.587
Sum of the mass of two unmodified chains | 6673.174
Observed mass of the crosslinked peptide | 6687.149
Mass difference | +13.975

Proposed formula | +O-2H | +N | +CH2
Calculated mass (Da) | 13.979 | 14.003 | 14.016
Mass error (Da) | 0.004 | 0.028 | 0.041
Comments | Most likely | Unlikely | Unlikely
Proposed structure | [structure drawing of the His-His crosslink] | - | -

4.4.4 Structural Confirmation by Mass Spectrometry

Full Scan and MS/MS Analysis. First, the calculated mass of the His-His crosslinked peptide (6687.153 Da) is in good agreement with the observed mass (6687.149 Da, mass error 0.6 ppm, see Table 4-3). Second, the series of b- and y-ions are highly consistent with the proposed structure (Figure 4-3). The observed y-ions from y5 to y24 and the b5 ions correspond to fragment ions with no crosslinking site, while the y-ions from y27 and the b-ions from b13 to b24 are from fragments that contain the crosslinked histidine residues. These data support crosslinking at His220. Moreover, the y27* ion (in blue) and the b8* ion (in blue) are peptide fragments resulting from cleavage of the bond connecting the two crosslinked histidine residues (see Figure 4-3). The missed cleavage by trypsin at Lys218 is likely due to its close proximity to the crosslinking site at His220, reminiscent of similarly missed cleavages in the case of thioether crosslinking[9]. The second missed cleavage at Lys242 is likely due to the presence of adjacent proline residues. The two missed tryptic cleavages in the crosslinked peptide would have been especially challenging to handle by traditional database-dependent algorithms, again highlighting the utility of isotope labeling and our XChem-Finder workflow[8, 41, 42].
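For reference, the mass error quoted above follows from the usual parts-per-million definition. The short sketch below assumes the absolute error is reported relative to the theoretical mass; the function name is illustrative. Because it starts from masses already rounded to three decimals, recomputed values for other peptides may differ from tabulated ones in the last digit.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Absolute mass error in parts per million of the theoretical mass."""
    return abs(observed_mass - theoretical_mass) / theoretical_mass * 1e6


# Tryptic crosslinked peptide S215-K244/S215-K244 (values from the text).
error = ppm_error(observed_mass=6687.149, theoretical_mass=6687.153)
print(f"mass error: {error:.1f} ppm")
```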

Table 4-3. Crosslinked peptides obtained from digestion of IgG1 by various proteases and combinations thereof. The crosslinking sites are labeled in red and bold-face. All cysteines are alkylated with IAA. Peptides are shown with the amino acid residue position in IgG1 in superscript and the adjoining amino acid residues before cleavage in parentheses.

Name | Proteases | Crosslinked peptides | RT (min) | m/z (charge) | Obs. mass (Da) | Theor. mass (Da) | Mass error (ppm)
S215-K244/S215-K244 | Trypsin | (K)215SCDKTHTCPPCPAPELLGGPSVFLFPPKPK244(D) | 112.48 | 1673.54 (4+) | 6687.149 | 6687.153 | 0.6
 | | (K)215SCDKTHTCPPCPAPELLGGPSVFLFPPKPK244(D) | | | | |
D217-K244/D217-K244 | Asp-N | (C)217DKTHTCPPCPAPELLGGPSVFLFPPKPK244(D) | 113.07 | 1549.53 (4+) | 6191.064 | 6191.060 | 0.6
 | | (C)217DKTHTCPPCPAPELLGGPSVFLFPPKPK244(D) | | | | |
S215-E229/S215-E229 | Trypsin + GluC | (K)215SCDKTHTCPPCPAPE229(L) | 41.19 | 1178.77 (3+) | 3531.289 | 3531.286 | 0.6
 | | (K)215SCDKTHTCPPCPAPE229(L) | | | | |
D217-E229/D217-E229 | Asp-N + GluC | (C)217DKTHTCPPCPAPE229(L) | 40.53 | 1013.41 (3+) | 3035.193 | 3035.193 | 0.0
 | | (C)217DKTHTCPPCPAPE229(L) | | | | |
D217-E229/S215-E229 | Trypsin + GluC; then Asp-N | (C)217DKTHTCPPCPAPE229(L) | 40.31 | 1096.09 (3+) | 3283.234 | 3283.240 | 1.9
 | | (K)215SCDKTHTCPPCPAPE229(L) | | | | |

[Figure 4-3: CID MS/MS spectra (relative abundance vs. m/z, 600-2000) from the 16O-water digest (top) and the 18O-water digest (bottom), with labeled b- and y-ions, including y5, y27(4+), y27*(2+), b5, and b8*.]

Figure 4-3. CID MS/MS spectra of the quadruply charged precursor ions m/z 1673.54 (16O-labeled C-termini) and 1675.54 (18O-labeled C-termini) of the crosslinked tryptic peptide S215-K244/S215-K244. The characteristic mass shift imparted by the heavier isotope 18O was observed (e.g., the mass shift of 4 Da for y5 ions in 16O- vs 18O-water, 566.53 vs 570.53). The y27* ion results from cleavage of the His-His bond while the y27 ion contains the crosslinking site. The MS3 spectrum of the y27* ion (m/z 1488.35) is shown in Figure 4-4.

Additional confirmation by 18O/16O-isotope fragment ion pattern. Since the fragment ions containing no (zero), one, or two C-termini of the crosslinked peptides displayed a mass shift of 0, 4, and 8 Da, respectively, in the corresponding MS/MS spectra obtained from 18O- and 16O-water, the examination of the mass shifts of fragment ions can lend further support for the assignment of fragment ions. For example, as shown in Figure 4-3, b-ions prior to the crosslinking site (e.g., b5) have no mass shift between the 18O-water and 16O-water digests. On the other hand, the y-ions without the crosslinking site (e.g., y5) gave a mass shift of 4 Da. All assignments were verified by their distinct mass shifts in 18O, depending on the number of C-termini they contain.
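The 0/4/8 Da rule can be applied mechanically to paired peaks from the two spectra. The sketch below is illustrative only: the peak pairs are read from the peak labels of Figure 4-3 (low-resolution ion-trap values, so the shifts are approximate), and the pairing of each label with its m/z value is an assumption made here for demonstration.

```python
DELTA_18O_16O = 17.999160 - 15.994915        # Da per 16O -> 18O exchange
SHIFT_PER_C_TERMINUS = 2 * DELTA_18O_16O     # two 18O atoms per C-terminus


def count_labeled_c_termini(mz_16o, mz_18o, charge):
    """Estimate how many C-termini a fragment contains from its 18O shift."""
    shift_da = (mz_18o - mz_16o) * charge
    return round(shift_da / SHIFT_PER_C_TERMINUS)


# (label, charge, m/z in 16O-water, m/z in 18O-water); illustrative pairs.
FRAGMENTS = [
    ("b5", 1, 593.38, 593.40),
    ("y5", 1, 566.53, 570.53),
    ("y22", 2, 1174.15, 1176.07),
    ("y27", 4, 1582.92, 1584.81),
]

for label, charge, mz_16o, mz_18o in FRAGMENTS:
    n = count_labeled_c_termini(mz_16o, mz_18o, charge)
    print(f"{label}({charge}+): {n} C-terminus/termini")
```

With these inputs the b5 ion carries no label, y5 and y22 carry one labeled C-terminus, and y27, which retains the crosslink to the second chain, carries two.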

Additional confirmation by MS3 analysis. Several abundant fragment ions shown in Figure 4-3 were selected for MS3 analysis, which simplified and further confirmed data interpretation. For example, the fragment ion at m/z 1488.35 shown in Figure 4-3 could not be assigned initially, so it was selected for MS3 analysis (Figure 4-4). The analysis established that it was the y27* ion (in blue) generated from cleavage of the bond connecting the two crosslinked histidine residues.

[Figure 4-4: MS3 spectrum (relative abundance vs. m/z, 600-2000) with labeled b- and y-ions.]

Figure 4-4. MS3 spectrum of the doubly charged fragment ion m/z 1488.35 obtained from MS/MS of the precursor ion m/z 1673.54 in Figure 4-3.

Peptide mapping with multiple proteases. Since this is the first report of His-His crosslinking in a protein, peptide mapping with additional proteases was carried out to glean complementary data[43, 44]. In addition to trypsin, proteases with different sequence specificity (e.g., Asp-N[45-47] or GluC[48]) and combined proteases (e.g., trypsin with GluC, and Asp-N with GluC) were employed. Additional crosslinked peptides containing His220 were detected and analyzed: D217-K244/D217-K244 from Asp-N, S215-E229/S215-E229 from trypsin with GluC, and D217-E229/D217-E229 from Asp-N with GluC.

In each case, the observed mass was in good agreement with its theoretical mass, with mass errors ranging from 0.0 to 0.6 ppm (see Table 4-3). The y- and b-ions were also consistent with the corresponding structures (Figures 4-5, 4-6, and 4-7). Similar to the tryptic peptide, the crosslinking site and chemistry were further supported by the presence of several ions generated from cleavage of the bond connecting the two crosslinked histidine residues, such as the doubly charged ion at m/z 880.67, the singly charged ion at m/z 1773.77, and the singly charged ion at m/z 992.52 (b8*) shown in Figure 4-5 (all highlighted in blue).

[Figure 4-5: CID MS/MS spectrum (relative abundance vs. m/z, 400-2000) with labeled b- and y-ions, including b8* (992.52), ion 1759(2+) (880.67), and ion 1773 (1773.77).]

Figure 4-5. CID MS/MS spectrum of the triply charged precursor ion m/z 1178.77 of the crosslinked S215-E229/S215-E229 peptide generated from combined trypsin and GluC digestion. The b8* ion results from cleavage of the His-His bond while the b8 ion contains the crosslinking site.

[Figure 4-6: CID MS/MS spectrum (relative abundance vs. m/z, 600-2000) with labeled b- and y-ions, including y27*(2+) and b6*.]

Figure 4-6. CID MS/MS spectrum of the quadruply charged precursor ion m/z 1549.53 of the crosslinked peptide D217-K244/D217-K244 from Asp-N digestion.

[Figure 4-7: fragmentation map of the two crosslinked DKTHTCPPCPAPE chains (b6, b9, b11, y4, y7, and y12, with cleavage positions marked 1511 and 1525 at the crosslink) and the CID MS/MS spectrum (relative abundance vs. m/z) with labeled ions.]

Figure 4-7. CID MS/MS spectrum of the triply charged precursor ion at m/z 1013.41 of the crosslinked peptide D217-E229/D217-E229 from digestion with Asp-N and GluC.

ETD MS/MS analysis. As an alternative fragmentation technique, ETD provides sequence information complementary to that obtained from CID by cleaving the peptide backbone in a less selective manner than CID[34, 49, 50]. Higher charge state ions usually generate more effective ETD fragmentation[50]; therefore, formic acid instead of TFA was used in the mobile phase to increase the charge state for more effective ETD fragmentation and to minimize ion suppression. All ETD MS/MS spectra were collected with supplemental activation and were dominated by charge-reduced species. The charge states of 5, 6, 4, and 4 for the peptides S215-K244/S215-K244, D217-K244/D217-K244, S215-E229/S215-E229, and D217-E229/D217-E229, respectively, offered optimal ETD fragmentation for each crosslinked peptide (Figures 4-8, 4-9, 4-10, and 4-11). While different from those from CID, the fragmentation patterns from ETD also support our proposed crosslinking site and chemistry. For instance, the c5 and c*8 ions in Figure 4-8 narrow the site to within the HTC motif; the c4 ion in Figure 4-9, the c5 and c6 ions in Figure 4-10, and the c3 and z8 ions in Figure 4-11 pinpoint the crosslink at His220.

[Figure 4-8: ETD MS/MS spectrum (relative abundance vs. m/z, 400-1600) with labeled c-, y-, and z-ions, including c4, c5, and c8*.]

Figure 4-8. ETD MS/MS spectrum of the precursor ion m/z 1339.70 (z=5) of the crosslinked tryptic peptide S215-K244/S215-K244.

[Figure 4-9: ETD MS/MS spectrum (relative abundance vs. m/z, 400-1800) with labeled b-, c-, y-, and z-ions and charge-reduced precursor species.]

Figure 4-9. ETD MS/MS spectrum of the precursor ion m/z 1033.35 (z=6) of the crosslinked peptide D217-K244/D217-K244 from Asp-N digestion.

[Figure 4-10: fragmentation map of the two crosslinked SCDKTHTCPPCPAPE chains (c- and z-ions) and the ETD MS/MS spectrum (relative abundance vs. m/z): (A) full range, m/z 200-1800; (B) expanded view, m/z 400-1100; (C) expanded view, m/z 1450-1650.]

Figure 4-10. ETD MS/MS spectrum of the precursor ion m/z 884.33 (z=4) of the crosslinked peptide S215-E229/S215-E229 from digestion with trypsin and GluC.

[Figure 4-11: ETD MS/MS spectrum (relative abundance vs. m/z): (A) full range, m/z 200-2000; (B) expanded view, m/z 300-900; (C) expanded view, m/z 1350-1550.]

Figure 4-11. ETD MS/MS spectrum of the precursor ion m/z 760.31 (z=4) of the crosslinked peptide D217-E229/D217-E229 from digestion with Asp-N and GluC.

4.4.5 Mechanism of formation for His-His crosslink.

Photo-oxidation and crosslinking between histidine residues have been studied using both free histidine and model peptides. The commonly accepted mechanism is depicted in Scheme 4-1. Singlet oxygen (e.g., generated from the photoactivated dye rose bengal[51]) reacts with histidine to form a highly reactive and labile endoperoxide intermediate, which converts into a hydroperoxide intermediate and then into the 2-oxo-histidine (2-oxo-His) and His+32 intermediates. Subsequently, the His+32 intermediate can be attacked by the nucleophilic imidazole of another histidine residue, followed by the elimination of a water molecule to give the final crosslinked product (Scheme 4-1)[16, 27-29, 52, 53]. As discussed below, our results are consistent with this mechanism.

[Scheme 4-1: His reacts with singlet oxygen (1O2, from O2 and hv) to give an endoperoxide, then a hydroperoxide, then the 2-oxo-His and His+32 intermediates; nucleophilic addition of a second His followed by loss of H2O gives the His-His crosslink.]

Scheme 4-1. Proposed mechanism for the formation of His-His crosslink via photo-oxidation intermediates.

First, oxygen was present in all buffers and water in which IgG1 was exposed to light irradiation. Second, several photo-oxidation intermediates were observed. The endoperoxide intermediate is unstable and has only been observed by low-temperature NMR[16, 27], so we are not surprised that it was not detected by our LC-MS analysis. However, the subsequent oxidation intermediates, 2-oxo-His (+14 Da) and His+32 (+32 Da), were detected. The peptides with masses 14 and 32 Da greater than the unmodified peptide, S215-K244 (SCDKTHTCPPCPAPELLGGPSVFLFPPKPK), were observed in the light-stressed samples but not in the control sample (Table 4-4). Tandem mass spectra confirmed their structures to be the peptides modified at His220 (Figures 4-12 and 4-13). Third, the reported model studies showed that crosslinking was favored at higher pH, as the neutral (deprotonated) imidazole in histidine (pKa ~6) is more reactive in nucleophilic attack and thus results in a higher yield of crosslinking[54]. Similar pH dependence was observed in our case, as discussed above (Figure 4-1). Lastly, the two His220 residues are juxtaposed in the hinge region, which is highly exposed to solvent and flexible, as illustrated in Figure 4-14. In fact, in most crystal structures, the side chains of residues in the hinge region could not be located, indicating a high degree of flexibility. In this illustrative structure (PDB 1HZH), the side chain of only one histidine residue was observed.
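The pH argument can be made semi-quantitative with the Henderson-Hasselbalch relation. The sketch below estimates the fraction of neutral imidazole at the three buffer pH values, assuming the generic pKa of about 6 quoted above; the actual pKa of His220 in the hinge may differ, so the numbers indicate a trend only and are not expected to scale directly with the measured crosslink levels.

```python
def neutral_fraction(ph, pka=6.0):
    """Fraction of deprotonated (neutral) imidazole at a given pH."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


# Buffer pH values used for the stressed samples.
for ph in (4.8, 7.4, 9.0):
    print(f"pH {ph}: {neutral_fraction(ph):.1%} neutral imidazole")
```

Under this assumption roughly 6% of the imidazole is neutral at pH 4.8, compared with about 96% at pH 7.4 and nearly all of it at pH 9.0.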

Table 4-4. Peptides containing the 2-oxo-His (+14 Da) and His+32 (+32 Da) intermediates observed in the stressed IgG1. They were not detected in the control. His220 residues are labeled in red. All cysteines are alkylated with IAA. Peptides are shown with the amino acid residue position in IgG1 in superscript and the adjoining amino acid residues before cleavage in parentheses. The level of each peptide is determined by the peak area of the modified peptide over that of the normal tryptic peptide (T219-K244), with all charge states considered.

# | Peptide name | Peptide sequence | RT (min) | m/z (charge) | Obs. mass (Da) | Theor. mass (Da) | Mass error (ppm) | Level by peak area (%)
1 | T219-K244 | (K)219THTCPPCPAPELLGGPSVFLFPPKPK244(D) | 93.98 | 949.82 (3+) | 2845.422 | 2845.418 | 1.3 | 100
2 | S215-K244 | (K)215SCDKTHTCPPCPAPELLGGPSVFLFPPKPK244(D) | 92.17 | 1113.87 (3+) | 3336.591 | 3336.587 | 1.1 | 83.1
3 | 2-oxo-His | (K)215SCDKTH(+14)TCPPCPAPELLGGPSVFLFPPKPK244(D) | 93.48 | 1118.20 (3+) | 3350.570 | 3350.566 | 1.1 | 2.3
4 | His+32 | (K)215SCDKTH(+32)TCPPCPAPELLGGPSVFLFPPKPK244(D) | 93.00 | 1124.20 (3+) | 3368.578 | 3368.577 | 0.2 | 0.5
5 | S215-K244/S215-K244 | (K)215SCDKTHTCPPCPAPELLGGPSVFLFPPKPK244(D) | 112.48 | 1673.54 (4+) | 6687.149 | 6687.153 | 0.6 | 4.4
 | | (K)215SCDKTHTCPPCPAPELLGGPSVFLFPPKPK244(D) | | | | | |
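The theoretical masses of the modified peptides can be reproduced from the unmodified S215-K244 mass and the net elemental change of each modification. In the sketch below, assigning +14 Da to O-2H follows the elemental analysis in Table 4-2, while assigning +32 Da to the addition of two oxygen atoms is an assumption inferred here from the tabulated theoretical mass; the dictionary and function names are illustrative.

```python
# Standard monoisotopic atomic masses (Da).
MASS_O = 15.994915
MASS_H = 1.007825

UNMODIFIED_S215_K244 = 3336.587  # Da, calculated single-chain mass

# Net mass change of each species relative to the unmodified chain(s).
NET_CHANGE = {
    "2-oxo-His (+14)": MASS_O - 2 * MASS_H,   # O-2H
    "His+32": 2 * MASS_O,                     # assumed to be +O2
}


def modified_mass(base_mass, net_change):
    """Theoretical mass after applying a net elemental change."""
    return base_mass + net_change


for name, change in NET_CHANGE.items():
    mass = modified_mass(UNMODIFIED_S215_K244, change)
    print(f"{name}: {mass:.3f} Da")

# Crosslinked peptide: two chains plus one O-2H.
crosslink = modified_mass(2 * UNMODIFIED_S215_K244, MASS_O - 2 * MASS_H)
print(f"His-His crosslink: {crosslink:.3f} Da")
```

The results agree with the theoretical masses listed in Table 4-4 to three decimal places.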

[Figure 4-12: CID MS/MS spectrum (relative abundance vs. m/z, 400-2000) with labeled b- and y-ions.]

Figure 4-12. CID MS/MS spectrum of the tryptic peptide containing the 2-oxo-His (+14 Da) intermediate (m/z=1118.20, z=3, mass=3350.570 Da).

[Figure 4-13: CID MS/MS spectrum (relative abundance vs. m/z, 400-2000) with labeled b- and y-ions.]

Figure 4-13. CID MS/MS spectrum of the tryptic peptide containing the His+32 intermediate (m/z=1124.20, z=3, mass=3368.578 Da).

[Figure 4-14: panel A (His237 labeled) and panel B.]

Figure 4-14. (A) Space filling illustration of the hinge region of IgG1 antibody (DKTHTCPPCP); the underlined and bold residues are shown in color; the atoms are shown in color: oxygen in red, nitrogen in dark blue, carbon in light blue and sulfur in yellow. The image is rendered based on PDB 1HZH using the VMD software (Visual Molecular Dynamics). In this structure, His237 is equivalent to the His220 described in the paper. (B) Three-dimensional (3D) structure of an IgG1 (PDB entry 1HZH[55]). The characteristic hinge region sequences (SCDKTHTCPPC) of two heavy chains of IgG1 are circled. The first cysteine is disulfide bonded with the C-terminal cysteine in light chain. The other two cysteines form inter-heavy chain disulfide bridges. Two heavy chain histidines (His237) are located in the hinge region which is very flexible and highly solvent accessible. The model of 1HZH is being used for illustrative purposes only; in this structure, His237 is equivalent to the His220 discussed in our paper.


Based on the reaction pathway and protein structure, crosslinking of lysine with the oxidized histidine via nucleophilic addition is also plausible[16, 27], and Lys218 is in the vicinity of His220. Therefore, great effort was made to determine whether the crosslink is His220-His220 or Lys218-His220. This is particularly challenging because of the pseudo-symmetry of the crosslinked peptide: when the two chains share an identical sequence (e.g., in Figures 4-3, 4-5, and 4-8), any fragment ion could come from either chain or from both. For example, the c4 and c5 ions in Figure 4-8 indicated the existence of unmodified Lys218 and Thr219, but could not unambiguously establish whether they came from one chain or both. To address this issue, an asymmetric crosslinked peptide (i.e., two chains of different lengths) was generated via limited digestion. IgG1 fully digested by trypsin and GluC was treated with Asp-N for a limited time to obtain a crosslinked peptide with two different chains, D217-E229/S215-E229 (Figures 4-15 and 4-16). Its precursor ion at m/z 1096.09 (z=3) has an observed mass of 3283.234 Da, which is in agreement with the theoretical mass of 3283.240 Da (Table 4-3). As shown in Figures 4-15 and 4-16, cleavage of the His-His bond resulted in ions at m/z 1511 and m/z 1773, indicating that the oxidized His residue is on the long chain (highlighted in red). Moreover, the c2 and c3 ions from the short chain (highlighted in blue), together with the c2 to c5 ions from the long chain (highlighted in red), indicate the absence of modification for all residues N-terminal to His220 on both chains, thus ruling out crosslinking between Lys218 and His220. This is not unexpected: at the pH of our studies, the amine on the lysine side chain is mostly protonated and thus unreactive[56, 57]. In addition, other factors such as local environment and solvation are known to modulate reactivities in enzymes and antibodies[58-60]. Taken together, our data have firmly established that the crosslinking is between the two heavy chain His220 residues.
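The protonation argument can be made semi-quantitative with the Henderson-Hasselbalch relation. The sketch below assumes a generic side-chain pKa of 10.5 for lysine (a textbook value, not one measured for Lys218) and evaluates the three pH conditions of the light-stressed samples (4.8, 7 and 9); the function name is illustrative, and the local environment may shift the actual pKa considerably.

```python
# Fraction of a lysine side-chain amine that is protonated (and therefore a
# poor nucleophile) at a given pH, from the Henderson-Hasselbalch relation.
# ASSUMPTION: pKa = 10.5 is a generic textbook value, not measured for Lys218.

def protonated_fraction(ph: float, pka: float = 10.5) -> float:
    """Return [NH3+] / ([NH3+] + [NH2]) for a monoprotic amine."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    for ph in (4.8, 7.0, 9.0):
        frac = protonated_fraction(ph)
        print(f"pH {ph}: {frac:.4%} protonated, {1 - frac:.4%} free amine")
```

Under this assumption, more than 96% of the lysine amine is protonated even at pH 9, which is consistent with the absence of a detectable Lys218-His220 crosslink.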


Figure 4-15. ETD MS/MS spectrum of the quadruply charged precursor ion m/z 821.09 of the crosslinked peptide D217-E229/S215-E229 generated by limited Asp-N digestion of IgG1 fully digested by trypsin and GluC. (A) m/z 300-2000; (B) m/z 300-800; (C) m/z 1350-1650.

Figure 4-16. CID MS/MS spectrum of the triply charged precursor ion at m/z 1096.09 of the crosslinked peptide D217-E229/S215-E229 from limited digestion by Asp-N of IgG1 fully digested by GluC and trypsin.

4.4.6 Other Crosslinks.

As reported in the literature[9-11], crosslinking via a thioether between the heavy chain hinge region and the light chain C-terminus (HC:S215-K218/LC:T211-S218, SCDK/TVAPTECS) was also observed in the photo-irradiated IgG1 (Table 4-5). The cleavage and formation of carbon-sulfur (C-S) bonds may occur via either homolytic (e.g., radical or photo-induced) or heterolytic (e.g., elimination and addition) mechanisms[10, 11, 61, 62]. These additional crosslinks may also account for the multiple non-reducible higher molecular weight species detected by SDS-PAGE and SEC described above (Figure 4-1). In the SDS-PAGE gel (Figure 4-1), the first band (with an apparent molecular weight of about 92 kDa) for the sample pH9-3xLight (lane 10) was not observed in the samples pH4.8-3xLight (lane 8) and pH7-3xLight (lane 9). It is likely that this band corresponds to the LC-HC thioether crosslink, as its formation is favorable under basic conditions[9, 10]. Although a thioether crosslink between two heavy chains (e.g., Cys222-Cys222) has been reported after a higher dose of photo-irradiation[11], it was not detected in our sample by MS. The His-His crosslink of two heavy chains may contribute to the band with an apparent molecular weight of about 100 kDa. The bands with apparent molecular weights of about 150 and 200 kDa are probably due to the crosslinking of more than two chains.

Table 4-5. Thioether crosslinks detected in IgG1. The second thioether crosslink has two missed trypsin cleavages, at Lys218 and Lys242, probably due to the nearby thioether crosslink site and the proline residues, respectively.

Crosslink       Sequence                               m/z (charge)   RT (min)   Obs. mass (Da)   Theo. mass (Da)   Mass error (ppm)   Level (%)
LC:T211-S218/   211TVAPTECS218/                        612.77 (2+)    22.11      1223.535         1223.534          0.7                0.1
HC:S215-K218    215SCDK218
LC:T211-S218/   211TVAPTECS218/                        1351.99 (3+)   90.97      4050.947         4050.942          1.2                10.1
HC:S215-K244    215SCDKTHTCPPCPAPELLGGPSVFLFPPKPK244
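The masses listed for the SCDK/TVAPTECS crosslink (1223.535 and 1223.534 Da) can be checked from first principles. The sketch below assumes standard monoisotopic residue masses and models the thioether as the two free peptides minus two hydrogens and one sulfur (that is, a disulfide that has lost one sulfur atom); the function names are illustrative, and 1223.535 Da is taken as the observed value.

```python
# Monoisotopic masses (Da) of the residues needed for TVAPTECS and SCDK.
RESIDUE = {
    "A": 71.037114, "C": 103.009185, "D": 115.026943, "E": 129.042593,
    "K": 128.094963, "P": 97.052764, "S": 87.032028, "T": 101.047679,
    "V": 99.068414,
}
WATER, HYDROGEN, SULFUR = 18.010565, 1.007825, 31.972071


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of a linear, unmodified peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def thioether_mass(seq_a: str, seq_b: str) -> float:
    """Two peptides joined Cys-S-Cys: sum of both minus 2 H and 1 S (assumed model)."""
    return peptide_mass(seq_a) + peptide_mass(seq_b) - 2 * HYDROGEN - SULFUR


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


theo = thioether_mass("TVAPTECS", "SCDK")
print(f"theoretical mass: {theo:.3f} Da")
print(f"error for 1223.535 Da: {ppm_error(1223.535, theo):.1f} ppm")
```

The calculated mass rounds to 1223.534 Da. The error obtained from the rounded observed mass is slightly below 1 ppm, close to the tabulated 0.7 ppm, which was presumably calculated from unrounded values.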


4.5 Conclusions

Our XChem-Finder workflow has again led to the discovery of an undefined and novel protein His-His crosslink, demonstrating its broad applicability and utility. Since the His-His crosslink is found in the highly conserved hinge region of IgG1, this modification most likely exists in other IgG1 molecules. As discussed above, a low level of crosslinking was present even without light stress, suggesting that protein crosslinking in therapeutic proteins is perhaps more common than has been appreciated. Such drastic modification of proteins is likely to affect product quality, clinical efficacy and, even at low abundance, immunogenicity. Again, to the best of our knowledge, there is no alternative systematic approach that can be generally used to fully characterize protein crosslinking without a priori knowledge of the chemistry and site. With the rapid advancement in mass spectrometric techniques (e.g., high resolution and complementary fragmentation mechanisms), we expect that the discovery and elucidation of other new protein crosslinks by our XChem-Finder approach will be equally successful.

4.6 References

[1] Wilhelmus MM, Grunberg SC, Bol JG, van Dam AM, Hoozemans JJ, Rozemuller AJ, et al. Transglutaminases and transglutaminase-catalyzed cross-links colocalize with the pathological lesions in Alzheimer's disease brain. Brain Pathol 2009;19:612-22.

[2] Wang SS, Wu JW, Yamamoto S, Liu HS. Diseases of protein aggregation and the hunt for potential pharmacological agents. Biotechnol J 2008;3:165-92.

[3] Nemes Z, Devreese B, Steinert PM, Van Beeumen J, Fesus L. Cross-linking of ubiquitin, HSP27, parkin, and alpha-synuclein by gamma-glutamyl-epsilon-lysine bonds in Alzheimer's neurofibrillary tangles. FASEB J 2004;18:1135-7.


[4] Balasubramanian D, Du X, Zigler JS, Jr. The reaction of singlet oxygen with proteins, with special reference to crystallins. Photochem Photobiol 1990;52:761-8.

[5] Liu H, Gaza-Bulseco G, Faldu D, Chumsae C, Sun J. Heterogeneity of monoclonal antibodies. J Pharm Sci 2008;97:2426-47.

[6] Beck A, Wagner-Rousset E, Ayoub D, Van Dorsselaer A, Sanglier-Cianferani S. Characterization of therapeutic antibodies and related products. Anal Chem 2013;85:715-36.

[7] Lispi M, Datola A, Bierau H, Ceccarelli D, Crisci C, Minari K, et al. Heterogeneity of commercial recombinant human growth hormone (r-hGH) preparations containing a thioether variant. J Pharm Sci 2009;98:4511-24.

[8] Liu M, Zhang Z, Zang T, Spahr C, Cheetham J, Ren D, et al. Discovery of undefined protein cross-linking chemistry: a comprehensive methodology utilizing 18O-labeling and mass spectrometry. Anal Chem 2013;85:5900-8.

[9] Tous GI, Wei Z, Feng J, Bilbulian S, Bowen S, Smith J, et al. Characterization of a novel modification to monoclonal antibodies: thioether cross-link of heavy and light chains. Anal Chem 2005;77:2675-82.

[10] Cohen SL, Price C, Vlasak J. Beta-elimination and peptide bond hydrolysis: two distinct mechanisms of human IgG1 hinge fragmentation upon storage. J Am Chem Soc 2007;129:6976-7.

[11] Mozziconacci O, Kerwin BA, Schoneich C. Exposure of a monoclonal antibody, IgG1, to UV-light leads to protein dithiohemiacetal and thioether cross-links: a role for thiyl radicals? Chem Res Toxicol 2010;23:1310-2.

[12] Wang Z, Rejtar T, Zhou ZS, Karger BL. Desulfurization of cysteine-containing peptides resulting from sample preparation for protein characterization by mass spectrometry. Rapid Commun Mass Spectrom 2010;24:267-75.

[13] Fujimori E. Crosslinking and photoreaction of ozone-oxidized calf-lens alpha-crystallin. Invest Ophthalmol Vis Sci 1982;22:402-5.

[14] Srivastava OP, Kirk MC, Srivastava K. Characterization of covalent multimers of crystallins in aging human lenses. J Biol Chem 2004;279:10901-9.

[15] Lopez B, Gonzalez A, Hermida N, Valencia F, de Teresa E, Diez J. Role of lysyl oxidase in myocardial fibrosis: from basic science to clinical aspects. Am J Physiol Heart Circ Physiol 2010;299:H1-9.


[16] Shen HR, Spikes JD, Kopeckova P, Kopecek J. Photodynamic crosslinking of proteins. II. Photocrosslinking of a model protein-ribonuclease A. J Photochem Photobiol B 1996;35:213-9.

[17] Kroon DJ, Baldwin-Ferro A, Lalan P. Identification of sites of degradation in a therapeutic monoclonal antibody by peptide mapping. Pharm Res 1992;9:1386-93.

[18] Van Buren N, Rehder D, Gadgil H, Matsumura M, Jacob J. Elucidation of two major aggregation pathways in an IgG2 antibody. J Pharm Sci 2009;98:3013-30.

[19] Sinz A. Investigation of protein-protein interactions in living cells by chemical crosslinking and mass spectrometry. Anal Bioanal Chem 2010;397:3433-40.

[20] Petrotchenko EV, Borchers CH. Crosslinking combined with mass spectrometry for structural proteomics. Mass Spectrom Rev 2010;29:862-76.

[21] Walzthoeni T, Leitner A, Stengel F, Aebersold R. Mass spectrometry supported determination of protein complex structure. Curr Opin Struct Biol 2013;23:252-60.

[22] Singh P, Panchaud A, Goodlett DR. Chemical cross-linking and mass spectrometry as a low-resolution protein structure determination technique. Anal Chem 2010;82:2636-42.

[23] Tang X, Bruce JE. Chemical cross-linking for protein-protein interaction studies. Methods Mol Biol 2009;492:283-93.

[24] Bruce JE. In vivo protein complex topologies: sights through a cross-linking lens. Proteomics 2012;12:1565-75.

[25] Singh P, Shaffer SA, Scherl A, Holman C, Pfuetzner RA, Larson Freeman TJ, et al. Characterization of protein cross-links via mass spectrometry and an open-modification search strategy. Anal Chem 2008;80:8799-806.

[26] Wang X, Das TK, Singh SK, Kumar S. Potential aggregation prone regions in biotherapeutics: A survey of commercial monoclonal antibodies. MAbs 2009;1:254-67.

[27] Shen HR, Spikes JD, Kopeckova P, Kopecek J. Photodynamic crosslinking of proteins. I. Model studies using histidine- and lysine-containing N-(2-hydroxypropyl)methacrylamide copolymers. J Photochem Photobiol B 1996;34:203-10.

[28] Agon VV, Bubb WA, Wright A, Hawkins CL, Davies MJ. Sensitizer-mediated photooxidation of histidine residues: evidence for the formation of reactive side-chain peroxides. Free Radic Biol Med 2006;40:698-710.

[29] Pattison DI, Rahmanto AS, Davies MJ. Photo-oxidation of proteins. Photochem Photobiol Sci 2012;11:38-53.


[30] Liu H, Gaza-Bulseco G, Chumsae C. Analysis of reduced monoclonal antibodies using size exclusion chromatography coupled with mass spectrometry. J Am Soc Mass Spectrom 2009;20:2258-64.

[31] Ren D, Pipes GD, Liu D, Shih LY, Nichols AC, Treuheit MJ, et al. An improved trypsin digestion method minimizes digestion-induced modifications on proteins. Anal Biochem 2009;392:12-21.

[32] Zhang Z. Prediction of collision-induced-dissociation spectra of peptides with post-translational or process-induced modifications. Anal Chem 2011;83:8642-51.

[33] Zhang Z. Prediction of low-energy collision-induced dissociation spectra of peptides. Anal Chem 2004;76:3908-22.

[34] Zhang Z. Prediction of electron-transfer/capture dissociation spectra of peptides. Anal Chem 2010;82:1990-2005.

[35] Zhang Z. Prediction of low-energy collision-induced dissociation spectra of peptides with three or more charges. Anal Chem 2005;77:6364-73.

[36] Yao X, Freas A, Ramirez J, Demirev PA, Fenselau C. Proteolytic 18O labeling for comparative proteomics: model studies with two serotypes of adenovirus. Anal Chem 2001;73:2836-42.

[37] Yao X, Afonso C, Fenselau C. Dissection of proteolytic 18O labeling: endoprotease-catalyzed 16O-to-18O exchange of truncated peptide substrates. J Proteome Res 2003;2:147-52.

[38] Gao Q, Xue S, Shaffer SA, Doneanu CE, Goodlett DR, Nelson SD. Minimize the detection of false positives by the software program DetectShift for 18O-labeled cross-linked peptide analysis. Eur J Mass Spectrom (Chichester, Eng) 2008;14:275-80.

[39] Gao Q, Xue S, Doneanu CE, Shaffer SA, Goodlett DR, Nelson SD. Pro-CrossLink. Software tool for protein cross-linking and mass spectrometry. Anal Chem 2006;78:2145-9.

[40] Back JW, Notenboom V, de Koning LJ, Muijsers AO, Sixma TK, de Koster CG, et al. Identification of cross-linked peptides for protein interaction studies using mass spectrometry and 18O labeling. Anal Chem 2002;74:4417-22.

[41] Liu M, Cheetham J, Cauchon N, Ostovic J, Ni W, Ren D, et al. Protein isoaspartate methyltransferase-mediated 18O-labeling of isoaspartic acid for mass spectrometry analysis. Anal Chem 2012;84:1056-62.


[42] Wan W, Zhao G, Al-Saad K, Siems WF, Zhou ZS. Rapid screening for S-adenosylmethionine-dependent methylation products by enzyme-transferred isotope patterns analysis. Rapid Commun Mass Spectrom 2004;18:319-24.

[43] Swaney DL, Wenger CD, Coon JJ. Value of using multiple proteases for large-scale mass spectrometry-based proteomics. J Proteome Res 2010;9:1323-9.

[44] Ni W, Lin M, Salinas P, Savickas P, Wu SL, Karger BL. Complete mapping of a cystine knot and nested disulfides of recombinant human arylsulfatase A by multi-enzyme digestion and LC-MS analysis using CID and ETD. J Am Soc Mass Spectrom 2013;24:125-33.

[45] Ingrosso D, Fowler AV, Bleibaum J, Clarke S. Specificity of endoproteinase Asp-N (Pseudomonas fragi): cleavage at glutamyl residues in two proteins. Biochem Biophys Res Commun 1989;162:1528-34.

[46] Tetaz T, Morrison JR, Andreou J, Fidge NH. Relaxed specificity of endoproteinase Asp-N: this enzyme cleaves at peptide bonds N-terminal to glutamate as well as aspartate and cysteic acid residues. Biochem Int 1990;22:561-6.

[47] Ni W, Dai S, Karger BL, Zhou ZS. Analysis of isoaspartic acid by selective proteolysis with Asp-N and electron transfer dissociation mass spectrometry. Anal Chem 2010;82:7485-91.

[48] Sorensen SB, Sorensen TL, Breddam K. Fragmentation of proteins by S. aureus strain V8 protease. Ammonium bicarbonate strongly inhibits the enzyme but does not improve the selectivity for glutamic acid. FEBS Lett 1991;294:195-7.

[49] Kim MS, Pandey A. Electron transfer dissociation mass spectrometry in proteomics. Proteomics 2012;12:530-42.

[50] Syka JE, Coon JJ, Schroeder MJ, Shabanowitz J, Hunt DF. Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry. Proc Natl Acad Sci U S A 2004;101:9528-33.

[51] Tomita M, Irie M, Ukita T. Sensitized photooxidation of histidine and its derivatives. Products and mechanism of the reaction. Biochemistry 1969;8:5149-60.

[52] Kang P, Foote CS. Photosensitized oxidation of 13C,15N-labeled imidazole derivatives. J Am Chem Soc 2002;124:9629-38.

[53] Nilsson R, Merkel PB, Kearns DR. Unambiguous evidence for the participation of singlet oxygen (¹Δ) in photodynamic oxidation of amino acids. Photochem Photobiol 1972;16:117-24.


[54] Verweij H, Dubbelman TM, Van Steveninck J. Photodynamic protein cross-linking. Biochim Biophys Acta 1981;647:87-94.

[55] Saphire EO, Parren PW, Pantophlet R, Zwick MB, Morris GM, Rudd PM, et al. Crystal structure of a neutralizing human IGG against HIV-1: a template for vaccine design. Science 2001;293:1155-9.

[56] Alfaro JF, Gillies LA, Sun HG, Dai S, Zang T, Klaene JJ, et al. Chemo-enzymatic detection of protein isoaspartate using protein isoaspartate methyltransferase and hydrazine trapping. Anal Chem 2008;80:3882-9.

[57] Zang T, Dai S, Chen D, Lee BW, Liu S, Karger BL, et al. Chemical methods for the detection of protein N-homocysteinylation via selective reactions with aldehydes. Anal Chem 2009;81:9065-71.

[58] Zhou ZS, Flohr A, Hilvert D. An Antibody-Catalyzed Allylic Sulfoxide-Sulfenate Rearrangement. J Org Chem 1999;64:8334-41.

[59] Zhao G, Zhou ZS. Vinyl sulfonium as novel proteolytic enzyme inhibitor. Bioorg Med Chem Lett 2001;11:2331-5.

[60] Zhou ZS, Jiang N, Hilvert D. An Antibody-Catalyzed Selenoxide Elimination. J Am Chem Soc 1997;119:3623-4.

[61] Zhou ZS, Smith AE, Matthews RG. L-Selenohomocysteine: one-step synthesis from L-selenomethionine and kinetic analysis as substrate for methionine synthases. Bioorg Med Chem Lett 2000;10:2471-5.

[62] Matthews RG, Smith AE, Zhou ZS, Taurog RE, Bandarian V, Evans JC, et al. Cobalamin-Dependent and Cobalamin-Independent Methionine Synthases: Are There Two Solutions to the Same Chemical Problem? Helvetica Chimica Acta 2003;86:3939-54.


Chapter 5: Conclusion and Future Directions

In this thesis, LC/MS-based methods have been developed for the detection and characterization of Asp isomerization and protein crosslinks in monoclonal antibodies. Chapter Two describes protein isoaspartate methyltransferase-mediated 18O-labeling followed by LC/MS analysis for the detection of Asp isomerization, one of the most challenging post-translational modifications to analyze because the difference between isoaspartic acid and aspartic acid is subtle and the two are difficult to differentiate. Several isoAsp peptides in IgG1 were characterized, and the isomerization sites were unambiguously identified.

In Chapter Three, a comprehensive methodology for the identification of protein crosslinks without a priori knowledge of the chemistry, based on 18O-labeling and LC/MS analysis, is presented. Owing to the intrinsic structural complexity of crosslinks, their detection and characterization are very challenging, especially when the crosslink chemistry is unknown. The utility of our XChem-Finder workflow has been demonstrated by the detection of thioether crosslinks in IgG2 and the discovery of a novel histidine-histidine crosslink in IgG1 in Chapters Three and Four, respectively. Both the thioether and the histidine-histidine crosslinks are found as degradation products in the constant region of IgG; therefore, these modifications are most likely common to IgG.

Despite considerable efforts to understand the relevance of post-translational modifications such as Asp isomerization and protein crosslinking in the cellular context, we are still in the process of unraveling the complexity of these modifications and their tremendous impact. Sophisticated technologies such as powerful separation techniques and high resolution mass spectrometry are now increasingly available for the identification and characterization of these site-specific protein modifications. This chapter lists future work to extend the utility of our methods for each project. For the crosslink project, improvements to our current XChem-Finder workflow are discussed in more detail.

5.1 isoAsp Project

Biological Samples. The method described in Chapter Two can be useful for monitoring isoAsp formation not only in IgG but also in other therapeutic proteins during production and storage to ensure their quality. Most importantly, the method can be applied to biological samples such as plasma, serum, urine and tissues to identify potential isoAsp-containing proteins, to understand isoAsp formation pathways in vivo, and to identify disease-associated biomarkers.

As discussed in Section 1.4.1, artifactual deamidation or isomerization can be significant during sample preparation. Exposure to extreme pH and high temperature should be avoided during sample preparation. Sample preparation for biological samples typically takes longer because interfering proteins must be removed; therefore, particular caution should be taken to minimize artifactual deamidation or isomerization. The inherent isoAsp and that introduced by sample preparation can be differentiated by preparing the sample in 18O-water and quantified by the b-ion intensity calculation procedure (see the detailed discussion in Section 1.4.3.4)[1-3].

D-isoAsp Detection. It is difficult to identify peptides containing D-isoAsp. So far, no sensitive method for D-isoAsp has been reported, and the lack of suitable methods has prevented its biological study. Antibodies are highly specific for their specifically modified proteins[4, 5], but there are very few reports on antibody-based methods so far. This area is worth exploring further in the future.

5.2 Crosslink Project

Other Crosslinks in Proteins. Our XChem-Finder workflow has successfully been used to detect thioether and histidine-histidine crosslinks in IgG. We expect that the discovery and elucidation of other new protein crosslinks in pharmaceutical products and biological samples by our XChem-Finder approach will be equally successful. Dityrosine crosslinks in proteins have been proposed and studied since the 1980s, but their analysis most often employs reversed-phase HPLC with fluorometric detection because of their complicated structures and the difficulty of interpreting their MS data[6-8]. Calmodulin contains two tyrosyl residues and no cysteine or tryptophan, so it is a good model protein for studying dityrosine crosslinks[8]. Our XChem-Finder workflow might be applied to directly detect and characterize the dityrosine crosslink in calmodulin. This might shed light on the formation of dityrosine crosslinks in other proteins (for example, the proximity of two tyrosine residues in proteins).

Sample Enrichment. Our current XChem-Finder process directly uses the tryptic digests prepared in 16O- and 18O-water for LC/MS/MS analysis. Chromatographic fractionation of the proteolytic digests may sometimes be necessary to avoid suppression of the ionization of some peptides. Various methods to achieve such enrichment are now being explored. One approach that has been employed for the enrichment of chemically crosslinked peptides makes use of the generally higher charge state that distinguishes them from linear peptides[9]. Chemically crosslinked peptides elute in the late fractions in cation-exchange chromatography[9], and peptides with high charge states are selected for fragmentation in MS analysis[9]. Taouatas and colleagues combined Lys-N proteolytic digestion, strong cation exchange enrichment, and mass spectrometry (MALDI-MS/MS by CID and LC-MS/MS by CID or ETD) to achieve an optimal targeted strategy for proteome analysis[10-12]. The lack of an enrichment step in the current workflow means that only the most abundant crosslinks are likely to be found. In the case of photo-degraded IgG1, only the histidine-histidine crosslink was discovered in our study, although other crosslinks formed via radical mechanisms are very likely present in the sample. The development of specific and efficient enrichment strategies may help to discover new crosslinks. For example, Lys-N digestion of the high molecular weight fractions from SEC (size exclusion chromatography) followed by CEX (cation exchange chromatography) enrichment for LC/MS analysis will be explored in the future (Scheme 5-1)[10-12].


Scheme 5-1. Combined sample enrichment and Lys-N digestion for the detection of crosslinks[10-12]. Two enrichment steps, SEC at the protein level and CEX at the peptide level, are used to reduce sample complexity. Lys-N peptides that do or do not contain a basic amino acid (e.g., His or Arg; His is used as the example here) are shown with their charges in the scheme. Lys-N peptides give dominant b-ions in MALDI-CID (matrix-assisted laser desorption/ionization collision-induced dissociation) and c-ions in ETD, facilitating MS/MS data interpretation.
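The charge-based enrichment described above can be illustrated with a simple count of nominal charges. The sketch below assumes that, at the acidic pH used for cation exchange, every free N-terminal amine and every Lys, Arg or His side chain carries one positive charge and that acidic groups are neutral; the sequences are hypothetical Lys-N peptides, not peptides identified in this work.

```python
# Nominal positive charge of a peptide species at acidic pH.
# ASSUMPTIONS: one charge per free N-terminus and per Lys/Arg/His side chain;
# carboxyl groups are treated as neutral. All sequences are hypothetical.

BASIC_RESIDUES = set("KRH")


def nominal_charge(chains):
    """Sum of N-terminal and basic side-chain charges over all chains."""
    return sum(1 + sum(aa in BASIC_RESIDUES for aa in chain) for chain in chains)


linear = ["KTLTCPPCPAPE"]                       # one Lys-N peptide, no His/Arg
linear_with_his = ["KTHTCPPCPAPE"]              # one Lys-N peptide with His
crosslinked = ["KTHTCPPCPAPE", "KTHTCPPCPAPE"]  # two chains joined by a crosslink

for name, species in [("linear", linear),
                      ("linear + His", linear_with_his),
                      ("crosslinked", crosslinked)]:
    print(f"{name}: {nominal_charge(species)}+")
```

Because a crosslinked species carries the charges of both chains, it is expected to be retained longer on a cation exchange column than either linear peptide, which is the basis of the proposed CEX enrichment.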

N-Terminal Labeling. In this thesis, the incorporation of 18O at the C-terminus of each newly created peptide was used to distinguish crosslinked peptides from linear peptides. However, this approach cannot detect crosslinked peptides that contain the C-terminus of a protein. Isotope labeling at the N-termini with [2H3]-2,4-dinitrofluorobenzene ([2H3]DNFB), as described in Section 1.5.3.2.2, might be useful. In the original protocol, methylation of the ε-amino group of lysine and N-terminal tagging were conducted before protease digestion, which results in missed cleavages because dimethylated lysine resists protease digestion[13]. This can be overcome by simply switching the order of sample preparation: protease digestion first, followed by protection of the ε-amino group of Lys and then specific derivatization of the N-terminal amino group with [2H3]DNFB (Scheme 5-2). This will increase the consumption, and hence the cost, of the [2H3]DNFB reagent, but it can minimize the missed cleavages that produce overly large crosslinked peptides.

Scheme 5-2. Isotopic labeling at N-termini via 1) trypsin digestion; 2) protection of the ε-amino group of lysine by reductive methylation (with NaCNBH3); 3) specific derivatization of the N-terminal amino group with a 1:1 mixture of DNFB (2,4-dinitrofluorobenzene) and [2H3]DNFB at pH 7.0. The resulting crosslinked and linear peptides can be identified isotopically (a 1:2:1 triplet and a 1:1 doublet, each with a spacing of 3 Da, for crosslinked and linear peptides, respectively)[13].
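The doublet and triplet patterns follow from binomial statistics. The minimal sketch below assumes that every N-terminus is labeled completely and that the light and heavy reagents react with equal probability; the function name is illustrative, and the 3 Da spacing is the nominal value given in the scheme.

```python
from math import comb


def label_pattern(n_termini: int, spacing_da: float = 3.0):
    """Return (mass offset in Da, relative intensity) pairs for a species
    with n_termini N-termini labeled with a 1:1 light/heavy reagent mixture.
    The number of heavy tags k follows a binomial distribution with p = 0.5,
    so the relative intensities are the binomial coefficients C(n, k)."""
    return [(k * spacing_da, comb(n_termini, k)) for k in range(n_termini + 1)]


print("linear (1 N-terminus):      ", label_pattern(1))
print("crosslinked (2 N-termini):  ", label_pattern(2))
```

A linear peptide gives two peaks of equal intensity 3 Da apart, whereas a crosslinked peptide with two N-termini gives three peaks at 0, 3 and 6 Da in a 1:2:1 ratio.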

In addition, succinylation is known to modify peptides at the N-terminus and the ε-amino group[14-17]. In an effort to develop a protein quantification method based on isobaric peptide termini labeling, 2-methoxy-4,5-dihydro-1H-imidazole was first reacted with the ε-amino group, followed by N-terminal labeling with succinic anhydride and tetradeuterated succinic anhydride (succinic anhydride-d4) (Scheme 5-3)[14, 17]. Koehler et al. recently reported that succinylation occurs selectively at the N-terminal amino group in sodium acetate buffer at pH 7.6 (Scheme 5-3)[15, 16]. In principle, site-specific N-terminal succinylation can be used for the detection of crosslinked peptides: succinic anhydride specifically labels the N-termini, which results in a mass increase of 100 Da for single peptides and 200 Da for crosslinked peptides (Scheme 5-3). We plan to exploit succinylation to improve the detection of crosslinked peptides in the future.
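The nominal 100 Da and 200 Da increases can be refined to monoisotopic values. The sketch below assumes that each N-terminus gains the full elemental composition of succinic anhydride (C4H4O3, or C4D4O3 for the d4 reagent) and that labeling is complete and restricted to N-termini; the names are illustrative.

```python
# Monoisotopic mass shift caused by N-terminal succinylation.
# ASSUMPTION: each labeled N-terminus gains C4H4O3 (light) or C4D4O3 (heavy).

MONOISOTOPIC = {"C": 12.000000, "H": 1.007825, "D": 2.014102, "O": 15.994915}


def formula_mass(formula: dict) -> float:
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


SUCCINYL = formula_mass({"C": 4, "H": 4, "O": 3})      # light tag
SUCCINYL_D4 = formula_mass({"C": 4, "D": 4, "O": 3})   # tetradeuterated tag


def succinylation_shift(n_termini: int, heavy: bool = False) -> float:
    """Total mass increase for a species with n_termini labeled N-termini."""
    return n_termini * (SUCCINYL_D4 if heavy else SUCCINYL)


print(f"linear peptide:      +{succinylation_shift(1):.3f} Da")
print(f"crosslinked peptide: +{succinylation_shift(2):.3f} Da")
print(f"d4 tag, one terminus: +{succinylation_shift(1, heavy=True):.3f} Da")
```

A species with two N-termini therefore shifts by twice the single-tag mass, which is the feature that would flag a crosslinked peptide.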

Scheme 5-3. N-terminal succinylation via two-step chemical derivatization[14, 17] and site-specific N-terminal succinylation in sodium acetate buffer at pH 7.6[15, 16] can be used to differentiate crosslinked peptides from linear peptides.


Digestion under Acidic Conditions. Trypsin digestion is often performed under slightly basic conditions, which may make some crosslinks unstable. Protease digestion under acidic conditions may be explored.

Deglycosylation. The heterogeneity and relatively poor ionization efficiency of glycopeptides make it more difficult to determine crosslinks near the site of glycan attachment. Deglycosylation before 18O-labeling and protease digestion is worth exploring in the future.

Complicated Crosslinks. Proteins are known to be degraded upon light exposure by a number of mechanisms, which is of concern for products manufactured for the clinic. Protein degradation in the light can involve multiple amino acid residues and a combination of multiple degradation pathways. This gives very complicated mass spectra, which pose a great challenge to current methodologies, including our XChem-Finder workflow.

Intra-crosslinks. In our XChem-Finder workflow, 18O-labeling followed by tryptic digestion and LC/MS analysis is used to differentiate single peptides from crosslinked peptides. This approach may not be suitable for intra-crosslinks when the two crosslinked amino acid residues are so close that no tryptic cleavage occurs between them. This can be addressed by combining digestion with multiple proteases (e.g., GluC/trypsin) with 18O-labeling, N-terminal labeling or a chemical tag. We are going to explore this area in the future.
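The limitation for intra-crosslinks can be illustrated with an in-silico digest. The sketch below assumes complete tryptic cleavage C-terminal to Lys or Arg except before Pro, and two 18O atoms incorporated per newly formed C-terminus; the crosslinked residue pairs are hypothetical and serve only to show how the number of labeled C-termini depends on where the trypsin sites fall.

```python
import re

# Heavy chain stretch S215-K244 (sequence as listed in Table 4-5).
HINGE = "SCDKTHTCPPCPAPELLGGPSVFLFPPKPK"
O18_MINUS_O16 = 17.999160 - 15.994915   # Da per exchanged oxygen atom


def tryptic_peptides(seq):
    """(start, end) index pairs, 0-based and end-exclusive, for a complete
    digest: cleave C-terminal to K or R unless the next residue is P."""
    cuts = [m.end() for m in re.finditer(r"[KR](?!P)", seq)]
    bounds = [0] + [c for c in cuts if c < len(seq)] + [len(seq)]
    return list(zip(bounds[:-1], bounds[1:]))


def labeled_c_termini(seq, i, j):
    """Number of distinct tryptic peptides that carry residues i and j."""
    owners = {k for k, (start, end) in enumerate(tryptic_peptides(seq))
              if start <= i < end or start <= j < end}
    return len(owners)


def o18_shift(n_c_termini):
    """Mass shift assuming two 18O atoms per newly formed C-terminus."""
    return 2 * n_c_termini * O18_MINUS_O16


# Hypothetical crosslinked residue pairs (0-based indices into HINGE).
for label, (i, j) in {"Cys222-Cys225": (7, 10), "Cys216-Cys222": (1, 7)}.items():
    n = labeled_c_termini(HINGE, i, j)
    print(f"{label}: {n} C-terminus/termini, 18O shift {o18_shift(n):.3f} Da")
```

When both residues lie within one tryptic peptide, only one C-terminus is labeled and the shift of about 4 Da is the same as for a linear peptide; a cleavage site between the two residues restores the shift of about 8 Da, which is why a second protease is proposed.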

Others. As mentioned before, some limitations of XChem-Finder (e.g., crosslinks containing the C-terminus of a protein, the need for high quality MS/MS spectra, large crosslinks, etc.) can be addressed by the use of proteases with different selectivity (e.g., GluC instead of trypsin), N-terminal labeling and different ion activation methods (CID and ETD). Chemical tags may provide some solutions as well and might be studied in the future. In addition, MALDI often generates singly charged ions, while ESI gives multiply charged ions for large peptides and proteins. As such, using MALDI instead of ESI may simplify the data interpretation of crosslinks. Next, we are going to exploit MALDI for the characterization of crosslinks.
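The effect of charge state on the observed signal can be sketched with the usual relation m/z = (M + z x 1.007276)/z for protonated ions. The example below uses the neutral mass of the SCDK/TVAPTECS thioether crosslink from Table 4-5 (1223.534 Da) purely as an illustration; the function name is an assumption.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz(neutral_mass: float, z: int) -> float:
    """m/z of an ion carrying z protons, [M + zH]z+."""
    return (neutral_mass + z * PROTON) / z


mass = 1223.534  # thioether crosslink SCDK/TVAPTECS (Table 4-5)
for z in (1, 2, 3):
    print(f"z = {z}: m/z {mz(mass, z):.3f}")
```

The doubly charged value reproduces the m/z of 612.77 listed in Table 4-5. With ESI, the same species is spread over several such charge states, whereas a singly charged MALDI ion appears at a single m/z one proton mass above the neutral mass.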

5.3 References

[1] Li X, Cournoyer JJ, Lin C, O'Connor PB. Use of 18O labels to monitor deamidation during protein and peptide sample processing. J Am Soc Mass Spectrom 2008;19:855-64.

[2] Gaza-Bulseco G, Li B, Bulseco A, Liu HC. Method to differentiate asn deamidation that occurred prior to and during sample preparation of a monoclonal antibody. Anal Chem 2008;80:9491-8.

[3] Liu H, Wang F, Xu W, May K, Richardson D. Quantitation of asparagine deamidation by isotope labeling and liquid chromatography coupled with mass spectrometry analysis. Anal Biochem 2013;432:16-22.

[4] Sakai K, Homma H, Lee JA, Fukushima T, Santa T, Tashiro K, et al. Localization of D-aspartic acid in elongate spermatids in rat testis. Arch Biochem Biophys 1998;351:96-105.

[5] Sakai K, Homma H, Lee JA, Fukushima T, Santa T, Tashiro K, et al. D-aspartic acid localization during postnatal development of rat adrenal gland. Biochem Biophys Res Commun 1997;235:433-6.

[6] DiMarco T, Giulivi C. Current analytical methods for the detection of dityrosine, a biomarker of oxidative stress, in biological samples. Mass Spectrom Rev 2007;26:108-20.

[7] Correia M, Neves-Petersen MT, Jeppesen PB, Gregersen S, Petersen SB. UV-light exposure of insulin: pharmaceutical implications upon covalent insulin dityrosine dimerization and disulphide bond photolysis. PLoS One 2012;7:e50733.

[8] Malencik DA, Anderson SR. Dityrosine formation in calmodulin. Biochemistry 1987;26:695-704.


[9] Rinner O, Seebacher J, Walzthoeni T, Mueller LN, Beck M, Schmidt A, et al. Identification of cross-linked peptides from large sequence databases. Nat Methods 2008;5:315-8.

[10] Boersema PJ, Taouatas N, Altelaar AF, Gouw JW, Ross PL, Pappin DJ, et al. Straightforward and de novo peptide sequencing by MALDI-MS/MS using a Lys-N metalloendopeptidase. Mol Cell Proteomics 2009;8:650-60.

[11] Taouatas N, Altelaar AF, Drugan MM, Helbig AO, Mohammed S, Heck AJ. Strong cation exchange-based fractionation of Lys-N-generated peptides facilitates the targeted analysis of post-translational modifications. Mol Cell Proteomics 2009;8:190-200.

[12] Taouatas N, Mohammed S, Heck AJ. Exploring new proteome space: combining Lys-N proteolytic digestion and strong cation exchange (SCX) separation in peptide-centric MS-driven proteomics. Methods Mol Biol 2011;753:157-67.

[13] Chen X, Chen YH, Anderson VE. Protein cross-links: universal isolation and characterization by isotopic derivatization and electrospray ionization mass spectrometry. Anal Biochem 1999;273:192-203.

[14] Arntzen MO, Koehler CJ, Treumann A, Thiede B. Quantitative proteome analysis using isobaric peptide termini labeling (IPTL). Methods Mol Biol 2011;753:65-76.

[15] Koehler CJ, Arntzen MO, Strozynski M, Treumann A, Thiede B. Isobaric peptide termini labeling utilizing site-specific N-terminal succinylation. Anal Chem 2011;83:4775-81.

[16] Koehler CJ, Arntzen MO, Treumann A, Thiede B. A rapid approach for isobaric peptide termini labeling. Methods Mol Biol 2012;893:129-41.

[17] Koehler CJ, Strozynski M, Kozielski F, Treumann A, Thiede B. Isobaric peptide termini labeling for MS/MS-based quantitative proteomics. J Proteome Res 2009;8:4333-41.
