Advances in Glycoproteomics and Glyco-Biomarker Discovery Studies: Development and Application of Liquid Chromatography and Mass Spectrometry Platforms to Cancer Samples

by Francisca Owusu Gbormittah

B.S. in Chemistry, Kwame Nkrumah University of Science and Technology

M.S. in Chemistry, Indiana University of Pennsylvania

A dissertation submitted to

The Faculty of the College of Science of Northeastern University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

August 1, 2014

Dissertation directed by

William S. Hancock
Professor of Chemistry and Chemical Biology

UMI Number: 3633335

All rights reserved


UMI 3633335 Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code


DEDICATION

To my family


ABSTRACT

The analytical technologies used to investigate the glycoproteome of clinically relevant samples have improved over the last 10 years. These developments aim to improve the identification and quantification of disease-specific glyco-biomarkers, which are present at low amounts in biological matrices. Glyco-biomarkers have the potential to contribute significantly to cancer discovery studies in specific areas such as early diagnosis, prognosis, monitoring of cancer recurrence, and improvement of the low survival rate of cancer. In this thesis, we focused on the development and application of novel liquid affinity chromatography fractionation platforms integrated with nano-LC-MS/MS to characterize and quantify the glycoproteome, as well as selected glyco-biomarker candidates, of cancer samples.

In chapter 1, brief background information covering glycoproteomics and glyco-biomarker discovery studies is presented. Specifically, the glycosylation process and how the field of ‘omics’, which includes glycoproteomics, has revolutionized clinical glyco-biomarker discovery are discussed. Further, various disease models, current sample fractionation strategies, and the analytical methodologies involved in the glyco-biomarker development pipeline are described, together with their significance and shortfalls. Chapter 1 concludes with a review of biomarker validation and the bioinformatics tools currently used in glycoproteomics discovery studies.

Chapter 2 details the development of a novel multi-dimensional affinity liquid chromatography fractionation approach that combines depletion of the top 12 abundant proteins with multi-lectin fractionation of human plasma. Evaluation of the reproducibility, specificity and overall recoveries of the platform demonstrated the suitability of the developed method for glyco-biomarker discovery studies of clinical samples. After this robust platform was established, it was applied in chapter 3 to comprehensively study the global glycoproteome profile of clear cell renal cell carcinoma (ccRCC) plasma samples in order to identify and characterize potential biomarkers for early detection of the disease. During this study, protein abundance alterations as well as glycan shifts were investigated to understand the sub-proteome of ccRCC.

Chapter 4 focuses on the structural characterization of a glycoprotein (clusterin) that was identified during the ccRCC biomarker discovery studies. Clusterin has been implicated in ccRCC progression; however, its structure and biological function(s) are not yet well defined. Therefore, to gain more structural insight into clusterin, the protein was immuno-affinity purified from ccRCC plasma and analyzed by tandem mass spectrometry to profile glycoforms, map N-glycosylation sites and quantify glycan amounts. We discovered that the levels of bi-antennary digalactosylated disialylated (A2G2S2) and core fucosylated bi-antennary digalactosylated disialylated (FA2G2S2) glycans differed significantly in the plasma of patients before and after curative nephrectomy for localized ccRCC.

In chapter 5, a multi-lectin affinity chromatography platform previously developed in our laboratory was optimized and applied to investigate glycoproteins and non-glycoproteins present in pancreatic cyst fluid samples. This study was aimed at identifying potential candidate markers for early detection of malignant cysts (pancreatic cancer precursors). Our data identified proteins with significant differential expression in mucinous cysts (malignant) compared to non-mucinous cysts (benign); one of these proteins, periostin, which is associated with cancer progression, was confirmed by immunoblotting assay.

In the final chapter (chapter 6), we summarize and conclude our findings in this work and provide our perspective on the potential of glycoproteins in glyco-biomarker discovery studies.


TABLE OF CONTENTS

Page #

DEDICATION ii

ABSTRACT iii

TABLE OF CONTENTS vi

APPENDICES xii

LIST OF FIGURES xiii

LIST OF TABLES xv

CHAPTER 1 1

INTRODUCTION: OVERVIEW OF GLYCOPROTEOMICS AND GLYCO-BIOMARKER DISCOVERY STUDIES

1.1 Protein glycosylation and glycoproteomics 2

1.2 Current status of glyco-biomarkers: advantages and limitations 3

1.3 Biological matrices 4

1.3.1 Blood plasma/serum 4

1.3.2 Tumor tissue 5

1.3.3 Proximal fluids 5

1.3.4 Tumor cell lines 6

1.4 Glycoproteomics sample fractionation strategies 6

1.4.1 Lectin affinity platforms and applications 7

1.5 Glycoproteomics characterization using Mass Spectrometry (MS) approaches 9

1.5.1 Proteolytic enzymes selection 10


1.5.2 Glycoproteins and glycopeptides enrichment strategies 10

1.5.3 Mass spectrometry platform 11

1.6 Quantitative technologies in glycoproteomics 15

1.6.1 Stable isotope quantitation 15

1.6.2 Targeted-based quantitation 16

1.6.3 Label free quantitation 16

1.7 Glyco-biomarker validation strategies 17

1.8 Data Processing and Statistical Analysis of Glycoproteomics 18

1.9 References 20

CHAPTER 2 30

DEVELOPMENT OF AN IMPROVED FRACTIONATION OF THE HUMAN PLASMA PROTEOME BY A COMBINATION OF ABUNDANT PROTEINS DEPLETION AND MULTI-LECTIN AFFINITY CHROMATOGRAPHY

2.1 Abstract 31

2.2 Introduction 32

2.3 Materials and Methods 35

2.3.1 Materials 35

2.3.2 Samples for study 36

2.3.3 Experimental design 36

2.3.4 Preparation of 12P, M-LAC and reverse phase HPLC columns 37

2.3.5 High abundance protein depletion and multi-lectin affinity fractionation 38

2.3.6 Protein concentration measurements and 1D-SDS PAGE analysis 39

2.3.7 In-solution protein trypsin digestion 40

2.3.8 Nano-LC-MS/MS analysis and peptide sequencing 41


2.4 Results and discussion 43

2.4.1 12P immuno-affinity depletion 43

2.4.2 Specificity of 12P depletion column 46

2.4.3 12P-M-LAC fractionation platform 46

2.4.4 Recovery studies of 12P-M-LAC platform 47

2.4.5 Reproducibility studies of protein identification from the 12P-M-LAC platform fractions 47

2.4.6 Enrichment of low level glycoproteins by 12P-M-LAC platform 51

2.5 Conclusion 53

2.6 References 54

CHAPTER 3 57

COMPARATIVE STUDIES OF THE PROTEOME, GLYCOPROTEOME AND N-GLYCOME OF CLEAR CELL RENAL CELL CARCINOMA PLASMA BEFORE AND AFTER CURATIVE NEPHRECTOMY

3.1 Abstract 58

3.2 Introduction 59

3.3 Materials and Methods 61

3.3.1 Materials 61

3.3.2 Sample population 62

3.3.3 High Abundance Proteins Depletion and Glycoprotein Affinity Fractionation 63

3.3.4 N-Glycan Release and LC-ESI-MS Analysis 64

3.3.5 Gel nano-LC-MS/MS Proteomic and Glycoproteomic Analysis 65

3.3.6 Data processing and statistical analysis 66

3.4 Results and discussion 68


3.4.1 The analytical strategy 68

3.4.2 The 12P-M-LAC analytical platform 70

3.4.3 Overview of proteomics and glycoproteomics data 71

3.4.4 Quantification and selection of differentially expressed proteins present in 12P depleted ccRCC plasma proteome 72

3.4.5 Identification and selection of proteins of interest showing differential M-LAC column binding 76

3.4.6 Characterization of N-glycan moieties released from depleted M-LAC fractions by porous graphitized carbon (PGC) LC-ESI-IT MS/MS 79

3.4.7 N-glycan structures alteration analysis 80

3.4.8 Validation of differentially expressed N-glycans by extracted ion chromatograms 84

3.5 Conclusion 86

3.6 References 87

CHAPTER 4 92

TANDEM MASS SPECTROMETRY CHARACTERIZATION OF CLUSTERIN GLYCOPEPTIDE VARIANTS IN THE PLASMA OF CLEAR CELL RENAL CELL CARCINOMA

4.1 Abstract 93

4.2 Introduction 94

4.3 Materials and Methods 96

4.3.1 Materials 96

4.3.2 Clear cell renal cell carcinoma (ccRCC) plasma sample collection and preparation 97


4.3.3 Clusterin immuno-affinity HPLC purification 97

4.3.4 Lectin blot assay of purified Clusterin 98

4.3.5 One dimensional SDS PAGE and enzymatic digestion 98

4.3.6 C18 reversed phase nano-LC-MS/MS Analysis 99

4.3.7 Data and statistical Analysis 100

4.4 Results and discussion 101

4.4.1 Development of the Analytical Approach 101

4.4.2 Glycan occupancy analysis 106

4.4.3 Characterization of site-specific oligosaccharide heterogeneity 109

4.4.4 Glycan structures for the selected glycopeptide residue 372-385, N-374 112

4.4.5 Quantitation of targeted glycoforms in clinical samples 115

4.4.6 Lectin blot assay 119

4.5 Conclusion 120

4.6 References 122

CHAPTER 5 128

CHARACTERIZATION OF GLYCOPROTEINS IN PANCREATIC CYST FLUID USING A HIGH PERFORMANCE MULTIPLE LECTIN AFFINITY CHROMATOGRAPHY PLATFORM

5.1 Abstract 129

5.2 Introduction 130

5.3 Materials and Methods 132

5.3.1 Reagents 132

5.3.2 Pancreatic cyst fluid samples 132

5.3.3 Sample fractionation and glycoproteins enrichment 133

5.3.4 One dimensional SDS-PAGE analysis followed by in-gel trypsin digestion 134

5.3.5 Liquid chromatography mass spectrometry analysis 135

5.3.6 Data processing and Bioinformatics 136

5.3.7 Western Blot Analysis 137

5.4 Results and discussion 137

5.4.1 Analytical Strategy 137

5.4.2 Glycoproteome and non-glycoproteome platform 139

5.4.3 Summary of glycoproteome and non-glycoproteome data 141

5.4.4 Quantitation of glycoproteins in different analysis sets and selection of potential protein targets of interest 144

5.4.5 Pathway and network interaction analysis of potential targets of interest 151

5.4.6 mapping analysis of potential targets of interest 153

5.4.7 Validation of Periostin 154

5.5 Conclusion 156

5.6 References 157

CHAPTER 6 161

SUMMARY AND FUTURE DIRECTIONS 161


APPENDICES

A Peak area measurements of serum albumin to evaluate column 165

B Stability analysis of 12P-M-LAC platform 166

C The number of total peptides and proteins identified in proteomics and glycoproteomics analysis 167

D A representative average MS spectral of M-LAC bound (BD) and unbound (FT) fractions 168

E Relative abundances of RCC (+) and RCC (-) glycans for averaged (n=3) analytical replicates is represented in the heat map 169

F A representation of glycan site occupancy calculation showing N-linked site 374 170

G Identified peptides for bile salt-activated lipase (CEL) long isoform in M-LAC bound sub-proteome 171

H MS/MS fragmentation of diagnostic peptide TYAYLFSHPSR of CEL-long isoform 172

I Novoseek disease relationship to pancreatic cancer and related diseases data of selected protein target list 173

J STRING network interaction of CEL, PNLIP, and PNLIPRP1 significantly enriched in glycoproteomics and observed in pancreatic secretion pathway 176


LIST OF FIGURES

1.1 An illustration of proteomics bottom-up approach 10

1.2 An illustration of glycoprotein heterogeneity 12

1.3 Collision induced dissociation (CID) fragmentation of a glycan moiety 14

1.4 Electron transfer dissociation (ETD) and Collision induced dissociation (CID) glycopeptides fragmentation sites 15

2.1 Experimental work flow of the 12P-M-LAC analytical platform 37

2.2 1D SDS-PAGE analysis of reference plasma fractions collected from 12P depletion column to evaluate efficiency of the immuno-affinity depletion 42

2.3 HPLC profile showing 4 replicate runs and fractions collected during 12P-M-LAC fractionation 48

2.4 Reproducibility measurements of 12P-M-LAC fractions 49

2.5 A three way Venn diagram showing proteins identified in three analytical replicates of M-LAC bound and unbound fractions 50

3.1 Experimental workflow showing the process used in the characterization of clear cell renal cell carcinoma plasma (ccRCC) 68

3.2 1D SDS-PAGE analysis of reference plasma fractions from 12P-M-LAC platform to evaluate reproducibility 71

3.3 GO functional classification of selected differentially expressed proteins 75

3.4 N-glycans identified in clear cell renal cell carcinoma plasma 80

3.5 Extracted ion chromatograms to illustrate differentially expressed glycans in M-LAC bound and unbound fractions 85

4.1 Box and whisker plot of ELISA measurement of clusterin concentration in RCC (pre,+) and RCC (post, -) plasma samples 102

4.2 Flow diagram showing tandem mass spectrometry approach for the characterization of clusterin N-linked glycan sites 107

4.3 Representative LC-MS/MS analysis of clusterin glycoprotein digests 111


4.4 CID-MS2 and MS3 glycopeptide analysis of precursor ion 1296.88 (+ 3 charge state) 113

4.5 CID-MS2 and MS3 glycopeptide analysis of precursor ion 1345.56 (+ 3 charge state) 114

4.6 Representative CID-MS2 fragmentation spectrum of the precursor ion m/z 559.309 (+ 2 charge state) of a non-glycosylated peptide (insert) 115

4.7 Column chart and box plot of A2G2S2 (A and B), and FA2G2S2 (C and D) comparison between before RCC (+) and after RCC (-) nephrectomy 117

4.8 Representative lectin blot assay to determine total glycan changes in before (+) and after (-) nephrectomy plasma samples 120

5.1 Block diagram showing experimental process used in glycoproteomic studies of two analysis sample set 140

5.2 A four way Venn diagram showing distribution of proteins identified in unbound and M-LAC bound fractions of mucinous and non-mucinous subtypes after glycoproteomic analysis in sample set 1 and sample set 2 141

5.3 1D SDS-PAGE of two sample sets used for glycoproteomics analysis 143

5.4 Molecular functional characterizations of differentially expressed proteins in M-LAC fractionation 150

5.5 Annotation of pancreatic secretion KEGG signaling pathway 152

5.6 Pre-validation of Periostin as a potential biomarker target through SDS-PAGE western blot analysis 155


LIST OF TABLES

2.1 A list of 12P bound (targeted depleted) proteins identified after mass spectrometry analysis 43

2.2 A list of identified non-targeted proteins in the 12P bound fraction after mass spectrometry analysis 45

2.3 Recovery studies of 12P-M-LAC platform 47

2.4 A list of glycoproteins identified in M-LAC fractions 51

3.1 Patient information of RCC plasma samples 62

3.2 List of proteins with significant abundance changes in ccRCC 12P depleted fraction 73

3.3 List of glycoproteins with significant differential M-LAC binding 77

3.4 List of glycans with significant differential expression in ccRCC M-LAC fractions 83

4.1 Enzymatic glycopeptides identified by nano-LC-MS/MS analysis 105

4.2 Overview of major identified site-specific glycoforms and their calculated mass and corresponding m/z of glycopeptides 110

4.3 Analytical reproducibility peak area measurements of 5 ccRCC patient samples (averaged peak areas of 3 analytical replicates shown) 116

5.1 Pancreatic cyst fluid samples for glycoproteomics analysis 138

5.2 Number of Identified proteins in the Unbound and M-LAC Bound fractions after 1D SDS-PAGE LC-MS/MS Glycoproteomics analysis 142

5.3A Mucinous vs. non-mucinous proteins identified in M-LAC bound fraction with relative abundance changes 146

5.3B Mucinous vs. non-mucinous proteins identified in unbound fraction with relative abundance changes 147

5.4 Chromosome gene analysis of some ‘proteins of interest’ 153


CHAPTER 1

INTRODUCTION: OVERVIEW OF GLYCOPROTEOMICS AND GLYCO-BIOMARKER DISCOVERY STUDIES

1.1 Protein glycosylation and glycoproteomics

Proteins are known to play an integral role in many biological activities and physiological processes, such as cell-cell interaction, catalysis of reactions, and metabolism [1]. The comprehensive study of proteins and their associated functions in clinically relevant specimens may result in the identification of modified or non-modified proteins and provide insights into the mechanisms and biological pathways of disease [2].

During or after the synthesis of proteins, certain chemical species react with the protein molecules and, as a result, modulate the structure and function of the protein. This process is called posttranslational modification (PTM). There are different types of PTMs, and the most notable ones include phosphorylation, glycosylation, truncation, acetylation, deamidation, oxidation, and alkylation. The most common type of posttranslational modification is glycosylation, which occurs when sugar molecules are added to proteins during or after their synthesis in the Golgi and endoplasmic reticulum (ER) compartments of eukaryotes [3-5].

Glycoproteins are the end product of protein glycosylation and the large-scale study of the entire set of glycoproteins in a biological matrix is termed glycoproteomics.

There are two main types of glycoproteins: N-linked glycoproteins and O-linked glycoproteins [6-8]. N-linked glycoproteins are generated when sugar structures are attached to the side chain of asparagine (Asn, N) within the defined sequence Asn-X-Ser/Thr, where X can be any amino acid residue except proline, while O-linked glycoproteins are generated when sugar structures are attached to the hydroxyl group of Ser/Thr residues. Glycoproteins are involved in a number of biomolecular functions, including protein folding, quality control checks in cells, protein turnover, cell signaling, gene expression, and response to stress [9,10]. In addition, alterations of glycoproteins have been reported in a number of diseases, including cancer [11,12].


There is therefore a need for continued investigation of the glycoproteome of clinically relevant samples, since this may lead to the identification of potential glyco-biomarkers that are cancer specific [13-21].
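The N-linked sequon rule described above (Asn-X-Ser/Thr, where X is not proline) can be illustrated with a short script. This is only a minimal sketch: the function name and the example sequence are hypothetical and are not taken from this work, and a matching sequon indicates a potential N-glycosylation site, not that the site is actually occupied.

```python
import re

# N-linked sequon: Asn-X-Ser/Thr, where X is any residue except proline.
# A lookahead keeps the match zero-width after the Asn, so overlapping
# sequons (e.g. "NNSS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_n_glycosylation_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues located in an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical sequence used only for illustration (not a real protein).
example = "MKNGSAPNPTLLNATR"
print(find_n_glycosylation_sequons(example))
```

In the hypothetical sequence, the Asn followed by Pro is skipped, which reflects the proline exclusion in the sequon definition.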

In this thesis, the development and application of analytical platforms for the discovery of N-glycoproteins, and their potential utility as diagnostic or prognostic clinical glyco-biomarkers, will be presented. N-glycoproteins were the main focus of this work because the bioinformatics tools needed for data analysis and validation are currently more readily available for them than for other types of glycoproteins.

1.2 Current status of glyco-biomarkers: advantages and limitations

The “omics” fields, i.e., genomics, transcriptomics, proteomics, glycoproteomics, and glycomics, have gained a lot of attention in recent years because of continuous advances in analytical and computational technologies. These studies have yielded much relevant information. For instance, advances in glycoproteomics, the identification, characterization and quantification of glycoproteins in biological samples, have led to the detection of cellular mechanisms and glyco-biomarkers of clinical relevance.

In addition, glycomics, the study of the repertoire of glycans, has also received considerable attention over the past decade, with extensive studies of glycan structure and function in association with disease transformation [22-27]. Glycan molecules are attractive as glyco-biomarkers because they can be identified with high sensitivity, specificity, consistency and reproducibility.

Currently, glycosylated proteins make up the majority of biomarkers screened in the clinic. These include carcinoembryonic antigen (CEA) for colorectal and bladder cancers, prostate-specific antigen (PSA) for prostate cancer, human epidermal growth factor receptor 2 (HER2/NEU) for breast cancer, and cancer antigen 125 (CA125) for ovarian cancer [15, 28-30]. Their glycans are observed to change during carcinogenesis and metastasis [19, 31-32], and their unique molecular features help to improve disease diagnosis, prognosis and treatment.

Although there have been tremendous advances in the last decade, glycoproteomics is still challenging. This is because the glycoproteome of a biological matrix is complex and dynamic, changing over time depending on the cellular environment and the type of biological specimen under consideration. There are two main challenges associated with glycoproteomics studies: (1) incomplete regulation of glycans in the ER, leading to a mixture of glycoforms [33], and (2) glycoprotein micro- and macro-heterogeneity, leading to multiple glycoform variants and glycan sites of attachment. These complexities result from both genetic and environmental factors [34]. In the next two sections, we discuss some sources of the challenges associated with glycoproteomics discovery studies.

1.3 Biological matrices

Glycoproteomics of different disease models, such as plasma/serum, tumor tissue, proximal fluids, and tumor cell lines, has led to the identification of differentially expressed proteins in disease samples. These findings have provided relevant information for investigating unusual pathways associated with disease pathogenesis as well as for detecting new biomarkers.

1.3.1 Blood plasma/serum

Blood plasma/serum is regarded as the ideal biological specimen for biomarker discovery studies and is the preferred diagnostic bio-fluid for clinical screening [35]. This is because plasma/serum is easily collected and readily available in large volumes. However, one major disadvantage of plasma/serum is the large dynamic range (>11 orders of magnitude) of protein concentration. Potential biomarkers are usually present at low concentration in blood plasma, which makes their identification and quantification a difficult task. The characterization of such low abundance proteins requires extensive and comprehensive proteomic strategies to enhance the depth of analysis [36].

1.3.2 Tumor tissue

Comparative studies involving the glycoproteome of tumor tissues from disease and control samples have resulted in the identification of several novel tumor tissue-derived proteins of clinical importance [37-39]. The tumor glycoproteome provides information regarding dysregulation and abnormal signaling pathways involved in carcinogenesis and the spread of tumor cells.

Unfortunately, candidate markers identified in tumor tissues may not be detectable in plasma/serum. Reasons for this include tumor heterogeneity, a mismatch between tissue analysis and the clearance rate of tissue-derived proteins, and difficulties in establishing controls. Therefore, it is important to thoroughly evaluate proteomic data and establish a strong correlation among proteins expressed in tissues, leached out of tissue, and present in blood circulation [36].

1.3.3 Proximal fluids

Proximal fluids are a good reference for cancer-specific proteins because they contain proteins secreted and leaked from tumor tissue or cells [40]. Some examples of proximal fluids are pancreatic cyst fluid or juice of pancreatic lesions, cerebrospinal fluid (CSF) of central nervous system tumors, breast ductal fluid (BDF) from nipple aspiration and ductal lavage of breast cancer, and ascitic fluid (AF) of intra-abdominal tumors. Previous studies have reported the identification of novel candidate biomarkers from proximal fluids, although there are a number of difficulties associated with proximal fluid based biomarker discovery studies [40]. Some of the challenges include: (1) invasive procedures for obtaining proximal fluid samples, (2) challenges in standardization of sample collection, (3) inadequate controls, and (4) heterogeneity of proximal fluid proteins. While these challenges exist, proximal fluids remain a more direct source for the detection of biomarker candidates.

1.3.4 Tumor cell lines

Numerous studies have profiled tumor cell lines in an attempt to identify potential diagnostic and prognostic candidate markers. Tumor cell lines are attractive models because they are easy to manipulate, which aids in understanding the mechanisms underlying a disease state and in identifying potential biomarkers. In addition, cell lines enable the investigation of secreted and cell surface proteins that may contain potential fluid-based candidate markers. For example, in a recent study of pancreatic cancer cells, several novel pancreatic associated proteins were identified [41]. Similarly, conditioned media obtained from breast cancer cell lines revealed a strong correlation between the secreted proteins identified and the development of breast cancer [42].

These observations provide a unique platform for the identification of protein subsets rich in information that are of clinical relevance.

1.4 Glycoproteomics sample fractionation strategies

Several analytical platforms have been developed for the front-end treatment of biological matrices in glycoproteomics. The majority of these approaches are specifically designed to reduce the complexity of the biological sample in which glycoproteins are found, by targeting a population of glycoproteins or a single glycoprotein whose detection is difficult because of low abundance.

One popular approach routinely used to reduce the large dynamic range of biological samples is immuno-depletion. This method targets the removal of high abundance proteins, which frequently mask low level glycoproteins and non-glycoproteins, making their identification difficult. Immuno-depletion platforms currently employ antibody-based affinity materials that target proteins present at high concentration in blood plasma, including albumin, immunoglobulin, haptoglobin, transferrin, fibrinogen, apolipoprotein A-1, alpha-macroglobulin, alpha-1 acid glycoprotein and alpha-1 anti-trypsin [43]. These high abundance proteins contribute about 99% of the total protein concentration, and their removal or reduction could greatly improve the detection of molecular species of disease significance [44].
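The effect of depletion on the relative share of the remaining proteins can be estimated with simple arithmetic. The sketch below is illustrative only: the function name and the 95% depletion efficiency are assumptions introduced here, and the calculation assumes that no non-targeted proteins are lost on the depletion column, which is an idealization.

```python
def relative_enrichment(abundant_fraction: float, depletion_efficiency: float) -> float:
    """Fold increase in the relative share of non-targeted proteins after depletion.

    abundant_fraction: fraction of total protein contributed by the targeted
        high abundance proteins (0 <= value < 1).
    depletion_efficiency: fraction of the targeted proteins actually removed
        (0 <= value <= 1).
    Assumes (idealization) that non-targeted proteins are fully recovered.
    """
    if not (0.0 <= abundant_fraction < 1.0):
        raise ValueError("abundant_fraction must be in [0, 1)")
    if not (0.0 <= depletion_efficiency <= 1.0):
        raise ValueError("depletion_efficiency must be in [0, 1]")
    # Total protein remaining after depletion, relative to the starting amount.
    remaining_total = 1.0 - abundant_fraction * depletion_efficiency
    return 1.0 / remaining_total


# 99% is the figure quoted in the text; the efficiencies are hypothetical.
for efficiency in (1.0, 0.95):
    fold = relative_enrichment(0.99, efficiency)
    print(f"efficiency={efficiency:.2f}: ~{fold:.0f}-fold relative enrichment")
```

Under these assumptions, complete removal of proteins making up 99% of the total would raise the relative share of the remaining proteins about 100-fold, while a hypothetical 95% efficiency would give roughly 17-fold, which shows how sensitive the gain is to depletion efficiency.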

The benefits derived from immuno-affinity depletion, especially when combined with targeted enrichment methods, make it an attractive approach for mapping the glycoproteome of a biological sample. Several strategies are available for the targeted enrichment of glycoprotein sub-populations, and the majority of these methods take advantage of the affinity of glycoproteins for specific chemical groups. One such class of molecules is the lectins, which are known to be highly selective for oligosaccharides.

1.4.1 Lectin affinity platforms and applications

Over the past decade, lectins, diverse protein molecules with affinity for specific glycan moieties, have been used extensively to isolate glycoconjugates in global glycoproteomics studies [45].

The application of lectins to glycoproteomics was first reported in the early 1970s [46, 47], and since then there have been tremendous improvements in lectin-affinity enrichment methods, including their integration into HPLC platforms [47-49]. Currently, there are over sixty commercially available lectins that can be used to capture intact glycoproteins or glycopeptides [50]. Lectins play an important role in cancer biomarker discovery studies because their unique sugar binding properties allow for the isolation of glycoproteins from clinical samples, leading to the discovery of disease-specific markers.

Lectins are utilized in various formats, such as serial lectin (S-LAC) [51,52], multi-lectin (M-LAC) [53], or single lectin [54] formats, to enrich glycoprotein sub-populations in biological samples.

In addition, a number of lectin screening platforms have been developed. These include the lectin blot assay, used to identify glycan-specific alterations in glycoproteins fractionated on SDS-PAGE (sodium dodecyl sulfate-polyacrylamide gel electrophoresis) gels [55,56], and lectin microarrays, used to detect aberrant glycosylation changes in disease vs. control samples [57]. Lectin microarrays, i.e., lectin glycoprotein arrays and lectin-antigen-antibody microarrays [58], are widely used as complementary tools to mass spectrometry measurements in screening for subtle and significant glycosylation changes in clinical studies. Both array formats are extensively utilized in glyco-biomarker discovery studies because of their high throughput, multiplexing ability, and minimal sample requirement. These complementary analytical tools provide valuable information on glycosylation alterations when combined with knowledge derived from current advanced technologies such as mass spectrometry. The only drawback, however, is non-specific binding resulting from the presence of multiple proteins co-migrating in SDS gel bands [59].

1.5 Glycoproteomics characterization using Mass Spectrometry (MS) approaches

MS-based technologies are the gold standard for the identification, structural characterization and quantification of glycoproteins and glycopeptides in a biological mixture. This is because of the unique advantages of MS over other analytical methods, which include sensitivity, specificity, reproducibility, speed, multiplexing and high throughput.

The most common approach for structural characterization of proteins is the “shotgun” or bottom-up MS strategy. In the “shotgun” method, proteins isolated from a biological matrix are digested with a selected endopeptidase (proteolytic enzyme) in a gel-free or gel-based platform, followed by LC-MS/MS sequencing (Figure 1.1).

Figure 1.1: An illustration of proteomics bottom-up approach

In glycoproteomics, the shotgun strategy provides the platform from which information such as glycoprotein primary sequence, glycoforms and glycan sites of attachment is obtained. However, the successful application of this MS approach depends on three main considerations: the effectiveness of the selected proteolytic enzyme, the glycoprotein and glycopeptide enrichment procedure, and the mass spectrometry platform used to generate the glycoproteomic data. These considerations are discussed in detail in the following sub-sections.

1.5.1 Proteolytic enzymes selection

The choice of a proteolytic enzyme in glycoproteomics studies depends largely on the goal of the study. Trypsin, Lys-C and Glu-C are the predominant proteolytic enzymes used in “shotgun” glycoproteomics studies. Trypsin, the enzyme of choice in many glycoproteomics studies, cleaves on the carboxyl side of arginine (R) or lysine (K) residues when proline (P) does not follow in the peptide chain. Lys-C endopeptidase selectively cleaves on the carboxyl side of lysine (K) residues and is utilized when larger peptides are required. Glu-C, another frequently used endopeptidase, cleaves on the C-terminal side of aspartic or glutamic acid residues. These enzymes can be used either alone or in combination to generate peptides of appropriate length for increased glycoprotein sequence coverage, and the choice of enzyme(s) may improve peptide recovery and fragmentation to enhance structural identification [60-63].
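The cleavage rules described above can be expressed as a simple in silico digestion. This is a minimal sketch, not the search-engine configuration used in this work: the function name, rule table and test sequence are hypothetical, missed cleavages are not modeled, and the Glu-C rule follows the text (cleavage after both D and E), although in practice its specificity depends on the digestion buffer.

```python
import re

# Each pattern matches the zero-width position immediately after a cleaved
# residue, following the rules stated in the text.
CLEAVAGE_RULES = {
    "trypsin": re.compile(r"(?<=[KR])(?!P)"),  # after K or R, unless followed by P
    "lys-c": re.compile(r"(?<=K)"),            # after K
    "glu-c": re.compile(r"(?<=[DE])"),         # after D or E
}


def digest(sequence: str, enzyme: str) -> list[str]:
    """Return the peptides from a complete digest (no missed cleavages)."""
    rule = CLEAVAGE_RULES[enzyme.lower()]
    cut_points = [match.start() for match in rule.finditer(sequence)]
    bounds = [0] + cut_points + [len(sequence)]
    # Skip empty fragments, e.g. when the sequence ends with a cleaved residue.
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


# Hypothetical sequence used only for illustration (not a real protein).
example = "AGKPLRNDTEKSR"
for enzyme in ("trypsin", "lys-c", "glu-c"):
    print(enzyme, digest(example, enzyme))
```

In this hypothetical example, trypsin does not cut after the first lysine because proline follows it, whereas Lys-C does, which illustrates how the choice of enzyme changes the peptides available for sequencing.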

1.5.2 Glycoproteins and glycopeptides enrichment strategies

The identification and quantification of glyco-conjugates (glycoproteins, glycopeptides or glycans) are greatly improved when they are fractionated with specific target chemical molecules prior to MS analysis. This is because glyco-conjugates are heterogeneous and exhibit poor MS ionization, leading to ion suppression in a complex biological mixture. For instance, in reverse phase chromatography, the hydrophilic nature of glycopeptides makes them elute as a family; unlike unmodified peptides, they do not ionize well under electrospray ionization, and their signal is suppressed by co-eluting peptides [64,65]. Therefore, it is important to employ analytical strategies that reduce the effects of these challenges.

Antibody immunoprecipitation is one such method, in which a glycoprotein of interest is targeted and isolated from a biological mixture, followed by either glycopeptide enrichment before MS analysis or direct MS analysis. The complete isolation of a glycoprotein significantly reduces the effects of ion suppression in glycopeptide analysis. In addition to immunoprecipitation, several other strategies exist for the enrichment of glyco-conjugates. These include: (1) hydrophilic interaction liquid chromatography (HILIC), which isolates glycans; (2) hydrazide-based extraction, which isolates glycoproteins and glycopeptides on a hydrazide-coated stationary-phase support; (3) sialic acid enrichment methods, which use boronic acid and titanium dioxide (TiO2) to capture sialylated glycans; and (4) PNGase F-assisted glycan evaluation, wherein N-linked glycans are released from denatured glycoproteins using the PNGase F enzyme, followed by chromatographic separation and glycan detection and/or quantitation66-67.

1.5.3 Mass spectrometry platform

Mass spectrometry (MS) has emerged as an analytical tool that plays an important role in glyco-conjugate analysis because of the complexity and heterogeneity (i.e., multiple glycoforms, glycan sites, and glycan structure variants) of glyco-conjugates in complex bio-fluids (Figure 1.2).

Figure 1.2: An illustration of glycoprotein heterogeneity

During MS glycoproteomics studies, either glycans are released from glycoproteins with PNGase F and profiled, or glycopeptides from digested glycoproteins are sequenced by mass spectrometry. The glycan profiling approach provides information only about sugar structures; information such as the glycan attachment sites and the amounts of glycans at these sites is not obtained. Glycopeptide analysis is therefore often preferred because it provides information on both the glycan structure and the glycan attachment sites.

Although MS is widely used in glycosylation studies, the characterization of the glycoproteome is still challenging and requires optimization of the MS platform to achieve the desired results. Ionization and fragmentation are the two critical parameters to consider during the structural characterization of glycoproteins. Electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI) are the two main ionization strategies used for glycoproteomics characterization. ESI is a soft ionization technique that generates gas-phase ions at atmospheric pressure. ESI is frequently used because of its ability to produce multiply charged glycopeptide ions for MS analysis.

Liquid chromatography (LC) is the preferred means of separating glycopeptides and introducing them into the ESI source for subsequent mass spectrometry analysis. Additionally, LC provides a platform for sample de-salting, which improves glycopeptide ionization and quantitation68.

MALDI, another MS ionization method, uses dried analytes co-crystallized with a selected matrix. These dried samples are struck with laser pulses to generate ions, which are then transferred into the mass analyzer for detection. MALDI is also a soft ionization method but, unlike ESI, it generates predominantly singly charged ions and is suitable for intact glycoprotein analysis69. After ESI or MALDI ionization, several different mass analyzers can be used to analyze the positive or negative ions generated. Frequently used mass analyzers include the quadrupole, ion trap, time-of-flight (TOF), Orbitrap and Fourier transform ion cyclotron resonance (FT-ICR) analyzers70.
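The practical consequence of multiple charging in ESI, compared with the predominantly singly charged ions of MALDI, can be shown with the standard relationship m/z = (M + z × 1.007276)/z for a protonated ion [M + zH]z+. The Python sketch below is illustrative only; the neutral mass of 3000 Da is a hypothetical value and the function names are our own.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def mz_from_mass(neutral_mass, charge):
    """m/z of the [M + zH]z+ ion for a given neutral monoisotopic mass."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz, charge):
    """Neutral mass recovered from an observed m/z and charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)


if __name__ == "__main__":
    glycopeptide_mass = 3000.0  # hypothetical neutral mass in Da
    for z in (1, 2, 3, 4):
        print(z, round(mz_from_mass(glycopeptide_mass, z), 4))
```

Higher charge states bring a large glycopeptide into a lower m/z range, which is one reason multiply charged ESI ions are convenient for analyzers with a limited m/z window.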

Tandem mass spectrometry (MSn) with collision-induced dissociation (CID) fragmentation has emerged as the preferred approach to assign peptides and protein modifications71-75. Tandem MS can provide site-specific glycan heterogeneity information for glycoproteins as well as glycopeptide sequence information, and it allows site-specific glycoform comparison between diseased samples and associated healthy controls. The identification of site-specific glycosylation changes may shed light on glycoprotein function and improve glycan detection specificity76,77.

CID of glycopeptides occurs when precursor ions collide with neutral gas molecules in the collision cell, resulting in fragmentation of the glycosidic bonds of the glycan structure (Figure 1.3). Glycosidic bonds are preferentially fragmented during CID (CID-MS2) because they are weaker, and therefore break at lower energy, than the amide bonds of the peptide backbone78,79; this generates information on the glycan moiety. To derive the complete structural information, the peptide backbone is subsequently fragmented by CID-MS3 to confirm the structure of the glycopeptide.
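Because CID-MS2 mainly cleaves glycosidic bonds, neighbouring fragment peaks are often separated by the mass of a single monosaccharide residue. The Python sketch below labels such spacings using standard monoisotopic residue masses; the peak list, the tolerance and the function name are hypothetical, and real spectra would need charge-state and isotope handling beyond this simplified model.

```python
# Monoisotopic residue masses (Da) of common monosaccharides.
RESIDUE_MASS = {
    "Hex": 162.05282,     # hexose, e.g. mannose or galactose
    "HexNAc": 203.07937,  # N-acetylhexosamine
    "dHex": 146.05791,    # deoxyhexose, e.g. fucose
    "NeuAc": 291.09542,   # N-acetylneuraminic (sialic) acid
}


def annotate_losses(mz_values, charge=1, tol=0.02):
    """Label the spacing between consecutive peaks with a residue name.

    Peaks are sorted from high to low m/z; each spacing is multiplied by
    the assumed charge to obtain a mass difference. Spacings that match
    no residue within `tol` Da are labelled None.
    """
    ordered = sorted(mz_values, reverse=True)
    labels = []
    for high, low in zip(ordered, ordered[1:]):
        delta = (high - low) * charge
        match = next(
            (name for name, mass in RESIDUE_MASS.items()
             if abs(delta - mass) <= tol),
            None,
        )
        labels.append(match)
    return labels


if __name__ == "__main__":
    peaks = [2000.0, 1837.9472, 1634.8678]  # hypothetical singly charged ions
    print(annotate_losses(peaks))
```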

Figure 1.3: Collision induced dissociation (CID) fragmentation of a glycan moiety

Although this approach is effective and has been used extensively over the past decade, two major limitations still exist. First, CID generates complicated spectra, which makes glycopeptide data interpretation challenging. Second, only limited peptide backbone information is obtained during CID-MS2. Together, these often result in time-consuming and tedious data analysis, especially when several glycopeptides with multiple glycan attachment sites are involved80.

More recently, alternative fragmentation strategies have been introduced to overcome the limitations of CID; one such procedure is electron transfer dissociation (ETD). ETD operates by transferring electrons from reagent anions to the positively charged glycopeptide, thereby fragmenting the peptide backbone in ETD-MS2 while leaving the glycan moiety intact (Figure 1.4).

Figure 1.4: Electron transfer dissociation (ETD) and Collision induced dissociation (CID) glycopeptides fragmentation sites

Since its introduction, ETD has been used mainly to characterize protein post-translational modifications. Recent glycopeptide characterization studies use ETD together with CID to obtain glycan structure and glycopeptide backbone information simultaneously81-85.

1.6 Quantitative technologies in glycoproteomics

Global comparative studies involving MS-based quantitation are conducted to identify potential candidate glyco-markers in clinical samples. The main types of MS-based protein quantitation are stable isotope labeling, targeted MS-based methods and label-free quantitation.

1.6.1 Stable isotope quantitation

Stable isotope labeling methods are designed to measure differential protein expression either in vivo86 (cultured cells) or in vitro in glycoproteomics studies. In this approach, proteins or peptides are chemically labeled and detected by MS based on the resulting mass shift. Some frequently used chemical labels include isotope-coded affinity tags (ICAT), isobaric tags for relative and absolute quantitation (iTRAQ), mass-coded abundance tagging (MCAT) and permethylation87-90.

Stable isotope labeling together with tandem mass spectrometry allows quantitative evaluation of the glycoproteome and is effective for global profiling as well as comparative studies. For instance, Kang et al. reported the successful mapping of glycoforms using a combination of differential stable isotope labeling (CH3I/CD3I) and permethylation91. Furthermore, Zhang et al. investigated the mouse plasma glycoproteome before and after tumorigenesis by labeling the N-termini of captured glycopeptides with d0/d4 stable isotope tags92.
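The mass shift on which such labeling experiments rely can be estimated from the mass difference between deuterium and hydrogen (about 1.006277 Da per atom), so a d4 tag shifts a peptide by about 4.0251 Da and each CD3 group introduced in place of CH3 adds about 3.0188 Da. The Python sketch below pairs light and heavy peaks by this shift and reports their intensity ratio. It is a simplified illustration: the peak list is hypothetical, masses are treated as neutral or singly charged values, and the function names are our own rather than part of any published workflow.

```python
DEUTERIUM_MINUS_HYDROGEN = 2.014102 - 1.007825  # Da per H -> D substitution


def label_shift(n_deuterium):
    """Expected mass shift (Da) for a label carrying n deuterium atoms."""
    return n_deuterium * DEUTERIUM_MINUS_HYDROGEN


def heavy_light_ratios(peaks, shift, tol=0.01):
    """Find light/heavy pairs separated by `shift` and return their ratios.

    `peaks` is a list of (mass, intensity) tuples. The result is a list of
    (light_mass, heavy_intensity / light_intensity) tuples.
    """
    ratios = []
    for light_mass, light_intensity in peaks:
        if light_intensity <= 0:
            continue
        for heavy_mass, heavy_intensity in peaks:
            if abs((heavy_mass - light_mass) - shift) <= tol:
                ratios.append((light_mass, heavy_intensity / light_intensity))
    return ratios


if __name__ == "__main__":
    # Hypothetical d0/d4 pair plus an unrelated peak.
    peaks = [(1500.0, 100.0), (1504.0251, 250.0), (1600.0, 50.0)]
    print(heavy_light_ratios(peaks, label_shift(4)))
```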

1.6.2 Targeted-based MS quantitation

Targeted MS is another widely used approach to quantify the glycoproteome. The most popular targeted approach is selected or multiple reaction monitoring (SRM/MRM)93-95. In SRM, peptides or glycopeptides unique to a single protein are monitored during MS analysis by following predefined precursor-to-fragment ion transitions. This method allows the quantitation of purified proteins or of protein analytes buried in a biological mixture. Targeted MS methods offer several advantages, including sensitivity, specificity and excellent reproducibility96-97. In glycoproteomics, this approach is used to identify different glycoforms with similar chemical properties.
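The requirement that a monitored peptide be unique to a single protein can be checked computationally before an SRM assay is designed. The Python sketch below takes a mapping of proteins to their predicted peptides and keeps only peptides that occur in exactly one protein. The protein and peptide names are hypothetical, and a real selection would also consider detectability, modifications and the background proteome.

```python
from collections import defaultdict


def unique_peptides(digests):
    """Return, per protein, the peptides found in no other protein.

    `digests` maps a protein name to an iterable of peptide sequences.
    """
    owners = defaultdict(set)
    for protein, peptides in digests.items():
        for peptide in peptides:
            owners[peptide].add(protein)

    unique = defaultdict(list)
    for peptide, proteins in owners.items():
        if len(proteins) == 1:
            unique[next(iter(proteins))].append(peptide)
    return {protein: sorted(peps) for protein, peps in unique.items()}


if __name__ == "__main__":
    digests = {  # hypothetical digests
        "protein_A": ["PEPK", "SHAREDK"],
        "protein_B": ["SHAREDK", "ONLYBR"],
    }
    print(unique_peptides(digests))
```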

1.6.3 Label free quantitation

Some quantitation strategies use neither isotope labeling nor targeted measurement of proteins or peptides; these are called label-free methods. In the label-free approach, peptides present in the MS data are selected as internal standards and used for data normalization before biological sample sets are compared to identify differentially expressed proteins. The accuracy of this quantitation method depends predominantly on the MS response (MS spectral counts); hence, front-end sample preparation procedures before MS analysis are crucial and need comprehensive evaluation. Previous studies have demonstrated a linear correlation between protein abundance and MS spectral counts98-99. Peak area measurements of randomly selected peptide peaks are generally used as a complementary tool in addition to MS spectral counting for increased confidence in the analytical data100. Our research group has applied label-free quantitation in several biomarker studies to measure differential protein expression and has identified relevant proteins of clinical interest, which were verified by ELISA measurements101-102.
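The normalization and comparison steps of a label-free experiment can be sketched as follows. This Python example divides each spectral count by the count of an internal standard in the same sample and then computes disease-to-control fold changes. It is a minimal illustration under stated assumptions: the entry named "STD" and all counts are hypothetical, and the sketch does not reproduce the exact normalization used in the studies cited above.

```python
def normalize_counts(counts, standard):
    """Divide every spectral count by the internal standard's count."""
    reference = counts.get(standard, 0)
    if reference <= 0:
        raise ValueError("internal standard has no spectral counts")
    return {
        name: value / reference
        for name, value in counts.items()
        if name != standard
    }


def fold_changes(disease, control, standard):
    """Disease/control ratio of normalized counts for shared proteins."""
    d = normalize_counts(disease, standard)
    c = normalize_counts(control, standard)
    # Proteins absent (or zero) in the control are skipped, not divided by 0.
    return {name: d[name] / c[name] for name in d if c.get(name, 0) > 0}


if __name__ == "__main__":
    disease = {"STD": 20, "P1": 40, "P2": 10}  # hypothetical spectral counts
    control = {"STD": 10, "P1": 5, "P2": 5}
    print(fold_changes(disease, control, "STD"))
```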

1.7 Glyco-biomarker validation strategies

Enzyme-linked immunosorbent assay (ELISA), western blot assay and selected reaction monitoring (SRM) MS assay are the main validation protocols used to confirm the presence and levels of protein substances in glycoproteomics studies.

ELISA is an antibody-based method widely used as a diagnostic tool in the clinic because it is fast, sensitive, reproducible and easy to transfer103-104. In this assay, a color change indicates the presence of a protein in the biological sample, and the intensity of the color indicates its level. The platform usually involves an antibody immobilized onto the surface of a solid support, which captures the protein of interest from a sample matrix; the captured protein is subsequently detected with a substrate. The assay is typically performed in a 96-well microtiter plate format. ELISA is extremely sensitive and allows the quantitation of low amounts of protein, down to pg/mL concentration levels.

However, there are some major issues associated with ELISA measurements. These include non-specific binding, limited multiplexing capability, limited availability of antibodies and the long time required to raise antibodies. For instance, prostate-specific antigen (PSA), a popular plasma biomarker for prostate cancer, is well documented to have specificity issues, with reports of 15-27% false negatives and 61-78% false positives in diagnosis.

Western blot is frequently used either as an alternative or as a complement to ELISA measurements to detect proteins in a biological sample. In this approach, proteins are first separated on an SDS gel; a primary antibody with affinity for a specific protein is then used to probe for the protein of interest, followed by a secondary antibody (conjugated to horseradish peroxidase) for visualization and identification. This analytical technique is simple, fast and easy to perform105-107. As with ELISA, western blots have issues with specificity and the availability of antibodies. In addition, problems such as weak protein band signal, high background noise and variations in protein loading amounts further complicate the data obtained from this assay.

Selected reaction monitoring (SRM) MS is another validation strategy. Here, the intensity of representative peptides is measured and correlated with the concentration of the corresponding protein in disease vs. control clinical samples. The technology is superior to antibody-based assays (i.e., ELISA and western blot) because of its sensitivity and specificity.

1.8 Data Processing and Statistical Analysis of Glycoproteomics

The development of bioinformatics technology to analyze the large data sets generated from glycoproteomics biomarker discovery studies has increased significantly over the past decade. The use of these technologies, either alone or in combination, is project dependent; that is, the goal of a particular project determines which bioinformatics tools are best suited for data evaluation and validation. In this thesis, each chapter (chapters 2-5) uses the appropriate bioinformatics or statistical tools to generate the required information. The majority of these data processing tools are free and publicly available. The following tools were predominantly used during this thesis work: (1) Sequest (http://fields.scripps.edu/sequest/) and Mascot (http://www.matrixscience.com/search_form_select.html) algorithms for protein identification; (2) GlycoWorkbench (http://glycomics.ccrc.uga.edu/eurocarb/gwb/home.action) and Glycomod (http://web.expasy.org/glycomod/) tools for glycopeptide and/or glycan structure annotation; (3) analysis of variance (ANOVA) and the standard t-test for differential protein analysis between groups of MS data; (4) the PANTHER (Protein ANalysis THrough Evolutionary Relationships) database (http://pantherdb.org/) for protein classification; and (5) Gene A La Cart (provided by www..org) for establishing protein-disease relationships and gaining protein-protein interaction information. In all statistical analyses, a P-value <0.05 was considered significant for both protein and glycosylation differential expression levels.
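The significance criterion above can be illustrated with a two-sample Student's t-test. The Python sketch below uses only the standard library: it computes the pooled-variance t statistic and obtains the two-sided P-value by numerically integrating the t-distribution density with Simpson's rule. This is an illustrative stand-in, not the statistical software used in this thesis; the measurements are hypothetical, equal variances are assumed, and the numerical integration is adequate only for small data sets and moderate t values.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coef = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided P-value from Simpson integration of the density on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def student_t_test(a, b):
    """Pooled-variance two-sample t-test; returns (t, two-sided P-value)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    if pooled == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, two_sided_p(t, df)


if __name__ == "__main__":
    control = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical abundances
    disease = [6.0, 7.0, 8.0, 9.0, 10.0]
    t, p = student_t_test(control, disease)
    print(t, p, "significant" if p < 0.05 else "not significant")
```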

1.9 REFERENCES

1. Wilkins, M. R.; Sanchez, J. C.; Gooley, A. A.; Appel, R. D.; Humphery-Smith, I.; Hochstrasser, D. F.; Williams, K. L., Progress with proteome projects: why all proteins expressed by a genome should be identified and how to do it. Biotechnol Genet Eng Rev 1996, 13, 19-50.

2. Kim, E. H.; Misek, D. E., Glycoproteomics-based identification of cancer biomarkers. International Journal of Proteomics 2011, 2011, 601937.

3. Helenius, A.; Aebi, M., Intracellular functions of N-linked glycans. Science 2001, 291 (5512), 2364-9.

4. Apweiler, R.; Hermjakob, H.; Sharon, N., On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochimica et biophysica acta 1999, 1473 (1), 4-8.

5. Wong, C. H., Protein glycosylation: new challenges and opportunities. J Org Chem 2005, 70 (11), 4219-25.

6. Wei, X.; Li, L., Comparative glycoproteomics: approaches and applications. Brief Funct Genomic Proteomic 2009, 8 (2), 104-13.

7. Floyd, N.; Vijayakrishnan, B.; Koeppe, J. R.; Davis, B. G., Thiyl glycosylation of olefinic proteins: S-linked glycoconjugate synthesis. Angew Chem Int Ed Engl 2009, 48 (42), 7798-802.

8. Lote, C. J.; Weiss, J. B., Identification in urine of a low-molecular-weight highly polar glycopeptide containing cysteinyl-galactose. The Biochemical journal 1971, 123 (4), 25P.

9. Slawson, C.; Hart, G. W., Dynamic interplay between O-GlcNAc and O-phosphate: the sweet side of protein regulation. Curr Opin Struct Biol 2003, 13 (5), 631-6.

10. Rexach, J. E.; Clark, P. M.; Hsieh-Wilson, L. C., Chemical approaches to understanding O-GlcNAc glycosylation in the brain. Nat Chem Biol 2008, 4 (2), 97-106.

11. Hakomori, S., Tumor malignancy defined by aberrant glycosylation and sphingo(glyco)lipid metabolism. Cancer Research 1996, 56 (23), 5309-18.

12. Orntoft, T. F.; Vestergaard, E. M., Clinical aspects of altered glycosylation of glycoproteins in cancer. Electrophoresis 1999, 20 (2), 362-71.

13. Barrabes, S.; Pages-Pons, L.; Radcliffe, C. M.; Tabares, G.; Fort, E.; Royle, L.; Harvey, D. J.; Moenner, M.; Dwek, R. A.; Rudd, P. M.; De Llorens, R.; Peracaula, R., Glycosylation of serum ribonuclease 1 indicates a major endothelial origin and reveals an increase in core fucosylation in pancreatic cancer. Glycobiology 2007, 17 (4), 388-400.

14. Kuzmanov, U.; Jiang, N.; Smith, C. R.; Soosaipillai, A.; Diamandis, E. P., Differential N-glycosylation of kallikrein 6 derived from ovarian cancer cells or the central nervous system. Molecular & cellular proteomics : MCP 2009, 8 (4), 791-8.

15. Meany, D. L.; Zhang, Z.; Sokoll, L. J.; Zhang, H.; Chan, D. W., Glycoproteomics for prostate cancer detection: changes in serum PSA glycosylation patterns. Journal of Proteome Research 2009, 8 (2), 613-9.

16. Misonou, Y.; Shida, K.; Korekane, H.; Seki, Y.; Noura, S.; Ohue, M.; Miyamoto, Y., Comprehensive clinico-glycomic study of 16 colorectal cancer specimens: elucidation of aberrant glycosylation and its mechanistic causes in colorectal cancer cells. Journal of Proteome Research 2009, 8 (6), 2990-3005.

17. Mizuochi, T.; Nishimura, R.; Derappe, C.; Taniguchi, T.; Hamamoto, T.; Mochizuki, M.; Kobata, A., Structures of the asparagine-linked sugar chains of human chorionic gonadotropin produced in choriocarcinoma. Appearance of triantennary sugar chains and unique biantennary sugar chains. The Journal of biological chemistry 1983, 258 (23), 14126-9.

18. Moore, A.; Medarova, Z.; Potthast, A.; Dai, G., In vivo targeting of underglycosylated MUC-1 tumor antigen using a multimodal imaging probe. Cancer Research 2004, 64 (5), 1821-7.

19. Ohyama, C.; Hosono, M.; Nitta, K.; Oh-eda, M.; Yoshikawa, K.; Habuchi, T.; Arai, Y.; Fukuda, M., Carbohydrate structure and differential binding of prostate specific antigen to Maackia amurensis lectin between prostate cancer and benign prostate hypertrophy. Glycobiology 2004, 14 (8), 671-9.

20. Peracaula, R.; Tabares, G.; Royle, L.; Harvey, D. J.; Dwek, R. A.; Rudd, P. M.; de Llorens, R., Altered glycosylation pattern allows the distinction between prostate-specific antigen (PSA) from normal and tumor origins. Glycobiology 2003, 13 (6), 457-70.

21. Taylor, A. D.; Hancock, W. S.; Hincapie, M.; Taniguchi, N.; Hanash, S. M., Towards an integrated proteomic and glycomic approach to finding cancer biomarkers. Genome medicine 2009, 1 (6), 57.

22. Cooper, C. A.; Harrison, M. J.; Wilkins, M. R.; Packer, N. H., GlycoSuiteDB: a new curated relational database of glycoprotein glycan structures and their biological sources. Nucleic acids research 2001, 29 (1), 332-5.

23. Rudd, P. M.; Dwek, R. A., Glycosylation: heterogeneity and the 3D structure of proteins. Crit Rev Biochem Mol Biol 1997, 32 (1), 1-100.

24. Rudd, P. M.; Guile, G. R.; Kuster, B.; Harvey, D. J.; Opdenakker, G.; Dwek, R. A., Oligosaccharide sequencing technology. Nature 1997, 388 (6638), 205-7.

25. Brooks, S. A.; Carter, T. M.; Royle, L.; Harvey, D. J.; Fry, S. A.; Kinch, C.; Dwek, R. A.; Rudd, P. M., Altered glycosylation of proteins in cancer: what is the potential for new anti-tumour strategies. Anticancer Agents Med Chem 2008, 8 (1), 2-21.

26. Kobata, A., Altered glycosylation of surface glycoproteins in tumor cells and its clinical application. Pigment Cell Res 1989, 2 (4), 304-8.

27. Kobata, A.; Amano, J., Altered glycosylation of proteins produced by malignant cells, and application for the diagnosis and immunotherapy of tumours. Immunol Cell Biol 2005, 83 (4), 429-39.

28. Hammarstrom, S., The carcinoembryonic antigen (CEA) family: structures, suggested functions and expression in normal and malignant tissues. Semin Cancer Biol 1999, 9 (2), 67-81.

29. Moss, E. L.; Hollingworth, J.; Reynolds, T. M., The role of CA125 in clinical practice. Journal of Clinical Pathology 2005, 58 (3), 308-12.

30. Drake, P. M.; Cho, W.; Li, B.; Prakobphol, A.; Johansen, E.; Anderson, N. L.; Regnier, F. E.; Gibson, B. W.; Fisher, S. J., Sweetening the pot: adding glycosylation to the biomarker discovery equation. Clinical Chemistry 2010, 56 (2), 223-36.

31. Jankovic, M. M.; Milutinovic, B. S., Glycoforms of CA125 antigen as a possible cancer marker. Cancer Biomark 2008, 4 (1), 35-42.

32. van Gisbergen, K. P.; Aarnoudse, C. A.; Meijer, G. A.; Geijtenbeek, T. B.; van Kooyk, Y., Dendritic cells recognize tumor-specific glycosylation of carcinoembryonic antigen on colorectal cancer cells through dendritic cell-specific intercellular adhesion molecule-3-grabbing nonintegrin. Cancer Research 2005, 65 (13), 5935-44.

33. Zaia, J., Mass spectrometry and glycomics. Omics : a journal of integrative biology 2010, 14 (4), 401-18.

34. Knezevic, A.; Polasek, O.; Gornik, O.; Rudan, I.; Campbell, H.; Hayward, C.; Wright, A.; Kolcic, I.; O'Donoghue, N.; Bones, J.; Rudd, P. M.; Lauc, G., Variability, heritability and environmental determinants of human plasma N-glycome. Journal of Proteome Research 2009, 8 (2), 694-701.

35. Chen, R.; Pan, S.; Aebersold, R.; Brentnall, T. A., Proteomics studies of pancreatic cancer. Proteomics. Clinical applications 2007, 1 (12), 1582-1591.

36. Hanash, S. M.; Pitteri, S. J.; Faca, V. M., Mining the plasma proteome for cancer biomarkers. Nature 2008, 452 (7187), 571-9.

37. Chen, R.; Yi, E. C.; Donohoe, S.; Pan, S.; Eng, J.; Cooke, K.; Crispin, D. A.; Lane, Z.; Goodlett, D. R.; Bronner, M. P.; Aebersold, R.; Brentnall, T. A., Pancreatic cancer proteome: the proteins that underlie invasion, metastasis, and immunologic escape. Gastroenterology 2005, 129 (4), 1187-97.

38. DeSouza, L.; Diehl, G.; Rodrigues, M. J.; Guo, J.; Romaschin, A. D.; Colgan, T. J.; Siu, K. W., Search for cancer markers from endometrial tissues using differentially labeled tags iTRAQ and cICAT with multidimensional liquid chromatography and tandem mass spectrometry. Journal of Proteome Research 2005, 4 (2), 377-86.

39. Li, C.; Hong, Y.; Tan, Y. X.; Zhou, H.; Ai, J. H.; Li, S. J.; Zhang, L.; Xia, Q. C.; Wu, J. R.; Wang, H. Y.; Zeng, R., Accurate qualitative and quantitative proteomic analysis of clinical hepatocellular carcinoma using laser capture microdissection coupled with isotope-coded affinity tag and two-dimensional liquid chromatography mass spectrometry. Molecular & cellular proteomics : MCP 2004, 3 (4), 399-409.

40. Hu, S.; Loo, J. A.; Wong, D. T., Human body fluid proteome analysis. Proteomics 2006, 6 (23), 6326-53.

41. Mauri, P.; Scarpa, A.; Nascimbeni, A. C.; Benazzi, L.; Parmagnani, E.; Mafficini, A.; Della Peruta, M.; Bassi, C.; Miyazaki, K.; Sorio, C., Identification of proteins released by pancreatic cancer cells by multidimensional protein identification technology: a strategy for identification of novel cancer markers. Faseb J 2005, 19 (9), 1125-7.

42. Kulasingam, V.; Diamandis, E. P., Proteomics analysis of conditioned media from three breast cancer cell lines: a mine for biomarkers and therapeutic targets. Molecular & cellular proteomics : MCP 2007, 6 (11), 1997-2011.

43. Shen, Z.; Want, E. J.; Chen, W.; Keating, W.; Nussbaumer, W.; Moore, R.; Gentle, T. M.; Siuzdak, G., Sepsis plasma protein profiling with immunodepletion, three-dimensional liquid chromatography tandem mass spectrometry, and spectrum counting. Journal of Proteome Research 2006, 5 (11), 3154-60.

44. Tu, C.; Rudnick, P. A.; Martinez, M. Y.; Cheek, K. L.; Stein, S. E.; Slebos, R. J.; Liebler, D. C., Depletion of abundant plasma proteins and limitations of plasma proteomics. Journal of Proteome Research 2010, 9 (10), 4982-91.

45. Lis, H.; Sharon, N.; Katchalski, E., Identification of the carbohydrate-protein linking group in soybean hemagglutinin. Biochimica et biophysica acta 1969, 192 (2), 364-6.

46. Aspberg, K.; Porath, J., Group-specific adsorption of glycoproteins. Acta Chem Scand 1970, 24 (5), 1839-41.

47. Cuatrecasas, P.; Tell, G. P., Insulin-like activity of concanavalin A and wheat germ agglutinin--direct interactions with insulin receptors. Proc Natl Acad Sci U S A 1973, 70 (2), 485-9.

48. Kullolli, M.; Hancock, W. S.; Hincapie, M., Preparation of a high-performance multi-lectin affinity chromatography (HP-M-LAC) adsorbent for the analysis of human plasma glycoproteins. Journal of separation science 2008, 31 (14), 2733-9.

49. Johansen, E.; Schilling, B.; Lerch, M.; Niles, R. K.; Liu, H.; Li, B.; Allen, S.; Hall, S. C.; Witkowska, H. E.; Regnier, F. E.; Gibson, B. W.; Fisher, S. J.; Drake, P. M., A lectin HPLC method to enrich selectively-glycosylated peptides from complex biological samples. J Vis Exp 2009, (32).

50. Fanayan, S.; Hincapie, M.; Hancock, W. S., Using lectins to harvest the plasma/serum glycoproteome. Electrophoresis 2012, 33 (12), 1746-54.

51. Cummings, R. D.; Kornfeld, S., Fractionation of asparagine-linked oligosaccharides by serial lectin-Agarose affinity chromatography. A rapid, sensitive, and specific technique. The Journal of biological chemistry 1982, 257 (19), 11235-40.

52. Qiu, R.; Regnier, F. E., Comparative glycoproteomics of N-linked complex-type glycoforms containing sialic acid in human serum. Analytical Chemistry 2005, 77 (22), 7225-31.

53. Yang, Z.; Hancock, W. S., Approach to the comprehensive analysis of glycoproteins isolated from human serum using a multi-lectin affinity column. Journal of chromatography. A 2004, 1053 (1-2), 79-88.

54. Abbott, K. L.; Aoki, K.; Lim, J. M.; Porterfield, M.; Johnson, R.; O'Regan, R. M.; Wells, L.; Tiemeyer, M.; Pierce, M., Targeted glycoproteomic identification of biomarkers for human breast carcinoma. Journal of Proteome Research 2008, 7 (4), 1470-80.

55. Qiu, Y.; Patwa, T. H.; Xu, L.; Shedden, K.; Misek, D. E.; Tuck, M.; Jin, G.; Ruffin, M. T.; Turgeon, D. K.; Synal, S.; Bresalier, R.; Marcon, N.; Brenner, D. E.; Lubman, D. M., Plasma glycoprotein profiling for colorectal cancer biomarker identification by lectin glycoarray and lectin blot. Journal of Proteome Research 2008, 7 (4), 1693-703.

56. Vercoutter-Edouart, A. S.; Slomianny, M. C.; Dekeyzer-Beseme, O.; Haeuw, J. F.; Michalski, J. C., Glycoproteomics and glycomics investigation of membrane N-glycosylproteins from human colon carcinoma cells. Proteomics 2008, 8 (16), 3236-56.

57. Zhao, J.; Patwa, T. H.; Qiu, W.; Shedden, K.; Hinderer, R.; Misek, D. E.; Anderson, M. A.; Simeone, D. M.; Lubman, D. M., Glycoprotein microarrays with multi-lectin detection: unique lectin binding patterns as a tool for classifying normal, chronic pancreatitis and pancreatic cancer sera. Journal of Proteome Research 2007, 6 (5), 1864-74.

58. Liu, Y.; He, J.; Li, C.; Benitez, R.; Fu, S.; Marrero, J.; Lubman, D. M., Identification and confirmation of biomarkers using an integrated platform for quantitative analysis of glycoproteins and their glycosylations. Journal of Proteome Research 2010, 9 (2), 798-805.

59. Alley, W. R., Jr.; Mann, B. F.; Novotny, M. V., High-sensitivity analytical approaches for the structural characterization of glycoproteins. Chem Rev 2013, 113 (4), 2668-732.

60. Kaji, H.; Saito, H.; Yamauchi, Y.; Shinkawa, T.; Taoka, M.; Hirabayashi, J.; Kasai, K.; Takahashi, N.; Isobe, T., Lectin affinity capture, isotope-coded tagging and mass spectrometry to identify N-linked glycoproteins. Nat Biotechnol 2003, 21 (6), 667-72.

61. Zhang, Z.; Pan, H.; Chen, X., Mass spectrometry for structural characterization of therapeutic antibodies. Mass Spectrom Rev 2009, 28 (1), 147-76.

62. Taouatas, N.; Drugan, M. M.; Heck, A. J.; Mohammed, S., Straightforward ladder sequencing of peptides using a Lys-N metalloendopeptidase. Nature methods 2008, 5 (5), 405-7.

63. Choudhary, G.; Wu, S. L.; Shieh, P.; Hancock, W. S., Multiple enzymatic digestion for enhanced sequence coverage of proteins in complex proteomic mixtures using capillary LC with ion trap MS/MS. Journal of Proteome Research 2003, 2 (1), 59-67.

64. Wuhrer, M.; Catalina, M. I.; Deelder, A. M.; Hokke, C. H., Glycoproteomics based on tandem mass spectrometry of glycopeptides. J Chromatogr B Analyt Technol Biomed Life Sci 2007, 849 (1-2), 115-28.

65. Wada, Y.; Azadi, P.; Costello, C. E.; Dell, A.; Dwek, R. A., Comparison of the methods for profiling glycoprotein glycans--HUPO Human Disease Glycomics/Proteome Initiative multi-institutional study. Glycobiology 2007, 17 (4), 411-22.

66. Biringer, R. G.; Amato, H.; Harrington, M. G.; Fonteh, A. N.; Riggins, J. N.; Huhmer, A. F., Enhanced sequence coverage of proteins in human cerebrospinal fluid using multiple enzymatic digestion and linear ion trap LC-MS/MS. Brief Funct Genomic Proteomic 2006, 5 (2), 144-53.

67. Atwood, J. A.; Minning, T.; Ludolf, F.; Nuccio, A.; Weatherly, D. B.; Alvarez-Manilla, G.; Tarleton, R.; Orlando, R., Glycoproteomics of Trypanosoma cruzi trypomastigotes using subcellular fractionation, lectin affinity, and stable isotope labeling. Journal of Proteome Research 2006, 5 (12), 3376-84.

68. Ghesquiere, B.; Van Damme, J.; Martens, L.; Vandekerckhove, J.; Gevaert, K., Proteome-wide characterization of N-glycosylation events by diagonal chromatography. Journal of Proteome Research 2006, 5 (9), 2438-47.

69. Hoffmann, E. d.; Stroobant, V., Mass Spectrometry: Principles and Applications. 3 ed.; Wiley-Interscience: 2007; p 43-54, 33-39, 85-167.

68. Yamagaki, T.; Nakanishi, H., A new technique distinguishing alpha2-3 sialyl linkage from alpha2-6 linkage in sialyllactoses and sialyl-N-acetyllactosamines by post-source decay fragmentation method of MALDI-TOF mass spectrometry. Glycoconj J 1999, 16 (8), 385-9.

70. Itoh, S.; Kawasaki, N.; Harazono, A.; Hashii, N.; Matsuishi, Y.; Kawanishi, T.; Hayakawa, T., Characterization of a gel-separated unknown glycoprotein by liquid chromatography/multistage tandem mass spectrometry: analysis of rat brain Thy-1 separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Journal of chromatography. A 2005, 1094 (1-2), 105-17.

71. Harazono, A.; Kawasaki, N.; Itoh, S.; Hashii, N.; Ishii-Watabe, A.; Kawanishi, T.; Hayakawa, T., Site-specific N-glycosylation analysis of human plasma ceruloplasmin using liquid chromatography with electrospray ionization tandem mass spectrometry. Analytical biochemistry 2006, 348 (2), 259-68.

72. Harazono, A.; Kawasaki, N.; Kawanishi, T.; Hayakawa, T., Site-specific glycosylation analysis of human apolipoprotein B100 using LC/ESI MS/MS. Glycobiology 2005, 15 (5), 447-62.

73. Ito, H.; Takegawa, Y.; Deguchi, K.; Nagai, S.; Nakagawa, H.; Shinohara, Y.; Nishimura, S., Direct structural assignment of neutral and sialylated N-glycans of glycopeptides using collision-induced dissociation MSn spectral matching. Rapid Commun Mass Spectrom 2006, 20 (23), 3557-65.

74. Deguchi, K.; Ito, H.; Takegawa, Y.; Shinji, N.; Nakagawa, H.; Nishimura, S., Complementary structural information of positive- and negative-ion MSn spectra of glycopeptides with neutral and sialylated N-glycans. Rapid Commun Mass Spectrom 2006, 20 (5), 741-6.

75. Temporini, C.; Perani, E.; Calleri, E.; Dolcini, L.; Lubda, D.; Caccialanza, G.; Massolini, G., Pronase-immobilized enzyme reactor: an approach for automation in glycoprotein analysis by LC/LC-ESI/MSn. Analytical Chemistry 2007, 79 (1), 355-63.

76. Du, M. Q.; Hutchinson, W. L.; Johnson, P. J.; Williams, R., Differential alpha-fetoprotein lectin binding in hepatocellular carcinoma. Diagnostic utility at low serum levels. Cancer 1991, 67 (2), 476-80.

77. Jensen, P. H.; Karlsson, N. G.; Kolarich, D.; Packer, N. H., Structural analysis of N- and O-glycans released from glycoproteins. Nature Protocols 2012, 7 (7), 1299-310.

78. Conboy, J. J.; Henion, J. D., The determination of glycopeptides by liquid chromatography/mass spectrometry with collision-induced dissociation. J. Am. Soc. Mass Spectrom. 1992, 3, 804.

79. Huddleston, M. J.; Bean, M. F.; Carr, S. A., Collisional fragmentation of glycopeptides by electrospray ionization LC/MS and LC/MS/MS: methods for selective detection of glycopeptides in protein digests. Analytical Chemistry 1993, 65, 877.

80. Renfrow, M. B.; Mackay, C. L.; Chalmers, M. J.; Julian, B. A.; Mestecky, J.; Kilian, M.; Poulsen, K.; Emmett, M. R.; Marshall, A. G.; Novak, J., Analysis of O-glycan heterogeneity in IgA1 myeloma proteins by Fourier transform ion cyclotron resonance mass spectrometry: implications for IgA nephropathy. Analytical and Bioanalytical Chemistry 2007, 389 (5), 1397-407.

81. Wu, S. L.; Huhmer, A. F.; Hao, Z.; Karger, B. L., On-line LC-MS approach combining collision-induced dissociation (CID), electron-transfer dissociation (ETD), and CID of an isolated charge-reduced species for the trace-level characterization of proteins with post-translational modifications. Journal of Proteome Research 2007, 6 (11), 4230-44.

82. Zhang, Q.; Tang, N.; Brock, J. W.; Mottaz, H. M.; Ames, J. M.; Baynes, J. W.; Smith, R. D.; Metz, T. O., Enrichment and analysis of nonenzymatically glycated peptides: boronate affinity chromatography coupled with electron-transfer dissociation mass spectrometry. Journal of Proteome Research 2007, 6 (6), 2323-30.

83. Wuhrer, M.; Stam, J. C.; van de Geijn, F. E.; Koeleman, C. A.; Verrips, C. T.; Dolhain, R. J.; Hokke, C. H.; Deelder, A. M., Glycosylation profiling of immunoglobulin G (IgG) subclasses from human serum. Proteomics 2007, 7 (22), 4070-81.

84. Catalina, M. I.; Koeleman, C. A.; Deelder, A. M.; Wuhrer, M., Electron transfer dissociation of N-glycopeptides: loss of the entire N-glycosylated asparagine side chain. Rapid Commun Mass Spectrom 2007, 21 (6), 1053-61.

85. Deguchi, K.; Ito, H.; Baba, T.; Hirabayashi, A.; Nakagawa, H.; Fumoto, M.; Hinou, H.; Nishimura, S., Structural analysis of O-glycopeptides employing negative- and positive-ion multi-stage mass spectra obtained by collision-induced and electron-capture dissociations in linear ion trap time-of-flight mass spectrometry. Rapid Commun Mass Spectrom 2007, 21 (5), 691-8.

86. Ong, S. E.; Blagoev, B.; Kratchmarova, I.; Kristensen, D. B.; Steen, H.; Pandey, A.; Mann, M., Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Molecular & cellular proteomics : MCP 2002, 1 (5), 376-86.

87. Shiio, Y.; Aebersold, R., Quantitative proteome analysis using isotope-coded affinity tags and mass spectrometry. Nature Protocols 2006, 1 (1), 139-45.

88. Cagney, G.; Emili, A., De novo peptide sequencing and quantitative profiling of complex protein mixtures using mass-coded abundance tagging. Nat Biotechnol 2002, 20 (2), 163-70.

89. Ross, P. L.; Huang, Y. N.; Marchese, J. N.; Williamson, B.; Parker, K.; Hattan, S.; Khainovski, N.; Pillai, S.; Dey, S.; Daniels, S.; Purkayastha, S.; Juhasz, P.; Martin, S.; Bartlet-Jones, M.; He, F.; Jacobson, A.; Pappin, D. J., Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Molecular & cellular proteomics : MCP 2004, 3 (12), 1154-69.

90. Qian, M.; Sleat, D. E.; Zheng, H.; Moore, D.; Lobel, P., Proteomics analysis of serum from mutant mice reveals lysosomal proteins selectively transported by each of the two mannose 6-phosphate receptors. Molecular & cellular proteomics : MCP 2008, 7 (1), 58-70.

91. Kang, P.; Mechref, Y.; Kyselova, Z.; Goetz, J. A.; Novotny, M. V., Comparative glycomic mapping through quantitative permethylation and stable-isotope labeling. Analytical Chemistry 2007, 79 (16), 6064-73.

92. Zhang, H.; Yi, E. C.; Li, X. J.; Mallick, P.; Kelly-Spratt, K. S.; Masselon, C. D.; Camp, D. G., 2nd; Smith, R. D.; Kemp, C. J.; Aebersold, R., High throughput quantitative analysis of serum proteins using glycopeptide capture and liquid chromatography mass spectrometry. Molecular & cellular proteomics : MCP 2005, 4 (2), 144-55.

93. Anderson, L.; Hunter, C. L., Quantitative mass spectrometric multiple reaction monitoring assays for major plasma proteins. Molecular & cellular proteomics : MCP 2006, 5 (4), 573-88.

94. Rifai, N.; Gillette, M. A.; Carr, S. A., Protein biomarker discovery and validation: the long and uncertain path to clinical utility. Nat Biotechnol 2006, 24 (8), 971-83.

95. Keshishian, H.; Addona, T.; Burgess, M.; Kuhn, E.; Carr, S. A., Quantitative, multiplexed assays for low abundance proteins in plasma by targeted mass spectrometry and stable isotope dilution. Molecular & cellular proteomics : MCP 2007, 6 (12), 2212-29.

96. Serrano, E.; Pozo, O. J.; Beltran, J.; Hernandez, F.; Font, L.; Miquel, M.; Aragon, C. M., Liquid chromatography/tandem mass spectrometry determination of (4S,2RS)-2,5,5-trimethylthiazolidine-4-carboxylic acid, a stable adduct formed between D-(-)-penicillamine and acetaldehyde (main biological metabolite of ethanol), in plasma, liver and brain rat tissues. Rapid Commun Mass Spectrom 2007, 21 (7), 1221-9.

97. Wang, S.; Gong, T.; Lu, J.; Kano, Y.; Yuan, D., Simultaneous determination of tectorigenin and its metabolites in rat plasma by ultra performance liquid chromatography/quadrupole time-of-flight mass spectrometry. J Chromatogr B Analyt Technol Biomed Life Sci 2013, 933, 50-8.

98. Old, W. M.; Meyer-Arendt, K.; Aveline-Wolf, L.; Pierce, K. G.; Mendoza, A.; Sevinsky, J. R.; Resing, K. A.; Ahn, N. G., Comparison of label-free methods for quantifying human proteins by shotgun proteomics. Molecular & cellular proteomics : MCP 2005, 4 (10), 1487-502.

99. Thielmann, M.; Massoudy, P.; Schmermund, A.; Neuhauser, M.; Marggraf, G.; Kamler, M.; Herold, U.; Aleksic, I.; Mann, K.; Haude, M.; Heusch, G.; Erbel, R.; Jakob, H., Diagnostic discrimination between graft-related and non-graft-related perioperative myocardial infarction with cardiac troponin I after coronary artery bypass surgery. Eur Heart J 2005, 26 (22), 2440-7.

100. Bondarenko, P. V.; Chelius, D.; Shaler, T. A., Identification and relative quantitation of protein mixtures by enzymatic digestion followed by capillary reversed-phase liquid chromatography-tandem mass spectrometry. Analytical Chemistry 2002, 74 (18), 4741-9.

101. Wang, W.; Zhou, H.; Lin, H.; Roy, S.; Shaler, T. A.; Hill, L. R.; Norton, S.; Kumar, P.; Anderle, M.; Becker, C. H., Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Analytical Chemistry 2003, 75 (18), 4818-26.

102. Plavina, T.; Wakshull, E.; Hancock, W. S.; Hincapie, M., Combination of abundant protein depletion and multi-lectin affinity chromatography (M-LAC) for plasma protein biomarker discovery. Journal of Proteome Research 2007, 6 (2), 662-71.

103. Lequin, R. M., Enzyme immunoassay (EIA)/enzyme-linked immunosorbent assay (ELISA). Clinical Chemistry 2005, 51 (12), 2415-8.

104. Yalow, R. S.; Berson, S. A., Immunoassay of endogenous plasma insulin in man. J Clin Invest 1960, 39, 1157-75.

105. Burnette, W. N., "Western blotting": electrophoretic transfer of proteins from sodium dodecyl sulfate--polyacrylamide gels to unmodified nitrocellulose and radiographic detection with antibody and radioiodinated protein A. Analytical biochemistry 1981, 112 (2), 195-203.

106. Ochiai, H., [Electric transfer of peptides from acrylamide gels to nitrocellulose sheets: Procedure and some applications of "Western blotting" (author's transl)]. Seikagaku 1982, 54 (2), 107-9.

107. Sharma, P.; Ganguly, N. K.; Sehgal, R.; Srivastava, R. K., Western blotting. Trop Gastroenterol 1989, 10 (1), 62-8.

CHAPTER 2

DEVELOPMENT OF AN IMPROVED FRACTIONATION OF THE HUMAN PLASMA PROTEOME BY A COMBINATION OF ABUNDANT PROTEINS DEPLETION AND MULTI-LECTIN AFFINITY CHROMATOGRAPHY

This work is currently under review for publication: Francisca O. Gbormittah, Marina Hincapie, William S. Hancock, Bioanalysis, 2014

2.1 ABSTRACT

Current analytical tools lack the capacity to reduce the complexity of the plasma proteome and to identify low-level proteins of clinical interest. There is therefore an urgent need for a sample fractionation approach that (1) provides adequate throughput for a clinical study and (2) minimizes the loss, and improves the detection, of low abundance proteins.

Here, we present the development of a multi-dimensional sample fractionation platform that combines depletion of the top 12 high abundance proteins (12P) with multi-lectin affinity chromatography (M-LAC) enrichment of the glycoproteome from human plasma (n=10 females). We observed a total protein recovery of 91.8% (%CV=2.6) of the starting material after fractionation of the human plasma. We validated the specificity of both the 12P column (97% depletion efficiency for target proteins) and the M-LAC column (95% of identified proteins are glycoproteins). Further, we show that the 12P-M-LAC platform is highly stable (≥175 independent runs) and robust (r² = 0.9845 from replicate runs). Improved enrichment of low abundance proteins and glycoproteins was achieved, demonstrating the suitability of our analytical platform (12P-M-LAC) for future biomarker discovery studies.

2.2 INTRODUCTION

Plasma, the most common bio-specimen, is known as the “reservoir” of soluble proteins. Human plasma is used in early biomarker discovery studies because it is easily obtained and contains potential candidate biomarkers secreted or leached into the blood stream. However, characterizing proteins from human plasma to identify potential markers is a difficult task because of the large dynamic range of protein abundance (> 10 orders of magnitude) and the variety of protein post-translational modifications 1. Current technology based on one-dimensional fractionation followed by liquid chromatography tandem mass spectrometry (LC-MS/MS) is inadequate to detect low-level plasma proteins, which have the potential to inform on a disease state. It has therefore become necessary to reduce the complexity of plasma to improve the identification of disease biomarkers that are present in low amounts. In this project, the complexity was reduced by depleting human plasma with an immuno-affinity depletion column, followed by glycoprotein fractionation on a multi-lectin column with selectivity for cancer-associated glycan changes; the resulting fractions were then analyzed by LC-MS/MS.

Human serum albumin (HSA) and immunoglobulins (IgGs) are the two most abundant proteins in human plasma and represent more than three-quarters of the total protein content (75-80%). Specifically, the top 20 high abundance proteins, including albumin, immunoglobulins, haptoglobin, transferrin, fibrinogen, apolipoprotein A-1, alpha-2-macroglobulin, alpha-1 acid glycoprotein and alpha-1 anti-trypsin, make up about 97% of the total protein content in plasma, leaving less than 3% for low abundance proteins. Accurate detection of low-level proteins and/or glycoproteins of disease significance therefore relies on effective sample preparation procedures. Several multi-dimensional methodologies have been devised to reduce the complexity of human plasma for comparative proteomics studies and to identify potential biomarker candidates. The most widely used of these methods include immuno-affinity depletion of high abundance proteins, 1- or 2-dimensional gel electrophoresis of proteins, chromatographic fractionation of proteins or peptides, and enrichment of modified proteins or peptides 2. These methods are used either alone or in combination to achieve the proteomics goals of a particular biomarker study.

A number of immuno-depletion columns are reported to remove high abundance proteins. These include albumin and immunoglobulin depletion columns 3, 4; the Multiple Affinity Removal System (MARS), which targets the depletion of the top 6, 7 or 14 high abundance proteins 5-7; and Sigma-Aldrich’s ProteoPrep20 immuno-depletion column, which targets the top 20 high abundance proteins 8. These immuno-affinity columns have been applied successfully in several proteomics studies. A drawback associated with their use, however, is the loss of low-level proteins that may be of potential interest. This loss occurs as a result of non-specific binding and protein-protein interactions with carrier proteins during the depletion process 9-11.

Several studies have indicated that glycosylated proteins are among the most important sub-class of low abundance plasma proteins, in which alterations may be an indication of a disease state 12-15. Protein glycosylation is a common post-translational modification in eukaryotic organisms, and proteomic research targeting its association with, and role in, various disease states including cancer is of importance. Recently, advanced analytical strategies have been applied to study the glycoproteome of disease samples to identify glycoproteins and their glycans as specific molecular biomarkers 16. One of these analytical technologies is the use of lectins in an affinity chromatography platform to capture proteins that express specific carbohydrate structures 2,17. Lectins are molecules that bind reversibly to sugar moieties attached to proteins, and their application either individually (S-LAC) or as a mixture (M-LAC) has been used to explore oligosaccharide structural alterations in disease samples 18,19.

Previously, our laboratory reported the effective use of abundant protein depletion combined with multi-lectin chromatography to investigate plasma from patients with medium to severe psoriasis. Proteins with differential abundance at low µg/mL levels were detected 2. The depletion column used in that study targeted albumin and immunoglobulins, the top 2 high abundance proteins. Additionally, the M-LAC column was composed of agarose-supported lectins: concanavalin A, wheat germ agglutinin, and jacalin. This strategy was further automated to integrate immuno-affinity depletion of high abundance proteins with a high-pressure POROS-packed M-LAC column 20,21. The automated platform has been applied in a number of biomarker discovery studies and has led to the identification of potential candidate markers in various cancer samples 22-24.

The focus of this work is to advance the multi-lectin affinity chromatography platform to incorporate the targeted depletion of the top 12 high abundance proteins using a highly stable depletion resin, followed by M-LAC enrichment of low abundance proteins and glycoproteins. The M-LAC column in this platform contains a mixture of three lectins, Sambucus nigra (SNA), Aleuria aurantia (AAL), and Phytohemagglutinin-L (PHA-L), each having a unique specificity towards a carbohydrate structure. SNA targets NeuAc α2–6Gal/GalNAc sugar structures (sialic acid linked to galactose or N-acetylgalactosamine); AAL captures glycoproteins bearing Fuc α1–6GlcNAc (core fucose) residues; and PHA-L has affinity for β1,6 branched glycans (including terminal galactose and mannose residues). These lectins were selected based on commonly observed glycan alterations in cancer and other malignancies 14, 25-27.

In this chapter, we report the development of an effective and robust platform that integrates immuno-affinity depletion of 12 high abundance proteins with M-LAC fractionation of human plasma (Figure 2.1). BCA assay measurements, 1D SDS-PAGE and tandem mass spectrometry enabled the validation of the platform. We recorded an average total protein recovery of 91.8% of the starting material and an average Pearson coefficient (r²) of 0.9845 for spectral counts from replicate runs. These results demonstrate the reproducibility and efficiency of the platform. Low abundance glycoproteins and non-glycoproteins were enriched and were observed to segregate into sub-populations, highlighting the effectiveness of the developed 12P-M-LAC platform.

2.3 MATERIALS AND METHODS

2.3.1 Materials

CaptureSelect® 12P depletion resin and PEEK columns were purchased from Life Technologies, Milford, MA. The gravity Omnifit glass column was provided by Biochem Fluidics, Boonton, New Jersey. POROS 20-AL beads for lectin conjugation, POROS® R1 50 µm Bulk Media (reversed phase packing), and an HPLC self-packing device were purchased from Applied Biosystems, Framingham, MA. All lectins were obtained from Vector Laboratories, Burlingame, CA. Sequencing grade modified trypsin was purchased from Promega (Madison, WI). 4−12% Bis-Tris sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) gels and NuPAGE MES SDS running buffer (10X) were purchased from Invitrogen, Carlsbad, CA. HPLC-MS grade water, formic acid, acetonitrile and other buffer reagents were all purchased from Thermo Fisher Scientific (Waltham, MA).

2.3.2 Samples for study

In this study we used reference plasma samples (n=10 females) purchased from Bioreclamation (Jericho, NY) and evaluated the platform at the following check points: loading capacity, column recoveries and platform reproducibility. Particulates and fatty materials present in the reference plasma were first removed by preparing a 1:1 mixture of reference plasma and 20 mM phosphate buffer, pH 7.4, in a 1.5 mL centrifuge tube. After brief vortexing, the mixture was centrifuged for 15 minutes at 6,000 x g. The supernatant was collected and buffer-exchanged into 12P-M-LAC binding buffer (25 mM Tris, 0.5 M NaCl, 1 mM MnCl2, 1 mM CaCl2 and 0.05% sodium azide, pH 7.4) using a conditioned (equal ratio water/ethanol mixture) 3 kDa MWCO membrane filter (Millipore, Billerica, MA). Purified samples were then stored in 100 µL aliquots at -80 °C.

2.3.3 Experimental design

The design for reference plasma processing is summarized in the flow diagram in Figure 2.1. Ten female reference plasma samples were pooled into one large volume, which enabled analysis of both analytical and technical replicates. To keep experimental results consistent and to conserve plasma sample integrity, no aliquot underwent more than two freeze/thaw cycles after removal from the -80 °C freezer.

Figure 2.1: An illustration of the experimental workflow of the 12P-M-LAC analytical platform. 25 µL of reference plasma was depleted of 12 high abundance proteins using an immuno-affinity depletion resin packed in a PEEK column, followed by glycoprotein fractionation with a multi-lectin column.

2.3.4 Preparation of 12P, M-LAC and reverse phase HPLC columns

The twelve-protein (12P) immuno-affinity depletion column was prepared by making a 50:50 slurry of the resin in M-LAC binding buffer (25 mM Tris, 0.5 M sodium chloride, 1 mM MnCl2, 1 mM CaCl2 and 0.05% sodium azide, pH 7.4). The slurry was then packed under gravity into an Omnifit glass column (4.6 mm × 100 mm). The M-LAC PEEK column was prepared using three conjugated lectins mixed in a ratio of 1:1:1. All three lectins were cross-linked separately following the conjugation procedure previously described 20. Briefly, each lectin was conjugated to POROS™ 20 beads, a cross-linked polystyrene-divinylbenzene support matrix, as follows. First, 0.4 g of POROS beads was washed thoroughly with HEPES buffer (100 mM, pH 8.2). Next, 50% POROS beads were mixed with 25 mg of lectin (AAL, SNA or PHA-L) dissolved in 100 mM HEPES buffer. The lectin-POROS mixture was incubated overnight at room temperature with constant rotation, and the reaction was stopped with 0.5 M Tris buffer, pH 7.5, and 20 mM sodium cyanoborohydride. The supernatant was removed after centrifugation for 5 min at 4,000 x g, and the beads were reconstituted in M-LAC binding buffer. After conjugation, the three lectin-bead preparations were mixed in a ratio of 1:1:1 and packed into the M-LAC column (PEEK, 4.6 mm × 100 mm) using a self-packing tool on a Shimadzu HPLC system. The POROS® R1 reversed phase column for sample de-salting was packed in a similar fashion into a 4.6 mm × 30 mm PEEK column.

2.3.5 High abundance protein depletion and multi-lectin affinity fractionation

Reference plasma samples were fractionated with three columns: the 12P depletion column, the M-LAC column and the R1 reversed phase C4 column (Life Technologies, Carlsbad, CA). The three columns were connected in series on a 2-dimensional HPLC system (Shimadzu, Columbia, MD) capable of automatic switching and controlled by the EZstart software (Shimadzu, Columbia, MD). During HPLC fractionation, each column was placed on a separate valve, and its status (on- or off-line) was determined by valve switching. At a flow rate of 2.0 mL/min in isocratic mode, all columns were brought on-line and equilibrated with M-LAC binding buffer (25 mM Tris, 0.5 M sodium chloride, 1 mM MnCl2, 1 mM CaCl2 and 0.05% sodium azide, pH 7.4) for 15 minutes, after which the flow rate was adjusted to 0.5 mL/min. Next, 25 µL of plasma was loaded onto the HPLC columns via manual injection and processed in the following sequence: high abundance proteins were depleted by the 12P column (12P bound); the unbound (depleted plasma) fraction passed through the M-LAC column, where glycoproteins with specific sugar structures were enriched (M-LAC bound); and the flow-through (M-LAC unbound) fraction was trapped on the R1 reversed phase column. Depleted plasma fractions were also collected separately, without passing through the M-LAC column, to enable evaluation of the depletion column efficiency. Each fraction was eluted and de-salted on the R1 reversed phase column separately. The following buffers were used for elution: 0.1 M glycine (pH 2.5) for the 12P column, 100 mM acetic acid for the M-LAC column, and 70% (v/v) acetonitrile/water for desalting. De-salted samples were subsequently concentrated by speed vacuum.
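The fractionation sequence above can be summarized as a small routing sketch. The Python snippet below is illustrative only: the function name `route_protein` and its two boolean flags are hypothetical, and the model is idealized in that it assumes all-or-nothing antibody capture and lectin binding, which real columns do not achieve. The elution buffers are those listed above.

```python
# Idealized sketch of the 12P -> M-LAC -> R1 fractionation sequence.
# Function and flag names are hypothetical; real binding is not all-or-nothing.

# Elution buffer used for each column, as listed in the text.
ELUTION_BUFFERS = {
    "12P column": "0.1 M glycine, pH 2.5",
    "M-LAC column": "100 mM acetic acid",
    "R1 reversed phase column (de-salting)": "70% (v/v) acetonitrile/water",
}


def route_protein(is_12p_target: bool, binds_lectin: bool) -> str:
    """Return the fraction in which a protein would ideally be collected."""
    if is_12p_target:
        # Captured by the antibodies on the depletion column.
        return "12P bound"
    if binds_lectin:
        # Depleted plasma passes to the lectins; glycoproteins are retained.
        return "M-LAC bound"
    # Everything else flows through and is trapped on the R1 column.
    return "M-LAC unbound"


if __name__ == "__main__":
    print(route_protein(is_12p_target=True, binds_lectin=True))
    print(route_protein(is_12p_target=False, binds_lectin=True))
    print(route_protein(is_12p_target=False, binds_lectin=False))
```

Because the 12P column comes first in the series, a targeted glycoprotein such as alpha-1-acid glycoprotein is routed to the 12P bound fraction in this sketch, regardless of its lectin affinity.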

2.3.6 Total protein measurements and 1D-SDS PAGE analysis

Concentrated fractions were reconstituted in 1X PBS buffer (pH 7.5), and total protein concentration was measured using a BCA assay kit (Thermo Fisher Scientific, Rockford, IL) following the manufacturer’s instructions. BSA was used as the standard protein to generate the calibration curve. These measurements allowed evaluation of the platform, including immuno-depletion efficiency, protein recoveries, and column reproducibility. Equal amounts of each fraction were loaded on 4−12% Bis-Tris polyacrylamide SDS-PAGE gels (10 wells, 1 mm) and separated on a Pharmacia Biotech EPS 600 electrophoresis power supply at 200 V for 35 min. Gels were washed three times with Milli-Q water and Coomassie stained with SimplyBlue™ SafeStain (Invitrogen, Carlsbad, CA) to visualize protein bands.

2.3.7 In-solution protein trypsin digestion

Depleted and M-LAC fractions (20 µg) were denatured with 6 M urea, reduced with 25 mM dithiothreitol at room temperature for 45 minutes, and alkylated with 50 mM iodoacetamide at room temperature for 1 hour in darkness. Protein samples were cleaned up on an HPLC system (Shimadzu multidimensional LC) with the R1 reversed phase C4 column as described in the previous section, and the eluates were dried by speed vacuum. Fractions were subsequently reconstituted in 100 µL of freshly prepared 100 mM ammonium bicarbonate buffer, pH 8.0. Sequencing grade trypsin (1:25 w/w enzyme to protein ratio) was added to each sample, and the samples were incubated at 37 °C for approximately 16 hours. Following incubation, the trypsin reaction was quenched by adding 10% formic acid solution prior to LC-MS analysis.
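As a quick check of the quantities above, the following sketch computes the trypsin mass implied by the 1:25 (w/w) ratio and the nominal concentration of the digest. It assumes that the full 20 µg of protein is recovered after clean-up and ignores the small volumes of added trypsin and formic acid; the function names are illustrative.

```python
def trypsin_mass_ug(protein_ug: float, protein_parts: float = 25.0) -> float:
    """Mass of trypsin for a 1:protein_parts (w/w) enzyme-to-protein ratio."""
    return protein_ug / protein_parts


def concentration_ug_per_ul(protein_ug: float, volume_ul: float) -> float:
    """Nominal protein concentration after reconstitution."""
    return protein_ug / volume_ul


protein_ug = 20.0   # protein per fraction taken for digestion
volume_ul = 100.0   # ammonium bicarbonate reconstitution volume

trypsin_ug = trypsin_mass_ug(protein_ug)
digest_conc = concentration_ug_per_ul(protein_ug, volume_ul)
volume_for_1ug_ul = 1.0 / digest_conc

print(f"trypsin: {trypsin_ug:.1f} ug")
print(f"digest concentration: {digest_conc:.1f} ug/uL")
print(f"volume containing 1 ug: {volume_for_1ug_ul:.1f} uL")
```

Under these assumptions, each digest needs 0.8 µg of trypsin and has a nominal concentration of 0.2 µg/µL, so 5 µL of digest corresponds to about 1 µg of protein.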

2.3.8 Nano-LC-MS/MS analysis and peptide sequencing

Nano-LC-MS/MS was performed on an Eksigent 2D Nano-LC system (Dublin, CA) connected via capillary tubing to a Finnigan LTQ mass spectrometer (Thermo Fisher Scientific, Waltham, MA). A 150 mm x 75 µm i.d. capillary column (New Objective, Woburn, MA) was packed in-house using a slurry of 5-µm particle, 300-Å pore size Magic C18 stationary phase (Michrom Bioresources, Auburn, CA). 5 µL (1 µg) of the tryptic digest was loaded onto the C18 capillary column and de-salted for 30 min at a flow rate of 300 nL/min using buffer A (0.1% v/v formic acid in HPLC grade water). Peptides were separated with the following linear gradient: 5% to 40% buffer B over 70 min; 40% to 90% buffer B over 10 min; and 90% to 2% buffer B over 5 min. Buffer B consisted of 0.1% v/v formic acid in HPLC grade acetonitrile. The mass spectrometer was operated in data-dependent mode, with a full MS scan from m/z 400 to 1800 followed by MS/MS fragmentation of the six most intense precursor ions selected from the MS spectrum. Dynamic exclusion was set with a repeat count of 1 (repeat duration of 30 s, exclusion list size of 250, exclusion duration of 30 s, and exclusion mass width of 0.50 m/z low and 1.55 m/z high). MS/MS data were searched against the UniProt annotated human database (release 2011_1; 34,117 entries) using the SEQUEST algorithm (Thermo Electron) in the Thermo Fisher Proteome Discoverer 1.3 software suite for protein identification. The search parameters were set as follows: full trypsin as the enzyme; carbamidomethylation (C) as a fixed modification; up to two missed cleavages; a precursor ion mass tolerance of 2.0 Da; and a fragment ion mass tolerance of 1.0 Da. Confidence in identification was enhanced by searching a reversed database, with the false discovery rate (FDR) targeted at 1% at the peptide level.
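The gradient program described above can be written as a piecewise-linear function of time. The sketch below is illustrative: it assumes that the gradient clock starts at 0 min once de-salting is complete and that each segment is linear between its end points. The names `GRADIENT` and `percent_b` are hypothetical and do not correspond to any instrument software.

```python
from bisect import bisect_right

# (time in min, % buffer B) break points taken from the gradient in the text:
# 5 -> 40% B over 70 min, 40 -> 90% B over 10 min, 90 -> 2% B over 5 min.
GRADIENT = [(0.0, 5.0), (70.0, 40.0), (80.0, 90.0), (85.0, 2.0)]


def percent_b(t_min: float) -> float:
    """Return % buffer B at time t_min by linear interpolation."""
    times = [t for t, _ in GRADIENT]
    if t_min <= times[0]:
        return GRADIENT[0][1]
    if t_min >= times[-1]:
        return GRADIENT[-1][1]
    i = bisect_right(times, t_min)  # index of the first break point after t_min
    (t0, b0), (t1, b1) = GRADIENT[i - 1], GRADIENT[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0, 35, 70, 75, 80, 85):
        print(f"{t:>3} min: {percent_b(t):.1f}% B")
```

Written this way, the program runs for 85 min in total, with the shallow 5% to 40% B segment occupying most of the run.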

2.4 RESULTS AND DISCUSSION

2.4.1 12P immuno-affinity depletion

Depleting human plasma of high abundance proteins is important in biomarker discovery studies because the majority of candidate markers of pharmaceutical importance are present at low amounts in blood plasma. Detecting these low-level proteins is difficult with a single-step analytical approach; hence, various multi-step processes have been developed to reduce the complexity of plasma and to improve the identification of low abundance proteins. To this end, we used an immuno-affinity depletion resin consisting of antibodies against the top 12 high abundance proteins (albumin, IgG, IgM, IgA, free light chains, fibrinogen, transferrin, α1 anti-trypsin, apolipoprotein A1, α2 macroglobulin, haptoglobin, and α1 acid glycoprotein) to purify human plasma samples. The efficiency and performance of the depletion column were investigated as follows. First, the loading capacity was evaluated with 10 µL, 25 µL, and 50 µL volumes of plasma. After analysis by 1D SDS-PAGE, we determined that 25 µL was the optimal plasma loading volume; at this volume, sample losses, run-to-run carryover and non-specific binding were significantly reduced, as demonstrated by MS1 peak area measurements (Appendix A). Next, we investigated the efficiency of the depletion process through replicate analysis of reference plasma. Figure 2.2 shows gels of three analytical replicates: reference plasma in lanes 2, 5 and 8; the 12P bound (targeted depleted) proteins in lanes 3, 6 and 9; and the 12P unbound (depleted) fraction in lanes 4, 7 and 10. The significant reduction of major bands in the depleted fraction (lanes 4, 7 and 10) and the presence of “new” bands (potential low abundance proteins) not previously observed in the “crude” reference plasma (lanes 2, 5 and 8) are indicative of efficient depletion.

Figure 2.2: 1D SDS-PAGE analysis of reference plasma fractions collected from the 12P depletion column to evaluate the efficiency of the immuno-affinity depletion. Equal amounts (2 µg) of replicate samples were separated on a 4-12% NuPAGE gel as follows: reference plasma (lanes 2, 5 and 8), 12P bound fraction (lanes 3, 6 and 9), and 12P flow-through fraction (lanes 4, 7 and 10). Lane 1 contains the molecular weight standard marker purchased from Invitrogen (SeeBlue® Plus2 Prestained).

LC-MS analysis of crude reference plasma showed only high abundance proteins such as serum albumin, IgGs, transferrin and haptoglobin, with high sequence coverage. However, LC-MS analysis of depleted plasma (12P unbound) showed serum albumin sequence coverage of ~2.5% versus ~100% in crude plasma. In addition, several proteins not detected in the “crude” plasma, such as Cystatin-C and Serine/threonine-protein kinase NIM1, were identified in the depleted plasma. Proteins that are difficult to detect in the reference plasma may be present at low concentration and hence could only be identified after depletion. This further supports our conclusion that 12P depletion is efficient; its high reproducibility is demonstrated in a later section.

2.4.2 Specificity of 12P depletion column

Further, we probed the specificity of the immuno-affinity depletion column by performing nano-LC-MS/MS analysis of a tryptic digest of the 12P bound fraction. All targeted depleted proteins, including some isoforms (31 in total), were identified with 99% confidence (FDR 1%) in replicate runs (Table 2.1).

Table 2.1: A list of 12P bound (targeted depleted) proteins identified after mass spectrometry analysis

Protein description a                          Avg. cov.   1c   2c   3c  # AAs  MW [kDa]  Calc. pI
Serum albumin                                     97.85%   68   69   68    609     69.30      6.28
Ig kappa chain C region                           95.85%    9    8    8    106     11.60      5.87
Isoform 2 of Serum albumin                        93.45%   39   41   39    417     47.30      6.35
Apolipoprotein A-I                                89.78%   32   33   31    267     30.80      5.76
Ig lambda-1 chain C regions                       84.53%    9    7    8    106     11.30      7.87
Ig lambda-2 chain C regions                       84.53%    7    7    7    106     11.30      7.24
Ig lambda-3 chain C regions                       77.92%    5    6    7    106     11.20      7.24
Ig gamma-1 chain C region                         73.94%   16   15   14    330     36.10      8.19
Serotransferrin                                   73.47%   51   50   55    698     77.00      7.12
Ig gamma-3 chain C region                         64.11%   15   15   17    377     41.30      7.90
Ig gamma-4 chain C region                         60.46%    9   11   10    327     35.90      7.36
Ig gamma-2 chain C region                         59.39%   18   17   15    326     35.90      7.59
Ig kappa chain V-III region HAH                   55.74%    4    5    5    129     14.10      7.96
Alpha-1-antitrypsin                               52.82%   17   19   21    418     46.70      5.59
Ig kappa chain V-I region AG                      52.59%    3    4    4    108     12.00      5.99
Isoform 2 of Alpha-1-antitrypsin                  51.50%   14   12   11    359     40.20      5.47
Immunoglobulin lambda-like polypeptide 5          50.65%   10    8    6    214     23.00      8.84
Haptoglobin                                       50.39%   15   14   18    406     45.20      6.58
Alpha-2-macroglobulin                             50.23%   50   45   42   1474    163.20      6.46
Ig kappa chain V-II region TEW                    48.94%    3    3    4    113     12.30      6.00
Ig mu chain C region                              45.18%   12   13    9    452     49.30      6.77
Fibrinogen beta chain                             45.03%   15   13   11    491     55.90      8.27
Isoform 2 of Ig mu chain C region                 43.62%   13   11   13    473     51.80      6.15
Isoform 3 of Alpha-1-antitrypsin                  42.68%    9    7    5    306     34.70      5.19
Haptoglobin-related protein                       41.90%   10   11   13    348     39.00      7.09
Ig alpha-1 chain C region                         40.31%    9    9    7    353     37.60      6.51
Fibrinogen alpha chain                            40.28%   19   21   17    644     69.70      8.06
Isoform Gamma-A of Fibrinogen gamma chain         39.75%   10   11   12    437     49.50      6.09
Fibrinogen gamma chain                            38.70%   11   11    9    453     51.50      5.62
Ig alpha-2 chain C region                         30.88%    6    6    4    340     36.50      6.10
Alpha-1-acid glycoprotein                         22.44%    3    4    5    201     23.60      5.11

a Protein names are from Swiss-Prot. b Columns 1c-3c are spectral counts: relative expression levels based on total number of peptide hits. c Analytical replicates.
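The replicate spectral counts in Table 2.1 lend themselves to a simple reproducibility check. The sketch below is not the procedure used in this work; it only illustrates how a percent coefficient of variation for one protein and a Pearson correlation between two replicates could be computed from a handful of rows copied from the table. The choice of rows, the helper names, and the use of the sample standard deviation are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Spectral counts (analytical replicates 1-3) for selected rows of Table 2.1.
COUNTS = {
    "Serum albumin": (68, 69, 68),
    "Apolipoprotein A-I": (32, 33, 31),
    "Serotransferrin": (51, 50, 55),
    "Alpha-2-macroglobulin": (50, 45, 42),
    "Haptoglobin": (15, 14, 18),
    "Alpha-1-acid glycoprotein": (3, 4, 5),
}


def percent_cv(values):
    """Percent coefficient of variation using the sample standard deviation."""
    return 100.0 * stdev(values) / mean(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rep1 = [counts[0] for counts in COUNTS.values()]
rep2 = [counts[1] for counts in COUNTS.values()]

albumin_cv = percent_cv(COUNTS["Serum albumin"])
r_12 = pearson_r(rep1, rep2)

print(f"Serum albumin %CV across replicates: {albumin_cv:.2f}")
print(f"Pearson r, replicate 1 vs replicate 2: {r_12:.4f}")
```

With only six proteins the correlation is a rough indication at best; a platform-level figure such as the r² reported in this chapter would be computed over all identified proteins.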

Notably, serum albumin, the most abundant plasma protein, was 97% depleted by immuno-affinity in replicate runs, and similar observations were made for proteins such as apolipoprotein A1 and IgGs. Key reasons for incomplete protein depletion, especially with multiple-protein depletion columns, include the presence of protein fragments, PTMs (e.g. glycosylation) and specific or non-specific interactions. In the current work, although we did not observe complete protein depletion, we achieved our main goal of maximal, reproducible depletion of human plasma in replicate analyses with minimal sample loss. We identified some non-targeted proteins (e.g. Apolipoprotein A-II, Ceruloplasmin, and Apolipoprotein E) in the 12P bound fraction (Table 2.2). Some abundant proteins are known carrier proteins that undergo protein-protein interactions with other proteins. For example, serum albumin and alpha-2-macroglobulin are known to have high binding affinity for Apolipoprotein A-II, growth factors, cytokines and other proteins 11, 28-30.

Table 2.2: A list of identified non-targeted proteins in the 12P bound fraction after mass spectrometry analysis

Protein description a                           Sum cov.   1c   2c   3c  # AAs  MW [kDa]        pI
Apolipoprotein A-II                               21.00%    2    4    2    100     11.20      6.62
Apolipoprotein A-IV                               21.19%    5    5    3    396     45.40      5.38
Apolipoprotein D                                  27.16%    3    4    4    189     21.30      5.15
Apolipoprotein E                                  30.87%    4    3    4    317     36.10      5.73
Apolipoprotein L1                                 17.54%    2    2    3    398     43.90      5.81
Ceruloplasmin                                     12.20%    4    5    3   1065    122.10      5.72
Complement factor B                                3.93%    2    3    2    764     85.50      7.06
Hemoglobin subunit alpha                          21.31%    2    2    2    142     15.20      8.68
Hemoglobin subunit beta                           22.53%    3    3    2    147     16.00      7.28
Hemoglobin subunit delta                          18.37%    2    2    2    147     16.00      8.05
Inter-alpha (Globulin) inhibitor H2               11.31%    5    3    4    935    105.20      7.03
Inter-alpha-trypsin inhibitor heavy chain H2      16.24%    4    5    4    946    106.40      6.86

a Protein names are from Swiss-Prot. b Columns 1c-3c are spectral counts: relative expression levels based on total number of peptide hits. c Analytical replicates.

To minimize non-specific binding of abundant proteins to the 12P depletion column, depletion conditions such as flow rate and plasma loading capacity were optimized. Specifically, the sample binding flow rate was increased from 0.5 mL/min to 1.0 mL/min to reduce the residence time available for protein-protein interactions, and extensive column washes (20 column volumes) were used to remove unbound proteins before elution of the bound fraction. The compositions of the binding and elution buffers were kept the same as described in our previous publication 21.
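The effect of the flow-rate change can be estimated with a short calculation. The sketch below treats nominal residence time as column volume divided by flow rate; the 1.0 mL column volume is an assumption chosen only for illustration (it is not a value reported in this chapter), and the function name is hypothetical.

```python
def residence_time_min(column_volume_ml: float, flow_rate_ml_min: float) -> float:
    """Nominal residence time = column volume / volumetric flow rate."""
    if flow_rate_ml_min <= 0:
        raise ValueError("flow rate must be positive")
    return column_volume_ml / flow_rate_ml_min

# ASSUMPTION: nominal column volume of 1.0 mL, used only for illustration.
ASSUMED_COLUMN_VOLUME_ML = 1.0

t_slow = residence_time_min(ASSUMED_COLUMN_VOLUME_ML, 0.5)  # original binding flow rate
t_fast = residence_time_min(ASSUMED_COLUMN_VOLUME_ML, 1.0)  # optimized binding flow rate
wash_volume_ml = 20 * ASSUMED_COLUMN_VOLUME_ML              # 20 column-volume wash

print(f"residence time: {t_slow:.1f} min -> {t_fast:.1f} min")
print(f"wash volume: {wash_volume_ml:.1f} mL")
```

Whatever the true column volume, doubling the flow rate halves the nominal residence time, which is the point the optimization relies on.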

From the LC-MS data, we observed that the non-specifically bound proteins identified showed lower MS intensity and fewer spectral counts (total peptides) in the 12P bound fraction (Table 2.2) than in the 12P unbound (depleted) fraction. For example, Apolipoprotein A-II was observed with few spectral counts (average ≤ 3 total peptides, MS intensity of 3.0 × 10^2) in the 12P bound fraction, whereas the corresponding unbound fraction (depleted plasma) showed at least a 10-fold increase in spectral counts of the same protein (average ≥ 31 total peptides, MS intensity of 3.0 × 10^9). These results agree with data previously reported for the 2P and MARS 6P depletion methods 2, 31. This observation suggests that only a minimal amount of protein binds non-specifically to the 12P immuno-depletion column and that the column is therefore suitable for use in proteomic analysis for “deeper mining” of potential biomarker candidates.
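The fold difference quoted for Apolipoprotein A-II can be reproduced from the numbers given above. This is a minimal sketch: the bound-fraction counts are the three replicate values from Table 2.2, while the unbound value of 31 is the lower bound stated in the text rather than a measured replicate, so the result is itself a lower bound. Variable names are illustrative.

```python
from statistics import mean

# Spectral counts for Apolipoprotein A-II in the 12P bound fraction
# (Table 2.2, analytical replicates 1-3).
bound_counts = [2, 4, 2]

# The text reports an average of at least 31 peptides in the unbound fraction;
# 31 is used here as a lower bound, not as a measured replicate value.
unbound_mean_lower_bound = 31

bound_mean = mean(bound_counts)
fold_change = unbound_mean_lower_bound / bound_mean

print(f"bound mean = {bound_mean:.2f} peptides")
print(f"fold change (unbound / bound) >= {fold_change:.1f}")
```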

2.4.3 12P-M-LAC fractionation platform

After establishing the loading capacity and the performance of the 12P depletion column, we integrated the 12P and M-LAC columns onto a two-dimensional HPLC system and performed protein depletion and glycoprotein fractionation in sequence. The major advantages of this automated M-LAC fractionation platform are reduced sample manipulation, reduced sample loss, and shorter sample preparation time 21. The robustness of the current approach was therefore evaluated and validated through total protein recovery, reproducibility and specificity studies of the 12P-M-LAC platform. The results from these studies are discussed below.

2.4.4 Recovery studies of 12P-M-LAC platform

Initially, we evaluated the 12P-M-LAC platform by performing replicate analysis of 4 separate plasma injections. Each fraction (12P bound, M-LAC bound and unbound) was collected separately to enable total protein content measurements using the BCA assay. The average total protein recovery was 91.8% with a %CV of 2.6 (Table 2.3), based on the sample loading volume and amount. About 80% of the plasma protein loaded was depleted, and about 12% of the load was collected in the two M-LAC fractions.

Table 2.3: Recovery studies of 12P-M-LAC platform

| Plasma fractions | 12P bound (target proteins) | M-LAC bound | M-LAC unbound | Total % recovery |
| --- | --- | --- | --- | --- |
| Average of replicate analysis (n=3) | 79.5 | 5.1 | 7.2 | 91.8 |
| %CV | 2.9 | 1.9 | 2.1 | 2.6 |

This result is consistent with a previous report 21, demonstrating that the M-LAC platform is robust irrespective of the depletion column used and reaffirming its suitability for clinical sample processing.
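The totals in Table 2.3 can be checked with simple arithmetic. The sketch below uses only the average recoveries from the table; the dictionary keys and variable names are illustrative, and the "unrecovered" figure is simply the remainder to 100%, not a value reported in the study.

```python
# Average percent of loaded protein recovered in each fraction (Table 2.3).
recovery_pct = {
    "12P bound": 79.5,
    "M-LAC bound": 5.1,
    "M-LAC unbound": 7.2,
}

total_recovery = sum(recovery_pct.values())
mlac_total = recovery_pct["M-LAC bound"] + recovery_pct["M-LAC unbound"]
unrecovered = 100.0 - total_recovery  # remainder, derived here for illustration

print(f"total recovery:       {total_recovery:.1f}%")
print(f"two M-LAC fractions:  {mlac_total:.1f}%")
print(f"not recovered:        {unrecovered:.1f}%")
```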

2.4.5 Reproducibility studies of protein identification from the 12P-M-LAC platform

The reproducibility of the 12P-M-LAC platform was investigated through peak area and peak height measurements of HPLC profiles and through spectral count plots of nano-LC-MS/MS data. First, we performed 4 independent runs using 25 µL of plasma and compared the peak areas and peak heights of the replicate runs (Figure 2.3).

Figure 2.3: HPLC profile showing 4 replicate runs and fractions collected during 12P-M-LAC fractionation.

We recorded an average %RSD of ≤ 0.04 and ≤ 0.03 for peak area and peak height measurements, respectively. This observation was confirmed qualitatively by 1D SDS-PAGE gels, in which the same band patterns were observed in replicate runs.
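For readers who want to reproduce this kind of figure, percent relative standard deviation is the sample standard deviation divided by the mean, times 100. The sketch below is a generic implementation; the four peak areas are hypothetical values invented for illustration and are not data from this study.

```python
from statistics import mean, stdev

def percent_rsd(values):
    """Percent relative standard deviation: 100 * sample SD / mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; %RSD is undefined")
    return 100.0 * stdev(values) / m

# HYPOTHETICAL peak areas (arbitrary units) for four replicate runs;
# these are illustrative values, not measurements from this work.
peak_areas = [1000.2, 1000.5, 999.9, 1000.4]

print(f"%RSD of peak area = {percent_rsd(peak_areas):.3f}")
```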

Further, tryptic digests of replicate 12P-M-LAC fractions were analyzed by nano-LC-MS/MS to evaluate and validate the reproducibility of the platform using MS spectral counts (total peptides) per identified protein. Pearson correlation plots of spectral counts were generated for the 12P bound, M-LAC bound and M-LAC unbound fractions, and the results are presented in Figure 2.4. In the plots (Figure 2.4A, 2.4B, and 2.4C), each point represents an individual protein; the x and y axes compare the total number of peptides per protein in analytical replicate 1 with that in analytical replicate 2. Only spectral counts of proteins identified reproducibly in replicate runs were used to generate the Pearson plots. Correlation coefficient (r^2) values were consistent among the three fractions: 0.9881 for the 12P bound fraction, 0.9832 for the M-LAC bound fraction and 0.9822 for the M-LAC unbound fraction.

Figure 2.4: Reproducibility measurements of 12P-M-LAC fractions: 12P bound (A), M-LAC bound (B) and M-LAC unbound (C), based on spectral counts of two analytical replicates. Each point in the graph represents an identified protein; the x axis shows spectral counts per protein in analytical replicate 1 and the y axis shows spectral counts per protein in analytical replicate 2. Pearson correlation coefficient values (R^2) indicate good analytical reproducibility of the 12P-M-LAC platform, with an average R^2 of 0.9845.
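The replicate-to-replicate comparison described above can be sketched as follows. This is a plain-Python Pearson correlation, not the software used in the study (which is not named here); the two lists of spectral counts are hypothetical. The final lines average the three r^2 values reported in the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

# HYPOTHETICAL spectral counts per protein in two analytical replicates.
rep1 = [2, 5, 9, 14, 23, 42, 97]
rep2 = [3, 5, 8, 15, 21, 44, 95]
r = pearson_r(rep1, rep2)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")

# Average of the three r^2 values reported in the text for the
# 12P bound, M-LAC bound and M-LAC unbound fractions.
reported_r2 = [0.9881, 0.9832, 0.9822]
average_r2 = sum(reported_r2) / len(reported_r2)
print(f"average reported r^2 = {average_r2:.4f}")
```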

Additionally, we compared the number of proteins identified in all three analytical replicates for the M-LAC bound and unbound fractions. Figure 2.5 shows a three-way Venn diagram for each fraction. The Venn diagrams detail the shared and unique proteins identified for each M-LAC fraction in the individual replicate analyses. In the M-LAC bound fraction, the majority (~91%) of proteins identified after LC-MS analysis were present in all three replicates. A similar observation was made for the M-LAC unbound fraction, where ~82% of identified proteins were present in all three replicates. These observations demonstrate the reproducibility of our 12P-M-LAC platform.

Figure 2.5: A three-way Venn diagram showing proteins identified in three analytical replicates of the M-LAC bound and unbound fractions after nano-LC-MS/MS analysis.
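The shared-protein percentage behind a three-way Venn diagram can be computed with set operations. The sketch below assumes the percentage is the number of proteins found in every replicate divided by the number found in at least one replicate; that denominator is an assumption, since the exact definition is not stated here. The protein lists are hypothetical.

```python
def shared_fraction(*replicates):
    """Fraction of all identified proteins that appear in every replicate."""
    sets = [set(r) for r in replicates]
    union = set.union(*sets)
    if not union:
        raise ValueError("no proteins identified")
    return len(set.intersection(*sets)) / len(union)

# HYPOTHETICAL identifications per replicate, for illustration only.
rep1 = {"Clusterin", "Kallistatin", "Complement C2", "Cadherin-5"}
rep2 = {"Clusterin", "Kallistatin", "Complement C2", "Selenoprotein P"}
rep3 = {"Clusterin", "Kallistatin", "Complement C2", "Cadherin-5"}

pct = 100 * shared_fraction(rep1, rep2, rep3)
print(f"shared in all three replicates: {pct:.0f}%")
```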

Consistent protein identification across replicate analyses is necessary for the accurate determination of protein or glycoprotein abundance changes in disease versus control clinical samples, especially in label-free quantitative proteomic analysis. A proteomic platform therefore needs to demonstrate good reproducibility. As previously established 21, monitoring column lifetime is imperative because low pH elution buffers are used. So far, we have performed approximately 175 independent 12P-M-LAC runs over a period of 9 months and have observed an average r^2 of 0.9818 for spectral counts of the 12P bound fraction (Appendix B), and ~90% and ~87% shared proteins in replicates of the M-LAC bound and unbound fractions, respectively. This shows that the 12P-M-LAC platform is consistent, reproducible and stable over extended use.

2.4.6 Enrichment of low level glycoproteins by 12P-M-LAC platform

M-LAC has been established as a preferred glycoproteomics enrichment method 19, 22-23, and it is known to partition glycoproteins and non-glycoproteins according to their glycosylation status, which is crucial in glycoproteomic analysis of clinically relevant samples 21. In the current study, we investigated both the presence and the amounts of low-level glycoproteins identified in replicate analyses. Table 2.4 shows examples of proteins identified in the M-LAC fractions. The list contains the proteins that were most abundant and were identified in replicate analyses.

Table 2.4: A list of glycoproteins identified in M-LAC fractions

| Protein description (a) | Spectral counts (b), M-LAC unbound (c) | Spectral counts (b), M-LAC bound (c) | Glycosylation status (Yes/No) |
| --- | --- | --- | --- |
| Cadherin-5 | ND | 14 | Yes |
| Carboxypeptidase B2 | 4 | 10 | Yes |
| Clusterin | 9 | 16 | Yes |
| Complement C2 | 15 | 25 | Yes |
| Cystatin-C | 4 | ND | Yes |
| Gelsolin | 42 | ND | No |
| Histidine-rich glycoprotein | 18 | 31 | Yes |
| Kallikrein-13 | ND | 6 | Yes |
| Kallistatin | 17 | 17 | Yes |
| Laminin subunit alpha-3 | ND | 97 | Yes |
| Pigment epithelium-derived factor | 23 | 33 | Yes |
| Plasma serine protease inhibitor | 7 | 12 | Yes |
| Galectin-3-binding protein | 5 | 16 | Yes |
| Sacsin | 73 | ND | No |
| Selenoprotein P | ND | 14 | Yes |
| Serine/threonine-protein kinase NIM1 | 8 | ND | No |
| 14-3-3 protein zeta/delta | 13 | ND | No |
| Zinc-alpha-2-glycoprotein | 4 | 15 | Yes |

Our data indicate that the majority (>95%) of the identified low-level proteins are glycoproteins present in both the M-LAC bound and unbound fractions, with the distribution determined by the glycan subsets present on an individual glycoprotein. Some glycoproteins, such as Cadherin-5, Kallikrein-13, and Laminin subunit alpha-3, were identified only in the M-LAC bound fraction, while non-glycosylated proteins such as Sacsin, Serine/threonine-protein kinase NIM1, and Gelsolin were detected only in the M-LAC unbound fraction. Kullolli et al. reported similar observations in a study where ConA, WGA and Jac lectins constituted the M-LAC column 21. The composition of glycans enriched in the M-LAC fractions is currently being studied in our laboratory. These observations support the conclusion that M-LAC can enrich low-level glycoproteins and segregate glycoproteins and non-glycoproteins into sub-populations. The analysis of glycan shifts in disease samples is the subject of chapter 3.
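The partitioning described above can be made explicit by grouping the Table 2.4 entries by the fraction in which they were detected. The sketch below transcribes the table (with `None` standing for ND); the function and variable names are illustrative and are not part of the original analysis.

```python
# Rows from Table 2.4: (protein, unbound count, bound count, glycosylated).
# None stands for ND (not detected).
TABLE_2_4 = [
    ("Cadherin-5", None, 14, True),
    ("Carboxypeptidase B2", 4, 10, True),
    ("Clusterin", 9, 16, True),
    ("Complement C2", 15, 25, True),
    ("Cystatin-C", 4, None, True),
    ("Gelsolin", 42, None, False),
    ("Histidine-rich glycoprotein", 18, 31, True),
    ("Kallikrein-13", None, 6, True),
    ("Kallistatin", 17, 17, True),
    ("Laminin subunit alpha-3", None, 97, True),
    ("Pigment epithelium-derived factor", 23, 33, True),
    ("Plasma serine protease inhibitor", 7, 12, True),
    ("Galectin-3-binding protein", 5, 16, True),
    ("Sacsin", 73, None, False),
    ("Selenoprotein P", None, 14, True),
    ("Serine/threonine-protein kinase NIM1", 8, None, False),
    ("14-3-3 protein zeta/delta", 13, None, False),
    ("Zinc-alpha-2-glycoprotein", 4, 15, True),
]

def partition_by_fraction(rows):
    """Group protein names by the M-LAC fraction(s) in which they were detected."""
    groups = {"bound only": [], "unbound only": [], "both": []}
    for name, unbound, bound, _glycosylated in rows:
        if unbound is None and bound is None:
            continue  # not detected in either fraction
        if unbound is None:
            groups["bound only"].append(name)
        elif bound is None:
            groups["unbound only"].append(name)
        else:
            groups["both"].append(name)
    return groups

groups = partition_by_fraction(TABLE_2_4)
for label, names in groups.items():
    print(f"{label} ({len(names)}): {', '.join(names)}")
```

In this transcription, every protein annotated as non-glycosylated falls in the unbound-only group, which matches the behaviour described in the text.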

The enrichment of low-level proteins, including glycoproteins such as Zinc-alpha-2-glycoprotein, Pigment epithelium-derived factor, and Selenoprotein P, is a significant step towards the discovery of potential candidate markers in plasma. This is because indicators of a disease and/or healthy state, such as tissue- and cell-specific proteins, are usually present at low concentrations (low ng/mL and pg/mL levels) in circulating blood plasma 32. Therefore, the development of an analytical tool such as the 12P-M-LAC platform, which has the potential to detect low abundance proteins, can be of clinical interest.

2.5 CONCLUSION

In summary, a robust platform that depletes the top 12 high abundance proteins and fractionates the glycoproteome efficiently and reproducibly from human plasma has been developed. Since thorough evaluation of a multi-dimensional analytical platform is crucial for applications in biomarker discovery studies, the performance of the 12P-M-LAC platform was validated through reproducibility, recovery, and stability studies. The platform was shown to improve the identification of low abundance proteins, which can aid the discovery of candidate markers of therapeutic and clinical interest. Applying a robust platform to characterize the glycoproteome of clinically relevant samples, and to evaluate protein abundance changes as well as glycan shifts, is well suited to understanding the sub-proteome of disease samples. In chapter 3, this platform is applied to comprehensively study the global profile of clear cell renal cell carcinoma plasma samples to identify potential biomarkers that may help early detection of the disease.

2.6 REFERENCES

1. Anderson, N. L.; Anderson, N. G., The human plasma proteome: history, character, and diagnostic prospects. Mol Cell Proteomics 2002, 1 (11), 845-67.

2. Plavina, T.; Wakshull, E.; Hancock, W. S.; Hincapie, M., Combination of abundant protein depletion and multi-lectin affinity chromatography (M-LAC) for plasma protein biomarker discovery. J Proteome Res 2007, 6 (2), 662-71.

3. Burgess-Cassler, A.; Johansen, J. J.; Kendrick, N., Immunodepletion of albumin from human serum samples. Clin Chim Acta 1989, 183 (3), 359-65.

4. Beer, L. A.; Tang, H. Y.; Sriswasdi, S.; Barnhart, K. T.; Speicher, D. W., Systematic discovery of ectopic pregnancy serum biomarkers using 3-D protein profiling coupled with label-free quantitation. J Proteome Res 2011, 10 (3), 1126-38.

5. Chromy, B. A.; Gonzales, A. D.; Perkins, J.; Choi, M. W.; Corzett, M. H.; Chang, B. C.; Corzett, C. H.; McCutchen-Maloney, S. L., Proteomic analysis of human serum by two-dimensional differential gel electrophoresis after depletion of high-abundant proteins. J Proteome Res 2004, 3 (6), 1120-7.

6. Maccarrone, G.; Milfay, D.; Birg, I.; Rosenhagen, M.; Holsboer, F.; Grimm, R.; Bailey, J.; Zolotarjova, N.; Turck, C. W., Mining the human cerebrospinal fluid proteome by immunodepletion and shotgun mass spectrometry. Electrophoresis 2004, 25 (14), 2402-12.

7. Gao, M.; Deng, C.; Yu, W.; Zhang, Y.; Yang, P.; Zhang, X., Large scale depletion of the high-abundance proteins and analysis of middle- and low-abundance proteins in human liver proteome by multidimensional liquid chromatography. Proteomics 2008, 8 (5), 939-47.

8. Shen, Z.; Want, E. J.; Chen, W.; Keating, W.; Nussbaumer, W.; Moore, R.; Gentle, T. M.; Siuzdak, G., Sepsis plasma protein profiling with immunodepletion, three-dimensional liquid chromatography tandem mass spectrometry, and spectrum counting. J Proteome Res 2006, 5 (11), 3154-60.

9. Jacobs, J. M.; Adkins, J. N.; Qian, W. J., et al., Utilizing human blood plasma for proteomic biomarker discovery. J Proteome Res 2005, 4 (4), 1073-1085.

10. Gundry, R. L.; Fu, Q.; Jelinek, C. A., et al., Investigation of an albumin-enriched fraction of human serum and its albuminome. Proteomics Clin Appl 2007, 1 (1), 73-88.

11. Fanali, G.; di Masi, A.; Trezza, V., et al., Human serum albumin: from bench to bedside. Mol Aspects Med 2012, 33 (3), 209-290.

12. Dennis, J. W.; Granovsky, M.; Warren, C. E., Glycoprotein glycosylation and cancer progression. Biochimica et biophysica acta 1999, 1473(1), 21-34.

13. Fuster, M. M.; Esko, J. D., The sweet and sour of cancer: glycans as novel therapeutic targets. Nat Rev Cancer 2005, 5(7), 526-542.

14. Dube, D. H.; Bertozzi, C. R., Glycans in cancer and inflammation--potential for therapeutics and diagnostics. Nat Rev Drug Discov 2005, 4(6), 477-488.

15. Drake, P. M.; Cho, W.; Li, B. et al., Sweetening the pot: adding glycosylation to the biomarker discovery equation. Clinical Chemistry 2010, 56(2), 223-236.

16. Mann, B. F.; Goetz, J. A.; House, M. G.; Schmidt, C. M.; Novotny, M. V., Glycomic and proteomic profiling of pancreatic cyst fluids identifies hyperfucosylated lactosamines on the N-linked glycans of overexpressed glycoproteins. Mol Cell Proteomics 2012, 11 (7), M111 015792.

17. Zhao, J.; Simeone, D. M.; Heidt, D.; Anderson, M. A.; Lubman, D. M., Comparative serum glycoproteomics using lectin selected sialic acid glycoproteins with mass spectrometric analysis: application to pancreatic cancer serum. J Proteome Res 2006, 5 (7), 1792-802.

18. Fu, D.; van Halbeek, H., N-glycosylation site mapping of human serotransferrin by serial lectin affinity chromatography, fast atom bombardment-mass spectrometry, and 1H nuclear magnetic resonance spectroscopy. Anal Biochem 1992, 206 (1), 53-63.

19. Yang, Z.; Hancock, W. S., Monitoring glycosylation pattern changes of glycoproteins using multi-lectin affinity chromatography. J Chromatogr A 2005, 1070 (1-2), 57-64.

20. Kullolli, M.; Hancock, W. S.; Hincapie, M., Preparation of a high-performance multi-lectin affinity chromatography (HP-M-LAC) adsorbent for the analysis of human plasma glycoproteins. J Sep Sci 2008, 31 (14), 2733-9.

21. Kullolli, M.; Hancock, W. S.; Hincapie, M., Automated platform for fractionation of human plasma glycoproteome in clinical proteomics. Anal Chem 2010, 82 (1), 115-20.

22. Zheng, X.; Wu, S. L.; Hincapie, M.; Hancock, W. S., Study of the human plasma proteome of rheumatoid arthritis. J Chromatogr A 2009, 1216 (16), 3538-45.

23. Zeng, Z.; Hincapie, M.; Haab, B. B.; Hanash, S.; Pitteri, S. J.; Kluck, S.; Hogan, J. M.; Kennedy, J.; Hancock, W. S., The development of an integrated platform to identify breast cancer glycoproteome changes in human serum. J Chromatogr A 2010, 1217 (19), 3307-15.

24. Zeng, Z.; Hincapie, M.; Pitteri, S. J.; Hanash, S.; Schalkwijk, J.; Hogan, J. M.; Wang, H.; Hancock, W. S., A proteomics platform combining depletion, multi-lectin affinity chromatography (M-LAC), and isoelectric focusing to study the breast cancer proteome. Anal Chem 2011, 83 (12), 4845-54.

25. Dall'olio, F.; Chiricolo, M., Sialyltransferases in cancer. Glycoconj J 2001, 18(11-12), 841-850.

26. Pousset, D.; Piller, V.; Bureaud, N.; Monsigny, M.; Piller, F., Increased alpha2,6 sialylation of N-glycans in a transgenic mouse model of hepatocellular carcinoma. Cancer Research 1997, 57 (19), 4249-4256.

27. Recchi, M. A.; Hebbar, M.; Hornez, L.; Harduin-Lepers, A.; Peyrat, J. P.; Delannoy, P., Multiplex reverse transcription polymerase chain reaction assessment of sialyltransferase expression in human breast cancer. Cancer Research 1998, 58 (18), 4066-4070.

28. Gundry, R. L.; Fu, Q.; Jelinek, C. A.; Van Eyk, J. E.; Cotter, R. J., Investigation of an albumin-enriched fraction of human serum and its albuminome. Proteomics Clin Appl 2007, 1 (1), 73-88.

29. Zhang, W. M.; Finne, P.; Leinonen, J.; Salo, J.; Stenman, U. H., Determination of prostate-specific antigen complexed to alpha(2)-macroglobulin in serum increases the specificity of free to total PSA for prostate cancer. Urology 2000, 56 (2), 267-72.

30. Lin, V. K.; Wang, S. Y.; Boetticher, N. C.; Vazquez, D. V.; Saboorian, H.; McConnell, J. D.; Roehrborn, C. G., Alpha(2) macroglobulin, a PSA binding protein, is expressed in human prostate stroma. Prostate 2005, 63 (3), 299-308.

31. Dayarathna, M. K. R.; Hancock, W. S.; Hincapie, M., A two step fractionation approach for plasma proteomics using immunodepletion of abundant proteins and multi-lectin affinity chromatography: Application to the analysis of obesity, diabetes, and hypertension diseases. J Sep Sci 2008, 31, 1156-1166.

32. Schiess, R.; Wollscheid, B.; Aebersold, R., Targeted proteomic strategy for clinical biomarker discovery. Mol Oncol 2009, 3 (1), 33-44.

CHAPTER 3

COMPARATIVE STUDIES OF THE PROTEOME, GLYCOPROTEOME AND N-GLYCOME OF CLEAR CELL RENAL CELL CARCINOMA PLASMA BEFORE AND AFTER CURATIVE NEPHRECTOMY

This work is currently under review for publication: Francisca O. Gbormittah, Ling Y. Lee, KyOnese Taylor, William S. Hancock, Othon Iliopoulos, J. Proteome Res., 2014

3.1 ABSTRACT

Clear cell renal cell carcinoma is the most prevalent form of kidney cancer, and there are currently no markers for its early diagnosis. This has stimulated great research interest recently because early detection of the disease can significantly improve the low survival rate. Combining proteome, glycoproteome and N-glycome data from clear cell renal cell carcinoma plasma has the potential to identify candidate markers for early diagnosis, prognosis and/or monitoring of disease recurrence. Here, we report the use of a multi-dimensional fractionation approach (12P-M-LAC) and LC-MS/MS to comprehensively investigate clear cell renal cell carcinoma plasma collected before (disease) and after (non-disease) curative nephrectomy (n=40). Proteins detected in the sub-proteomes were investigated via label-free quantification. Protein abundance analysis revealed a number of low-level proteins with significant differential expression levels in disease samples, including HSPG2, CD146, ECM1, SELL, SYNE1, and VCAM1. Importantly, we observed a strong correlation between differentially expressed proteins and the clinical status of the patient. Investigation of the glycoproteome returned 13 candidate glycoproteins with significant differential M-LAC column binding. Qualitative analysis indicated that 62% of the selected candidate glycoproteins showed higher levels (up-regulation) in the M-LAC bound fraction of disease samples. This observation was further supported by released N-glycan data, in which 53% of identified N-glycans were present at different levels in disease vs. non-disease plasma. This striking result demonstrates the potential for significant protein glycosylation alterations in clear cell renal cell carcinoma plasma. With future validation in a larger cohort, information derived from this study may lead to the development of clear cell renal cell carcinoma candidate biomarkers.

3.2 INTRODUCTION

Kidney (renal) cancer, which includes renal cell carcinomas and transitional cell carcinomas of the renal pelvis, was estimated to cause 13,680 deaths (8,780 men and 4,900 women) in 2013. Clear cell renal cell carcinoma (ccRCC) is the most common subtype of kidney cancer, accounting for approximately 80% of all reported cases, while the papillary and chromophobe subtypes account for 15% and 5% respectively 1.

Currently, there are no biomarkers for early diagnosis of ccRCC, and the standard method of treatment (i.e. surgery) is unsatisfactory: patients undergoing surgery relapse at a rate of 20-50% and have a higher chance of developing metastatic tumors, which are incurable 2,3. There is therefore an urgent need to identify biomarkers for early detection and to predict or monitor the recurrence of ccRCC after surgery.

The complexity of the plasma proteome (dynamic range of approximately 10^11) is of great concern in biomarker discovery studies, because tissue and secreted cell surface protein products that are indicators of a disease and/or healthy state are of low abundance in blood plasma. For example, current clinical biomarkers including prostate specific antigen (PSA) and Her2/neu are present at low ng/mL concentrations in plasma 4, and the ability to identify such molecular markers requires a comprehensive multi-dimensional analytical approach.

Several analytical platforms have been devised to overcome plasma complexity. One such method is targeted enrichment of glycosylated proteins. This approach is sensitive and specific because it involves the characterization of a sub-population of proteins whose alterations are associated with many diseases including cancer 5. Protein glycosylation is one of the most diverse and frequently occurring post-translational modifications and is involved in a number of cell processes 6; aberrant changes in glycosylation profile during the development and progression of cancer are known. Another method is immuno-affinity depletion, which targets the removal of high abundance proteins, enabling ‘deeper mining’ of low-level proteins. These strategies have been used either alone or in combination to enhance protein biomarker discovery studies.

A number of groups have reported the use of high abundance protein depletion and enrichment of target glycoproteins and have observed differential protein expression and alterations in protein glycosylation in cancer samples 7-10. These observations suggest that integrating protein depletion with enrichment of target glycoproteins enhances proteomic and glycoproteomic studies, leading to the identification of potential candidate markers.

Plasma biomarker discovery for ccRCC is lagging behind that for other diseases, although early diagnosis of ccRCC can lead to a cure by surgery. To date, only a handful of ccRCC plasma proteomics biomarker discovery studies have been reported 7, 11 and, to the best of our knowledge, there are no reports on N-glycan profiles in the literature. Therefore, we used a comprehensive comparative ‘omics’ approach, namely proteomics, glycoproteomics and N-glycomics, to study ccRCC plasma before (disease) and after (non-disease) curative nephrectomy and evaluated the alterations associated with histological status. Our goal was to identify potential candidate markers of biological interest in ccRCC development and to assess their potential utility for monitoring ccRCC recurrence after curative nephrectomy.

Our laboratory previously developed an automated high throughput multi-lectin affinity chromatography (HP-M-LAC) platform that combines depletion of two high abundance proteins (albumin and IgG) with multi-lectin (Con A, WGA and JAC) fractionation 12. In this chapter, we used the expanded HP-M-LAC approach described in detail in chapter 2. In brief, the platform incorporates depletion of 12 high abundance proteins (12P) and multi-lectin enrichment (Sambucus nigra (SNA), Aleuria aurantia (AAL), and Phytohemagglutinin-L (PHA-L)), with the overall goal of improving proteomic and/or glycoproteomic depth. AAL, SNA and PHA-L lectins have specificities towards core (α 1,6) fucose 13, sialic acid and N-acetyl glucosamine 14, and highly branched glycans of terminal galactose and mannose 15 oligosaccharides, respectively. These lectins were selected to capture the population of glycan structures frequently implicated in cancer progression and metastasis 16-21. Nano-LC-MS/MS analysis of trypsin-digested 12P depleted plasma and M-LAC fractions (bound and unbound) enabled relative quantification via spectral counting 8. Furthermore, N-glycans released from the M-LAC fractions were profiled and detailed structural annotation was conducted.

We identified several low abundance proteins and glycoproteins with significant differential abundance levels in ccRCC samples. Glycosylation alterations in cancer plasma (before) compared to non-cancer plasma (after) were evident from the differential binding of glycoproteins to the M-LAC column and from N-glycan profiles.

3.3 MATERIALS and METHODS

3.3.1 Materials

Capture Select® 12P depletion resin and PEEK columns were purchased from Life Technologies, Milford, MA. A gravity Omnifit glass column was provided by Biochem Fluidics, Boonton, New Jersey. POROS® R1 50 µm Bulk Media (reversed phase packing) and an HPLC self-packing device were purchased from Applied Biosystems, Framingham, MA. All lectins were obtained from Vector Laboratories, Burlingame, CA. Sequencing grade modified trypsin was purchased from Promega (Madison, WI). 4-12% Bis-Tris sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) gels and NuPAGE MES SDS running buffer (10X) were purchased from Invitrogen, Carlsbad, CA. HPLC-MS grade water, formic acid, acetonitrile and other buffer reagents were all purchased from Thermo Fisher Scientific (Waltham, MA).

3.3.2 Sample population

Clear cell renal cell carcinoma (ccRCC) patients enrolled in this study gave their consent under protocol 01-130, approved by the Institutional Review Board at Massachusetts General Hospital, and samples were provided to us by the laboratory of Dr. Othon Iliopoulos at Massachusetts General Hospital. Twenty ccRCC plasma samples collected before nephrectomy and 20 collected after nephrectomy were pooled to give one disease (RCC (+)) and one non-disease (RCC (–)) sample, respectively (Table 3.1). Pooling was necessary for this discovery study because of the limited amount of sample and to reduce inter-patient variability. Pooled plasma samples were stored at -80 °C and did not undergo more than two freeze/thaw cycles.

Table 3.1: Patient information for RCC plasma samples; the average age of patients is 52 years

| Patient ID | Protocol number | Sex | Tumor size, cm | Clinical diagnosis |
| --- | --- | --- | --- | --- |
| 38 | 01-130 | Female | 8.5 x 6.3 x 6.0 | ccRCC |
| 46 | 01-130 | Female | 5.5 x 5.0 x 5.0 | ccRCC |
| 48 | 01-130 | Male | 7.2 x 4.0 x 3.0 | ccRCC |
| 50 | 01-130 | Female | 6.3 x 5.0 x 4.5 | ccRCC |
| 53 | 01-130 | Male | 8.0 x 7.0 x 6.0 | ccRCC |
| 62 | 01-130 | Male | 7.5 x 7.5 x 6.0 | ccRCC |
| 64 | 01-130 | Female | 8.2 x 8.0 x 7.2 | ccRCC |
| 75 | 01-130 | Female | 8.5 x 7.5 x 7.0 | ccRCC |
| 88 | 01-130 | Female | 3.5 x 3.5 x 3.5 | ccRCC |
| 108 | 01-130 | Male | 6.0 x 4.0 x 3.5 | ccRCC |
| 116 | 01-130 | Male | 7.5 x 4.5 x 3.0 | ccRCC |
| 124 | 01-130 | Male | 8.5 x 6.0 x 5.5 | ccRCC |
| 135 | 01-131 | Female | 5.5 x 4.0 x 3.0 | ccRCC |
| 170 | 01-132 | Male | 6.2 x 5.0 x 3.5 | ccRCC |
| 192 | 01-130 | Female | 8.0 x 4.5 x 4.0 | ccRCC |
| 223 | 01-130 | Male | 7.0 x 6.5 x 5.0 | ccRCC |
| 235 | 01-130 | Female | 4.5 x 4.5 x 3.0 | ccRCC |
| 251 | 01-130 | Male | 5.2 x 5.0 x 4.5 | ccRCC |
| 253 | 01-130 | Male | 4.7 x 4.2 x 4.0 | ccRCC |
| 261 | 01-130 | Female | 8.2 x 7.5 x 6.0 | ccRCC |

3.3.3 High Abundance Proteins Depletion and Glycoprotein Affinity Fractionation

An automated HPLC platform for high abundance protein depletion and glycoprotein fractionation has been described previously 12, and we applied this fractionation platform with moderate changes. Briefly, pooled plasma samples were depleted using a 12-protein (Albumin, IgG, IgM, IgA, free light chains, Fibrinogen, Transferrin, α1-antitrypsin, Apolipoprotein A1, α2-Macroglobulin, Haptoglobin, and α1-acid glycoprotein) high abundance protein depletion resin packed in-house into a PEEK column (4.6 mm × 100 mm), followed by glycoprotein fractionation with a multi-lectin affinity column (M-LAC) containing an equal mixture of the lectins Aleuria aurantia lectin (AAL), Sambucus nigra lectin (SNA), and Phaseolus vulgaris leucoagglutinin (PHA-L). Eluted fractions were desalted on an R1 reversed phase column. The three columns (12P, M-LAC and R1 reversed phase), each attached to a separate valve, were connected in series on a two-dimensional HPLC system (Shimadzu, Columbia, MD) equipped with an on/off switch to control the valves. During sample fractionation, the columns were first equilibrated with binding buffer (25 mM Tris, 0.5 M sodium chloride, 1 mM MnCl2, 1 mM CaCl2 and 0.05% sodium azide, pH 7.4) at a flow rate of 2.0 mL/min for 15 minutes, followed by plasma loading at a flow rate of 0.5 mL/min for 25 minutes. Depleted plasma (12P unbound), M-LAC bound and M-LAC unbound fractions were eluted separately via valve switching and desalted on the R1 reversed phase column using a gradient of 70% solvent B (0.1% trifluoroacetic acid in acetonitrile) and 30% solvent A (0.1% trifluoroacetic acid in Milli-Q water). Elution buffers for the 12P depletion and M-LAC fractions were 0.1 M glycine (pH 2.5) and 0.1 M acetic acid (pH 2.5), respectively. Total protein concentrations of all collected fractions were measured using the Qubit® fluorescence assay (Life Technologies, Inc., Carlsbad, CA) following the manufacturer’s instructions.
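The flow rates and times above translate into buffer volumes that are easy to sanity-check against the column size. The sketch below computes the empty-tube volume of the 4.6 mm × 100 mm PEEK column and expresses the equilibration and loading steps in column volumes. Two assumptions apply: the empty-tube volume overestimates the liquid volume of a packed bed, and all three columns see the same flow in series, so the column-volume figures refer to the 12P column alone. Function and variable names are illustrative.

```python
from math import pi

def cylinder_volume_ml(inner_diameter_mm: float, length_mm: float) -> float:
    """Empty-tube volume of a cylindrical column in mL (1 mL = 1000 mm^3)."""
    radius_mm = inner_diameter_mm / 2
    return pi * radius_mm ** 2 * length_mm / 1000.0

# 12P PEEK column dimensions from the text.
column_volume_ml = cylinder_volume_ml(4.6, 100)

# Step volumes = flow rate (mL/min) x duration (min), values from the text.
equilibration_ml = 2.0 * 15
loading_ml = 0.5 * 25

print(f"12P column volume (empty tube): {column_volume_ml:.2f} mL")
print(f"equilibration: {equilibration_ml:.1f} mL "
      f"(~{equilibration_ml / column_volume_ml:.0f} column volumes)")
print(f"loading:       {loading_ml:.1f} mL "
      f"(~{loading_ml / column_volume_ml:.1f} column volumes)")
```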

3.3.4 N-Glycan Release and LC-ESI-MS Analysis

N-linked glycans were isolated using a previously described method 22. Briefly, 20 µg total protein per sample (2 analytical replicates of each fraction) was brought to a volume of 100 µL with ultrapure water, and 9 volumes of acetone were added, followed by overnight precipitation at -20 °C. Precipitated proteins were centrifuged briefly, the acetone was removed, and the pellet was dried by vacuum centrifugation and re-solubilized in 10 µL of 8 M urea. The protein solution was dot blotted onto a PVDF membrane (Millipore) activated with 100% (v/v) methanol and dried at room temperature. Protein spots were visualized using Direct Blue 71 (Sigma-Aldrich) and de-stained with 40% (v/v) ethanol and 10% (v/v) acetic acid. Protein spots were excised and placed into separate wells of a 96-well plate. The membrane was then blocked with 1% (w/v) PVP40 solution, followed by 3 washes of 5 minutes each with water. 2.5 U of PNGase F (Flavobacterium meningosepticum, Roche) was added and incubated at 37 °C for 15 minutes; a further 10 µL of water was then added and incubation continued at 37 °C overnight. N-glycans were extracted as follows: the 96-well plate containing the glycans was sonicated for 5 min, and each well was washed three times with 20 µL water. The supernatant washes for each sample were pooled into one Eppendorf tube. Samples were acidified with 10 µL of 100 mM ammonium acetate (pH 5) and incubated at room temperature for 1 hour. Samples were subsequently dried by vacuum centrifugation and reduced with 20 µL of 1 M NaBH4 in 50 mM KOH at 50 °C for 3 hours. The reaction was stopped by addition of 2 µL acetic acid, and samples were desalted using AG 50W X8 cation exchange resin (Bio-Rad). Desalted samples were collected in water and dried by vacuum centrifugation. Methanol was added to remove any residual borate and allowed to evaporate in the vacuum centrifuge; this step was repeated four or five times until the white residue disappeared. Desalted samples were kept at -80 °C if not used immediately.

Separation of the N-glycan alditols was performed on a Hypercarb PGC column (5 µm Hypercarb, 180 µm × 100 mm; Thermo Fisher Scientific) connected to an HPLC system (Agilent 1100) over an 85 min gradient from 0% to 45% acetonitrile in 10 mM ammonium bicarbonate, and eluted N-glycans were analyzed by ESI-MS/MS on an Agilent MSD three-dimensional ion-trap XCT Plus mass spectrometer. Settings for the MS/MS were as follows: drying gas flow, 7 L/min; drying gas temperature, 325 °C; nebulizer gas, 18 p.s.i.; skimmer, -40.0 V; trap drive, -99.1 V; and capillary exit, -166 V. Smart fragmentation was used with start and end amplitudes of 30% and 200%, respectively. Ions were detected in ion charge control mode with a target of 100,000 ions and a maximum accumulation time of 200 ms. MS spectra were obtained in negative ion mode with two scan events: a full scan between m/z 100 and 2200 at a scan speed of 8,100 m/z/s, and a dependent MS/MS scan after CID of the two most intense precursor ions, with an absolute threshold of 30,000 and a relative threshold of 5% of the base peak. Dynamic inclusion was inactivated for MS/MS of closely eluting glycans. Precursors were observed mainly in charge states -1 and/or -2. Mass accuracy calibration of the instrument was performed using tuning mix (Agilent), and N-glycans released from bovine fetuin served as positive controls before each data set was run.

3.3.5 Gel nano-LC-MS/MS Proteomic and Glycoproteomic Analysis

20 µg of total protein from depleted plasma and M-LAC fractions was resolved on a 4-12% Bis-Tris SDS-PAGE gel (Novex® NuPAGE®, Life Technologies), followed by trypsin (Promega, Madison, WI) digestion of excised gel pieces as previously described23. Briefly, lanes were cut into four bands and each band was cut into 1 mm × 1 mm pieces. Gel pieces were digested with trypsin (0.04 µg/µL) following de-staining with 50 mM ammonium bicarbonate buffer (pH 8.0) and acetonitrile, reduction (25 mM dithiothreitol) at 56 °C for 30 minutes, and alkylation (50 mM


iodoacetamide) at room temperature for 30 minutes in darkness. Tryptic digests were extracted three times with 100 µL of 50% (v/v) acetonitrile/0.1% (v/v) formic acid in HPLC grade water and dried by vacuum centrifugation. Mass spectrometry analysis of depleted plasma and M-LAC fractions was performed on an LTQ-Orbitrap Elite instrument (Thermo Fisher Scientific, Waltham, MA) equipped with an Ultimate 3000 HPLC (LC Packings-Dionex, Marlton, NJ, USA) and a nano-ESI source. A reversed phase C18 column packed in-house with a 75 µm metal spray tip (Michrom Bioresources, Auburn, CA) was used. Peptides were separated at a flow rate of 200 nL/min on the C18 column using the following 100 min gradient: 5% to 40% buffer B over 80 min, 40% to 90% buffer B over 15 min, and 90% to 2% buffer B over 5 min. Mobile phase A consisted of 0.1% (v/v) formic acid in HPLC grade water, and mobile phase B consisted of 0.1% (v/v) formic acid in acetonitrile. The mass spectrometer was operated in data-dependent mode, with the 8 most abundant precursor ions selected for collision induced dissociation (CID) MS/MS fragmentation from a full MS scan over m/z 400-2000 at a mass resolution of 120,000. Dynamic exclusion parameters were set to a repeat count of 1, a repeat duration of 30 seconds, an exclusion list size of 100, an exclusion duration of 45 seconds, and an exclusion mass width of 1.0 m/z low and 1.50 m/z high.

3.3.6 Data processing and statistical analysis

Analysis of N-glycan data was performed using ESI-Compass 1.3 (Bruker Daltonics). Monoisotopic masses obtained were searched against GlycoMod (http://web.expasy.org/glycomod/) for possible glycan compositions, which were subsequently verified by their corresponding MS/MS spectra. The relative abundance of each glycan in a sample was determined as the extracted ion chromatogram peak area of that glycan divided by the sum of the peak areas of all glycans, which has been shown to be a reasonably accurate method for


relative N-glycan quantitation24. LC-MS/MS proteomic and glycoproteomic data were searched against an annotated human database (release 2013_1; 34,157 entries) using the SEQUEST algorithm (Thermo Electron Corp, San Jose, CA) in the Thermo Fisher Proteome Discoverer 1.4 suite.

Peptide identification was based on the HUPO criteria, which included ΔCn ≥ 0.1, peptide probability < 0.001, and Xcorr ≥ 1.9, 2.5 and 3.8 for singly, doubly and triply charged ions, respectively. Confidence in identification was further increased by searching a reverse database with a false discovery rate (FDR) targeted at 1% at the peptide level. Other search parameters included 2 maximum missed cleavages; full trypsin as enzyme; carbamidomethylation of cysteine as a static modification; deamidation of asparagine as a dynamic modification; and precursor ion and fragment ion mass tolerances of 5 ppm and 0.8 Da, respectively.
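The score thresholds listed above can be illustrated with a minimal sketch. The record layout (`PSM`) and the function name are hypothetical and are not part of Proteome Discoverer or SEQUEST; the handling of charge states above 3+ is also an assumption, since only thresholds for 1+, 2+ and 3+ ions are stated.

```python
from dataclasses import dataclass

# Minimum Xcorr per precursor charge state, as quoted in the text.
XCORR_MIN = {1: 1.9, 2: 2.5, 3: 3.8}
DELTA_CN_MIN = 0.1
PEPTIDE_PROB_MAX = 0.001


@dataclass
class PSM:
    """One peptide-spectrum match (hypothetical record layout)."""
    peptide: str
    charge: int
    xcorr: float
    delta_cn: float
    probability: float


def passes_criteria(psm: PSM) -> bool:
    """Return True if the PSM meets the quoted score thresholds."""
    if psm.charge < 1:
        return False
    # Assumption: charge states above 3+ reuse the 3+ Xcorr cutoff.
    xcorr_min = XCORR_MIN[min(psm.charge, 3)]
    return (
        psm.xcorr >= xcorr_min
        and psm.delta_cn >= DELTA_CN_MIN
        and psm.probability < PEPTIDE_PROB_MAX
    )
```

A filter of this kind would be applied to each match before the 1% FDR estimate from the reverse database search; it does not replace that estimate.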

The PANTHER (Protein ANalysis THrough Evolutionary Relationships) database (http://pantherdb.org/) was used for classification. A label-free semi-quantitative method using spectral counts was applied to select proteins of interest with abundance changes and to evaluate M-LAC differential binding in ccRCC samples, as previously described8. Briefly, ratios of spectral counts in disease to spectral counts in non-disease allowed us to select proteins with potential abundance level changes after normalization with a reference ratio calculated from total spectral counts. Glycoprotein candidates with M-LAC differential binding were selected based on the ratio of the theoretical spectral count of the M-LAC bound fraction (non-disease_Bound measured × (disease_Unbound measured / non-disease_Unbound measured)) to the experimental spectral count of the M-LAC bound fraction (disease_Bound). In instances where no peptides (“0”) were observed for a particular protein under consideration, the count was replaced with “1” for meaningful ratio calculations. Proteins or glycoproteins with a fold change ≥ 3 or ≤ 0.3 were identified as differentially expressed. Excel software (Microsoft Office 2010) was used to


generate p-values using the standard Student’s t-test and to investigate N-glycome and N-glycoproteome differential expression, with p-values ≤ 0.05 considered statistically significant.
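The spectral count ratios described in this section can be sketched as follows. All function names are hypothetical. Two points are assumptions rather than statements from the text: the reference ratio is applied by division, and the binding ratio is oriented as experimental over theoretical so that values above 1 indicate increased M-LAC binding in disease.

```python
FOLD_UP = 3.0     # fold change treated as increased
FOLD_DOWN = 0.3   # fold change treated as decreased


def floor_count(count: float) -> float:
    """Replace a zero spectral count with 1 so that ratios stay defined."""
    return 1.0 if count == 0 else float(count)


def abundance_fold_change(disease: float, control: float,
                          total_disease: float, total_control: float) -> float:
    """Disease/control spectral count ratio, normalized by the reference
    ratio (total disease counts / total control counts).
    Assumption: normalization is applied by division."""
    reference_ratio = total_disease / total_control
    return (floor_count(disease) / floor_count(control)) / reference_ratio


def binding_ratio(disease_ft: float, control_ft: float,
                  disease_bd: float, control_bd: float) -> float:
    """Experimental bound count in disease relative to the theoretical
    bound count expected if only abundance had changed.
    Assumption: ratio is experimental / theoretical."""
    theoretical_bd = (floor_count(control_bd)
                      * floor_count(disease_ft) / floor_count(control_ft))
    return floor_count(disease_bd) / theoretical_bd


def classify(ratio: float) -> str:
    """Label a ratio using the 3-fold and 0.3-fold cutoffs."""
    if ratio >= FOLD_UP:
        return "up"
    if ratio <= FOLD_DOWN:
        return "down"
    return "unchanged"
```

Under these assumptions, the clusterin counts reported later in Table 3.3 (unbound 20 and 22, bound 29 and 1) give `binding_ratio(20, 22, 29, 1)` of roughly 32, which `classify` labels as "up".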

3.4 RESULTS AND DISCUSSION

3.4.1 The analytical strategy

The global profiles of the proteome, glycoproteome and N-glycome of ccRCC plasma were achieved by designing an analytical strategy that focused on “deeper mining” of low abundance disease associated non-glycoproteins, glycoproteins, and N-glycans, Figure 3.1.

Figure 3.1: Experimental workflow showing the process used in the characterization of clear cell renal cell carcinoma (ccRCC) plasma. ccRCC plasma samples were depleted of the top 12 high


abundance proteins followed by glycoprotein enrichment and LC-MS/MS analysis of depleted plasma and M-LAC bound and unbound fractions.

In our previous publications we have shown that a multidimensional platform is a valuable approach to comprehensively characterize the proteome and glycoproteome of biological samples and to enhance the identification of potential biomarker candidates present at low amounts8,23,25. In the current study, the large dynamic range of plasma protein concentrations was first reduced by depleting the top 12 high abundance proteins (12P).

Further, a semi-targeted approach was used as a second fractionation strategy, in which an equal mixture of three lectins, Aleuria aurantia (AAL), Sambucus nigra (SNA) and Phaseolus vulgaris leucoagglutinin (PHA-L), was packed into an HPLC PEEK column to enrich the sub-glycoproteome. Lectins have previously been used to target glycan structures commonly altered in carcinogenesis10,26,27. In a serum breast cancer proteomic study, Zeng et al. observed that differential protein affinities towards selected lectins were indicative of changes in glycan expression level in cancer versus control samples8. Similarly, Abbott et al. used PHA-L lectin to capture potential breast carcinoma biomarkers elevated in breast carcinoma tissues at different stages10.

In this study, we focused on 40 plasma samples obtained from 20 patients diagnosed with clear cell renal cell carcinoma, Table 1. Plasma samples taken before (RCC (+)) and after (RCC (–)) curative nephrectomy were pooled into two groups: disease (before nephrectomy, n=20) and non-disease (after nephrectomy, n=20). Pooling was necessary to reduce patient variability28 and to improve the effectiveness of plasma depletion while increasing protein detection coverage, as previously established23. Pooling also provided sufficient sample for two analytical replicates of each ‘omic’ analysis. 12P depleted plasma and M-LAC fractions were subjected to


proteomics, glycoproteomics and N-glycomic analysis using analytical platforms described in the experimental section.

3.4.2 The 12P-M-LAC analytical platform

Advances in proteomic analysis have pointed out the importance of minimizing interference from high abundance proteins that may mask and/or prevent the detection of low level proteins in disease samples23,29. Therefore, an analytical technology that improves the identification of low level proteins and increases the depth of proteomic data is desirable.

In this chapter, we applied the 12P-M-LAC fractionation platform developed in chapter 2 and evaluated its performance in replicate analyses using reference plasma (Bioreclamation, Jericho, NY). First, the loading capacity of the platform was investigated to ensure minimal run-to-run carryover and sample loss, and 25 µL of reference plasma was determined to be the optimal loading amount. We then used the optimized loading amount to assess the platform based on total protein recovery, reproducibility and efficiency. Total protein recovery measured using the BCA assay (Thermo Scientific) averaged 92% of the starting material, which agrees with an earlier report12 (see chapter 2, Table 2.3).

Similarly, Coomassie Blue stained gels of three analytical replicates of the 12P column target (bound) proteins and M-LAC fractions revealed identical band patterns and band intensities in replicate samples, indicating good reproducibility (Figure 3.2).

Furthermore, we observed differences in gel bands between the M-LAC bound and unbound fractions, suggesting that the M-LAC column fractionates the glycoproteome into sub-populations.


Figure 3.2: 1D SDS-PAGE analysis of reference plasma fractions from the 12P-M-LAC platform to evaluate reproducibility. Equal amounts (2 µg) of replicate samples were separated on a 4-12% NuPAGE gel as follows: reference plasma (lanes 2, 6 and 10), 12P bound fraction (lanes 3, 7 and 11), M-LAC bound fraction (lanes 4, 8 and 12), and M-LAC flow-through fraction (lanes 5, 9 and 13). Lane 1 represents the molecular weight standard marker from Invitrogen (SeeBlue® Plus2 Prestained).

3.4.3 Overview of proteomics and glycoproteomics data

12P depleted plasma fractions were analyzed to evaluate protein abundance changes in proteomics analysis using 1D-SDS PAGE and nano-LC-MS/MS. Similarly, glycoproteomics of

12P-M-LAC bound and unbound fractions enabled the evaluation of proteins with potential glycosylation changes.

Overall, 215 and 248 unique proteins were identified from two analytical replicates in the proteomic and glycoproteomic analyses, respectively. All proteins reported were identified with 99% confidence (1% FDR) and ≥ 2 unique peptides; proteins identified by a single peptide hit were not included in the data analysis. Details of the protein and peptide distributions are presented in Appendix C.

Low abundance glycoproteins and non-glycoproteins identified were observed to fall into one of


the following functional classes: proteases, lipid associated proteins, cytoskeletal associated proteins, and complement factors. These functional categories have recently been reported to correlate with disease states25,30.

3.4.4 Quantification and selection of differentially expressed proteins present in 12P depleted ccRCC plasma proteome

A label-free semi-quantitation method based on spectral counts was used to quantify proteins expressed at different amounts in ccRCC plasma samples, as previously described8. Briefly, 1D gel-nano-LC-MS/MS analysis was performed on equal amounts of 12P depleted plasma samples, followed by data normalization using a reference ratio factor (total peptide hits of disease / total peptide hits of control) as detailed in the data analysis section. The data were normalized to reduce variations introduced during sample preparation. In addition, spectral counts were validated by measurements of peak areas of extracted ion chromatograms and manual inspection of MS/MS spectra in randomly selected cases, as shown in earlier published work9,23.

Differentially expressed proteins were selected based on ratios of spectral count fold changes observed between RCC (+) and RCC (–) after normalization. Potential candidate markers were selected if detected in two analytical replicates and exhibited ≥ 3 or ≤ 0.3 fold changes, p ≤ 0.05. As shown in Table 3.2, the majority (approximately 74%) of potential candidate markers were up-regulated in the disease proteome.


Table 3.2: List of proteins with significant abundance changes in ccRCC 12P depleted fraction

| Gene Name | Description | Disease* | Control* | Abundance changes | Glycosylation status (PTM) | Renal and/or Cancer Significance |
|---|---|---|---|---|---|---|
| YWHAZ | 14-3-3 protein zeta/delta | 2 | 14 | ↓ | No | N/A |
| ADIPOQ | Adiponectin | 1 | 10 | ↓ | Yes | √ √ |
| APOF | Apolipoprotein F | 11 | 1 | ↑ | Yes | N/A |
| APOL1 | Apolipoprotein L1 | 16 | 2 | ↑ | No | √ √ |
| ATRN | Attractin | 2 | 76 | ↓ | Yes | √ |
| HSPG2 | Basement membrane-specific heparan sulfate proteoglycan core protein | 94 | 2 | ↑ | Yes | √ √ |
| CD146 | Cell surface glycoprotein MUC18 | 34 | 2 | ↑ | Yes | √ √ |
| CETP | Cholesteryl ester transfer protein | 10 | 1 | ↑ | Yes | √ √ |
| F11 | Coagulation factor XI | 19 | 3 | ↑ | Yes | √ |
| C4B | Complement component C4B (Chido blood group) | 57 | 2 | ↑ | Yes | √ √ |
| DSP | Desmoplakin | 123 | 5 | ↑ | No | √ |
| ECM1 | Extracellular matrix protein 1 | 21 | 3 | ↑ | Yes | √ |
| FBLN1 | Fibulin-1 | 13 | 1 | ↑ | Yes | √ |
| GPX3 | Glutathione peroxidase 3 | 9 | 1 | ↑ | No | √ √ |
| HIST2H2AC | Histone H2A type 2-C | 16 | 3 | ↑ | No | N/A |
| HIST1H2BM | Histone H2B type 1-M | 8 | 1 | ↑ | Yes | N/A |
| HIST1H3D | Histone H3.1 | 9 | 1 | ↑ | No | √ |
| HIST1H4I | Histone H4 | 8 | 1 | ↑ | No | √ |
| HABP2 | Isoform 2 of Hyaluronan-binding protein 2 | 5 | 33 | ↓ | Yes | √ √ |
| CRTAC1 | Isoform 3 of Cartilage acidic protein 1 | 18 | 2 | ↓ | Yes | N/A |
| FN1 | Isoform 5 of Fibronectin | 68 | 3 | ↓ | Yes | N/A |
| JUP | Junction plakoglobin | 22 | 3 | ↑ | No | √ √ |
| SELL | L-selectin | 16 | 1 | ↑ | Yes | √ √ |
| LYVE1 | Lymphatic vessel endothelial hyaluronic acid receptor 1 | 15 | 2 | ↑ | Yes | √ |
| PGLYRP2 | N-acetylmuramoyl-L-alanine amidase | 7 | 1 | ↑ | Yes | √ |
| SYNE1 | Nesprin-1 | 230 | 9 | ↑ | No | √ √ |
| VNN1 | Pantetheinase | 17 | 2 | ↑ | Yes | √ |
| GPLD1 | Phosphatidylinositol-glycan-specific phospholipase D | 14 | 1 | ↑ | Yes | √ |
| PLTP | Phospholipid transfer protein | 21 | 4 | ↑ | Yes | N/A |
| YTHDC2 | Probable ATP-dependent RNA helicase YTHDC2 | 2 | 36 | ↓ | No | N/A |
| PTGDS | Prostaglandin D2 Synthase (21kD, Brain) | 9 | 1 | ↑ | Yes | √ √ |
| SERPINA10 | Protein Z-dependent protease inhibitor | 21 | 1 | ↑ | Yes | √ √ |
| ROCK2 | Rho-associated protein kinase 2 | 56 | 3 | ↑ | No | √ |
| SHBG | Sex hormone-binding globulin | 1 | 17 | ↓ | Yes | √ |
| PTPN23 | Tyrosine-protein phosphatase non-receptor type 23 | 3 | 24 | ↓ | No | √ |
| VCAM1 | Vascular cell adhesion protein 1 | 25 | 1 | ↑ | Yes | √ √ |
| PROC | Vitamin K-dependent protein C heavy chain | 15 | 2 | ↑ | Yes | √ √ |
| MAGI3 | Membrane-associated guanylate kinase, WW and PDZ domain-containing protein 3 | 3 | 51 | ↓ | No | √ |
| BARD1 | BRCA1-associated RING domain protein 1 | 12 | 1 | ↑ | No | √ |

*Average spectral count of two technical replicates; for meaningful ratio calculations, proteins with no spectral counts were replaced with one (1). ↓ down regulation; ↑ up regulation; N/A not identified.


These include lipid transport and metabolic process proteins (e.g. Cholesteryl ester transfer protein, Apolipoprotein F, Apolipoprotein L1, Phospholipid transfer protein), immune system process proteins (e.g. Basement membrane-specific heparan sulfate proteoglycan core protein, Coagulation factor XI, and Prostaglandin D2 Synthase (21kD, Brain)), and signal transduction proteins (e.g. Cell surface glycoprotein MUC18, Pantetheinase, and Junction plakoglobin). We used the gene ontology (GO) classification system to characterize both the biological process and molecular function of selected potential candidate markers (Figure 3.3A and 3.3B).

Figure 3.3: GO functional classification of selected differentially expressed proteins. A. Molecular function classification and B. Biological process classification. The abundance of both molecular function and biological process are represented by their relative percentage. PANTHER, a free online database was used for this characterization.

Upon further exploration, we established disease associations and glycosylation status of potential candidate markers using Novoseek, a data mining tool resident in Genecards repository


(www.genecards.org). Two observations were made: (1) a strong correlation between potential candidate markers and various kidney and cancer diseases, and (2) the majority of potential candidate markers are glycosylated (Table 3.2). Proteins such as basement membrane-specific heparan sulfate proteoglycan core protein (HSPG2), cell surface glycoprotein MUC18 (CD146), L-selectin (SELL), vascular cell adhesion protein 1 (VCAM1), and protein Z-dependent protease inhibitor (SERPINA10) have been implicated in several disease states, including clear cell renal cell carcinoma, renal failure, gastric cancer, hepatocellular cancer, prostate cancer, lung cancer, ovarian cancer, breast cancer and skin cancer31-33.

CD146, a novel cell adhesion molecule, was recently reported to be a potential marker for clear cell renal cell carcinoma recurrence. Feng et al. observed significantly higher levels of CD146 gene expression in patients with metastatic ccRCC than in patients with localized ccRCC and concluded that the recurrence of ccRCC is directly related to the level of CD146 gene expression34-36. In another publication, the presence of CD146 and elevated levels of adiponectin in patients with chronic renal failure were associated with endothelial damage and increased cardiovascular risk37. These findings further strengthen our current data, in which a 16-fold increase in CD146 abundance was observed in RCC (+) plasma compared to RCC (–) plasma. Future structural studies of CD146 may improve our understanding of the high amounts of CD146 protein in ccRCC plasma and of its role there.

3.4.5 Identification and selection of proteins of interest showing differential M-LAC column binding

It is established that changes in the M-LAC binding affinities (low or high) of glycoproteins may be indicative of glycan structural changes in disease samples38. Hence,


glycoproteins in M-LAC fractions (bound and unbound) with glycan alterations were evaluated based on differential M-LAC binding. Relative quantification was performed (see experimental section) using spectral counts obtained from ccRCC glycoproteome data. In Table 3.3, we show a list of glycoproteins with significant differential M-LAC binding (≥ 3 or ≤ 0.3 fold changes, significance level p ≤ 0.05) and their potential sites of glycosylation based on literature information (www..org).

Table 3.3: List of glycoproteins of interest with significant differential M-LAC binding

| Gene Name | Description | Disease FT* | Control FT* | Disease BD* | Control BD* | M-LAC changes | Glycosylation sites |
|---|---|---|---|---|---|---|---|
| GC | Vitamin D-binding protein | 34 | 36 | 24 | 2 | ↑ | N-288 |
| C2 | Complement C2 | 18 | 26 | 9 | 1 | ↑ | N-29, 112, 290, 33, 467, 471, 621, 651 |
| KLKB1 | Plasma kallikrein heavy chain | 25 | 1 | 16 | 26 | ↓ | N-127, 308, 396, 453, 494 |
| MAN2B1 | Lysosomal alpha-mannosidase | 30 | 20 | 1 | 18 | ↓ | N-133, 310, 367, 497, 645, 651, 692, 766, 832, 930, 989 |
| C1R | Complement C1r subcomponent | 20 | 1 | 1 | 13 | ↓ | N-125, 221, 514, 581 |
| AHSG | Alpha-2-HS-glycoprotein | 14 | 1 | 7 | 11 | ↓ | N-156, 176; O-256, 270, 346 |
| APOB | Apolipoprotein B-100 | 163 | 81 | 54 | 6 | ↑ | N-34, 185, 983, 1368, 1377, 1523, 2239, 2560, 2779, 2982, 3101, 3224, 3336, 3358, 3411, 3465, 3895, 4237, 4431 |
| C4A | Complement C4-A | 1 | 1 | 59 | 4 | ↑ | N-226, 862, 1328, 1391; O-1244 |
| LGALS3BP | Galectin-3-binding protein | 4 | 1 | 9 | 5 | ↑ | N-69, 125, 192, 362, 398, 551, 580 |
| ITIH4 | Inter-alpha-trypsin inhibitor heavy chain H4 | 39 | 41 | 4 | 29 | ↓ | N-81, 207, 517, 577; O-720 |
| FN1 | Isoform 14 of Fibronectin | 47 | 59 | 64 | 1 | ↑ | N-430, 528, 542, 877, 1007, 1244; O-2064, 2065 |
| CLU | Clusterin | 20 | 22 | 29 | 1 | ↑ | N-86, 103, 145, 291, 317, 354, 374 |
| SERPINA3 | Alpha-1-antichymotrypsin | 33 | 30 | 34 | 1 | ↑ | N-33, 93, 106, 127, 186, 271 |

*average spectral count of two technical replicates, for meaningful ratio calculations proteins with no spectral counts were replaced with one (1). ↓ down regulation; ↑ up regulation, N/A not identified


The association between glycan alterations and cancer is well known, and our current observation of altered glycoproteins is consistent with earlier reports39,40.

For instance, clusterin, a heavily glycosylated protein with 7 potential asparagine-linked glycan sites showed increased differential M-LAC binding in disease samples vs. controls.

Clusterin is associated with tumor advancement and carcinogenesis41, and recent reports have indicated glycan alterations of clusterin in cancer vs. non-cancer samples. In stomach cancer studies, Bones et al. showed a linear relationship between decreased levels of clusterin glycans and the progression of cancer29. In addition, studies of clear cell renal cell carcinoma plasma revealed significant glycoform changes in released clusterin glycans between RCC (+) and RCC (–) samples42. More recently, we have observed significant site-specific glycoform alterations of bi-antennary digalactosylated disialylated (A2G2S2) and core fucosylated bi-antennary digalactosylated disialylated (FA2G2S2) glycans in disease vs. non-disease ccRCC plasma (see chapter 4).

Similarly, Vitamin D-binding protein (DBP) showed increased binding to the M-LAC column in ccRCC disease samples. The relationship between the glycosylation status and function of DBP in cancer patients is still unclear. Earlier data suggested a direct correlation between decreased levels of oligosaccharides present on DBP and inactivity of Gc macrophage activating factor (GcMAF) in cancer patients43-45. Recently, however, Rehder et al. investigated DBP glycan levels and observed a high abundance of oligosaccharides in cancer patients, in contrast to the earlier suggestions46.

The current study showed an increase in M-LAC binding of DBP in disease compared to non-disease ccRCC samples suggesting potential glycosylation alterations. However, the focus


of this study was not to investigate the function of DBP, therefore, further studies are required to provide information on DBP’s glycosylation association with ccRCC.

3.4.6 Characterization of N-glycan moieties released from depleted M-LAC fractions by porous graphitized carbon (PGC) LC-ESI-IT MS/MS

In this discovery-based study, our goal was to understand the global profile of N-glycans released from low abundance glycoproteins enriched through depletion of 12 proteins followed by lectin fractionation. Changes in these low level glycans may be useful for understanding ccRCC presence, progression and recurrence. To this end, total N-glycans of depleted M-LAC fractions of pooled disease RCC (+) and non-disease RCC (–) samples were characterized. N-glycans were released by PNGase F via dot-blotting, separated online on a porous graphitized carbon (PGC) column, and analyzed using LC-ESI tandem mass spectrometry in negative ion mode. Oligosaccharide structures were deduced from retention times, charge states, and MS/MS fragmentation patterns.

Thirty-six structurally different N-glycans corresponding to 23 N-glycan monosaccharide compositions were identified from two analytical replicates with minimal variation (average % CV < 2.5) (Figure 3.4). Neutral, sialylated (monosialo, disialo, trisialo, and tetrasialo), fucosylated, and high mannose N-glycans were observed. The identification of N-glycans with various degrees of isomerization was consistent with previous studies using PGC chromatography47.


Figure 3.4: N-glycans identified in clear cell renal cell carcinoma plasma. Confidence based on MS/MS identification. Yellow circle: Galactose; Blue square: N-acetylglucosamine; Green circle: Mannose; Purple diamond: sialic acid; Red triangle: Fucose; Core = (GlcNAc)2(Man)3

3.4.7 N-glycan structures alteration analysis

Even though the N-glycan profiles of RCC (+) and RCC (–) M-LAC fractions were similar in overall appearance (Appendix D), a detailed analysis revealed significant differences between disease and non-disease M-LAC fractions, which were then characterized. First, the mean relative intensities of the two analytical replicates were calculated and the data normalized as previously described48. Briefly, the relative intensity of each glycan in a sample was determined as the ratio of the extracted ion chromatogram (EIC) peak area of that N-glycan to the sum of the EIC peak areas of all N-glycans in the sample. The resulting relative abundances of individual N-glycans were compared across different samples.
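The normalization described above can be sketched in a few lines. The function names and the peak areas used in any call are hypothetical; the sketch expresses each glycan as a percentage of the summed EIC peak area of its sample and then averages two analytical replicates.

```python
from typing import Dict


def relative_abundances(peak_areas: Dict[str, float]) -> Dict[str, float]:
    """Express each glycan's EIC peak area as a percentage of the summed
    peak area of all glycans in the same sample."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("summed peak area must be positive")
    return {glycan: 100.0 * area / total for glycan, area in peak_areas.items()}


def mean_of_replicates(rep1: Dict[str, float],
                       rep2: Dict[str, float]) -> Dict[str, float]:
    """Average the relative abundances of two analytical replicates.
    A glycan missing from one replicate is treated as 0 in that replicate."""
    glycans = set(rep1) | set(rep2)
    return {g: (rep1.get(g, 0.0) + rep2.get(g, 0.0)) / 2.0 for g in glycans}
```

Because every sample is scaled to its own total, the resulting values can be compared between fractions even when the absolute signal differs from run to run.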

Following normalization, a comparative qualitative approach was taken to evaluate the 36 observed and normalized N-glycans from RCC (+) and RCC (–) M-LAC (bound and unbound) fractions using a Euclidean heat map (Appendix E). All 36 normalized glycans (rows) and M-LAC fractions (columns) for RCC (+) and RCC (–) samples are shown, ordered by hierarchical clustering. Highly enriched glycans are represented in green and low level glycans in dark pink. Three observations were made. (1) Sialylated N-glycans were expressed at high levels, and the afucosylated disialo-N-glycan at m/z 1111.4 (2-) was the most abundant glycan structure in all analyzed fractions. This observation is in contrast to a recent report in which fucosylated glycans were observed to be the most abundant form49. Afucosylated disialo-N-glycans are frequently observed in cancer glycomic studies50,51. In the present data, we observed two structural isomers of the afucosylated disialo-N-glycan with different levels of expression: structure #15a, with both terminal sialic acid residues in 2,6-linkages, eluted prior to structure #15b, which has terminal sialic acid residues in 2,6/2,3-linkages. (2) N-glycan expression levels were higher in M-LAC bound fractions than in M-LAC unbound fractions, which is consistent with our aim of enriching target glycans using the M-LAC column. In addition, this observation correlates with the ability of M-LAC to segregate glycan variants into bound and unbound fractions. (3) The majority of N-glycans were expressed at different levels in RCC (+) and RCC (–), and some N-glycan structures (#3b, #28a, #18b, #29b, #19c) were either “missing” or expressed at extremely low levels in M-LAC fractions and were therefore difficult to quantitate. These are indicated as white rows in the heat map. However, this observation may point to a potential glycan-specific molecular feature that differentiates disease and non-disease clear cell renal cell carcinoma


plasma samples.

After establishing qualitative differences between disease and non-disease fractions, relative quantification and statistical analysis were performed to identify differentially expressed N-glycans. For N-glycans with zero (0) relative abundance, the value was replaced with “0.1” for meaningful ratio calculations. The standard Student’s t-test revealed that 44% of identified N-glycans (16 structures) differed significantly (average p < 0.011), being either over- or under-expressed in RCC (+) M-LAC fractions. Differentially expressed N-glycans are summarized in Table 3.4. A notable feature of the differentially expressed glycans was the up-regulation of highly sialylated and high mannose glycans. We observed the highly branched sialylated N-glycans #18a, #18b, #21b and #23b to be up-regulated in disease M-LAC fractions (bound and unbound).
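As an illustration of the statistical step, the sketch below computes a pooled-variance Student's t statistic for two groups of two analytical replicates each, which gives 2 degrees of freedom. It is not the Excel procedure used in the study. It relies on the closed-form cumulative distribution of the t distribution with 2 degrees of freedom, so the two-sided p-value is 1 - |t| / sqrt(t² + 2). The function names and any input values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def floor_abundance(value: float, floor: float = 0.1) -> float:
    """Replace a zero relative abundance with 0.1 so ratios stay defined."""
    return floor if value == 0 else value


def student_t_two_replicates(group_a: Sequence[float],
                             group_b: Sequence[float]) -> Tuple[float, float]:
    """Pooled-variance Student's t statistic and two-sided p-value for
    two groups of exactly two values each (2 degrees of freedom)."""
    if len(group_a) != 2 or len(group_b) != 2:
        raise ValueError("this sketch handles exactly two replicates per group")
    diff = mean(group_a) - mean(group_b)
    # With n = 2 per group the pooled variance is the mean of the two
    # sample variances, and the standard error equals its square root.
    pooled_var = (variance(group_a) + variance(group_b)) / 2.0
    if pooled_var == 0:
        if diff == 0:
            return 0.0, 1.0
        return math.copysign(math.inf, diff), 0.0
    t = diff / math.sqrt(pooled_var)
    p = 1.0 - abs(t) / math.sqrt(t * t + 2.0)
    return t, p
```

With only two replicates per group the test has very little power: |t| must exceed about 4.30 for p ≤ 0.05.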


Table 3.4: List of glycans with significant differential expression in ccRCC M-LAC fractions

| Glycan No. | Composition | Confidence (MS/MS) | Glycan Type | Charge | Observed m/z | Abundance* (disease vs control) |
|---|---|---|---|---|---|---|
| 1 | Core + (Hex)3 | High | High Mannose | 2 | 698.2 | |
| 9 | Core + (Hex)6 | High | High Mannose | 2 | 941.3 | |
| 4 | Core + (Hex)5 | High | High Mannose | 2 | 860.3 | |
| 5 | Core + (HexNAc)1(Hex)2(NeuAc)1 | High | Hybrid | 2 | 864.3 | |
| 10a | Core + (HexNAc)2(Hex)2(NeuAc)1 | High | Complex | 2 | 965.9 | |
| 10b | Core + (HexNAc)2(Hex)2(NeuAc)1 | High | Complex | 2 | 965.9 | |
| 10c | Core + (HexNAc)2(Hex)2(NeuAc)1 | High | Complex | 2 | 965.9 | |
| 15b | Core + (HexNAc)2(Hex)2(NeuAc)2 | High | Complex | 2 | 1111.4 | |
| 13b | Core + (HexNAc)2(Hex)2(Fuc)1(NeuAc)1 | High | Complex | 2 | 1038.9 | |
| 13d | Core + (HexNAc)2(Hex)2(Fuc)1(NeuAc)1 | High | Complex | 2 | 1038.9 | |
| 19a | Core + (HexNAc)2(Hex)2(Fuc)1(NeuAc)2 | High | Complex | 2 | 1184.4 | |
| 18a | Core + (HexNAc)4(Hex)4(NeuAc)4 | Medium | Complex | 3 | 1178.1 | |
| 18b | Core + (HexNAc)4(Hex)4(NeuAc)4 | Medium | Complex | 3 | 1178.1 | |
| 21b | Core + (HexNAc)3(Hex)3(NeuAc)2 | High | Complex | 2 | 1294.0 | |
| 23b | Core + (HexNAc)3(Hex)3(Fuc)1(NeuAc)3 | Medium | Complex | 3 | 1008.0 | |
| 11 | Core + (HexNAc)3(Hex)2(Fuc)1 | High | Complex | 2 | 994.9 | |

*M-LAC bound and unbound; Core = (GlcNAc)2(Man)3; ↓ down regulation; ↑ up regulation. For meaningful ratio calculations glycans with zero (0) relative abundance were replaced with “0.1”

The up-regulation of sialylated glycans was recently reported in a study of renal cell carcinoma tissues52, and our data are in agreement with this report. Elevated levels of highly branched sialylated glycans are associated with increased activity of sialyltransferases, enzymes that regulate the biosynthesis of sialic acid residues20,53. Several studies have reported an effect of alterations in sialic acid composition on cell adhesion, a factor implicated in metastasis54,55. In breast cancer, for example, Lin et al. reported that increased amounts of sialic acid correlate with a decrease in cell-cell adhesion and an increase in cell invasiveness56.

An increased level of high mannose structures (glycans #1, #9, #4) was another distinct feature of this study. Elevation of high mannose type glycans has been reported in various cancer types. High levels of high mannose glycans were reported to correlate with breast cancer progression56, and a similar trend was reported in head and neck tumor studies57. Higher or lower levels of glycans are a hallmark of cancer progression, and their alterations may be related to changes in the expression levels of enzymes involved in the glycan biosynthesis pathway. Therefore, the goal of these studies was to assess the potential of N-glycans as biomarker candidates for clear cell renal cell carcinoma.

3.4.8 Validation of differentially expressed N-glycans by extracted ion chromatograms

Extracted ion chromatograms (EICs) of selected N-glycans were used to validate the different levels of N-glycan expression in M-LAC fractions. In Figure 3.5A and 3.5B, the EICs of N-glycan #1, [Core + (Hex)3], m/z 698.2 (2-); N-glycan #9, [Core + (Hex)6], m/z 941.3 (2-); N-glycan #18a (isomer a), [Core + (HexNAc)4(Hex)4(NeuAc)4], m/z 1178.1 (3-); and N-glycan #18b (isomer b), [Core + (HexNAc)4(Hex)4(NeuAc)4], m/z 1178.1 (3-) highlight an elevation of high mannose and tetra-antennary sialo type oligosaccharides in RCC (+) M-LAC fractions.
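The m/z values quoted above can be checked against the stated compositions with a short calculation. The sketch below assumes reduced N-glycans (alditols, as produced by the NaBH4 step) observed as deprotonated ions in negative mode and uses standard monoisotopic residue masses; the function and constant names are hypothetical.

```python
from typing import Dict

# Monoisotopic residue masses (monosaccharide minus water), in Da.
RESIDUE_MASS = {
    "Hex": 162.052823,
    "HexNAc": 203.079373,
    "Fuc": 146.057909,
    "NeuAc": 291.095417,
}
WATER = 18.010565      # added once for the free glycan
H2 = 2.015650          # added by reduction to the alditol
PROTON = 1.007276

# Core = (GlcNAc)2(Man)3, as defined in the table footnotes.
CORE = {"HexNAc": 2, "Hex": 3}


def alditol_mz(extra: Dict[str, int], charge: int) -> float:
    """m/z of the [M - zH]z- ion of a reduced N-glycan with composition
    Core + extra, where charge is the number of negative charges z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    composition = dict(CORE)
    for residue, count in extra.items():
        composition[residue] = composition.get(residue, 0) + count
    neutral = sum(RESIDUE_MASS[r] * n for r, n in composition.items())
    neutral += WATER + H2
    return (neutral - charge * PROTON) / charge
```

For example, `alditol_mz({"Hex": 3}, 2)` evaluates to about 698.24 and `alditol_mz({"HexNAc": 4, "Hex": 4, "NeuAc": 4}, 3)` to about 1178.08, in line with the values reported for glycans #1 and #18a/#18b.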


Figure 3.5: Extracted ion chromatograms to illustrate differentially expressed glycans in M-LAC bound and unbound fractions. A. The extracted ion chromatograms of Glycan #1 (left) and Glycan #9 (right) showed that both N-glycans were expressed at higher levels in M-LAC bound before ccRCC surgery (upper panels) compared to those observed after ccRCC surgery (lower panels). B. The extracted ion chromatograms of isomers Glycan #18a (a) and Glycan #18b (b) showed that both N-glycans were expressed at higher levels in M-LAC unbound and bound fractions before ccRCC surgery (1st and 3rd panels) compared to those observed after ccRCC surgery (2nd and 4th panels). Yellow circle: Galactose; Blue square: N-acetylglucosamine; Green circle: Mannose; Purple diamond: sialic acid; Red triangle: Fucose; Core = (GlcNAc)2(Man)3

All N-glycan structural identifications were confirmed by MS/MS fragmentation. In summary, our N-glycan data correlate with the glycoproteomics data, in which differential M-LAC binding suggests potential glycan-specific glycosylation changes in ccRCC fractions, and the current report presents a novel mapping of glycans in ccRCC plasma.

3.5 CONCLUSION

We have successfully fractionated pooled plasma from 20 patients diagnosed with clear cell renal cell carcinoma (ccRCC) using immuno-affinity depletion of 12 high abundance proteins followed by a multi-lectin affinity chromatography (M-LAC) platform. Alterations in the plasma proteome, glycoproteome and N-glycome of ccRCC patients were studied to identify low level potential candidate markers that may be of interest for early diagnosis or for monitoring ccRCC recurrence. We report in this chapter that low abundance proteins with significant expression changes, such as cell surface glycoprotein MUC18, basement membrane-specific heparan sulfate proteoglycan core protein, L-selectin, and vascular cell adhesion protein 1, may be potential candidates, pending further validation, because of their observed association with various cancers and renal related diseases. Further, proteins with glycan alterations that showed differential M-LAC column binding confirm reports of the ability of lectins to target potential glyco-biomarker candidates. Complex type sialylated glycans bearing fucose, released from enriched glycoproteins, were altered in ccRCC patients, suggesting alterations of glycosylation at the onset of clear cell renal cell carcinoma. This study was conducted as a first step in identifying potential candidate markers of interest in ccRCC plasma.

In Chapter 4, we characterize a glycoprotein (clusterin) that was identified through this discovery-based study and that may serve, alone or in combination with other biomarkers, as a biomarker for early detection of ccRCC.

3.6 REFERENCES

1. Linehan, W. M.; Walther, M. M.; Zbar, B., The genetic basis of cancer of the kidney. J Urol 2003, 170 (6 Pt 1), 2163-72.

2. Pantuck, A. J.; Zisman, A.; Belldegrun, A. S., The changing natural history of renal cell carcinoma. J Urol 2001, 166 (5), 1611-23.

3. Weiss, R. H.; Lin, P. Y., Kidney cancer: identification of novel targets for therapy. Kidney Int 2006, 69 (2), 224-32.

4. Herrmann, W.; Stockle, M.; Sand-Hill, M.; Hubner, U.; Herrmann, M.; Obeid, R.; Wullich, B.; Loch, T.; Geisel, J., The measurement of complexed prostate-specific antigen has a better performance than total prostate-specific antigen. Clin Chem Lab Med 2004, 42 (9), 1051-7.

5. Yang, Z.; Hancock, W. S., Approach to the comprehensive analysis of glycoproteins isolated from human serum using a multi-lectin affinity column. J Chromatogr A 2004, 1053 (1-2), 79-88.

6. Chen, S.; LaRoche, T.; Hamelinck, D.; Bergsma, D.; Brenner, D.; Simeone, D.; Brand, R. E.; Haab, B. B., Multiplexed analysis of glycan variation on native proteins captured by antibody microarrays. Nature methods 2007, 4 (5), 437-44.

7. Teng, P. N.; Hood, B. L.; Sun, M.; Dhir, R.; Conrads, T. P., Differential proteomic analysis of renal cell carcinoma tissue interstitial fluid. Journal of Proteome Research 2011, 10 (3), 1333-42.

8. Zeng, Z.; Hincapie, M.; Pitteri, S. J.; Hanash, S.; Schalkwijk, J.; Hogan, J. M.; Wang, H.; Hancock, W. S., A proteomics platform combining depletion, multi-lectin affinity chromatography (M-LAC), and isoelectric focusing to study the breast cancer proteome. Analytical Chemistry 2011, 83 (12), 4845-54.

9. Plavina, T.; Wakshull, E.; Hancock, W. S.; Hincapie, M., Combination of abundant protein depletion and multi-lectin affinity chromatography (M-LAC) for plasma protein biomarker discovery. Journal of Proteome Research 2007, 6 (2), 662-71.

10. Abbott, K. L.; Aoki, K.; Lim, J. M.; Porterfield, M.; Johnson, R.; O'Regan, R. M.; Wells, L.; Tiemeyer, M.; Pierce, M., Targeted glycoproteomic identification of biomarkers for human breast carcinoma. Journal of Proteome Research 2008, 7 (4), 1470-80.

11. Gao, Y.; Ma, F.; Zhang, W.; Zhong, F.; Tang, H.; Xu, D.; Zhao, L., O-glycan profiling of serum glycan for potential renal cancer biomarkers. Sci China Life Sci 2013, 56 (8), 739-44.

12. Kullolli, M.; Hancock, W. S.; Hincapie, M., Automated platform for fractionation of human plasma glycoproteome in clinical proteomics. Analytical Chemistry 2010, 82 (1), 115-20.

13. Matsumura, K.; Higashida, K.; Ishida, H.; Hata, Y.; Yamamoto, K.; Shigeta, M.; Mizuno-Horikawa, Y.; Wang, X.; Miyoshi, E.; Gu, J.; Taniguchi, N., Carbohydrate binding specificity of a fucose-specific lectin from Aspergillus oryzae: a novel probe for core fucose. The Journal of biological chemistry 2007, 282 (21), 15700-8.

14. Fischer, E.; Brossmer, R., Sialic acid-binding lectins: submolecular specificity and interaction with sialoglycoproteins and tumour cells. Glycoconj J 1995, 12 (5), 707-13.

15. Li, W. P.; Zuber, C.; Roth, J., Use of Phaseolus vulgaris leukoagglutinating lectin in histochemical and blotting techniques: a comparison of digoxigenin- and biotin-labelled lectins. Histochemistry 1993, 100 (5), 347-56.

16. Dall'Olio, F.; Chiricolo, M., Sialyltransferases in cancer. Glycoconj J 2001, 18 (11-12), 841-50.

17. Pousset, D.; Piller, V.; Bureaud, N.; Monsigny, M.; Piller, F., Increased alpha2,6 sialylation of N-glycans in a transgenic mouse model of hepatocellular carcinoma. Cancer Research 1997, 57 (19), 4249-56.

18. Recchi, M. A.; Hebbar, M.; Hornez, L.; Harduin-Lepers, A.; Peyrat, J. P.; Delannoy, P., Multiplex reverse transcription polymerase chain reaction assessment of sialyltransferase expression in human breast cancer. Cancer Research 1998, 58 (18), 4066-70.

19. Takada, A.; Ohmori, K.; Yoneda, T.; Tsuyuoka, K.; Hasegawa, A.; Kiso, M.; Kannagi, R., Contribution of carbohydrate antigens sialyl Lewis A and sialyl Lewis X to adhesion of human cancer cells to vascular endothelium. Cancer Research 1993, 53 (2), 354-61.

20. Dube, D. H.; Bertozzi, C. R., Glycans in cancer and inflammation--potential for therapeutics and diagnostics. Nat Rev Drug Discov 2005, 4 (6), 477-88.

21. Turner, G. A.; Skillen, A. W.; Buamah, P.; Guthrie, D.; Welsh, J.; Harrison, J.; Kowalski, A., Relation between raised concentrations of fucose, sialic acid, and acute phase proteins in serum from patients with cancer: choosing suitable serum glycoprotein markers. Journal of Clinical Pathology 1985, 38 (5), 588-92.

22. Jensen, P. H.; Karlsson, N. G.; Kolarich, D.; Packer, N. H., Structural analysis of N- and O-glycans released from glycoproteins. Nature Protocols 2012, 7 (7), 1299-310.

23. Gbormittah, F. O.; Haab, B. B.; Partyka, K.; Garcia-Ott, C.; Hancapie, M.; Hancock, W. S., Characterization of glycoproteins in pancreatic cyst fluid using a high-performance multiple lectin affinity chromatography platform. Journal of Proteome Research 2014, 13 (1), 289-99.

24. Leymarie et al., Interlaboratory study on differential analysis of protein glycosylation by mass spectrometry: the ABRF glycoprotein research multi-institutional study 2012. Molecular & cellular proteomics : MCP 2013, 12 (10), 2935-2951.

25. Plavina, T.; Hincapie, M.; Wakshull, E.; Subramanyam, M.; Hancock, W. S., Increased plasma concentrations of cytoskeletal and Ca2+-binding proteins and their peptides in psoriasis patients. Clinical Chemistry 2008, 54 (11), 1805-14.

26. Yang, Z.; Harris, L. E.; Palmer-Toy, D. E.; Hancock, W. S., Multilectin affinity chromatography for characterization of multiple glycoprotein biomarker candidates in serum from breast cancer patients. Clinical Chemistry 2006, 52 (10), 1897-905.

27. Abbott, K. L.; Lim, J. M.; Wells, L.; Benigno, B. B.; McDonald, J. F.; Pierce, M., Identification of candidate biomarkers with cancer-specific glycosylation in the tissue and serum of endometrioid ovarian cancer patients by glycoproteomic analysis. Proteomics 2010, 10 (3), 470-81.

28. Batruch, I.; Lecker, I.; Kagedan, D.; Smith, C. R.; Mullen, B. J.; Grober, E.; Lo, K. C.; Diamandis, E. P.; Jarvi, K. A., Proteomic analysis of seminal plasma from normal volunteers and post-vasectomy patients identifies over 2000 proteins and candidate biomarkers of the urogenital system. Journal of Proteome Research 2011, 10 (3), 941-53.

29. Bones, J.; Byrne, J. C.; O'Donoghue, N.; McManus, C.; Scaife, C.; Boissin, H.; Nastase, A.; Rudd, P. M., Glycomic and glycoproteomic analysis of serum from patients with stomach cancer reveals potential markers arising from host defense response mechanisms. Journal of Proteome Research 2011, 10 (3), 1246-65.

30. Zheng, X.; Wu, S. L.; Hincapie, M.; Hancock, W. S., Study of the human plasma proteome of rheumatoid arthritis. Journal of chromatography. A 2009, 1216 (16), 3538-45.

31. Feng, G.; Fang, F.; Liu, C.; Zhang, F.; Huang, H.; Pu, C., CD146 gene expression in clear cell renal cell carcinoma: a potential marker for prediction of early recurrence after nephrectomy. Int Urol Nephrol 2012, 44 (6), 1663-9.

32. Wu, G. J.; Wu, M. W.; Wang, S. W.; Liu, Z.; Qu, P.; Peng, Q.; Yang, H.; Varma, V. A.; Sun, Q. C.; Petros, J. A.; Lim, S. D.; Amin, M. B., Isolation and characterization of the major form of human MUC18 cDNA gene and correlation of MUC18 over-expression in prostate cancer cell lines and tissues with malignant progression. Gene 2001, 279 (1), 17-31.

33. Wu, Z.; Li, J.; Yang, X.; Wang, Y.; Yu, Y.; Ye, J.; Xu, C.; Qin, W.; Zhang, Z., MCAM is a novel metastasis marker and regulates spreading, apoptosis and invasion of ovarian cancer cells. Tumour Biol 2012, 33 (5), 1619-28.

34. Imbert, A. M.; Garulli, C.; Choquet, E.; Koubi, M.; Aurrand-Lions, M.; Chabannon, C., CD146 expression in human breast cancer cell lines induces phenotypic and functional changes observed in Epithelial to Mesenchymal Transition. PLoS One 2012, 7 (8), e43752.

35. Schon, M.; Kahne, T.; Gollnick, H.; Schon, M. P., Expression of gp130 in tumors and inflammatory disorders of the skin: formal proof of its identity as CD146 (MUC18, Mel-CAM). J Invest Dermatol 2005, 125 (2), 353-63.

36. Kristiansen, G.; Yu, Y.; Schluns, K.; Sers, C.; Dietel, M.; Petersen, I., Expression of the cell adhesion molecule CD146/MCAM in non-small cell lung cancer. Anal Cell Pathol 2003, 25 (2), 77-81.

37. Malyszko, J.; Malyszko, J. S.; Brzosko, S.; Wolczynski, S.; Mysliwiec, M., Adiponectin is related to CD146, a novel marker of endothelial cell activation/injury in chronic renal failure and peritoneally dialyzed patients. The Journal of clinical endocrinology and metabolism 2004, 89 (9), 4620-7.

38. Yang, Z.; Hancock, W. S., Approach to the comprehensive analysis of glycoproteins isolated from human serum using a multi-lectin affinity column. Journal of chromatography. A 2004, 1053 (1-2), 79-88.

39. Pawlik, T. M.; Hawke, D. H.; Liu, Y.; Krishnamurthy, S.; Fritsche, H.; Hunt, K. K.; Kuerer, H. M., Proteomic analysis of nipple aspirate fluid from women with early-stage breast cancer using isotope-coded affinity tags and tandem mass spectrometry reveals differential expression of vitamin D binding protein. BMC Cancer 2006, 6, 68.

40. Woltje, M.; Tschoke, B.; von Bulow, V.; Westenfeld, R.; Denecke, B.; Graber, S.; Jahnen-Dechent, W., CCAAT enhancer binding protein beta and hepatocyte nuclear factor 3beta are necessary and sufficient to mediate dexamethasone-induced up-regulation of alpha2HS-glycoprotein/fetuin-A gene expression. J Mol Endocrinol 2006, 36 (2), 261-77.

41. Andersen, C. L.; Schepeler, T.; Thorsen, K.; Birkenkamp-Demtroder, K.; Mansilla, F.; Aaltonen, L. A.; Laurberg, S.; Orntoft, T. F., Clusterin expression in normal mucosa and colorectal cancer. Molecular & cellular proteomics : MCP 2007, 6 (6), 1039-48.

42. Tousi, F.; Bones, J.; Iliopoulos, O.; Hancock, W. S.; Hincapie, M., Multidimensional liquid chromatography platform for profiling alterations of clusterin N-glycosylation in the plasma of patients with renal cell carcinoma. Journal of chromatography. A 2012, 1256, 121-8.

43. Yamamoto, N.; Naraparaju, V. R.; Asbell, S. O., Deglycosylation of serum vitamin D3-binding protein leads to immunosuppression in cancer patients. Cancer Research 1996, 56 (12), 2827-31.

44. Yamamoto, N.; Naraparaju, V. R.; Urade, M., Prognostic utility of serum alpha-N-acetylgalactosaminidase and immunosuppression resulted from deglycosylation of serum Gc protein in oral cancer patients. Cancer Research 1997, 57 (2), 295-9.

45. Yamamoto, N.; Suyama, H.; Ushijima, N., Immunotherapy of metastatic breast cancer patients with vitamin D-binding protein-derived macrophage activating factor (GcMAF). International journal of cancer. Journal international du cancer 2008, 122 (2), 461-7.

46. Rehder, D. S.; Nelson, R. W.; Borges, C. R., Glycosylation status of vitamin D binding protein in cancer patients. Protein Sci 2009, 18 (10), 2036-42.

47. Everest-Dass, A. V.; Jin, D.; Thaysen-Andersen, M.; Nevalainen, H.; Kolarich, D.; Packer, N. H., Comparative structural analysis of the glycosylation of salivary and buccal cell proteins: innate protection against infection by Candida albicans. Glycobiology 2012, 22 (11), 1465-79.

48. Nakano, M.; Saldanha, R.; Gobel, A.; Kavallaris, M.; Packer, N. H., Identification of glycan structure alterations on cell membrane proteins in desoxyepothilone B resistant leukemia cells. Molecular & cellular proteomics : MCP 2011, 10 (11), M111 009001.

49. Borzym-Kluczyk, M.; Radziejewska, I.; Darewicz, B., Glycosylation of proteins in healthy and pathological human renal tissues. Folia Histochem Cytobiol 2012, 50 (4), 599-604.

50. Dall'olio, F., Protein glycosylation in cancer biology: an overview. Clin Mol Pathol 1996, 49 (3), M126-35.

51. Kim, Y. J.; Varki, A., Perspectives on the significance of altered glycosylation of glycoproteins in cancer. Glycoconj J 1997, 14 (5), 569-76.

52. Yoshimura, M.; Nishikawa, A.; Ihara, Y.; Taniguchi, S.; Taniguchi, N., Suppression of lung metastasis of B16 mouse melanoma by N-acetylglucosaminyltransferase III gene transfection. Proc Natl Acad Sci U S A 1995, 92 (19), 8754-8.

53. Gessner, P.; Riedl, S.; Quentmaier, A.; Kemmner, W., Enhanced activity of CMP-neuAc:Gal beta 1-4GlcNAc:alpha 2,6-sialyltransferase in metastasizing human colorectal tumor tissue and serum of tumor patients. Cancer letters 1993, 75 (3), 143-9.

54. Kannagi, R., Molecular mechanism for cancer-associated induction of sialyl Lewis X and sialyl Lewis A expression-The Warburg effect revisited. Glycoconj J 2004, 20 (5), 353-64.

55. Lin, S.; Kemmner, W.; Grigull, S.; Schlag, P. M., Cell surface alpha 2,6 sialylation affects adhesion of breast carcinoma cells. Exp Cell Res 2002, 276 (1), 101-10.

56. de Leoz, M. L.; Young, L. J.; An, H. J.; Kronewitter, S. R.; Kim, J.; Miyamoto, S.; Borowsky, A. D.; Chew, H. K.; Lebrilla, C. B., High-mannose glycans are elevated during breast cancer progression. Molecular & cellular proteomics : MCP 2011, 10 (1), M110 002717.

57. Lattova, E.; Varma, S.; Bezabeh, T.; Petrus, L.; Perreault, H., Mass spectrometric profiling of N-linked oligosaccharides and uncommon glycoform in mouse serum with head and neck tumor. J Am Soc Mass Spectrom 2008, 19 (5), 671-85.

CHAPTER 4

TANDEM MASS SPECTROMETRY CHARACTERIZATION OF CLUSTERIN GLYCOPEPTIDE VARIANTS IN THE PLASMA OF CLEAR CELL RENAL CELL CARCINOMA

This work is currently under review for publication: Francisca O. Gbormittah, Jonathan Bones, Marina Hincapie, Fateme Tousi, William S. Hancock, Othon Iliopoulos, Cancer Research, 2014

4.1 ABSTRACT

Cancer-related alterations in protein glycosylation may serve as diagnostic or prognostic biomarkers or may be used for monitoring disease progression. Clusterin is a medium abundance, yet heavily glycosylated glycoprotein, which is up-regulated in clear cell renal cell carcinoma (ccRCC) tumors. We recently reported that the N-glycan profile of clusterin is altered in the plasma of ccRCC patients. Here we characterized the occupancy and the degree of heterogeneity of individual N-glycosylation sites of clusterin, in the plasma of patients diagnosed with localized ccRCC, before and after curative nephrectomy (n=40). To this end we used tandem mass spectrometry of immunoaffinity enriched plasma samples, to analyze the individual glycosylation sites in clusterin. We determined the levels of targeted clusterin glycoforms containing either a bi-antennary digalactosylated disialylated (A2G2S2) glycan or a core fucosylated bi-antennary digalactosylated disialylated (FA2G2S2) glycan at N-glycosite N-374.

We showed that the presence of these two clusterin glycoforms differed significantly in the plasma of patients prior to and after curative nephrectomy for localized ccRCC. Removal of ccRCC led to a significant increase in the levels of both FA2G2S2 and A2G2S2 glycans in plasma clusterin. These changes were further confirmed by lectin blotting of plasma clusterin.

Site-specific changes in plasma clusterin glycosylation may lead to the development of biomarkers for early detection of ccRCC recurrence and/or progression.

4.2 INTRODUCTION

Clear cell renal cell carcinoma (ccRCC) accounts for approximately 75% of sporadic renal cancer. In addition, patients with Von Hippel-Lindau (VHL) disease develop multiple renal cancers exclusively of clear cell histology (ccRCC). If the disease is detected at early stages it can be cured by surgery. Locally advanced disease has a great risk of recurrence and metastatic disease is currently incurable 1, 2. Therefore, there is an urgent need to identify plasma-based markers that can be used for early detection and/or prediction of disease recurrence.

Protein glycosylation is the most common posttranslational protein modification (PTM) and aberrant protein glycosylation has been shown to be associated with several malignancies 3-6.

Glycoprotein-based biomarkers have become widely used in the clinical setting 7-9; for example, monosialylated alpha-fetoprotein (AFP) and carbohydrate antigen 19-9 (CA 19-9) are currently used as biomarkers for hepatocellular carcinoma and stomach cancer, respectively. Although these markers are widely used, they have low sensitivity and/or specificity; their clinical application therefore requires combining them with additional biomarkers in order to create a panel with an optimal receiver operating characteristic (ROC) curve 10, 11.

A major challenge in glycoproteomic studies is the low concentration of the potentially relevant candidate glycoprotein marker(s) in the blood. In addition, the attachment of glycans at multiple N-glycan sites of a protein (macro-heterogeneity) and the variable number and levels of N-glycans at one or more occupied sites (micro-heterogeneity) further complicate glycoproteomic investigation.

The most common approaches for undertaking N-glycosylation studies include either oligosaccharide profiling or glycopeptide analysis. In oligosaccharide profiling, total N-glycans are released from a single purified/enriched glycoprotein using PNGase F treatment, followed by either labeled or unlabeled chromatographic and/or MS analysis. In glycopeptide analysis, a single glycoprotein or multiple proteins are digested and the resulting glycopeptides are subsequently enriched, followed by LC-MS analysis 12-15. While glycan structural information is obtained using the oligosaccharide profiling approach, no knowledge about the site of glycan attachment can be derived. In contrast, a glycopeptide focused method may provide information regarding both the glycan structure and the site of glycan attachment 16-18. A limitation of this latter approach, however, is that, due to inherent analytical complexity, data analysis is often time consuming and tedious, especially when several glycopeptides with multiple glycan attachment sites are involved.

Tandem mass spectrometry analysis of glycopeptides, combined with PNGase F-assisted glycan release, provides direct information on the site-specific oligosaccharide heterogeneity of glycoproteins 19, as well as glycopeptide sequence information. Thus, this approach allows for site-specific glycoform comparison between clinically relevant specimens and corresponding controls. Observations on site-specific glycosylation changes may enhance our understanding of glycoprotein function and improve glycan detection specificity for therapeutic targets, as previously noted 20-26.

The significance of clusterin differential expression across many cancers, especially breast, prostate, ovarian and renal cell carcinoma, has been reported previously 27-32. In ccRCC, loss of the VHL tumor suppressor gene has been associated with a HIF-independent up-regulation of clusterin in tumor samples 33. Preliminary evidence suggests that strong expression of clusterin in surgically removed ccRCC tissue may correlate with a shorter recurrence-free survival 34. In contrast to studies examining the expression of clusterin in surgically removed ccRCC tumors, few studies have addressed the importance of circulating clusterin for detection of ccRCC or for monitoring disease progression. Our previous work examined plasma clusterin glycosylation in ccRCC 35. We profiled total N-glycans released from immunoaffinity enriched plasma clusterin and observed glycosylation changes in the plasma clusterin of patients prior to (RCC (+)) and after (RCC (-)) curative nephrectomy for localized ccRCC.

In this report, we mapped the individual N-glycosylation sites of plasma clusterin and detected site-specific changes linked to the presence of ccRCC, using liquid chromatography followed by a tandem mass spectrometry strategy. To obtain glycan structural information we utilized collision induced dissociation (CID) MS/MS (MS2) fragmentation and verified the sequence of each glycopeptide by CID-MS3 fragmentation. We showed that an increase in the levels of both a core fucosylated bi-antennary digalactosyl disialylated (FA2G2S2) glycan and a bi-antennary digalactosyl disialylated (A2G2S2) glycan best discriminates between RCC (+) and RCC (-) plasma samples.

4.3 MATERIALS AND METHODS

4.3.1 Materials

CaptureSelect® Clusterin resin and CaptureSelect® Protein G were provided by BAC B.V. (Netherlands) and Life Technologies, Inc. (Carlsbad, CA), respectively. A human clusterin ELISA kit was purchased from R&D Systems, Inc. (Minneapolis, MN). POROS beads for conjugation were purchased from Applied Biosystems (Framingham, MA). Sequencing grade trypsin and Glu-C endopeptidase were purchased from Promega (Madison, WI). All lectins used throughout this study were obtained from Vector Laboratories (Burlingame, CA). HPLC grade water, acetonitrile and all other buffer reagents were purchased from Sigma-Aldrich (St. Louis, MO).

4.3.2 Clear cell renal cell carcinoma (ccRCC) plasma sample collection and preparation

Plasma samples from 20 ccRCC patients, both before (RCC (+)) and after (RCC (-)) nephrectomy (+/-RCC, n=40), were collected at Massachusetts General Hospital (MGH) (Boston, MA). Patients provided informed consent under the corresponding Institutional Review Board (IRB) approved protocol. Pathology reports after nephrectomy were used to verify the diagnosis of ccRCC for each patient. A summary of the ccRCC plasma samples used for this study is presented in Table 3.1 (see Chapter 3). Immediately after plasma collection, samples were aliquoted and frozen at -80 °C until further analysis. To ensure consistency, no sample was thawed more than twice.

4.3.3 Clusterin immuno-affinity HPLC purification

Clusterin glycoprotein was isolated from plasma samples through immuno-affinity antibody capture. Immuno-affinity HPLC purification of clusterin used three PEEK columns packed in-house under high liquid pressure, as reported earlier 35. Prior to clusterin purification, the concentration of clusterin in each RCC (+) and RCC (-) plasma sample was determined using a clusterin-specific ELISA. A 50 µL aliquot of each ccRCC plasma sample (before nephrectomy, n=20; after nephrectomy, n=20) was centrifuged at 8,000 x g to precipitate mucins and other particulates. Purification of clusterin from plasma samples was performed using a semi-automated 3-column multidimensional platform as we previously described. Briefly, we used a 44 minute on-line 2D HPLC platform which combines albumin and IgG depletion, using CaptureSelect HSA and Protein G ligands immobilized on POROS chromatographic media, with subsequent immuno-capture of clusterin on agarose beads functionalized with an anti-clusterin ligand. Enriched clusterin fractions were desalted using an R1 reversed phase column, and the concentration of enriched clusterin was determined using a BCA assay (Thermo Fisher Scientific, San Jose, CA) following the manufacturer’s instructions.

4.3.4 Lectin blot assay of purified Clusterin

In order to verify the identified clusterin fucosylation changes between plasma samples obtained before and after nephrectomy, we performed lectin blot assays using biotinylated Aleuria aurantia lectin (AAL) (Vector Laboratories, Burlingame, CA). Purified clusterin (1 µg) was loaded on a 10% Mini-PROTEAN® TGX™ Tris/Glycine SDS precast gel (Bio-Rad Laboratories, Hercules, CA). Following completion of the electrophoretic run, proteins were transferred to a 0.2 µm nitrocellulose membrane (mini format) using a Bio-Rad Trans-Blot Turbo transfer system at a constant current of 2.5 A for 3 min. The nitrocellulose membrane was blocked in Carbo-Free protein-based blocking solution (Vector Laboratories, Inc., Burlingame, CA) for 1 hour at room temperature. After three washes in Tris-buffered saline containing 0.5% Tween-20 (TBST), the membrane was incubated with the biotinylated lectin (AAL) at a concentration of 1 µg/mL for 1 hour, followed by three washes with TBST. The membrane was then incubated for 1 hour in 1 µg/mL streptavidin–HRP (Vector Laboratories, Burlingame, CA). Lectin blots were visualized using ECL Western Blotting reagents (GE Healthcare) and images were captured with a FluorChem SP system (Alpha Innotech, Santa Clara, CA).

4.3.5 One dimensional-SDS PAGE and enzymatic digestion

10 µg of purified clusterin from each sample was loaded on a 10% Bis-Tris SDS-PAGE gel (Novex® NuPAGE®, Life Technologies) and separated for 45 minutes at 200 V. The resulting protein bands were visualized with SimplyBlue SafeStain Coomassie® G-250 stain (Life Technologies, Grand Island, NY). Gel bands corresponding to clusterin (based on western blot verification) were excised and cut into ~1 mm² pieces, followed by de-staining with 500 µL of 0.1 M ammonium bicarbonate (NH4HCO3), pH 7.6, and acetonitrile (ACN) in an alternating fashion.

After de-staining, 10 mM dithiothreitol (DTT) was added to the gel pieces for disulfide bond reduction and incubated for 45 minutes at room temperature, followed by a 1 hour alkylation with 50 mM iodoacetamide (IAA) at room temperature in the dark. Gel pieces were washed three times with 0.1 M NH4HCO3, pH 7.6, then briefly washed with acetonitrile and dried by vacuum centrifugation. Trypsin and Glu-C (12.5 ng/µL) or PNGase F (1 unit per 25 µL) were prepared separately in 50 mM NH4HCO3, pH 7.6, and added to the gel pieces for a 12 hour incubation at 37 °C. Glycosylated peptides (trypsin and/or Glu-C digest) were extracted twice with 5% v/v formic acid/50% v/v acetonitrile following incubation, while de-glycosylated peptides (PNGase F + trypsin and/or Glu-C digest) were extracted with water and 5% v/v formic acid/50% v/v acetonitrile. Extracted peptides were dried completely by vacuum centrifugation and reconstituted in 20 µL of 0.1% formic acid in water prior to LC-MS/MS analysis.

4.3.6 C18 reversed phase nano-LC-MS/MS Analysis

Reconstituted glycosylated and de-glycosylated peptides were loaded onto a reversed phase capillary column (150 mm x 75 µm i.d.) packed with 5 µm, 300 Å C18 silica-bonded stationary phase. Liquid chromatography separation was performed using a Dionex Ultimate 3000 HPLC system (LC Packings-Dionex, Marlton, NJ, USA) at a constant flow-rate of 200 nL/min. Mobile phase A was 0.1% v/v formic acid in HPLC grade water and mobile phase B was 0.1% v/v formic acid in acetonitrile. The following gradient was used: 15 minutes at 2% mobile phase B for sample loading; a linear increase from 5% mobile phase B to 40% mobile phase B in 60 minutes; 40% mobile phase B to 80% mobile phase B in 15 minutes; and lastly 80% mobile phase B to 2% mobile phase B in 10 minutes. All mass spectrometry experiments were performed on an LTQ-Orbitrap Elite instrument (Thermo Fisher Scientific, San Jose, CA). The mass spectrometer was operated in data dependent mode with an automatic switch between the full MS survey scan (scan 1), acquired over the range of m/z 400-2000, and CID-MS/MS fragmentation of the six most abundant precursor ions. The following parameters were used: precursor ion isolation width of 3 Th, mass resolution of 60,000 at m/z 400, 35% normalized collision energy, 2.1 kV spray voltage and a capillary temperature of 210 °C. The instrument was calibrated and tuned with Thermo Scientific calibration mix.
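The gradient program described above can be written down as a table of segments, which makes the total run time and the mobile phase composition at any time point easy to check. The following is a minimal sketch that transcribes the four steps as stated, including the step from 2% to 5% mobile phase B between loading and the first ramp. The segment comments and the function name `percent_b` are illustrative and do not come from the instrument method.

```python
# Each segment: (duration in min, %B at segment start, %B at segment end),
# transcribed from the gradient described in the text.
GRADIENT = [
    (15.0, 2.0, 2.0),    # isocratic sample loading
    (60.0, 5.0, 40.0),   # first linear ramp
    (15.0, 40.0, 80.0),  # second linear ramp
    (10.0, 80.0, 2.0),   # return to starting composition
]
FLOW_RATE_NL_PER_MIN = 200.0


def percent_b(t_min):
    """Return %B at time t_min, interpolating linearly within each segment."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    segment_start = 0.0
    for duration, b_start, b_end in GRADIENT:
        if t_min < segment_start + duration:
            fraction = (t_min - segment_start) / duration
            return b_start + fraction * (b_end - b_start)
        segment_start += duration
    return GRADIENT[-1][2]  # hold the final composition after the program ends


total_min = sum(duration for duration, _, _ in GRADIENT)
total_volume_ul = total_min * FLOW_RATE_NL_PER_MIN / 1000.0

print(f"run time: {total_min:.0f} min, solvent delivered: {total_volume_ul:.0f} uL")
print(f"%B at 45 min: {percent_b(45):.1f}")
```

With the durations as written, the program runs for 100 minutes per injection and delivers about 20 µL of mobile phase at 200 nL/min.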

4.3.7 Data and statistical Analysis

Gel loading amounts of clusterin were normalized based upon the ELISA quantification data. Lectin blot densities of RCC (+) and RCC (-) purified clusterin samples were evaluated using ImageJ, version 1.47 (http://rsbweb.nih.gov/ij/download.html). Deglycosylated and glycosylated peptides were identified by searching the LC-MS/MS raw data against a combined clusterin sequence (SwissProt P10909) and annotated human database (release 2012_1) using the SEQUEST algorithm (Thermo Electron Corp, San Jose, CA) incorporated within the Thermo Fisher Proteome Discoverer 1.3 suite. Acceptance criteria for peptide identifications were as follows: singly, doubly, and triply charged peptides were accepted if Xcorr values were ≥1.9, 2.5, and 3.8, respectively; delta CN value (ΔCN) > 0.1; and peptide probability score > 0.95. Other search parameters were as follows: carbamidomethylation (C) as fixed modification; deamidation (N) as variable modification; full trypsin as the enzyme specificity; a maximum of 2 missed cleavages; precursor ion mass tolerance of 5 ppm; and fragment ion mass tolerance of 0.8 Da. Glycan site occupancy was determined by automatic database searching followed by peak area analysis of the extracted ion chromatograms (EICs) of de-glycosylated and glycosylated peptides. The CID-MS2 and CID-MS3 fragmentation patterns allowed for manual annotation, characterization, and relative quantitation of glycoforms. GlycoWorkbench 36, version 1.1.3480, was used for glycan structure generation and evaluation of the MS/MS fragmentation of glycan structures. Data analysis and graphic generation were performed using Microsoft Excel 2010. A p-value < 0.05 was considered statistically significant in all statistical analyses.
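The acceptance criteria listed above amount to a charge-dependent filter on peptide-spectrum matches (PSMs). The following is a minimal sketch of such a filter, not the Proteome Discoverer implementation. The `PSM` record, its field names, the placeholder sequence and all numerical values in the example list are hypothetical. The 5 ppm precursor tolerance, which the text lists as a search parameter, is included here as an additional check purely for illustration.

```python
from dataclasses import dataclass

# Thresholds quoted in the text above.
XCORR_MIN = {1: 1.9, 2: 2.5, 3: 3.8}  # minimum Xcorr per precursor charge
DELTA_CN_MIN = 0.1
PROBABILITY_MIN = 0.95
PRECURSOR_TOL_PPM = 5.0


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match record (field names are assumptions)."""
    sequence: str
    charge: int
    xcorr: float
    delta_cn: float
    probability: float
    observed_mz: float
    theoretical_mz: float


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed precursor mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def accept(psm: PSM) -> bool:
    """Apply the charge-dependent Xcorr, delta CN, probability and ppm checks."""
    xcorr_min = XCORR_MIN.get(psm.charge)
    if xcorr_min is None:  # only 1+, 2+ and 3+ thresholds are given in the text
        return False
    return (
        psm.xcorr >= xcorr_min
        and psm.delta_cn > DELTA_CN_MIN
        and psm.probability > PROBABILITY_MIN
        and abs(ppm_error(psm.observed_mz, psm.theoretical_mz)) <= PRECURSOR_TOL_PPM
    )


# Made-up example PSMs; "PEPTIDEK" is a placeholder, not a clusterin peptide.
psms = [
    PSM("PEPTIDEK", 2, 3.1, 0.25, 0.99, 500.0020, 500.0000),  # passes all checks
    PSM("PEPTIDEK", 3, 3.1, 0.25, 0.99, 500.0020, 500.0000),  # Xcorr too low for 3+
    PSM("PEPTIDEK", 2, 3.1, 0.25, 0.99, 500.0030, 500.0000),  # 6 ppm from theory
]
accepted = [psm for psm in psms if accept(psm)]
print(len(accepted), "of", len(psms), "PSMs accepted")
```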

4.4 RESULTS AND DISCUSSION

Clusterin is a heavily glycosylated protein (approximately 30% w/w carbohydrate), harboring seven potential N-linked glycosylation sites 37. Since glycosylation alterations are directly linked to several disease states, the glycan status of a protein may serve as a biomarker for a specific disease. Our previous work characterized the overall glycan modifications of clusterin isolated from patient samples prior to and after curative nephrectomy.
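Potential N-linked glycosylation sites are conventionally predicted from the consensus sequon Asn-X-Ser/Thr, where X is any residue except proline. The following is a minimal sketch of such a scan. The sequence used is a made-up toy peptide, not the clusterin sequence (SwissProt P10909), and the function name `find_sequons` is illustrative.

```python
import re

# N followed by any residue except P, then S or T. The lookahead keeps the
# match one residue wide so that overlapping sequons (e.g. N-N-S-S) are found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X != P)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Toy sequence for illustration only; it is NOT the clusterin sequence.
toy = "AQNVSKLNPTDENMTRNNSS"
print(find_sequons(toy))
```

In the toy sequence, the Asn at position 8 is skipped because it is followed by proline, while the adjacent Asn residues at positions 17 and 18 both satisfy the pattern.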

In the current study, we employed tandem mass spectrometry, integrated with lectin blotting, to investigate and quantify site-specific changes of previously observed clusterin glycans. The amount of purified clusterin loaded on 1D SDS-PAGE gels was estimated from the clusterin concentration determined by ELISA in RCC (+) and RCC (-) plasma samples. Sialylated bi- and tri-antennary glycans (monosialo, disialo, and trisialo), with or without fucose, were the major glycan types identified. A significant increase in the levels of a core fucosylated bi-antennary digalactosyl disialylated glycan (FA2G2S2) and a bi-antennary digalactosyl disialylated glycan (A2G2S2) was observed in RCC (-) samples.

4.4.1 Development of the Analytical Approach

Reversed phase nano-liquid chromatography mass spectrometry analysis of glycopeptides is challenging due to the hydrophilic nature of glycopeptides and the associated structural complexity imparted by variability in the glycan structures present38. Glycopeptides, unlike unmodified peptides, generally do not ionize well with electrospray, and their signals are easily suppressed by co-eluting nonglycosylated peptides39. It is therefore important to use an analytical approach that completely isolates the glycoprotein under study in order to facilitate and improve glycopeptide characterization. To this end, we first determined the total clusterin concentration in ccRCC plasma (before and after nephrectomy) by ELISA (Figure 4.1). An average concentration of 290±5 µg/mL was observed. Importantly, there was no significant difference in total plasma clusterin concentration in this cohort of patients prior to and after nephrectomy.

Figure 4.1: Box and whisker plot of ELISA measurement of clusterin concentration in RCC (pre, +) and RCC (post, -) plasma samples. We observed no significant difference (p value = 0.1031) between RCC plasma samples collected before and after nephrectomy. p value < 0.05 was considered significant.


This finding highlights the fact that measurement of total clusterin cannot inform on the presence of ccRCC. In contrast, assays designed to capture specific glycoforms of clusterin may contribute to ccRCC disease monitoring. Below we describe changes in clusterin glycosylation that may inform such assays.

We used a semi-automated immuno-affinity purification platform to generate clusterin enriched extracts from patient plasma. To evaluate the performance and efficiency of this platform, we analyzed pooled reference female plasma purchased from Bioreclamation (Jericho, NY) at different loading amounts (50 - 100 µL). Testing the loading capacity helped to ensure minimal sample loss and minimal sample-to-sample carryover during each clusterin purification process. After SDS-PAGE analysis, a 50 µL ccRCC plasma volume, corresponding to 14 ± 2 µg of clusterin as estimated by ELISA, was established as the optimal loading amount, and the purity of the isolated clusterin was estimated to be 70%.

A successful proteolytic digestion strategy depends on the glycoprotein under study. In some instances, trypsin is sufficient for complete digestion; however, some glycoproteins require a combination of proteolytic enzymes for efficient digestion40. In a preliminary study, we investigated the efficiency of two proteolytic enzymes, trypsin and Glu-C, for the digestion of purified clusterin treated with PNGase-F to select the appropriate enzyme for this study. Trypsin digestion on its own generated 3 glycopeptides (30% overall sequence coverage); similarly, Glu-C digestion on its own generated 2 glycopeptides (17% overall sequence coverage). Combined digestion with Glu-C and trypsin generated 5 potential glycopeptides (and >80% overall sequence coverage). A detailed summary of the glycopeptides identified using an optimized protein digestion procedure combining Glu-C and trypsin is presented in Table 4.1. It is interesting to note that neither the aspartic acid (D) form (after PNGase-F digestion) nor the asparagine (N) form of the potential N-linked glycopeptides at sites 103 and 145 was identified in replicate samples. A possible reason is that the specific enzyme cleavage results in long hydrophobic stretches in these glycopeptides (ELPGVCNETMMALWEECKPCLK, 2696.1786; QLEEFLNQSSPFYFWMNGDRIDSLLE, 3178.4825), which may be responsible for low recovery during LC-MS analysis.


Table 4.1: Enzymatic glycopeptides identified by nano-LC-MS/MS analysis

Glycosylation site    Trypsin    MH+ (Da)    GluC    MH+ (Da)    Site occupancy status

N-86      KKEDALN/DETR  1204.6003      KKEDALN/DETRE  1333.7053      Fully glycosylated
          KKKEDALN/DETR  1332.6877     KKEDALN/DETRESE  1549.7702    Fully glycosylated
          KKKEDALN/DETRE  1461.7681                                  Fully glycosylated
          KKKEDALN/DETRESE  1677.8401                                Fully glycosylated
N-103*    ELPGVC†NETMMALWEEC†KPC†LK  2696.1786                       N/A
N-145*    QLEEFLNQSSPFYFWMNGDRIDSLLE  3178.4825    FLN/DQSSPFYFWMNGDRIDSLLE  2680.2187    N/A
N-291     EIRHN/DSTGC†LR  1343.5994    IRHN/DSTGC†LR  1214.6058      Fully glycosylated
          HN/DSTGC†LR  945.4207                                      Fully glycosylated
N-317     EILSVDC†STNNPSQAK  1762.8255    ILSVDC†STNNPSQAKLRRE  2188.1438    Not glycosylated
N-354     MLN/DTSSLLEQLNE  1492.7201                                 Partially glycosylated
N-374     LAN/DLTQGEDQYYLR  1684.8172                                Fully glycosylated

* N-linked glycopeptides not identified in replicate sample analysis

† Carbamidomethylated cysteine


CID tandem mass spectrometry has been applied successfully to the analysis of glycopeptides although measurement of low levels of glycopeptides still remains a challenge. In this study we used both data dependent CID LC-MS analysis and targeted CID LC-MS for the characterization of clusterin glycopeptides.

Data dependent analysis of PNGase-F treated and un-treated samples produced a candidate list of glycopeptides, of glycan site occupancy and of glycoforms, all available for further analysis. In order to characterize and quantify specific glycan structures and their corresponding glycopeptide backbones we used CID-MS2 and MS3 fragmentation. This targeted approach improved our ability to quantify specific glycoforms present in low levels, such as fucosylated glycans.

4.4.2 Glycan occupancy analysis

To maximize the analytical return from the available clinical samples, mass spectrometry method optimization including N-linked site identification (step 1) and glycan site occupancy (step 2) was performed using pooled clinical samples from 3 patients (RCC(+), n=3 and RCC(-), n=3) in three replicates. Step 1 and step 2 were completed as described before26 with slight modifications as presented in Figure 4.2.


Figure 4.2: Flow diagram showing tandem mass spectrometry approach for the characterization of clusterin N-linked glycan sites.

Briefly, we performed a database search using the SEQUEST algorithm against the clusterin sequence (UniProt P10909) on mass spectrometry data from PNGase-F treated and untreated samples, with Asn (N) deamidation set as a variable modification. Deamidation at N residues that may occur as a result of sample manipulation was first evaluated in PNGase-F untreated samples from the database search results. No induced deamidation (N→D) was observed on peptides identified with an NXS/T sequon after this initial database search; therefore, the mass shift resulting from PNGase-F treatment (N→D, deglycosylation) was used for initial screening to identify glycopeptides in samples treated with PNGase-F. In addition, in order to locate glycopeptide elution periods, we extracted glycan oxonium ions with m/z 366, 528 and 657 from data-dependent CID MS/MS spectra of samples that had not been subjected to PNGase-F treatment. Glycopeptides with partial glycan occupancy were observed as both the non-glycosylated peptides (Asn) and the corresponding Asp-containing peptides after PNGase-F treatment. Estimation of the degree of occupancy at each previously identified peptide site was based on peak areas of the extracted ion chromatograms for de-glycosylated and non-glycosylated peptides (Appendix F), on the assumption of similar MS response factors for the glycosylated versus deglycosylated peptides as previously described26. Of the seven potential glycosylation sites, five sites were observed and validated in replicate samples; three were fully glycosylated, one was partially glycosylated and one was not glycosylated (see Table 4.1 above). N-linked sites 86, 291, and 374 were found to be fully occupied, showing 92%, 90%, and 96% occupancy, respectively, while N-linked site 354 exhibited partial glycosylation with 75% glycan occupancy. An average variation of <4.8 %CV was recorded for three analytical replicates of pooled RCC (+) and RCC (-) samples, showing good reproducibility. No glycan attachment was observed for the putative N-linked site 317.
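The occupancy estimate described above reduces to a ratio of extracted ion chromatogram peak areas. The sketch below assumes, as the text does, that the peptide forms being compared have similar MS response factors; the function name and the peak areas are hypothetical and are not values from this study.

```python
def site_occupancy(area_deglycosylated: float, area_nonglycosylated: float) -> float:
    """Percent occupancy of one N-linked site from EIC peak areas.

    area_deglycosylated:  peak area of the Asp-containing peptide
                          (formerly glycosylated, after PNGase-F)
    area_nonglycosylated: peak area of the Asn-containing peptide
                          (never glycosylated)
    """
    total = area_deglycosylated + area_nonglycosylated
    if total <= 0:
        raise ValueError("no signal for either peptide form")
    return 100.0 * area_deglycosylated / total


# Hypothetical peak areas (arbitrary units), chosen only for illustration.
occupancy = site_occupancy(7.5e6, 2.5e6)
print(f"{occupancy:.1f}% occupied")
```

With these illustrative areas the site would be reported as partially occupied, in the same way that site 354 is classified in the text.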

The optimized protocol for combined digestion of clusterin with Glu-C and trypsin resulted in variable cleavage sites, due to the presence of both adjacent Lys (K) and Glu (E) residues for the N-linked sites 86, 291, and 317 (see Table 4.1 above). It is important to note that RCC(+) and RCC(-) patient samples showed a similar degree of glycan site occupancy; however, because the glycan occupancy analysis was based on pools, these data represent an average of clusterin glycan occupancy for this small set of ccRCC patients and not an individual patient occupancy analysis.


4.4.3 Characterization of site-specific oligosaccharide heterogeneity

Glycopeptide elution times were estimated from data-dependent LC-MS analysis using the retention time of deglycosylated peptides obtained from PNGase-F treated samples and diagnostic oxonium ions (m/z 366, 528 and 657) typical of CID-MS2 fragmentation of a glycopeptide from the digest not treated with PNGase-F 41, 42, which allowed us to characterize site-specific glycoforms of the observed glycopeptides. The potential sugar structures, mainly bi- and tri-antennary sialylated (monosialo, disialo, trisialo) glycans with varying fucosylation, were based on those identified during the earlier pilot glycomic study of released clusterin oligosaccharides35. The presence of these structures was then confirmed by GlycoMod43 and by manual examination and annotation of the experimental glycopeptide mass spectra. In Table 4.2, a summary of the observed site-specific glycoforms obtained from nano-LC-MS/MS analysis is provided, showing the highest intensity glycopeptides identified in three replicates of pooled samples. We excluded from the analysis the peptides that exhibited multiple enzyme cleavage sites due to digestion variability, and we selected only the glycoforms at N-linked sites 354 and 374 for subsequent measurements.
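The diagnostic oxonium ions used above correspond to singly protonated Hex-HexNAc (m/z 366), Hex2-HexNAc (m/z 528) and Hex-HexNAc-NeuAc (m/z 657) fragments. The sketch below computes their theoretical m/z from standard monoisotopic residue masses and flags MS2 spectra that contain them. The 0.5 Da matching tolerance, the requirement of at least two matching ions, and the function names are assumptions made for illustration, not parameters reported in this work.

```python
# Monoisotopic residue masses (Da) of the monosaccharides involved.
RESIDUE = {"Hex": 162.05282, "HexNAc": 203.07937, "NeuAc": 291.09542}
PROTON = 1.007276


def oxonium_mz(composition: dict) -> float:
    """m/z of a singly protonated oxonium ion with the given composition."""
    return sum(RESIDUE[name] * count for name, count in composition.items()) + PROTON


DIAGNOSTIC = {
    "HexHexNAc": oxonium_mz({"Hex": 1, "HexNAc": 1}),                   # nominal m/z 366
    "Hex2HexNAc": oxonium_mz({"Hex": 2, "HexNAc": 1}),                  # nominal m/z 528
    "HexHexNAcNeuAc": oxonium_mz({"Hex": 1, "HexNAc": 1, "NeuAc": 1}),  # nominal m/z 657
}


def has_oxonium_ions(peaks, tol: float = 0.5, min_hits: int = 2) -> bool:
    """Return True if an MS2 peak list looks like a glycopeptide spectrum.

    peaks: iterable of (mz, intensity) tuples.
    tol and min_hits are illustrative assumptions.
    """
    mz_values = [mz for mz, _ in peaks]
    hits = sum(
        any(abs(mz - target) <= tol for mz in mz_values)
        for target in DIAGNOSTIC.values()
    )
    return hits >= min_hits
```

Scans flagged in this way mark the retention time windows in which glycopeptides elute, which can then be compared with the retention times of the corresponding deglycosylated peptides.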


Table 4.2: Overview of major identified site-specific glycoforms and their calculated mass and corresponding m/z of glycopeptides

Peptide         Peptide mass    Charge  Glyco-form composition          Glyco-form mass  m/z of glycopeptide  m/z of glycopeptide  Error
                (monoisotopic)                                          (monoisotopic)   (calculated)         (observed)           [Δppm]
KKEDALNETRESE   1547.74         3       (GlcNAc)4(Hex)5(NeuAc)2         2222.78          1251.8437            1251.8430            0.5831
KKEDALNETRESE   1547.74         3       (GlcNAc)5(Hex)5(NeuAc)1Fuc      2262.81          1265.1871            1265.1879            -0.6613
KKEDALNETRESE   1547.74         3       (GlcNAc)5(Hex)6(NeuAc)1         2280.82          1271.1904            1271.1895            0.7054
KKEDALNETRESE   1547.74         3       (GlcNAc)4(Hex)3(NeuAc)3Fuc      2335.83          1289.5271            1289.5268            0.2042
KKEDALNETRESE   1547.74         3       (GlcNAc)4(Hex)4(NeuAc)3         2351.83          1294.8590            1294.8480            8.4951
EIRHNSTGCLR     1284.64         3       (GlcNAc)4(Hex)4(NeuAc)2Fuc      2206.79          1158.8137            1158.8047            7.7666
EIRHNSTGCLR     1284.64         3       (GlcNAc)4(Hex)5(NeuAc)2         2222.78          1164.1437            1164.1423            1.2026
EIRHNSTGCLR     1284.64         3       (GlcNAc)4(Hex)5(NeuAc)2Fuc      2368.84          1212.8304            1212.8318            -1.1571
EIRHNSTGCLR     1284.64         3       (GlcNAc)5(Hex)5(NeuAc)2         2425.86          1231.8371            1231.8370            0.0514
EIRHNSTGCLR     1284.64         3       (GlcNAc)5(Hex)5(NeuAc)2(Fuc)2   2717.98          1329.2104            1329.2082            1.6526
MLNTSSLLEQLNE   1490.73         3       (GlcNAc)4(Hex)4(NeuAc)2Fuc      2206.79          1227.5104            1227.5118            -1.1432
MLNTSSLLEQLNE   1490.73         3       (GlcNAc)4(Hex)5(NeuAc)2         2222.78          1232.8404            1232.8401            0.2406
MLNTSSLLEQLNE   1490.73         3       (GlcNAc)4(Hex)4(NeuAc)1Fuc      1915.69          1130.4771            1130.4698            6.4250
MLNTSSLLEQLNE   1490.73         3       (GlcNAc)5(Hex)4(NeuAc)1(Fuc)2   2264.83          1246.8571            1246.8559            0.9330
MLNTSSLLEQLNE   1490.73         3       (GlcNAc)5(Hex)6(NeuAc)2         2587.92          1354.5533            1354.5512            1.5503
MLNTSSLLEQLNE   1490.73         3       (GlcNAc)6(Hex)4(NeuAc)3         2757.98          1411.2404            1411.2427            -1.6321
MLNTSSLLEQLNE   1490.73         3       (GlcNAc)6(Hex)5(NeuAc)2Fuc      2775.00          1416.9137            1416.9123            1.0092
MLNTSSLLEQLNE   1490.73         3       (GlcNAc)6(Hex)6(NeuAc)1Fuc      3132.12          1535.9537            1535.9523            0.9310
MLNTSSLLEQLNE   1490.73         3       (GlcNAc)6(Hex)5(NeuAc)2(Fuc)2   2921.06          1465.5997            1465.5973            1.6580
LANLTQGEDQYYLR  1682.83         3       (GlcNAc)4(Hex)5(NeuAc)2         2222.78          1296.8737            1296.8758            -1.6424
LANLTQGEDQYYLR  1682.83         3       (GlcNAc)4(Hex)5(NeuAc)2Fuc      2368.84          1345.5604            1345.5601            0.2230
LANLTQGEDQYYLR  1682.83         3       (GlcNAc)5(Hex)6(NeuAc)2         2587.92          1418.5870            1418.5862            0.5851
LANLTQGEDQYYLR  1682.83         3       (GlcNAc)5(Hex)7(NeuAc)2         2749.97          1472.7037            1472.7021            1.0864
LANLTQGEDQYYLR  1682.83         3       (GlcNAc)5(Hex)6(NeuAc)3         2879.01          1515.6189            1515.6192            -0.1979
LANLTQGEDQYYLR  1682.83         3       (GlcNAc)5(Hex)6(NeuAc)3Fuc      3025.07          1564.3037            1564.3011            1.6621
LANLTQGEDQYYLR  1682.83         3       (GlcNAc)5(Hex)7(NeuAc)2(Fuc)2   3042.08          1569.9737            1569.9701            2.2930
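The calculated m/z values in Table 4.2 can be approximated from the peptide mass and the glycan composition. The sketch below uses standard monoisotopic residue masses and assumes that the tabulated glyco-form mass is that of the free glycan (residues plus one water), so that one water is lost when the glycan is attached to the peptide. The function names are hypothetical, and the sign convention of the ppm error (calculated minus observed) is inferred from the table. Because the tabulated masses are rounded, recomputed values may differ slightly from the printed ones.

```python
# Monoisotopic residue masses (Da).
RESIDUE = {
    "GlcNAc": 203.07937,
    "Hex": 162.05282,
    "NeuAc": 291.09542,
    "Fuc": 146.05791,
}
WATER = 18.010565
PROTON = 1.007276


def glycan_mass(composition: dict) -> float:
    """Monoisotopic mass of the free glycan (residues plus one water)."""
    return sum(RESIDUE[name] * count for name, count in composition.items()) + WATER


def glycopeptide_mz(peptide_mass: float, composition: dict, charge: int) -> float:
    """m/z of the protonated glycopeptide; attachment to Asn loses one water."""
    neutral = peptide_mass + glycan_mass(composition) - WATER
    return (neutral + charge * PROTON) / charge


def ppm_error(calculated: float, observed: float) -> float:
    """Mass error in ppm, using the calculated-minus-observed convention."""
    return (calculated - observed) / calculated * 1e6


# Glycoforms of LANLTQGEDQYYLR (peptide mass taken from Table 4.2).
A2G2S2 = {"GlcNAc": 4, "Hex": 5, "NeuAc": 2}
FA2G2S2 = {"GlcNAc": 4, "Hex": 5, "NeuAc": 2, "Fuc": 1}

mz_a2g2s2 = glycopeptide_mz(1682.83, A2G2S2, 3)    # close to 1296.87
mz_fa2g2s2 = glycopeptide_mz(1682.83, FA2G2S2, 3)  # close to 1345.56
```

Both recomputed values agree with the calculated m/z listed in Table 4.2 for the two glycoforms at site 374 to within about 0.001 m/z units.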


Glycoform expression levels at the partially occupied N-linked site 354 were similar in RCC(+) and RCC(-) samples, while the levels of an afucosylated bi-antennary digalactosylated disialylated glycan (A2G2S2) and a fucosylated bi-antennary digalactosylated disialylated glycan (FA2G2S2) attached at glycosylation site 374 were observed to differ between RCC(+) and RCC(-) samples (discussed below). Figure 4.3 shows a representative LC-MS base peak chromatogram illustrating glycan heterogeneity at glycosylation site N-374 within the elution time window 38.5-40.5 (red box, Figure 4.3A).

Figure 4.3: Representative LC-MS/MS analysis of clusterin glycoprotein digests. (A) Base peak FT-MS chromatogram. (B) Average MS spectrum showing annotated glycoforms at elution time window 38.5 - 40.5 (red box insert). For illustration purposes, only major glycoforms identified on N-linked site 374 were annotated and shown. Red circles indicate the bi-antennary sialic acid (m/z 1296.8758) and bi-antennary core fucose (m/z 1345.5601) glycoforms observed to change between RCC (+) and RCC (-) samples. NL: normalized level. GlcNAc: N-acetylglucosamine; Fuc: fucose; Hex: hexose; NeuAc: N-acetylneuraminic acid (sialic acid).


The major glycoforms that differ between RCC(+) and RCC(-) samples are annotated and indicated with red circles in Figure 4.3B: the A2G2S2 glycan (m/z 1296.8758, +3 charge) and its fucosylated counterpart, FA2G2S2 (m/z 1345.5601, +3 charge). Based on the observation that glycopeptide LAN374LTQGEDQYYLR is fully glycosylated with site-specific glycan changes in RCC (+) compared to RCC (-) samples, we decided to characterize and quantitate this diagnostic glycopeptide with these site-specific changes in our patient sample set.

4.4.4 Glycan structures for the selected glycopeptide residue 372-385, N-374

It is known that nano-LC-MS reversed phase separation results in the elution of a glycopeptide family of glycoforms with similar retention times 39. Therefore, we used a targeted LC-MS approach on a high resolution accurate mass spectrometer (LTQ-Orbitrap Elite, Thermo Scientific) to separate individual glycoform precursor ions m/z 1296.88 and m/z 1345.56 bearing the same peptide backbone with a different glycan (e.g. the fucosylated versus the non-fucosylated oligosaccharide) for characterization and quantification. Glycoforms identified at N-linked site 374 were predominantly expressed in the +3 charge state and therefore only this charge state was used for quantification.

CID-MS2 fragmentation of a glycopeptide generates diagnostic oxonium and glycan-B ions (m/z 366, 528 and 657) plus glycopeptide fragments (Y-ions), leaving the peptide backbone mostly un-fragmented 44, 45, while CID-MS3 fragmentation is used to confirm the backbone structure of the glycopeptide. In Figure 4.4, we show CID-MS2 (A) and CID-MS3 (B) fragmentation spectra of the clusterin glycopeptide LAN374LTQGEDQYYLR with precursor ion m/z 1296.88 (+3 charge). First, glycopeptide precursor ion m/z 1296.88 was fragmented by CID, resulting in diagnostic oxonium ions m/z 366, 528.29, 657.30 plus glycopeptide fragments (Figure 4.4A). As CID-MS2 fragmentation gives little or no information on the peptide backbone structure, we pre-selected precursor ion m/z 944.29, which carries GlcNAc plus the intact peptide, from the CID-MS2 fragmentation spectrum and performed CID-MS3 fragmentation. Annotation of the b and y ion series (inset) confirmed the glycopeptide LAN374LTQGEDQYYLR (Figure 4.4B).

Figure 4.4: CID-MS2 and MS3 glycopeptide analysis. (A) Annotated CID-MS/MS (MS2) spectra of the precursor ion 1296.88 (+ 3 charge state) results in singly charged oxonium ions and glycopeptide fragments. (B) Annotated CID-MS/MS/MS (MS3) spectra of glycopeptide fragment ion 944.29 (+2 charge state) generated from CID-MS/MS analysis, with b and y ion series (insert), which confirms glycopeptide backbone structure. NL: normalized level. Yellow circle: Galactose; Blue square: N-acetylglucosamine; Green circle: Mannose; Purple diamond: sialic acid; Red triangle: Fucose.

In a similar fashion, glycopeptide precursor ion m/z 1345.56 bearing a core fucosylated bi-antennary digalactosylated disialylated glycan was fragmented using CID-MS2 followed by CID-MS3 fragmentation of pre-selected precursor ion m/z 1117.84. Diagnostic fragment ions m/z 1117.84 (MS2) and m/z 617.43 (MS3) confirmed the core fucose status of the glycan, which enabled its differentiation from the isobaric antennary type fucosylated glycan. In Figure 4.5A, glycopeptide fragment ions and oxonium ions typically observed in CID-MS2 spectra are indicated, while in Figure 4.5B we show the b and y ion series corresponding to the glycopeptide backbone.

Figure 4.5: CID-MS2 and MS3 glycopeptide analysis. (A) Annotated CID-MS/MS (MS2) spectra of the precursor ion 1345.56 (+3 charge state) results in singly charged oxonium ions and glycopeptide fragments. (B) Annotated CID-MS/MS/MS (MS3) spectra of glycopeptide fragment ion 1117.84 (+2 charge state) generated from CID-MS/MS analysis, with b and y ion series (insert) confirming glycopeptide backbone structure. NL: normalized level. Yellow circle: Galactose; Blue square: N-acetylglucosamine; Green circle: Mannose; Purple diamond: sialic acid; Red triangle: Fucose.


4.4.5 Quantitation of targeted glycoforms in clinical samples

After we completed the characterization of clusterin glycoforms and selected the optimal glycopeptides, we quantified the levels of clusterin N374 A2G2S2 and FA2G2S2 glycoforms in RCC (+) and RCC (-) patient samples. We estimated the relative abundance of each glycoform based on the peak area of the extracted ion chromatogram, normalized to the peak area of a non-glycosylated peptide (residues 69-78), which was reproducibly observed in the enzyme digest of all clinical samples (Figure 4.6).
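The normalization described above is a simple ratio of peak areas within each sample. A minimal sketch follows; the function name and all peak areas are hypothetical and serve only to illustrate the calculation.

```python
def relative_abundance(glycoform_area: float, reference_area: float) -> float:
    """Glycoform EIC peak area normalized to the internal reference peptide."""
    if reference_area <= 0:
        raise ValueError("reference peptide was not detected")
    return glycoform_area / reference_area


# Hypothetical EIC peak areas for a single sample (arbitrary units).
glycoform_areas = {"A2G2S2": 2.1e6, "FA2G2S2": 3.5e4}
reference_area = 7.0e6  # non-glycosylated peptide, residues 69-78

normalized = {
    name: relative_abundance(area, reference_area)
    for name, area in glycoform_areas.items()
}
```

Normalizing within each sample compensates for differences in the amount of clusterin loaded and digested, so that glycoform levels can be compared between RCC (+) and RCC (-) samples.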

Figure 4.6: Representative CID-MS2 fragmentation spectrum of the precursor ion m/z 559.309 (+2 charge state) of a non-glycosylated peptide (insert) used as internal standard for normalization of glycopeptides. NL: normalized level.

In order to demonstrate the analytical reproducibility of the relative quantification method, we first performed three analytical replicates of a subset of ten patient samples (RCC (+), n=5; RCC (-), n=5). Good reproducibility was recorded with the following averaged % CVs: 1.5 % and 1.3 % for the RCC (+) and RCC (-) A2G2S2 glycoform, respectively; 3.3 % and 2.6 % for the RCC (+) and RCC (-) FA2G2S2 glycoform, respectively (Table 4.3).

Table 4.3: Analytical reproducibility peak area measurements of 5 ccRCC patient samples (averaged peak areas of 3 analytical replicates shown).

              A2G2S2                                 FA2G2S2
Patient ID    RCC (+)            RCC (-)             RCC (+)             RCC (-)
50            21 x 10^5 (0.3)    35 x 10^5 (0.4)     3.5 x 10^4 (1.3)    5.9 x 10^4 (0.9)
46            23 x 10^5 (0.5)    172 x 10^5 (0.2)    18.5 x 10^4 (1.9)   82.9 x 10^4 (1.0)
223           5 x 10^5 (0.9)     25 x 10^5 (0.2)     6.4 x 10^4 (1.7)    77 x 10^4 (1.5)
53            22 x 10^5 (0.7)    57 x 10^5 (0.3)     39.2 x 10^4 (1.3)   110 x 10^4 (0.8)
88            46 x 10^5 (0.3)    47 x 10^5 (0.5)     21 x 10^4 (1.7)     25.7 x 10^4 (0.5)
Average       23.4 x 10^5 (0.5)  67.2 x 10^5 (0.3)   17.7 x 10^4 (1.6)   60.3 x 10^4 (0.9)

%CV in parentheses
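The Average row of Table 4.3 can be recomputed from the per-patient peak areas. The sketch below also shows how a %CV could be obtained from analytical replicates. The replicate-level data are not given in the text, so the `percent_cv` helper uses the sample standard deviation as an assumption and is demonstrated only on made-up values.

```python
from statistics import mean, stdev


def percent_cv(replicates) -> float:
    """Coefficient of variation in percent (sample standard deviation assumed)."""
    return 100.0 * stdev(replicates) / mean(replicates)


# Averaged peak areas transcribed from Table 4.3 (patients 50, 46, 223, 53, 88).
table_4_3 = {
    ("A2G2S2", "RCC(+)"): [21e5, 23e5, 5e5, 22e5, 46e5],
    ("A2G2S2", "RCC(-)"): [35e5, 172e5, 25e5, 57e5, 47e5],
    ("FA2G2S2", "RCC(+)"): [3.5e4, 18.5e4, 6.4e4, 39.2e4, 21e4],
    ("FA2G2S2", "RCC(-)"): [5.9e4, 82.9e4, 77e4, 110e4, 25.7e4],
}

# Group means; these reproduce the Average row of the table.
averages = {key: mean(values) for key, values in table_4_3.items()}

# Made-up replicate peak areas, only to show the %CV calculation.
example_cv = percent_cv([100.0, 102.0, 98.0])
```

The group means recover the tabulated averages (23.4 x 10^5 and 67.2 x 10^5 for A2G2S2; 17.7 x 10^4 and 60.3 x 10^4 for FA2G2S2).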

After establishing reproducibility, the glycoforms in RCC (+) and RCC (-) individual clinical samples (n=17; Figure 4.7A, 4.7B, 4.7C, and 4.7D) were quantified. The experimentally determined levels of the A2G2S2 glycoform showed a significant difference (Wilcoxon Signed-Rank test, p value < 0.0009) between RCC (+) and RCC (-) samples. As indicated in Figure 4.7 (A and B), a marked increase in the levels of the A2G2S2 glycoform across all screened samples was noted for patients without ccRCC. Similar differences were observed in the levels of the FA2G2S2 glycoform (Figure 4.7C and 4.7D), with a significant difference between RCC (+) and RCC (-) samples (p value < 0.0003).
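The paired comparison above uses the Wilcoxon signed-rank test. The text does not state which software computed the p values, so the following is a standard-library sketch of the exact two-sided test rather than the procedure actually used. It is applied here to the five A2G2S2 pairs listed in Table 4.3; with only five pairs, the result is illustrative and is not the p value reported for the full set of 17 patients.

```python
from itertools import product


def wilcoxon_signed_rank_exact(before, after):
    """Return (W, exact two-sided p value) for paired samples."""
    # Paired differences; zero differences carry no sign information and are dropped.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1

    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_observed = min(w_plus, total - w_plus)

    # Under the null hypothesis every sign pattern is equally likely,
    # so enumerate all 2**n patterns and count those at least as extreme.
    extreme = 0
    for signs in product((False, True), repeat=n):
        w = sum(r for r, positive in zip(ranks, signs) if positive)
        if min(w, total - w) <= w_observed:
            extreme += 1
    return w_observed, extreme / 2 ** n


# A2G2S2 peak areas from Table 4.3 (patients 50, 46, 223, 53, 88).
rcc_pos = [21e5, 23e5, 5e5, 22e5, 46e5]
rcc_neg = [35e5, 172e5, 25e5, 57e5, 47e5]
w_stat, p_value = wilcoxon_signed_rank_exact(rcc_pos, rcc_neg)
```

All five differences have the same sign, which gives the smallest two-sided p value attainable with five pairs (2/32). A larger cohort, such as the 17 patients analyzed here, is needed to reach the significance levels reported in the text.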


Figure 4.7: Column chart and box plot comparing A2G2S2 (A and B) and FA2G2S2 (C and D) levels between samples collected before (RCC (+)) and after (RCC (-)) nephrectomy. The Wilcoxon Signed-Rank Test was used to calculate p (2-tail) values and p value < 0.05 was considered significant. Blue square: RCC (+); Red square: RCC (-).

To the best of our knowledge this is the first report highlighting site-specific glycan occupancy and site-specific glycan alterations in clusterin of ccRCC patients. Our data highlight the importance of clusterin glycosylation as a candidate biomarker for ccRCC and our findings need further validation in an independent, large cohort of ccRCC patients.

The biological function(s) of clusterin are not yet well defined. Clusterin has been implicated in cancer progression and inactivation of the complement system has been reported 46, 47. Clusterin from prostate cancer cells was found to interact with SCF-betaTrCP E3 ligase family members, resulting in enhanced COMMD1 and I-kappaB proteasomal degradation 48. Moreover, in the kidney, clusterin is known to be expressed in high abundance 49, 50. In the present study, site-specific glycan alterations of immuno-affinity enriched clusterin were identified in patients presenting with ccRCC. A2G2S2 and FA2G2S2 glycoform levels at N-linked site 374 were observed to significantly decrease, while tri-antennary trigalactosyl disialylated glycans (A3G3S2) showed no changes at N-linked site 374 in patients with ccRCC compared to patients without ccRCC.

Our previous report that focused on the total glycan profile of clusterin is in agreement with the current observation of FA2G2S2 glycan levels in patients with ccRCC. A3G3S2 glycans that previously showed higher levels in patients without ccRCC presently exhibited no changes at N-linked site 374. One potential reason for this discrepancy may be the effect of the negative charges of sialic acids on the positive ions generated by electrospray ionization (ESI). Also, a negative correlation in the levels of A2G2S2 glycan bearing α 2-3 linked sialic acids was previously observed in ccRCC patients following curative nephrectomy. This may be attributed to glycan changes occurring at individual sites rather than across all N-linked sites, where a site-specific change can be diluted by the large number of other glycan structures that do not contain disease-related changes.

Furthermore, the observation of lower amounts of A2G2S2 glycans in ccRCC cancer samples is consistent with previous studies. For example, Arnold et al. reported a decrease in the levels of bi-antennary sialic acid glycans in serum of lung cancer patients 51. A similar observation was reported in an earlier study by Tabarés et al.52. Since sialic acid linkage orientation was not explored in the current study, future investigations on linkage information (currently underway in our laboratory) will be an important step.

Alterations in oligosaccharide fucosylation are a hallmark of cancer 53. In several instances, there are reports of up-regulation of fucosylated glycans in cancer 54. It is suggested that certain organ specific glycoproteins (e.g. haptoglobin in the liver) contain fucose specific signals that induce production of fucosylated glycans in the liver 55, 56. It is thought that fucosylation becomes dysregulated in cancer cells, resulting in release of fucosylated glycans into the serum rather than into the liver bile ducts. However, the observed significant decrease in core-fucosylated glycan amounts of clusterin in ccRCC is in agreement with our previous report35 and other published works. Tabarés et al. and Sarrats et al. reported a significant decrease in fucosylated glycans in prostate-specific antigen (PSA) from prostate cancer (PCa) patients 52, 57. Furthermore, White et al. noted a decrease in core-fucose glycans present in seminal plasma PSA from prostate cancer patients compared to seminal plasma from healthy benign prostatic hyperplasia (BPH) patients 58. Establishing whether ccRCC cells directly produce clusterin that exhibits decreased fucosylation, or whether the observed changes in fucosylation are a secondary effect of the tumor microenvironment, will enhance the clinical potential of the observations in this chapter.

4.4.6 Lectin blot assay

To further validate the discovery made by analytical methods, we investigated glycan changes in RCC (+) and RCC (-) plasma samples by lectin blotting using Aleuria aurantia lectin (AAL), which has an affinity for fucosylated glycans. As shown in Figure 4.8A and 4.8B, lectin blotting with AAL confirmed the observed changes in glycan fucosylation levels between the RCC (+) and RCC (-) clusterin samples, with a fold change ranging from 2 to 10. AAL lectin blot reproducibility of the same subset was demonstrated with three analytical replicates.


Figure 4.8: Representative lectin blot assay to determine total glycan changes in before (+) and after (-) nephrectomy plasma samples. Biotinylated Aleuria aurantia lectin (AAL) probed with 1 µg/mL streptavidin–HRP showed significant variation in fucosylated glycan levels in RCC (+) compared to RCC (-) samples in both A and B. Blue square: RCC (+); Red square: RCC (-).

Probing alterations in protein glycosylation by lectin-based blots has been reported previously in cancer 59-61. Here we used lectin blotting as a complementary approach to analyze clusterin glycosylation, and we showed a significant elevation of fucosylated glycans in RCC (-) samples. This result reaffirms our mass spectrometry site-specific glycan characterization data.

4.5 CONCLUSIONS

Cancer associated glycoforms and glycan attachment sites may serve as tumor biomarkers and/or therapeutic targets. In the current study, we sought to comprehensively characterize N-linked glycan sites of immuno-affinity purified clusterin present in the plasma of ccRCC patients before and after curative nephrectomy for localized disease and to evaluate disease-associated site-specific changes using CID tandem mass spectrometry. We discovered that biantennary digalactosylated di-sialylated oligosaccharides, with and without fucose, are significantly decreased in disease RCC (+) samples and verified the reduced levels of fucosylated glycans with AAL lectin blots. This work demonstrates that glycosylation changes can be used to distinguish cancer and non-cancer samples.


4.6 REFERENCES

1. Weiss, R. H.; Lin, P. Y., Kidney cancer: identification of novel targets for therapy. Kidney Int 2006, 69 (2), 224-32.

2. Pantuck, A. J.; Zisman, A.; Belldegrun, A. S., The changing natural history of renal cell carcinoma. J Urol 2001, 166 (5), 1611-23.

3. Yurist-Doutsch, S.; Chaban, B.; VanDyke, D. J.; Jarrell, K. F.; Eichler, J., Sweet to the extreme: protein glycosylation in Archaea. Mol Microbiol 2008, 68 (5), 1079-84.

4. Freeze, H. H., Congenital Disorders of Glycosylation: CDG-I, CDG-II, and beyond. Curr Mol Med 2007, 7 (4), 389-96.

5. Vigerust, D. J.; Shepherd, V. L., Virus glycosylation: role in virulence and immune interactions. Trends Microbiol 2007, 15 (5), 211-8.

6. Rambaruth, N. D.; Dwek, M. V., Cell surface glycan-lectin interactions in tumor metastasis. Acta Histochem 2011, 113 (6), 591-600.

7. Meany, D. L.; Zhang, Z.; Sokoll, L. J.; Zhang, H.; Chan, D. W., Glycoproteomics for prostate cancer detection: changes in serum PSA glycosylation patterns. Journal of Proteome Research 2009, 8 (2), 613-9.

8. Hammarstrom, S., The carcinoembryonic antigen (CEA) family: structures, suggested functions and expression in normal and malignant tissues. Semin Cancer Biol 1999, 9 (2), 67-81.

9. Moss, E. L.; Hollingworth, J.; Reynolds, T. M., The role of CA125 in clinical practice. Journal of Clinical Pathology 2005, 58 (3), 308-12.

10. Poon, T. C.; Mok, T. S.; Chan, A. T.; Chan, C. M.; Leong, V.; Tsui, S. H.; Leung, T. W.; Wong, H. T.; Ho, S. K.; Johnson, P. J., Quantification and utility of monosialylated alpha-fetoprotein in the diagnosis of hepatocellular carcinoma with nondiagnostic serum total alpha-fetoprotein. Clinical Chemistry 2002, 48 (7), 1021-7.

11. Meany, D. L.; Chan, D. W., Aberrant glycosylation associated with enzymes as cancer biomarkers. Clin Proteomics 2011, 8 (1), 7.

12. Bones, J.; McLoughlin, N.; Hilliard, M.; Wynne, K.; Karger, B. L.; Rudd, P. M., 2D-LC analysis of BRP 3 erythropoietin N-glycosylation using anion exchange fractionation and hydrophilic interaction UPLC reveals long poly-N-acetyl lactosamine extensions. Analytical Chemistry 2011, 83 (11), 4154-62.

13. Deshpande, N.; Jensen, P. H.; Packer, N. H.; Kolarich, D., GlycoSpectrumScan: fishing glycopeptides from MS spectra of protease digests of human colostrum sIgA. Journal of Proteome Research 2010, 9 (2), 1063-75.


14. Lee, A.; Chick, J. M.; Kolarich, D.; Haynes, P. A.; Robertson, G. R.; Tsoli, M.; Jankova, L.; Clarke, S. J.; Packer, N. H.; Baker, M. S., Liver membrane proteome glycosylation changes in mice bearing an extra-hepatic tumor. Molecular & cellular proteomics : MCP 2011, 10 (9), M900538MCP200.

15. Lee, A.; Nakano, M.; Hincapie, M.; Kolarich, D.; Baker, M. S.; Hancock, W. S.; Packer, N. H., The lectin riddle: glycoproteins fractionated from complex mixtures have similar glycomic profiles. Omics : a journal of integrative biology 2010, 14 (4), 487-99.

16. Nakano, M.; Nakagawa, T.; Ito, T.; Kitada, T.; Hijioka, T.; Kasahara, A.; Tajiri, M.; Wada, Y.; Taniguchi, N.; Miyoshi, E., Site-specific analysis of N-glycans on haptoglobin in sera of patients with pancreatic cancer: a novel approach for the development of tumor markers. International journal of cancer. Journal international du cancer 2008, 122 (10), 2301-9.

17. Nwosu, C. C.; Seipert, R. R.; Strum, J. S.; Hua, S. S.; An, H. J.; Zivkovic, A. M.; German, B. J.; Lebrilla, C. B., Simultaneous and extensive site-specific N- and O-glycosylation analysis in protein mixtures. Journal of Proteome Research 2011, 10 (5), 2612-24.

18. Pompach, P.; Chandler, K. B.; Lan, R.; Edwards, N.; Goldman, R., Semi-automated identification of N-Glycopeptides by hydrophilic interaction chromatography, nano-reverse-phase LC-MS/MS, and glycan database search. Journal of Proteome Research 2012, 11 (3), 1728-40.

19. Jensen, P. H.; Karlsson, N. G.; Kolarich, D.; Packer, N. H., Structural analysis of N- and O-glycans released from glycoproteins. Nature Protocols 2012, 7 (7), 1299-310.

20. Du, M. Q.; Hutchinson, W. L.; Johnson, P. J.; Williams, R., Differential alpha-fetoprotein lectin binding in hepatocellular carcinoma. Diagnostic utility at low serum levels. Cancer 1991, 67 (2), 476-80.

21. Taketa, K.; Sekiya, C.; Namiki, M.; Akamatsu, K.; Ohta, Y.; Endo, Y.; Kosaka, K., Lectin-reactive profiles of alpha-fetoprotein characterizing hepatocellular carcinoma and related conditions. Gastroenterology 1990, 99 (2), 508-18.

22. Sterling, R. K.; Jeffers, L.; Gordon, F.; Sherman, M.; Venook, A. P.; Reddy, K. R.; Satomura, S.; Schwartz, M. E., Clinical utility of AFP-L3% measurement in North American patients with HCV-related cirrhosis. Am J Gastroenterol 2007, 102 (10), 2196-205.

23. Kolarich, D.; Jensen, P. H.; Altmann, F.; Packer, N. H., Determination of site-specific glycan heterogeneity on glycoproteins. Nature Protocols 2012, 7 (7), 1285-98.

24. Hua, S.; Nwosu, C. C.; Strum, J. S.; Seipert, R. R.; An, H. J.; Zivkovic, A. M.; German, J. B.; Lebrilla, C. B., Site-specific protein glycosylation analysis with glycan isomer differentiation. Analytical and Bioanalytical Chemistry 2012, 403 (5), 1291-302.


25. Hua, S.; Lebrilla, C.; An, H. J., Application of nano-LC-based glycomics towards biomarker discovery. Bioanalysis 2011, 3 (22), 2573-85.

26. Wang, D.; Hincapie, M.; Rejtar, T.; Karger, B. L., Ultrasensitive characterization of site-specific glycosylation of affinity-purified haptoglobin from lung cancer patient plasma using 10 μm i.d. porous layer open tubular liquid chromatography-linear ion trap collision-induced dissociation/electron transfer dissociation mass spectrometry. Analytical Chemistry 2011, 83 (6), 2029-37.

27. Blaschuk, O.; Burdzy, K.; Fritz, I. B., Purification and characterization of a cell-aggregating factor (clusterin), the major glycoprotein in ram rete testis fluid. The Journal of biological chemistry 1983, 258 (12), 7714-20.

28. Chen, X.; Halberg, R. B.; Ehrhardt, W. M.; Torrealba, J.; Dove, W. F., Clusterin as a biomarker in murine and human intestinal neoplasia. Proc Natl Acad Sci U S A 2003, 100 (16), 9530-5.

29. Miyake, H.; Gleave, M. E.; Arakawa, S.; Kamidono, S.; Hara, I., Introducing the clusterin gene into human renal cell carcinoma cells enhances their metastatic potential. J Urol 2002, 167 (5), 2203-8.

30. Redondo, M.; Villar, E.; Torres-Munoz, J.; Tellez, T.; Morell, M.; Petito, C. K., Overexpression of clusterin in human breast carcinoma. Am J Pathol 2000, 157 (2), 393-9.

31. Steinberg, J.; Oyasu, R.; Lang, S.; Sintich, S.; Rademaker, A.; Lee, C.; Kozlowski, J. M.; Sensibar, J. A., Intracellular levels of SGP-2 (Clusterin) correlate with tumor grade in prostate cancer. Clin Cancer Res 1997, 3 (10), 1707-11.

32. Xie, D.; Lau, S. H.; Sham, J. S.; Wu, Q. L.; Fang, Y.; Liang, L. Z.; Che, L. H.; Zeng, Y. X.; Guan, X. Y., Up-regulated expression of cytoplasmic clusterin in human ovarian carcinoma. Cancer 2005, 103 (2), 277-83.

33. Nakamura, E.; Abreu-e-Lima, P.; Awakura, Y.; Inoue, T.; Kamoto, T.; Ogawa, O.; Kotani, H.; Manabe, T.; Zhang, G. J.; Kondo, K.; Nose, V.; Kaelin, W. G., Jr., Clusterin is a secreted marker for a hypoxia-inducible factor-independent function of the von Hippel-Lindau tumor suppressor protein. Am J Pathol 2006, 168 (2), 574-84.

34. Kurahashi, T.; Muramaki, M.; Yamanaka, K.; Hara, I.; Miyake, H., Expression of the secreted form of clusterin protein in renal cell carcinoma as a predictor of disease extension. BJU international 2005, 96 (6), 895-9.

35. Tousi, F.; Bones, J.; Iliopoulos, O.; Hancock, W. S.; Hincapie, M., Multidimensional liquid chromatography platform for profiling alterations of clusterin N-glycosylation in the plasma of patients with renal cell carcinoma. Journal of chromatography. A 2012, 1256, 121-8.


36. Damerell, D.; Ceroni, A.; Maass, K.; Ranzinger, R.; Dell, A.; Haslam, S. M., The GlycanBuilder and GlycoWorkbench glycoinformatics tools: updates and new developments. Biol Chem 2012, 393 (11), 1357-62.

37. Jenne, D. E.; Tschopp, J., Clusterin: the intriguing guises of a widely expressed glycoprotein. Trends Biochem Sci 1992, 17 (4), 154-9.

38. Wuhrer, M.; Catalina, M. I.; Deelder, A. M.; Hokke, C. H., Glycoproteomics based on tandem mass spectrometry of glycopeptides. J Chromatogr B Analyt Technol Biomed Life Sci 2007, 849 (1-2), 115-28.

39. Wada, Y.; Azadi, P.; Costello, C. E.; Dell, A.; Dwek, R. A.; Geyer, H.; Geyer, R.; Kakehi, K.; Karlsson, N. G.; Kato, K.; Kawasaki, N.; Khoo, K. H.; Kim, S.; Kondo, A.; Lattova, E.; Mechref, Y.; Miyoshi, E.; Nakamura, K.; Narimatsu, H.; Novotny, M. V.; Packer, N. H.; Perreault, H.; Peter-Katalinic, J.; Pohlentz, G.; Reinhold, V. N.; Rudd, P. M.; Suzuki, A.; Taniguchi, N., Comparison of the methods for profiling glycoprotein glycans--HUPO Human Disease Glycomics/Proteome Initiative multi-institutional study. Glycobiology 2007, 17 (4), 411-22.

40. Chen, R.; Jiang, X.; Sun, D.; Han, G.; Wang, F.; Ye, M.; Wang, L.; Zou, H., Glycoproteomics analysis of human liver tissue by combination of multiple enzyme digestion and hydrazide chemistry. Journal of Proteome Research 2009, 8 (2), 651-61.

41. Huddleston, M. J.; Bean, M. F.; Carr, S. A., Collisional fragmentation of glycopeptides by electrospray ionization LC/MS and LC/MS/MS: methods for selective detection of glycopeptides in protein digests. Analytical Chemistry 1993, 65 (7), 877-84.

42. Sullivan, B.; Addona, T. A.; Carr, S. A., Selective detection of glycopeptides on ion trap mass spectrometers. Analytical Chemistry 2004, 76 (11), 3112-8.

43. Cooper, C. A.; Gasteiger, E.; Packer, N. H., GlycoMod--a software tool for determining glycosylation compositions from mass spectrometric data. Proteomics 2001, 1 (2), 340-9.

44. Conboy, J. J.; Henion, J. D., The determination of glycopeptides by liquid chromatography/mass spectrometry with collision-induced dissociation. J Am Soc Mass Spectrom 1992, 3 (8), 804-14.

45. Hunter, A. P.; Games, D. E., Evaluation of glycosylation site heterogeneity and selective identification of glycopeptides in proteolytic digests of bovine alpha 1-acid glycoprotein by mass spectrometry. Rapid Commun Mass Spectrom 1995, 9 (1), 42-56.

46. Tschopp, J.; French, L. E., Clusterin: modulation of complement function. Clin Exp Immunol 1994, 97 Suppl 2, 11-4.


47. Shannan, B.; Seifert, M.; Leskov, K.; Willis, J.; Boothman, D.; Tilgen, W.; Reichrath, J., Challenge and promise: roles for clusterin in pathogenesis, progression and therapy of cancer. Cell Death Differ 2006, 13 (1), 12-9.

48. Zoubeidi, A.; Ettinger, S.; Beraldi, E.; Hadaschik, B.; Zardan, A.; Klomp, L. W.; Nelson, C. C.; Rennie, P. S.; Gleave, M. E., Clusterin facilitates COMMD1 and I-kappaB degradation to enhance NF-kappaB activity in prostate cancer cells. Mol Cancer Res 2010, 8 (1), 119-30.

49. Laslop, A.; Steiner, H. J.; Egger, C.; Wolkersdorfer, M.; Kapelari, S.; Hogue-Angeletti, R.; Erickson, J. D.; Fischer-Colbrie, R.; Winkler, H., Glycoprotein III (clusterin, sulfated glycoprotein 2) in endocrine, nervous, and other tissues: immunochemical characterization, subcellular localization, and regulation of biosynthesis. J Neurochem 1993, 61 (4), 1498-505.

50. Trougakos, I. P.; Gonos, E. S., Clusterin/apolipoprotein J in human aging and cancer. Int J Biochem Cell Biol 2002, 34 (11), 1430-48.

51. Arnold, J. N.; Saldova, R.; Galligan, M. C.; Murphy, T. B.; Mimura-Kimura, Y.; Telford, J. E.; Godwin, A. K.; Rudd, P. M., Novel glycan biomarkers for the detection of lung cancer. Journal of Proteome Research 2011, 10 (4), 1755-64.

52. Tabares, G.; Radcliffe, C. M.; Barrabes, S.; Ramirez, M.; Aleixandre, R. N.; Hoesel, W.; Dwek, R. A.; Rudd, P. M.; Peracaula, R.; de Llorens, R., Different glycan structures in prostate-specific antigen from prostate cancer sera in relation to seminal plasma PSA. Glycobiology 2006, 16 (2), 132-45.

53. Miyoshi, E.; Noda, K.; Yamaguchi, Y.; Inoue, S.; Ikeda, Y.; Wang, W.; Ko, J. H.; Uozumi, N.; Li, W.; Taniguchi, N., The alpha1-6-fucosyltransferase gene and its biological significance. Biochimica et biophysica acta 1999, 1473 (1), 9-20.

54. Santer, U. V.; Glick, M. C., Partial structure of a membrane glycopeptide from virus-transformed hamster cells. Biochemistry 1979, 18 (12), 2533-40.

55. Okuyama, N.; Ide, Y.; Nakano, M.; Nakagawa, T.; Yamanaka, K.; Moriwaki, K.; Murata, K.; Ohigashi, H.; Yokoyama, S.; Eguchi, H.; Ishikawa, O.; Ito, T.; Kato, M.; Kasahara, A.; Kawano, S.; Gu, J.; Taniguchi, N.; Miyoshi, E., Fucosylated haptoglobin is a novel marker for pancreatic cancer: a detailed analysis of the oligosaccharide structure and a possible mechanism for fucosylation. International journal of cancer. Journal international du cancer 2006, 118 (11), 2803-8.

56. Nakagawa, T.; Uozumi, N.; Nakano, M.; Mizuno-Horikawa, Y.; Okuyama, N.; Taguchi, T.; Gu, J.; Kondo, A.; Taniguchi, N.; Miyoshi, E., Fucosylation of N-glycans regulates the secretion of hepatic glycoproteins into bile ducts. The Journal of biological chemistry 2006, 281 (40), 29797-806.


57. Sarrats, A.; Saldova, R.; Comet, J.; O'Donoghue, N.; de Llorens, R.; Rudd, P. M.; Peracaula, R., Glycan characterization of PSA 2-DE subforms from serum and seminal plasma. Omics : a journal of integrative biology 2010, 14 (4), 465-74.

58. White, K. Y.; Rodemich, L.; Nyalwidhe, J. O.; Comunale, M. A.; Clements, M. A.; Lance, R. S.; Schellhammer, P. F.; Mehta, A. S.; Semmes, O. J.; Drake, R. R., Glycomic characterization of prostate-specific antigen and prostatic acid phosphatase in prostate cancer and benign disease seminal plasma fluids. Journal of Proteome Research 2009, 8 (2), 620-30.

59. Patwa, T. H.; Zhao, J.; Anderson, M. A.; Simeone, D. M.; Lubman, D. M., Screening of glycosylation patterns in serum using natural glycoprotein microarrays and multi-lectin fluorescence detection. Analytical Chemistry 2006, 78 (18), 6411-21.

60. Qiu, Y.; Patwa, T. H.; Xu, L.; Shedden, K.; Misek, D. E.; Tuck, M.; Jin, G.; Ruffin, M. T.; Turgeon, D. K.; Synal, S.; Bresalier, R.; Marcon, N.; Brenner, D. E.; Lubman, D. M., Plasma glycoprotein profiling for colorectal cancer biomarker identification by lectin glycoarray and lectin blot. Journal of Proteome Research 2008, 7 (4), 1693-703.

61. Zhao, J.; Patwa, T. H.; Qiu, W.; Shedden, K.; Hinderer, R.; Misek, D. E.; Anderson, M. A.; Simeone, D. M.; Lubman, D. M., Glycoprotein microarrays with multi-lectin detection: unique lectin binding patterns as a tool for classifying normal, chronic pancreatitis and pancreatic cancer sera. Journal of Proteome Research 2007, 6 (5), 1864-74.


CHAPTER 5

CHARACTERIZATION OF GLYCOPROTEINS IN PANCREATIC CYST FLUID USING A HIGH PERFORMANCE MULTIPLE LECTIN AFFINITY CHROMATOGRAPHY PLATFORM

This work is published: Francisca Owusu Gbormittah, Brian B. Haab, Katie Partyka, Carolina Garcia-Ott, Marina Hincapie, William S. Hancock, J. Proteome Res., 2014, 13 (1), pp 289–299


5.1 ABSTRACT

Pancreatic cancer is currently the fourth leading cause of cancer death; an estimated 38,460 people will die of the disease in 2013. Early detection of malignant cysts (pancreatic cancer precursors) is necessary to help prevent late diagnosis of the tumor. In this study, we characterized glycoproteins and non-glycoproteins in pooled mucinous (n=10) and non-mucinous (n=10) pancreatic cyst fluid to identify ‘proteins of interest’ that differentiate mucinous from non-mucinous cysts, and we investigated these proteins as potential biomarker targets. An automated multi-lectin affinity chromatography (M-LAC) platform was used for glycoprotein enrichment, followed by nano-LC-MS/MS analysis. Spectral count quantitation allowed the identification of proteins with significantly different levels in mucinous and non-mucinous cysts, one of which (periostin) was confirmed by immunoblotting. To evaluate the differentially expressed proteins exhaustively, we used a number of proteomic tools, including gene ontology classification, pathway and network analysis, Novoseek data mining and chromosome gene mapping. These complementary tools revealed that several of the proteins with significant differential expression, such as mucin 6 (MUC6), bile salt-activated lipase (CEL) and pyruvate kinase isozymes M1/M2, have a strong association with pancreatic cancer. Further, chromosome gene mapping demonstrated co-expression and co-localization of some proteins of interest, including 14-3-3 protein epsilon (YWHAE), pigment epithelium derived factor (SERPINF1) and oncogene p53.


5.2 INTRODUCTION

Pancreatic cancer is one of the deadliest cancers, with a 95% mortality rate within 5 years after diagnosis1. This almost 100% death rate is attributed mainly to late detection of the tumor and subsequent late diagnosis of the disease2-4. It is difficult to make an accurate prognosis because pancreatic tumors are pathologically diverse yet share similar clinical and radiological characteristics5-6. The most effective means to reduce mortality from pancreatic cancer may be to identify and remove precursor lesions before they progress to invasive cancer. Pancreatic cysts are potential precursors of pancreatic cancer that can be identified through non-invasive imaging and are therefore detectable prior to progression7. Some pancreatic cyst lesions, including pseudocysts and serous cystadenomas (referred to as non-mucinous cysts), do not have malignant potential; others, including mucinous cystic neoplasms and intraductal papillary mucinous neoplasms (referred to as mucinous cysts), are established cancer precursors. Unfortunately, it is sometimes difficult to distinguish mucinous from non-mucinous cysts by imaging or by clinical symptoms6. Although a number of parameters and techniques are currently available for classifying malignant and non-malignant lesions, more needs to be done since none of these methods provides definitive results8.

Glycoproteomics plays an essential role in biomarker discovery studies of biological samples, since alterations in glycan structures and in the cellular glycosylation profile are closely related to cellular regulation and malignancy9-11. The glycoproteins of pancreatic cyst fluid are a potentially valuable source of information that can help differentiate mucinous from non-mucinous cysts. In glycoproteomics, specific glycoproteins and glyco-isoforms are enriched and then analyzed by matrix-assisted laser desorption/ionization or liquid chromatography mass spectrometry. Previous studies using antibody-lectin sandwich microarrays have demonstrated that some protein families and their glycan variants can be glyco-biomarker targets for accurate differentiation of mucinous from non-mucinous cysts9. Also, lectin-based glycoproteomics of pancreatic cyst fluid has identified specific glycans and glycoforms as possible biomarker targets to differentiate mucinous from non-mucinous cysts10-11.

We have focused on applying glycoprotein enrichment via a high performance multi-lectin affinity chromatography (HP-MLAC) method12 to differentiate mucinous from non-mucinous pancreatic cyst fluid subtypes. HP-MLAC is a robust, high-throughput platform that combines the depletion of two highly abundant proteins (human immunoglobulins and albumin), the enrichment of glycoproteins and glyco-isoforms by multiple lectins (ConA, WGA and Jac), and a reversed-phase sample clean-up on an HPLC system. This platform has been shown to identify potential glyco-biomarker targets in the plasma of breast cancer patients12-13.

In this chapter we describe the glycoproteome as well as the non-glycoproteome landscape of pancreatic cyst fluid, which allows us to study the differences between mucinous and non-mucinous cyst fluid. First, we present data analysis based on our glycoprotein fractionation platform, which indicates that a combination of high abundance protein depletion and enrichment by M-LAC followed by 1D SDS-PAGE fractionation allows us to characterize not only glycoproteins but also low abundance proteins that may be potential glyco-biomarker targets of interest14. Then, using complementary proteomics tools such as gene ontology, Novoseek, pathway analysis, network interactions and chromosome gene mapping analysis, we show that ‘proteins of interest’ selected via spectral count have significant cancer associations and provide a good list for the selection of target proteins for pancreatic cyst biomarker discovery.

5.3 MATERIALS AND METHODS

5.3.1 Reagents

Pancreatic cyst fluid samples used in this study were obtained from Dr. Brian Haab’s laboratory at the Van Andel Research Institute (Grand Rapids, MI). Lectins for the M-LAC column were purchased from Vector Laboratories (Burlingame, CA). CaptureSelect® ligand with specificity for albumin and protein G with affinity for immunoglobulins (IgA, IgM, and IgG) were obtained from BAC B.V. (Netherlands) and Life Technologies, Inc. (Carlsbad, CA), respectively. POROS 20 R1 and POROS beads for conjugation were purchased from Applied Biosystems (Framingham, MA). Sequencing grade modified trypsin was purchased from Promega (Madison, WI). HPLC-MS grade water, formic acid, acetonitrile and other buffer reagents were all purchased from Thermo Fisher Scientific (Waltham, MA).

5.3.2 Pancreatic cyst fluid samples

All pancreatic cyst samples involved in this study were collected in compliance with the guidelines of the local Institutional Review Boards at the University of Michigan Medical Center, Memorial Sloan-Kettering Cancer Center, the University of Arizona Medical Center, the University of Pittsburgh Medical Center and Ospedale Sacro Cuore-Don Calabria, Negrar, Italy. Because of the limited amount of material, pancreatic cyst fluid samples for glycoproteomic studies were pooled into mucinous subtypes (potentially malignant lesions), which include intraductal papillary mucinous neoplasms (IPMN) and mucinous cystic neoplasms (MCN), and non-mucinous subtypes (benign lesions), which comprise serous cystadenomas (SC) and pseudocysts (PC). Mucin-like proteins, fats and other particulate matter were depleted from the cyst fluid samples as follows: 200 µL of each pool was mixed with an equal volume of 20 mM phosphate buffer, pH 7.4, in a 1.5 mL centrifuge tube. Samples were vortexed briefly and centrifuged for 15 minutes at 6,000 x g. Supernatants were carefully separated from the precipitated material and transferred into a conditioned (equal ratio water/ethanol mixture) MWCO 3 kDa micron centrifugal filter (Millipore, Billerica, MA) for buffer exchange into HP-MLAC binding buffer (25 mM Tris, 0.5 M NaCl, 1 mM MnCl2, 1 mM CaCl2 and 0.05% sodium azide, pH 7.4). This was followed by a BCA protein assay and immediate storage at -80 °C until glycoproteomic analysis. Samples were thawed no more than twice for each experiment.

5.3.3 Sample fractionation and glycoprotein enrichment

200 µg each of the mucinous and non-mucinous subtypes was fractionated and enriched for glycoproteins on the HP-MLAC platform as previously described12. Briefly, high abundance proteins (IgG and albumin) were depleted, followed by HP-MLAC glycoprotein enrichment and sample desalting, all on-line, which prevents sample degradation and sample losses. The on-line sample preparation was performed on a 2D HPLC system (Shimadzu, Columbia, MD) equipped with three valves to allow switching among the three serially connected columns. The elution buffer for the POROS protein G and CaptureSelect® ligand albumin-IgG columns was 0.1 M glycine, pH 2.5. For the HP-MLAC glycoprotein affinity column, 0.1 M acetic acid, pH 2.5, was used to elute M-LAC bound proteins. A POROS 50 R1 reversed-phase PEEK column was used to desalt the unbound and bound fractions collected. For desalting, solvent A was 0.1% trifluoroacetic acid in Milli-Q water and solvent B was 0.1% trifluoroacetic acid in acetonitrile. A 70% solvent B step gradient was used to elute bound proteins from the POROS R1 column. Unbound protein fractions (no specificity for the M-LAC column) were collected separately from M-LAC bound fractions. All collected fractions were dried in a speed vacuum.

5.3.4 1D SDS-PAGE analysis followed by in-gel Trypsin digestion

20 µg of each fraction was loaded on 10% Mini-PROTEAN® TGX™ Precast Gels and run on a Mini-PROTEAN Tetra cell system (Bio-Rad Laboratories, Hercules, CA) for 40 minutes at a constant voltage of 200 V. After electrophoresis, gels were stained with SimplyBlue™ SafeStain (Invitrogen, Carlsbad, CA) following the manufacturer’s instructions. Each fraction lane was excised into 10 separate bands, which were processed in low-binding centrifuge tubes as described earlier15, with some modifications. Briefly, each band was cut into 1 mm x 1 mm x 1 mm pieces and destained with 500 µL of 50 mM ammonium bicarbonate, pH 8.0, followed by 500 µL of acetonitrile, for up to 5 cycles. Each solvent addition was followed, in sequence, by 5 minutes of vortexing, a short spin on a bench-top centrifuge and aspiration of the solvent to waste. After the aspiration of acetonitrile in the last cycle, the dehydrated gel pieces were dried in a speed vacuum, reduced at 56 °C for 30 minutes with 0.5 M dithiothreitol (DTT) in 50 mM ammonium bicarbonate, pH 8.0, added to a final concentration of 25 mM, and alkylated at room temperature for 30 minutes in the dark with 0.5 M iodoacetamide in 50 mM ammonium bicarbonate, pH 8.0, added to a final concentration of 50 mM. Gel pieces were washed 3 times with 500 µL of 50 mM ammonium bicarbonate and dehydrated with 500 µL of acetonitrile. 50 µL of freshly prepared 0.04 µg/µL sequencing grade trypsin (Promega, Madison, WI) in 25 mM ammonium bicarbonate containing 3% acetonitrile (v/v), pH 8.0, was added to the dehydrated gel pieces. The tubes were incubated on ice for 45 minutes to allow the gel pieces to absorb the trypsin. Excess trypsin was then aspirated, and enough digestion buffer (25 mM ammonium bicarbonate containing 3% acetonitrile (v/v), pH 8.0) was added to cover the gel pieces, which were incubated at 37 °C for approximately 20 hours. Peptides were extracted into labeled low-binding centrifuge tubes with two additions of 200 µL of 5% formic acid/50% acetonitrile and one addition of 200 µL of 100% acetonitrile. The aspirated supernatants were pooled for each excised band and dried completely in a speed vacuum. Dried peptides were reconstituted in 20 µL of 0.1% formic acid in HPLC grade water prior to nano-LC-MS/MS analysis.
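The reagent amounts in the digestion protocol follow from simple dilution arithmetic (C1V1 = C2V2). The Python sketch below reproduces them. The 100 µL reaction volume is a hypothetical value chosen only for illustration, since the protocol states final concentrations but not reaction volumes, and the function names are our own.

```python
def stock_volume_ul(stock_molar, final_molar, final_volume_ul):
    """Volume of stock (uL) needed to reach final_molar in final_volume_ul (C1*V1 = C2*V2)."""
    if final_molar > stock_molar:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_molar * final_volume_ul / stock_molar


def enzyme_mass_ug(conc_ug_per_ul, volume_ul):
    """Mass of enzyme (ug) delivered by a given volume of enzyme solution."""
    return conc_ug_per_ul * volume_ul


FINAL_VOLUME_UL = 100.0  # hypothetical reaction volume; not stated in the protocol

dtt_ul = stock_volume_ul(0.5, 0.025, FINAL_VOLUME_UL)  # 0.5 M DTT stock -> 25 mM final
iaa_ul = stock_volume_ul(0.5, 0.050, FINAL_VOLUME_UL)  # 0.5 M iodoacetamide stock -> 50 mM final
trypsin_ug = enzyme_mass_ug(0.04, 50.0)                # 50 uL of 0.04 ug/uL trypsin per band

print(f"DTT stock: {dtt_ul:.1f} uL; iodoacetamide stock: {iaa_ul:.1f} uL; trypsin: {trypsin_ug:.1f} ug")
```

Under these assumptions each gel band receives 2 µg of trypsin before the excess is aspirated, and the stock-to-final dilution is 1:20 for DTT and 1:10 for iodoacetamide regardless of the reaction volume chosen.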

5.3.5 Liquid chromatography mass spectrometry analysis

In-gel tryptic digests were subjected to nano-LC-MS/MS analysis on an LTQ-Orbitrap XL mass spectrometer (Thermo Fisher Scientific, Waltham, MA) equipped with an Ultimate 3000 HPLC (LC Packings-Dionex, Marlton, NJ, USA). A reversed-phase C18 column was packed in-house using a slurry of 5-µm particle, 200-Å pore size Magic C18 stationary phase (Michrom Bioresources, Auburn, CA) in a 150 mm x 75 µm i.d. capillary (New Objective, Woburn, MA). The peptides were loaded onto the C18 capillary column using an autosampler and desalted for 30 min at a flow rate of 300 nL/min in isocratic mode. Mobile phase A consisted of 0.1% formic acid in HPLC grade water and mobile phase B consisted of 0.1% formic acid in HPLC grade acetonitrile. The flow rate was automatically adjusted to 200 nL/min for gradient separation of the desalted peptides using the following gradient: 5% B to 40% B in 80 min; 40% B to 90% B in 15 min; 90% B to 2% B in 5 min. The total time for nano-LC-MS/MS analysis was 140 min. The mass spectrometer was operated in a data-dependent fashion: the eight most abundant precursor ions selected from the MS spectrum were fragmented by collision induced dissociation (CID) using an isolation width of 3.0 mass units. A resolution (R) of 60,000 was used to acquire each full MS scan over a mass range of 400-2000 m/z. Dynamic exclusion was set with a repeat count of 1 (repeat duration of 30 s, exclusion list size of 150, exclusion duration of 30 s, and exclusion mass width of 0.55 m/z low and 1.55 m/z high).
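The LC method above can be written down as a small timetable, which makes it easy to check step durations and to look up the nominal mobile phase composition at any time point. The sketch below is illustrative only: the step labels and tuple layout are our own, the %B during sample loading is not given in the text and is left as None, and linear ramps within each step are assumed.

```python
# (label, duration in min, flow in nL/min, %B at start, %B at end)
# Durations, flows and %B values are from the method text; labels are ours.
GRADIENT_STEPS = [
    ("isocratic load/desalt", 30, 300, None, None),  # %B not stated in the text
    ("separation",            80, 200, 5, 40),
    ("high-organic ramp",     15, 200, 40, 90),
    ("return to low %B",       5, 200, 90, 2),
]
STATED_TOTAL_MIN = 140


def itemized_minutes(steps):
    """Sum of the step durations that the method text lists explicitly."""
    return sum(step[1] for step in steps)


def percent_b_at(steps, t_min):
    """%B at t_min after injection, assuming linear ramps within each step."""
    elapsed = 0.0
    for _, duration, _, b_start, b_end in steps:
        if t_min <= elapsed + duration:
            if b_start is None:
                return None  # composition during loading is not given
            fraction = (t_min - elapsed) / duration
            return b_start + (b_end - b_start) * fraction
        elapsed += duration
    raise ValueError("time lies beyond the itemized steps")


listed = itemized_minutes(GRADIENT_STEPS)
print(f"itemized: {listed} min; not itemized: {STATED_TOTAL_MIN - listed} min")
print(f"%B at 70 min: {percent_b_at(GRADIENT_STEPS, 70)}")
```

The itemized steps account for 130 of the stated 140 min. The text does not say how the remaining 10 min is used; column re-equilibration is a plausible reading, but that is an assumption.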

5.3.6 Data processing and Bioinformatics

Proteins were identified by searching the generated MS/MS spectra against the UniProt annotated human database (release 2012_1; 34,157 entries) using the Thermo Fisher Proteome Discoverer 1.3 suite (Thermo Electron Corp, San Jose, CA). Both the MASCOT (Matrix Science) version 2.3 and SEQUEST (Thermo Electron) algorithms available in the Proteome Discoverer suite were used simultaneously to perform the search. This approach was chosen because combining algorithms has been shown to increase the number of proteins identified and reduce false positive identifications in shotgun proteomics16-17. Confidence in identification was enhanced by searching a reversed database with a false discovery rate (FDR) targeted at 1% at the peptide level. The other search parameters were as follows: a maximum of 2 missed cleavages, full trypsin specificity, carbamidomethylation of cysteine as a static modification, and precursor ion and fragment ion mass tolerances of 5 ppm and 0.8 Da, respectively. The PANTHER (Protein ANalysis THrough Evolutionary Relationships) database (http://pantherdb.org/) was used to determine gene ontology (GO) molecular function. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and GeneALaCart (provided by www.genecards.org) version 3.10 were used to assign gene symbols as well as protein-disease relationships. Novoseek (www.novoseek.com), a biomedical text mining tool, was used to obtain gene-disease relevance scores.
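The reversed-database filter can be illustrated with a generic target-decoy calculation. The sketch below is not the procedure implemented in Proteome Discoverer; it is a minimal version of the common estimate FDR ≈ decoys/targets among matches above a score cutoff, applied to synthetic scores that are invented for illustration. Tied scores are not treated specially.

```python
def cutoff_at_fdr(psms, max_fdr=0.01):
    """Find the most permissive score cutoff whose estimated FDR stays within max_fdr.

    psms: iterable of (score, is_decoy) pairs, where a higher score is a better match.
    Returns (cutoff_score, accepted_target_count), or (None, 0) if nothing passes.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff, accepted = None, 0
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
            continue
        targets += 1
        # Estimated FDR among everything scoring at least this well.
        if decoys / targets <= max_fdr:
            cutoff, accepted = score, targets
    return cutoff, accepted


# Synthetic example: 300 target matches (scores 201..500) and 5 decoy matches.
synthetic = [(float(s), False) for s in range(201, 501)]
synthetic += [(s, True) for s in (250.5, 220.5, 210.5, 205.5, 203.5)]

cutoff, accepted = cutoff_at_fdr(synthetic, max_fdr=0.01)
print(f"cutoff score: {cutoff}, accepted target matches: {accepted}")
```

In this synthetic list the third decoy pushes the estimate above 1%, so the filter stops just above it and keeps the 290 best-scoring target matches.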


5.3.7 Western Blot Analysis

10% Mini-PROTEAN® TGX™ Precast Gels (Bio-Rad Laboratories, Hercules, CA) and Tris/Glycine SDS running buffer were used to resolve 5 µg of protein for each sample. Blotting was performed on Bio-Rad’s Trans-Blot Turbo transfer system for 10 min. The primary antibody (rabbit polyclonal, 1:500) was purchased from Novus Biologicals (Littleton, CO). Goat anti-rabbit HRP (System Biosciences, Mountain View, CA) was used as the secondary antibody. Immuno-detected proteins were visualized using ECL Western Blotting reagents (GE Healthcare), and images were captured with a FluorChem SP system (Alpha Innotech, Santa Clara, CA).

5.4 RESULTS AND DISCUSSION

5.4.1 Analytical Strategy

To achieve good depth of glycoproteome analysis of the cyst fluid samples, we used a four-step chromatographic strategy: 1) abundant protein depletion, 2) lectin-based chromatography, 3) 1D SDS-PAGE separation and 4) capillary reversed-phase LC separation of the tryptic peptides generated by digestion of the resulting 10 gel bands. In previous publications we have described our lectin platform, which is based on a physical admixture of 3 different lectins that are individually bound to different chromatographic beads (termed multi-lectin affinity chromatography, or M-LAC). We showed that such an admixture gave an approximately 10-fold increase in binding efficiency relative to total proteins, based on the glycoside clustering effect18-19. Furthermore, the ratio in which an individual glycoprotein partitioned between the bound and unbound fractions was reproducible and reflected the types of glycan present on that protein. Also, changes in these ratios in a global protein analysis can yield information on changes in glycan motifs in glycoproteins resulting from diseases such as cancer and diabetes20-21. As mentioned in the introduction, pancreatic cancer is difficult to diagnose and appropriate clinical samples are difficult to obtain. We initiated the study with a sample set of 20 individual cyst samples and performed protein concentration, 1D SDS-PAGE and mass spectrometric analyses. It was clear from this phase of the investigation that, with samples of such limited availability, a study with meaningful depth of analysis would not permit replicate analyses. Patients’ samples were grouped (Table 5.1) into mucinous and non-mucinous cyst subtypes based on clinical diagnosis and classification as previously discussed6 and then pooled (10 mucinous and 10 non-mucinous cyst fluid samples in the two groups, respectively).

Table 5.1: Pancreatic cyst fluid samples for glycoproteomics analysis

| Cyst fluid sample type and specimen notes | Clinical classification | Number of samples, Set 1 | Number of samples, Set 2 |
|---|---|---|---|
| Low IPMN | Mucinous | 2 | 2 |
| Low grade MUCN | Mucinous | 0 | 1 |
| Intraductal papillary mucinous neoplasms | Mucinous | 1 | 1 |
| Pancreatic ductal adenocarcinoma arising in intraductal papillary neoplasms | Mucinous | 2 | 1 |
| Carcinoma-in situ (CIS) IPMN | Mucinous | 2 | 2 |
| Mucinous cystic neoplasm with low grade dysplasia | Mucinous | 1 | 1 |
| High grade dysplasia | Mucinous | 1 | 0 |
| Adenoma | Mucinous | 1 | 2 |
| Tissue origin: serous cystadenoma | Non-mucinous | 1 | 1 |
| Pseudocyst | Non-mucinous | 2 | 1 |
| Serous cystadenoma: two lymph nodes negative for malignancy | Non-mucinous | 1 | 2 |
| Serous cystadenoma | Non-mucinous | 3 | 2 |
| Serous cystadenoma, macrocystic variant. Two benign lymph nodes | Non-mucinous | 1 | 2 |
| Benign retention cyst | Non-mucinous | 2 | 1 |
| Cystic lesion of the pancreas with fibrous walls but without epithelium | Non-mucinous | 0 | 1 |

IPMN = intraductal papillary mucinous neoplasm, MUCN = mucinous neoplasm


While cyst fluid is an ideal proximal fluid for observing ‘proteins of interest’ that change in either concentration or glycosylation motifs in cancer, it is highly variable and available only in limited amounts. For this phase of the investigation we found that pools were necessary to permit effective depletion of contaminating blood proteins and to achieve good depth of glycoproteome analysis of the bound and unbound M-LAC fractions. Another advantage of pooling is that individual patient variability is minimized, so the resulting ‘proteins of interest’ are likely to be of more general applicability22. We also had access to sufficient patient numbers to prepare a second pool of mucinous and non-mucinous cyst samples and thus to perform a second analysis (second sample set). Since the second pool contained a different set of individuals, this analysis would generate a second list of ‘proteins of interest’ that could be compared with the first and used to explore the differences in cyst fluid between mucinous and non-mucinous pancreatic cancer.

5.4.2 Glycoproteome and non-glycoproteome platform

As described earlier, our HP-MLAC based platform (Figure 5.1) used two-protein (2P; albumin and IgG) depletion followed by multi-lectin affinity chromatography (M-LAC) glycoprotein enrichment and then 1D SDS-PAGE fractionation of the eluted M-LAC bound and unbound proteins.


Figure 5.1: Block diagram showing the experimental process used in the glycoproteomic studies of the two sample sets. Immunoglobulins (A, D, E, G, M, and light chains) and albumin were removed from the pancreatic cyst fluid samples using an HPLC PEEK column packed with immobilized antibody. Glycoprotein enrichment followed by one-dimensional gel electrophoresis were used as further fractionation steps leading to nano-LC-MS/MS analysis of the M-LAC bound and unbound fractions.

The decision to perform 2P depletion was based on a preliminary analysis of a set of pancreatic cyst fluid samples, which showed highly variable and in many cases significant contamination with albumin and IgGs (see Figure 5.3 below) and demonstrated that depletion improved the depth of the resulting proteomic analysis. The lectins selected for the M-LAC column have broad specificity for glycans typically present in cancer and showed no non-specific binding in the M-LAC platform: Concanavalin A (ConA), Artocarpus integrifolia lectin (Jacalin) and wheat germ agglutinin (WGA), which recognize high-mannose glycans; galactose and O-linked glycans; and N-acetylglucosamine and sialic acid glycans, respectively. As shown later (Table 2A), some proteins in the bound fraction are not known to be glycosylated (e.g. carboxypeptidases (CPA)); their presence has previously been attributed to either non-specific binding or binding to a glycosylated carrier protein12. In a pilot study we showed that the 1D SDS-PAGE separation step followed by in-gel trypsin digestion of 10 gel bands improved the depth of protein identification more than 2-fold compared to in-solution trypsin digestion of protein fractions with no prior protein- or peptide-level separation.

5.4.3 Summary of glycoproteome and non-glycoproteome data

To increase confidence in protein identification, we used a 5 ppm peptide mass tolerance and a false discovery rate (FDR) targeted at 1% during the MS/MS database search. On average, a total of 520 unique proteins were identified across the two sample sets, of which 230 belonged to mucinous subtypes and 290 to non-mucinous subtypes. A four-way Venn diagram was constructed to show the distribution of unique and shared proteins among the individual M-LAC fractions in each sample set (Figure 5.2).
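The counts in a four-way Venn diagram are obtained by assigning each protein to the exact combination of fractions in which it was identified. A minimal sketch follows; the protein identifiers are hypothetical placeholders rather than data from this study, and the function name is our own.

```python
from collections import Counter


def venn_regions(fractions):
    """Count proteins in each exclusive Venn region.

    fractions: dict mapping fraction name -> set of protein identifiers.
    Returns a Counter keyed by frozenset of fraction names (one key per region).
    """
    membership = {}
    for name, proteins in fractions.items():
        for protein in proteins:
            membership.setdefault(protein, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())


# Hypothetical identifiers, for illustration only.
fractions = {
    "MucFT":     {"P1", "P2", "P3", "P5"},
    "MucBD":     {"P2", "P3", "P4"},
    "Non-mucFT": {"P3", "P5", "P6"},
    "Non-mucBD": {"P3", "P4", "P7"},
}

regions = venn_regions(fractions)
for names, count in sorted(regions.items(), key=lambda item: (len(item[0]), sorted(item[0]))):
    print(" & ".join(sorted(names)), "->", count)
```

Because every protein lands in exactly one region, the region counts always sum to the number of unique proteins across all four fractions.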


Figure 5.2: Four-way Venn diagram showing the distribution of proteins identified in the unbound and M-LAC bound fractions of mucinous and non-mucinous subtypes after glycoproteomic analysis of sample set 1 and sample set 2. MucFT = mucinous unbound fraction, MucBD = mucinous M-LAC bound fraction, Non-mucFT = non-mucinous unbound fraction, Non-mucBD = non-mucinous M-LAC bound fraction.

In addition, the percent ratio of bound to unbound proteins in the mucinous and non-mucinous fractions (79.7 vs. 72.2 and 65.5 vs. 66.0, respectively) is similar in sample sets 1 and 2 (Table 5.2).

Table 5.2: Number of proteins identified in the unbound and M-LAC bound fractions after 1D SDS-PAGE LC-MS/MS glycoproteomic analysis

Sample set 1

| Pools | Unbound: spectral count = 1* | Unbound: spectral count ≥ 2 | Unbound: total proteins | Bound: spectral count = 1* | Bound: spectral count ≥ 2 | Bound: total proteins | Ratio (BD/UNBD) × 100 |
|---|---|---|---|---|---|---|---|
| Mucinous | 146 | 223 | 369 | 134 | 160 | 294 | 79.7 |
| Non-mucinous | 142 | 290 | 432 | 99 | 184 | 283 | 65.5 |

Sample set 2

| Pools | Unbound: spectral count = 1* | Unbound: spectral count ≥ 2 | Unbound: total proteins | Bound: spectral count = 1* | Bound: spectral count ≥ 2 | Bound: total proteins | Ratio (BD/UNBD) × 100 |
|---|---|---|---|---|---|---|---|
| Mucinous | 93 | 170 | 263 | 80 | 110 | 190 | 72.2 |
| Non-mucinous | 71 | 120 | 191 | 44 | 82 | 126 | 66 |

*Proteins with a spectral count equal to one were included in data analysis if the proteins were found with high abundance (spectral count > 10) in one sample set.
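The ratio column of Table 5.2 can be recomputed from the protein counts as a consistency check. The sketch below uses the counts from the table; the dictionary layout and function name are illustrative assumptions, not part of the original analysis workflow.

```python
# Protein counts from Table 5.2 as (spectral count = 1, spectral count >= 2).
TABLE_5_2 = {
    ("set 1", "mucinous"):     {"unbound": (146, 223), "bound": (134, 160)},
    ("set 1", "non-mucinous"): {"unbound": (142, 290), "bound": (99, 184)},
    ("set 2", "mucinous"):     {"unbound": (93, 170),  "bound": (80, 110)},
    ("set 2", "non-mucinous"): {"unbound": (71, 120),  "bound": (44, 82)},
}


def bound_to_unbound_percent(counts: dict) -> float:
    """Return (total bound proteins / total unbound proteins) x 100."""
    unbound_total = sum(counts["unbound"])
    bound_total = sum(counts["bound"])
    return round(100.0 * bound_total / unbound_total, 1)


if __name__ == "__main__":
    for (sample_set, pool), counts in TABLE_5_2.items():
        print(sample_set, pool, bound_to_unbound_percent(counts))
```

The recomputed values (79.7, 65.5, 72.2 and 66.0) agree with the ratios reported in the table.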

This observation is supported by the molecular function characterization obtained from PANTHER, a web-based gene ontology classification system23, which reports the protein abundance of each molecular function as a relative percentage. The corresponding unbound mucinous and non-mucinous fractions showed similar results. Thus, the molecular function profiles present a similar picture for mucinous and non-mucinous samples in both the M-LAC bound and unbound fractions. As expected, we observed variability in the number of identified proteins (427 unique proteins in analysis set 1 vs. 298 unique proteins in analysis set 2), which can be attributed to two possibilities: 1) different levels of albumin contamination; and 2) individual differences between sample set 1 and sample set 2 (Figure 5.3A & 5.3B).

Figure 5.3: 1D SDS-PAGE of the two sample sets used for glycoproteomic analysis. Variations in albumin levels and individual differences in each sample set account for the variability in proteins identified in sample set 1 compared with sample set 2.

Overall, we observed significant differences in protein levels between mucinous and non-mucinous fractions for lipid (fatty acid) metabolism proteins such as Glutathione S-transferase (see Table 5.3B); energy-associated proteins such as those involved in glucose metabolism and ATP synthesis, e.g. Pyruvate kinase isozymes M1/M2 (see Table 5.3A and 5.3B below); and stress-related proteins such as heat shock cognate 71 kDa protein, heat shock 70 kDa protein 1-like, and heat shock 70 kDa protein 6.


5.4.4 Quantitation of glycoproteins in the different analysis sets and selection of potential protein targets of interest

Based on the BCA total protein assay, equal amounts of the cyst fluid pools (200 µg) were depleted and separated on the M-LAC column. Protein recoveries were also determined using the BCA total protein assay; the total recovery after M-LAC fractionation was 84%-85% of the starting material. This result is consistent with earlier published work24. Proteins were quantitated for differential expression by spectral counts (the total number of MS/MS spectra collected for each protein), a label-free semi-quantitative method developed for shotgun proteomics, and proteins with ≥ 2 unique peptides were used in quantitation. In selected cases, spectral counts were confirmed by peak area measurements of extracted ion chromatograms (EIC), as previously described14, as well as by manual inspection of MS/MS spectra. Briefly, the mass spectrometry data were first normalized by the reference ratio calculated from the total spectral counts of the mucinous and non-mucinous M-LAC fractions for each protein. Relative protein abundance changes were based on the ratios of spectral counts of the mucinous and non-mucinous M-LAC bound and unbound fractions after normalization. The algorithms used for the reference ratio and protein abundance calculations were previously published13. Proteins with a spectral count ratio ≥ 5.0 or ≤ 0.3 were assigned as differentially expressed. In cases where no peptides (“0”) were observed for a protein, the count was set to “1” to allow meaningful ratio calculations.
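The ratio thresholds and the zero-count substitution described above can be sketched as follows. This is a simplified illustration only: the reference-ratio normalization step is deliberately omitted because its algorithm is given in the cited publication rather than here, and the function names are hypothetical. The example counts are the analysis set 1 values from Table 5.3A.

```python
def count_ratio(mucinous: int, non_mucinous: int) -> float:
    """Mucinous / non-mucinous spectral count ratio.

    A count of 0 (protein not identified) is replaced by 1 so that the
    ratio stays finite, as described in the text. Normalization by the
    reference ratio is NOT applied in this sketch.
    """
    m = mucinous if mucinous > 0 else 1
    n = non_mucinous if non_mucinous > 0 else 1
    return m / n


def classify(ratio: float, high: float = 5.0, low: float = 0.3) -> str:
    """Apply the differential-expression thresholds stated in the text."""
    if ratio >= high:
        return "higher in mucinous"
    if ratio <= low:
        return "lower in mucinous"
    return "not differential"


if __name__ == "__main__":
    # (mucinous, non-mucinous) spectral counts, analysis set 1, Table 5.3A.
    examples = {"CEL": (41, 3), "POSTN": (3, 20), "S100A12": (6, 0)}
    for gene, (muc, non_muc) in examples.items():
        ratio = count_ratio(muc, non_muc)
        print(gene, round(ratio, 2), classify(ratio))
```

Under these assumptions CEL and S100A12 fall above the upper threshold, while POSTN falls below the lower one, in line with the direction of change reported for these proteins.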

The development of a disease marker is preceded by a comprehensive discovery program, such as this study of pancreatic cyst fluid. In this type of study it is premature to discuss biomarker candidates, which will require a study of a larger number of individual patient samples.

Our goal is to determine ‘proteins of interest’ (Table 5.3A & 5.3B) for the M-LAC bound and unbound subproteomes based on four criteria: 1) high protein abundance (spectral count), 2) presence of the protein in both sample sets analyzed, 3) significance to pancreatic cancer and other related diseases, and 4) spectral count ratio changes (higher or lower protein levels), as grouped in Tables 5.3A and 5.3B.


Table 5.3A: Mucinous vs. non-mucinous proteins identified in M-LAC bound fraction with relative abundance changes (spectral counts).

Values for analysis set 1 and analysis set 2 are spectral counts (total peptide hits) in the respective fractions; the last two columns give the pancreatic disease association (b).

| Protein name (a) | Gene name (b) | Set 1 mucinous (c) | Set 1 non-mucinous (c) | Set 2 mucinous (c) | Set 2 non-mucinous (c) | Pancreatic cancer | Pancreatitis |
|---|---|---|---|---|---|---|---|
| Bile salt-activated lipase | CEL | 41 | 3 | 55 | 5 | yes (3.9) (d) | yes (18.2) |
| Carboxypeptidase A2 | CPA2 | 31 | 4 | 12 | 2 | yes (24.7) | yes (44.1) |
| Mucin-6 | MUC6 | 23 | 2 | 21 | 3 | yes (50.0) | yes (2.4) |
| Pancreatic triacylglycerol lipase | PNLIP | 21 | 3 | 13 | 2 | yes (1.0) | yes (49.0) |
| Pyruvate kinase isozymes M1/M2 | PKM | 14 | 2 | 12 | 1 (f) | yes (22.3) | yes (6.3) |
| Periostin | POSTN | 3 | 20 | 4 | 23 | yes (35.2) | yes (11.6) |
| Alpha-amylase 2B | AMY2B | 43 | 4 | 46 | 2 | no | yes (22.9) |
| Calcium-activated chloride channel regulator 1 | CLCA1 | 78 | 3 | 23 | 1 (f) | no | yes* |
| Carbonic anhydrase 1 | CA1 | 2 | ND | 16 | 2 | no | yes (17.3) |
| Pancreatic lipase-related protein 2 | PNLIPRP2 | 9 | 1 (f) | 17 | 1 (f) | no | yes (25.5) |
| Pancreatic alpha-amylase | AMY2A | 15 | 3 | 25 | 5 | no | yes (44.4) |
| Leucine-rich alpha-2-glycoprotein | LRG1 | 5 | 31 | 4 | 19 | no | yes (1.3) |
| Phosphoglycerate kinase 1 | PGK1 | 10 | 2 | 11 | 2 | no | no |
| Metalloproteinase inhibitor 1 | TIMP1 | 2 | 15 | 2 | 11 | no | no |
| Alpha-1-acid glycoprotein 2 | ORM2 | 37 | 5 | 21 | 3 | no | no |
| Interstitial collagenase | MMP1 | 36 | 6 | 42 | 3 | no | no |
| Fibronectin | FN1 | 63 | 2 | 77 | 2 | no | no |
| Tetranectin | CLEC3B | 2 | 19 | 2 | 14 | no | no |
| Vitamin D-binding protein | GC | 25 | 5 | 23 | 2 | no | no |
| Protein S100-A12 | S100A12 | 6 | ND | 2 | 11 | no | no |


Table 5.3B: Mucinous vs. non-mucinous proteins identified in unbound fraction with relative abundance changes (spectral counts).

Values for analysis set 1 and analysis set 2 are spectral counts (total peptide hits) in the respective fractions; the last two columns give the pancreatic disease association (b).

| Protein name (a) | Gene name (b) | Set 1 mucinous (c) | Set 1 non-mucinous (c) | Set 2 mucinous (c) | Set 2 non-mucinous (c) | Pancreatic cancer | Pancreatitis |
|---|---|---|---|---|---|---|---|
| Adenylyl cyclase-associated protein 1 | CAP1 | 14 | 2 | 11 | 2 | yes (8.5) (d) | yes (8.5) |
| Hexokinase-1 | HK1 | 57 | 4 | 49 | 5 | yes (7.3) | yes (7.3) |
| Isoform Short of Bile salt-activated lipase | CEL | 8 | 1 (f) | 11 | 2 | yes (3.9) | yes (18.2) |
| Mucin-2 | MUC2 | 35 | 5 | 55 | 7 | yes (49.0) | yes (10.6) |
| Pancreatic triacylglycerol lipase | PNLIP | 33 | 3 | 44 | 6 | yes (1.0) | yes (49.0) |
| Pyruvate kinase isozymes M1/M2 | PKM | 17 | 2 | 13 | 2 | yes (22.3) | yes (6.3) |
| Annexin A5 | ANXA5 | 4 | 43 | 1 (f) | 13 | yes (36.0) | yes (9.4) |
| Calcium-activated chloride channel regulator 1 | CLCA1 | 59 | 2 | 27 | 1 (f) | no | yes* |
| Kininogen-1 | KNG1 | 18 | 2 | 9 | ND | no | yes (3.7) |
| Glycine amidinotransferase, mitochondrial | GATM | 4 | 37 | 5 | 41 | no | yes (8.2) |
| Cadherin-17 | CDH17 | 36 | 6 | ND | 2 | no | yes (3.96) |
| Pancreatic lipase-related protein 1 | PNLIPRP1 | 17 | 1 (f) | 15 | ND | no | no |
| Glutathione S-transferase A1 | GSTA1 | 2 | 20 | 3 | 29 | no | no |
| Bifunctional purine biosynthesis protein PURH | ATIC | 31 | 3 | 32 | 4 | no | no |
| Puromycin-sensitive aminopeptidase | NPEPPS | 30 | 6 | 35 | 7 | no | no |
| 14-3-3 protein zeta/delta | YWHAZ | 5 | 31 | 7 | 37 | no | no |
| Vinculin | VCL | 5 | 45 | 2 | 21 | no | no |
| Heat shock cognate 71 kDa protein | HSPA8 | 49 | ND | 41 | ND | no | no |
| Leukotriene A-4 hydrolase | LTA4H | 41 | ND | 19 | 1 (f) | no | no |
| 14-3-3 protein epsilon | YWHAE | 3 | 19 | 2 | 11 | no | no |
| Aldo-keto reductase family 1 member B10 | AKR1B10 | ND | 25 | 1 (f) | 11 | no | no |
| Aspartate aminotransferase, mitochondrial | GOT2 | 29 | 3 | 25 | 3 | no | no |
| Annexin A10 | ANXA10 | 1 (f) | 8 | ND | 23 | no | no |
| Catalase | CAT | 13 | 2 | 24 | 2 | no | no |

(a) Protein names are from Swiss-Prot. (b) Gene names and pancreatic disease association information are from Genecards. (c) Relative expression levels based on spectral count. (d) Novoseek score (-log(P-Val)), based on literature mining information on the significance of disease to gene. (f) Where no peptide was identified, 1 replaced 0 for easier ratio calculations. * Genes without Novoseek score. Red box highlights proteins expressed at lower levels in mucinous fractions. ND: not identified.


Of particular interest are proteins that are mostly expressed at high levels in mucinous fractions, including pancreatic cancer related proteins (pancreatic lipases, mucin 2, calcium-activated chloride channel regulator 1, catalase, bile salt-activated lipase, carboxypeptidase, etc.) and energy metabolism associated proteins (hexokinase-1, phosphoglycerate kinase 1, pyruvate kinase isozymes M1/M2, etc.). Chaperone function proteins (heat shock family proteins) were also observed at higher levels (Table 5.3B). It is interesting to note that periostin, a low-abundance extracellular matrix protein implicated in pancreatic cancer and other cancers25-29, was found at lower levels in mucinous cyst fluid. As part of a pilot pre-validation study, POSTN was investigated by western blot (see below).

Some proteins in the M-LAC bound fraction are potentially not glycosylated (such as Alpha-amylase 2B, Carboxypeptidase A2, and Pyruvate kinase isozymes M1/M2), but may be associated with a glycoprotein due to their function (protein binding and metal ion binding) or may bind to the column through non-specific interactions with the lectins.

Unlike the molecular function classification of the total proteins identified in the bound and unbound M-LAC fractions, the same analysis of the list of ‘proteins of interest’ showed significant differences between the fractions for specific proteins (Figure 5.4). Proteins with antioxidant activity, for example catalase (CAT) and Glutathione S-transferase A1, were highly enriched in the unbound sub-proteome (Figure 5.4 (A)). On the other hand, proteins with receptor binding activity and proteins involved in structural functions were detected exclusively in the M-LAC bound glycoproteome (Figure 5.4 (B)).


Figure 5.4: Molecular functional characterizations of differentially expressed proteins in M-LAC fractionation. (A) Unbound fraction and (B) M-LAC bound fraction. PANTHER was used for classification. Percentage (%) = relative abundance of each molecular function. Red boxes = protein molecular function differentiating bound and unbound subproteomes.

The long and short isoforms of bile salt-activated lipase (CEL) were differentiated by our M-LAC column (Appendix G I & II and Appendix H (MS/MS for long isoform diagnostic peptide)), with higher levels observed in the M-LAC bound fraction of mucinous cyst fluid. CEL is a heavily O-glycosylated digestive enzyme30 implicated in diabetes and pancreatic exocrine dysfunction32. Previous studies of CEL31,32 show that its O-linked glycosylation sites are found in the C-terminal tail fragment, which binds to the Jacalin lectin, a constituent of our M-LAC column. It is possible that, owing to glycosylation changes, the CEL long isoform selectively binds to the M-LAC column while the short isoform flows through. Recent studies by Mann and colleagues reported significantly high levels of CEL in fucose-enriched samples, suggesting CEL as a potential glyco-biomarker target in pancreatic cyst fluid10. Our finding that M-LAC enriches the CEL long isoform in mucinous cyst fluid subtypes adds to this recent observation10; M-LAC may therefore contribute significantly to future structural and glycosylation alteration studies of CEL in pancreatic cyst fluid.


Novoseek, a data mining tool from the Genecards human gene database (Version 3.10), was used to investigate the relationship of the selected potential target proteins to pancreatic disease. The analysis revealed that the majority of our selected targets are involved in a variety of pancreatic diseases such as pancreatic cancer and pancreatitis (Appendix I), which is consistent with earlier reports33. Appendix I lists the specific pancreatic disease association of each target protein. Since we do not know which of these targets will be secreted into the blood, the next phase of this study will determine which of these disease-related targets are measurable in blood, as such observations will depend on the biology of the disease and the huge dynamic range of plasma.

5.4.5 Pathway and network interaction analysis of potential targets of interest

Recent studies34,35 have suggested that a perturbed module (pathway or biological process) is a better disease marker than one or more individual biomarkers and thus discovery studies should have such a focus. Furthermore, it has been shown that prioritizing cancer associated protein observations together with protein-protein interaction network information can add further discrimination36.

To determine the pathway of interest, we selected the pathway listed in Genecards with the greatest concentration of our ‘proteins of interest’, which was the pancreatic secretion pathway (KEGG, Figure 5.5). The relevance of this pathway was shown by the clustering of proteins involved in the pancreatic secretion process that were present in the M-LAC bound and/or unbound fractions (highlighted in red, blue and yellow in Figure 5.5).


Figure 5.5: Annotation of pancreatic secretion KEGG signaling pathway. The pathway was generated from http://www.genome.jp/kegg/pathway.html. Genes in the pathway are circled as follows; Blue: up-regulated potential target proteins in mucinous subtypes; Red: up-regulated proteins in mucinous subtypes with less fold change; Yellow: proteins identified in glycoproteomics with no change in relative abundance based on spectral count. a Proteins observed more in M-LAC bound fraction; b Proteins observed more in unbound fraction; c Proteins observed equally in both unbound and bound fractions.

Examples observed more in the M-LAC bound fraction include alpha-amylase 2B (AMY2B), bile salt-activated lipase (CEL), pancreatic triacylglycerol lipase (PNLIP), pancreatic lipase-related protein 2 (PNLIPRP2), and carboxypeptidases (CPA, CPB); proteins observed more in the unbound fraction include chymotrypsinogen B1 (CTRB1) and pancreatic lipase-related protein 1 (PNLIPRP1) (denoted (a) and (b), respectively). The higher levels of these proteins in mucinous cyst fluid (blue and red circles in Figure 5.5), their clustering, and the fact that some target proteins physically associate (CEL, PNLIP, and PNLIPRP2), as shown in the interactome data (Appendix J), indicate the significance of these identified proteins in pancreatic cyst biology.

5.4.6 Chromosome gene mapping analysis of potential targets of interest

It has been observed that protein coding genes whose products have related functions, such as tissue location, cellular compartment, common pathways or interactants, are more likely to be co-located in the same chromosomal region37,38. In such situations co-expression can be facilitated by mechanisms such as cis-activation or suppression (gene silencing) of specific chromosomal regions. In this context, the selected proteins of interest were submitted to Gene A La Cart (provided by www.genecards.com; uploaded for analysis in August 2011) to obtain genomic locations and Ensembl cytobands. It is of interest that some of the proteins identified in the M-LAC fractions with a protein level change in mucinous vs. non-mucinous samples are located in specific chromosomal regions (Table 5.4), e.g. chromosome 1, bands p21 and p36 (amylase and elastase), and chromosome 10, bands q25 and q26 (lipases). Also, catalase (CAT) is located in the same genomic region as the important cancer associated genes MUC2 and MUC6 (chromosome 11, bands p13 and p15). Further, several of the enzyme groups, such as PNLIPRP1 and PNLIP, are co-located in the same chromosome region and are potentially co-expressed in cancer, which is consistent with other studies such as those of the ERBB2 amplicon39,40.

Table 5.4: Chromosome gene analysis of some ‘proteins of interest’

| Protein name | Gene name | Chromosome # | Cytogenetic band |
|---|---|---|---|
| Pancreatic alpha-amylase | AMY2A↑ | 1 | 1p21.1 |
| Alpha-amylase 2B | AMY2B↑ | 1 | 1p21.1 |
| Calcium-activated chloride channel regulator 1 | CLCA1↑ | 1 | 1p22.3 |
| Heat shock 70 kDa protein 6 | HSPA6 | 1 | 1q23.3 |
| Isoform 2 of Adenylyl cyclase-associated protein 1 | CAP1 | 1 | 1p34.2 |
| Chymotrypsin-like elastase family member 3B | CELA3B | 1 | 1p36.12 |
| Chymotrypsin-like elastase family member 3A | CELA3A | 1 | 1p36.12 |
| Pancreatic lipase-related protein 1 | PNLIPRP1↑ | 10 | 10q25.3 |
| Pancreatic triacylglycerol lipase | PNLIP↑ | 10 | 10q26.1 |
| Hexokinase-1 | HK1↑ | 10 | 10q22.1 |
| Catalase | CAT↑ | 11 | 11p13 |
| Mucin-2 | MUC2↑ | 11 | 11p15.5 |
| Mucin-6 | MUC6↑ | 11 | 11p15.5 |
| Cellular tumor antigen p53 | P53 | 17 | 17p13.1 |
| Pigment epithelium-derived factor | SERPINF1↓ | 17 | 17p13.3 |
| 14-3-3 protein epsilon | YWHAE↓ | 17 | 17p13.3 |
| Puromycin-sensitive aminopeptidase | NPEPPS↓ | 17 | 17q21.32 |

Green highlights = co-expressed genes observed in the pancreatic secretion pathway.
Red highlights = co-located genes with oncogene p53 (yellow highlight).
Arrows denote relative protein expression levels in mucinous vs. non-mucinous fractions: ↑ (higher levels in mucinous), ↓ (lower levels in mucinous).
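The co-location reading of Table 5.4 can be sketched by grouping genes that share a chromosome arm and band. The cytobands below are taken from the table; the regular expression, the function names and the choice to group at the level of the band (ignoring sub-bands such as the ".1" in 1p21.1) are assumptions made for illustration, not the Gene A La Cart procedure.

```python
import re
from collections import defaultdict

# Cytogenetic bands from Table 5.4 (gene symbol -> band).
TABLE_5_4 = {
    "AMY2A": "1p21.1", "AMY2B": "1p21.1", "CLCA1": "1p22.3",
    "HSPA6": "1q23.3", "CAP1": "1p34.2", "CELA3B": "1p36.12",
    "CELA3A": "1p36.12", "PNLIPRP1": "10q25.3", "PNLIP": "10q26.1",
    "HK1": "10q22.1", "CAT": "11p13", "MUC2": "11p15.5",
    "MUC6": "11p15.5", "P53": "17p13.1", "SERPINF1": "17p13.3",
    "YWHAE": "17p13.3", "NPEPPS": "17q21.32",
}

# chromosome, arm (p or q), band, optional sub-band after the dot
CYTOBAND = re.compile(r"^(\d+|X|Y)([pq])(\d+)(?:\.(\d+))?$")


def band_key(cytoband: str) -> str:
    """Reduce a cytoband such as '17p13.3' to its band-level key '17p13'."""
    match = CYTOBAND.match(cytoband)
    if match is None:
        raise ValueError(f"unrecognized cytoband: {cytoband!r}")
    chromosome, arm, band, _sub_band = match.groups()
    return f"{chromosome}{arm}{band}"


def co_located(genes: dict) -> dict:
    """Return band-level groups that contain more than one gene."""
    groups = defaultdict(list)
    for gene, cytoband in genes.items():
        groups[band_key(cytoband)].append(gene)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    for key, members in co_located(TABLE_5_4).items():
        print(key, members)
```

With band-level grouping, the amylases (1p21), elastases (1p36) and mucins (11p15) each form a group, and SERPINF1 and YWHAE group with P53 at 17p13. Neighbouring bands such as 10q25 and 10q26 are not merged by this simple key, so a distance-based criterion would be needed to capture that kind of proximity.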

It is interesting to note that the SERPINF1, YWHAE and NPEPPS genes (whose proteins are involved in inhibition of angiogenesis, various signaling pathways, and proteolytic events of cell growth, respectively) are located on chromosome 17 (bands p13 and q21), the chromosome that carries the important apoptotic gene p53 (band p13). Future studies will explore the potential role of co-expression of these genes in the development of pancreatic cancer, since this is the first report of such observations.

5.4.7 Validation of Periostin

As part of a pilot pre-validation of the differentially expressed target proteins identified, we analyzed periostin (POSTN) by western blot using six samples: three mucinous and three non-mucinous pancreatic cyst fluid subtypes. POSTN was chosen for investigation from the protein target list because of its potential significance in pancreatic cancer progression and other related diseases41, as well as its overexpression in breast cancer42, which is in contrast to our observations. POSTN was immunoprecipitated using an anti-periostin antibody, and the blot was detected with an anti-periostin antibody measuring total POSTN protein levels at a molecular weight of 89 kDa. Significantly lower POSTN levels were found in the mucinous cyst subtypes, intraductal papillary mucinous neoplasm (IPMN) and mucinous cystic neoplasm (MCN), than in the non-mucinous cyst subtype, serous cystadenoma (Figure 5.6A). After densitometry quantification, an 8-fold higher relative level of POSTN was observed in non-mucinous cyst fluid (Figure 5.6B), in agreement with the spectral count observations for the cyst pools.

Figure 5.6: Pre-validation of Periostin as a potential biomarker target through SDS-PAGE western blot analysis. (A) Six pancreatic cyst fluid samples were subjected to western blotting to validate the relative abundance of Periostin in mucinous and non-mucinous cyst fluid subtypes. (B) Densitometry quantitation of Periostin levels. IPMN= intraductal papillary mucinous neoplasm.

Periostin (POSTN) was previously reported to be a potential biomarker target in pancreatic cancer43; however, its role in pancreatic cysts is not known. Although POSTN is known to be glycosylated at one N-linked site, its glycosylation patterns in pancreatic cancer have not been studied. Future studies are required to understand cancer related glycosylation changes of periostin and its overall role in pancreatic cyst fluid.

5.5 CONCLUSION

We have demonstrated that the high performance multi-lectin affinity chromatography (HP-MLAC) platform successfully allows the enrichment and characterization of glycoproteins that are present at different levels in mucinous and non-mucinous cyst fluid subtypes. Of particular interest is the observation of increased amounts in mucinous vs. non-mucinous cyst fluid of proteins with strong cancer associations, such as mucin 2 (MUC2), mucin 6 (MUC6), carboxypeptidase A2 (CPA2), and hexokinase-1 (HK1), of proteins with energy metabolism functions, and of various pancreatic enzymes. The significance of the identified proteins was further shown by pathway analysis, interaction and chromosomal location investigations. In addition, the bile salt-activated lipase (CEL) long isoform was significantly enriched in M-LAC fractions and differentially bound to the M-LAC column, indicating possible glycosylation changes in mucinous cyst fluid. Since these observations are based on cyst fluid pools from 20 individuals (10 mucinous cyst subtypes and 10 non-mucinous cyst subtypes), we believe that the true picture will be confirmed by analyzing the individual samples that constitute the pools. In this discovery study, glycoproteomics was used to explore differentially expressed proteins that differentiate mucinous from non-mucinous cyst fluid. In future studies we plan to 1) investigate selected ‘proteins of interest’ using an antibody-lectin sandwich microarray platform9,44 in a larger cohort; and 2) potentially measure ‘proteins of interest’ in a readily available diagnostic fluid, i.e. plasma.


5.6 REFERENCES

1. Cleary, S. P.; Gryfe, R.; Guindi, M.; Greig, P.; Smith, L.; Mackenzie, R.; Strasberg, S.; Hanna, S.; Taylor, B.; Langer, B.; Gallinger, S., Prognostic factors in resected pancreatic adenocarcinoma: analysis of actual 5-year survivors. J Am Coll Surg 2004, 198 (5), 722-31.

2. Snow, P., Pancreatic Cyst Fluid Analysis–A Review. J Gastrointestin Liver Dis 2011, 20 (2), 175-180.

3. Cuoghi, A.; Farina, A.; Z’graggen, K.; Dumonceau, J.-M.; Tomasi, A.; Hochstrasser, D. F.; Genevay, M.; Lescuyer, P.; Frossard, J.-L., Role of Proteomics to Differentiate between Benign and Potentially Malignant Pancreatic Cysts. Journal of Proteome Research 2011, 10 (5), 2664-2670.

4. Kwon, R. S.; Simeone, D. M., The Use of Protein-Based Biomarkers for the Diagnosis of Cystic Tumors of the Pancreas. International Journal of Proteomics 2011, 2011, 1-9.

5. Hutchins, G. F.; Draganov, P. V., Cystic neoplasms of the pancreas: a diagnostic challenge. World J Gastroenterol 2009, 15 (1), 48-54.

6. Jeurnink, S. M.; Vleggaar, F. P.; Siersema, P. D., Overview of the clinical problem: facts and current issues of mucinous cystic neoplasms of the pancreas. Dig Liver Dis 2008, 40 (11), 837-46.

7. Matthaei, H.; Schulick, R. D.; Hruban, R. H.; Maitra, A., Cystic precursors to invasive pancreatic cancer. Nat Rev Gastroenterol Hepatol 2011, 8 (3), 141-50.

8. Hara, T.; Kawashima, H.; Ishigooka, M.; Kashiyama, M.; Takanashi, S.; Yamazaki, S.; Hosokawa, Y., Mucinous cystic tumors of the pancreas. Surg Today 2002, 32 (11), 965-9.

9. Haab, B. B.; Porter, A.; Yue, T.; Li, L.; Scheiman, J.; Anderson, M. A.; Barnes, D.; Schmidt, C. M.; Feng, Z.; Simeone, D. M., Glycosylation Variants of Mucins and CEACAMs As Candidate Biomarkers for the Diagnosis of Pancreatic Cystic Neoplasms. Annals of Surgery 2010, 251 (5), 937-945.

10. Mann, B. F.; Goetz, J. A.; House, M. G.; Schmidt, C. M.; Novotny, M. V., Glycomic and proteomic profiling of pancreatic cyst fluids identifies hyperfucosylated lactosamines on the N-linked glycans of overexpressed glycoproteins. Mol Cell Proteomics 2012, 11 (7), M111 015792.

11. Cao, Z.; Maupin, K.; Curnutte, B.; Fallon, B.; Feasley, C. L.; Brouhard, E.; Kwon, R.; West, C. M.; Cunningham, J.; Brand, R.; Castelli, P.; Crippa, S.; Feng, Z.; Allen, P.; Simeone, D. M.; Haab, B. B., Specific glycoforms of MUC5AC and endorepellin accurately distinguish mucinous from non-mucinous pancreatic cysts. Mol Cell Proteomics 2013.

12. Kullolli, M.; Hancock, W. S.; Hincapie, M., Automated platform for fractionation of human plasma glycoproteome in clinical proteomics. Anal Chem 2010, 82 (1), 115-20.


13. Zeng, Z.; Hincapie, M.; Haab, B. B.; Hanash, S.; Pitteri, S. J.; Kluck, S.; Hogan, J. M.; Kennedy, J.; Hancock, W. S., The development of an integrated platform to identify breast cancer glycoproteome changes in human serum. J Chromatogr A 2010, 1217 (19), 3307-15.

14. Plavina, T.; Wakshull, E.; Hancock, W. S.; Hincapie, M., Combination of abundant protein depletion and multi-lectin affinity chromatography (M-LAC) for plasma protein biomarker discovery. J Proteome Res 2007, 6 (2), 662-71.

15. Shevchenko, A.; Tomas, H.; Havlis, J.; Olsen, J. V.; Mann, M., In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat Protoc 2006, 1 (6), 2856-60.

16. Nesvizhskii, A. I., A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. Journal of Proteomics 2010, 73 (11), 2092-123.

17. Kapp, E. A.; Schutz, F.; Connolly, L. M.; Chakel, J. A.; Meza, J. E.; Miller, C. A.; Fenyo, D.; Eng, J. K.; Adkins, J. N.; Omenn, G. S.; Simpson, R. J., An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: sensitivity and specificity analysis. Proteomics 2005, 5 (13), 3475-90.

18. Yang, Z.; Hancock, W. S., Approach to the comprehensive analysis of glycoproteins isolated from human serum using a multi-lectin affinity column. Journal of chromatography. A 2004, 1053 (1-2), 79-88.

19. Yang, Z.; Hancock, W. S., Monitoring glycosylation pattern changes of glycoproteins using multi-lectin affinity chromatography. J Chromatogr A 2005, 1070 (1-2), 57-64.

20. Kyselova, Z.; Mechref, Y.; Kang, P.; Goetz, J. A.; Dobrolecki, L. E.; Sledge, G. W.; Schnaper, L.; Hickey, R. J.; Malkas, L. H.; Novotny, M. V., Breast cancer diagnosis and prognosis through quantitative measurements of serum glycan profiles. Clin Chem 2008, 54 (7), 1166-75.

21. Abd Hamid, U. M.; Royle, L.; Saldova, R.; Radcliffe, C. M.; Harvey, D. J.; Storr, S. J.; Pardo, M.; Antrobus, R.; Chapman, C. J.; Zitzmann, N.; Robertson, J. F.; Dwek, R. A.; Rudd, P. M., A strategy to reveal potential glycan markers from serum glycoproteins associated with breast cancer progression. Glycobiology 2008, 18 (12), 1105-18.

22. Batruch, I.; Lecker, I.; Kagedan, D.; Smith, C. R.; Mullen, B. J.; Grober, E.; Lo, K. C.; Diamandis, E. P.; Jarvi, K. A., Proteomic analysis of seminal plasma from normal volunteers and post-vasectomy patients identifies over 2000 proteins and candidate biomarkers of the urogenital system. J Proteome Res 2011, 10 (3), 941-53.

23. Mi, H.; Muruganujan, A.; Thomas, P. D., PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucleic Acids Res 2013, 41 (Database issue), D377-86.


24. Kullolli, M.; Hancock, W. S.; Hincapie, M., Preparation of a high-performance multi-lectin affinity chromatography (HP-M-LAC) adsorbent for the analysis of human plasma glycoproteins. J Sep Sci 2008, 31 (14), 2733-9.

25. Kanno, A.; Satoh, K.; Masamune, A.; Hirota, M.; Kimura, K.; Umino, J.; Hamada, S.; Satoh, A.; Egawa, S.; Motoi, F.; Unno, M.; Shimosegawa, T., Periostin, secreted from stromal cells, has biphasic effect on cell migration and correlates with the epithelial to mesenchymal transition of human pancreatic cancer cells. Int J Cancer 2008, 122 (12), 2707-18.

26. Baril, P.; Gangeswaran, R.; Mahon, P. C.; Caulee, K.; Kocher, H. M.; Harada, T.; Zhu, M.; Kalthoff, H.; Crnogorac-Jurcevic, T.; Lemoine, N. R., Periostin promotes invasiveness and resistance of pancreatic cancer cells to hypoxia-induced cell death: role of the beta4 integrin and the PI3k pathway. Oncogene 2007, 26 (14), 2082-94.

27. Kudo, Y.; Siriwardena, B. S.; Hatano, H.; Ogawa, I.; Takata, T., Periostin: novel diagnostic and therapeutic target for cancer. Histol Histopathol 2007, 22 (10), 1167-74.

28. Zhang, Y.; Zhang, G.; Li, J.; Tao, Q.; Tang, W., The expression analysis of periostin in human breast cancer. J Surg Res 2010, 160 (1), 102-6.

29. Ouyang, G.; Liu, M.; Ruan, K.; Song, G.; Mao, Y.; Bao, S., Upregulated expression of periostin by hypoxia in non-small-cell lung cancer cells promotes cell survival via the Akt/PKB pathway. Cancer Lett 2009, 281 (2), 213-9.

30. Hui, D. Y.; Hayakawa, K.; Oizumi, J., Lipoamidase activity in normal and mutagenized pancreatic cholesterol esterase (bile salt-stimulated lipase). Biochem J 1993, 291 ( Pt 1), 65-9.

31. Raeder, H.; Johansson, S.; Holm, P. I.; Haldorsen, I. S.; Mas, E.; Sbarra, V.; Nermoen, I.; Eide, S. A.; Grevle, L.; Bjorkhaug, L.; Sagen, J. V.; Aksnes, L.; Sovik, O.; Lombardo, D.; Molven, A.; Njolstad, P. R., Mutations in the CEL VNTR cause a syndrome of diabetes and pancreatic exocrine dysfunction. Nat Genet 2006, 38 (1), 54-62.

32. Mechref, Y.; Chen, P.; Novotny, M. V., Structural characterization of the N-linked oligosaccharides in bile salt-stimulated lipase originated from human breast milk. Glycobiology 1999, 9 (3), 227-34.

33. Ke, E.; Patel, B. B.; Liu, T.; Li, X. M.; Haluszka, O.; Hoffman, J. P.; Ehya, H.; Young, N. A.; Watson, J. C.; Weinberg, D. S.; Nguyen, M. T.; Cohen, S. J.; Meropol, N. J.; Litwin, S.; Tokar, J. L.; Yeung, A. T., Proteomic analyses of pancreatic cyst fluids. Pancreas 2009, 38 (2), e33-42.

34. Chuang, H. Y.; Rassenti, L.; Salcedo, M.; Licon, K.; Kohlmann, A.; Haferlach, T.; Foa, R.; Ideker, T.; Kipps, T. J., Subnetwork-based analysis of chronic lymphocytic leukemia identifies pathways that associate with disease progression. Blood 2012, 120 (13), 2639-49.


35. Lee, E.; Chuang, H. Y.; Kim, J. W.; Ideker, T.; Lee, D., Inferring pathway activity toward precise disease classification. PLoS Comput Biol 2008, 4 (11), e1000217.

36. Lessner, D. J.; et al., An unconventional pathway for reduction of CO2 to methane in CO-grown Methanosarcina acetivorans revealed by proteomics. Proc Natl Acad Sci U S A 2006, 103 (47), 17921-6.

37. Zhang, E. Y.; Cristofanilli, M.; Robertson, F.; Reuben, J. M.; Mu, Z.; Beavis, R. C.; Im, H.; Snyder, M.; Hofree, M.; Ideker, T.; Omenn, G. S.; Fanayan, S.; Jeong, S. K.; Paik, Y. K.; Zhang, A. F.; Wu, S. L.; Hancock, W. S., Genome Wide Proteomics of ERBB2 and EGFR and Other Oncogenic Pathways in Inflammatory Breast Cancer. J Proteome Res 2013, 12 (6), 2805-17.

38. Wu, C.; Zhu, J.; Zhang, X., Integrating gene expression and protein-protein interaction network to prioritize cancer-associated genes. BMC Bioinformatics 2012, 13, 182.

39. Kauraniemi, P.; Barlund, M.; Monni, O.; Kallioniemi, A., New amplified and highly expressed genes discovered in the ERBB2 amplicon in breast cancer by cDNA microarrays. Cancer Res 2001, 61 (22), 8235-40.

40. Greenbaum, D.; Colangelo, C.; Williams, K.; Gerstein, M., Comparing protein abundance and mRNA expression levels on a genomic scale. Genome Biol 2003, 4 (9), 117.

41. Ke, E.; Patel, B. B.; Liu, T.; Li, X.-M.; Haluszka, O.; Hoffman, J. P.; Ehya, H.; Young, N. A.; Watson, J. C.; Weinberg, D. S.; Nguyen, M. T.; Cohen, S. J.; Meropol, N. J.; Litwin, S.; Tokar, J. L.; Yeung, A. T., Proteomic Analyses of Pancreatic Cyst Fluids. Pancreas 2009, 38 (2), e33-e42.

42. Abbott, K. L.; Aoki, K.; Lim, J. M.; Porterfield, M.; Johnson, R.; O'Regan, R. M.; Wells, L.; Tiemeyer, M.; Pierce, M., Targeted glycoproteomic identification of biomarkers for human breast carcinoma. J Proteome Res 2008, 7 (4), 1470-80.

43. Kosanam, H.; Makawita, S.; Judd, B.; Newman, A.; Diamandis, E. P., Mining the malignant ascites proteome for pancreatic cancer biomarkers. Proteomics 2011, 11 (23), 4551-8.

44. Orchekowski, R.; Hamelinck, D.; Li, L.; Gliwa, E.; vanBrocklin, M.; Marrero, J. A.; Vande Woude, G. F.; Feng, Z.; Brand, R.; Haab, B. B., Antibody microarray profiling reveals individual and combined serum proteins associated with pancreatic cancer. Cancer Res 2005, 65 (23), 11193-202.

CHAPTER 6

SUMMARY AND FUTURE DIRECTIONS

Although there has been tremendous progress in cancer diagnosis and prognosis, cancer remains the main cause of disease-related deaths, with an estimated 12 million deaths by 2030. Current methodologies or tests used in the clinic are not satisfactory because of their invasiveness and complexity, which makes the development of alternative procedures attractive. In addition, the discovery and availability of newer, more sensitive or more specific markers, used either on their own or in combination with current markers, is highly desired to improve diagnostic or prognostic accuracy. In this thesis, we developed a multi-dimensional glycoproteomics platform for the characterization of cancer samples and identified several potential candidate markers.

The glycoproteome of cancer samples was studied to identify potential biomarker candidates using off-line depletion and lectin-based fractionation followed by on-line nano-LC-MS/MS analysis. The clinical samples involved in this project were pancreatic cyst fluid and clear cell renal cell carcinoma (ccRCC) plasma samples. To improve the glycoproteome depth of the biological samples, a multi-dimensional platform that combined abundant protein depletion and lectin enrichment was developed, optimized, and validated before application to the clinical samples. The goal of the depletion process was to improve the identification and quantitation of low-level proteins that are difficult to detect because of the masking effects of high-abundance proteins. The lectin enrichment column separated the glycoproteome based on carbohydrate affinity for selected lectins in order to capture glycoprotein sub-populations.

Glycoproteins and non-glycoproteins identified after nano-LC-MS/MS peptide sequencing were mapped to investigate differences in protein abundance as well as in glycosylation profile between disease and healthy (control) clinical samples. Proteins with significant (p value < 0.05) differential changes were selected, and their biological relevance in terms of disease associations and interactions was established. Further, a glycoprotein of interest identified during the glycoproteomics investigation was comprehensively characterized to establish its structural composition and significance.

In the ccRCC glycoproteomics analysis of plasma samples obtained from patients before and after surgery, a total of 39 proteins with significant abundance changes were quantified, and 13 glycoproteins were identified with unique glycosylation patterns that distinguished the before-surgery from the after-surgery samples. Among these proteins was the glycoprotein clusterin, whose high amounts in ccRCC plasma are linked with the loss of the pVHL tumor suppressor gene, the gene responsible for protecting ccRCC cells from cancer growth. Clusterin was therefore isolated from ccRCC plasma and characterized using nano-LC-MS/MS to explore its structure and its utility as a potential candidate biomarker. Two glycan isoforms at N-glycosite N-374, a bi-antennary digalactosylated disialylated (A2G2S2) glycan and a core-fucosylated bi-antennary digalactosylated disialylated (FA2G2S2) glycan, were found to uniquely differentiate ccRCC plasma samples before and after surgery. A lectin blotting assay was employed as a validation platform to confirm aberrant glycosylation as a function of the presence of ccRCC. These results support the utility of clusterin glycans as potential markers of ccRCC; however, the specificity of clusterin glycans to ccRCC relative to other diseases has yet to be established.

In a similar glycoproteomics analysis, pancreatic cyst fluid samples were comprehensively characterized to examine the glycoproteome profiles of pooled samples from mucinous (disease) and non-mucinous (benign) cysts. Complementary proteomic tools such as gene ontology classification, pathway and network analysis, and chromosome gene mapping were applied to evaluate differentially expressed proteins. Our results revealed a strong relationship between pancreatic cancer and selected candidate biomarkers. Furthermore, six samples were independently selected, and a western blot assay was performed to validate differentially expressed proteins. Western blot analysis showed that periostin was significantly increased in non-mucinous cyst fluid. Together, these results further confirm the potential role of cancer-specific glycoproteins and non-glycoproteins in glyco-biomarker discovery studies.

In the future, we plan to follow up by confirming the observed protein abundance and glycan alterations in individual ccRCC and pancreatic cyst fluid patient samples. The observed changes will be further validated in a larger cohort using clinical assays such as ELISA, MRM, and antibody-lectin sandwich microarrays.

APPENDIX A

A: Peak area measurements of serum albumin used to evaluate column loading capacity, sample run-to-run carryover, and non-specific binding. The average of three replicates was used to generate the bar graphs.

APPENDIX B

B: Stability analysis of the 12P-M-LAC platform. The 12P bound fraction was used to assess column lifetime based on the spectral counts of two analytical replicates. Each point in the graph represents an identified protein; the x axis shows spectral counts per protein in analytical replicate 1 and the y axis shows spectral counts per protein in analytical replicate 2. The Pearson correlation coefficient values (R2) indicate consistent reproducibility of the 12P depletion column over runs (time period), with an average R2 of 0.9818.
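
The replicate comparison described in this caption can be illustrated with a minimal sketch. It assumes that the reported R2 is the square of the Pearson correlation coefficient computed on paired spectral counts; the thesis does not state the software or the exact procedure used, and the spectral counts below are hypothetical values chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical spectral counts per protein (not data from the thesis)
replicate_1 = [120, 85, 40, 22, 9, 5]
replicate_2 = [115, 90, 38, 25, 10, 4]

r = pearson_r(replicate_1, replicate_2)
r_squared = r ** 2
print(f"r = {r:.4f}, R2 = {r_squared:.4f}")
```

An R2 close to 1 means that proteins with high spectral counts in one replicate also have high counts in the other, which is the sense in which the caption reads the value as an indicator of column reproducibility.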

APPENDIX C

C: Details of the number of total peptides and proteins identified from two analytical replicates

Proteomics (12P depleted plasma)

Sample            Total number of peptides*    Total number of proteins
Disease (+RCC)    4063                         159
Control (-RCC)    3112                         151

Glycoproteomics (12P-M-LAC plasma)

                  M-LAC unbound fraction                M-LAC bound fraction
Sample            Total peptides*    Total proteins     Total peptides*    Total proteins
Disease (+RCC)    3221               147                1598               86
Control (-RCC)    2404               125                1437               88

*Average spectral count of two analytical replicates; only proteins with ≥ 2 unique peptides are included in the data.
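
The footnote rule can be sketched as follows. This is one possible reading of the footnote, not the documented procedure: proteins with fewer than two unique peptides are discarded, and the spectral counts of the remaining proteins are averaged over the two analytical replicates before being summed. The `ProteinHit` record, the `summarize` helper, and all values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ProteinHit:
    """Hypothetical record for one identified protein."""
    name: str
    unique_peptides: int
    spectral_counts: Tuple[int, int]  # (replicate 1, replicate 2)


def summarize(hits: List[ProteinHit], min_unique_peptides: int = 2):
    """Return (number of proteins kept, summed replicate-averaged spectral count)."""
    kept = [h for h in hits if h.unique_peptides >= min_unique_peptides]
    total_average = sum(sum(h.spectral_counts) / len(h.spectral_counts) for h in kept)
    return len(kept), total_average


# Hypothetical identifications (not data from the thesis)
hits = [
    ProteinHit("protein A", unique_peptides=3, spectral_counts=(10, 12)),
    ProteinHit("protein B", unique_peptides=1, spectral_counts=(4, 5)),  # filtered out
    ProteinHit("protein C", unique_peptides=2, spectral_counts=(6, 8)),
]

n_proteins, total_peptides = summarize(hits)
print(n_proteins, total_peptides)
```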

APPENDIX D

D: Representative averaged MS spectra of the M-LAC bound (BD) and unbound (FT) fractions of ccRCC plasma before (+RCC) and after (-RCC) surgery. Glycans eluted in a specific time window (~35-60 min) in the m/z range 650-1600. Yellow circle: galactose; blue square: N-acetylglucosamine; green circle: mannose; purple diamond: sialic acid; red triangle: fucose; Core = (GlcNAc)2(Man)3

APPENDIX E

E: Heat map of the relative abundances of RCC (+) and RCC (-) glycans, averaged over analytical replicates (n=3). Hierarchical clustering was used to arrange the 12P-M-LAC fractions (bound and unbound) and the glycans identified (rows). The clustering tree obtained after hierarchical clustering is shown on the left-hand side. The color key is ordered from -1 (low enriched glycans) to 1.5 (highly enriched glycans).

APPENDIX F

F: A representation of the glycan site occupancy calculation for N-linked site 374. NL: normalized level.

APPENDIX G

GI: Identified peptides for the bile salt-activated lipase (CEL) long isoform in the M-LAC bound subproteome. Protein sequence coverage is 16.5%.

Sequence                   Charge   MH+ [Da]    Intensity   RT [min]
ALENPQPHPGWQGTLK           3        1772.909    6.82E+06    44.14
DQHMAIAWVK                 3        1214.598    1.59E+06    43.39
GIPFAAPTK                  1        901.514     6.70E+06    45.44
KLGLLGDSVDIFK              3        1404.809    4.19E+05    63.46
LGAVYTEGGFVEGVNKK          2        1767.927    1.94E+06    48.20
NPLFWAK                    2        875.478     6.99E+06    54.32
TVVDFETDVLFLVPTEIALAQHR    3        2613.397    2.59E+05    89.59
TYAYLFSHPSR                3        1341.660    2.68E+06    48.32
VGCPVGDAAR                 2        1001.482    8.43E+06    26.56
VTEEDFYK                   2        1030.472    2.22E+06    39.50
AISQSGVALSPWVIQK           2        1683.945    5.78E+05    61.88
LGLLGDSVDIFK               2        1276.716    2.04E+06    70.57
VGPLGFLSTGDANLPGNYGLR      2        2118.103    1.11E+05    70.91

In the original table, a blue highlight marks the diagnostic peptide sequence unique to the bile salt-activated lipase long isoform; Appendix H shows the MS/MS fragmentation of the diagnostic peptide TYAYLFSHPSR.

GII: Identified peptides for the bile salt-activated lipase (CEL) short isoform in the unbound subproteome. Protein sequence coverage is 13.97%.

Sequence                   Charge   MH+ [Da]    Intensity   RT [min]
AISQSGVALSPWVIQK           2        1683.943    2.60E+06    60.54
ALTLAYK                    1        779.466     2.66E+06    45.25
LGLLGDSVDIFK               2        1276.715    4.94E+06    69.1
NPLFWAK                    2        875.477     2.00E+06    55.85
TVVDFETDVLFLVPTEIALAQHR    3        2613.392    7.20E+05    88.57
VGCPVGDAAR                 2        1001.483    3.19E+06    31.79
VGPLGFLSTGDANLPGNYGLR      2        2118.099    1.94E+06    70.58

APPENDIX H

H: MS/MS fragmentation of the diagnostic peptide TYAYLFSHPSR of the CEL long isoform. Sequence coverage: 13.95%; charge: +2; monoisotopic m/z: 671.333 Da (accuracy: +0.05 mmu/+0.08 ppm); ion score: 44.
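
As a consistency check on the values quoted in this caption, the theoretical monoisotopic m/z of an unmodified peptide can be computed from its sequence. The sketch below is illustrative only and is not the search-engine calculation used in the thesis; it uses standard monoisotopic residue masses, and the function names are hypothetical. Peptides that carry modifications (for example an alkylated cysteine) would need the modification mass added, which this sketch does not do.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the peptide termini
PROTON = 1.007276   # mass of one proton


def monoisotopic_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def theoretical_mz(sequence: str, charge: int) -> float:
    """m/z of the peptide ion carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (monoisotopic_mass(sequence) + charge * PROTON) / charge


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


peptide = "TYAYLFSHPSR"
mh_plus = theoretical_mz(peptide, 1)   # singly protonated mass, MH+
mz_2plus = theoretical_mz(peptide, 2)  # doubly charged ion
print(f"MH+ = {mh_plus:.3f} Da, m/z (2+) = {mz_2plus:.3f}")
```

Under these assumptions, the doubly charged ion of TYAYLFSHPSR agrees with the m/z of 671.333 given in the caption to within rounding, and the singly protonated mass is close to the MH+ value listed for this peptide in Appendix G.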

APPENDIX I

I: Novoseek disease-relationship data linking the selected protein target list to pancreatic cancer and related diseases

Protein Name                                                          Gene Name   Disease association
14-3-3 protein epsilon                                                YWHAE       -
14-3-3 protein zeta/delta                                             YWHAZ       -
Adenylyl cyclase-associated protein 1                                 CAP1        Pancreatic Cancer
Adenylyl cyclase-associated protein 1                                 CAP1        pancreatitis
Aldo-keto reductase family 1 member B10                               AKR1B10     -
Alpha-1-acid glycoprotein 2                                           ORM2        -
Alpha-2-HS-glycoprotein                                               AHSG        -
Alpha-amylase 2B                                                      AMY2B       pancreatitis
Annexin A10                                                           ANXA10      -
Annexin A5                                                            ANXA5       pancreatitis
Annexin A5                                                            ANXA5       Pancreatic Ductal Adenocarcinoma
Annexin A5                                                            ANXA5       pancreatic carcinoma
Annexin A5                                                            ANXA5       pancreatic cancer
Aspartate aminotransferase, mitochondrial                             GOT2        -
Basement membrane-specific heparan sulfate proteoglycan core protein  HSPG2       -
Bifunctional purine biosynthesis protein PURH                         ATIC        -
Bile salt-activated lipase                                            CEL         pancreas exocrine
Bile salt-activated lipase                                            CEL         pancreatic tumor
Bile salt-activated lipase                                            CEL         pancreatitis
Bile salt-activated lipase                                            CEL         pancreatic cancer
Cadherin-17                                                           CDH17       pancreatitis
Calcium-activated chloride channel regulator 1                        CLCA1       pancreatitis
Carbonic anhydrase 1                                                  CA1         chronic pancreatitis
Carbonic anhydrase 2                                                  CA2         chronic pancreatitis
Carboxypeptidase A1                                                   CPA1        Pancreatitis
Carboxypeptidase A2                                                   CPA2        pancreatitis
Carboxypeptidase B                                                    CPB1        pancreatitis alcoholic
Carboxypeptidase B                                                    CPB1        acute pancreatitis
Carboxypeptidase B                                                    CPB1        pancreatitis
Carboxypeptidase B                                                    CPB1        pancreatic cancer
Catalase                                                              CAT         -
Fibronectin                                                           FN1         -
Glutathione S-transferase A1                                          GSTA1       -
Glycine amidinotransferase, mitochondrial                             GATM        pancreatitis
Heat shock 70 kDa protein 1A/1B                                       HSPA1A      -
Heat shock 70 kDa protein 1-like                                      HSPA1L      -
Heat shock 70 kDa protein 6                                           HSPA6       -
Heat shock cognate 71 kDa protein                                     HSPA8       -
Hexokinase-1                                                          HK1         Pancreatic Cancer
Hexokinase-1                                                          HK1         pancreatitis
Histone H4                                                            HIST1H4I    -
Interstitial collagenase                                              MMP1        -
Isoform H14 of Myeloperoxidase                                        MPO         -
Kininogen-1                                                           KNG1        pancreatitis
Leucine-rich alpha-2-glycoprotein                                     LRG1        Pancreatitis
Leukotriene A-4 hydrolase                                             LTA4H       -
Metalloproteinase inhibitor 1                                         TIMP1       -
Mucin-2                                                               MUC2        pancreatic tumor
Mucin-2                                                               MUC2        pancreatic cancer
Mucin-2                                                               MUC2        carcinoma pancreatic ductal
Mucin-2                                                               MUC2        pancreatic carcinoma
Mucin-2                                                               MUC2        chronic pancreatitis
Mucin-2                                                               MUC2        pancreatic cystadenoma mucinous
Mucin-6                                                               MUC6        pancreatic tumor
Mucin-6                                                               MUC6        chronic pancreatitis
Mucin-6                                                               MUC6        carcinoma pancreatic ductal
Pancreatic alpha-amylase                                              AMY2A       pancreatitis
Pancreatic lipase-related protein 2                                   PNLIPRP2    pancreatitis
Pancreatic triacylglycerol lipase                                     PNLIP       pancreatic insufficiency
Pancreatic triacylglycerol lipase                                     PNLIP       exocrine pancreatic insufficiency
Pancreatic triacylglycerol lipase                                     PNLIP       pancreas exocrine
Pancreatic triacylglycerol lipase                                     PNLIP       pancreatic diseases
Pancreatic triacylglycerol lipase                                     PNLIP       acute pancreatitis
Pancreatic triacylglycerol lipase                                     PNLIP       chronic pancreatitis
Pancreatic triacylglycerol lipase                                     PNLIP       pancreatitis
Pancreatic triacylglycerol lipase                                     PNLIP       pancreatic cancer
Periostin                                                             POSTN       Pancreatic Ductal Adenocarcinoma
Periostin                                                             POSTN       Pancreatitis
Periostin                                                             POSTN       Pancreatic cancer
Phosphoglycerate kinase 1                                             PGK1        -
Pigment epithelium-derived factor                                     SERPINF1    -
Protein disulfide-isomerase A4                                        PDIA4       -
Protein S100-A12                                                      S100A12     -
Puromycin-sensitive aminopeptidase                                    NPEPPS      -
Pyruvate kinase isozymes M1/M2                                        PKM         pancreatitis
Pyruvate kinase isozymes M1/M2                                        PKM         pancreatic cancer
Tetranectin                                                           CLEC3B      -
Vinculin                                                              VCL         -
Vitamin D-binding protein                                             GC          -

(-): 'proteins of interest' for which no disease association was found using the Novoseek data mining tool.

APPENDIX J

J: A STRING network interaction of the CEL, PNLIP, and PNLIPRP1 genes, which were significantly enriched in the glycoproteomics analysis and observed in the pancreatic secretion pathway. Red circles = interacting target genes
