Mass Spectrometry-Based Clinical Proteomics for Non-Small Cell Lung Cancer

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Nilini Sugeesha Ranbaduge

Graduate Program in Chemistry

The Ohio State University

2016

Dissertation Committee:

Dr. Vicki H. Wysocki, Advisor

Dr. Susan V. Olesik

Dr. Abraham Badu-Tawiah

Dr. David P. Carbone

Copyrighted by

Nilini Sugeesha Ranbaduge

2016

Abstract

Even with extensive genomic and transcriptomic characterization of tumors, the relationship of the human cancer genotype to cancer phenotype remains unclear. Proteins, however, are the immediate molecular drivers of the cancer phenotype that govern tumorigenesis or tumor recurrence. The research described here highlights work on non-small cell lung cancer tumors and cell lines. The major goals are to discover proteins exclusive to tumor recurrence and to liver kinase B1 (LKB1) mutation, respectively.

The proteins were discovered by nanoflow multidimensional liquid chromatography coupled to mass spectrometry.

The research described in Chapter 2 of the dissertation focuses on establishing a mass spectrometry-based bottom-up proteomic method for protein detection in formalin-fixed paraffin-embedded (FFPE) tissue specimens. Identification of protein markers for lung cancer requires tumor tissues that are usually unavailable in the fresh, frozen state. FFPE tissues, however, are produced from resected tumor material and are readily available for proteome analysis. The use of these tumor samples in mass spectrometry demands effective sample preparation and detection strategies. In the analysis, the use of an on-slide deparaffinization method and a modified lysis buffer recovered the maximum amount of protein from the slide tissue specimen and reduced the sample incubation time during digestion. Fractionation of peptide digests into fifteen high pH reversed phase fractions followed by low pH reversed phase separation resulted in the highest number of protein identifications for a minimum amount of tissue protein extract when coupled to an optimized mass spectrometry method.

In Chapter 3, the use of this method for the tumor protein analysis yielded over five thousand proteins per cohort. The corresponding changes at protein level were identified by comparing the proteins discovered in specimens from recurrent to those of nonrecurrent patients. Adenocarcinoma tumor samples resulted in a number of altered protein expressions in recurrent tumors, but the results were different from those of messenger RNA reported for the same tissue cohort in a previous study. Squamous cell data produced a potential protein signature featuring desmosomal proteins for lung squamous cell carcinoma recurrence. The changes in desmosomal proteins were unidirectional and have not been reported before. In this case, the respective transcript level changes were in agreement with those at the protein level.

Chapter 4 describes the use of lung cancer cell lines to detect LKB1-dependent modifications in the cellular proteome and phosphoproteome. LKB1 is a tumor suppressor protein that regulates its downstream targets and subsequent phenotypic changes through phosphorylation. The protein expression of LKB1 is lost in 40% of lung adenocarcinomas due to gene mutations. The mass spectrometric analysis of adenocarcinoma cell lines with and without LKB1 protein yielded many differentially regulated phosphoproteins, including salt-inducible kinase (SIK) and CREB-related transcription coactivators (CRTCs). SIK, a direct downstream target of LKB1, regulates the localization of CRTC in the nucleus that activates CREB-dependent cell transcription.

The mass spectrometry data reported the effects of LKB1 expression on transcription through the LKB1-SIK3-CRTC3/2 pathway. The identification of the SIK3 and CRTC3 pathway components is novel, and these components could be useful as drug targets.


Dedication

Dedicated to my parents, P.R. Karunadasa and Sunethra Geethanjalie

for their love and support


Acknowledgments

First and foremost, I would like to express my deepest appreciation to my research advisor, Dr. Vicki Wysocki, for her continuous support throughout the past six years. Without her guidance, I would not have been able to take my research this far. She not only taught me how to be a better scientist but, more importantly, how to be a better communicator. Further, I appreciate her for trusting me and giving me the freedom to design my own research projects.

The Carbone research group has been my collaborator for the past four years. I would like to thank Dr. David P. Carbone for his support in all lung cancer projects, especially for providing incredibly useful feedback each step of the way. I would like to thank Carbone group member Dr. Joseph M. Amann for his endless support in completing numerous molecular biology assays and for helping me understand the medical side of all of the projects.

I want to thank the Wysocki, Carbone, and Nana-Sinkam research group members, past and present, who have helped me in various research projects, specifically Dr. Chengsi Huang, Dr. Royston Quityn, Akiko Tanimoto, Jing Yang, Jikang Wu, Mengxuan Jia, Dr. David Bush, Dr. Tadaaki Yamada, Dr. Jacob Kauffman, Michael Koenig, Michael Shapnack, Dr. Sophie Harvey, Dr. Florian Busch and Dr. Jennifer Fleming.

The mass spectrometry facilities at both the University of Arizona and The Ohio State University have been my mentors in instrument operations. I want to thank both facilities’ staff, especially Dr. Linda Breci and Dr. Arpad Somogyi.

Last but not least, I want to thank my close friends and my family members. Your continuous love and support allow me to keep pushing myself further and to better myself.


Vita

2009 ...... B.S. Chemistry, University of Colombo, Sri Lanka

2010-2012 ...... University of Arizona

2012 to present ...... Graduate Research Associate, Department of Chemistry and Biochemistry, The Ohio State University

Field of Study

Major Field: Chemistry


Table of Contents

Abstract
Dedication
Acknowledgments
Vita
Table of Contents
List of Tables
List of Figures
List of Abbreviations

Chapter 1: Mass Spectrometry-Based Clinical Proteomics for Lung Cancer
1.1 Genomics to Proteomics: Protein analysis in lung cancer
1.2 Characterization of the cancer proteomes
1.2.1 Proteomics analysis of the cancer proteome
1.3 Discovery proteomics: Mass spectrometry for biological and clinical applications
1.3.1 Ionization techniques
1.3.1.1 Electrospray ionization
1.3.1.2 Matrix-assisted laser desorption ionization
1.3.2 Mass analyzers
1.3.2.1 Quadrupole
1.3.2.2 Quadrupole Ion trap
1.3.2.3 Orbitrap
1.3.2.4 Hybrid mass analyzers and data dependent acquisition
1.3.3 Ion activation methods
1.3.3.1 Collision-induced dissociation (CID)
1.3.3.2 Higher-energy collision induced dissociation (HCD)
1.3.3.3 Electron transfer dissociation (ETD)
1.4 Introduction to separation of samples
1.4.1 Reversed phase high performance liquid chromatography
1.4.1.1 Reversed phase/Reversed phase high performance liquid chromatography
1.4.1.2 Affinity separation coupled to reversed phase high performance liquid chromatography
1.5 Data analysis in proteomics
1.5.1 Database search algorithms
1.5.2 False discovery rate
1.6 Quantification of proteins in clinical and biological samples
1.7 Non-mass spectrometry based techniques used in clinical sample analysis
1.7.1 Western blot analysis
1.7.2 Immunofluorescence microscopy
1.8 Specific techniques and introduction

Chapter 2: Preparation and Analysis of Formalin-Fixed Paraffin-Embedded Tissues by Two-Dimensional Liquid Chromatography Coupled to Tandem Mass Spectrometry
2.1 Introduction
2.2 Materials and methods
2.2.1 Tissue deparaffinization and protein extraction
2.2.2 Digestion of tissue protein samples
2.2.3 Optimizing reversed phase online fractionation and high performance liquid chromatography separation of protein digests
2.2.4 The identification of peptides by tandem mass spectrometry
2.2.5 Determining the limit of quantification for label-free shotgun proteomics
2.2.6 Identification of peptides and protein groups in tissue protein samples
2.3 Results and Discussion
2.3.1 The use of trypsin compatible protein extraction and lysis buffer
2.3.2 Optimization of the number of online fractions and the analytical separation time
2.3.3 Optimization of tandem mass spectrometry analysis
2.3.4 Limit of detection of the optimized LC/LC-MS/MS method and the use of spectral counts in differential expression analysis
2.4 Conclusions

Chapter 3: The Proteomic Analysis of Patient Tumor Samples to Identify Marker Proteins in Recurrent and Nonrecurrent Lung Cancer
3.1 Introduction
3.2 Materials and Methods
3.2.1 The dissection of tumors and extraction of proteins
3.2.2 Digestion of protein sample
3.2.3 The online fractionation and LC-MS/MS analysis of tumor protein digests
3.2.4 Data processing and protein identification
3.2.5 The bioinformatics analysis of the samples
3.2.6 Pathway Analysis
3.3 Results and Discussion
3.1.1 Identifying the peptides and proteins in patient tumor samples
3.3.1 Differential expression analysis of the tumor tissue proteome
3.3.2 Recurrent vs. non-recurrent adenocarcinoma tumors
3.3.3 Recurrent vs. non-recurrent squamous cell carcinoma tumors
3.3.4 Desmosomal proteins; prognostic markers for early recurrence in lung squamous cell carcinoma
3.3.5 Identifying the differences in adenocarcinoma vs. squamous cell carcinoma tumors
3.4 Conclusions

Chapter 4: Label-Free Proteomics and Phosphoproteomics for Detection and Semi-Quantification of the Effects of Liver Kinase B1 Protein in Lung Cancer Cells
4.1 Introduction
4.2 Materials and methods
4.2.1 Cell culture and digest preparation
4.2.2 Phosphorylated and unmodified peptide selection
4.2.3 Liquid chromatography coupled to mass spectrometry
4.2.4 Protein identification and semi-quantification
4.2.5 Statistical analysis of the data
4.2.6 Annotating proteins identified in A549 cell lines
4.2.7 Immunofluorescence measurements of lung cancer cells
4.3 Results and Discussion
4.3.1 Optimization of sample preparation and phosphopeptide enrichment
4.3.2 Identification of proteins and phosphoproteins in A549 digests
4.3.3 The differentially expressed proteins indicate protein and phosphorylation level changes associated with LKB1
4.3.4 LKB1-dependent pathways in A549 cells
4.3.5 LKB1 phosphorylates its direct downstream target SIK3
4.3.6 LKB1 alters caveolin-1 and membrane receptor proteins
4.3.7 LKB1 is a tumor suppressor in lung adenocarcinoma cell lines
4.4 Conclusions

Chapter 5: Conclusions and Future Directions
5.1 Conclusions
5.2 Future directions

References

Appendix A: The Chromatograms and the Change in Protein Sequence Coverage by Different High pH fractionation
Appendix B: The Tissue H&E Slides indicating the Tumor Margins
Appendix C: The 1505 patient Data
Appendix D: Protein Sequence Alignments of AKR Family Proteins
Appendix E: The Normalized A549 Data Plots
Appendix F: Phosphoprotein and protein network analysis of A549 cell lines

List of Tables

Table 2.1: Percentage of acetonitrile (solvent B1) used in each fraction in 3-fraction, 5-fraction, 10-fraction and 15-fraction 2D LC methods
Table 2.2: The parameter groups tested in the mass spectrometry method optimization
Table 3.1: The patient clinical information
Table 3.2: The differentially expressed proteins in lung adenocarcinoma samples
Table 3.3: Differentially expressed proteins in squamous cell carcinoma samples
Table 3.4: The cell junction proteins discovered in the proteomics analysis
Table 3.5: The mRNA-protein correlation of significantly altered desmosomal proteins
Table 3.6: The differentially expressed protein groups in lung adenocarcinoma vs. lung squamous cell carcinoma nonrecurrent samples
Table 3.7: The differentially expressed protein groups in lung adenocarcinoma vs. lung squamous cell carcinoma recurrent samples
Table 4.1: The differentially expressed proteins in the proteome (flow-through sample) indicating significant changes in wild-type sample compared to LKB1 vector
Table 4.2: The differentially expressed phosphoproteins (flow-through sample) indicating significant changes in wild-type sample compared to LKB1 vector
Table C.1: The data obtained by Myrimatch search algorithm for patient 1505
Table C.2: The top 97 protein identifications and respective spectral counts obtained for WU 1505 patient data
Table D.1: The normalized spectral counts for the selected signature proteins in squamous cell carcinoma samples

List of Figures

Figure 1.1: The workflow for A- bottom-up proteomic analysis. B- top-down proteomic analysis
Figure 1.2: Schematic of electrospray ionization
Figure 1.3: The schematic of matrix-assisted laser desorption ionization
Figure 1.4: The schematic of quadrupole mass analyzer
Figure 1.5: The schematic of two-dimensional linear ion trap instrument
Figure 1.6: The cross section view of orbitrap mass analyzer
Figure 1.7: The schematic of Thermo Orbitrap Elite hybrid mass spectrometer
Figure 1.8: Roepstorff and Fohlman nomenclature for peptide fragmentation
Figure 1.9: The ETD fragmentation scheme
Figure 1.10: A- The dynamic range of proteins in the tissue proteome. B- The problems faced by MS-based protein identification
Figure 1.11: Biological sample preparation and analysis. A- samples are separated using 2D SDS-PAGE followed by proteolytic digestion. B- The peptides are fractionated using multidimensional chromatography
Figure 1.12: The schematic of the Waters two-dimensional liquid chromatography system
Figure 1.13: The common quantitative mass spectrometry workflows used in shotgun proteomics
Figure 2.1: The formalin fixation process of proteins. A- the primary amine group of the protein reacts with formaldehyde and forms a hydroxymethyl-methylol adduct B- the elimination of water to form a Schiff-base product C- the amino acid side chain reacts with the methylene carbon of the Schiff-base to form the cross-linking product
Figure 2.2: The structure of RapiGest detergent indicating A- the cleavage sites of the protein in the presence of the RapiGest detergent B- the cleavage products of the detergent in low pH solvents
Figure 2.3: The flow diagram showing the sample preparation method
Figure 2.4: Optimization of sample extraction and preparation
Figure 2.5: A- The amount of lysine-ending and arginine-ending peptides detected in the samples, B- The STRAP GO annotation analysis of proteins identified by 4 h gradient. The figure indicates the total number of proteins annotated with each cell fraction.
Figure 2.6: The number of protein groups and total peptides detected in each LC run A- in each multi-fraction online LC method. B- with 10 and 15-fractions methods. The two analytical gradient times used in these LC runs were 60 min and 90 min followed by 30 min equilibration time.
Figure 2.7: Optimization of the MS/MS method for detection of peptides and protein groups
Figure 2.8: The liquid chromatogram of FFPE tissue protein digest
Figure 2.9: A- The log10 spectral counts detected for the spiked in internal E. coli chaperonin protein
Figure 3.1: The flow of information available with different omic experiments
Figure 3.2: The number of protein groups identified in A- adenocarcinoma patient samples B- Squamous cell carcinoma patient samples
Figure 3.3: The Venn diagram of protein groups identified in (filtered to include only high confidence peptides with a minimum of 2 peptides a protein) A- squamous cell carcinoma vs. adenocarcinoma sample groups B- adenocarcinoma recurrent and non-recurrent samples C- squamous cell carcinoma recurrent and non-recurrent samples
Figure 3.4: The box plot of the distribution of spectral counts in each patient sample before and after DESeq normalization
Figure 3.5: The principal component 1 (PC 1) vs principal component 2 (PC 2) of the samples
Figure 3.6: Differentially expressed proteins (p<0.01) identified in recurrent vs. nonrecurrent samples of adenocarcinoma and squamous cell carcinoma sample groups
Figure 3.7: The heat map of protein groups identified in adenocarcinoma patient samples
Figure 3.8: The proteogenomic data for adenocarcinoma samples A- the heat map of adenocarcinoma recurrent and nonrecurrent data (RNA and DNA). B- the RNA and protein correlation of the data C- overlap of differentially expressed and differentially correlated at the RNA and protein levels (npSeq, p<0.05)
Figure 3.9: The IPA network map of differentially expressed proteins identified in lung adenocarcinoma samples
Figure 3.10: The heat map of the top 150 squamous cell carcinoma protein groups
Figure 3.11: The IPA pathway map of differentially expressed proteins found in SCC sample
Figure 3.12: The schematic of desmosomal junctions in the cells
Figure 3.13: The box plots of desmosomal proteins; DSC3, DSG3, PKP1, DSP, and JUN
Figure 3.14: The pathway analysis of differentially expressed (fold-change>2, p<0.05) squamous cell carcinoma proteins by ingenuity pathway analysis
Figure 4.1: A- The LKB1 protein B- inactive LKB1 in the nucleus binds to STRAD and MO25
Figure 4.2: The downstream signaling pathway of LKB1
Figure 4.3: A- the western blot of LKB1 vector (V) and wild-type (WT) cell lysates B- the morphology of LKB1-vector and wild-type cell lines in cell culture, C- the general workflow of the experiment
Figure 4.4: The optimization of sample preparation, A- the total number of phosphopeptides detected in /surfactant and trypsin/lysC digested samples, B- the total number of phosphopeptides detected with each TiO2 bead bed weight, C- the total number of singly, doubly and triply phosphorylated peptides identified in 4 mg TiO2 bed sample, D- the amounts of serine, threonine and tyrosine phosphorylation sites detected in the 4 mg TiO2 bed sample
Figure 4.5: A- chromatograms of fraction 1, 2, 5, 10 and 15 of a 15 fraction multi-dimensional liquid chromatography run for phosphopeptide analysis of A549, LKB1 wild-type. B- the number of total phosphopeptides identified in each high pH fraction following low pH analytical separation. The %acetonitrile composition is given in solid blue line.
Figure 4.6: The Icelogo motif analysis of peptide sequences with one or two missed cleavages containing A- serine B- threonine and C- tyrosine phosphorylation sites. All phosphorylation site residues were aligned at position eight before the analysis and fourteen residues around phosphorylation site were selected. Symbol sizes represent the conservation level based on information theory.
Figure 4.7: Representative fragmentation spectra of two phosphopeptide sequences indicating the fragmentation sites and respective fragment masses
Figure 4.8: A- the total number of protein groups and phosphoprotein groups identified in LKB1-vector and wild-type samples B- the Venn diagram shows the overlapping proteins in protein and phosphoprotein groups
Figure 4.9: The heat maps generated in MeV 4.9.0 for selected differentially expressed LKB1 vector (V) and LKB1 wild-type (WT) A- A549 phosphoproteins B- A549 proteins identified in the statistical analysis
Figure 4.10: The annotation of differentially expressed proteins (fold change≥2 and p-value<0.05) in A549 cell lines
Figure 4.11: The pathway protein map of differentially expressed phosphoproteins with a fold change ≥2 and p-value<0.05
Figure 4.12: The pathway map of differentially expressed proteins with a fold change ≥2 and p-value<0.05
Figure 4.13: The pathway map of differentially expressed proteins with a fold change ≥2 and p-value<0.05
Figure 4.14: The phosphorylation sites of A- SIK3; a direct downstream target of LKB1. B- CRTC2 C- CRTC3. CRTC 2/3 are downstream targets of SIK3
Figure 4.15: The confocal microscopy images of CRTC3 localization in A549 cell lines
Figure 4.16: Confocal microscopy images of cell lines treated with SIK inhibitor, HG-9-91-01
Figure 4.17: Confocal microscopy images of A- Calu-1 parental and the N12 knockout cell lines B- Calu-6 parental and the knockout (F2 and G2) cell lines
Figure 4.18: The modeled LKB1-SIK3-CRTC3-CREB signaling pathway. SIK3 inhibits CRTC3 via phosphorylation.
Figure 4.19: The changes in log2 transformed normalized intensity for phosphorylation sites of A- CAV1, B- PTRF and C- EGFR
Figure A.1: The multidimensional liquid chromatograms for a- 3-fraction, b- 5-fraction, c- 10-fraction, d- 15-fraction LC/LC runs
Figure A.2: An example indicating the effect of multi-fractionation on protein sequence coverage
Figure B.1: The H&E slides indicating the tumor margins of glass slide specimens
Figure D.1: The sequence alignment of a- AKR1C1 and AKR1C2 and b- AKR1C2 and AKR1C3. The UniProt sequence alignment tool was used.
Figure D.2: The sequence alignment of A- AKR1C2 and AKR1B10
Figure D.3: The fragment ion spectra for two selected peptides of DSC3, desmosomal protein
Figure E.1: The PCA plots for A549 total protein and phosphoprotein data
Figure E.2: The phosphorylation site intensity change in vector (darker color) vs. wild-type (lighter color) sample for MyC and MEK proteins
Figure F.1: The pathway map of differentially expressed phosphoproteins with a fold change ≥2 and p-value<0.05
Figure F.2: The pathway map of differentially expressed phosphoproteins with a fold change ≥2 and p-value<0.05
Figure F.3: The pathway map of differentially expressed proteins with a fold change ≥2 and p-value<0.05

List of Abbreviations

DNA Deoxyribonucleic acid

RNA Ribonucleic acid

TMT Tandem mass tag

iTRAQ Isobaric tag for relative and absolute quantitation

GC Gas chromatography

UPLC Ultra performance liquid chromatography

LC Liquid chromatography

MS Mass spectrometry

MS/MS Tandem mass spectrometry

BCA Bicinchoninic acid

TFA Trifluoroacetic acid

DTT DL-Dithiothreitol

ALK Anaplastic lymphoma kinase

EGFR Epidermal growth factor receptor

AJCC American Joint Committee on Cancer

mRNA Messenger ribonucleic acid

FC Fold change

SPE Solid phase extraction

cAMP Cyclic AMP

AMP Adenosine monophosphate

GAPDH Glyceraldehyde 3-phosphate dehydrogenase

Chapter 1: Mass Spectrometry-Based Clinical Proteomics for Lung Cancer

1.1 Genomics to Proteomics: Protein analysis in lung cancer

Lung cancer is the most common cause of cancer mortality.1 About 1.8 million new cases are reported each year, and about 85% of the patients are diagnosed with non-small cell lung cancer (NSCLC).2 Most NSCLC patients are diagnosed at a late stage, resulting in a low 5-year survival rate. Constantly evolving treatment options and diagnostic methods are therefore vital to the field.

Over the years, NSCLC-related clinical research has made several breakthroughs in identifying genetic signatures resulting from a mutation of an oncogene or a tumor suppressor.3-5 Such changes were discovered at genomic and transcriptomic levels. The use of these marker genes in the clinic led to gene mutation based diagnosis and eventually to personalized treatment options.

The influence of genomic and transcriptomic level changes on cancer cell function is still under investigation, mainly because such changes are difficult to translate into changes at the protein level. Although proteins are the molecular drivers of the active genome, genomic expression levels do not always correlate with those of the proteome.6,7 Methods and instrumentation used in cancer proteome analysis have therefore undergone rapid growth over the last decade to match the improvements in DNA and RNA analysis.

Proteomics is the study of proteins, posttranslational modifications (PTMs) and protein-protein interactions. Posttranslational modifications are essential in cell signaling. Modifications such as phosphorylation, glycosylation, ubiquitination, and acetylation can modulate signaling cascades and associated cellular functions.8 For instance, a point mutation of a serine, threonine or tyrosine residue may result in the loss of a phosphorylation site. Such adverse effects in cell signaling may affect a network of interacting proteins, favoring an oncogenic phenotype. Therefore, identifying protein level modifications and the respective protein-protein interactions is an important part of cancer proteomic analysis.

Proteomic research is largely focused on recognizing protein level modifications of cells. These discoveries can be made at a proteome-wide level or in a targeted manner.

The research presented in this dissertation is centered on discovery-based proteomics of the NSCLC proteome to discover prognostic markers for lung tumor recurrence and liver kinase B1 dependent protein networks. The techniques used in the dissertation research are discussed below in detail.

1.2 Characterization of the cancer proteomes

The underlying molecular biology differentiates the disease proteome from that of normal tissue. Comparing cell lines and tumor tissues in the disease state with those in the normal state identifies changes at the protein level. Reverse phase protein array (RPPA) and mass spectrometry are the two most common methods used in global proteomic analysis. RPPA quantitatively measures multiple proteins in a large number of samples using antibodies. Mass spectrometry has an advantage over RPPA because RPPA requires prior knowledge of the proteins and high-quality antibodies. As a result, mass spectrometry is used as the primary technique in large-scale clinical proteomics.

1.2.1 Proteomics analysis of the cancer proteome

Cancer proteome analysis is often accomplished with carefully paired specimens that represent the biological systems under investigation. These specimens are either patient tissue samples, model animal tissue samples or cell lines. The proteomics strategies followed in the analysis of such samples are of two types, top-down and bottom-up proteomics.

Top-down analysis (Figure 1.1.B) requires minimal sample preparation because the method analyzes intact proteins. Mass spectra generated in the analysis are used to identify the protein sequence. The top-down method identifies proteins together with their posttranslational modifications; however, it suffers greatly from the complexity of clinical samples and identifies fewer proteins than bottom-up analysis.9

Bottom-up proteomics (Figure 1.1.A) requires more sample preparation than the top-down method. Protein samples are digested with a proteolytic enzyme, producing oligopeptides. In a typical experiment, the samples are fractionated using one or more chromatography methods prior to mass spectrometry. Mass spectrometry analyzes the fractionated peptide mixtures and produces protein IDs from small numbers of identified peptides. The discovery-based bottom-up analysis of a protein mixture is often called a shotgun analysis. Shotgun analysis is one of the most common proteomic methods used in mass spectrometry-based global-scale proteome profiling of tissues and cell lines.
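The digestion step of the bottom-up workflow can be illustrated in silico. The following is a minimal sketch, assuming trypsin as the proteolytic enzyme and the commonly used rule that trypsin cleaves C-terminal to lysine (K) or arginine (R) unless the next residue is proline (P). The function name, its parameters and the example sequence are hypothetical and are not taken from the dissertation.

```python
import re


def tryptic_digest(sequence, missed_cleavages=0, min_length=1):
    """Return peptides from an in silico tryptic digest of a protein sequence.

    Assumed rule: cleave after K or R, except when the next residue is P.
    """
    # Zero-width split after K/R not followed by P (needs Python 3.7+).
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for start in range(len(fragments)):
        # Join up to `missed_cleavages` extra neighbouring fragments.
        last = min(start + missed_cleavages, len(fragments) - 1)
        for end in range(start, last + 1):
            peptide = "".join(fragments[start:end + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence; the R followed by P is not cleaved.
    print(tryptic_digest("MKWVTFISLLRPAGKDE"))
    print(tryptic_digest("MKWVTFISLLRPAGKDE", missed_cleavages=1))
```

Allowing missed cleavages enlarges the peptide list, which is why database search tools usually expose this setting as a search parameter.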

Figure 1.1: The workflow for A- bottom-up proteomic analysis. The protein samples are digested with an enzyme, and the peptides are identified by mass spectrometry. B- top-down proteomic analysis. The intact protein masses are selected and fragmented for sequence identification. Reproduced with permission.10


1.3 Discovery proteomics: Mass spectrometry for biological and clinical applications

Mass spectrometry-based bottom-up proteomics identifies proteins and peptides based on their mass-to-charge ratio (m/z). When coupled with a separation method such as liquid chromatography, MS can answer questions ranging from protein level changes to the pathways active under a given condition. In the MS analysis, proteins and peptides are ionized by introducing one or more positive or negative charges. Ionization enables transfer of the molecules into the mass analyzer under vacuum or extremely low pressure. Applied electric fields control the movement of the ions in the mass analyzer, where they are separated based on their m/z. Each separated ion is detected by the detector, which produces a signal proportional to the abundance of ions at that m/z. The breakthroughs in the field over the years have led to multiple ionization methods, analyzers, and detectors that are customized based on the application.
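As a worked illustration of the mass-to-charge ratio, the sketch below computes the m/z of a protonated peptide in positive ion mode as (M + z × 1.007276) / z, where M is the monoisotopic mass of the neutral peptide and z is the number of added protons. The residue masses are standard monoisotopic values; the function names and the example peptide are hypothetical, and modifications are not considered.

```python
# Monoisotopic residue masses in daltons (standard reference values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added for the free N- and C-termini
PROTON = 1.007276   # mass of one proton


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def mass_to_charge(peptide, charge):
    """m/z of the [M + zH]z+ ion for a positive integer charge z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (monoisotopic_mass(peptide) + charge * PROTON) / charge


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(z, round(mass_to_charge("PEPTIDE", z), 4))
```

The same peptide therefore appears at a different m/z for each charge state, which is why the charge state has to be assigned before a precursor mass can be matched to a sequence.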

1.3.1 Ionization techniques

Introducing molecules into the gas phase is a challenge in the mass spectrometric analysis of non-volatile macromolecules such as proteins. Ionization methods were developed to solve this problem by depositing one or more charges on the molecule, followed by desorption or desolvation. In clinical sample analysis, electrospray ionization (ESI) and matrix-assisted laser desorption ionization (MALDI) are the most commonly used. The development of these two techniques was recognized with the Nobel Prize in Chemistry in 2002.


1.3.1.1 Electrospray ionization

Electrospray ionization is one of the most common ionization techniques used in biomolecular MS. It is a soft ionization technique that causes little to no fragmentation in the source and is therefore widely used in protein and nucleic acid analysis. Its efficient ionization broadens the scope of protein identification by allowing detection, characterization, and quantification when coupled to a high-performance liquid chromatography (HPLC) or capillary electrophoresis (CE) system.11,12

In ESI, a solution of analytes is sprayed through a capillary at low flow rates

(µL/min) using a voltage between 2-5 kV. This creates an electric field gradient that forms a Taylor cone at the tip of the capillary. The solvent droplets that reach the

Rayleigh limit form droplets with extra charges and eject from the tip.13 The droplets move through the electric field while forming charged analyte molecules (Figure 1.2).

The process that forms the charged molecular ions is explained by three models: ion evaporation model (IEM), charge residue model (CRM) and chain ejection mechanism (CEM).

IEM assumes that the increased charge density caused by solvent evaporation releases ions from the droplet, leaving bare ions in the gas phase.14 CRM assumes that the increased charge density due to solvent evaporation divides large droplets into smaller droplets, eventually leaving a single ion behind.15 CEM explains the behavior of large molecular weight polymers such as unfolded proteins with a hydrophobic core.15 Here, the polypeptide chain moves to the surface of the droplet, avoiding contact between the hydrophobic regions and the solvent. Eventually, these polypeptide chains eject from the surface starting from one of the termini. The ionization takes place at atmospheric pressure (atmospheric pressure ionization, API), and a differential pressure gradient transports the ions into the mass analyzer. ESI is thus a rapid, sensitive, soft and quantitative ionization method for proteins that can be easily coupled to a mass analyzer for detection.

Figure 1.2: Schematic of electrospray ionization. The spray forms a Taylor cone at the end of the spray needle. The desolvation of charged droplets forms charged molecular ions. Reprinted with permission.16


In clinical sample analyses, a nanoESI source is often used. This uses a nano bore exit capillary with 1-2 µm diameter at a flow rate of 20-1000 nL/min to obtain a stable spray.17 The reduced capillary size enables the use of sample volumes as small as a couple of microliters. The much lower sample consumption makes nanoESI an appealing method of ionization for sample limited biological and clinical specimens.18

Further, the use of low voltages between 0.7-2 kV preserves PTMs, enables the use of high polarity solvents like water in both positive and negative ionization methods18 and provides high tolerance for buffers containing salt.19 NanoESI is hence a method suitable for biomolecule mass spectrometry of proteins and posttranslational modifications.

1.3.1.2 Matrix-assisted laser desorption ionization

Like ESI, matrix-assisted laser desorption ionization20,21 is a commonly used ionization method in biomolecular MS. This is a “soft ionization” method that can be used to detect molecular masses ranging from a few thousand to a few hundred thousand daltons.22 In MALDI, the sample is mixed with an excess amount of the matrix and co-crystallized on a metal plate. A UV (or IR) laser irradiates the sample using nanosecond laser pulses. The matrix consists of organic molecules that absorb radiation at the wavelength of the laser. α-Cyano-4-hydroxycinnamic acid or dihydroxybenzoic acid (DHB) is often used as the matrix in protein and peptide analysis.23

The proposed ionization mechanism of MALDI consists of two steps: the photoionization of the matrix and the secondary ion-molecule reaction that generates analyte ions (Figure 1.3).24,25 However, to date the complete mechanism of MALDI remains unclear. The method is used in nucleic acid, protein, and peptide analysis, especially for small sample volumes and low concentrations, because of its high sensitivity and its tendency to generate singly or doubly charged molecular ions. The other appealing features of MALDI are its high salt tolerance and the ability to couple this pulsed ion source with mass analyzers like time-of-flight (TOF) instruments. MALDI, however, is not commonly used in complex sample analysis, mainly due to the difficulty in coupling the ionization method with a continuous flow separation system.

Figure 1.3: The schematic of matrix-assisted laser desorption ionization. The crystallized sample-matrix medium is irradiated with a laser that causes sublimation and ionization of proteins or peptides. Reproduced with permission.23


1.3.2 Mass analyzers

Essentially, mass spectrometers measure the mass-to-charge ratio (m/z) of molecular ions. In a biological sample, these molecular ions can be anything from nucleic acids to peptides. A mass analyzer separates these charged molecular ions based on their m/z for detection and identification. There are several types of mass analyzers including magnetic sector (B), time-of-flight (TOF), quadrupole (Q), quadrupole ion trap (QIT), orbitrap and Fourier transform ion cyclotron resonance (FTICR).

In recent years, the increase in demand for more evolved MS instrumentation has led to the development of hybrid mass spectrometers. The hybrid MS instrumentation is designed by bringing two similar or different mass analyzers together with tandem capabilities. Quadrupole ion trap and orbitrap mass analyzers used for this dissertation research are discussed here.

1.3.2.1 Quadrupole

Quadrupole mass analyzers are one of the most common types of analyzers used in biomolecular mass spectrometry. This is a beam type mass analyzer and was the most popular choice for chromatography-based MS instrumentation (GC-MS and LC-MS) in the years from 1970-1990.26 Even today, quadrupole mass analyzers are commercially used due to the use of lower accelerating voltages (2-50 V), high transmission efficiency, low price and fast scan speeds.


Modern quadrupoles contain four cylindrical or hyperbolic rods mounted in a square configuration. Each opposite rod pair is designed to experience the same combined potential of direct current (DC) voltage and alternating radio frequency (RF) voltage. The DC voltage (U) and the RF voltage (V with a frequency ω) determine the overall ion trajectory of any m/z range. When the opposite rod pairs experience +(U + V cos ωt) and -(U + V cos ωt), the stable ion trajectory is decided by the magnitude of V and U/V ratios.

The m/z with stable ion trajectories are allowed to transmit through the two-dimensional electric field of the quadrupole while the others collide with the rods (Figure 1.4). Stable regions are defined by solutions to the Mathieu equation.27 In practice, m/z filtering is achieved by changing the magnitude of U and V while keeping the U/V ratio constant. Quadrupole mass analyzers often have a mass range from m/z 50-4000 with unit resolution separation of peaks over the entire m/z range. As a result, a reduced sensitivity is observed for higher m/z ions in mass-filtering quadrupoles.
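The relationship between U, V and ion stability can be illustrated with the dimensionless Mathieu parameters a and q. The following is a minimal sketch, assuming the common convention in which the rod pairs sit at +(U + V cos ωt) and -(U + V cos ωt), V is the zero-to-peak RF amplitude, and r0 is the inscribed field radius, giving a = 8zeU/(m·r0²·ω²) and q = 4zeV/(m·r0²·ω²). The field radius, RF frequency and m/z in the example are assumed values for illustration and do not describe any instrument used in this work.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # C
DALTON = 1.66053906660e-27            # kg

# Approximate apex of the first stability region (standard textbook values).
APEX_A = 0.237
APEX_Q = 0.706


def _scale(mz, r0_m, rf_frequency_hz):
    """Return (m/ze) * r0^2 * omega^2 in volts, the common denominator of a and q."""
    omega = 2.0 * math.pi * rf_frequency_hz
    mass_per_charge = mz * DALTON / ELEMENTARY_CHARGE   # kg per coulomb
    return mass_per_charge * r0_m ** 2 * omega ** 2


def mathieu_parameters(mz, dc_volts, rf_volts, r0_m, rf_frequency_hz):
    """Mathieu (a, q) for an ion of the given m/z at DC voltage U and RF amplitude V."""
    scale = _scale(mz, r0_m, rf_frequency_hz)
    return 8.0 * dc_volts / scale, 4.0 * rf_volts / scale


def voltages_for_apex(mz, r0_m, rf_frequency_hz):
    """U and V that place the given m/z near the apex of the stability region."""
    scale = _scale(mz, r0_m, rf_frequency_hz)
    return APEX_A * scale / 8.0, APEX_Q * scale / 4.0


if __name__ == "__main__":
    # Assumed example geometry: r0 = 4 mm, 1 MHz RF drive.
    u, v = voltages_for_apex(500.0, 4e-3, 1.0e6)
    print(f"m/z 500 near apex: U = {u:.1f} V, V = {v:.1f} V, U/V = {u / v:.4f}")
```

Since both U and V scale linearly with m/z at fixed a and q, scanning them together at a constant U/V ratio brings successive m/z values to the apex, which is the filtering scheme described above.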

Quadrupoles can be operated in RF only mode by setting the DC to zero making it a wide band pass for ions. Such quadrupoles are known as RF-only quadrupoles (q) and are often used as ion guides and collision cells in hybrid mass spectrometers. Ion guide quadrupoles are used at a high pressure that confines the ion beam through collisional cooling. This reduces the axial motion of the ion beam and increases the ion transfer efficiency. These types of ion guides are seen in transmission regions of ion trap and orbitrap instrumentation. Hexapoles or octapoles may also be used in place of q to enhance the band pass quality.26 RF-only quadrupole collision cells are used in multi quadrupole instruments like triple quadrupoles.

Figure 1.4: The schematic of quadrupole mass analyzer. A fixed direct current (DC) and radio frequency applied create a stable ion trajectory for a single m/z. Reproduced with permission.26

1.3.2.2 Quadrupole Ion trap

The ion trap is another common mass analyzer used in biomolecular analysis. Depending on the design, ion traps can be of two types: the three-dimensional quadrupole ion trap (QIT) and the two-dimensional linear ion trap (LIT). The quadrupole ion traps use three hyperbolic electrodes to create a three-dimensional RF quadrupole field. The inlet focusing system, the two capping electrodes and the space charge limit of the QIT control the ion injection. The RF voltage applied to the ring electrode confines the ions inside the trap by creating a stable oscillating trajectory. Since the trapping step depends on the voltage and the m/z of the ions, only pre-selected m/z ions are stabilized. Disrupting the DC and RF voltages makes the ions unstable, and thus they are ejected axially for detection.

Two-dimensional ion traps (Figure 1.5) are currently more common in biological sample analysis than three-dimensional quadrupole ion traps. The design uses four capping electrodes on either end to trap ions inside the field of the central electrode. This design has enhanced sensitivity, dynamic range, and signal-to-noise (S/N) ratio compared to conventional 3D traps. As with 3D traps, the 2D traps provide multistep activation capabilities, allowing users to perform higher order tandem experiments (MSn) useful in protein or peptide analysis.28 The two common activation methods are collision induced dissociation (CID) and electron transfer dissociation (ETD). Further, a slit on the main electrodes allows both axial and radial ejection of the ions, leading to more efficient precursor and fragment ion detection.

Figure 1.5: The schematic of two-dimensional linear ion trap instrument. Ions trapped inside maintain a stable trajectory as a result of the RF voltage applied to the ring electrode. Reproduced with permission.28,29

The inherent resolution of a 2D ion trap is unit mass resolution, which can be improved with relatively slow RF voltage scan rates. The m/z range of commercial ion trap mass analyzers is around m/z 200-4000. Linear ion traps are readily used in hybrid mass spectrometers including the Thermo Velos Pro dual pressure dual ion trap and the Thermo Orbitrap Elite mass spectrometer described in section 1.3.2.4.

1.3.2.3 Orbitrap

Identification of proteins in complex mixtures has been one of the biggest challenges faced by mass spectrometry-based protein analysis. This was mainly due to the lack of instrumentation with high-performance capabilities. The orbitrap mass analyzer, invented by Alexander Makarov at the end of the 1990s,30 was a breakthrough in biological and clinical mass spectrometry instrumentation. The design was based on the Knight-type Kingdon trap invented in the early 1980s.31

The orbitrap contains two coaxial electrodes: an inner spindle-like electrode and an outer barrel-like electrode.32 The confined packet of ions injected into the orbitrap is pulled towards the central electrode by the potential drop. Shortly after, the ions begin orbiting the central electrode with simultaneous oscillations along the z-axis (Figure 1.6).33 Ions of different m/z are separated based on their axial oscillation frequency. Because this frequency does not depend on the initial ion motion, orbitrap analyzers achieve high resolution and mass accuracy.

For the detection and identification of ions, an image current of the oscillating ions is differentially amplified. The analog-to-digital conversion of the image current produces a signal in the time domain. The Fourier transformation of this signal results in frequency information that is converted to the mass spectrum generated by the orbitrap. This yields mass spectra with resolution up to 240000 at m/z 400 and < 1 ppm mass accuracy with internal calibration.34
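The two figures of merit quoted above can be illustrated numerically. The sketch below assumes that the axial frequency in an orbitrap is proportional to 1/sqrt(m/z), so that for a fixed transient length the resolving power falls off as 1/sqrt(m/z) relative to the value specified at m/z 400. It also shows the usual definition of mass error in parts per million. The function names and the example m/z values are illustrative assumptions.

```python
import math


def orbitrap_resolution(mz, reference_resolution=240000.0, reference_mz=400.0):
    """Estimate resolving power at a given m/z from the value quoted at reference_mz.

    Assumes a fixed transient length, so resolution scales as 1/sqrt(m/z).
    """
    return reference_resolution * math.sqrt(reference_mz / mz)


def ppm_error(measured_mz, theoretical_mz):
    """Mass measurement error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1.0e6


if __name__ == "__main__":
    for mz in (200.0, 400.0, 800.0, 1600.0):
        print(f"m/z {mz:6.0f}: estimated R = {orbitrap_resolution(mz):,.0f}")
    # Assumed example: a 0.0004 Th deviation at m/z 400 is about 1 ppm.
    print(f"error = {ppm_error(400.0004, 400.0):.2f} ppm")
```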

Figure 1.6: The cross section view of orbitrap mass analyzer. Ions are injected perpendicular to the z-axis where each ion packet begins axial oscillation. Reproduced with permission.32

Today, orbitrap mass analyzers are commonly used in protein and peptide analysis and are available in all high-resolution hybrid instruments introduced by Thermo Fisher Scientific. The most commonly used instruments are the Orbitrap Discovery, XL, Velos, Elite, Fusion, and the Exactive class instrumentation.

1.3.2.4 Hybrid mass analyzers and data dependent acquisition

The invention of new mass analyzers combined with the demand for fast, high-performance instrumentation with ion activation and dissociation capabilities led to a new type of mass spectrometer with multiple mass analyzers. The generic term “hybrid mass spectrometers” was given to such instruments. The hybrid instruments combine the performance merits of two or more types of analyzers, resulting in mass spectrometers capable of producing tandem (MS/MS) mass spectra.

The most common hybrid tandem mass spectrometers are Q-TOF, QIT, QICR and linear ion trap-orbitrap mass spectrometers. The research presented in this dissertation utilized a Thermo Velos Pro dual-pressure linear ion trap mass spectrometer and a Thermo Scientific Orbitrap Elite mass spectrometer. Therefore, the design and the performance of these instruments in biological sample analysis are discussed below.

In the Thermo Scientific Velos Pro dual-pressure linear ion trap, a single aperture separates the two linear ion traps, which are kept at 5 × 10^-3 Torr and 4 × 10^-4 Torr.35 The instrument includes a stacked ring ion guide called an S-lens for ion beam focusing, a curved quadrupole to prevent neutral species from entering the mass analyzer, and two linear ion traps for ion activation and subsequent detection. Compared to a generic ion trap instrument, these improved features of the Velos Pro ion trap increase the efficiency of selective trapping and isolation, the scan rate, and the resolution (25000 FWHM), and decrease the precursor isolation time and the collision induced dissociation activation time. As a result, the experiment yields more sensitive, higher throughput data than generic linear ion trap instruments.

The commercialization of the first hybrid linear ion trap-orbitrap instrument revolutionized mass spectrometry-based proteome analysis over the past decade. The first instrument design consisted of a built-in linear ion trap with ion activation capability and a high-resolution orbitrap mass analyzer with a mass resolution of R=60000 at m/z 400.36 Since then, MS-based protein analysis on hybrid orbitrap instrumentation has been at the center of protein discoveries from cell lysates to human plasma samples.37,38 The Orbitrap Elite™ is one of the orbitrap class hybrid mass spectrometers with dual pressure linear ion traps (Figure 1.7). This instrument incorporates three major improvements over its predecessors: ion transfer optics containing a square quadrupole with a neutral blocker, which blocks the line of sight; a faster ion trap scan rate (higher than 12 Hz); and a compact high field orbitrap39 with a two-fold increase in resolving power.40

The instrument can scan m/z 50-2000 or 200-4000 with less than 1 ppm mass accuracy and a maximum resolution of R=240000 at m/z 400. The Orbitrap Elite™ has three ion activation methods that facilitate protein and proteome characterization: collision induced dissociation (CID), higher energy collisional dissociation (HCD), which is a type of CID performed in an external multipole, and electron transfer dissociation (ETD). The advanced scan modes can perform simultaneous scanning in both the ion trap and the orbitrap, resulting in better use of molecules eluting from the HPLC.

Data-dependent acquisition (DDA) is one of the most commonly used scan modes of the hybrid orbitrap instrumentation. The first mass scan is performed in the orbitrap mass analyzer to detect the precursor m/z ratios at high resolution (R=240000 or 120000 at m/z 400).41 Next, the instrument selects the N most intense precursor peaks from the precursor m/z scan and sends them to the ion trap or HCD cell for ion activation.

Figure 1.7: The schematic of Thermo Orbitrap Elite hybrid mass spectrometer. The instrument consists of an s-lens, square quadrupoles, an octapole, two linear ion traps, a quadrupole mass filter, c-trap, an orbitrap, HCD collision cell and transfer multipole for reagent ion transfer. The ion activation can be obtained by either CID, HCD or ETD. Both the precursor and the fragment ions can be detected at the ion trap or the orbitrap mass analyzers. Reproduced with permission.34

The fragments are then detected in the ion trap at low resolution or in the Orbitrap at higher resolution to produce the tandem mass spectra. Most biological samples contain many proteins, and the digests of such samples produce a complex mixture of peptides. Therefore, in most experiments tandem mass spectra (MS/MS) are generated in the ion trap after trap-based ion activation. In data analysis, the precursor mass spectra obtained at high resolution in the Orbitrap are combined with low resolution MS/MS spectra to identify the resulting peptides and proteins. The data analysis is discussed in more detail later in this chapter (section 1.5).
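The precursor selection step of data-dependent acquisition described in this section can be sketched in a few lines. This is a simplified illustration and not the instrument control logic: the function name, the peak representation as (m/z, intensity) pairs and the intensity threshold are hypothetical, and features that real acquisition methods typically add, such as dynamic exclusion and charge state filtering, are not modeled.

```python
def select_top_n_precursors(survey_peaks, n, min_intensity=0.0):
    """Pick up to n of the most intense peaks from a survey (MS1) scan.

    survey_peaks: list of (mz, intensity) tuples.
    Returns the selected peaks ordered from most to least intense.
    """
    eligible = [peak for peak in survey_peaks if peak[1] >= min_intensity]
    return sorted(eligible, key=lambda peak: peak[1], reverse=True)[:n]


if __name__ == "__main__":
    # Made-up survey scan for illustration only.
    scan = [(400.2, 1.0e5), (512.3, 5.0e6), (623.8, 2.0e4), (745.4, 8.0e5)]
    for mz, intensity in select_top_n_precursors(scan, n=2):
        print(f"fragment precursor m/z {mz} (intensity {intensity:.1e})")
```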

1.3.3 Ion activation methods

The activation of ions is achieved independently of the ionization, and both precursor and product ions are detected based on their m/z ratios. The three most common ion activation methods are collision induced dissociation (CID), higher energy collision induced dissociation (HCD) and electron transfer dissociation (ETD). Each method forms a unique set of fragment ions (Figure 1.8), which is then used in peptide/protein sequence identification.


Figure 1.8: Roepstorff and Fohlman nomenclature for peptide fragmentation. Reproduced with permission.42

1.3.3.1 Collision-induced dissociation (CID)

Collision-induced dissociation (CID) is the most commonly used ion activation method in protein mass spectrometry. In CID, the precursor ions undergo multiple collisions with a neutral gas, most commonly He, N2 or Ar. In the Velos Pro and LTQ Orbitrap Elite™ mass spectrometers, this is a low energy trap-based ion activation process that occurs inside the high-pressure linear ion trap. Upon collision, the kinetic energy of the peptide ion is converted to vibrational energy. The redistribution of this energy throughout the peptide causes bond breakage, resulting in product/fragment ions. The backbone fragmentation of a peptide in CID generates b and y ions (Figure 1.8). Tryptic peptides result in a predominant y ion series due to C-terminal charge retention. Further, CID also results in product ions with secondary water or ammonia loss. Overall, this ion activation method is more effective for fragmentation of small, low-charged peptides than of longer peptide sequences.
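The b and y ion series produced by CID can be predicted directly from a peptide sequence. The sketch below is a minimal illustration, assuming unmodified residues, singly charged fragments and standard monoisotopic residue masses; the function name and the example peptide are hypothetical and are not taken from the data in this dissertation.

```python
# Monoisotopic residue masses in Da (standard values, unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def by_ions(peptide):
    """Return (b, y) lists of singly charged fragment m/z values.

    b[i] is the b(i+1) ion and y[i] is the y(i+1) ion; each list holds
    len(peptide) - 1 entries.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for mass in masses[:-1]:            # N-terminal fragments
        running += mass
        b_ions.append(running + PROTON)
    running = 0.0
    for mass in reversed(masses[1:]):   # C-terminal fragments
        running += mass
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = by_ions("PEPTIDE")
    for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {b_mz:9.4f}    y{i} = {y_mz:9.4f}")
```

Complementary b and y ions from the same backbone cleavage sum to the singly protonated peptide mass plus one proton, which is a convenient consistency check when annotating spectra.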

1.3.3.2 Higher-energy collision induced dissociation (HCD)

HCD is an additional ion activation method that has gained popularity in biological mass spectrometry over the last few years.43 It is a beam-type CID that employs higher energy dissociation than trap CID, enabling multiple fragmentation pathways. The precursor ions are accelerated into a collision cell containing a neutral gas. The ion activation results in more fragment ions than trap CID. Like trap CID, HCD yields b ions and y ions. These fragment ions are then transferred back through the c-trap into the orbitrap for detection at higher resolution. Compared with CID, HCD has no low mass cut-off, detects the fragments at high resolution and results in higher quality tandem mass spectra. Therefore, it is more widely used in low mass reporter ion-based quantification of peptides (iTRAQ, TMT). However, because of the longer acquisition times required by Fourier transform detection, HCD results in slower scan rates; therefore, CID often outperforms HCD in in-depth protein analysis.44

1.3.3.3 Electron transfer dissociation (ETD)

Electron transfer dissociation is another ion activation method used in protein and peptide analysis. ETD utilizes an ion-ion chemistry to transfer an electron to excite and cleave the peptide backbone.45 Novel mass spectrometry instrumentation is designed with linear ion traps that can be used for both CID and ETD-based fragmentation as required by the experiment.


Figure 1.9: The ETD fragmentation scheme. The multiply protonated peptides are reacted with a radical anion that transfers an electron to systematically fragment the peptide backbone. Reprinted with permission.46

In ETD, the transfer of an electron from a radical anion to the protonated peptide initiates the fragmentation. Cleavage at the N-Cα bond of the peptide (Figure 1.9) is observed, creating a complementary series of c ions and z ions.46 Unlike CID, ETD preserves labile posttranslational modifications (PTMs) on peptides. Therefore, ETD is often used in the characterization of PTMs such as phosphorylation and glycosylation. ETD works best with multiply charged peptides and proteins47 and is used in top-down protein characterization. In global proteomic analyses, however, CID outperforms ETD, as ETD requires careful tuning of the instrument, a more frequent maintenance schedule, and longer acquisition times, resulting in fewer peptide identifications.48
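For comparison with the b and y series of CID, the c and z ions formed in ETD can be predicted with the same kind of calculation. The sketch below is illustrative only and rests on the following assumptions: unmodified residues, singly charged fragments, c ions calculated as the corresponding b ion plus ammonia, and radical z ions (often written z•) calculated as the corresponding y ion minus ammonia plus one hydrogen atom. The function name and example peptide are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
AMMONIA = 17.026549
HYDROGEN = 1.007825


def cz_ions(peptide):
    """Return (c, z_dot) lists of singly charged fragment m/z values.

    c[i] is the c(i+1) ion and z_dot[i] is the radical z(i+1) ion.
    Fragments that are not expected experimentally (for example c ions
    from cleavage N-terminal to proline) are not filtered out here.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    c_ions, z_ions = [], []
    running = 0.0
    for mass in masses[:-1]:
        running += mass
        c_ions.append(running + PROTON + AMMONIA)                      # b + NH3
    running = 0.0
    for mass in reversed(masses[1:]):
        running += mass
        z_ions.append(running + WATER + PROTON - AMMONIA + HYDROGEN)   # y - NH3 + H
    return c_ions, z_ions


if __name__ == "__main__":
    c, z = cz_ions("AGLK")
    for i, (c_mz, z_mz) in enumerate(zip(c, z), start=1):
        print(f"c{i} = {c_mz:9.4f}    z{i} = {z_mz:9.4f}")
```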

1.4 Introduction to separation of samples

In clinical and biological sample analysis, MS detection is largely limited to high abundance proteins. This is mainly due to the wide range of protein expression levels in the proteome. The dynamic range that must be covered in direct MS analyses can span 10^6-10^7, that is, six to seven orders of magnitude (Figure 1.10.A).49,50 MS detects only a fraction of the proteins in the sample proteome and quantifies even fewer, masking some of the clinically significant protein level changes (Figure 1.10.B). For example, abundant proteins such as histones, enolase, tubulin and GAPDH51 in the cancer cell proteome have much higher expression levels than some metabolic regulator proteins such as AMP-activated protein kinase (AMPK) or serine/threonine kinase 11 (STK11). In clinical sample analysis, all of these proteins are important; their detection therefore requires methods that reduce the sample complexity.

Orthogonal separation or fractionation of peptide samples prior to mass spectrometry has been used to address the issue of sample complexity. The most commonly used fractionation methods are sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE)52, peptide isoelectric focusing (IEF)53, ion exchange chromatography (strong cation/SCX and strong anion/SAX)54, reversed phase high pH fractionation,55 and reversed phase liquid chromatography (RP-HPLC).12 Among all, RP-HPLC is extensively used in peptide separation.


Figure 1.10: A- The dynamic range of proteins in the tissue proteome. B- The problems faced by MS-based protein identification. Reproduced with permission.56


Figure 1.11: Biological sample preparation and analysis. A- samples are separated using 2D SDS-PAGE followed by proteolytic digestion. B- The peptides are fractionated using multidimensional chromatography. All samples are analyzed by tandem mass spectrometry for protein identification. Reproduced with permission.57

In most experiments, a combination of two or more separation methods is used (Figure 1.11) prior to MS analysis of complex protein digests.58,59 The most common combinations of multidimensional chromatography methods are SCX-RP-HPLC60 and RP-RP-HPLC55. The first-dimensional separation can be obtained on-line or off-line to analytical separation.60 Over the last few years, the advancements in HPLC systems brought these first-dimensional off-line fractionation techniques in-line with the generic HPLC systems.61 RP-RP-HPLC based separation is discussed here in detail.

1.4.1 Reversed phase high performance liquid chromatography

The coupling of RP-HPLC with ESI-MS has provided one of the most powerful techniques available for peptide mapping.62 This has been used to separate mixtures with varying complexity from multi-length peptides to identical sequences with single amino acid variation.63 The RP-HPLC method uses a non-polar solid sorbent with isocratic or gradient polar mobile phase. The change in the mobile phase composition is used to elute the peptides for ESI-MS analysis. The technique has become popular in clinical sample analysis for biomarkers and biotherapeutics.64

1.4.1.1 Reversed phase/Reversed phase high performance liquid chromatography

Reversed phase high performance liquid chromatography coupled to tandem mass spectrometry provides protein level information on biological systems. In an RP LC-MS analysis, the majority of ions competing for ionization originate from high abundance proteins. Although these identifications are sufficient to monitor the changes in the most abundant protein levels, the mid and low abundance proteome remains untouched. The coupling of other fractionation methods with RP-HPLC is a solution for deep proteome sequencing.

Reversed phase fractionation was first introduced off-line to RP-HPLC separation.55 It is now incorporated into the generic HPLC systems with modifications.

A multidimensional nano UPLC system with two in-line reversed phase systems was used in this dissertation research. The reversed phase/ reversed phase high performance liquid chromatography system contains two sets of pumps; one connected to the first dimension and the other to the second dimension.

Figure 1.12: The schematic of the Waters two-dimensional ultra performance liquid chromatography system. The system consists of two valves; the injection valve and the trap valve to switch between trapping and analytical separation modes of operation. Reproduced from the Waters nanoacquity user’s manual.


As indicated in Figure 1.12, the sample is injected into the high pH (pH 10.0) reversed phase fractionation system (RP1, first dimension). The bound peptides are eluted from the fractionation column at the beginning of trapping by varying the percent organic composition of the mobile phase, followed by analytical separation at low pH (2.4) in the reversed phase analytical column. Each eluate fraction is ionized and sent for MS analysis and detection. The novel RP/RP-HPLC system designs have the capability to fractionate, desalt and separate peptide samples in two reversed phase systems with three C18 columns. These online RP/RP-HPLC systems outperform SCX-RP analysis, yielding more peptide and protein identifications,61 thus providing access to the mid and the low abundance proteins in clinical samples.

1.4.1.2 Affinity separation coupled to reversed phase high performance liquid chromatography

Unlike reversed phase separation, affinity chromatography enriches the samples for a specific peptide or a group of peptides. This is especially useful in peptide posttranslational modification analysis.65 Phosphoproteins are one of the important products of protein posttranslational modifications. Monitoring the changes at phosphopeptide level, however, is difficult due to the relatively low abundance of phosphorylation sites in the cells.

Affinity separation techniques are often used to enrich samples for phosphopeptides. The phosphopeptide enrichment is completed off-line or online prior to RP-HPLC separation. In most cases, this is achieved off-line using mini enrichment columns.66 The most common methods are immobilized metal affinity chromatography (IMAC),66 TiO2 and polyMAC. TiO2 affinity enrichment is used for the research described in Chapter 4.

In TiO2 enrichment, the phosphoryl group forms coordinate bonds with the metal ion, retaining the phosphopeptides in the affinity column. The eluent exiting the column (flow-through) contains the nonphosphorylated peptides. The phosphopeptides are released from the affinity column by increasing the solvent strength of the mobile phase. The resulting eluate comprises phosphopeptides that can be injected into a reversed phase liquid chromatography system.

1.5 Data analysis in proteomics

Bottom-up proteomics data contain indirect protein information. The data conversion from MS/MS spectra to protein identification can follow two methods. De novo sequencing interprets spectra by identifying the amino acid sequences. The mass difference between each peak yields the mass of an amino acid and the addition of adjacent amino acids results in a peptide sequence. The unique peptide sequences reveal the identity of the protein.
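The core idea of de novo sequencing, reading residues from the mass differences between consecutive fragment peaks, can be sketched as follows. This is a deliberately simplified illustration: it assumes a clean, complete ladder of singly charged ions from one series, unmodified residues and a hypothetical mass tolerance, none of which holds for typical experimental spectra. Leucine and isoleucine are isobaric and are therefore reported together.

```python
# Monoisotopic residue masses in Da; Leu and Ile share one entry (isobaric).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I/L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_from_ladder(peaks_mz, tolerance=0.01):
    """Assign a residue to each gap between consecutive peaks of one ion series.

    Returns one label per gap: a residue code, several codes joined by '|'
    when the tolerance cannot separate them, or '?' when nothing matches.
    """
    ordered = sorted(peaks_mz)
    calls = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        difference = heavier - lighter
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(mass - difference) <= tolerance]
        calls.append("|".join(matches) if matches else "?")
    return calls


if __name__ == "__main__":
    # Assumed y-ion ladder (y1 to y4) of the hypothetical peptide AGLK.
    ladder = [147.1128, 260.1969, 317.2183, 388.2554]
    # A y-series ladder reads the sequence from the C-terminus backwards.
    print(residues_from_ladder(ladder))
```

With a wider tolerance, as would be appropriate for low resolution ion trap spectra, near-isobaric pairs such as K and Q (about 0.036 Da apart) can no longer be distinguished and are reported as ambiguous.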

Unlike experiments on pure protein samples, most clinical MS/MS experiments acquire many thousands of tandem mass spectra per sample. Manual interpretation of MS/MS spectra is possible; however, the complete process requires an enormous amount of time. Search algorithms, theoretical databases, and peptide scoring models were designed for large proteomic data analyses. The protein identification is obtained by searching experimental peptide mass spectra against a theoretical database generated by in silico digestion of the proteins. The theoretical databases used in this dissertation work were downloaded from http://www.uniprot.org and http://www.ncbi.nlm.nih.gov/protein.
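The in silico digestion step used to build a theoretical peptide database can be illustrated with a short sketch. It assumes the conventional trypsin rule (cleavage C-terminal to lysine or arginine, except when the next residue is proline) and an optional number of missed cleavages. The function name, its parameters and the example sequence are hypothetical and do not reproduce the settings of any search engine used in this work.

```python
def tryptic_digest(protein, missed_cleavages=0, min_length=1):
    """Return tryptic peptides of a protein sequence, in order of start position.

    Cleaves after K or R unless the following residue is P, and also returns
    peptides spanning up to `missed_cleavages` uncleaved sites.
    """
    # Boundaries between peptides, including both sequence ends.
    sites = [0]
    for i, residue in enumerate(protein[:-1]):
        if residue in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))

    peptides = []
    for start in range(len(sites) - 1):
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end >= len(sites):
                break
            peptide = protein[sites[start]:sites[end]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    # Made-up sequence for illustration only.
    print(tryptic_digest("MKRPAAKLLR"))
    print(tryptic_digest("MKRPAAKLLR", missed_cleavages=1))
```

Each theoretical peptide would then be converted to its predicted fragment masses and compared with the experimental spectra by the search algorithm.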

1.5.1 Database search algorithms

SEQUEST is one of the first algorithms developed for tandem mass spectrometry data analysis.67 It has been incorporated in vendor specific software like Thermo Proteome Discoverer. The algorithm uses a simple descriptive model for peptide identification. Each experimental spectrum is matched with the corresponding theoretical fragmentation spectrum. Each spectral match is assigned a preliminary score and a correlation score. The correlation score is known as the XCorr, which depends on the spectral quality, peptide mass, and the charge state.68 The ΔCn is calculated from the normalized difference between the XCorr values of the best sequence match and the others. A larger ΔCn value indicates that the best sequence match is better separated from the remaining candidates. The algorithm also supports the search of posttranslational modifications; however, it does not provide any localization information.
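The ΔCn calculation can be illustrated with a small example. The sketch assumes the commonly used definition in which ΔCn is the difference between the best and the second-best XCorr, normalized by the best XCorr. The function name and the scores are hypothetical, and the exact behavior of a given SEQUEST implementation (for example, which lower-ranked candidate is used) may differ.

```python
def delta_cn(xcorr_scores):
    """Normalized XCorr difference between the best and second-best candidates.

    xcorr_scores: XCorr values of all candidate peptides for one spectrum.
    """
    if len(xcorr_scores) < 2:
        raise ValueError("at least two candidate scores are required")
    ranked = sorted(xcorr_scores, reverse=True)
    best, second = ranked[0], ranked[1]
    if best <= 0:
        raise ValueError("best XCorr must be positive")
    return (best - second) / best


if __name__ == "__main__":
    # Made-up XCorr values for the candidates of a single spectrum.
    print(f"delta Cn = {delta_cn([2.4, 3.2, 1.0]):.2f}")
```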

A second search engine used in data analysis is Andromeda. This is a free peptide search algorithm embedded in the MaxQuant software suite (by the Max Planck Institute). Andromeda uses a probability-based score derived from the p-score.69,70 The algorithm calculates theoretical fragments based on the peptide sequences and accommodates PTMs including phosphorylation. It assigns and scores even complex patterns of posttranslational modifications. Because it is built into the MaxQuant software, it runs on the user’s computer without external computing power. More importantly, it is designed to handle both ion trap and orbitrap generated tandem mass spectra, improving peptide identifications. Overall, Andromeda is designed to analyze and identify peptides in large data sets.

The third algorithm used in data analysis is MyriMatch (Vanderbilt Medical Center). This is free software developed for shotgun data analysis. Unlike many scoring algorithms used in MS/MS data analysis, MyriMatch accounts for the differences in peak intensity. The algorithm uses a statistical model to score matching peptides based on the multivariate hypergeometric distribution.71 The scoring model uses the probability of the best random spectral match to eliminate random matches in experimental data.

1.5.2 False discovery rate

Global scale shotgun experiments acquire thousands of tandem mass spectra per sample. The search algorithm assigns peptide spectral matches (PSMs), derived from the theoretical sequences, to these experimental mass spectra to identify peptide sequences, which are then assembled into protein groups. The identifications often consist of both false positive and true positive peptides or proteins. The confidence of the data compilation is, therefore, validated through false discovery rate (FDR) assignments.


FDR represents the fraction of false positive assignments in a peptide group.72 Among the methods used to control FDR of PSMs, the use of a decoy database is the most common. For FDR assignment, the data are searched against all possible sequences and reverse sequences (decoy). The fraction of false positives is determined based on the reverse sequences for FDR of PSMs. The FDR assignment for proteins is more complicated than that of PSMs. When PSMs are assembled into proteins, the PSM error propagates in a nontrivial way.73,74 The protein level FDR, therefore, is kept higher than the FDR for PSMs during protein identification.
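The decoy-based FDR estimate for PSMs can be sketched as follows. This is a minimal illustration that assumes a combined target and reversed-sequence search and uses the simple estimate FDR = (decoy PSMs) / (target PSMs) among the matches scoring at or above a threshold. Other estimators exist, and the function names, the PSM representation as (score, is_decoy) pairs and the scores are hypothetical.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate the PSM-level FDR for matches scoring at or above `threshold`.

    psms: list of (score, is_decoy) tuples from a target-decoy search.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def lowest_threshold_for_fdr(psms, max_fdr):
    """Lowest observed score whose estimated FDR does not exceed `max_fdr`.

    Returns None when no threshold satisfies the requirement.
    """
    best = None
    for score in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, score) <= max_fdr:
            best = score
    return best


if __name__ == "__main__":
    # Made-up PSM list: (search engine score, matched a reversed decoy sequence?).
    psms = [(5.0, False), (4.5, False), (4.0, False),
            (3.5, True), (3.0, False), (2.5, True)]
    cutoff = lowest_threshold_for_fdr(psms, max_fdr=0.3)
    print(f"score cutoff = {cutoff}, FDR = {fdr_at_threshold(psms, cutoff):.2f}")
```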

1.6 Quantification of proteins in clinical and biological samples

The goal of cancer tissue or cell line proteomics is to identify disease induced protein level changes. These changes can result in fluctuations in protein quantity or posttranslational modification status. Such changes in a proteome could yield potentially useful protein networks, diagnostic, predictive or prognostic markers. Liquid chromatography coupled to tandem mass spectrometry offers a way to detect such proteins. Without proper quantification, however, it is difficult to predict the nature of these protein changes.

Mass spectrometry-based quantification of proteins can be performed on a global (shotgun) scale or in a targeted manner. While techniques like multiple reaction monitoring facilitate targeted quantification, labeling and label-free techniques provide quantitative measurements in global-scale shotgun analyses. The MS-based relative quantification techniques are chemical labeling, metabolic labeling, and label-free quantification (Figure 1.13).

Label-free quantification is a technique commonly used in the proteomic analysis of clinical samples. The samples are analyzed separately, and the data are combined in the final step to obtain quantitative information. The protein level changes are quantified based on the precursor peptide ion intensity or by comparing the number of tandem spectra (spectral counts) acquired per peptide sequence. Label-free analysis is an inexpensive method applicable to an unlimited number of samples; however, it requires more instrument time than a targeted proteomic experiment involving labeling.
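The spectral counting approach mentioned above can be illustrated with a short Python sketch. It is only one of several possible formulations and is not taken from this work: normalizing by the total number of spectra per sample, adding a pseudocount of 1 to avoid division by zero, and the function name `log2_fold_changes` are all assumptions made for the example.

```python
import math
from typing import Dict


def log2_fold_changes(counts_a: Dict[str, int],
                      counts_b: Dict[str, int],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """Compare spectral counts of two separately analyzed samples.

    Counts are shifted by a pseudocount (an assumption, so that proteins
    missing in one sample still give a finite ratio) and divided by the
    per-sample total before taking log2(B / A).
    """
    proteins = sorted(set(counts_a) | set(counts_b))
    total_a = sum(counts_a.get(p, 0) + pseudocount for p in proteins)
    total_b = sum(counts_b.get(p, 0) + pseudocount for p in proteins)
    result = {}
    for p in proteins:
        frac_a = (counts_a.get(p, 0) + pseudocount) / total_a
        frac_b = (counts_b.get(p, 0) + pseudocount) / total_b
        result[p] = math.log2(frac_b / frac_a)
    return result
```

A positive value indicates a protein with a larger share of the spectra in sample B than in sample A; a value near zero indicates no relative change under this normalization.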

Figure 1.13: The common quantitative mass spectrometry workflows used in shotgun proteomics. The blue and the yellow indicate the different sample groups. The horizontal dashed lines indicate the point at which the sample groups are combined. Reprinted with permission.75

1.7 Non-mass spectrometry based techniques used in clinical sample analysis

In clinical biology, mass spectrometry is often used to discover new proteins or protein level modifications in cell lines and tissue samples. These proteins are often validated in multiple samples to confirm respective changes and the biological relevance.

Western blot analysis76 and immunofluorescence are two other techniques often used in parallel with mass spectrometry, although some mass spectrometrists question the use of these methods to “validate” clear cut, specific MS data.

1.7.1 Western blot analysis

Western blot analysis is an important technique in molecular biology which is used to separate and identify proteins in complex sample mixtures. In this method, SDS-PAGE separates the proteins based on the molecular weight into separate bands.77 Next, each band is transferred onto a membrane, most commonly polyvinylidene difluoride (PVDF).76 The membrane is then incubated with the primary antibody specific to the protein of interest. The incubation time for this step varies from 1 h at room temperature to overnight at 4 ˚C. The excess antibody is rinsed off, and the membrane is incubated with the secondary antibody usually linked to an enzyme that results in a cleavage of a chemiluminescent agent. The bound antibody signal is developed on a film. In western blot analysis, the thickness of the band represents the abundance of the protein, which can be compared across a wide range of samples.

1.7.2 Immunofluorescence microscopy

Immunofluorescence is a spectroscopy technique more commonly used in in vivo detection and validation of proteins.78 The most common samples are live tissue or cell lines.78 Both direct and indirect sample detection methods are used for the detection of the antibody-bound protein. In direct immunofluorescence, the protein of interest is tagged with a protein-specific primary antibody containing a fluorophore. In the second analysis method, the primary antibody is used to capture the protein of interest and the secondary antibody tagged with a fluorophore is used to detect the protein bound primary antibody.78 The proteins labeled with these fluorophores are then monitored using a confocal microscope. The quantity is often determined by measuring the intensity of the fluorophore. The main challenges of the technique are photobleaching of the fluorophore over time79 and the inability to validate many proteins in one experiment.

1.8 Specific techniques and introduction

Introductions to formalin-fixed paraffin-embedded tissue preparation and to the analysis of patient tumor tissue cohorts are given in Chapters 2 and 3, respectively. The specific techniques and corresponding conditions used for the work described in this dissertation are provided separately in each chapter.

Chapter 2: Preparation and Analysis of Formalin-Fixed Paraffin-Embedded Tissues by Two-Dimensional Liquid Chromatography Coupled to Tandem Mass Spectrometry

2.1 Introduction

Formalin-fixed paraffin-embedded (FFPE) tissue specimens are often prepared during routine histopathology analysis of resected tumors or biopsy samples. The fixation process forms cross-links between proteins (Figure 2.1), allowing these samples to be stored at ambient temperatures over a long period of time without degradation. These heavily cross-linked proteins are preserved in the tissue samples and can be examined even after years in storage. Tumor banks archive FFPE specimens and catalog the information associated with them, such as patient number, clinical history, treatment regimen, gender, age, histology and clinical outcome. Most tissue banks contain tumor specimens collected over a span of decades representing a variety of cancers, different gene mutations, patients from different geographic locations, and even ethnicities. The wealth of information available with these tissue cohorts enables the pairing of samples based on clinical outcome and provides an invaluable resource for protein biomarker analysis.

For the longest time, immunohistochemistry (IHC) analysis has been the technique associated with FFPE tissue protein biomarker detection and validation. However, IHC is less appealing as a protein discovery method due to its demand for paraffin compatible antibodies, and inability to quantify80 proteins. Liquid chromatography coupled to mass spectrometry-based bottom-up proteomics is one of the most popular techniques used in global scale proteome profiling of clinical samples. Mass spectrometry-based proteomics enables tissue protein detection while facilitating the relative quantification of proteins.81,82 Many mass spectrometry-based analyses use fresh-frozen tissues or cell lines. FFPE tissues are not directly compatible with the generic bottom-up protein detection methods; therefore, they require novel sample preparation methods, tissue extraction and lysis buffers, peptide fractionation methods, and mass spectrometry detection methods.

The first critical step in mass spectrometry-based FFPE tissue protein analysis is to reverse the protein cross-links formed during formalin fixation (Figure 2.1). The cross-link removal procedure requires efficient methods of paraffin removal and tissue rehydration. Xylene, sub-x clearing medium, or octane followed by a series of ethanol or methanol washes have been used in most tissue extraction procedures to remove cross-links.6,83,84 The use of xylene followed by ethanol-based rehydration is the most common approach. Like deparaffination solvents, the extraction buffers that follow rehydration play an important role in sample preparation. The most common extraction buffers are radioimmunoprecipitation assay (RIPA) buffer, Liquid Tissue™ MS Protein Prep kit, and modified tris buffers with a detergent like sodium dodecyl sulfate (SDS).85-87

Figure 2.1: The formalin fixation process of proteins. A- the primary amine group of the protein reacts with formaldehyde and forms a hydroxymethyl-methylol adduct; B- the elimination of water to form a Schiff-base product; C- the amino acid side chain reacts with the methylene carbon of the Schiff-base to form the cross-linking product. Reproduced with permission.88

The next step in sample preparation is protein digestion. Unlike fresh frozen sample specimens, FFPE tumor tissues are sample limited and yield fewer protein identifications than fresh tissue.89 Optimized digestion protocols are vital in improving proteome coverage. The most common methods used are direct tissue trypsinization, in-solution digestion, and filter-aided sample preparation (FASP).90-92 A recent study conducted on these methods indicated that FASP and direct trypsinization are more effective compared to in-solution digestion method.91

The direct trypsinization method includes a tissue homogenization step with a buffer such as Liquid TissueTM84 or mixtures of ammonium bicarbonate and acetonitrile93,94 or ammonium bicarbonate and trifluoroethanol6. In-solution digestion is one of the most common digestion methods used in protein analysis and consists of protein extraction using a detergent rich lysis buffer, detergent removal, and digestion.82,95,96 FASP is a protocol developed in recent years and has presented great potential in FFPE tissue analysis. The FASP method is comprised of protein extraction, cell lysis, detergent removal and molecular cut-off filter aided digestion.89,92,97,98

However, FASP requires 20 µg91 or higher amounts of proteins which may not be available in small primary tumors. Modified extraction and digestion protocols based on the above methods can be utilized to improve the sample preparation protocol.

Most of the previous studies have been performed on one-dimensional liquid chromatography separation coupled to mass spectrometry, yielding a small percentage of the tissue proteome.85,99 While these experiments facilitated the detection of high abundance proteins in the proteome, the detection of mid and low abundance proteins suffered from substantial signal loss. Recently, a few studies have been performed using off-line peptide fractionation methods such as gel-based isoelectric focusing100 and strong anion exchange fractionation90 that resulted in more peptides and proteins than generic 1D HPLC coupled to mass spectrometry detection. High pH reversed phase fractionation55 is one of the new fractionation methods introduced for peptides; however, it has not been used in formalin-fixed paraffin-embedded tissue analysis.

Figure 2.2: The structure of RapiGest detergent indicating A- the cleavage sites of the protein in the presence of the RapiGest detergent; B- the cleavage products of the detergent in low pH solvents. Abstracted from Waters RapiGest SF user’s manual.

Here, we address the problems associated with sample preparation by introducing a modified protocol for protein extraction and digestion and present the use of online high pH reversed phase prefractionation in FFPE tumor specimen analysis. The protocol uses a combination of generic in-solution digestion and direct trypsinization sample preparation methods. We modified the method by replacing the detergent in our extraction buffer with a mass spectrometry compatible detergent to simplify the detergent removal step. The detergent used in this analysis is RapiGest (Figure 2.2). RapiGest is a thermally stable compound compatible with high-temperature protein extraction. This compound is unstable under low pH conditions (Figure 2.2) and precipitates during acidification of the sample. The use of the same buffer for extraction, lysis, and digestion followed by trypsinization improved the recovery of the peptides.

With the modified protocol, we were able to analyze small primary tumor samples and identified about three thousand protein groups per sample. Further, RapiGest containing protein extraction buffer reduced the incubation time for digestion by ten hours, from overnight to 4 hours.

For peptide fractionation and separation, we utilized our multidimensional liquid chromatography system. The first dimension of the HPLC system contains a high pH reversed phase system that fractionates the peptides. We improved the protein and proteome coverage by optimizing the number of high pH fractions and the analytical gradient. As illustrated below, the fifteen fraction method presented the best proteome coverage. The optimized multi-fraction HPLC method coupled to mass spectrometry improved the sensitivity per 4.5 µg of total proteins by enabling protein detection over four orders of magnitude in FFPE primary lung tumor specimen analysis.

2.2 Materials and methods

2.2.1 Tissue deparaffination and protein extraction

Two types of FFPE tissue specimens (FFPE tissue block x 1 and charged glass slide specimens x 2) were subjected to three different deparaffinization and protein extraction methods. First, 20 µm x 20 mm x 10 mm FFPE tumor tissue slices were transferred from the paraffin block into a microcentrifuge tube (Fisher Scientific). Second, the two glass slide specimens were macrodissected (5 µm x 10 mm x 5 mm) and one of the two specimens was placed in a microcentrifuge tube. Third, the remaining macrodissected sample (5 µm x 10 mm x 5 mm) was subjected to on-slide deparaffinization.

All samples were deparaffinized in xylene followed by rehydration in a graded ethanol solution series (100%, 90%, and 70%) as previously described.83 The tumor tissues were homogenized in a modified lysis buffer containing 0.2% RapiGest SF (Waters Corporation, Milford, MA) in 50 mM ammonium bicarbonate and 5 mM DTT (Sigma-Aldrich). The lysates were incubated at 105 ˚C for 30 min and stored on ice for 5 min. Next, the samples were sonicated using a Sonic Dismembrator Model 100 (Fisher Scientific) at 20 W for 20 s with 30 s intervals. This sonication step was repeated twice, followed by incubation at 70 ˚C for 2 h. The protein concentration of each tissue lysate was determined by BCA protein assay (Thermo Fisher Pierce, Rockford, Illinois) using the manufacturer’s protocol.

2.2.2 Digestion of tissue protein samples

A total of 100 µg of tissue proteins were reduced with 50 mM DTT at 56 ˚C for 30 min followed by alkylation with 100 mM iodoacetamide (Sigma Aldrich) at room temperature for 20 min in the dark. The samples were digested with either sequencing grade trypsin or a trypsin and LysC mixture (Promega Corporation, Madison, WI). For trypsin-only digestion, the proteins were digested at a ratio of 1:50 (w/w, protease to protein). For multi-enzyme digestion, trypsin and LysC were used at ratios of 1:50 (w/w, protease to protein) and 1:100 (w/w, protease to protein), respectively. The sample was incubated at 37 ˚C overnight according to the manufacturer’s protocol. A second aliquot of the same sample was incubated at the same temperature for 4 h (user defined; refer to section 2.3.1). Based on the results obtained in the previous experiment, a time course analysis was conducted to determine the optimum incubation time for proteolysis. Identical protein aliquots were incubated at 37 ˚C for 2 h, 4 h, 6 h, 9 h and 14 h. At the end of each defined incubation time, the samples were acidified with 0.5% TFA and incubated at 37 ˚C for 30 min followed by centrifugation at 14000 g for 15 min. The supernatant was collected and evaporated to dryness in a Speed-Vac concentrator (Thermo Scientific). The samples were stored at -80 ˚C until LC/LC-MS/MS analysis.
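As a worked example of the protease-to-protein ratios stated above, the following short Python sketch computes the enzyme mass needed for a 100 µg protein aliquot. The helper name `protease_mass_ug` is hypothetical and serves only to make the arithmetic explicit.

```python
def protease_mass_ug(protein_ug: float, ratio_denominator: float) -> float:
    """Protease mass (ug) for a 1:ratio_denominator (w/w) protease-to-protein ratio."""
    return protein_ug / ratio_denominator


protein_ug = 100.0                                # protein aliquot stated in the text
trypsin_ug = protease_mass_ug(protein_ug, 50.0)   # 1:50  -> 2.0 ug trypsin
lysc_ug = protease_mass_ug(protein_ug, 100.0)     # 1:100 -> 1.0 ug LysC
```

Under these ratios, a 100 µg aliquot corresponds to 2 µg of trypsin and, for the multi-enzyme digestion, an additional 1 µg of LysC.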

2.2.3 Optimizing reversed phase online fractionation and high performance liquid chromatography separation of protein digests

For the analysis of protein digests, liquid chromatography coupled to tandem mass spectrometry was used. A Waters nanoACQUITY two-dimensional (2D) UPLC system (Waters Corporation, Milford, MA) was equipped with two reversed phase columns and interfaced to a Thermo LTQ-Orbitrap Elite hybrid mass spectrometer (Thermo Fisher Scientific, Bremen, Germany). All protein digests were reconstituted in 100 mM ammonium formate to prepare a 1 µg/µL solution of the sample and injected using a nanoACQUITY UPLC autosampler (Waters Corporation, Milford, MA). The peptides were fractionated online at high pH prior to analytical separation.

The fractionation of peptides was achieved in the first reversed phase column (Waters BEH C18, 130 Å, 1.7 µm particle diameter, 300 µm i.d., 100 mm) at pH 10.0 in 20 mM ammonium formate (A1) by varying the amount of 100% acetonitrile (B1) in the mobile phase. The column was equilibrated at 3% B1 (v/v) prior to fractionation. Each fraction of peptides was eluted from the column by increasing the %B1 composition at 1 min, and the column was held at this composition for a total of 4 min before changing the mobile phase composition back to 3% B1. The %B1 compositions used in each multi-fraction method are indicated in Table 2.1. At the end of each peptide elution, the fractionation column was equilibrated at 3% (v/v) B1 for 4 min before starting the next method. Each separation was achieved at a steady flow rate of 2 µL/min.

Each fraction eluted from the fractionation column was loaded onto a Waters Symmetry C18 trap column (100 Å, 5 µm particle diameter, 180 µm x 20 mm) and desalted at a flow rate of 20 µL/min. The analytical separation was achieved in the second reversed phase column (Waters HSS T3, C18, 100 Å, 1.8 µm particle diameter, 75 µm i.d. x 150 mm) at pH 2.4, which was equilibrated to initial conditions: 95% (v/v) water with 0.1% formic acid (A2) and 5% (v/v) acetonitrile with 0.1% formic acid (B2). Four linear gradients achieved the subsequent separation at 38 ˚C, where %B2 was increased from 5%-9% in 3 min, 9%-30% over 44 min, 30%-40% over 5 min, and 40%-85% over 5 min at a flow rate of 0.5 µL/min. The column was held at 85% (v/v) B2 from 65-70 min before reequilibrating at 5% B2.

                   % Acetonitrile
Fraction number    3-fraction    5-fraction    10-fraction    15-fraction
1                  13.1          10.8          7.4            4.7
2                  17.7          14            10.8           9.0
3                  65            16.7          12.6           10.8
4                  -             20.4          14             12.0
5                  -             65            15.3           13.1
6                  -             -             16.7           14.0
7                  -             -             18.3           14.9
8                  -             -             20.4           15.8
9                  -             -             23.5           16.7
10                 -             -             65             17.7
11                 -             -             -              18.9
12                 -             -             -              20.4
13                 -             -             -              22.2
14                 -             -             -              25.8
15                 -             -             -              65

Table 2.1: Percentage of acetonitrile (solvent B1) used in each fraction in 3-fraction, 5-fraction, 10-fraction and 15-fraction 2D HPLC methods.
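For readers who wish to work with the step-elution schedules programmatically, the values of Table 2.1 can be encoded as a simple lookup. The sketch below is illustrative only: the names `STEP_PERCENT_B1`, `check_schedule`, and `fraction_windows` are hypothetical, fraction 8 is read as 20.4% (10-fraction) and 15.8% (15-fraction), and treating each fraction as the window between the previous step and the current one, starting from the 3% B1 equilibration composition, is an interpretive assumption.

```python
from typing import Dict, List, Tuple

# %B1 (acetonitrile) step used to elute each fraction, transcribed from Table 2.1.
STEP_PERCENT_B1: Dict[int, List[float]] = {
    3: [13.1, 17.7, 65.0],
    5: [10.8, 14.0, 16.7, 20.4, 65.0],
    10: [7.4, 10.8, 12.6, 14.0, 15.3, 16.7, 18.3, 20.4, 23.5, 65.0],
    15: [4.7, 9.0, 10.8, 12.0, 13.1, 14.0, 14.9, 15.8, 16.7, 17.7,
         18.9, 20.4, 22.2, 25.8, 65.0],
}

EQUILIBRATION_PERCENT_B1 = 3.0  # column equilibration composition from the text


def check_schedule(schedule: Dict[int, List[float]]) -> bool:
    """Check that each method has one step per fraction and the steps increase."""
    for n_fractions, steps in schedule.items():
        if len(steps) != n_fractions:
            return False
        if any(a >= b for a, b in zip(steps, steps[1:])):
            return False
    return True


def fraction_windows(n_fractions: int) -> List[Tuple[float, float]]:
    """Return (lower, upper) %B1 bounds for each fraction of a given method."""
    steps = STEP_PERCENT_B1[n_fractions]
    lower_bounds = [EQUILIBRATION_PERCENT_B1] + steps[:-1]
    return list(zip(lower_bounds, steps))
```

For example, `fraction_windows(3)` lists the three acetonitrile windows of the 3-fraction method, with the final step at 65% B1 acting as the column wash in every method.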

2.2.4 The identification of peptides by tandem mass spectrometry

The 2D HPLC was coupled to the LTQ-Orbitrap Elite via a nanospray Flex ion source (Thermo Fisher Scientific, Bremen, Germany) containing a 30 µm inner diameter stainless steel emitter (Thermo Fisher Scientific) with spray voltage between 1.7-1.8 kV.

The Orbitrap (FT) mass spectrometer was operated in data-dependent acquisition (DDA) mode. Three different DDA MS parameter groups were tested to see which would provide the greatest number of protein IDs. Each parameter group contained a fixed ion activation method and dynamic exclusion window (DEW). The first set of experiments was conducted to choose this fixed DEW between 15 s and 30 s and the ion activation method between collision-induced dissociation (CID) and higher energy collisional dissociation (HCD). Second, the best DDA parameters were determined based on the maximum number of peptides and protein groups produced by each DDA method. The three parameter groups are indicated in Table 2.2. The full MS scans were acquired in the Orbitrap mass analyzer followed by a top 10, 12 or 15 MS/MS acquisition in the ion trap.

Parameter type                 Group 1     Group 2     Group 3
Resolution*                    240000      120000      120000
IT injection time*             10 ms       20 ms       50 ms
Microscans FT/IT*              1/4         1/4         1/1
Fragmentation                  CID         CID         CID
Fragmentation method*          Top 10      Top 12      Top 15
Preview scan*                  On          On          On
Predict ion injection time     On          On          On
AGC IT*                        3 x 10^4    3 x 10^4    5 x 10^3
AGC FT*                        5 x 10^5    5 x 10^5    10^7
Early expiration               Off         Off         Off
Scan mode IT*                  Normal      Rapid       Rapid
Dynamic exclusion*             15 s        15 s        15 s

Table 2.2: The parameter groups tested in the mass spectrometry method optimization. Each group is composed of variable parameters. (*) indicates the parameters optimized in the analysis.

In parameter group one, the MS spectra were acquired in the Orbitrap at a resolution of R = 240,000 at m/z 400 for every 5 x 10^5 charges accumulated in the Orbitrap mass analyzer. This acquisition triggered MS/MS scans for the top 10 most abundant m/z peaks after CID for an ion trap automatic gain control (AGC) target value of 30,000 charges. The method was used with an ion trap injection time of 10 ms and 4 microscans in normal mass scan mode. Parameter groups 2 and 3 were derived from group one. The resolution, IT injection time, FT and IT microscans, fragmentation method, AGC IT, AGC FT and IT scan modes were optimized (Table 2.2).

Each method was programmed with an ion transfer tube temperature of 275 ˚C, an S-lens RF level of 55%, dynamic exclusion with a repeat count of 1 and a repeat duration of 15 s for an exclusion list size of 500 m/z values, and CID with a normalized collision energy of 35%, q = 0.25 and an activation time of 10 ms. The minimum signal intensity threshold was set to 6000 counts.
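The interplay of top-N precursor selection, the intensity threshold, and dynamic exclusion can be illustrated with a simplified Python sketch. This is a conceptual model only and does not reproduce the instrument control software: the class `DynamicExclusion`, the function `select_precursors`, and the m/z matching tolerance of 0.01 are assumptions introduced for the example, while the 15 s window, the list size of 500, and the 6000 count threshold are taken from the text.

```python
from typing import List, Tuple


class DynamicExclusion:
    """Time-limited exclusion list for previously fragmented precursor m/z values."""

    def __init__(self, duration_s: float = 15.0, max_size: int = 500,
                 tol_mz: float = 0.01):  # tol_mz is an assumed matching tolerance
        self.duration_s = duration_s
        self.max_size = max_size
        self.tol_mz = tol_mz
        self._entries: List[Tuple[float, float]] = []  # (m/z, expiry time), oldest first

    def _purge(self, now: float) -> None:
        # Drop entries whose exclusion window has elapsed.
        self._entries = [(mz, t) for mz, t in self._entries if t > now]

    def is_excluded(self, mz: float, now: float) -> bool:
        self._purge(now)
        return any(abs(mz - entry) <= self.tol_mz for entry, _ in self._entries)

    def add(self, mz: float, now: float) -> None:
        self._purge(now)
        self._entries.append((mz, now + self.duration_s))
        if len(self._entries) > self.max_size:
            self._entries.pop(0)  # discard the oldest entry when the list is full


def select_precursors(peaks: List[Tuple[float, float]], now: float,
                      exclusion: DynamicExclusion, top_n: int = 10,
                      min_intensity: float = 6000.0) -> List[float]:
    """Pick up to top_n most intense (m/z, intensity) peaks that are not excluded.

    With a repeat count of 1, a precursor is excluded as soon as it is selected.
    """
    selected: List[float] = []
    for mz, intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
        if len(selected) == top_n or intensity < min_intensity:
            break
        if exclusion.is_excluded(mz, now):
            continue
        exclusion.add(mz, now)
        selected.append(mz)
    return selected
```

In this model, a precursor selected in one survey scan is skipped in subsequent scans until the exclusion window has elapsed, which directs MS/MS time toward less abundant precursors.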

2.2.5 Determining the limit of quantification for label-free shotgun proteomics

Finally, the limit of quantification was determined by using a spiked internal standard protein (E. coli chaperonin) at 0.6 ng/mL, 60 ng/mL, 0.6 µg/mL and 60 µg/mL concentrations. The 15-fraction HPLC method was used with the top 15 DDA method. The number of spectral counts reported for each spiked standard was compared to estimate the minimum quantifiable concentration of proteins in FFPE protein extracts by multidimensional liquid chromatography coupled to an LTQ-Orbitrap Elite mass spectrometer.

2.2.6 Identification of peptides and protein groups in tissue protein samples

For protein identification, Proteome Discoverer 1.4 (Thermo Fisher Scientific) was used with a customized UniProt human database (http://www.uniprot.org, downloaded on 12/03/2015) and the cRAP contaminant database containing the most common contaminants (http://www.thegpm.org/crap). The raw files generated by Xcalibur software (Thermo Fisher Scientific) for three, five, ten and fifteen fraction HPLC runs were used in the peptide identification.

The MS/MS spectra were searched with fixed carbamidomethyl modification at cysteine, and variable acetylation at protein N-termini, oxidation of methionines and deamidation at asparagine and glutamine. A maximum of two missed cleavages was allowed for every fully tryptic peptide (proline rule applied) with a minimum peptide length of seven amino acids.
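The cleavage settings described above (fully tryptic peptides with the proline rule, up to two missed cleavages, and a minimum length of seven residues) can be illustrated with a small in silico digestion sketch in Python. This is not the algorithm used by Proteome Discoverer; the function name `tryptic_peptides` and the example sequence are hypothetical.

```python
import re
from typing import List


def tryptic_peptides(sequence: str, max_missed: int = 2,
                     min_length: int = 7) -> List[str]:
    """Enumerate fully tryptic peptides of a protein sequence.

    Trypsin is modeled as cleaving C-terminal to K or R, except when the
    next residue is P (the proline rule).
    """
    # Cleavage positions: index just after each K/R not followed by P.
    cut_sites = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    bounds = [0] + [c for c in cut_sites if c < len(sequence)] + [len(sequence)]

    peptides: List[str] = []
    last = len(bounds) - 1
    for i in range(last):
        # Joining k + 1 adjacent fragments corresponds to k missed cleavages.
        for j in range(i + 1, min(i + 1 + max_missed, last) + 1):
            peptide = sequence[bounds[i]:bounds[j]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


# Hypothetical sequence: the R followed by P is not cleaved.
example = "AAAAAAKGGGGGGRPLLLLLLKDD"
no_missed = tryptic_peptides(example, max_missed=0)
```

With no missed cleavages allowed, the example sequence yields two peptides of at least seven residues, and the arginine preceding proline remains internal to the second peptide.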

The proteins in the sample were identified at a 0.05 false discovery rate (FDR) for peptide spectral matches (PSM) and 0.05 protein FDR. Protein groups were filtered only to include peptides with 99% confidence and a minimum of two peptides per protein group.

2.3 Results and Discussion

2.3.1 The use of trypsin compatible protein extraction and lysis buffer

Here we present our results in optimizing the sample preparation, peptide fractionation, and tandem mass spectrometry methods for formalin-fixed paraffin-embedded tissues. These methods were developed in preparation for the cancer research project presented in Chapter 3. A schematic of the optimized protocol followed in this chapter is given in Figure 2.3.

Formalin fixation of the tissues forms cross-links, preventing the direct use of these samples in bottom-up proteomic analyses. To reverse these cross-links, we used a xylene based deparaffination method followed by graded ethanol solution rehydration as previously described.100,101 The modified extraction buffer/lysis buffer contained 0.2% RapiGest SF, a mild detergent. The use of one buffer in extraction, lysis and digestion eliminated the additional steps in sample preparation. We compared three deparaffination methods to determine the most efficient decrosslinking procedure. Each sample used here was a stage III lung adenocarcinoma tumor specimen with 10 µm thickness.

Determining the protein concentration of FFPE tissue lysates by BCA assay is not a direct approach; however, it is one of the most common methods used in determining protein concentration in lysates.85,100 The amino acids that contribute to the reduction of copper also react with formaldehyde during protein cross-linking. As a result, the protein concentrations reported here are much lower than the actual amount present in the sample.

Figure 2.3: The flow diagram showing the sample preparation method; A- The protein extraction method using 0.2% RapiGest, 50 mM ammonium bicarbonate, and 5 mM DTT B- the following extraction; protein alkylation, reduction, and digestion, neutralization and centrifugation

The estimated protein concentration of an FFPE tissue lysate is 56% less than that of a frozen tissue sample.100 We used this correction factor in concentration estimations. As presented in Figure 2.4.A, the paraffin block tissue sample produced the maximum amount of protein, and the on-slide deparaffinization method produced the second highest amount. The differences in respective protein concentrations could be due to variations in protein extraction methods or decrosslinking.

RapiGest SF was used in the extraction buffer to improve the final protein yield. This is a water-soluble surfactant that improves the solubility of hydrophobic species such as membrane proteins. Most common proteases like trypsin and LysC are also compatible with this extraction buffer. The digestion was completed according to the manufacturer’s protocol with overnight incubation for 14 h. However, 5-7 h digestion periods have also been reported for RapiGest SF based proteolysis.102 To determine the most effective incubation time, we digested another aliquot of the same sample for a period of 4 h. The liquid chromatography method used here contained three high pH fractions followed by a 55 min analytical gradient separation per fraction.

With a 3-fraction 2D HPLC separation (Table 2.1 column 2) with reversed phase C18 columns, the digestion of the paraffin block sample yielded 618 protein groups and 2040 peptides with 4 h incubation and 517 protein groups and 1290 peptides with overnight incubation (Figure 2.4.B). Both on and off-slide deparaffinized tissue samples resulted in a similar number of protein identifications (458 and 464 respectively) and peptides (1715 and 1698 respectively).

Figure 2.4: Optimization of sample extraction and preparation. A- total proteins identified by different deparaffination and extraction platforms, B- total protein groups identified by 4 h and overnight digestion methods C- total peptides identified by 4 h and overnight digestion methods. D- total protein groups identified in the time course analysis E- total number of peptides identified in each time course experiment. HPLC conditions were kept constant for all runs: 2D, three fraction with reversed phase C18 column, C18 trap for desalting, and reversed phase C18 analytical column with 60 min gradient.

Overall, the sample incubated for 4 h during digestion yielded more protein groups and peptides compared to the manufacturer recommended incubation time. Therefore, to determine the optimum incubation time for protein digestion, we performed a time course analysis. The incubation time used in the experiment ranged from 2 h to 14 h. One aliquot was digested with a LysC (1:100 w/w)-trypsin mixture at 37 ˚C, as it has been reported to improve the digestion of proteins by cleaving the peptide bonds at the C-terminus of lysine.

We identified 415, 414, 408, 395, 362 and 306 protein groups in the 2 h, 4 h, 4 h with LysC, 6 h, 9 h and 14 h samples, respectively (Figure 2.4.D). Similar to the 4 h vs. overnight results, the results indicated a gradual decrease in peptide identifications over time. The samples incubated for a period of 2 h to 4 h with trypsin yielded more protein groups and peptides compared to samples incubated for a longer time. Further, the use of the trypsin-LysC mixture did not improve the proteome coverage. The 4 h incubation time generated the maximum number of high confidence peptides and the second highest number of protein groups, resulting in the best coverage for most protein sequences.

Lysine is one of the amino acids involved in formalin cross-linking of the proteins. The lysine to arginine ratio in FFPE tissues compared to fresh frozen tissues has been reported in previous studies.89,100 In FFPE tissue protein digests, the number of lysine-ending peptides is lower than the number of arginine-ending peptides. In agreement with what has been reported, our experiments also generated more arginine-ending peptides than lysine-ending peptides, as shown in Figure 2.5.A.
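The comparison of lysine-ending and arginine-ending peptides amounts to counting C-terminal residues in the list of identified peptide sequences. A minimal Python sketch follows; the function names and the example peptide list are hypothetical and do not represent data from this work.

```python
from collections import Counter
from typing import Dict, List, Optional


def c_terminal_counts(peptides: List[str]) -> Dict[str, int]:
    """Count peptides ending in lysine (K) and arginine (R)."""
    counts = Counter(p[-1] for p in peptides if p)
    return {"K": counts.get("K", 0), "R": counts.get("R", 0)}


def k_to_r_ratio(peptides: List[str]) -> Optional[float]:
    """Return the K-ending to R-ending peptide ratio, or None if no R-ending peptides."""
    counts = c_terminal_counts(peptides)
    if counts["R"] == 0:
        return None
    return counts["K"] / counts["R"]


# Hypothetical peptide list used only to illustrate the calculation.
example_peptides = ["AAAK", "GGGR", "LLLR", "DDD"]
example_counts = c_terminal_counts(example_peptides)  # {"K": 1, "R": 2}
```

A ratio below 1 corresponds to the situation described above, in which arginine-ending peptides outnumber lysine-ending peptides.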

Figure 2.5: A- Indicates the number of lysine-ending and arginine-ending peptides detected in the samples, B- The STRAP GO annotation analysis of proteins identified with a 4 h gradient. The figure indicates the proteins extracted and identified from different cellular compartments of the cells in FFPE tumor tissues.

Incomplete recovery of the cross-linked proteins during rehydration results in uncleavable lysines, producing few lysine-ending peptides. Another reason for the shift in K to R ratio is imine (+12 Da)100 modification of the lysine side chain with formaldehyde. Database searches often overlook this modification, yielding a low K-ending peptide count. However, adding this formalin-induced imine modification as a variable modification in the database search did not result in any additional peptides.

The coverage of the proteome plays a crucial role in discovery protein analysis. The lack of protein coverage in certain cellular fractions directly impacts the analysis by preventing the detection of biologically important proteins. We annotated all proteins identified by the 4 h digestion method using STRAP GO annotation to determine the proteome coverage based on cellular compartment (Figure 2.5.B). The majority of the proteins identified were cytoplasmic, nuclear or from the extracellular matrix. The annotations further indicated the ability of the extraction method to detect the proteins from the cell surface, plasma membrane and . The data reported in Figure 2.5.B are similar to what has been reported in the literature.89-91

The use of FFPE tissues is a challenge that MS-based biomarker studies face when using archived tumor samples. Even under optimum extraction and digestion conditions, most methods result in a lower number of protein groups and peptides than that of fresh frozen tissues. Therefore, careful optimization of the protocol is required from protein extraction to MS-based detection.

2.3.2 Optimization of the number of online fractions and the analytical separation time

Complex proteomic samples are often fractionated prior to LC-MS analysis to increase the peptide and protein identifications. In FFPE tissue-based analysis, this is typically achieved after digestion, off-line from liquid chromatography.90 Here, we applied fractionation prior to analytical separation using a second HPLC system that was in-line with our HPLC system. High pH fractionation is one of the fractionation methods used in complex sample analysis that manipulates the total charge of a peptide by introducing negative charges. This process alters the affinity of the peptides to reversed phase C18 columns, producing fractions of peptides that can be collected at different %acetonitrile concentrations as given in section 2.2.3.

The optimum number of fractions required for the analysis was determined by analyzing FFPE protein digests with 3, 5, 10 and 15 fractions. The experiment yielded 600 protein groups and 4283 peptides with 3-fraction, 958 protein groups and 7090 peptides with 5-fraction, and 1510 protein groups and 12059 peptides with 10-fraction multidimensional liquid chromatography runs of tissue digests. The 10-fraction FFPE cell line digests resulted in 2297 protein groups and 17797 peptides, and the 15-fraction method led to 2982 protein groups and 24267 peptides (Figure 2.6). Fraction numbers larger than fifteen were considered to be too time-consuming for practical consideration.

While the number of proteins identified grows with fraction number, the growth is not linear. Additional fractions would yield more proteins, but a 15-fraction run already takes a total of 24 hours when loading and trapping are included.
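
The diminishing return can be made concrete with the tissue-digest numbers reported for the 3-, 5- and 10-fraction runs. The following Python sketch is only an illustration of that arithmetic; the names `tissue_runs` and `marginal_gains` are hypothetical and not part of the original workflow.

```python
# Protein groups reported for the FFPE tissue digests (fractions -> protein groups).
tissue_runs = {3: 600, 5: 958, 10: 1510}


def marginal_gains(runs):
    """Return (from_n, to_n, extra protein groups per added fraction) for consecutive runs."""
    points = sorted(runs.items())
    gains = []
    for (n0, p0), (n1, p1) in zip(points, points[1:]):
        gains.append((n0, n1, (p1 - p0) / (n1 - n0)))
    return gains


# Average yield per fraction for each run.
for n, groups in sorted(tissue_runs.items()):
    print(f"{n:>2} fractions: {groups / n:.1f} protein groups per fraction")

# Additional protein groups gained per extra fraction between consecutive runs.
for n0, n1, gain in marginal_gains(tissue_runs):
    print(f"{n0} -> {n1} fractions: {gain:.1f} additional protein groups per added fraction")
```

Under these numbers, each added fraction between 5 and 10 fractions contributes fewer new protein groups than each added fraction between 3 and 5, which is the non-linear growth described in the text.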

The %acetonitrile concentration used in each fractionation method is given in Table 2.1. The increased number of high pH online fractions improved the number of protein groups and peptides identified in each sample. Likewise, the increase in fraction number from 3 to 15 increased the injection amount from 1.5 to 4.5 µg of protein digest, with an estimation of 0.3 µg per online fraction.

The goal of this analysis is to develop an LC-MS/MS method for shotgun mass spectrometry of small primary lung tumors. Most primary lung tumors are 1-2 cm in diameter. As previously presented in section 2.3.1, the maximum mass of proteins extracted from a stage III lung tumor specimen (glass slide) was 81 µg. The primary tumors are much smaller than a stage III tumor specimen; therefore, they will result in much lower amounts of protein. Considering this limitation in sample availability, we limited our maximum number of high pH fractions to fifteen in a multidimensional HPLC method. Further, this method resulted in approximately 3000 protein groups per run, which was similar to what has been reported in the literature.94

Figure 2.6: The number of protein groups and total peptides detected in each HPLC run. A- In each multi-fraction online HPLC method. The samples were fractionated on reversed phase C18 at pH 10.0 using different %acetonitrile amounts. The fractionation was achieved at 3%-65% acetonitrile. B- With the 10- and 15-fraction methods. The two analytical gradient times used in these HPLC runs were 60 min and 90 min, followed by a 30 min equilibration time.

The analytical separation time of an HPLC method determines the total analysis time for one sample. An optimized gradient results in the maximum number of peptide and protein identifications in a minimum amount of time. We optimized the gradient time for the HPLC runs using our 10- and 15-fraction HPLC methods. The methods used a 60 min and a 90 min gradient for two identical injections of tissue digest. As previously described, the total injection amount was determined based on the number of online fractions, with an estimation of 0.3 µg of protein digest per fraction.

The 15-fraction HPLC method produced 3129 protein groups with a 60 min gradient and 2384 protein groups with a 90 min gradient. A similar trend was observed with the 10-fraction HPLC method where more protein groups were identified with the 60 min method compared to that of the 90 min method (Figure 2.6.B). The decrease in protein groups observed with the multi-fraction long gradients could be a result of peptide degradation at the 35˚C column temperature after high pH separation. Considering the performance of the 15-fraction method with 60 min separation time, we propose this method to be used in lung tumor cohort analysis.

2.3.3 Optimization of tandem mass spectrometry analysis

Here we present our work on optimizing the mass spectrometry methods and respective instrument parameters. A high-resolution Orbitrap instrument was used in DDA mode. The Orbitrap scans all the m/z at high resolution and selects the top ‘N’ number of peaks from the precursor m/z ratios. The ion trap fragments the selected m/z ions via ion activation and acquires the fragment ion spectra for each precursor ion m/z.

Ion activation of the precursor generates fragment ions detected in the ion trap. Deconvoluted fragment ion spectra yield peptide sequences that are then assembled into protein groups. The efficiency of fragmentation and fragment ion detection influence the peptide identifications and subsequent number of total protein groups. CID and HCD are two common ion activation methods used in peptide fragmentation. We used both CID and HCD to determine the best fragmentation method for FFPE tissue protein identification.

The CID fragmentation yielded 10404 peptides and 1555 protein groups compared to HCD fragmentation that yielded 6558 peptides and 1027 protein groups (Figure 2.7.A), indicating the advantages of using CID in DDA analysis of FFPE protein digests.44 In CID, the parent ions are fragmented and analyzed in the ion trap. In HCD the parent ions are fragmented in the HCD-cell and detected in the Orbitrap (Figure 1.7). The lower HCD-based protein identifications could be a result of low ion transfer efficiencies to and from the HCD-cell and the longer duty cycle times required by the Orbitrap mass analyzer compared to that of ion trap MS/MS.

The dynamic exclusion window (DEW) is another important parameter that affects DDA-based protein identification. The use of an exclusion window prevents the recurrent detection of high abundance peptides in the sample. In an LC-MS analysis, an optimum DEW improves the number of protein group identifications by enabling the detection of mid and low abundance peptides that coelute with high abundance peptides in one chromatographic peak. The effect of the length of the DEW on protein identifications is often minimal; however, it may vary depending on the sample complexity and the separation method, as described previously.41,103

Figure 2.7: Optimization of the MS/MS method for detection of peptides and protein groups. A- The number of protein groups and peptides detected with the group 1 method with different dynamic exclusion windows. B- Protein groups and peptides detected by the experiments completed with old and new parameters. C- Number of MS/MS spectra acquired using parameter groups 1-3.

We verified the average width of a chromatographic peak to be between 15 and 30 s, as indicated in Figure 2.8. The DDA method with a 15 s DEW yielded 12204 peptides and 1870 protein groups compared to that of 30 s DEW that resulted in 10404 peptides and 1555 protein groups. In our experiment, the change in DEW presented a 16% increase in the number of total protein identifications.
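
To illustrate how a dynamic exclusion window interacts with top-N precursor selection, the following Python sketch simulates a simplified DDA cycle. It is a toy model under stated assumptions: the function `select_precursors`, the scan layout and the example intensities are hypothetical, and a real instrument method also considers mass tolerance, charge state, repeat count and exclusion list size.

```python
def select_precursors(scans, top_n, exclusion_s):
    """Simulate top-N precursor selection with a dynamic exclusion window.

    scans: list of (time_s, {mz: intensity}) survey scans in time order.
    Returns a list of (time_s, mz) precursors chosen for MS/MS.
    """
    excluded_until = {}  # mz -> time at which its exclusion expires
    selected = []
    for time_s, peaks in scans:
        # Keep only precursors that are not currently excluded.
        candidates = [
            (intensity, mz)
            for mz, intensity in peaks.items()
            if excluded_until.get(mz, float("-inf")) <= time_s
        ]
        # Fragment the top-N most intense remaining precursors.
        for intensity, mz in sorted(candidates, reverse=True)[:top_n]:
            selected.append((time_s, mz))
            excluded_until[mz] = time_s + exclusion_s
    return selected


# Hypothetical data: an abundant peptide (m/z 500.3) co-elutes with two weaker ones.
scans = [
    (0, {500.3: 1e7, 612.8: 2e5, 433.2: 1e5}),
    (5, {500.3: 2e7, 612.8: 3e5, 433.2: 1e5}),
    (10, {500.3: 1e7, 612.8: 2e5, 433.2: 2e5}),
]

without_dew = select_precursors(scans, top_n=1, exclusion_s=0)
with_dew = select_precursors(scans, top_n=1, exclusion_s=15)
print(without_dew)  # the abundant precursor is picked in every cycle
print(with_dew)     # the weaker co-eluting precursors are also sampled
```

In this toy example the 15 s window lets all three co-eluting precursors be fragmented within one chromatographic peak, whereas without exclusion only the most abundant one is sampled repeatedly.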

To optimize the mass spectrometry method, we implemented and tested various other parameters introduced in Table 2.2. The ion injection time, top ‘N’ number of precursors to select, automated gain control (AGC), scan mode and the number of microscans per spectrum are the parameters optimized for the ion trap. The resolution, preview scan on/off status and Orbitrap AGC are the parameters optimized for the Orbitrap mass analyzer. All DDA methods analyzed precursor m/z ratios in the Orbitrap at high resolution (R = 240000 or 120000 at m/z 400) and the fragments at low resolution in the ion trap.

For optimization of the instrument parameters, we used the generic MS method employed in the lab as indicated in group 1 of Table 2.2. First, we decreased the Orbitrap resolution from R = 240000 to R = 120000 to allow relatively short Orbitrap scan times. In group 2, we increased the ion trap injection time, allowing more ions to be accumulated in the ion trap prior to each tandem scan, which increases the sensitivity. Here, the number of precursor m/z used in the tandem scan was also increased from 10 to 12 with rapid ion trap scan rates. In group 3, we increased the Orbitrap AGC to 10^7, allowing the C-trap to send larger ion packets into the Orbitrap for detection, and reduced the number of microscans per ion trap tandem mass spectrum from 4 to 1. Here, the precursor selection was further increased to 15 m/z with an ion trap injection time of 50 ms.

Figure 2.8: The liquid chromatogram of FFPE tissue protein digest. The precursor mass spectrum acquired for the chromatographic peak highlighted in pink is given in the top right corner. All the m/z were acquired in the orbitrap mass analyzer at R=120000 at m/z 400.

As indicated in Figure 2.7.B, parameter group 3 resulted in the maximum number of protein groups (554), which is 10% more than that of the top-10 method used in group 1. Both the top-12 and top-15 methods with rapid ion trap scan modes and reduced Orbitrap MS resolution yielded about 500 more confident peptides than our generic top-10 (group 1) method. The reduced number of microscans used in the top-15 method further improved the number of MS/MS scans in the group 3 experiment. Overall, parameter group 3 with 50 ms ion trap ion injection time, 5 × 10^3 IT-AGC, 10^7 FT-AGC, top-15 precursor selection and one microscan per tandem spectrum in rapid ion trap scan mode produced the best results for 4.5 µg of tissue protein digests separated by multidimensional HPLC.

2.3.4 Limit of detection of the optimized LC/LC-MS/MS method and the use of spectral counts in differential expression analysis

Despite protein cross-links and the low protein yield of the sample, LC-MS methods are constantly improved for efficient tissue proteome analysis of FFPE specimens. Here, we present our data indicating the advantages of optimized HPLC fractionation coupled to MS detection in shotgun proteomics of cross-linked tumor samples.

We validated the detection limits of the method by spiking an internal standard protein, E. coli chaperonin, into aliquots of the FFPE protein extract at several concentrations (0.6 ng/mL - 60 µg/mL) prior to digestion. The concentrations of the spiked internal standard produced a wide protein dynamic range in the FFPE tissue extract.

The LC/LC-MS/MS method used fifteen front-end high pH fractions separated over a 60 min analytical gradient before MS detection. The MS method used top fifteen fragmentation with high-resolution precursor and low-resolution fragment ion spectra.

The optimized shotgun method indicated protein detection over four orders of magnitude (Figure 2.9.A). At the lowest concentration of the internal standard (0.6 ng/mL), the method yielded 10 spectral counts, indicating the lower detection limits of the method.

Figure 2.9: A- The log10 spectral counts detected for the spiked internal standard E. coli chaperonin protein. The concentration of the protein is varied from 0.6 ng/mL-60 µg/mL. B- The division of spectral counts into high (>100), mid (99-10) and low (<10) counts per protein detected in the 0.6 ng/mL experiment.

In shotgun proteomics, the most significant changes between different sample groups are often determined by statistical analysis. This analysis uses the intensity or the spectral count measurements reported for each protein group, which reflect the absolute and the relative abundance of proteins in the sample, respectively. However, the dynamic range of proteins creates a nonlinear range for spectral counts, and most proteins yield fewer than 20 counts, indicating a negative binomial distribution. The large number of protein groups with low spectral counts brings the sample mean close to one count. Further, spectral counts below 5 are often subject to high variability and do not produce statistically meaningful data. As a result, spectral count filters are often applied to reduce this variation in the data.

In most differential expression analyses, the protein groups with fewer than 5 spectral counts are removed during filtering. We divided the results obtained for the 15-fraction sample into different groups based on the total number of spectral counts. As indicated in Figure 2.9.B, about 24% of the 2982 protein groups had fewer than 5 spectral counts. After the filtering, we obtained a total of 2266 protein groups available for differential expression analysis. As specified earlier, about 51% of the proteins resulted in fewer than 10 spectral counts compared to the top 5% of the protein groups with more than 100 spectral counts.
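
The binning and filtering described here can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study: the function names and the example counts are hypothetical, and a count of exactly 100 is assigned to the high bin as an assumption, since the figure legend only names ">100" and "99-10".

```python
def bin_spectral_counts(counts):
    """Group protein groups into low (<10), mid (10-99) and high (>=100) spectral count bins."""
    bins = {"low": [], "mid": [], "high": []}
    for protein, n in counts.items():
        if n < 10:
            bins["low"].append(protein)
        elif n < 100:
            bins["mid"].append(protein)
        else:
            bins["high"].append(protein)  # assumption: exactly 100 counts as high
    return bins


def filter_min_counts(counts, minimum=5):
    """Keep only protein groups with at least `minimum` spectral counts."""
    return {protein: n for protein, n in counts.items() if n >= minimum}


# Hypothetical spectral counts for five protein groups.
example = {"P1": 250, "P2": 42, "P3": 9, "P4": 4, "P5": 1}
print(bin_spectral_counts(example))
print(filter_min_counts(example))

# Consistency check on the numbers reported in the text.
removed = 2982 - 2266
print(removed, round(100 * removed / 2982))  # protein groups removed, percent of total
```

The last line reproduces the reported figures: removing the protein groups with fewer than 5 counts takes 716 of 2982 protein groups, or about 24%.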

2.4 Conclusions

The potential contribution of FFPE tissues in retrospective protein analysis of tumor samples is invaluable. These are easily accessible samples; however, they are not directly compatible with generic mass spectrometry-based bottom-up proteomic methods. This part of the dissertation addresses this issue by providing modified sample preparation, separation and detection methods. We optimized our methods to achieve the most effective deparaffinization and digestion in a minimum amount of time, yielding the maximum number of protein identifications. The on-slide deparaffinization method resulted in the maximum tissue recovery for charged glass slide specimens. Our modified protein extraction buffer eliminated detergent removal and protein precipitation steps by simply replacing SDS with RapiGest. In addition, the use of the modified buffer reduced the digestion time by ten hours, resulting in fast sample preparation times. The improved high pH reversed phase peptide prefractionation method produced about 3000 protein groups for 4.5 µg of tissue protein extract. The use of fifteen high pH fractions coupled to a top-fifteen DDA method yielded more coverage of the tumor proteome, identifying proteins over four orders of magnitude. Finally, the experiments demonstrated the ability of the modified protocol to detect proteins in FFPE tissue specimens that can be used in tumor cohort protein analysis.

Chapter 3: The Proteomic Analysis of Patient Tumor Samples to Identify Marker Proteins in Recurrent and Nonrecurrent Lung Cancer

3.1 Introduction

Lung cancer is the leading cause of cancer mortality in both men and women accounting for 14% of the total cancer incidences. Non-small cell lung cancer (NSCLC) is the most predominant form of lung cancer with a high mortality and low 5-year survival rate of approximately 15%. While NSCLC is often diagnosed at late stages, early diagnosis and surgery are the best chance of cure. However, despite early diagnosis and surgery, almost half of those patients recur and die from their disease. Therefore, there is a demand for prognostic markers that can identify tumor recurrence and metastasis.

High-throughput genomic sequencing has led to the discovery of DNA alterations in the . Identification of driver mutations in genes such as ALK and EGFR104 has provided diagnostic markers and initiated novel treatment regimens.105 However, in most cases the cause of the cancer is unknown and even less is known about why cancers return.

Transcription and translation of the genome produce proteins which are very tightly regulated within a cell. Mutations in DNA may lead to the loss or gain of proteins that may have dramatic effects on the way the cell behaves. In addition to losses and gains of proteins, there may be other changes in post-translational modifications such as phosphorylation, which regulates many aspects of a protein’s function including interactions with other proteins, enzyme activity, and stability. In addition, alterations in DNA may cause changes in microRNAs, which can also lead to changes in protein levels by affecting transcript levels and translation. Genomic, transcriptomic and proteomic techniques provide unique information (Figure 1.1) useful in discovering the molecular biological changes in a malignant tumor. When combined, these multi-omic analyses yield tumor-specific pathways with clinical significance.

Figure 3.1: The flow of information available with different omic experiments. The combination of proteomics with a general mRNA workflow provides information of the outcome of transcription, translation, degradation and regulation. Reprinted with permission.106

In a previous study, the differential expression analysis of adenocarcinoma tumor mRNA reported multiple prognostic markers including insulin-like growth factor binding protein 3 (IGFBP3) and keratin 7 (KRT7).107 NSCLC patients that presented hypermethylation of IGFBP3 had a poor prognosis.108 The altered expression of cell junction protein genes has been seen in different types of cancers including NSCLC.109 Genes of anchoring junction proteins, such as those of desmosomal and adherens junctions, are prognostic markers for lung squamous cell tumor metastasis and invasion.110 However, few efforts have been made to validate the respective changes at the protein level that may lead to targetable protein-protein networks.

In recent years mass spectrometry (MS) has shown great potential in discovering protein level changes in cultured cells and tissue. The advancements in instrumentation and separation techniques improved the coverage of the proteome leading to the discovery of high, mid and low abundance proteins. With these technical improvements, we can delve deeper into the proteome and propose studies to identify changes in thousands of proteins in normal and tumor tissue or in more aggressive versus less aggressive tumors.

The combination of high performance liquid chromatography (HPLC) coupled to mass spectrometry provides an ideal platform for tissue analysis. Often, the number of proteins identified by an LC-MS platform is limited by the complexity of the sample and the dynamic range of the proteins. Therefore, most tissue digests are pre-fractionated prior to analytical separation. High pH liquid chromatography fractionation is one such method that has gained popularity over the past few years.

The goal of this study is to use the shotgun proteomics technique to establish a protein expression signature capable of identifying early stage tumors destined to relapse. Further, we proposed to evaluate such protein level changes in previously generated mRNA data. The overall objective is to discover prognostic markers that could distinguish high-risk patients who need additional, more aggressive adjuvant therapy following surgery from those who do not need aggressive treatments.

To investigate the protein level changes exhibited by highly invasive non-small cell lung tumors, we undertook a study of twenty-one malignant tumors obtained from Washington University (St. Louis, MO). The tumor cohort contained tissue specimens collected from patients that survived less than 2 years (recurrent) or survived more than 5 years (nonrecurrent). The objective was to identify protein level alterations associated with tumor recurrence that could lead to signature protein changes, and to assess whether there is an mRNA to protein correlation.

The mass spectrometry analysis of these tumor samples yielded over 5000 protein groups in both adenocarcinoma (ADC) and squamous cell carcinoma (SCC) samples. The analysis of both ADC and SCC tumor groups revealed protein expression changes in the respective recurrent group compared to the nonrecurrent group. The adenocarcinoma tumors revealed significant changes in IGFBP family proteins. The squamous tumor subtypes showed multiple tumor-related proteins that can be used as potential recurrence markers. The most significant changes were reported in cell junction proteins. The desmosomal proteins presented a unique expression pattern at the protein level in SCC which was not observed in other cell-cell junction proteins. Having marker proteins like desmosomal proteins to recognize early recurrence might improve the treatment options for high-risk patients, which may eventually lead to longer 5-year survival rates.

3.2 Materials and Methods

3.2.1 The dissection of tumors and extraction of proteins

All samples were randomly selected for the analysis. The FFPE tumor tissue samples were macrodissected by removing the tissue adjacent to the tumor margin and deparaffinized in xylene followed by rehydration in a graded ethanol series as previously described.83 Each tissue specimen was homogenized in a modified lysis buffer containing 0.2% RapiGest (Waters Corporation, Milford, MA) in 50 mM ammonium bicarbonate. The lysates were incubated at 105 ˚C for 30 min and stored on ice for 5 min. The samples were sonicated using a Sonic Dismembrator Model 100 (Fisher Scientific) at 20 W for 5 min with 20 s intervals per 30 s. This sonication step was repeated twice, and the samples were incubated at 70 ˚C for 2 h. The protein concentration of each lysate was determined by BCA protein assay (Thermo Fisher Pierce, Rockford, Illinois) using the manufacturer’s protocol.

3.2.2 Digestion of protein sample

A total of 100 µg of tissue proteins were reduced with 50 mM DTT at 60 ˚C for 30 min followed by alkylation with 100 mM iodoacetamide in the dark at room temperature for 20 min. The samples were digested with sequencing grade trypsin (Promega Corporation, Madison, WI) at a ratio of 1:50 (w:w) and 0.01% RapiGest SF surfactant (Waters Corporation) at 37 ˚C for 4 h. The samples were acidified with 0.5% TFA and incubated at 37 ˚C for 30 min followed by centrifugation at 14000 g for 15 min. The supernatant was collected and evaporated to dryness in a Speed-Vac concentrator (Thermo Scientific). The samples were stored at -80 ˚C until LC/LC-MS/MS analysis.

3.2.3 The online fractionation and LC-MS/MS analysis of tumor protein digests

For the analysis of the tissue cohort (21 samples), liquid chromatography coupled to tandem mass spectrometry was performed using a Waters nanoACQUITY two-dimensional (2D) UHPLC system (Waters Corporation, Milford, MA). The UPLC system, containing two reversed phase columns, was coupled to a Thermo LTQ-Orbitrap Elite hybrid mass spectrometer (Thermo Fisher Scientific, Bremen, Germany).

A total of 8 µg of protein digest reconstituted in 100 mM ammonium formate was injected using an Acquity UPLC autosampler (Waters Corporation, Milford, MA) and the peptides were fractionated online at high pH prior to analytical separation. The fractionation of peptides was achieved in the first reversed phase column (Waters BEH C18, 130 Å, 1.7 µm particle diameter, 300 µm i.d., 100 mm) at pH 10.0 in buffer A1 (20 mM ammonium formate) by varying the amounts of solvent B1 (100% acetonitrile). The column was equilibrated at 3% B1 (v/v), which was increased to 4.7% (v/v) in 1 min eluting the first fraction of peptides and decreased back to 3% (v/v) B1 in the next 4 min. The column was held at 3% (v/v) B1 during analytical separation on a low pH RP column at a steady flow rate of 2 µL/min. The solvent % B1 (v/v) was increased from 3% to 4.7% in fraction 1. This was gradually increased to: 9.0%, 10.8%, 12.0%, 13.1%, 14.0%, 14.9%, 15.8%, 16.7%, 17.7%, 18.9%, 20.4%, 22.2%, 25.8% and to 65% over fifteen fractions. Each fraction eluted from the fractioning column was loaded onto a Waters Symmetry C18 trap column (100 Å, 5 µm particle diameter, 180 µm x 20 mm) and desalted at a flow rate of 20 µL/min.
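
The fifteen elution levels listed above can be written as a small configuration table. The Python sketch below is only an illustration (the names `FRACTION_B1` and `level_increments` are hypothetical); it checks that fifteen levels are listed and shows how much the %B1 level rises from one fraction to the next.

```python
# %B1 (v/v) level used to elute each high pH fraction, as listed in the text.
FRACTION_B1 = [4.7, 9.0, 10.8, 12.0, 13.1, 14.0, 14.9, 15.8,
               16.7, 17.7, 18.9, 20.4, 22.2, 25.8, 65.0]


def level_increments(levels, start=3.0):
    """Rise in %B1 relative to the previous elution level (the first is relative to 3%)."""
    previous = [start] + levels[:-1]
    return [round(b - a, 1) for a, b in zip(previous, levels)]


print(len(FRACTION_B1))               # number of fractions
print(level_increments(FRACTION_B1))  # increment in %B1 for each fraction
```

The increments are smallest in the middle of the series (about 0.9% B1 per fraction), with a single large final step to 65% B1.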

The analytical separation was achieved in the second reversed phase column (Waters HSS T3, C18, 100 Å, 1.8 µm particle diameter, 75 µm i.d. x 150 mm) at pH 2.4 which was equilibrated to initial conditions: 95% (v/v) A2 (water with 0.1% formic acid) and 5% (v/v) B2 (acetonitrile with 0.1% formic acid). The subsequent separation was achieved by four linear gradients at 38 ˚C where the % B2 was increased from 5%-9% in 3 min; 9%-30% over 44 min; 30%-40% over 5 min and 40%-85% over 5 min at a flow rate of 0.5 µL/min. The column was held at 5% (v/v) B2 from 65-70 min before reaching initial conditions.
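
As a sketch of the analytical gradient, the four linear ramps can be expressed as a piecewise-linear function of time. The code below is illustrative only: `RAMPS` and `percent_b2` are hypothetical names, and only the four ramps stated in the text are modeled, not the wash and re-equilibration steps that follow them.

```python
# (duration in min, start %B2, end %B2) for the four linear ramps described in the text.
RAMPS = [(3, 5, 9), (44, 9, 30), (5, 30, 40), (5, 40, 85)]


def percent_b2(t_min):
    """Return %B2 at time t_min (minutes from gradient start) within the four ramps."""
    start = 0.0
    for duration, b_start, b_end in RAMPS:
        end = start + duration
        if start <= t_min <= end:
            # Linear interpolation inside the current ramp.
            return b_start + (b_end - b_start) * (t_min - start) / duration
        start = end
    raise ValueError("time is outside the ramps modeled here")


total = sum(duration for duration, _, _ in RAMPS)
print(total)             # total ramp time in minutes
print(percent_b2(25.0))  # %B2 midway through the 9%-30% ramp
```

The four ramps add up to 57 min, and most of that time (44 min) is spent on the shallow 9%-30% B2 segment where the majority of peptides elute.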

The 2D HPLC was coupled to an LTQ-Orbitrap Elite via a nanospray Flex ion source (Thermo Fisher Scientific, Bremen, Germany) containing a 30 µm inner diameter stainless steel emitter (Thermo Fisher Scientific) with spray voltage between 1.7-1.8 kV. The Orbitrap mass spectrometer was operated in data-dependent acquisition mode, where the top fifteen MS/MS scans were acquired for every full MS-scan.

The full MS-scan was acquired in the Orbitrap MS-analyzer with resolution R = 120,000 at m/z 400 for every 10^7 charges acquired in the ion trap MS-analyzer. Collision induced dissociation (CID) was used for peptide fragmentation. The method triggered MS/MS scans for the top fifteen most abundant m/z peaks at an automated gain control (AGC) target value of 5000 charges. The method was programmed with an ion transfer tube temperature at 275 ˚C; S-lens RF 55%; dynamic exclusion with a repeat count 1 and repeat duration of 15 s for an exclusion list size of 500 mass-to-charge ratios; CID with normalized collision energy of 35%, q= 0.25 and activation time of 10 ms; and minimum intensity threshold set to 6000 counts.

3.2.4 Data processing and protein identification

For protein identification, Myrimatch71 version 2.1.111 was used with a customized RefSeq human database (version 54). The raw files generated by Xcalibur software (Thermo Fisher Scientific) for all fifteen fractions of each protein digest were used in the peptide identification. The MS/MS spectra were searched with fixed carbamidomethyl modification at cysteine, and variable acetylation at protein N-termini, oxidation of methionines and deamidation at asparagine and glutamine (only for validation cohort data). A maximum of two missed cleavages was allowed for every fully tryptic peptide, and the proline rule was applied so that lysines and arginines preceding a proline were not treated as missed cleavages. The minimum peptide length was six amino acids.
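
The digestion rules used in the search (fully tryptic peptides, the proline rule, at most two missed cleavages and a minimum length of six residues) can be illustrated with a short in-silico digest. This Python sketch is not the Myrimatch implementation; the function `tryptic_peptides` and the example sequence are hypothetical.

```python
import re


def tryptic_peptides(sequence, max_missed=2, min_length=6):
    """In-silico fully tryptic digest: cleave after K or R, but not before P."""
    # Split after every K/R that is not followed by P (the proline rule).
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = set()
    for i in range(len(fragments)):
        for missed in range(max_missed + 1):
            if i + missed >= len(fragments):
                break
            # Joining neighbouring fragments models `missed` missed cleavages.
            peptide = "".join(fragments[i : i + missed + 1])
            if len(peptide) >= min_length:
                peptides.add(peptide)
    return peptides


# Hypothetical sequence: the first K is followed by P and is therefore not a cleavage site.
example_sequence = "AAAAAAKPGGGGRCCCCCCKDDK"
for peptide in sorted(tryptic_peptides(example_sequence)):
    print(peptide)
```

In this example the K-P bond stays intact, the short C-terminal fragment DDK is discarded on its own by the length filter, and it only appears as part of longer peptides that contain missed cleavages.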

The data were filtered in IdPicker software version 3.0.504.111 The proteins present in each sample were identified with a peptide false discovery rate (FDR) of 1% and a protein FDR of 4.45%. Protein groups were filtered only to include proteins with a minimum of two peptides and with two spectra required per peptide. For proteogenomic analysis, protein groups identified in each sample were grouped based on the gene group and the respective number of spectral counts for each gene group per patient was recorded.

3.2.5 The bioinformatics analysis of the samples

Raw peptide spectral counts were filtered to remove low-count spectra (i.e. 0 or 1 for most samples) and normalized across samples by the median-of-ratios method used in the DESeq2 package.112 Spectral counts were modeled based on the assumption of a negative binomial distribution to account for the overdispersion. Generalized linear models (GLM) with the logarithmic link were used for the two comparisons (recurrent vs. non-recurrent SCC and ADC, SCC vs. ADC). Empirical Bayes shrinkage estimation for dispersions and fold changes, as implemented in the R DESeq2 package,112 was used to improve the stability of the estimates. The Wald test was applied for testing differential expression of individual proteins between the two groups of interest. A p-value cut-off criterion was used to control the expected number of false positives.113
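
The median-of-ratios normalization can be sketched in plain Python to show what the size factors represent. This is a simplified re-implementation for illustration, not the DESeq2 R code used in the study; the function names and the example counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: dict mapping protein group -> list of spectral counts (one per sample).
    Protein groups with a zero in any sample are skipped when forming the reference.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for values in counts.values():
        if min(values) <= 0:
            continue
        # The geometric mean across samples acts as a pseudo-reference sample.
        log_geo_mean = sum(math.log(v) for v in values) / n_samples
        for j, v in enumerate(values):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide each sample's counts by its size factor."""
    factors = size_factors(counts)
    return {p: [v / f for v, f in zip(values, factors)] for p, values in counts.items()}


# Hypothetical counts: sample 2 was measured at twice the depth of sample 1.
example_counts = {"A": [10, 20], "B": [4, 8], "C": [30, 60], "D": [0, 3]}
print(size_factors(example_counts))
print(normalize(example_counts)["A"])
```

In this example the second size factor is twice the first, so the normalized counts of protein group A become equal in both samples, which is the intended effect of removing differences in overall sampling depth.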

3.2.6 Pathway Analysis

The pathway analysis of proteins with significant differential expression (fold-change ≥ 2, p-value < 0.05) in any two selected sample groups was performed by Ingenuity Pathway Analysis (IPA). The pathway interactions were predicted using the IPA pathway/protein network database.

3.3 Results and Discussion

3.1.1 Identifying the peptides and proteins in patient tumor samples

The proteomic analysis of patient tumor samples was performed on twenty-one lung tumors obtained from Washington University (St. Louis, MO). The patient sample cohort contained 8 adenocarcinoma samples with 5 recurrent (patients who lived less than 2 years) and 3 nonrecurrent tumors (patients who lived longer than 5 years). Similarly, the squamous tumor cohort contained two sample groups with 3 recurrent and 9 non-recurrent tumors. All tumors selected for the studies were primary lung tumors of stage I and II. Table 3.1 shows the de-identified patient information for all tumor samples including gender, age at death, site of relapse, time to relapse, dead or alive status, and metastasis.

We removed the tissues surrounding the tumor margin prior to protein extraction based on the H&E slides (Appendix B, Figure B.1) prepared at the same time as the FFPE slide specimens. The proteins were extracted from the deparaffinized tissues as described in Section 3.2.1. The method presented in Chapter 2 with 15 high pH fractions coupled to reversed phase analytical separation was used prior to tandem mass spectrometry product ion scans. Each sample run acquired 15 raw files representing the fractions. The peptides were identified through the BumberDash114 user interface and Myrimatch search algorithm. The protein identification used a cumulative set of 15 raw files per patient tumor sample. The data search resulted in peptides at 1% peptide FDR.

Histology  Patient  Gender  Age at death  Overall Stage  Site of Relapse  Relapsed in (years)  0=Dead 1=Alive  OS (Years)  Metastasis
Adeno      WU1511   male    63.55         T1a            -                -                    1               5.31        -
           WU1510   male    54.44         T2a            -                -                    0               5.63        -
           WU1519   female  46.66         T2a            -                -                    1               5.60        -
           WU1515   male    73.08         T1a            Brain            0.62                 0               0.71        Brain
           WU1500   female  74.39         T2a            Lung             0.38                 0               0.71        Lymph Node
           WU1516   female  65.64         T2a            Lung             0.68                 0               1.74        -
           WU1508   male    73.92         T1b            -                -                    1               1.14        -
           WU1501   female  44.15         T1b            -                -                    0               1.09        -
Squamous   WU1506   male    70.78         T1b            -                -                    0               7.38        -
           WU1518   male    71.67         T2a            -                -                    0               7.80        -
           WU1502   male    65.82         T2b            -                -                    0               9.91        -
           WU1507   male    52.24         T1a            -                -                    0               8.84        -
           WU1504   male    66.52         T1a            -                -                    0               5.21        -
           WU1503   male    58.18         T1a            -                -                    1               11.79       -
           WU1513   female  77.03         T1a            -                -                    1               9.39        -
           WU1520   male    62.46         T1a            -                -                    0               8.77        -
           WU1512   male    53.45         T1a            -                -                    1               8.69        -
           WU1509   male    53.96         T1b            -                -                    0               0.82        -
           WU1514   male    68.13         T1b            -                -                    0               0.08        -
           WU1517   male    62.18         T1b            -                -                    0               0.04        -

Table 3.1: The patient clinical information. The overall stage is given based on AJCC 7th edition. T1a- stage I tumors <2 cm, T1b- stage I tumors from >2-3 cm, T2a-stage II tumors from >3-5 cm, T2b- stage II tumors from >5-7 cm

Figure 3.2: The number of protein groups identified in A- adenocarcinoma patient samples. Numbers 1500, 1501, 1508, 1515 and 1516 represent the recurrent group and 1510, 1511 and 1519 represent the non-recurrent group of patients. B- Squamous cell carcinoma patient samples. Numbers 1509, 1514 and 1517 represent the recurrent group, and 1502, 1503, 1504, 1506, 1507, 1512, 1513, 1518 and 1520 represent the non-recurrent group of patients.

Figure 3.3: The Venn diagram of protein groups identified (filtered to include only high confidence peptides with a minimum of 2 peptides per protein) in A- squamous cell carcinoma vs. adenocarcinoma sample groups B- adenocarcinoma recurrent and non-recurrent samples C- squamous cell carcinoma recurrent and non-recurrent samples.

The Myrimatch algorithm used 226,484 filtered spectra to identify 49,778 distinct matches leading to 30,656 peptides in the adenocarcinoma cohort. The deconvolution and sequence identification of the squamous cohort data used 775,595 filtered spectra resulting in 86,974 distinct matches to identify 46,163 peptides. At a protein group FDR of 2.84%, the search led to 5680 protein groups in adenocarcinoma and 6574 protein groups in squamous cell carcinoma cohorts. The analysis of patient samples yielded a total of 5689 protein clusters, 6652 protein groups, 22,151 proteins belonging to 13,550 genes and 6581 gene groups.

The final dataset after filtering presented 81% overlap in the ADC and SCC cohort protein groups. Further, ADC data contained 961 unique recurrent and 75 unique nonrecurrent protein groups with 78% overlapping protein groups. SCC samples yielded 81 unique recurrent and 550 unique non-recurrent protein groups indicating an 89% overlap between the two samples. All protein groups were filtered only to contain high confidence peptides with a minimum of two peptides per protein.

3.3.1 Differential expression analysis of the tumor tissue proteome

The protein level changes were determined from the spectral counts reported per protein group. The spectral counts before and after DESeq normalization are given in Figure 3.4. The samples are labeled by the original patient number and experimental batch number. As indicated in the box plots (Figure 3.4), the majority of the spectral counts reported were below 6. Patient sample 1505 presented the minimum number of protein groups and the maximum number of zero counts per protein group. Similarly, the mRNA data for 1505 reported an extremely low number of identifications. Therefore, we eliminated both the protein and the RNA level data associated with that sample. Further, the protein group filter used in the statistical analysis removed protein groups with fewer than four non-zero values reported per sample cohort.
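The normalization and filtering steps described above can be sketched in a few lines. This is an illustrative re-implementation in Python, not the DESeq R code used in the study: it assumes the median-of-ratios size factor estimate that DESeq applies to count data, and the function names, sample labels, and toy counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts maps a sample label to its list of spectral counts, one entry per
    protein group, in the same protein group order for every sample.
    """
    samples = list(counts)
    n_groups = len(counts[samples[0]])
    # Log geometric mean per protein group, using only groups that were
    # observed (count > 0) in every sample.
    reference = {}
    for i in range(n_groups):
        row = [counts[s][i] for s in samples]
        if all(c > 0 for c in row):
            reference[i] = sum(math.log(c) for c in row) / len(row)
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][i]) - ref for i, ref in reference.items()]
        factors[s] = math.exp(median(log_ratios))
    return factors


def normalize(counts):
    """Divide every count by the size factor of its sample."""
    factors = size_factors(counts)
    return {s: [c / factors[s] for c in col] for s, col in counts.items()}


def passes_nonzero_filter(cohort_values, min_nonzero=4):
    """Keep a protein group only if enough samples in the cohort are non-zero."""
    return sum(1 for v in cohort_values if v > 0) >= min_nonzero


# Toy data: sample_b was measured at twice the depth of sample_a.
raw = {"sample_a": [2, 4, 8, 0], "sample_b": [4, 8, 16, 3]}
normalized = normalize(raw)
print(normalized)
```

In the toy data the two samples differ only by a factor of two in depth, so the first three protein groups take identical values after normalization.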


[Figure 3.4: box plots of spectral counts (vertical axes) against sample number based on the analysis group (horizontal axis).]

Figure 3.4: The box plot of the distribution of spectral counts in each patient sample before and after DESeq normalization. In the box plots, each quantile is separated by the sample spectral count mean (solid black line). The colors represent different subcategories of samples in the ADC and SCC sample groups. Black- ADC non-recurrent, red- ADC recurrent, green- SCC non-recurrent, and blue- SCC recurrent.

Principal component analysis of the samples indicated two distinct groups for the ADC and SCC tumor cohorts. The ADC recurrent and non-recurrent sample groups presented one cluster, while SCC presented separate sample clusters for recurrent and nonrecurrent samples (Figure 3.5). The statistical analysis of the protein data featured proteins with significant protein level changes (p<0.05) in recurrent samples compared to nonrecurrent samples. The three recurrent samples in the SCC sample group with T1b tumors yielded two different PCA clusters. Patient samples 1514 and 1517 shared more protein level similarities with each other than with 1509.

Figure 3.5: Principal component 1 (PC 1) vs. principal component 2 (PC 2) of the samples. ADC- adenocarcinoma, SCC- squamous cell carcinoma, NR- non-recurrent, R- recurrent. The squamous cell carcinoma tumor cohort presents separate clusters for the recurrent and non-recurrent tumor groups.


The differential expression analysis was performed using the original sample grouping: recurrent and non-recurrent in ADC and SCC. The analysis generated 19 differentially expressed (p<0.01) protein groups in the ADC tumor samples and 98 in the SCC tumor samples (Figure 3.6). A total of 23 protein groups were significantly up-regulated in recurrent samples compared to non-recurrent samples. These consisted of 5 unique ADC protein groups, 16 unique SCC protein groups, and 2 overlapping protein groups. The number of protein groups down-regulated in SCC samples was much higher than in ADC: the data revealed 77 unique SCC protein groups, 9 unique ADC protein groups, and 3 protein groups common to both ADC and SCC (Figure 3.6).
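The counts in the preceding paragraph can be checked against each other with a short tally. The sketch below only restates the numbers reported in the text (5, 16, and 2 up-regulated; 9, 77, and 3 down-regulated); the variable names are illustrative.

```python
# Protein group counts as reported for Figure 3.6.
up = {"ADC_only": 5, "SCC_only": 16, "shared": 2}
down = {"ADC_only": 9, "SCC_only": 77, "shared": 3}


def cohort_total(cohort):
    """Differentially expressed protein groups in one cohort (unique + shared)."""
    key = f"{cohort}_only"
    return up[key] + up["shared"] + down[key] + down["shared"]


total_up = sum(up.values())
total_down = sum(down.values())
adc_total = cohort_total("ADC")
scc_total = cohort_total("SCC")
print(total_up, total_down, adc_total, scc_total)
```

The per-cohort totals obtained this way agree with the 19 ADC and 98 SCC protein groups stated above.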

Figure 3.6: Differentially expressed proteins (p<0.01) identified in recurrent vs. nonrecurrent samples of the adenocarcinoma and squamous cell carcinoma sample groups. Proteins were analyzed with the DESeq R package. All “up” proteins are up-regulated, and all “down” proteins are down-regulated, in recurrent samples relative to non-recurrent samples.


Based on the results it is evident that squamous cell carcinoma samples have more protein changes, which allow us to develop a more robust signature to distinguish between those patients that will recur and those that are nonrecurrent.

3.3.2 Recurrent vs. non-recurrent adenocarcinoma tumors

The two subgroups of adenocarcinoma samples used in the study contained 5 recurrent samples and 3 nonrecurrent patient tumor samples. The heat map was generated for the protein groups identified in the samples using MeV 4.9.0 (http://www.tm4.org).

The unsupervised clustering of the normalized data did not reveal clearly identifiable groups for recurrent and non-recurrent samples. Among the non-recurrent subgroup, consisting of samples 1510, 1511, and 1519, only the protein expression of 1519 departed from the main sample cluster. Further, 1510 and 1511 showed similarities to two of the recurrent samples, 1500 and 1516. These results are consistent with those of the PCA analysis.

However, the differential expression analysis and the clustering here are based on a limited number of data sets that only contained 3 nonrecurrent samples. The nonrecurrent tumor specimen group contained 2 stage II tumors and one stage I tumor, and one of the stage II tumors was collected from a female patient. Therefore, the differences we observed in protein expression levels presented on the heat map may be a representation of the gender-based biology of the selected individuals.


Figure 3.7: The heat map of protein groups identified in adenocarcinoma patient samples. The heat maps generated in MeV 4.9.0 used both supervised and unsupervised clustering. A- the clustering of the samples identified in unsupervised clustering, ‘R’ represents the recurrent patient tumor samples B- the heat map of the samples after supervised clustering


The DESeq analysis of the tumor samples yielded 76 differentially expressed (p<0.05) protein groups in the adenocarcinoma sample subtypes (Table 3.2). Twenty-four proteins in recurrent ADC samples showed significantly (p<0.05) reduced protein expression levels compared to the nonrecurrent tumor group. Three of the down-regulated proteins in recurrent samples are from the superfamily of insulin-like growth factor binding proteins (IGFBP): IGF2BP1, IGF2BP2, and IGF2BP3. This superfamily of proteins can bind to insulin-like growth factors (IGF), modulating their binding to type I IGF receptors. Overall, by controlling the IGF levels, these IGFBPs regulate insulin-dependent signaling in cells.

In the analysis, all three of these protein groups showed a more than 4-fold change in protein abundance between recurrent and nonrecurrent patient tumor tissue samples. Although the role of IGFBPs in lung cancer is not completely understood, inhibition of growth and proliferation has been observed with increased levels of these proteins.115,116 In particular, hypermethylation of the IGF2BP3 gene, which has been reported in non-small cell lung cancer, is known to be associated with poor prognosis.117 As presented in our data, the protein level changes in the IGFBP superfamily in nonrecurrent patient tumors correlate directly with patient survival. We hypothesize that this may occur through IGF-dependent or IGF-independent signaling pathways in tumor cells.

In adenocarcinoma tumor subtypes, 11 protein groups presented significantly (p<0.05) up-regulated protein levels in recurrent compared to nonrecurrent patient tumors. Vimentin is one such target detected in this shotgun analysis of recurrent vs. nonrecurrent ADC tumors.

Table 3.2: The differentially expressed proteins in lung adenocarcinoma samples. R- recurrent, NR- nonrecurrent, FC- fold-change.

Accession  Gene       FC (R/NR)  p-Value   Description
Q9Y6M1     IGF2BP2    -10.       4.0E-05   Insulin-like growth factor 2 mRNA-binding protein 2
P55011     SLC12A2    -6.3       0.0021    Solute carrier family 12 member 2
Q9NZI8     IGF2BP1    -5.9       0.0029    Insulin-like growth factor 2 mRNA-binding protein 1
Q9UKS6     PACSIN3    -5.3       0.0033    Protein kinase C and casein kinase substrate in neurons protein 3
O00425     IGF2BP3    -5.0       0.0067    Insulin-like growth factor 2 mRNA-binding protein 3
P24043     LAMA2      -3.9       0.021     Laminin subunit alpha-2
Q92506     HSD17B8    -3.6       0.017     Estradiol 17-beta-dehydrogenase 8
Q92626     PXDN       -3.6       3.0E-02   Peroxidasin homolog
Q96PU5     NEDD4L     -3.5       0.036     E3 ubiquitin-protein ligase NEDD4-like
Q9BUT1     BDH2       -3.4       3.0E-02   3-hydroxybutyrate dehydrogenase type 2
Q8NFU3     TSTD1      -3.4       0.038     Thiosulfate sulfurtransferase/rhodanese-like domain-containing protein 1
Q14767     LTBP2      -3.3       0.019     Latent-transforming growth factor beta-binding protein 2
Q9P2M7     CGN        -3.3       0.036     Cingulin
Q53EL6     PDCD4      -3.3       0.031     Programmed cell death protein 4
P62861     FAU        -3.3       0.036     40S ribosomal protein S30
Q9NZN4     EHD2       -3.2       0.015     EH domain-containing protein 2
P51649     ALDH5A1    -3.1       0.049     Succinate-semialdehyde dehydrogenase, mitochondrial
O43175     PHGDH      -3.0       0.018     D-3-phosphoglycerate dehydrogenase
Q15274     QPRT       -2.9       0.039     Nicotinate-nucleotide pyrophosphorylase [carboxylating]
P35222     CTNNB1     -2.9       0.0082    Catenin beta-1
Q13308     PTK7       -2.9       0.045     Inactive tyrosine-protein kinase 7
P09429     HMGB1      -2.7       0.0074    High mobility group protein B1
Q9P0M6     H2AFY2     -2.5       0.039     Core histone macro-H2A.2
Q14980     NUMA1      2.1        0.018     Nuclear mitotic apparatus protein 1
Q86UX7     FERMT3     2.1        0.032     Fermitin family homolog 3
P21281     ATP6V1B2   2.1        2.0E-02   V-type proton ATPase subunit B, brain isoform
Q01518     CAP1       2.2        0.043     Adenylyl cyclase-associated protein 1
P08670     VIM        2.4        0.0046    Vimentin
O15143     ARPC1B     2.4        0.038     Actin-related protein 2/3 complex subunit 1B
O60701     UGDH       2.5        0.041     UDP-glucose 6-dehydrogenase
P41218     MNDA       2.5        0.036     Myeloid cell nuclear differentiation antigen
Q99715     COL12A1    2.6        3.0E-02   Collagen alpha-1(XII) chain
Q96G03     PGM2       2.6        0.026     Phosphoglucomutase-2
P10909     CLU        2.6        0.048     Clusterin
O14617     AP3D1      2.6        0.049     AP-3 complex subunit delta-1
P08603     CFH        2.7        0.021     Complement factor H
Q05707     COL14A1    2.7        0.046     Collagen alpha-1(XIV) chain
P02538     KRT6A      2.7        0.028     Keratin, type II cytoskeletal 6A
P48668     KRT6C      2.7        0.022     Keratin, type II cytoskeletal 6C
P01023     A2M        2.8        0.0097    Alpha-2-macroglobulin
P06756     ITGAV      2.8        0.034     Integrin alpha-V
P05362     ICAM1      2.8        0.016     Intercellular adhesion molecule 1
P00751     CFB        2.9        0.015     Complement factor B
Q5TEJ8     THEMIS2    2.9        0.031     Protein THEMIS2
P02765     AHSG       2.9        0.041     Alpha-2-HS-glycoprotein
P02747     C1QC       2.9        0.044     Complement C1q subcomponent subunit C
Q0VD83     APOBR      2.9        0.046     Apolipoprotein B receptor
O76070     SNCG       3.0        0.049     Gamma-synuclein
Q6P4A8     PLBD1      3.1        0.046     Phospholipase B-like 1
P27105     STOM       3.1        0.012     Erythrocyte band 7 integral membrane protein
P17812     CTPS1      3.1        0.044     CTP synthase 1
P17480     UBTF       3.1        0.046     Nucleolar transcription factor 1
Q15080     NCF4       3.0        0.036     Neutrophil cytosol factor 4
P13761     HLA-DRB1   3.2        0.023     HLA class II histocompatibility antigen, DRB1-7 beta chain
Q08378     GOLGA3     3.2        0.042     Golgin subfamily A member 3
Q9NY15     STAB1      3.2        0.035     Stabilin-1
P10153     RNASE2     3.2        0.044     Non-secretory ribonuclease
Q96BY6     DOCK10     3.2        0.036     Dedicator of cytokinesis protein 10
P41219     PRPH       3.3        0.026     Peripherin
P04004     VTN        3.3        0.011     Vitronectin
Q32P28     P3H1       3.3        0.032     Prolyl 3-hydroxylase 1
Q96AY3     FKBP10     3.4        0.035     Peptidyl-prolyl cis-trans isomerase FKBP10
P40261     NNMT       3.4        0.028     Nicotinamide N-methyltransferase
O60234     GMFG       3.4        0.0096    Glia maturation factor gamma
O75594     PGLYRP1    3.5        0.032     Peptidoglycan recognition protein 1
P08575     PTPRC      3.5        0.0052    Receptor-type tyrosine-protein phosphatase C
P02749     APOH       3.6        0.0038    Beta-2-glycoprotein 1
Q9Y5S2     CDC42BPB   3.7        0.023     Serine/threonine-protein kinase MRCK beta
O75534     CSDE1      3.7        0.015     Cold shock domain-containing protein E1
O75173     ADAMTS4    3.8        0.024     A disintegrin and metalloproteinase with thrombospondin motifs 4
P02656     APOC3      3.9        0.011     Apolipoprotein C-III
Q86TI2     DPP9       4.1        0.014     Dipeptidyl peptidase 9
Q5ZPR3     CD276      4.2        0.0089    CD276 antigen
P14780     MMP9       4.4        0.012     Matrix metalloproteinase-9
P04839     CYBB       4.6        0.0025    Cytochrome b-245 heavy chain
P09917     ALOX5      4.6        0.0049    Arachidonate 5-lipoxygenase

Vimentin belongs to the family of intermediate filament (IF) proteins, which are important in maintaining organ stability and strength. The over-expression of vimentin in tumor cells is often associated with increased tumor growth and metastasis, leading to poor prognosis.118 Such changes have been reported in NSCLC adenocarcinoma patients with poor survival rates,119 consistent with the increased vimentin levels found in the recurrent patient tumors used in our study.

Among the up-regulated proteins, synuclein gamma (SNCG) and glia maturation factor gamma (GMFG) showed a three-fold increase (p<0.05) in protein expression in recurrent samples. SNCG is a small protein from a family with three members120 that is known to be associated with neurodegenerative diseases. However, SNCG, which was initially recognized as breast cancer-specific gene 1,121 is also linked to several cancers, including lung cancer.122 SNCG initiates cell proliferation and induces metastasis leading to poor prognosis, as has been reported in breast cancer cell lines.122 The increased levels of this protein in recurrent samples can be correlated with the shorter relapse time of the tumor. GMFG is a protein that regulates actin filaments and is therefore involved in cytoskeletal restructuring.123 This protein promotes invasion and cancer cell migration through cytoskeleton remodeling. The up-regulation of GMFG is often associated with poor prognosis in ovarian cancer tumors.124

The data generated at The Ohio State University were combined with a previous proteomic dataset obtained at Vanderbilt University. The Ohio State mass spectrometry data were generated on high-resolution instruments, and the Vanderbilt data on low-resolution mass spectrometry instruments. The proteogenomic analysis of these samples used the paired protein and RNA results produced for the same patient tumor sample. The unsupervised clustering of the RNA and protein data indicated two separate clusters (Figure 3.8.A). The data correlation was strong among different tumor samples but weaker between the two data categories (Figure 3.8.A). The RNA-protein correlation (Figure 3.8.B) was weaker (ρ=0.39) than the RNA-RNA or protein-protein correlation. The differential expression analysis resulted in 66 proteins and 159 RNAs. The samples did not contain any genes that were both differentially expressed and differentially correlated (Figure 3.8.C). A few differentially expressed RNA and protein groups indicated overlapping genes with no RNA-protein correlation. The IPA network analysis of proteins with significant (p<0.05) expression level changes indicated a few important cancer pathways. TNF and RPTOR (raptor) are two genes (Figure 3.9) associated with protein changes such as those of IGF2BP1, ADAMTS4, and IGF2BP2. Such proteins are pathway components of TNF, PI3K-Akt, and GPCR signaling in cancer.
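The RNA-protein correlation coefficient (ρ) reported above can be illustrated with a rank correlation. The text does not state which coefficient was computed, so the sketch below assumes a Spearman rank correlation, which is commonly denoted ρ; the abundance values are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired abundances for five genes in one tumor sample.
rna = [5.1, 7.3, 8.8, 9.4, 12.0]
protein = [3.0, 2.0, 9.0, 6.0, 14.0]
rho = spearman_rho(rna, protein)
print(round(rho, 2))
```

A rank-based coefficient is a reasonable choice here because spectral counts and RNA read counts are on different scales and are not expected to be linearly related.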

In a proteogenomic analysis, the RNA level changes discovered in tumor specimens may or may not correlate directly with protein level changes. Often, most genes correlate at both the RNA and protein levels, while a few genes fail to show such a relationship between RNA and protein.6 However, in our analysis, the RNA data did not validate any of the proteins discovered in the adenocarcinoma tumor shotgun analysis. Each data category yielded unique genes with altered expression levels at either the RNA or the protein level. The Vanderbilt data further confirmed the weak RNA-protein correlation found in the Ohio State ADC tumor cohort and did not provide a confident proteogenomic signature.


Figure 3.8: The proteogenomic data for adenocarcinoma samples. A- the heat map of adenocarcinoma recurrent and nonrecurrent data (RNA and protein); B- the RNA and protein correlation of the data; C- overlap of differentially expressed and differentially correlated genes at the RNA and protein levels (npSeq, p<0.05). Reproduced with permission from the author.81

The lack of correlation in the RNA-protein data suggests an alternate mechanism regulating RNA-to-protein translation. MicroRNAs (miRNAs) are another set of biological species that regulate the transcription and translation of proteins. However, miRNAs were not investigated in this experiment. Therefore, we propose a thorough validation study for adenocarcinoma recurrent and nonrecurrent tissues with additional genomic and transcriptomic data. Further, we also suggest validating the proteins discovered in this proteomic analysis in a second tissue cohort.

Figure 3.9: The IPA network map of differentially expressed proteins identified in lung adenocarcinoma samples. IGF2BP1/2 and 3 present down-regulated protein levels in recurrent patient tumors.


3.3.3 Recurrent vs. non-recurrent squamous cell carcinoma tumors

Squamous cell carcinoma (SCC) is one of the most common forms of non-small cell lung cancer (NSCLC) and comprises approximately 25% to 30% of lung cancer cases. Unlike adenocarcinoma, for which many driver mutations are known, the drivers of SCC are much less clear, making treatment options very limited for these patients. Recently, the advent of immunotherapy has provided some benefit for SCC patients, with a 20% response rate. Early diagnosis and surgery seem to be the best option for these patients, as resection of early-stage tumors provides a cure rate of approximately 50% to 60%. However, some patients recur soon after curative-intent surgery. These patients may benefit from adjuvant chemotherapy or immunotherapy, and identifying those most likely to recur should increase survival.

The shotgun experiment was conducted on 12 squamous cell carcinoma tumor samples: 3 recurrent and 9 nonrecurrent tumor specimens. The unsupervised clustering and heat map analysis of the normalized spectral counts for the protein samples indicated two main clusters. One cluster comprised the recurrent samples 1509, 1514, and 1517, and the other comprised all nonrecurrent samples (Figure 3.10). The DESeq analysis of the protein spectral counts revealed 98 significantly altered (p<0.01) proteins in the SCC subtypes. Among these, 18 protein groups showed increased protein levels and 80 protein groups showed decreased protein levels in recurrent compared to nonrecurrent tumors (Table 3.3).


Twenty protein groups identified in this analysis are discussed here as potential marker proteins for tumor recurrence. The keratin proteins KRT17 and KRT76 are down-regulated (fold changes of -22.3 and -21.3, respectively) in recurrent samples in the SCC tumor specimen analysis. Keratin 17 is a type I basal cell keratin initially identified in cutaneous basal cell carcinomas.125 Although several studies have reported altered expression levels of KRT17 in gastric adenocarcinoma126, thyroid127, and ovarian carcinoma128, no clear evidence has been found supporting its role as a prognostic marker for recurrence in lung cancer. We believe that the reduced expression levels in recurrent patient tumor samples, as opposed to the higher expression levels in nonrecurrent tumors, indicate the potential of KRT17 as a marker for recurrence in lung squamous cell carcinoma. Similarly, KRT76 shows unique SCC-specific changes associated with tumor recurrence. KRT76 is a type II keratin responsible for maintaining the epidermal integrity of the cells.129 Although it is a structural marker protein, depletion of the protein has been found in oral cancer progression.130 This is mainly attributed to the ability of KRT76 to maintain tight junctions between cells129; loss of the protein reverses this function, which can be correlated with tumor recurrence. Therefore, we propose these two proteins as markers for squamous cell carcinoma tumor recurrence.

The second group of proteins differentially expressed in the SCC tumor subsets is the aldo-keto reductase family 1 (AKR1) of proteins. The proteins down-regulated in recurrent SCC are from two subfamilies: aldose reductases (AKR1B) and hydroxysteroid/dihydrodiol dehydrogenases (AKR1C).131

Figure 3.10: The heat map of the top 150 squamous cell carcinoma protein groups. The supervised clustering was obtained using MeV 4.9.0. Normalized spectral counts per protein groups were used in the analysis.


Table 3.3: Differentially expressed proteins in squamous cell carcinoma samples. R- recurrent, NR- nonrecurrent, FC- fold-change.

Accession  Gene       FC (R/NR)  p-Value   Description
Q13835     PKP1       -87        0         Plakophilin-1
P32926     DSG3       -73.       0         Desmoglein-3
Q14574     DSC3       -42        0         Desmocollin-3
P27482     CALML3     -25        0         Calmodulin-like protein 3
Q01546     KRT76      -22.       0         Keratin, type II cytoskeletal 2 oral
Q04695     KRT17      -21        0         Keratin, type I cytoskeletal 17
P40394     ADH7       -21        0         Alcohol dehydrogenase class 4 mu/sigma chain
Q02487     DSC2       -20.       0         Desmocollin-2
P12035     KRT3       -20.       0         Keratin, type II cytoskeletal 3
Q9UN81     L1RE1      -17        0         LINE-1 retrotransposable element
O95678     KRT75      -13        0         Keratin, type II cytoskeletal 75
P13646     KRT13      -13        0         Keratin, type I cytoskeletal 13
Q6ZN66     GBP6       -12        0         Guanylate-binding protein 6
Q5XKE5     KRT79      -12        0         Keratin, type II cytoskeletal 79
P48668     KRT6C      -11        0         Keratin, type II cytoskeletal 6C
P30838     ALDH3A1    -11        0         Aldehyde dehydrogenase, dimeric NADP-preferring
P04259     KRT6B      -11        0         Keratin, type II cytoskeletal 6B
Q5T0W9     FAM83B     -11        4.0E-05   Protein FAM83B
Q02388     COL7A1     -11        1.0E-05   Collagen alpha-1(VII) chain
Q04828     AKR1C1     -10.       0         Aldo-keto reductase family 1 member C1
P13647     KRT5       -10.       1.0E-05   Keratin, type II cytoskeletal 5
Q6ZRV2     FAM83H     -10.       0         Protein FAM83H
O60218     AKR1B10    -10.       0         Aldo-keto reductase family 1 member B10
Q8N1N4     KRT78      -9.7       1.0E-04   Keratin, type II cytoskeletal 78
P42330     AKR1C3     -9.6       0         Aldo-keto reductase family 1 member C3
P36952     SERPINB5   -8.6       2.0E-05   Serpin B5
Q96SQ9     CYP2S1     -8.6       0.00016   Cytochrome P450 2S1
O15440     ABCC5      -8.4       0.00029   Multidrug resistance-associated protein 5
Q9UIV8     SERPINB13  -8.3       0.00028   Serpin B13
P02538     KRT6A      -8.2       0         Keratin, type II cytoskeletal 6A
Q9H3D4     TP63       -8.1       0.00016   Tumor protein 63
P02511     CRYAB      -8.1       3.0E-04   Alpha-crystallin B chain
P52895     AKR1C2     -7.6       0         Aldo-keto reductase family 1 member C2
P62760     VSNL1      -7.5       0.00055   Visinin-like protein 1
P14923     JUP        -7.2       0         Junction plakoglobin
Q8IVF2     AHNAK2     -6.7       2.0E-04   Protein AHNAK2
P11166     SLC2A1     -6.7       6.0E-05   Solute carrier family 2, facilitated glucose transporter member 1
P01040     CSTA       -5.8       0.00013   Cystatin-A
P18283     GPX2       -5.8       0.00065   Glutathione peroxidase 2
P47929     LGALS7B    -5.7       0.0027    Galectin-7
P51857     AKR1D1     -5.7       0.0012    3-oxo-5-beta-steroid 4-dehydrogenase
P23229     ITGA6      -5.6       0.0011    Integrin alpha-6
O95359     TACC2      -5.5       0.0032    Transforming acidic coiled-coil-containing protein 2
Q16890     TPD52L1    -5.4       0.0012    Tumor protein D53
Q7RTS7     KRT74      -5.3       0.0029    Keratin, type II cytoskeletal 74
Q8TD16     BICD2      -5.2       0.0031    Protein bicaudal D homolog 2
Q8TEW0     PARD3      -5.2       0.0042    Partitioning defective 3 homolog
P05120     SERPINB2   -5.1       0.0053    Plasminogen activator inhibitor 2
P22528     SPRR1B     -5.0       0.0059    Cornifin-B
Q96D71     REPS1      -4.8       0.0076    RalBP1-associated Eps domain-containing protein 1
Q86Y46     KRT73      -4.6       0.0036    Keratin, type II cytoskeletal 73
P35321     SPRR1A     -4.6       0.0093    Cornifin-A
Q6NXG1     ESRP1      -4.5       0.0031    Epithelial splicing regulatory protein 1
O94888     UBXN7      -4.5       0.0051    UBX domain-containing protein 7
Q8NDI1     EHBP1      -4.4       0.0094    EH domain-binding protein 1
O95347     SMC2       -4.4       0.0098    Structural maintenance of chromosomes protein 2
P18065     IGFBP2     -4.4       0.0094    Insulin-like growth factor-binding protein 2
P48506     GCLC       -4.3       0.00071   Glutamate--cysteine ligase catalytic subunit
P23141     CES1       -4.2       0.0077    Liver carboxylesterase 1
Q9P2B2     PTGFRN     -4.1       0.0043    Prostaglandin F2 receptor negative regulator
P35052     GPC1       -4.1       0.0085    Glypican-1
P33527     ABCC1      -4.0       0.0028    Multidrug resistance-associated protein 1
Q9P265     DIP2B      -3.9       0.0068    Disco-interacting protein 2 homolog B
Q9BQL6     FERMT1     -3.9       0.0097    Fermitin family homolog 1
P16144     ITGB4      -3.9       0.0056    Integrin beta-4
P28161     GSTM2      -3.9       0.0075    Glutathione S-transferase Mu 2
O00515     LAD1       -3.8       0.0019    Ladinin-1
P13645     KRT10      -3.8       0.0031    Keratin, type I cytoskeletal 10
Q9Y6M1     IGF2BP2    -3.6       0.0069    Insulin-like growth factor 2 mRNA-binding protein 2
P58107     EPPK1      -3.6       0.0075    Epiplakin
P15924     DSP        -3.5       0.0016    Desmoplakin
Q7Z794     KRT77      -3.2       0.0072    Keratin, type II cytoskeletal 1b
P04792     HSPB1      -3.2       0.0043    Heat shock protein beta-1
P16152     CBR1       -3.2       0.00012   Carbonyl reductase [NADPH] 1
P49585     PCYT1A     -2.9       0.0079    Choline-phosphate cytidylyltransferase A
O75828     CBR3       -2.7       0.00048   Carbonyl reductase [NADPH] 3
Q14257     RCN2       -2.6       0.0091    Reticulocalbin-2
P08727     KRT19      -2.6       0.0084    Keratin, type I cytoskeletal 19
O14745     SLC9A3R1   -2.5       0.0062    Na(+)/H(+) exchange regulatory cofactor NHE-RF1
P09211     GSTP1      -2.1       0.0016    Glutathione S-transferase P
Q9UBR2     CTSZ       2.6        0.0044    Cathepsin Z
P04233     CD74       2.8        0.0057    HLA class II histocompatibility antigen gamma chain
P01911     HLA-DRB1   3.4        0.0078    HLA class II histocompatibility antigen, DRB1-15 beta chain
Q8NFJ5     GPRC5A     3.5        0.0084    Retinoic acid-induced protein 3
P01912     HLA-DRB1   3.5        0.0053    HLA class II histocompatibility antigen, DRB1-3 chain
Q13451     FKBP5      3.7        0.0041    Peptidyl-prolyl cis-trans isomerase FKBP5
Q9P2M7     CGN        3.9        0.0082    Cingulin
O75563     SKAP2      4.5        0.0064    Src kinase-associated phosphoprotein 2
Q96CM8     ACSF2      4.6        0.0053    Acyl-CoA synthetase family member 2, mitochondrial
P05156     CFI        4.6        0.0021    Complement factor I
P21980     TGM2       4.6        0         Protein-glutamine gamma-glutamyltransferase 2
P04440     HLA-DPB1   4.9        0.0011    HLA class II histocompatibility antigen, DP beta 1 chain
P23083     -          4.9        0.0064    Ig heavy chain V-I region V35
P27487     DPP4       4.9        0.0065    Dipeptidyl peptidase 4
Q9H7D0     DOCK5      5.3        0.0046    Dedicator of cytokinesis protein 5
P55042     RRAD       5.5        0.0027    GTP-binding protein RAD
P15941     MUC1       6.9        0.00062   Mucin-1
O76070     SNCG       7.4        3.0E-05   Gamma-synuclein

Four isoforms of the AKR family were detected in the squamous cell carcinoma samples: AKR1C1, AKR1C2, AKR1C3, and AKR1B10, each with a fold-change greater than 7 and a p-value less than 0.05 (Table 3.3).

All AKR proteins (AKR1C1, AKR1C2, AKR1C3, and AKR1B10) detected in the analysis presented a significant decrease in protein expression levels in recurrent samples, with fold-changes of 10.5, 7.6, 9.6, and 10.2, respectively. The lower expression level of AKR1B10, a known biomarker for NSCLC in smokers, conflicts with what has been reported by microarray analysis and real-time PCR (RT-PCR).131 However, the changes in AKR1 proteins that we see are in agreement with the changes reported in some breast cancer and gastric tumors at the protein level.132,133 The spectral count-based analysis of these proteins contained data from both unique and razor peptides (a razor peptide is a peptide assigned to the protein group with the largest number of peptide IDs). Because AKR1C1 and AKR1C2 have 97% sequence similarity, distinguishing changes as AKR1C1 or AKR1C2 is not practical, and these should be considered as one protein group. The protein level change is not limited to AKR1C1/2 but is also observed in two other AKR1 family members, AKR1C3 and AKR1B10. Therefore, we propose that members of the AKR1 family could serve as potential prognostic markers for squamous cell lung cancer.
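The razor peptide rule mentioned above, in which a peptide is assigned to the protein group with the largest number of peptide IDs, can be sketched as follows. This is an illustration of the rule as stated in the text, not the implementation in the search software; the peptide and group labels, the alphabetical tie-break, and the spectral counts are hypothetical.

```python
from collections import defaultdict


def assign_razor_peptides(peptide_to_groups):
    """Assign every peptide to one protein group.

    peptide_to_groups maps a peptide to the set of protein groups it matches.
    A shared peptide goes to the group with the most peptide IDs; ties are
    broken alphabetically (an assumption made here for determinism).
    """
    group_sizes = defaultdict(int)
    for groups in peptide_to_groups.values():
        for g in groups:
            group_sizes[g] += 1
    return {
        pep: min(groups, key=lambda g: (-group_sizes[g], g))
        for pep, groups in peptide_to_groups.items()
    }


def group_spectral_counts(assignment, spectra_per_peptide):
    """Sum the spectra of all peptides assigned to each protein group."""
    totals = defaultdict(int)
    for pep, group in assignment.items():
        totals[group] += spectra_per_peptide[pep]
    return dict(totals)


# Hypothetical example: pep3 is shared between two similar protein groups.
peptide_to_groups = {
    "pep1": {"groupA"},
    "pep2": {"groupA"},
    "pep3": {"groupA", "groupB"},
    "pep4": {"groupB"},
}
spectra_per_peptide = {"pep1": 4, "pep2": 2, "pep3": 5, "pep4": 1}
assignment = assign_razor_peptides(peptide_to_groups)
totals = group_spectral_counts(assignment, spectra_per_peptide)
print(assignment["pep3"], totals)
```

For highly similar proteins such as AKR1C1 and AKR1C2, most peptides are shared, so a rule of this kind moves most of the spectral counts into a single group; this is why the two are treated as one protein group in the discussion above.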

Another group of proteins identified in the differential expression analysis between recurrent and non-recurrent samples is the carbonyl reductases (CBR). The proteomic analysis of the two SCC tissue subtypes detected significantly (p<0.05) reduced CBR protein levels in the recurrent tumor samples: CBR1 with a fold change of -3.2 and CBR3 with a fold change of -2.6 (Table 3.3). Of these two proteins, CBR3 has been reported in invasive tumors with significantly down-regulated expression levels.134 Further, it was identified as a prognostic marker in NSCLC tumors, where patients with high CBR expression levels reported higher 5-year survival rates.135 The tumor-suppressive effects of these proteins appear to arise through molecular mechanisms associated with retinoic acid. Our results, in which recurrent squamous patients showed down-regulation of the CBR1 and CBR3 proteins, agree with these reported observations.

In the analysis of recurrent and non-recurrent SCC patient tumor protein data, 18 proteins had significantly up-regulated protein levels in the recurrent samples. Three of these proteins are the GTP-binding protein Ras-related associated with diabetes (RRAD), gamma-synuclein (SNCG), and protein-glutamine gamma-glutamyltransferase 2 (TGM2). Of these, both RRAD and SNCG yielded dramatic expression changes, with each reporting no normalized spectral counts in the nonrecurrent samples. The associated protein level for recurrent samples was 5-fold (p<0.003) higher than that of the nonrecurrent samples. In contrast, TGM2 was detected in both recurrent and nonrecurrent samples, with a reported average of 39 spectral counts per nonrecurrent sample and 206 spectral counts per recurrent sample (fold-change 4.5, p-value<0.001).

Overall, the data showed a much higher abundance of TGM2 than of RRAD and SNCG in recurrent tumors. TGM2 has been reported as a prognostic marker for several cancers, including non-small cell lung cancer.136-138 While our analysis discovered the changes at the protein level, TGM2 and RRAD transcript levels were also higher. These data suggest the possibility of using these proteins as potential recurrence markers for lung SCC. As noted in section 3.3.2, SNCG is a differentially expressed protein up-regulated in both adenocarcinoma and squamous cell carcinoma recurrent tumors. However, neither the ADC nor the SCC data supported the SNCG protein expression change at the transcript level. Nevertheless, considering the dependence of overall patient survival on SNCG protein expression and its involvement in cell proliferation, SNCG is proposed as a generic prognostic marker for tumor recurrence.

Although the expression levels of TGM2 and SNCG in recurrent samples can be explained fairly clearly, the up-regulation of RRAD in recurrent samples remains a question. RRAD was initially found in skeletal muscle and is highly expressed in the lungs.139

Figure 3.11: The IPA pathway map of differentially expressed proteins found in the SCC samples. The network analysis indicates the pathways associated with the AKR1C1/2 and AKR1C3 protein changes in lung squamous cell carcinoma tumor samples.

The biological functions associated with RRAD are cell proliferation and differentiation140, cytoskeletal control, and negative regulation of glucose uptake139. Several studies of RRAD have reported its similarities to the Ras proto-oncogene.141 While it has been stated that the role of RRAD may vary depending on the cancer type or the cell line, overexpression of the protein has been seen in more invasive tumors and could provide a marker of recurrence.142

Overall, the use of one specific marker protein to determine tumor recurrence in lung SCC may not be a practical solution, as many of the differentially expressed proteins are common markers in several types of cancers. Therefore, we propose a protein signature that can be used as a prognostic marker for lung squamous cell tumor recurrence. The predicted signature includes decreased levels of KRT17, KRT76, AKR1C1, AKR1C2, AKR1C3, AKR1B10, CBR1, and CBR3 combined with overexpressed TGM2, SNCG, and RRAD.

3.3.4 Desmosomal proteins; prognostic markers for early recurrence in lung squamous cell carcinoma

Cell junctions occur at the points of cell-cell or cell-matrix contact in tissues. They are abundant in tissues and can be categorized into three groups: occluding junctions, anchoring junctions, and communicating junctions. Desmosomes are one of the functional forms of anchoring junctions and consist of transmembrane adhesion proteins, including cadherins, armadillo proteins, and plakins (Figure 3.12).143 These protein components are important in maintaining tissue integrity, cell proliferation, and differentiation; therefore, down-regulation of desmosomes drives tumor growth and invasion.143,144


The differential expression analysis of the SCC patient tumor samples indicated significant expression changes for desmosomal proteins in recurrent samples compared to nonrecurrent tumors. The unique nature of this protein expression pattern is that other anchoring junction proteins, such as those in adherens junctions, were not significantly altered. Although changes in the expression levels of anchoring proteins have been reported in a number of malignant and invasive tumors, this is the first report to show that only desmosomal protein changes are associated with longevity in lung squamous cell cancer patients.

Figure 3.12: The schematic of desmosomal junctions in the cells. The figure indicates the respective distribution of the desmosomal cadherins (desmoglein and desmocollin), the armadillo proteins (junction plakoglobin and plakophilin), and the plakin (desmoplakin).


Accession | Gene | Protein name | Cell junction | Fold-change | p-value
Q14574 | DSC3 | Desmocollin-3 | Desmosomes | -42 | 0
P14923 | PLAK/JUP | Junction plakoglobin | Desmosomes | -7.3 | 0
P15924 | DSP | Desmoplakin | Desmosomes | -3.6 | 1.0E-03
P32926 | DSG3 | Desmoglein-3 | Desmosomes | -73 | 0
Q13835 | PKP1 | Plakophilin-1 | Desmosomes | -87 | 0
P35222 | CTNB1 | Catenin beta-1 | Adherens | -1.9 | 0.058
P12830 | CADH1 | Cadherin-1 | Adherens | -1.9 | 0.032
Q07157 | ZO1 | Tight junction protein ZO-1 | Tight junction | -0.84 | 0.68
Q16625 | OCLN | Occludin | Tight junction | -0.81 | 0.71

Table 3.4: The cell junction proteins discovered in the proteomics analysis. The proteins were analyzed by DESeq analysis. The fold-changes and the respective p-values for the desmosomal, adherens and tight junction proteins are listed.

The changes in the desmosomal proteins were greater than 3-fold at a p-value<0.002. The adherens junction proteins and the tight junction proteins did not meet either the fold change or the p-value cut-off (Table 3.4). The desmosomal cadherins desmoglein-3 (DSG3) and desmocollin-3 (DSC3) yielded a 73 and 41 fold decrease in recurrent samples, respectively, with a p-value<0.001. Although changes in the desmosomal cadherins DSG3 and DSC3 have been reported in many cancer types including head and neck,145 skin146 and lung cancer, a unidirectional decrease in all desmosomal proteins has never been shown before for recurrent SCC. The common direction of change across all desmosomal proteins therefore produces a unique protein signature for lung squamous cell carcinoma tumors.
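The cut-off described above can be illustrated with a minimal sketch. It assumes the thresholds are applied to the absolute fold-change (greater than 3) and to the p-value (less than 0.002), and it uses the values transcribed from Table 3.4. The function name passes_cutoff and the tuple layout are illustrative choices, not part of the original DESeq workflow.

```python
# Values transcribed from Table 3.4: (gene, cell junction, fold-change, p-value).
# A p-value of 0 is kept exactly as reported in the table.
junction_proteins = [
    ("DSC3", "Desmosomes", -42, 0.0),
    ("JUP", "Desmosomes", -7.3, 0.0),
    ("DSP", "Desmosomes", -3.6, 1.0e-3),
    ("DSG3", "Desmosomes", -73, 0.0),
    ("PKP1", "Desmosomes", -87, 0.0),
    ("CTNB1", "Adherens", -1.9, 0.058),
    ("CADH1", "Adherens", -1.9, 0.032),
    ("ZO1", "Tight junction", -0.84, 0.68),
    ("OCLN", "Tight junction", -0.81, 0.71),
]


def passes_cutoff(fold_change, p_value, min_abs_fc=3.0, max_p=0.002):
    """Return True when both the fold-change and p-value cut-offs are met."""
    return abs(fold_change) > min_abs_fc and p_value < max_p


# Keep only the proteins that satisfy both criteria.
selected = [row for row in junction_proteins if passes_cutoff(row[2], row[3])]
selected_genes = [row[0] for row in selected]
selected_junctions = {row[1] for row in selected}

print(selected_genes)
print(selected_junctions)
```

Under these assumptions only the five desmosomal proteins pass both cut-offs, which matches the statement that the adherens and tight junction proteins fail at least one criterion.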


Among all the desmosomal proteins, plakophilin-1 yielded the highest reported fold-change for differentially expressed proteins, which was down 86 fold with a p-value less than 0.001 in recurrent samples compared to nonrecurrent samples. Plakophilin-1 (PKP1) is one of the two significantly changed armadillo proteins identified in the SCC patient tumor cohort and is assumed to be one of the desmosomal proteins affecting the Wnt signaling pathway in the cancer cells. The counterpart of PKP1, junction plakoglobin (JUP), however, did not show such a dramatic change in its protein expression level. The lowest protein level alteration was observed in desmoplakin (fold-change -3, p=0.001), a plakin protein in desmosomes. These two protein groups are important in bridging desmosomal cadherins to intermediate filaments of the cytoskeleton.147

The pathway analysis of the differentially expressed proteins revealed glycogen synthase kinase (GSK) as an upstream regulator of desmosomal proteins (Figure 3.14). GSK is a regulator of the Wnt signaling pathway, and the pathway analysis indicates the participation of desmosomal proteins in the Wnt signaling pathway in cancer. The most studied model proposes that JUP released from the desmosomes manifests β-catenin-like signaling.148 The effects include the transcription of Wnt target genes including oncogenic targets.149 Furthermore, DSG3 activation of the p38MAPK (Figure 3.14) oncogenic pathway and activation of the NFKB and AKT signaling pathways may lead to tumorigenesis. More importantly, the loss of the desmosomal protein barrier that supports cell-cell adhesion causes invasion and metastasis of cancer cells.


In this pilot study using a small group of recurrent samples, it is clear that there are some significant protein changes, such as those we have identified in desmosomes, which could be related to tumor recurrence. Although the loss of cell junction proteins is often attributed to invasion and metastasis of cancer cells, the unique loss of desmosomal proteins in primary lung tumors has not been reported as a potential recurrence marker.

Gene name | Protein, mRNA data correlation
DSC3 | yes
PLAK/JUP | yes
DESP | yes
DSG3 | yes
PKP1 | yes

Table 3.5: The mRNA-protein correlation of significantly altered desmosomal proteins

Based on our results it is clear that the loss of cell adhesion starts at a very early stage of tumor progression in some tumors. In lung squamous cell carcinoma, desmosomal proteins are reduced dramatically in stage I and stage II tumors, similar to the expression pattern in the messenger RNA data (Table 3.5). However, the use of desmosomal proteins in the clinic as prognostic markers will require more systematic investigations and experimental evidence with a greater number of tumor samples.


Figure 3.13: The box plots of desmosomal proteins: DSC3, DSG3, PKP1, DSP, and JUP. The figure presents the changes in normalized spectral counts in both nonrecurrent and recurrent sample groups. ‘**’ is p<0.001 and ‘*’ is p<0.01. The adherens junction protein CDH1 and the tight junction proteins OCLN and TJP1 indicate no significant changes in protein expression levels.

Figure 3.14: The pathway analysis of differentially expressed (fold-change>2, p<0.05) squamous cell carcinoma proteins by Ingenuity Pathway Analysis. Proteins down-regulated in recurrent samples compared to nonrecurrent tumor samples are shown in green, and up-regulated proteins are shown in red.


The mechanism by which loss of desmosomal proteins leads to tumor recurrence remains unclear and requires more experimental evidence. Overall, a study of such marker proteins could benefit some patients undergoing curative intent surgery for lung squamous cell carcinoma by recognizing those patients that could benefit from adjuvant therapy.

3.3.5 Identifying the differences in adenocarcinoma vs. squamous cell carcinoma tumors

We compared the protein groups identified in our squamous cell carcinoma tumor samples with those of adenocarcinoma. The two nonrecurrent sample groups resulted in a total of 197 differentially expressed (fold-change>2, p<0.01) proteins (Table 3.6). Nonrecurrent squamous cell carcinoma yielded 35 down-regulated and 163 up-regulated proteins compared to nonrecurrent adenocarcinoma tumors. In contrast, the same analysis in recurrent samples presented as few as 28 differentially expressed proteins (Table 3.7), indicating the similarity in protein expression levels between recurrent adenocarcinoma and recurrent squamous cell carcinoma. These recurrent sample proteins contained 23 up-regulated proteins and 6 down-regulated proteins (fold-change>2, p<0.01).

Both the nonrecurrent (SCC-NR vs. ADC-NR) and recurrent (SCC-R vs. ADC-R) comparisons yielded two common proteins: KRT6A and KRT6C. The expression levels of these proteins were higher in both SCC recurrent and nonrecurrent samples compared to all ADC samples. Further investigation of the spectral counts of these two keratin proteins confirmed the lower expression levels in adenocarcinoma; therefore, these are signature proteins for lung squamous cell carcinoma.150,151 We propose KRT6A and KRT6C as squamous cell tumor markers for non-small cell lung cancer. Like most keratin marker proteins, KRT3, one of the proposed prognostic markers for tumor recurrence in squamous carcinoma, showed higher protein expression levels in both SCC groups compared to ADC tumors.

The desmosomal protein expression was relatively high in nonrecurrent SCC samples, but was similar in nonrecurrent ADC, recurrent ADC, and recurrent SCC. The results confirmed the dramatic change in desmosomal proteins in SCC, making them a unique prognostic marker for SCC tumor recurrence. Considering the respective expression levels, the combination of KRT6C, KRT6A, KRT3 and desmosomal proteins would provide a unique protein signature for SCC tumor recurrence, which can also be used to distinguish SCC from ADC tumors.

Overall, the comparison of the two recurrent and nonrecurrent data sets provided potentially useful marker proteins for lung squamous cell carcinoma. However, the ADC analysis was limited to eight tumor specimens and yielded seventy-six (p<0.05) protein expression changes, whereas the squamous cell carcinoma cohort of twelve specimens yielded ninety-eight (p<0.01) proteins. Therefore, we propose to validate all potential marker proteins in a substantially larger number of squamous cell and adenocarcinoma samples.


Accession | Description | Gene | Fold-change | p-value
P27487 | Dipeptidyl peptidase 4 | DPP4 | -5.0 | 1.0E-04
Q8TD06 | Anterior gradient protein 3 homolog | AGR3 | -4.8 | 2.0E-05
Q9BW04 | Specifically androgen-regulated gene protein | SARG | -4.8 | 4.0E-05
Q9P2M7 | Cingulin | CGN | -3.6 | 0.00091
Q96CM8 | Acyl-CoA synthetase family member 2, mitochondrial | ACSF2 | -3.6 | 0.0011
P04440 | HLA class II histocompatibility antigen, DP beta 1 chain | HLA-DPB1 | -3.6 | 0.00058
O95994 | Anterior gradient protein 2 homolog | AGR2 | -3.5 | 2.0E-04
Q5Y7A7 | HLA class II histocompatibility antigen, DRB1-13 beta chain | HLA-DRB1 | -3.5 | 0.00023
P06454 | Prothymosin alpha | PTMA | -3.4 | 0.0017
P20039 | HLA class II histocompatibility antigen, DRB1-11 beta chain | HLA-DRB1 | -3.4 | 0.00017
Q9ULC5 | Long-chain-fatty-acid--CoA ligase 5 | ACSL5 | -3.3 | 0.0012
P01912 | HLA class II histocompatibility antigen, DRB1-3 chain | HLA-DRB1 | -3.2 | 0.00077
P01911 | HLA class II histocompatibility antigen, DRB1-15 beta chain | HLA-DRB1 | -3.2 | 0.00082
P49419 | Alpha-aminoadipic semialdehyde dehydrogenase | ALDH7A1 | -3.1 | 0.00022
P23083 | Ig heavy chain V-I region V35 | *N/A | -3.0 | 0.0065
P08729 | Keratin, type II cytoskeletal 7 | KRT7 | -3.0 | 0.00035
Q9BUT1 | 3-hydroxybutyrate dehydrogenase type 2 | BDH2 | -2.9 | 0.0045
Q8NFJ5 | Retinoic acid-induced protein 3 | GPRC5A | -2.9 | 4.0E-03
Q12805 | EGF-containing fibulin-like extracellular matrix protein 1 | EFEMP1 | -2.8 | 0.0014
Q9NR28 | Diablo homolog, mitochondrial | DIABLO | -2.8 | 3.0E-03
Q14847 | LIM and SH3 domain protein 1 | LASP1 | -2.7 | 0.0019
P04439 | HLA class I histocompatibility antigen, A-3 alpha chain | HLA-A | -2.7 | 0.0075
P13760 | HLA class II histocompatibility antigen, DRB1-4 beta chain | HLA-DRB1 | -2.7 | 0.0061
Q9H4G4 | Golgi-associated plant pathogenesis-related protein 1 | GLIPR2 | -2.7 | 0.0069
(Continued)

Table 3.6: The differentially expressed protein groups in lung adenocarcinoma vs. lung squamous cell carcinoma nonrecurrent samples. All accession numbers are UniProt protein identifiers. The “*” indicates not applicable.

Table 3.6 continued

Accession | Description | Gene | Fold-change | p-value
P49407 | Beta-arrestin-1 | ARRB1 | -2.6 | 0.0022
O96009 | Napsin-A | NAPSA | -2.6 | 0.0069
Q05655 | Protein kinase C delta type | PRKCD | -2.4 | 0.0067
P62263 | 40S ribosomal protein S14 | RPS14 | -2.4 | 0.00086
Q8NBJ7 | Sulfatase-modifying factor 2 | SUMF2 | -2.4 | 0.0059
Q93052 | Lipoma-preferred partner | LPP | -2.4 | 0.0032
Q9UMR2 | ATP-dependent RNA helicase DDX19B | DDX19B | -2.3 | 0.0025
P02042 | Hemoglobin subunit delta | HBD | -2.3 | 0.0098
P10321 | HLA class I histocompatibility antigen, Cw-7 alpha chain | HLA-C | -2.3 | 0.0097
Q9NUU7 | ATP-dependent RNA helicase DDX19A | DDX19A | -2.2 | 0.0057
Q1KMD3 | Heterogeneous nuclear ribonucleoprotein U-like protein 2 | HNRNPUL2 | -2.0 | 0.0082
P00387 | NADH-cytochrome b5 reductase 3 | CYB5R3 | 2.0 | 0.0074
P19367 | Hexokinase-1 | HK1 | 2.1 | 0.0018
P07384 | Calpain-1 catalytic subunit | CAPN1 | 2.1 | 0.0062
P30153 | Serine/threonine-protein phosphatase 2A 65 kDa regulatory subunit A alpha isoform | PPP2R1A | 2.1 | 0.0058
P20020 | Plasma membrane calcium-transporting ATPase 1 | ATP2B1 | 2.2 | 0.0076
P78371 | T-complex protein 1 subunit beta | CCT2 | 2.2 | 0.0096
P08195 | 4F2 cell-surface antigen heavy chain | SLC3A2 | 2.2 | 0.0078
P45880 | Voltage-dependent anion-selective channel protein 2 | VDAC2 | 2.2 | 0.0086
P23528 | Cofilin-1 | CFL1 | 2.3 | 7.0E-04
P47985 | Cytochrome b-c1 complex subunit Rieske, mitochondrial | UQCRFS1 | 2.3 | 0.0065
P23258 | Tubulin gamma-1 chain | TUBG1 | 2.3 | 0.0084
Q16851 | UTP--glucose-1-phosphate uridylyltransferase | UGP2 | 2.3 | 0.0087
Q8TAT6 | Nuclear protein localization protein 4 homolog | NPLOC4 | 2.4 | 0.0062
P07205 | Phosphoglycerate kinase 2 | PGK2 | 2.4 | 0.0054
P51114 | Fragile X mental retardation syndrome-related protein 1 | FXR1 | 2.4 | 0.0079
(Continued)

Table 3.6 continued

Accession | Description | Gene | Fold-change | p-value
P05141 | ADP/ATP translocase 2 | SLC25A5 | 2.4 | 0.0066
P07954 | Fumarate hydratase, mitochondrial | FH | 2.4 | 0.0059
P31947 | 14-3-3 protein sigma | SFN | 2.4 | 0.0012
P31943 | Heterogeneous nuclear ribonucleoprotein H | HNRNPH1 | 2.5 | 0.00029
P19105 | Myosin regulatory light chain 12B | MYL12A | 2.5 | 0.0018
P29966 | Myristoylated alanine-rich C-kinase substrate | MARCKS | 2.5 | 0.0033
Q8WVV9 | Heterogeneous nuclear ribonucleoprotein L-like | HNRNPLL | 2.5 | 0.0085
Q3LXA3 | Bifunctional ATP-dependent dihydroxyacetone kinase/FAD-AMP lyase (cyclizing) | TKFC | 2.5 | 0.0014
O43633 | Charged multivesicular body protein 2a | CHMP2A | 2.5 | 0.0079
P55795 | Heterogeneous nuclear ribonucleoprotein H2 | HNRNPH2 | 2.5 | 0.00074
P05388 | 60S acidic ribosomal protein P0 | RPLP0 | 2.5 | 0.0025
P52788 | Spermine synthase | SMS | 2.5 | 0.0049
Q9BRX8 | Redox-regulatory protein FAM213A | FAM213A | 2.5 | 0.0035
O43752 | Syntaxin-6 | STX6 | 2.5 | 0.0059
P02786 | Transferrin receptor protein 1 | TFRC | 2.6 | 9.0E-03
O95425 | Supervillin | SVIL | 2.6 | 0.0058
Q92616 | Translational activator GCN1 | GCN1 | 2.6 | 0.0075
P30154 | Serine/threonine-protein phosphatase 2A 65 kDa regulatory subunit A beta isoform | PPP2R1B | 2.6 | 0.0043
O14908 | PDZ domain-containing protein GIPC1 | GIPC1 | 2.6 | 2.0E-03
Q460N5 | Poly [ADP-ribose] polymerase 14 | PARP14 | 2.6 | 0.0071
Q9H3U1 | Protein unc-45 homolog A | UNC45A | 2.6 | 0.0043
Q14166 | Tubulin--tyrosine ligase-like protein 12 | TTLL12 | 2.6 | 0.0019
P20810 | Calpastatin | CAST | 2.6 | 0.0011
O15020 | Spectrin beta chain, non-erythrocytic 2 | SPTBN2 | 2.6 | 0.0082
P78559 | Microtubule-associated protein 1A | MAP1A | 2.6 | 0.0092
Q9NP58 | ATP-binding cassette sub-family B member 6, mitochondrial | ABCB6 | 2.7 | 0.0051
O76003 | Glutaredoxin-3 | GLRX3 | 2.7 | 0.0014
Q96S44 | TP53-regulating kinase | TP53RK | 2.7 | 0.0052
P09012 | U1 small nuclear ribonucleoprotein A | SNRPA | 2.7 | 0.0096
Q9BYX2 | TBC1 domain family member 2A | TBC1D2 | 2.7 | 0.0034
(Continued)


Table 3.6 continued

Accession | Description | Gene | Fold-change | p-value
P52209 | 6-phosphogluconate dehydrogenase, decarboxylating | PGD | 2.7 | 0.00043
Q9NYQ8 | Protocadherin Fat 2 | FAT2 | 3.2 | 0.0041
Q9P2B2 | Prostaglandin F2 receptor negative regulator | PTGFRN | 3.2 | 0.0012
Q8NEZ2 | Vacuolar protein sorting-associated protein 37A | VPS37A | 3.3 | 0.0019
Q08431 | Lactadherin | MFGE8 | 3.3 | 0.0033
P43362 | Melanoma-associated antigen 9 | MAGEA9 | 3.3 | 0.0038
Q03013 | Glutathione S-transferase Mu 4 | GSTM4 | 3.3 | 0.0033
P00352 | Retinal dehydrogenase 1 | ALDH1A1 | 3.3 | 0.00018
Q6ZXV5 | Transmembrane and TPR repeat-containing protein 3 | TMTC3 | 3.3 | 0.0013
Q9BYN0 | Sulfiredoxin-1 | SRXN1 | 3.3 | 0.0019
P12532 | Creatine kinase U-type, mitochondrial | CKMT1A; CKMT1B | 3.3 | 0.0011
A8K2U0 | Alpha-2-macroglobulin-like protein 1 | A2ML1 | 3.3 | 0.0032
P07476 | Involucrin | IVL | 3.3 | 0.0027
P48163 | NADP-dependent malic enzyme | ME1 | 3.4 | 0.00098
P35321 | Cornifin-A | SPRR1A | 3.4 | 0.0029
Q6NXG1 | Epithelial splicing regulatory protein 1 | ESRP1 | 3.4 | 0.00069
P10586 | Receptor-type tyrosine-protein phosphatase F | PTPRF | 3.5 | 0.00037
Q15057 | Arf-GAP with coiled-coil, ANK repeat and PH domain-containing protein 2 | ACAP2 | 3.5 | 0.00026
Q14134 | Tripartite motif-containing protein 29 | TRIM29 | 3.5 | 0.0014
Q14914 | Prostaglandin reductase 1 | PTGR1 | 3.5 | 0.00039
O94888 | UBX domain-containing protein 7 | UBXN7 | 3.6 | 0.00056
P58107 | Epiplakin | EPPK1 | 3.6 | 0.00029
P46821 | Microtubule-associated protein 1B | MAP1B | 3.7 | 0.00035
P05120 | Plasminogen activator inhibitor 2 | SERPINB2 | 3.7 | 0.0014
Q8IVF2 | Protein AHNAK2 | AHNAK2 | 3.8 | 3.0E-04
Q6YHK3 | CD109 antigen | CD109 | 3.8 | 0.00041
P13645 | Keratin, type I cytoskeletal 10 | KRT10 | 3.8 | 8.0E-05
Q96K17 | Transcription factor BTF3 homolog 4 | BTF3L4 | 3.9 | 3.0E-05
Q8IXT5 | RNA-binding protein 12B | RBM12B | 3.9 | 0.00064
Q16890 | Tumor protein D53 | TPD52L1 | 3.9 | 0.00014
Q96SQ9 | Cytochrome P450 2S1 | CYP2S1 | 4.0 | 0.00057
(Continued)


Table 3.6 continued

Accession | Description | Gene | Fold-change | p-value
Q01469 | Fatty acid-binding protein, epidermal | FABP5 | 4.1 | 0.00015
Q9UN81 | LINE-1 retrotransposable element | L1RE1 | 4.2 | 5.0E-05
Q7RTS7 | Keratin, type II cytoskeletal 74 | KRT74 | 4.2 | 0.00022
Q5VT79 | Annexin A8-like protein 2 | ANXA8L1 | 4.3 | 5.0E-05
Q9P265 | Disco-interacting protein 2 homolog B | DIP2B | 4.3 | 6.0E-05
Q8TEW0 | Partitioning defective 3 homolog | PARD3 | 4.3 | 0.00026
P28161 | Glutathione S-transferase Mu 2 | GSTM2 | 4.4 | 8.0E-05
P33527 | Multidrug resistance-associated protein 1 | ABCC1 | 4.5 | 2.0E-05
P21266 | Glutathione S-transferase Mu 3 | GSTM3 | 4.7 | 4.0E-05
P51857 | 3-oxo-5-beta-steroid 4-dehydrogenase | AKR1D1 | 4.7 | 4.0E-05
P48506 | Glutamate--cysteine ligase catalytic subunit | GCLC | 4.7 | 0
Q86Y46 | Keratin, type II cytoskeletal 73 | KRT73 | 4.8 | 3.0E-05
P62760 | Visinin-like protein 1 | VSNL1 | 4.9 | 1.0E-04
P35052 | Glypican-1 | GPC1 | 5.0 | 3.0E-05
P13928 | Annexin A8 | ANXA8 | 5.0 | 1.0E-05
P23141 | Liver carboxylesterase 1 | CES1 | 5.2 | 2.0E-05
Q9UIV8 | Serpin B13 | SERPINB13 | 5.4 | 4.0E-05
O15440 | Multidrug resistance-associated protein 5 | ABCC5 | 5.5 | 4.0E-05
P01040 | Cystatin-A | CSTA | 5.5 | 0
Q6ZRV2 | Protein FAM83H | FAM83H | 5.7 | 0
P14923 | Junction plakoglobin | JUP | 5.8 | 0
Q8N1N4 | Keratin, type II cytoskeletal 78 | KRT78 | 6.1 | 1.0E-05
Q9H3D4 | Tumor protein 63 | TP63 | 6.1 | 0
Q5T0W9 | Protein FAM83B | FAM83B | 6.4 | 0
Q6ZN66 | Guanylate-binding protein 6 | GBP6 | 8.5 | 0
P52895 | Aldo-keto reductase family 1 member C2 | AKR1C2 | 9.5 | 0
Q5XKE5 | Keratin, type II cytoskeletal 79 | KRT79 | 9.7 | 0
P13647 | Keratin, type II cytoskeletal 5 | KRT5 | 10. | 0
Q02487 | Desmocollin-2 | DSC2 | 10. | 0
P42330 | Aldo-keto reductase family 1 member C3 | AKR1C3 | 11 | 0
P40394 | Alcohol dehydrogenase class 4 mu/sigma chain | ADH7 | 11 | 0
Q04828 | Aldo-keto reductase family 1 member C1 | AKR1C1 | 11 | 0
P27482 | Calmodulin-like protein 3 | CALML3 | 11 | 0


Accession | Description | Gene | Fold-change | p-value
Q14574 | Desmocollin-3 | DSC3 | 19 | 0
Q13835 | Plakophilin-1 | PKP1 | 25 | 0
O95678 | Keratin, type II cytoskeletal 75 | KRT75 | 26 | 0
P04259 | Keratin, type II cytoskeletal 6B | KRT6B | 28 | 0
P02656 | Apolipoprotein C-III | APOC3 | -3.7 | 0.0095
Q8IXQ6 | Poly [ADP-ribose] polymerase 9 | PARP9 | 5.4 | 0.0049
Q9UJU6 | Drebrin-like protein | DBNL | -2.7 | 0.0049
Q16647 | Prostacyclin synthase | PTGIS | 4.3 | 0.0062
P01620 | Ig kappa chain V-III region SIE | * | 3.0 | 0.0069
P04206 | Ig kappa chain V-III region G | * | 3.4 | 0.0049
P18135 | Ig kappa chain V-III region HAH | * | 3.0 | 0.0072
P12035 | Keratin, type II cytoskeletal 3 | KRT3 | 6.5 | 0.00027
O15269 | Serine palmitoyltransferase 1 | SPTLC1 | 4.7 | 0.0058
Q9NZN4 | EH domain-containing protein 2 | EHD2 | 4.2 | 0.0019
O15347 | High mobility group protein B3 | HMGB3 | -5.2 | 0.0025
Q9UMY4 | Sorting nexin-12 | SNX12 | 3.4 | 0.0047
P12955 | Xaa-Pro dipeptidase | PEPD | 3.7 | 0.0071
P02042 | Hemoglobin subunit delta | HBD | -3.7 | 0.0036
Q96S44 | TP53-regulating kinase | TP53RK | 6.1 | 0.0017
P06454 | Prothymosin alpha | PTMA | -5.5 | 0.0038
P02538 | Keratin, type II cytoskeletal 6A | KRT6A | 4.4 | 0.00081
P48668 | Keratin, type II cytoskeletal 6C | KRT6C | 3.3 | 0.0047
P21980 | Protein-glutamine gamma-glutamyltransferase 2 | TGM2 | 2.5 | 0.0085
P04632 | Calpain small subunit 1 | CAPNS1 | 4.0 | 0.0063
Q03001 | Dystonin | DST | 4.7 | 0.0065
P20039 | HLA class II histocompatibility antigen, DRB1-11 beta chain | HLA-DRB1 | -4.7 | 0.00038
Q96RR4 | Calcium/calmodulin-dependent protein kinase kinase 2 | CAMKK2 | 5.5 | 0.0055
O15067 | Phosphoribosylformylglycinamidine synthase | PFAS | 2.7 | 0.0096
P18621 | 60S ribosomal protein L17 | RPL17 | 5.2 | 0.0015
P13928 | Annexin A8 | ANXA8 | 9.6 | 1.0E-04
Q6FHJ7 | Secreted frizzled-related protein 4 | SFRP4 | 4.5 | 9.0E-03

Table 3.7: The differentially expressed protein groups in lung adenocarcinoma vs. lung squamous cell carcinoma recurrent samples. All accession numbers are UniProt protein identifiers. The “*” indicates not applicable.


3.4 Conclusions

Overall, both adenocarcinoma and squamous cell carcinoma tumor specimens produced over five thousand protein group identifications per tumor cohort. Among differentially expressed adenocarcinoma tumor proteins, we identified a potential tumor metastasis and invasion protein signature that included IGF2BP1, IGF2BP2, IGF2BP3, SNCG and GMFG proteins with a fold change greater than two and a p-value less than 0.05. However, the attempt to validate these protein level changes in mRNA data did not result in any genes that correlate at both the mRNA and protein levels. The mRNA data instead produced a unique signature with high RNA-RNA correlation between the Vanderbilt and Ohio State ADC tumor data.

In contrast, the squamous cell carcinoma tumor protein data revealed many significantly altered proteins in recurrent patient tumor specimens compared to nonrecurrent specimens. The potential protein signature included KRT17, KRT76, AKR1C1, AKR1C2, AKR1C3, AKR1B10, CBR1, CBR3, SNCG, TGM2, RRAD and all desmosomal proteins. Desmosomal proteins presented the highest expression level changes discovered in the SCC recurrent patient tumor specimens compared to nonrecurrent specimens. Plakophilin, a desmosomal armadillo protein, showed the highest fold change (eighty-six fold) revealed in the shotgun analysis. Unlike the ADC tumor cohort data, we validated desmosomal protein level changes in mRNA data with a positive RNA to protein correlation. The change in desmosomal proteins was unidirectional and unique, with no associated adherens or tight junction protein level modifications.


The proteomic data generated for the two tumor cohorts produced potential protein level changes that could distinguish recurrent patient tumors from nonrecurrent patient tumors at an early stage. The protein signature produced for adenocarcinoma was based on data generated for eight tumor specimens and did not correlate with the corresponding mRNA data. Therefore, we propose to validate these protein level changes in a substantially larger number of tumors to obtain confident markers for ADC tumor recurrence. The squamous cell carcinoma tumor cohort, however, produced a potential protein signature that included desmosomal proteins with RNA to protein level correlation. However, this cohort contained only twelve tumor specimens, which reduces the confidence of the protein signature. Therefore, we recommend validating these changes in more SCC tumors by targeted mass spectrometry. Further, we propose to investigate tumor cell migration and invasion in cell lines with no desmosomal protein expression.


Chapter 4: Label-Free Proteomics and Phosphoproteomics for Detection and Semi-Quantification of the Effects of Liver Kinase B1 Protein in Lung Cancer Cells

4.1 Introduction

The fundamental obligation of a cell is to sustain life by proliferation and cell division. This process is tightly controlled in normal cells. Signaling pathways respond to environmental cues, such as nutrient and energy availability, to regulate growth and survival. Liver kinase B1 (LKB1; also known as STK11) is a key serine-threonine kinase protein central to maintaining energetic balance.152,153 It is a 433 amino acid protein with a molecular weight of 48.6 kDa that contains a kinase domain spanning amino acids 49-309 (Figure 4.1.A).

Free LKB1 is located in the nucleus, where it is kept inactive. It is transported as a complex into the cytoplasm when bound to STE20-related adaptor protein (STRAD) and mouse protein-25 (MO25) (Figure 4.1.B). The heterotrimeric complex predominantly mediates metabolic signaling and control through AMP-activated protein kinase (AMPK).153 In addition to AMPK, LKB1 phosphorylates other downstream kinase and non-kinase proteins including salt-inducible kinase proteins (SIKs) and serine/threonine-protein kinase MARK1 (MARK) as indicated in Figure 4.2. Mutations in LKB1 lead to the cancer disorder Peutz-Jeghers syndrome (PJS).154,155 LKB1 is also one of the genes most commonly prone to somatic mutations,156 leading to non-PJS cancers such as non-small cell lung cancer (NSCLC). LKB1 loss has been reported to occur in about 30-40% of all cases of non-small cell lung adenocarcinomas.157

Figure 4.1: (A) The LKB1 protein consists of 433 amino acids. The kinase domain spans from amino acid 49 to 309. (B) Inactive LKB1 in the nucleus binds to STRAD and MO25. This initiates the export of the LKB1 complex into the cytoplasm. Within the cytoplasm, the complex initiates downstream signaling. Reprinted with permission.158

The LKB1 protein regulates cell signaling mainly via phosphorylation of AMPK-related proteins.156,159 For instance, the phosphorylation of AMPK leads to activation of the tuberous sclerosis complex (TSC), suppressing the mammalian target of rapamycin (mTOR) signaling pathway.159,160 This process preserves cellular energy under metabolic stress, supporting cell survival. Further, LKB1 controls the downstream target of AMPK, tumor protein 53 (p53), which is associated with the p53-dependent apoptosis pathway.159 p53 regulates proliferation and cell survival by controlling the phosphorylation of proteins such as phosphatase and tensin homolog (PTEN), signal transducer and activator of transcription 3 (STAT3), mitogen-activated protein kinase (JNK) and Myc proto-oncogene protein (c-Myc).161,162 Recently, activation of the LKB1-SIK pathway has been identified as a key modulator of apoptosis induced by cell detachment and transformation.163 Mutations in LKB1 can interfere with the regulation of such proteins, causing the deregulation of signaling cascades that ultimately results in tumor cell growth, proliferation, and anti-apoptosis.

Although previous studies indicated the effect of LKB1 mutations on signaling pathways, the comprehensive mechanism and the relationship of LKB1 to NSCLC remain unclear. LKB1 loss has been reported in human lung cancer cell lines and mouse models with significant changes in the PI3K/AKT/FOXO3 pathway at the transcriptomic level.3 Though that study described variations at the transcriptomic level, the effects on the proteome have not been investigated to this point. Protein level information comprises not only quantitative information but also qualitative information, such as posttranslational modifications, that may be useful in biotherapeutics.


Figure 4.2: The downstream signaling pathway of LKB1. The active LKB1 complex activates its direct targets SNRK, NUAK, BRSK, MARK, SIK, and AMPK. This initiates different signaling cascades that alter cell polarity, , growth, and proliferation. Reprinted with permission.152

Here, we present our work on identifying these pathways based on label-free differential expression analysis of phosphoproteins and proteins, and localization of phosphorylation sites, by liquid chromatography coupled to tandem mass spectrometry. Mass spectrometry provides an ideal platform for discovery studies of proteins due to its capability to detect and quantify proteins. Further, the MS platform enables the detection of proteins and phosphorylation sites undetected by techniques like western blot analysis due to lack of suitable antibodies. We used the adenocarcinoma cell line A549 (LKB1-/-), which had been transfected with an empty lentiviral vector and an active LKB1 gene-carrying vector, respectively. The complexity of the sample complicates the successful detection of low-abundance phosphopeptides. Therefore, the digests of modified cell lines were enriched for phosphorylated peptides using TiO2 beads. Two-dimensional liquid chromatography with high pH fractionation followed by low pH separation coupled to mass spectrometry was used for detection and relative quantification.

Here, we identified the effects on the proteome and phosphoproteome directly linked to LKB1 expression. The analysis of LKB1-expressing A549 cells indicates the effect of LKB1 on caveolar formation in conjunction with membrane receptor-based signaling, especially EGFR and NOTCH2. Further, the experimental data highlight the ability of LKB1 to function as a tumor suppressor. In particular, the overexpression of LKB1 is linked with lower levels of aldo-keto reductase proteins, tensin-4, and contactin and elevated levels of tight junction protein. The data further indicate that the loss of LKB1 expression alters cellular signaling pathways like the EGFR and SIK-CRTC signaling. Such changes can be linked to metastasis and proliferation of NSCLC tumors without LKB1 protein expression.

4.2 Materials and methods

4.2.1 Cell culture and digest preparation

The cell lines were obtained from the Carbone lab at The James Cancer Center at The Ohio State University and tested to be free of mycoplasma (Lonza, MycoAlert™ PLUS Mycoplasma Detection Kit). The A549 cell line used in the experiment was transfected with a lentivirus that contains the active LKB1 gene as described previously.3 As a control, A549 cells were infected with the empty lentiviral vector, generating LKB1-deficient (vector/V) cells. The cells were grown to 75% confluence in 15 cm dishes containing RPMI 1640 media (Gibco) and 10% fetal bovine serum and maintained at 5% CO2, 37 ˚C for 72 h. The cells were washed with cold phosphate buffered saline (PBS) and lysed in lysis buffer containing 6 M urea, 40 mM NaCl, 50 mM Tris (pH 8), 2 mM MgCl2, 50 mM NaF, 1 mM sodium orthovanadate, 10 mM sodium pyrophosphate, 1 mini EDTA-free protease inhibitor (Roche Diagnostics) and PhosSTOP phosphatase inhibitor (Roche Life Sciences, USA).164

The total protein concentrations were measured using a BCA protein assay kit (Thermo Fisher Scientific). The digestion conditions were optimized for trypsin-ProteaseMax and a trypsin-LysC protease mixture. A total of 1 mg of protein lysate was reduced with DTT (Sigma Aldrich) at a final concentration of 10 mM at 37 ˚C for 1 h, followed by an alkylation step with iodoacetamide (Sigma Aldrich) at a final concentration of 55 mM at room temperature for 30 min in the dark. The samples were diluted to a final concentration of 1.5 M urea with 50 mM Tris and digested in-solution with sequencing grade trypsin (1:20, protease to protein) and 1% ProteaseMax detergent (Promega Corporation) at 37 ˚C for 3 h. The trypsin-LysC containing sample was incubated overnight at 37 ˚C without ProteaseMax. The digests were acidified with 0.5% TFA, desalted with Waters Sep-Pak (C18, SPE, 5 mg) columns and evaporated to dryness in a Speed-vac concentrator (Thermo Fisher Scientific).
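The dilution and protease amounts implied by this protocol can be worked through in a short sketch. It assumes that the 1:20 ratio is a weight ratio of protease to protein and that the lysate starts at the 6 M urea concentration of the lysis buffer. The function names and the 0.5 mL starting volume are hypothetical and serve only to illustrate the arithmetic.

```python
def dilution_factor(initial_molar, final_molar):
    """Fold dilution needed to go from the initial to the final concentration."""
    if final_molar <= 0 or initial_molar < final_molar:
        raise ValueError("final concentration must be positive and not exceed the initial one")
    return initial_molar / final_molar


def protease_mass_ug(protein_ug, protease_parts=1, protein_parts=20):
    """Mass of protease for a given protease:protein weight ratio (assumed w/w)."""
    return protein_ug * protease_parts / protein_parts


# 6 M urea lysis buffer diluted to 1.5 M urea before digestion.
urea_fold = dilution_factor(6.0, 1.5)

# Hypothetical starting volume, used only to show how the final volume follows.
start_volume_ml = 0.5
final_volume_ml = start_volume_ml * urea_fold
tris_to_add_ml = final_volume_ml - start_volume_ml

# 1 mg (1000 ug) of protein lysate at an assumed 1:20 protease:protein ratio.
trypsin_ug = protease_mass_ug(1000.0)

print(urea_fold, final_volume_ml, tris_to_add_ml, trypsin_ug)
```

With these assumptions the lysate is diluted four-fold and 50 µg of trypsin is used per 1 mg of protein; a different starting volume changes only the volume of 50 mM Tris to add.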


4.2.2 Phosphorylated and unmodified peptide selection

Peptides and phosphopeptides were separated using TiO2 beads (GL Sciences, Japan) packed in Mobicol spin columns (Mobicol, Germany).65,165 The TiO2 bead bed weight was optimized for 1 mg of protein digest. The number of phosphopeptides recovered was experimentally determined, and the efficiency of enrichment was calculated for each bead weight as described.166,167
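The exact efficiency formula is given in the cited references rather than in this section. The sketch below assumes one commonly used definition, the percentage of identified peptides that are phosphorylated, and compares it across bead weights. All bead weights and peptide counts in the example are hypothetical placeholders and are not results from this work.

```python
def enrichment_efficiency(n_phosphopeptides, n_total_peptides):
    """Percentage of identified peptides that are phosphopeptides (assumed definition)."""
    if n_total_peptides <= 0:
        raise ValueError("total peptide count must be positive")
    if not 0 <= n_phosphopeptides <= n_total_peptides:
        raise ValueError("phosphopeptide count must lie between 0 and the total")
    return 100.0 * n_phosphopeptides / n_total_peptides


# Hypothetical placeholder data: bead weight (mg) -> (phosphopeptides, total peptides).
example_counts = {
    2.0: (300, 1000),
    4.0: (800, 1000),
    8.0: (600, 1000),
}

efficiencies = {
    bead_mg: enrichment_efficiency(n_phospho, n_total)
    for bead_mg, (n_phospho, n_total) in example_counts.items()
}

# Pick the bead weight with the highest efficiency in this made-up example.
best_bead_mg = max(efficiencies, key=efficiencies.get)
print(efficiencies, best_bead_mg)
```

In practice the number of phosphopeptides recovered would be weighed alongside the efficiency, since a very small bead bed can give high specificity but low recovery.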

For phosphopeptide enrichment, the samples were dissolved in loading buffer (80% CH3CN, 300 mg/mL lactic acid, 2% TFA) and loaded onto TiO2 spin columns. The loading step was followed by an end-to-end rotation of the spin columns on a tube rotator (Fisher Scientific) for 10 min. The columns were washed with 1 mL of equilibration buffer (80% CH3CN, 2% TFA) followed by 1 mL of wash buffer (1% CH3CN, 0.1% formic acid). The flow-through was collected after each step, and the phosphopeptides were eluted first with 0.3 M NH4OH and then with 7 M NH4OH. The eluate was evaporated to dryness in a Speed-vac concentrator. The flow-through fraction was evaporated to dryness, reconstituted in equilibration buffer (0.1% TFA), desalted using Sep-Pak C18 columns and evaporated to dryness before storage at -80 ˚C.

4.2.3 Liquid chromatography coupled to mass spectrometry

Liquid chromatography coupled to tandem mass spectrometry analysis of the samples was performed using a Waters nanoACQUITY two-dimensional (2D) UPLC system (Waters Corporation, Milford, MA) with two reversed phase columns interfaced to a Thermo LTQ-Orbitrap Elite mass spectrometer (Thermo Fisher Scientific, San Jose, CA). All peptide samples were fractionated online prior to analytical separation. The peptides were fractionated in the first reversed phase C18 column (Waters BEH C18, 130 Å, 1.7 µm particle diameter, 300 µm i.d. x 100 mm) at pH 10.0 in 20 mM ammonium formate (A1) buffer by varying the amount of 100% CH3CN (B1) in the mobile phase. The column was equilibrated at 3% (v/v) B1 at the beginning of peptide fractionation. The %B1 was increased to 4.7% (v/v) in 1 min to elute the first fraction of peptides and held at the same %B1 for another 4 min. At the end of 5 min the mobile phase composition was dropped back to 3% (v/v) B1 and held there during trapping at a steady flow rate of 2 µL/min. The %B1 (v/v) values used for the fifteen successive peptide fractions were 4.7, 9.0, 10.8, 12.0, 13.1, 14.0, 14.9, 15.8, 16.7, 17.7, 18.9, 20.4, 22.2, 25.8 and 65.0, respectively.

Each peptide eluate was loaded onto the Waters Symmetry C18 trap column (100 Å, 5 µm, 180 µm x 20 mm) at a flow rate of 20 µL/min followed by analytical separation in the second C18 reversed phase column (Waters HSS T3, C18, 100 Å, 1.8 µm particle diameter, 75 µm i.d. x 150 mm) at pH 2.4. The column was pre-equilibrated with 95% (v/v) of 0.1% formic acid (A2) and 5% (v/v) of CH3CN with 0.1% formic acid (B2). The subsequent separation was achieved by four linear gradients at 38 °C in which the %B2 was increased from 5% to 9% over 3 min, from 9% to 30% over 44 min, from 30% to 40% over 5 min and from 40% to 85% over 5 min at a flow rate of 0.5 µL/min. The column was held at 5% (v/v) B2 from 65 min to 80 min for equilibration.
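Because the analytical gradient is piecewise linear, the nominal %B2 at any time point follows from interpolation between the stated breakpoints. The sketch below is illustrative only and is not part of the instrument method: the function name is hypothetical, and it assumes the four segments run back to back starting at t = 0. The composition between the end of the last segment (57 min) and the re-equilibration step at 65 min is not specified in the text, so the function refuses to return a value there.

```python
# Breakpoints (time in min, %B2) taken from the gradient description above.
# Assumption: the four linear segments run back to back starting at t = 0.
GRADIENT = [(0.0, 5.0), (3.0, 9.0), (47.0, 30.0), (52.0, 40.0), (57.0, 85.0)]


def percent_b2(t_min):
    """Return the nominal %B2 at time t_min (minutes) by linear interpolation."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time is outside the gradient segments given in the text")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0, 3, 25, 47, 52, 57):
        print(f"t = {t:>2} min  ->  {percent_b2(t):.1f} %B2")
```

Under these assumptions the shallow second segment (9% to 30% over 44 min) covers most of the run, which is where the bulk of the peptides in each fraction would be expected to elute.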

The 2D UPLC was coupled to an LTQ-Orbitrap Elite via a Nanospray Flex ion source (Thermo Fisher Scientific, Bremen, Germany) containing a 30 µm inner diameter stainless steel emitter (Thermo Fisher Scientific) using a spray voltage between 1.7 and 1.8 kV. The Orbitrap mass spectrometer was operated in top ten data-dependent acquisition (DDA) mode to acquire ten MS/MS scans for every full MS scan. The full MS scan was acquired in the Orbitrap analyzer with resolution R = 120,000 at m/z 400 at an AGC target value of 10^7 charges. This method triggered collision induced dissociation (CID) and detection of fragments in the linear ion trap analyzer at an AGC target value of 5000 charges. The method used an ion transfer tube temperature of 275 °C; an S-lens RF level of 55%; dynamic exclusion with a repeat count of 1 and a repeat duration of 15 s for an exclusion list size of 500 mass-to-charge values; and CID with a normalized collision energy of 35%, q = 0.25 and an activation time of 10 ms. The minimum intensity threshold for products was set to 6000 counts.

4.2.4 Protein identification and semi-quantification

The MS data were analyzed using the software environment MaxQuant version 1.3.0.5 (downloaded April 2015)168,169 with the Andromeda search algorithm.69 The peptides and proteins were assigned by searching MS and MS/MS data against a customized (canonical) Homo sapiens database (www.uniprot.org, downloaded on 03/15/2014) with 247 contaminant sequences (ftp://ftp.thegpm.org/fasta/cRAP). The raw files generated by Xcalibur software (Thermo Fisher Scientific) for all protein and phosphoprotein samples were used in the peptide search. Each protein digest was given a unique experimental number and all fractions collected for one sample were given a number from 1-15 based on the order in which the files were acquired.

A maximum of two missed cleavages was allowed for every fully tryptic peptide, disregarding lysines or arginines that are followed by a proline (proline rule). The minimum peptide length was set to six amino acids. The MS/MS spectra were searched with fixed carbamidomethyl modification at cysteine, and variable acetylation at protein N-termini, deamidation at asparagine and glutamine and oxidation of methionine. When searching for phosphorylations, variable phosphoryl modifications were introduced at serine, threonine and tyrosine residues. The precursor mass tolerance was set to 10 ppm (acquired in Orbitrap) and the fragment mass tolerance to 0.5 Da (acquired in ion trap). The peptides were identified at a false discovery rate of 0.001 and proteins at a false discovery rate of 0.01. The spectral counts obtained for proteins with a minimum of two high confidence (99% or higher) peptides were used in the statistical analysis of unmodified proteins. For phosphoprotein groups, the minimum peptide requirement was changed to one high confidence phosphopeptide (rank 1).

4.2.5 Statistical analysis of the data

The raw spectral counts reported for proteins and phosphoproteins were filtered to remove protein groups with low spectral counts (i.e. 0 or 1 for most samples). The remaining spectral counts were normalized across samples by the median-of-ratios method used in the DESeq2 package.112
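The median-of-ratios idea can be illustrated with a few lines of plain Python. This is a minimal sketch of the general technique, not the DESeq2 implementation used in this work; the function names and the toy spectral counts are hypothetical. Each sample receives a size factor equal to the median, over protein groups, of the ratio between its count and the geometric mean of that protein group across samples, and counts are divided by that factor.

```python
import math
import statistics


def size_factors(counts):
    """counts: list of rows (one per protein group), each a list of per-sample counts.

    Rows containing a zero are skipped when estimating size factors, because
    their geometric mean across samples is zero.
    """
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no protein group has non-zero counts in every sample")
    # Log of the geometric mean of each usable row.
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    n_samples = len(counts[0])
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


def normalize(counts):
    """Divide every count by the size factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


if __name__ == "__main__":
    # Hypothetical counts: sample 2 was acquired at twice the depth of sample 1.
    toy = [[10, 20], [5, 10], [8, 16], [0, 3]]
    print(size_factors(toy))
    print(normalize(toy))
```

In the toy data the second sample has twice the counts of the first for every fully observed protein group, so its size factor is twice as large and the normalized counts of those groups agree across the two samples.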

Spectral counts were modeled based on the assumption of a Negative Binomial distribution to account for overdispersion. Generalized linear models (GLM) with the logarithmic link were used for comparisons (LKB1 vector and wild-type). Empirical Bayes shrinkage estimation for dispersions and fold changes, as implemented in the R DESeq2 package,112 was used to improve the stability of the estimates. The Wald test was applied for testing differential expression of individual proteins between the two groups of interest. A p-value cut-off criterion was used to control the expected number of false positives.113
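The final step of a Wald test can be written out explicitly. The sketch below is illustrative and assumes that a log2 fold change and its standard error have already been estimated by the Negative Binomial GLM; it does not reproduce the model fitting or shrinkage performed by DESeq2, and the function name and numerical inputs are hypothetical. The statistic is the estimate divided by its standard error, compared against a standard normal distribution.

```python
import math


def wald_test(log2_fold_change, standard_error):
    """Return (z, two-sided p-value) for the null hypothesis of no change."""
    z = log2_fold_change / standard_error
    # Two-sided tail area of the standard normal:
    # 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical estimate: log2 fold change of -2.0 with standard error 0.5.
    z, p = wald_test(-2.0, 0.5)
    print(f"z = {z:.2f}, p = {p:.2e}")
```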

4.2.6 Annotating proteins identified in A549 cell lines

We determined the molecular biological importance and cellular localization of proteins by annotation using Software tools for rapid protein annotation (STRAP, http://www.bumc.bu.edu/cardiovascularproteomics/cpctools/strap).170

4.2.7 Immunofluorescence measurements of lung cancer cells

A549 and HCC15 (vector and LKB1 wild-type), Calu-1 parental and LKB1 knockout (N12), and Calu-6 parental and LKB1 knockout (F2 and G2) cell lines were cultured in multiwell chamber slides overnight. The cells were treated with vehicle (DMSO) or 300 nM SIK inhibitor (HG-9-91-01, MedChem Express, New Jersey, USA) for 1, 4, 12 and 24 h. The cells were fixed in 4% paraformaldehyde for 30 min at room temperature and rinsed with PBS. Next, the cells were blocked in buffer containing 5% BSA, 0.1% Triton-X, and 1X PBS for 60 min, rinsed and incubated with anti-CRTC3 primary antibody (Abcam, 1:200 in 3% BSA, 1X PBS) overnight. Finally, the samples were rinsed with PBS and incubated with anti-rabbit-Alexa594 (Invitrogen, 1:500) for 1 h. The cells were rinsed, and coverslips were mounted with Prolong Gold Antifade reagent containing DAPI (Invitrogen). The slides were imaged on a UPlan FLN confocal microscope equipped with a Plan Apochromat N.A. 1.46 lens.

The experiment was repeated in both A549 and HCC15 cell lines using the most effective SIK inhibitor incubation time determined in the above experiment. All sample preparation steps were reproduced as given above. The samples were incubated with both anti-CRTC3 primary antibody (Abcam, 1:200 in 3% BSA, 1X PBS) and anti-CRTC2 primary antibody (Cell Signaling, 1:200 in 3% BSA, 1X PBS) overnight. Finally, CRTC2 was detected by incubation with anti-mouse-Alexa488. The sample preparation was reproduced for knockout cell lines without SIK inhibitor.

4.3 Results and Discussion

4.3.1 Optimization of sample preparation and phosphopeptide enrichment

LKB1 is a serine/threonine kinase that controls metabolic and growth signaling via phosphorylation of downstream targets. A549 is a lung adenocarcinoma cell line that contains a somatic nonsense mutation resulting in a truncated LKB1 protein. Transfection of the gene back into the genome of A549 cells restored LKB1 expression as indicated in Figure 4.3.A. The expression of LKB1 protein has a strong effect on the phenotype as shown in Figure 4.3.B. LKB1 deficiency results in small, globular cells, whereas LKB1 wild-type cells show normal cell morphology and have a slower growth rate. Figure 4.3.C presents the general workflow used in the experiment.

To determine the most effective digestion method for phosphorylated proteins, we digested the protein samples with the two most commonly used proteases: trypsin and a trypsin-LysC mixture. Trypsin cleaves the proteins at the C-terminus of lysine (K) and arginine (R) and LysC cleaves at the C-terminus of lysine. The moderate abundance of K and R in the proteins generates peptides within the range of m/z 400-2000 which are compatible with tandem mass spectrometry fragmentation and detection.
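The cleavage specificity described above can be made concrete with a small in-silico digestion. The sketch below is an illustration only, not the digestion model used by MaxQuant: the function name and the test sequence are hypothetical, and the defaults (up to two missed cleavages, minimum length of six residues, no cleavage before proline) simply mirror the search settings given in section 4.2.4.

```python
def tryptic_peptides(sequence, missed_cleavages=2, min_length=6, proline_rule=True):
    """Return (peptide, number_of_missed_cleavages) tuples for a protein sequence.

    Trypsin is modeled as cleaving C-terminal to K or R; with proline_rule=True
    a K or R followed by P is not treated as a cleavage site.
    """
    # Indices at which the sequence is cut (0 and len(sequence) are the termini).
    sites = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and not (proline_rule and sequence[i + 1] == "P"):
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = []
    for start in range(len(sites) - 1):
        for missed in range(missed_cleavages + 1):
            end = start + 1 + missed
            if end >= len(sites):
                break
            peptide = sequence[sites[start]:sites[end]]
            if len(peptide) >= min_length:
                peptides.append((peptide, missed))
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence containing one K-P bond that is skipped by the proline rule.
    for peptide, missed in tryptic_peptides("AAAAAKPGGGGRTTTTTTKSSSSSS"):
        print(missed, peptide)
```

Restricting the same function to lysine only would give a LysC-like digest; that variant is not shown here.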

Figure 4.3: A- The western blot of LKB1 vector (V) and wild-type (WT) cell lysates. GAPDH protein was used as a loading control, B- the morphology of LKB1-vector and wild-type cell lines in cell culture, C- the general workflow of the experiment

Figure 4.4: The optimization of sample preparation, A- shows the total number of phosphopeptides detected in enzyme/surfactant and trypsin/lysC digested samples, B- shows the total number of phosphopeptides detected with each TiO2 bead bed weight, C- the total number of singly, doubly and triply phosphorylated peptides identified in 4 mg TiO2 bed sample, D- shows the amounts of serine, threonine and tyrosine phosphorylation sites detected in the 4 mg TiO2 bed sample.

The peptides were separated by one-dimensional HPLC with a 60 min linear gradient from 5% to 85% CH3CN. The data were acquired in top 10 data-dependent acquisition mode. Samples digested with Trypsin-ProteaseMax had 159 more high confidence (99% or higher) peptides than samples digested with the trypsin-LysC multi-enzyme combination (Figure 4.4.A). Therefore, Trypsin-ProteaseMax was used throughout the experiment.

The enrichment of phosphopeptides was carried out with 2, 4 or 8 mg of TiO2 as the bead bed. The TiO2 bead bed weights of 2, 4 and 8 mg yielded an average of 1167, 1207 and 1112 phosphopeptides, respectively, in 3 trials each (Figure 4.4.B). The 4 mg TiO2 bed sample enriched the highest number of phosphopeptides, with 1029 monophosphorylated, 159 diphosphorylated, and 19 tri-/multi-phosphorylated peptides (Figure 4.4.C). We observed 1121 serine, 118 threonine and 19 tyrosine phosphorylation sites in the enriched samples. All three types of phosphorylation site found in the human proteome were therefore represented, at a serine:threonine:tyrosine ratio of 59:6:1 (Figure 4.4.D).
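The reported ratio follows directly from the site counts. As a brief worked check (the helper names are hypothetical; the counts are those stated above), dividing each count by the tyrosine count gives the ratio, and dividing by the total gives the share of each residue.

```python
# Phosphorylation site counts reported for the 4 mg TiO2 bed sample.
SITE_COUNTS = {"serine": 1121, "threonine": 118, "tyrosine": 19}


def ratio_to_smallest(counts):
    """Express every count relative to the smallest count."""
    smallest = min(counts.values())
    return {residue: n / smallest for residue, n in counts.items()}


def percentages(counts):
    """Express every count as a percentage of the total."""
    total = sum(counts.values())
    return {residue: 100.0 * n / total for residue, n in counts.items()}


if __name__ == "__main__":
    for residue, r in ratio_to_smallest(SITE_COUNTS).items():
        print(f"{residue:<10} ratio {r:5.1f}   share {percentages(SITE_COUNTS)[residue]:5.1f}%")
```

The ratios evaluate to 59.0 : 6.2 : 1, which rounds to the 59:6:1 quoted in the text.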

4.3.2 Identification of proteins and phosphoproteins in A549 digests

The complexity of both unmodified and phospho-modified peptide samples is one of the reasons for poor peptide yield. Like any other complex proteomic sample, phosphopeptide samples can be fractionated prior to phospho-enrichment55 or after enrichment to improve peptide detection and identification.171 We fractionated all peptide samples after enrichment by high pH reversed phase fractionation at pH 10.0 followed by low pH (2.4) reversed phase separation. The high pH HPLC system was in-line with the low pH separation system to synchronize each sample run between fractionation, trapping and analytical separation. The chromatograms representing a few phosphopeptide fractions are given in Figure 4.5.A.

During fractionation, the high pH buffer (20 mM ammonium formate) negatively charges the peptides, altering phosphopeptide affinity to the reversed phase C18 column as described previously.171 Peptides with multiple phosphoryl groups carry a higher negative charge and hence elute from the column earlier than singly phosphorylated peptides. Further, the peptides that elute at the end of fractionation are mostly singly phosphorylated or unmodified. Figure 4.5.B confirms this behavior, with more phosphopeptides eluting in the first few fractions of the liquid chromatography method. Most phosphopeptides contained a single phosphorylation site, and only 19% of the total phosphopeptides identified in A549 cell lines contained two or more phosphoryl groups.

Phosphopeptides differ from unmodified peptides in their inherent properties in solution and in the gas phase.55 In solution, some phosphoryl group containing peptides resist tryptic digestion, resulting in mis-cleaved peptides. This is most commonly observed in serine and threonine phosphopeptides.172 In our data, phosphopeptides were longer than unmodified peptides. The average length of an A549 phosphoryl group containing peptide was 20 amino acids, which was 5 amino acids longer than that of an unmodified peptide. The missed cleavages detected in the peptides account for this difference in peptide length. About 60% of the peptides in the A549 vector phosphopeptide data contained missed cleavages, compared with about 35% in the unmodified peptide data generated from the same sample.

Icelogo173 analysis (https://github.com/compomics/icelogo) of phosphopeptide sequences with one or two missed cleavages revealed conserved proline (P), lysine (K) and arginine (R) residues in phosphoserine and phosphothreonine peptides. As indicated in Figure 4.6, most missed cleavages on these peptides were found at amino acid positions 1-4 (positions -7 to -4 relative to the phosphorylation site at position 8), 7 (-1) and 10 (+2). It is assumed that salt bridges formed between a serine or a tyrosine and a lysine or an arginine prevent tryptic cleavage, resulting in a mis-cleaved peptide.

Proline-containing motifs are common in regulatory and structural kinases and Pro is known to be abundant in turns; therefore, most missed cleavages contained a Pro-directed motif (Figure 4.6.A and Figure 4.6.B). The motif analysis further indicated the presence of acidic groups like aspartate or glutamate (Figure 4.6). These acidic motifs are also common in some kinase proteins. As indicated in Figure 4.6.C, tyrosine phosphorylation sites also present prevalent acid-directed motifs (D and E); however, we are unable to draw a conclusion on the significance of these sites due to the smaller number of tyrosine peptides detected in A549 cell lines.

As previously reported,172 a phosphoryl modification on a peptide changes its charge state distribution. In this analysis, we observed a similar trend: a much higher fraction of phosphopeptides presented a charge state of +3, whereas the majority of unmodified peptides carried +2 charges. Further, the average mass of a modified peptide was about 100 Da higher than that of an unmodified peptide (phosphopeptide m/z 821.3033 and unmodified peptide m/z 706.0853).

Figure 4.5: A- chromatograms of fractions 1, 2, 5, 10 and 15 (eluted at 4.7%, 9.0%, 13.1%, 17.7% and 65.0% B, respectively) of a 15-fraction multi-dimensional liquid chromatography run for phosphopeptide analysis of A549 LKB1 wild-type. B- the total number of phosphopeptides identified in each high pH fraction following low pH analytical separation. The % acetonitrile composition is shown as a solid blue line.

Figure 4.6: The Icelogo motif analysis of peptide sequences with one or two missed cleavages containing A- serine B- threonine and C-tyrosine phosphorylation sites. All phosphorylation site residues were aligned at position eight before the analysis and fourteen residues around the phosphorylation site were selected. Symbol sizes represent the conservation level based on information theory.

Figure 4.7: Representative fragmentation spectra of two phosphopeptide sequences indicating the fragmentation sites and respective fragment masses. A- product ion spectrum shows a phosphorylated peptide with 0.4 site probability, B- product ion spectrum shows a peptide with 0.9 phosphorylation site probability.

Overall, we used 240 chromatograms containing 628956 MS/MS spectra acquired for 4 biological replicates in each of two sample groups, vector and wild-type cells. A MaxQuant data search of phosphopeptide spectra resulted in a maximum of 16153 high confidence (99% or higher) peptides at a peptide false discovery rate of 0.001. A total of 10734 phosphopeptides identified had a localization probability higher than or equal to 0.75 (Figure 4.7). The statistical analysis of phosphorylation sites only contained phosphopeptides with high probability site occupancies.
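The site-localization filter amounts to a simple threshold on the probability reported for each phosphorylation site. The sketch below is a minimal illustration under stated assumptions: the record type, field names and peptide strings are hypothetical and do not correspond to MaxQuant output columns; only the 0.75 cut-off and the example probabilities of 0.4 and 0.9 (Figure 4.7) come from the text.

```python
from dataclasses import dataclass


@dataclass
class PhosphoSite:
    peptide: str                     # hypothetical peptide sequence
    position: int                    # 1-based position of the modified residue
    localization_probability: float  # probability that the phosphate is at this residue


def confident_sites(sites, threshold=0.75):
    """Keep sites whose localization probability is at or above the threshold."""
    return [s for s in sites if s.localization_probability >= threshold]


if __name__ == "__main__":
    candidates = [
        PhosphoSite("AASPEK", 3, 0.40),   # ambiguous site, discarded
        PhosphoSite("GGTPLR", 3, 0.90),   # well-localized site, retained
        PhosphoSite("LLSVDK", 3, 0.75),   # exactly at the cut-off, retained
    ]
    for site in confident_sites(candidates):
        print(site.peptide, site.localization_probability)
```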

We assembled these peptides with phosphoryl modifications into 3246 phosphoprotein groups (Figure 4.8) and provide their respective spectral counts for each quantification. Each phosphopeptide contained a minimum of one confident phosphorylation at serine, threonine or tyrosine. In both sample groups, the distribution of serine (89%), threonine (8%) and tyrosine phosphorylation sites (1.7%) was similar to what has been reported in the literature.65,174

As described in section 4.2.2, TiO2 affinity purification separated phosphopeptides from unmodified peptides. The unmodified peptides exhibit a very low affinity towards TiO2 beads and were therefore collected in the flow-through samples. The tandem mass spectrometric analysis of these samples yielded a total of 681538 spectra identifying a total of 60433 unmodified peptides. Each protein group was identified with a minimum of two peptides at a protein false discovery rate of 0.01. A total of 4160 unmodified protein groups were identified (Figure 4.8.A). Among these, 1771 protein groups presented phosphoryl modification (Figure 4.8.B).

The quantification in label-free experiments is either based on normalized intensity or spectral counts. Here, we present our data obtained by the latter method as described in detail in section 4.2.4.

Figure 4.8: A- the total number of protein groups and phosphoprotein groups identified in LKB1-vector and wild-type samples, B- the Venn diagram showing the overlap between protein and phosphoprotein groups.

4.3.3 The differentially expressed proteins indicate protein and phosphorylation level changes associated with LKB1

We used the lists of proteins and phosphoproteins identified in the analyses to discover all differentially expressed proteins in LKB1 vector and LKB1 wild-type re-expressing samples. Among 3246 phosphoprotein groups, the abundance of 70 (2.1%) proteins was significantly affected (p<0.05) by LKB1 protein expression. Figure 4.9.A presents the top 150 changes detected in the phosphoproteome of A549 cell lines. The LKB1-associated effects on the proteome were more pronounced than those on the phosphoproteome, resulting in more LKB1 protein activity dependent changes.

Figure 4.9: The heat maps generated in MeV 4.9.0 for selected differentially expressed LKB1 vector (V) and LKB1 wild-type (WT) A- A549 phosphoproteins B- A549 proteins identified in the statistical analysis. Yellow indicates higher expressions and blue indicates lower expression levels of proteins and phosphoproteins.

Table 4.1: The differentially expressed proteins in the proteome (flow-through sample) indicating significant changes in wild-type sample compared to LKB1 vector. Gene name notation is from UniProt.

Gene name     Fold-change   p-value
AKR1C2        -4.1          0
AKR1B1        -2.0          0
AHNAK2        2.4           0
NAMPT         -3.7          0
TGM2          -4.4          0
AGR2          -26.0         0
NNMT          -9.5          0
CPLX2         -8.1          0
CNTN1         -11           0
SLC12A2       -11           0
CPS1          -12           0
DPYSL3        25            0
AKR1C3        -4.4          1.0E-05
MAP7          -8.6          1.0E-05
ALDH3A1       -2.6          2.0E-05
PC            -6.1          2.0E-05
SYNE1         4.5           2.0E-05
AKR1B10       -2.0          5.0E-05
HSPA2         -3.9          6.0E-05
MTUS1         -6.8          6.0E-05
KYNU          -2.5          0.00013
GLRX          -5.5          0.00017
SERPINE1      3.9           0.00017
KRT18         -2.0          0.00021
ATP2B1        -2.9          0.00021
AFAP1L2       5.8           0.00026
PDLIM5        -2.           3.0E-04
ACSL4         5.2           0.00035
EPHB2         5.6           0.00036
CA12          -3.3          0.00038
TACC2         -4.9          7.0E-04
TJP1          2.5           9.0E-04
ADD2          5.0           0.00091
AHNAK         1.6           0.0014
PODXL         4.2           0.0015
NOLC1         4.6           0.0016
GPAT3         -4.6          0.0017
RTN4RL2       -4.6          0.0017
ATRX          4.4           0.0021
VIM           1.6           0.0022
STAT1         -2.2          0.0025
CAV1          2.4           0.0026
PPL           4.2           0.0032
CD55          -4.2          0.0033
SIRPA         4.0           0.0040
EPB41L2       2.7           0.0041
ZNF185        3.9           0.0051
AXL           3.5           0.0058
TRIO          3.7           0.0063
TMSB4X        -2.8          0.0069
ITGB5         -3.6          0.0074
MUC5AC        -3.6          0.0078
PAPSS2        -2.9          0.0082
PLOD2         -2.6          0.0089
GTPBP4        3.6           9.0E-03
DOCK10        2.7           0.0097
ANXA13        -3.4          9.0E-03
LMO7          -3.4          1.0E-02
HMOX1         2.6           1.0E-02
NOP2          3.2           0.012
PTMS          -3.4          0.012
SLC2A3        -3.3          0.012
HMGB2         3.1           0.012
EPHX1         -2.2          0.013
DFNA5         -3.2          0.014
TPT1          -1.8          0.014
PYGB          1.5           0.014
MFGE8         3.0           0.015
DSP           -2.1          0.016
HIST2H2BE     -3.1          0.016
SLC3A2        -1.6          0.016
EPHA2         2.4           0.016
ALDH2         -1.9          0.016
TCOF1         3.2           0.017
TUBB2B        3.0           0.018
MISP          -2.1          0.019
SELENBP1      -3.0          0.019
HTATIP2       -2.5          0.019
IL18          -2.2          0.019
EHD1          1.6           2.0E-02
RBM39         2.9           2.0E-02
HIST2H3A      -2.9          2.0E-02
HIST2H3C      2.4           0.021
HIST2H3D      1.9           0.022
EEA1          2.4           0.022
STAT3         2.8           0.023
SSRP1         -1.4          0.023
CHD4          -2.5          0.024
DPYSL2        -1.4          0.024
SERPINB9      -2.2          0.025
PGD           -2.8          0.026
PON2          -2.9          0.026
VMP1          2.9           0.026
MYO15B        2.8           0.026
TRA2B         -2.9          0.026
STK26         2.8           0.026
LIMCH1        2.5           0.027
CEP55         2.4           0.027
SBNO1         2.6           0.028
SRSF1         -2.9          0.028
BAZ1B         2.4           0.029
HSPA6         -1.5          3.0E-02
FTSJ3         2.2           3.0E-02
LDHA          2.8           0.031
INA           -1.8          0.032
LUC7L2        2.8           0.032
JUP           -2.7          0.033
PFKFB3        2.8           0.033
PLA2G4A       2.2           0.034
CIRH1A        -2.5          0.034
NOP56         2.2           0.035
CYP24A1       2.7           0.036
TNFAIP2       2.6           0.036
SRSF5         2.4           0.038
RPF2          -2.7          0.038
DEK           2.4           0.03
OAS3          2.3           0.039
SNRNP70       2.6           0.039
RBM28         2.7           4.0E-02
NUMA1         2.7           4.0E-02
ABCF1         -2.0          4.0E-02
DDX23         2.5           0.041
WBP11         2.5           0.043
PTGR1         2.6           0.043
ESF1          -2.5          0.045
ERBB2IP       2.0           0.048
MID1          2.5           0.049
TPM1          -2.6          0.049

Table 4.2: The differentially expressed phosphoproteins (flow-through sample) indicating significant changes in wild-type sample compared to LKB1 vector.

Protein       Fold-change   p-value
ATP2B1        -3.8          0
MAP7          -18           0
PTPN14        5.4           0
EEPD1         7.5           0
TNS4          -10.          0
AFAP1L2       7.6           0
LMO7          -4.7          0
KIAA1462      7.5           0
MTUS1         -9.7          0
LIMCH1        -12           0
DPYSL3        26            0
TACC2         -3.3          1.0E-05
AHNAK         2.2           7.0E-05
AHNAK2        2.0           7.0E-05
IRS2          -3.3          7.0E-05
EPB41L2       3.4           1.0E-04
ULK1          3.4           0.00016
MPRIP         -2.1          0.00017
DOCK10        4.2           0.00028
PDE3A         -5.7          0.00046
SYNPO2        -5.6          0.00056
SLC12A2       -4.6          0.00057
TRIO          4.4           0.00062
KRT18         -2.1          0.00078
PLA2G4A       -5.2          1.0E-03
PDE4D         -4.5          1.0E-03
PPP1R13L      4.4           0.0012
PODXL         4.5           0.0013
ARHGAP29      4.5           0.0013
GPRIN3        -4.6          0.0023
CRTC3         2.6           0.0026
LTBP1         -4.4          0.0031
ROBO1         -4.2          0.0038
FAM101B       3.8           0.0038
BCAR1         3.9           4.0E-03
CRTC2         3.1           0.0045
PDLIM2        3.5           0.0051
NAV1          3.8           0.0052
STARD13       3.7           0.0065
AKR1C3        -3.8          0.0069
PACS1         3.1           0.0075
SYNJ2         3.3           0.012
TJP1          1.9           0.012
FRMD6         3.3           0.013
CTNNB1        1.7           0.018
ALDH1A1       -2.5          0.018
MYC           3.0           0.019
TMSB4X        -2.5          2.0E-02
PDLIM5        -2.6          0.023
CAV1          2.4           0.023
STK4          3.1           0.023
HMGA1         2.7           0.024
BAZ1B         2.4           0.025
FNBP4         2.4           0.025
PALLD         -2.0          0.027
ESAM          2.9           0.029
TNKS1BP1      -1.5          0.029
MISP          -1.6          0.031
TPPP          -2.9          0.031
RAB11FIP1     1.7           0.031
CLASRP        2.9           0.034
ANXA1         -2.3          0.034
NCOA5         -2.7          0.036
USP43         2.8           0.038
PNISR         2.7           0.038
TNFAIP2       2.8           0.039
CGNL1         2.7           0.039
NOTCH2        2.8           4.0E-02
CAMKK1        2.7           0.041
CSRP1         -2.0          0.041
TRA2A         2.7           0.041
FAM83H        -2.0          0.043
CLNS1A        2.4           0.044
SIK3          2.2           0.044
DCDC2         2.6           0.044
ARFGAP1       2.5           0.045
EGFR          1.8           0.045

Figure 4.10: The gene ontology annotation of differentially expressed proteins (fold change≥2 and p-value<0.05) in A549 cell lines. A- the cellular fraction of differentially expressed proteins and phosphoproteins, B- the biological process of differentially expressed phosphoproteins and proteins. LKB1 activity altered mostly cytoplasmic, extracellular, nuclear and plasma membrane proteins. The proteins involved in cellular processes and regulation presented the most common biological processes associated with these changes.

We discovered a total of 134 differentially expressed unmodified proteins (p<0.05) in the A549 cell proteome. Three differentially expressed phosphoproteins indicated the effect of LKB1 protein activity on its direct downstream targets as described in Section 4.3.5.

All differentially expressed proteins detected at both proteome and phosphoproteome level (fold change, p-value<0.05) are given in Table 4.1 and Table 4.2 respectively. To categorize these proteins based on biological function, we used STRAP annotation software (Figure 4.10).

4.3.4 LKB1-dependent pathways in A549 cells

When expressed, LKB1 catalytic activity induces a functional response in downstream proteins. In human cells, these active protein expressions reflect the effects of LKB1-dependent metabolic regulation or tumor suppression. The protein-protein interactions leading to LKB1-tumor suppression are important as those may lead to protein biomarkers or drug targets. We used Ingenuity pathway analysis (IPA) to identify biologically significant, LKB1-dependent protein networks. The protein networks identified in the analysis are as indicated in Figure 4.11, Figure 4.12 and Figure 4.13.

In the human liver model, AMPK is one of the main downstream proteins affected by LKB1 catalytic activity.153 In our analysis of the A549 lung cancer cell line, we did not observe any significant changes in AMPK protein level or phosphorylation. However, the results indicated phosphoprotein level changes downstream of AMPK. The AKT signaling pathway is one such network detected downstream of AMPK (Figure 4.11).

The IPA network of phosphoproteins indicated possible LKB1 dependencies of caveolae proteins, EGFR and NOTCH2 membrane receptor proteins, MEK, ERK, MYC and focal adhesion proteins (Figure 4.11). SIK3, a direct downstream target of LKB1, yielded differentially expressed phosphorylation sites in A549 cell lines. The discovery of phospho-SIK3 protein along with phospho-CRTC2 and phospho-CRTC3 led to the LKB1-SIK-CRTC signaling pathway. SIK3, however, is not one of the well-known proteins of the SIK family. Due to the lack of information available on SIK3, IPA analysis failed to reveal the relationship between SIK3 and CRTC proteins. This is one of the drawbacks of using a protein-protein network database consisting of curated experimental data. Such relationships between proteins benefit from validation experiments as described in Section 4.3.5.

The IPA analysis identified more protein-protein associations important in LKB1-dependent tumor suppression. The wild-type samples presented changes in proteins related to PI3K, a known regulator of cancer signaling: SLC12A2, AKR1C1/2, and AKR1B10. However, the data did not present evidence of its activity. CAV-1 and HSP90 also indicated up-regulated protein expression levels in the wild-type sample. TGM2, STAT1, and STAT3 were also observed in the pathway map with probable protein-protein interactions.

Overall, the IPA pathway analysis assembled the significantly altered proteins identified in A549 cell lines into protein networks.

Figure 4.11: The pathway protein map of differentially expressed phosphoproteins with a fold change ≥2 and p-value<0.05. The pathway map was generated based on the curated protein-protein interaction information available in Ingenuity pathway analysis. All proteins up-regulated in LKB1 wild-type sample compared to LKB1-vector sample are presented in pink and all down-regulated proteins are presented in green. The solid lines indicate the direct relationships between proteins and dotted lines indicate the indirect protein interactions. As shown in the pathway map the expression of LKB1 protein alters the phosphorylation of proteins such as CAV1, PTRF, MTUS1, EGFR, and TJP.

Figure 4.12: The pathway map of differentially expressed proteins with a fold change ≥2 and p-value<0.05. The pathway map was generated based on the curated protein-protein interaction information available in Ingenuity pathway analysis. All proteins up-regulated in LKB1 wild-type sample compared to LKB1-vector sample are presented in pink and all down-regulated proteins are presented in green. The solid lines indicate the direct relationships between proteins and dotted lines indicate the indirect protein interactions.

Figure 4.13: The pathway map of differentially expressed proteins with a fold change ≥2 and p-value<0.05. The pathway map was generated based on the curated protein-protein interaction information available in Ingenuity pathway analysis. Pink/red- up-regulated proteins in LKB1 wild-type sample compared to LKB1-vector sample. Green- down-regulated proteins in wild-type samples compared to LKB1 vector. The solid lines indicate direct interactions between proteins and dotted lines indicate the indirect protein interactions.

Although these network maps revealed some probable protein-protein interactions, they did not provide a complete representation. We propose to further validate and investigate the biological relevance of these proteins in multiple NSCLC cell lines before drawing any conclusions or designing an NSCLC-specific pathway for LKB1 activity.

4.3.5 LKB1 phosphorylates its direct downstream target SIK3

Like AMPK, SIK proteins are direct downstream targets of LKB1 and members of the AMPK family of serine/threonine kinases. Human cells contain all three isoforms of SIK proteins, which are expressed ubiquitously in human tissues.8 SIK3, one of the three SIK proteins, is important in human cancer. SIK3 favors the growth of cancer cells by promoting cell cycle progression,175 which has been detected in ovarian cancer patient tumors.176 In ovarian cancer patients, higher expression levels of SIK3 protein were correlated with a poor clinical outcome.176

Here we report changes in SIK pathway proteins including SIK3, CRTC2, and CRTC3 (Figure 4.14). The observed changes occurred at the phosphorylation level, indicating active signaling. The phosphorylation sites detected in SIK3 are S493, S568, S673, S688, S690, S808, and S916 (Figure 4.14.A). The sites S493 and S568 have been reported in cells with induced RNA transcription and 14-3-3 motif binding.175,177 The data indicate significant dephosphorylation at S493, S688, and S690 (Student's t-test, p<0.05) in wild-type cell lines.

CRTC3 is one of the anticipated downstream proteins of SIK3. The wild-type samples yielded higher phosphorylation levels of CRTC3 than vector samples. The phosphorylation sites detected in the analysis are S62, S329, S370, S373, S376, S410, and S443 (Figure 4.14.C). Most of these sites have a higher probability of being phosphorylated at high LKB1 expression levels, especially the sites S62, S329 (Student's t-test, p<0.05) and S370. These three serine phosphorylation sites have been recognized as 14-3-3 protein binding sites of CRTC3.177 The phosphorylation of CRTC3 initiates the binding to 14-3-3 proteins, preventing nuclear localization and subsequent binding to CREB protein. Similarly, CRTC2 (Figure 4.14.B) indicated higher phosphorylation levels in wild-type than in vector cell lines.

CRTC proteins are co-activators of cAMP response element binding protein (CREB). Changes in CREB activity have been previously reported with CRTC1 in lung cancer cell lines and CRTC2 in liver cancer cell lines. We propose changes in cellular localization of CRTC3 as a product of SIK3 phosphorylation by LKB1. We evaluated changes in CRTC3 localization associated with LKB1. Two LKB1-deficient cell lines (A549 and HCC15) stably transfected with either empty vector or wild-type LKB1 and two LKB1 wild-type cell lines (Calu-1 and Calu-6) depleted of endogenous LKB1 were used to determine changes in CRTC3 localization. Additionally, we used a pan-SIK inhibitor, HG-9-91-01, to determine the requirement of SIK phosphorylation to mediate LKB1 changes in CRTC localization.

Figure 4.14: The phosphorylation sites of A- SIK3, a direct downstream target of LKB1, B- CRTC2 and C- CRTC3. CRTC2/3 are downstream targets of SIK3.

The SIK inhibitor is a pan-SIK inhibitor and thus does not provide evidence for exclusive SIK3 activity in wild-type cell lines. The SIK inhibitor prevents the phosphorylation of SIK proteins, inhibiting the phosphorylation of CRTC proteins and permitting nuclear localization and subsequent CREB protein binding. We performed a time course analysis to optimize the incubation time with the inhibitor using 0, 1, 4 and 12 h incubation periods after adding the inhibitor.

The confocal microscopy images of vector and wild-type samples indicated higher levels of nuclear CRTC3 in vector at 0 h. However, limited cytoplasmic localization is also seen in the vector sample. The wild-type sample incubated with the inhibitor for 1 h indicated much higher nuclear CRTC3 than samples incubated for 4 h and 12 h (Figure 4.15). Although we expected to observe decreased levels of cytoplasmic CRTC3 with the longer incubations, we instead observed increased levels of cytoplasmic CRTC3. We believe this is mainly due to the instability of the drug over time or cellular metabolism resulting in much lower endogenous concentrations of the SIK inhibitor.

The changes were further investigated in both A549 and HCC15 cell lines in the presence of the inhibitor. With 1 h incubation, wild-type A549 cells showed increased nuclear localization of CRTC3 (Figure 4.16.A). However, the same cell lines incubated with CRTC2 antibody did not show increased nuclear localization. CRTC2 in the wild-type sample presented sub-nuclear localization upon SIK inhibitor incubation.

Figure 4.15: The confocal microscopy images of CRTC3 localization in A549 cell lines. The experiment used both vector and wild-type cell lines treated with HG-9-91-01 SIK inhibitor. The incubation period with the inhibitor was varied from 1-12 h and all samples were stained with DAPI and anti-rabbit fluorophore (594 nm).

Figure 4.16: Confocal microscopy images of cell lines treated with the SIK inhibitor HG-9-91-01. The images show: A- A549 vector and wild-type cell lines; B- HCC15 vector and wild-type cell lines. The blue region represents the DAPI-stained nuclei of the cells.

Figure 4.17: Confocal microscopy images of A- Calu-1 parental and N12 knockout cell lines; B- Calu-6 parental and knockout (F2 and G2) cell lines. Both parental cell lines present more cytoplasmic localization of CRTC2/3 than the knockout cell lines.

Figure 4.18: Modeled LKB1-SIK3-CRTC3-CREB signaling pathway. SIK3 inhibits CRTC3 via phosphorylation.

We validated the localization changes in CRTC3 and CRTC2 in knockout cell lines. The Calu-1 parental cell line, with active LKB1 protein, showed higher levels of CRTC3 in the cytoplasm than its LKB1 knockout (N12) (Figure 4.17.A). CRTC2 exhibited similar changes in these cell lines. The use of Calu-6 and its LKB1 knockouts (F2 and G2) further confirmed the consistent nuclear localization of CRTC3 in the absence of LKB1 expression (Figure 4.17.B). CRTC2 showed increased cytoplasmic protein in the parental cell line; however, its nuclear localization in the knockout cell lines was not confirmed.

We therefore base our pathway model on the consistency of the protein expression changes described above. SIK3 is activated by LKB1 expression in cells. Active, phosphorylated SIK increases CRTC3 phosphorylation, inducing cytoplasmic accumulation of the protein. The absence of LKB1 protein reverses this by decreasing the subsequent phosphorylation of CRTC3. CRTC3 therefore localizes to the nucleus, increasing CREB protein activity, as shown in Figure 4.18.
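The logic of the proposed model can be summarized as a small Boolean sketch. This is only an illustrative reading of the pathway described above, not part of the experimental analysis: the function and field names (`model_pathway`, `PathwayState`) are hypothetical, and real signaling is graded rather than all-or-nothing. The pan-SIK inhibitor is treated as fully blocking SIK activity, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathwayState:
    """Qualitative outcome of the toy model (hypothetical structure)."""
    sik3_active: bool
    crtc3_phosphorylated: bool
    crtc3_location: str  # "cytoplasm" or "nucleus"
    creb_coactivated: bool


def model_pathway(lkb1_present: bool, sik_inhibitor: bool = False) -> PathwayState:
    """Toy Boolean reading of the proposed LKB1-SIK3-CRTC3-CREB model."""
    # SIK3 is active only when LKB1 is present and no pan-SIK inhibitor is applied
    # (assumption: the inhibitor blocks SIK activity completely).
    sik3_active = lkb1_present and not sik_inhibitor
    # Active SIK3 phosphorylates CRTC3.
    crtc3_phosphorylated = sik3_active
    # Phosphorylated CRTC3 accumulates in the cytoplasm;
    # unphosphorylated CRTC3 localizes to the nucleus.
    crtc3_location = "cytoplasm" if crtc3_phosphorylated else "nucleus"
    # Nuclear CRTC3 is available to co-activate CREB.
    creb_coactivated = crtc3_location == "nucleus"
    return PathwayState(sik3_active, crtc3_phosphorylated, crtc3_location, creb_coactivated)


if __name__ == "__main__":
    for lkb1 in (True, False):
        for inhibitor in (False, True):
            print(lkb1, inhibitor, model_pathway(lkb1, inhibitor))
```

In this sketch, LKB1 loss and SIK inhibition both lead to nuclear CRTC3, which mirrors the qualitative behavior described for the vector, knockout, and inhibitor-treated cells.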

4.3.6 LKB1 alters caveolin-1 and membrane receptor proteins

Caveolae form non-planar lipid raft domains that localize, pre-organize and sequester receptors on the plasma membrane, a process which aids receptor-based signaling in human cells.178 In the literature, three caveolar proteins have been reported: caveolin-1 (CAV1), cavin-1 (PTRF) and CD36. Mutations in or loss of these proteins have been associated with poor outcomes in patients with stroma-located tumors.179 The expression levels of CAV1 vary across tissue types. In our analysis of A549 lung adenocarcinoma cells, both the abundance and the level of phosphorylation of the two caveolar proteins CAV1 and PTRF depended on LKB1. The non-phosphorylated and phosphorylated CAV1 levels were at least two-fold higher (p-value less than 0.05) in wild-type cells than in LKB1 vector cells (Table 4.1 and Table 4.2). Consequently, the level of phosphorylated epidermal growth factor receptor (EGFR) was up-regulated. The phosphorylation site information for CAV1, PTRF and EGFR was extracted from Perseus software and used to determine the change at each phosphorylation site after LKB1 expression in A549 cell lines. The log2 transformed normalized intensities of the peptides occupying each serine, threonine and tyrosine residue were calculated (Figure 4.19.A). We identified five high-probability, high-confidence phosphorylation sites in CAV1: S2, Y6, S9, T15, and S37.

All of these phosphorylation sites were previously identified by mass spectrometry but have not been confirmed by any other technique. EGFR, in contrast, presented many serine, threonine, and tyrosine phosphorylation sites. The EGFR phosphorylation sites identified in the analysis are T693, S991, S995, S1025, S1039, S1042, S1064, S1081, Y1092, Y1110, Y1166, Y1172 and Y1197 (Figure 4.19.C). Some of these are known to be associated with EGFR signaling. T693 phosphorylation is associated with receptor internalization,180 the tyrosine phosphorylation sites Y1092, Y1110, Y1172 and Y1197 are auto-phosphorylation sites that regulate EGFR kinase activity,181 and S1039 induces EGFR endocytosis.182 These changes in phosphorylation in wild-type samples indicate that overexpression of LKB1 in adenocarcinoma cell lines regulates EGFR signaling. However, the extent of these changes on EGFR downstream signaling is yet to be determined.

Figure 4.19: The changes in log2 transformed normalized intensity for phosphorylation sites of A- CAV1, B- PTRF and C- EGFR. All phosphorylation sites and intensity data were extracted using Perseus software with a phosphorylation site probability > 0.75.
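The site-level comparison described in this section (a site probability cutoff of 0.75, followed by comparison of log2 transformed intensities between wild-type and vector samples) can be sketched as follows. This is a minimal illustration and not the Perseus workflow itself: the record layout, the function names, and the intensity values are hypothetical and were invented only to make the example runnable.

```python
import math
from statistics import mean

# Site probability threshold stated in the Figure 4.19 caption.
LOCALIZATION_CUTOFF = 0.75


def log2_fold_change(wild_type, vector):
    """Mean log2 intensity in wild-type minus mean log2 intensity in vector."""
    wt_log = [math.log2(x) for x in wild_type]
    vec_log = [math.log2(x) for x in vector]
    return mean(wt_log) - mean(vec_log)


def confident_site_changes(sites, cutoff=LOCALIZATION_CUTOFF):
    """Return {site name: log2 fold change} for sites above the probability cutoff."""
    return {
        site["name"]: log2_fold_change(site["wild_type"], site["vector"])
        for site in sites
        if site["probability"] > cutoff
    }


# Hypothetical records with invented intensities, for illustration only.
example_sites = [
    {"name": "SITE_A", "probability": 0.99,
     "wild_type": [4000.0, 4000.0], "vector": [1000.0, 1000.0]},
    {"name": "SITE_B", "probability": 0.60,
     "wild_type": [800.0, 800.0], "vector": [800.0, 800.0]},
]

changes = confident_site_changes(example_sites)

if __name__ == "__main__":
    print(changes)
```

Here a log2 fold change of 1 corresponds to a two-fold difference in intensity; SITE_B is dropped because its probability falls below the cutoff. Significance testing (the p < 0.05 criterion used in the text) is deliberately left out of this sketch.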

4.3.7 LKB1 is a tumor suppressor in lung adenocarcinoma cell lines

In addition to being a key metabolic regulator in cells, LKB1 is a tumor suppressor. The loss of LKB1 protein activity is often associated with invasion and metastasis of lung tumors.183 Some of the proteins identified in the differential expression analysis are known tumor metastasis and invasion markers. The regulation of those proteins in A549 cell lines provided evidence for LKB1-associated tumor suppressor activity.

Aldo-keto reductase (AKR) proteins are one group identified in the analysis. Three AKR proteins, AKR1C1, AKR1C2, and AKR1B10, yielded higher protein expression levels (Figure 4.12) in LKB1 vector samples than in wild-type samples. AKR1B10, a known diagnostic marker for non-small cell lung cancer, presented a characteristically high expression level in LKB1 vector cell lines. This was similar to AKR1B10 expression reported in the literature.131,184 AKR1C1 and AKR1C2 regulate steroid hormone metabolism and have been reported in NSCLC tumors.185 Further, tumors with induced AKR1C1 and AKR1C2 expression levels are known to present resistance to chemotherapy.185 We propose down-regulation of AKR proteins as a product of LKB1 tumor suppressor activity in lung cancer cells.

Transglutaminase 2 (TGM2) is another marker protein observed with lower expression levels in wild-type than in vector cells. TGM2 is a known NSCLC protein marker for invasion and cisplatin resistance.186,187 Like the AKR proteins, TGM2 presented a significantly lower protein level following LKB1 expression. The data also showed significant changes (Table 4.1) in contactin (CNTN1) and AKT. Both CNTN1 (ref. 188) and AKT are proteins involved in invasion and metastasis. AKT is a protein downstream of both EGFR and LKB1, regulated by E-cadherin.188,189 The wild-type A549 cell line proteome yielded down-regulated AKT and CNTN1 protein expression levels, providing more evidence for LKB1-dependent tumor suppression.

The gene SERPINB9 encodes the protein proteinase inhibitor-9 (PI-9). PI-9 is overexpressed in tumor cells in response to the host immune system. This protein inhibits granzyme B, producing an immune evasion mechanism.190 The overexpressed SERPINB9 protein exhibited a similar expression pattern in LKB1 vector compared to wild-type cells, further supporting the effect of LKB1 expression on tumor suppression. We therefore propose a potential LKB1 tumor suppressor activity-related protein signature consisting of AKR1C1, AKR1C2, AKR1B10, TGM2, CNTN1, AKT and SERPINB9 that could be used to validate LKB1 loss in patient tumor samples.

4.4 Conclusions

In this study, we used liquid chromatography coupled to mass spectrometry to discover protein level changes associated with LKB1 catalytic activity in A549 cell lines.

The proteomic analysis revealed about seventy phosphorylation site alterations in phosphoproteins and 136 total protein expression changes. Many of the proteins identified in the experiment did not present a known direct relationship to LKB1 protein.

SIK3, however, is the only direct downstream protein of LKB1 identified in the expression profiling. The wild-type LKB1 sample showed increased levels of SIK3 phosphorylation. SIK protein is the upstream regulator of CRTC proteins, and the analysis detected two CRTC proteins: CRTC2 and CRTC3. CRTCs are CREB binding proteins; hence, decreased levels of CRTC phosphorylation indicate increased CREB binding and subsequent transcription. The higher phosphorylation levels of CRTC proteins in wild-type samples were in agreement with this signal transduction pathway function. While CRTC3 consistently provided evidence for LKB1 dependence, CRTC2 failed to reproduce the changes at the protein level in our immunofluorescence experiment. Therefore, LKB1-SIK-CRTC3 represents a novel pathway in non-small cell lung cancer. We propose this as one of the primary LKB1 activity-dependent pathways in lung cancer; studying the response of CRTC3 to different drugs may provide an effective way to control LKB1-dependent transcription.

Chapter 5: Conclusions and Future Directions

5.1 Conclusions

Lung cancer is the leading cause of cancer mortality and has a relatively low five-year survival rate. Non-small cell lung cancer (NSCLC) accounts for about 85% of reported lung cancer cases, making it one of the most devastating health issues in the world. Two of the major open questions associated with NSCLC are how some mutations produce a cancer phenotype and why some cancers recur even after curative-intent surgery. Numerous attempts to reveal the mechanisms that govern tumorigenesis and recurrence have been partially successful at the genomic and transcriptomic level but were unable to explain the final outcome. Proteins hold the key to many misconnections between genotype and phenotype. Considering that many of these misconnections result from posttranslational modifications and protein-protein interactions, profiling the tumor proteome is vital to improving patient survival. In this dissertation, mass spectrometry-based bottom-up proteomics was used to profile the protein-level differences between tumors obtained from recurrent and nonrecurrent NSCLC patients, and the phosphoproteomic and proteomic changes associated with LKB1 protein expression.

The goal of the tumor cohort protein profiling was to determine why some early stage NSCLC patients recur soon after curative-intent surgery while others do not have a recurrence of their disease or recur many years later. To this end, we identified potential prognostic markers in both adenocarcinoma and squamous cell carcinoma (SCC) tumor cohorts that are important in predicting patient survival. The recurrent squamous cell carcinoma tumors showed a dramatic decrease in desmosomal protein expression, highlighting the potential of these proteins as unique prognostic markers for SCC. Identifying patients at higher risk for recurrence has the potential to improve the 5-year survival rate of NSCLC. This could be accomplished by aggressive adjuvant therapy after surgery for patients deemed to be at high risk for recurrence. In addition, prognostic markers could spare low-risk patients unnecessary secondary treatments.

Our proteomic and phosphoproteomic analysis of lung cancer cell lines revealed that the loss of liver kinase B1 (LKB1) activates the CREB signaling pathway through the CRTC proteins, which are coactivators of the CREB transcription factor. Salt-inducible kinases (SIKs) are direct downstream targets of LKB1 and phosphorylate CRTC proteins; this phosphorylation is necessary for interaction with 14-3-3 proteins and sequestration of CRTC proteins in the cytoplasm. Loss of LKB1 leaves CRTC proteins unphosphorylated, so they translocate into the nucleus, where they induce transcription of CREB target genes. The identification of the proteins involved in this pathway reveals LKB1-dependent transcription regulation in lung cancer cells and proteins that may be useful as potential drug targets.

5.2 Future directions

FFPE tissue protein profiling greatly benefits from online fractionation of the peptides. The experiment resulted in a multidimensional liquid chromatography-mass spectrometry method with fifteen online fractions that requires 24 h per sample. In large tumor cohort analyses, instrument cost and time are an issue. Therefore, the immediate next set of experiments is to further investigate the fractionation method with the intent of reducing the total separation time.

Based on our data, it is clear that an increased number of fractions yielded more protein groups, whereas longer analytical gradients yielded fewer protein groups. Therefore, we propose to explore the total protein yield of methods with an increased number of high pH fractions followed by a shorter low pH gradient separation. Further, in the analysis each peptide fraction was separated using identical analytical gradients. The last high pH fractions (13-15), however, contained fewer peptides, and these were hydrophobic compared to those of the initial high pH fractions. Separating these peptides with a generic gradient resulted in delayed retention times. We propose to combine high pH fractions 13-15 into one fraction followed by an optimized gradient separation. This will reduce the total analysis time for one sample without affecting the data comparison across samples for different fractions.
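A rough estimate of the time saved by pooling fractions 13-15 can be worked out from the figures stated above (fifteen fractions, 24 h per sample). The sketch below assumes that every fraction currently takes the same instrument time and that the pooled fraction would take as long as one current fraction; neither assumption is stated in the text, and an optimized gradient could be shorter or longer. The variable and function names are hypothetical.

```python
TOTAL_HOURS = 24.0   # stated instrument time per sample
N_FRACTIONS = 15     # stated number of online high pH fractions
POOLED = 3           # fractions 13-15 combined into a single fraction


def total_time(n_runs, hours_per_run):
    """Total instrument time for n_runs separations of equal length."""
    return n_runs * hours_per_run


# Assumption: the 24 h is split equally across the fifteen fractions.
hours_per_fraction = TOTAL_HOURS / N_FRACTIONS

# Pooling replaces three separations with one.
pooled_runs = N_FRACTIONS - POOLED + 1
pooled_hours = total_time(pooled_runs, hours_per_fraction)
saved_hours = TOTAL_HOURS - pooled_hours

if __name__ == "__main__":
    print(f"{hours_per_fraction:.1f} h per fraction")
    print(f"{pooled_runs} runs, {pooled_hours:.1f} h per sample")
    print(f"{saved_hours:.1f} h saved per sample")
```

Under these assumptions, pooling gives 13 separations of 1.6 h each, or about 20.8 h per sample, a saving of roughly 3.2 h per sample that accumulates across a large cohort.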

After identifying potential prognostic markers in lung tumors, one of the major challenges is validating these proteins in a clinical setting. Therefore, the immediate future direction is to obtain matching clinical FFPE tissue samples to confirm the corresponding changes discovered in the proteomic analysis. To do this, multiple reaction monitoring (MRM) experiments of the selected tissue cohorts are proposed. Additionally, if antibodies are available for immunohistochemistry (IHC), tissue can be examined for gain or loss of specific markers identified by proteomic analysis.

In our study, it is clear that desmosomal proteins are a group of proteins with a characteristic pattern of protein expression in recurrent versus nonrecurrent squamous tumors. In recurrent tumors, desmosomal proteins are reduced many fold. However, determining if the loss of desmosomal proteins is responsible for or only predictive of tumor recurrence in early stage SCC needs to be investigated. Therefore, we propose to investigate cell migration and invasion in cell lines with altered desmosomal protein expression by overexpression or knockdown of selected desmosomal proteins. These experiments will help us identify those proteins most likely to be responsible for early recurrence of tumors in early stage SCC.

The use of prognostic markers in the clinic is one of the ways to validate the nature of recurrence of primary lung tumors. Evaluating the possibility of converting some of these marker proteins into immunohistochemistry assays is an important next step. Positive results from such an assessment would provide a cost-effective, relatively fast way of assessing these targets during routine pathology examination.

Using a CREB reporter, we would like to confirm that CREB activity is high in LKB1 mutant cells and is reduced when wild-type LKB1 is added back to those cells using retroviral transduction of an LKB1 cDNA. In addition, a laboratory we collaborate with has generated LKB1-deficient NSCLC cell lines using CRISPR technology. We would like to determine if CREB activity is increased in the deficient clones when compared to the parental cell line from which they were derived.

The SIK family of proteins comprises three isoforms: SIK1, SIK2, and SIK3. Our mass spectrometry-based cell line characterization identified differentially expressed SIK3. The evidence that SIK3 is the isoform involved in the LKB1 pathway is limited to mass spectrometry data. Therefore, we propose to investigate the activity of SIK3 further by knocking down SIK3 in LKB1 wild-type and deficient cells using siRNA and monitoring CREB activity using a CREB reporter. We will also determine whether the other isoforms, SIK1 and SIK2, are important by using siRNA to knock them down and measuring CREB activity.

While our phosphoproteomics analysis has identified multiple changes in phosphorylation on SIK3 and CRTC3 proteins with LKB1 add-back, the role of these phosphorylation changes remains unclear. We propose to investigate the phosphorylation sites by mutation analysis to identify the SIK3 phosphorylation sites important for the subsequent downstream phosphorylation of CRTC3. In addition, we will investigate the phosphorylation sites on CRTC3 involved in 14-3-3 binding. Clarifying the role of SIK3 and CRTC3 proteins and increased CREB activity in LKB1 mutant NSCLC may provide additional drug targets for lung cancer with this mutation.

Target proteins are often validated in clinical tissue samples to confirm the differences in expression levels and protein localization. We would like to validate our in vitro data by performing immunohistochemistry for CRTC3 in FFPE lung tumor samples known to be mutant for LKB1 and comparing those to LKB1 wild-type tumors.

References

(1) Lung Cancer Fact Sheet. 2016.

(2) Kisluk, J.; Ciborowski, M.; Niemira, M.; Kretowski, A.; Niklinski, J. Proteomics biomarkers for non-small cell lung cancer. Journal of Pharmaceutical and Biomedical Analysis 2014, 101, 40-49.

(3) Kaufman, J. M.; Amann, J. M.; Park, K.; Arasada, R. R.; Li, H.; Shyr, Y.; Carbone, D. P. LKB1 loss induces characteristic pathway activation in human tumors and confers sensitivity to MEK inhibition associated with decreased PI3K-AKT-FOXO3 signaling. Journal of Thoracic Oncology, submitted.

(4) Chen, H.-Y.; Yu, S.-L.; Chen, C.-H.; Chang, G.-C.; Chen, C.-Y.; Yuan, A.; Cheng, C.-L.; Wang, C.-H.; Terng, H.-J.; Kao, S.-F.; Chan, W.-K.; Li, H.-N.; Liu, C.-C.; Singh, S.; Chen, W. J.; Chen, J. J. W.; Yang, P.-C. A Five-Gene Signature and Clinical Outcome in Non–Small-Cell Lung Cancer. New England Journal of Medicine 2007, 356, 11-20.

(5) Boutros, P. C.; Lau, S. K.; Pintilie, M.; Liu, N.; Shepherd, F. A.; Der, S. D.; Tsao, M.-S.; Penn, L. Z.; Jurisica, I. Prognostic gene signatures for non-small-cell lung cancer. Proceedings of the National Academy of Sciences 2009, 106, 2824-2828.

(6) Zhang, B.; Wang, J.; Wang, X.; Zhu, J.; Liu, Q.; Shi, Z.; Chambers, M. C.; Zimmerman, L. J.; Shaddox, K. F.; Kim, S.; Davies, S. R.; Wang, S.; Wang, P.; Kinsinger, C. R.; Rivers, R. C.; Rodriguez, H.; Townsend, R. R.; Ellis, M. J. C.; Carr, S. A.; Tabb, D. L.; Coffey, R. J.; Slebos, R. J. C.; Liebler, D. C.; the NCI CPTAC. Proteogenomic characterization of human colon and rectal cancer. Nature 2014, 513, 382-387.

(7) Koussounadis, A.; Langdon, S. P.; Um, I. H.; Harrison, D. J.; Smith, V. A. Relationship between differentially expressed mRNA and mRNA-protein correlations in a xenograft model system. Scientific Reports 2015, 5, 10775.

(8) Chen, H.; Huang, S.; Han, X.; Zhang, J.; Shan, C.; Tsang, Y. H.; Ma, H. T.; Poon, R. Y. C. Salt-inducible kinase 3 is a novel mitotic regulator and a target for enhancing antimitotic therapeutic-mediated cell death. Cell Death Dis 2014, 5, e1177.

(9) Kelleher, N. L.; Lin, H. Y.; Valaskovic, G. A.; Aaserud, D. J.; Fridriksson, E. K.; McLafferty, F. W. Top Down versus Bottom Up Protein Characterization by Tandem High-Resolution Mass Spectrometry. Journal of the American Chemical Society 1999, 121, 806-812.

(10) Zhang, H.; Ge, Y. Comprehensive Analysis of Protein Modifications by Top-Down Mass Spectrometry. Circulation: Cardiovascular Genetics 2011, 4, 711-711.

(11) Gelpí, E. Biomedical and biochemical applications of liquid chromatography-mass spectrometry. Journal of Chromatography A 1995, 703, 59-80.

(12) Niessen, W. M. A.; Tinke, A. P. Liquid chromatography-mass spectrometry: General principles and instrumentation. Journal of Chromatography A 1995, 703, 37-57.

(13) Taflin, D. C.; Ward, T. L.; Davis, E. J. Electrified droplet fission and the Rayleigh limit. Langmuir 1989, 5, 376-384.

(14) Dole, M. Molecular Beams of Macroions. The Journal of Chemical Physics 1968, 49, 2240.

(15) Iribarne, J. V. On the evaporation of small ions from charged droplets. The Journal of Chemical Physics 1976, 64, 2287.

(16) Cech, N. B.; Enke, C. G. Practical implications of some recent studies in electrospray ionization fundamentals. Mass Spectrometry Reviews 2001, 20, 362-387.

(17) Wilm, M.; Mann, M. Analytical Properties of the Nanoelectrospray Ion Source. Analytical Chemistry 1996, 68, 1-8.

(18) Wilm, M.; Shevchenko, A.; Houthaeve, T.; Breit, S.; Schweigerer, L.; Fotsis, T.; Mann, M. Femtomole sequencing of proteins from polyacrylamide gels by nano-electrospray mass spectrometry. Nature 1996, 379, 466-469.

(19) Juraschek, R.; Dülcks, T.; Karas, M. Nanoelectrospray—more than just a minimized-flow electrospray ionization source. J. Am. Soc. Mass Spectrom. 1999, 10, 300-308.

(20) Tanaka, K.; Waki, H.; Ido, Y.; Akita, S.; Yoshida, Y.; Yoshida, T.; Matsuo, T. Protein and polymer analyses up to m/z 100 000 by laser ionization time-of-flight mass spectrometry. Rapid Communications in Mass Spectrometry 1988, 2, 151-153.

(21) Karas, M.; Hillenkamp, F. Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Analytical Chemistry 1988, 60, 2299-2301.

(22) Hillenkamp, F.; Karas, M.; Beavis, R. C.; Chait, B. T. Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry of Biopolymers. Analytical Chemistry 1991, 63, 1193A-1203A.

(23) Mann, M.; Hendrickson, R. C.; Pandey, A. Analysis of Proteins and Proteomes by Mass Spectrometry. Annual Review of Biochemistry 2001, 70, 437-473.

(24) Knochenmuss, R. Ion formation mechanisms in UV-MALDI. Analyst 2006, 131, 966-986.

(25) Karas, M.; Krüger, R. Ion Formation in MALDI: The Cluster Ionization Mechanism. Chemical Reviews 2003, 103, 427-440.

(26) Glish, G. L.; Vachet, R. W. The basics of mass spectrometry in the twenty-first century. Nat Rev Drug Discov 2003, 2, 140-150.

(27) March, R. E. An Introduction to Quadrupole Ion Trap Mass Spectrometry. Journal of Mass Spectrometry 1997, 32, 351-369.

(28) Schwartz, J. C.; Senko, M. W.; Syka, J. E. P. A two-dimensional quadrupole ion trap mass spectrometer. J. Am. Soc. Mass Spectrom. 2002, 13, 659-669.

(29) Schwartz, J. C.; Senko, M. W.: Two-dimensional quadrupole ion trap operated as a mass spectrometer. Google Patents, 2004.

(30) Makarov, A. Electrostatic Axially Harmonic Orbital Trapping: A High-Performance Technique of Mass Analysis. Analytical Chemistry 2000, 72, 1156-1162.

(31) Knight, R. D. Storage of ions from laser-produced plasmas. Applied Physics Letters 1981, 38, 221.

(32) Hu, Q.; Noll, R. J.; Li, H.; Makarov, A.; Hardman, M.; Graham Cooks, R. The Orbitrap: a new mass spectrometer. Journal of Mass Spectrometry 2005, 40, 430-443.

(33) Perry, R. H.; Cooks, R. G.; Noll, R. J. Orbitrap mass spectrometry: Instrumentation, ion motion and applications. Mass Spectrometry Reviews 2008, 27, 661-699.

(34) Michalski, A.; Damoc, E.; Lange, O.; Denisov, E.; Nolting, D.; Müller, M.; Viner, R.; Schwartz, J.; Remes, P.; Belford, M.; Dunyach, J.-J.; Cox, J.; Horning, S.; Mann, M.; Makarov, A. Ultra High Resolution Linear Ion Trap Orbitrap Mass Spectrometer (Orbitrap Elite) Facilitates Top Down LC MS/MS and Versatile Peptide Fragmentation Modes. Molecular & Cellular Proteomics : MCP 2012, 11, O111.013698.

(35) Pekar Second, T.; Blethrow, J. D.; Schwartz, J. C.; Merrihew, G. E.; MacCoss, M. J.; Swaney, D. L.; Russell, J. D.; Coon, J. J.; Zabrouskov, V. Dual-Pressure Linear Ion Trap Mass Spectrometer Improving the Analysis of Complex Protein Mixtures. Analytical Chemistry 2009, 81, 7757-7765.

(36) Makarov, A.; Denisov, E.; Kholomeev, A.; Balschun, W.; Lange, O.; Strupat, K.; Horning, S. Performance Evaluation of a Hybrid Linear Ion Trap/Orbitrap Mass Spectrometer. Analytical Chemistry 2006, 78, 2113-2120.

(37) Yates, J. R.; Ruse, C. I.; Nakorchevsky, A. Proteomics by Mass Spectrometry: Approaches, Advances, and Applications. Annual Review of Biomedical Engineering 2009, 11, 49-79.

(38) Han, X.; Aslanian, A.; Yates Iii, J. R. Mass spectrometry for proteomics. Current Opinion in Chemical Biology 2008, 12, 483-490.

(39) Zubarev, R. A.; Makarov, A. Orbitrap Mass Spectrometry. Analytical Chemistry 2013, 85, 5288-5296.

(40) Michalski, A.; Damoc, E.; Lange, O.; Denisov, E.; Nolting, D.; Müller, M.; Viner, R.; Schwartz, J.; Remes, P.; Belford, M.; Dunyach, J.-J.; Cox, J.; Horning, S.; Mann, M.; Makarov, A. Ultra High Resolution Linear Ion Trap Orbitrap Mass Spectrometer (Orbitrap Elite) Facilitates Top Down LC MS/MS and Versatile Peptide Fragmentation Modes. Molecular & Cellular Proteomics 2012, 11.

(41) Kalli, A.; Smith, G. T.; Sweredoski, M. J.; Hess, S. Evaluation and Optimization of Mass Spectrometric Settings during Data-dependent Acquisition Mode: Focus on LTQ-Orbitrap Mass Analyzers. Journal of Proteome Research 2013, 12, 3071-3086.

(42) Roepstorff, P.; Fohlman, J. Proposal for a common nomenclature for sequence ions in mass spectra of peptides. Biomed Mass Spectrom 1984, 11, 601.

(43) Zhang, Y.; Ficarro, S. B.; Li, S.; Marto, J. A. Optimized Orbitrap HCD for Quantitative Analysis of Phosphopeptides. J. Am. Soc. Mass Spectrom. 2009, 20, 1425-1434.

(44) Jedrychowski, M. P.; Huttlin, E. L.; Haas, W.; Sowa, M. E.; Rad, R.; Gygi, S. P. Evaluation of HCD- and CID-type Fragmentation Within Their Respective Detection Platforms For Murine Phosphoproteomics. Molecular & Cellular Proteomics : MCP 2011, 10, M111.009910.

(45) Syka, J. E. P.; Coon, J. J.; Schroeder, M. J.; Shabanowitz, J.; Hunt, D. F. Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry. Proceedings of the National Academy of Sciences of the United States of America 2004, 101, 9528-9533.

(46) Mikesh, L. M.; Ueberheide, B.; Chi, A.; Coon, J. J.; Syka, J. E. P.; Shabanowitz, J.; Hunt, D. F. The utility of ETD mass spectrometry in proteomic analysis. Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics 2006, 1764, 1811-1822.

(47) Frese, C. K.; Altelaar, A. F. M.; Hennrich, M. L.; Nolting, D.; Zeller, M.; Griep-Raming, J.; Heck, A. J. R.; Mohammed, S. Improved Peptide Identification by Targeted Fragmentation Using CID, HCD and ETD on an LTQ-Orbitrap Velos. Journal of Proteome Research 2011, 10, 2377-2388.

(48) Lingdong Quan, M. L. CID, ETD and HCD Fragmentation to Study Protein Post-Translational Modifications. Modern Chemistry & Applications 2013, 01.

(49) Zubarev, R. A. The challenge of the proteome dynamic range and its implications for in-depth proteomics. PROTEOMICS 2013, 13, 723-726.

(50) Wu, L.; Han, D. K. Overcoming the dynamic range problem in mass spectrometry-based shotgun proteomics. Expert Review of Proteomics 2006, 3, 611-619.

(51) Geiger, T.; Wehner, A.; Schaab, C.; Cox, J.; Mann, M. Comparative Proteomic Analysis of Eleven Common Cell Lines Reveals Ubiquitous but Varying Expression of Most Proteins. Molecular & Cellular Proteomics : MCP 2012, 11, M111.014050.

(52) Shapiro, A. L.; Viñuela, E.; V. Maizel, J. Molecular weight estimation of polypeptide chains by electrophoresis in SDS-polyacrylamide gels. Biochemical and Biophysical Research Communications 1967, 28, 815-820.

(53) Cargile, B. J.; Bundy, J. L.; Freeman, T. W.; Stephenson, J. L. Gel Based Isoelectric Focusing of Peptides and the Utility of Isoelectric Point in Protein Identification. Journal of Proteome Research 2004, 3, 112-119.

(54) Velickovic, T. C.; Ognjenovic, J.; Mihajlovic, L.: Separation of Amino Acids, Peptides, and Proteins by Ion Exchange Chromatography. In Ion Exchange Technology II: Applications; Inamuddin, D., Luqman, M., Eds.; Springer Netherlands: Dordrecht, 2012; pp 1-34.

(55) Batth, T. S.; Francavilla, C.; Olsen, J. V. Off-Line High-pH Reversed-Phase Fractionation for In-Depth Phosphoproteomics. Journal of Proteome Research 2014, 13, 6176-6186.

(56) Bantscheff, M.; Schirle, M.; Sweetman, G.; Rick, J.; Kuster, B. Quantitative mass spectrometry in proteomics: a critical review. Analytical and Bioanalytical Chemistry 2007, 389, 1017-1031.

(57) Fournier, M. L.; Gilmore, J. M.; Martin-Brown, S. A.; Washburn, M. P. Multidimensional Separations-Based Shotgun Proteomics. Chemical Reviews 2007, 107, 3654-3686.

(58) Mostovenko, E.; Hassan, C.; Rattke, J.; Deelder, A. M.; van Veelen, P. A.; Palmblad, M. Comparison of peptide and protein fractionation methods in proteomics. EuPA Open Proteomics 2013, 1, 30-37.

(59) Adkins, J. N.; Varnum, S. M.; Auberry, K. J.; Moore, R. J.; Angell, N. H.; Smith, R. D.; Springer, D. L.; Pounds, J. G. Toward a Human Blood Serum Proteome: Analysis By Multidimensional Separation Coupled With Mass Spectrometry. Molecular & Cellular Proteomics 2002, 1, 947-955.

(60) Majors, R. E. Multidimensional High Performance Liquid Chromatography. Journal of Chromatographic Science 1980, 18, 571-579.

(61) Zhou, F.; Cardoza, J. D.; Ficarro, S. B.; Adelmant, G. O.; Lazaro, J.-B.; Marto, J. A. Online Nanoflow RP-RP-MS Reveals Dynamics of Multi-component Ku Complex in Response to DNA Damage. Journal of proteome research 2010, 9, 6242-6255.

(62) Yates, J. R. Mass spectrometry and the age of the proteome. Journal of Mass Spectrometry 1998, 33, 1-19.

(63) Molnár, I.; Horváth, C. Separation of amino acids and peptides on non-polar stationary phases by high-performance liquid chromatography. Journal of Chromatography A 1977, 142, 623-640.

(64) Piersma, S. R.; Fiedler, U.; Span, S.; Lingnau, A.; Pham, T. V.; Hoffmann, S.; Kubbutat, M. H. G.; Jiménez, C. R. Workflow Comparison for Label-Free, Quantitative Secretome Proteomics for Cancer Biomarker Discovery: Method Evaluation, Differential Analysis, and Verification in Serum. Journal of Proteome Research 2010, 9, 1913-1922.

(65) Dürnberger, G.; Camurdanoglu, B. Z.; Tomschik, M.; Schutzbier, M.; Roitinger, E.; Hudecz, O.; Mechtler, K.; Herbst, R. Global Analysis of Muscle-specific Kinase Signaling by Quantitative Phosphoproteomics. Molecular & Cellular Proteomics 2014, 13, 1993-2003.

(66) Fíla, J.; Honys, D. Enrichment techniques employed in phosphoproteomics. Amino Acids 2012, 43, 1025-1047.

(67) Eng, J. K.; McCormack, A. L.; Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.

(68) Sadygov, R. G.; Cociorva, D.; Yates, J. R. Large-scale database searching using tandem mass spectra: Looking up the answer in the back of the book. Nat Meth 2004, 1, 195-202.

(69) Cox, J.; Neuhauser, N.; Michalski, A.; Scheltema, R. A.; Olsen, J. V.; Mann, M. Andromeda: A Peptide Search Engine Integrated into the MaxQuant Environment. Journal of Proteome Research 2011, 10, 1794-1805.

(70) Olsen, J. V.; Mann, M. Improved peptide identification in proteomics by two consecutive stages of mass spectrometric fragmentation. Proceedings of the National Academy of Sciences of the United States of America 2004, 101, 13417-13422.

(71) Tabb, D. L.; Fernando, C. G.; Chambers, M. C. MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. Journal of proteome research 2007, 6, 654-661.

(72) Adamski, M.; Blackwell, T.; Menon, R.; Martens, L.; Hermjakob, H.; Taylor, C.; Omenn, G. S.; States, D. J. Data management and preliminary data analysis in the pilot phase of the HUPO Plasma Proteome Project. PROTEOMICS 2005, 5, 3246-3261.

(73) Choi, H.; Nesvizhskii, A. I. False Discovery Rates and Related Statistical Concepts in Mass Spectrometry-Based Proteomics. Journal of Proteome Research 2008, 7, 47-50.

(74) Reiter, L.; Claassen, M.; Schrimpf, S. P.; Jovanovic, M.; Schmidt, A.; Buhmann, J. M.; Hengartner, M. O.; Aebersold, R. Protein Identification False Discovery Rates for Very Large Proteomics Data Sets Generated by Tandem Mass Spectrometry. Molecular & Cellular Proteomics 2009, 8, 2405-2417.

(75) Ong, S.-E.; Mann, M. Mass spectrometry-based proteomics turns quantitative. Nat Chem Biol 2005, 1, 252-262.

(76) Lammich, S.; Kojro, E.; Postina, R.; Gilbert, S.; Pfeiffer, R.; Jasionowski, M.; Haass, C.; Fahrenholz, F. Constitutive and regulated α-secretase cleavage of Alzheimer's amyloid precursor protein by a disintegrin metalloprotease. Proceedings of the National Academy of Sciences 1999, 96, 3922-3927.

(77) Mahmood, T.; Yang, P.-C. Western Blot: Technique, Theory, and Trouble Shooting. North American Journal of Medical Sciences 2012, 4, 429-434.

(78) Fine, J.-D.; Neises, G. R.; Katz, S. I. Immunofluorescence and Immunoelectron Microscopic Studies in Cicatricial Pemphigoid. Journal of Investigative Dermatology 1984, 82, 39-43.

(79) Adkins, E. M.; Samuvel, D. J.; Fog, J. U.; Eriksen, J.; Jayanthi, L. D.; Vaegter, C. B.; Ramamoorthy, S.; Gether, U. Membrane Mobility and Microdomain Association of the Dopamine Transporter Studied with Fluorescence Correlation Spectroscopy and Fluorescence Recovery after Photobleaching. Biochemistry 2007, 46, 10484-10497.

(80) Idikio, H. A. Immunohistochemistry in diagnostic surgical pathology: contributions of protein life-cycle, use of evidence-based methods and data normalization on interpretation of immunohistochemical stains. Int J Clin Exp Pathol 2009, 3, 169-176.

(81) Sharpnack, M.; Srivastava, A.; Cerciello, F.; Ranbaduge, N.; Sharpnack, J.; Liebler, D.; Codreanu, S.; Amann, J.; Araujo, L.; Maher, C.; Machiraju, R.; Wysocki, V.; Govindan, R.; Mallick, P.; Coombes, K.; Huang, K.; Carbone, D. Proteogenomics for predicting recurrence of surgically resected lung adenocarcinomas. 2016.

(82) Xiao, Z.; Li, G.; Chen, Y.; Li, M.; Peng, F.; Li, C.; Li, F.; Yu, Y.; Ouyang, Y.; Xiao, Z.; Chen, Z. Quantitative Proteomic Analysis of Formalin-fixed and Paraffin-embedded Nasopharyngeal Carcinoma Using iTRAQ Labeling, Two-dimensional Liquid Chromatography, and Tandem Mass Spectrometry. Journal of Histochemistry and Cytochemistry 2010, 58, 517-527.

(83) Scicchitano, M. S.; Dalmas, D. A.; Boyce, R. W.; Thomas, H. C.; Frazier, K. S. Protein Extraction of Formalin-fixed, Paraffin-embedded Tissue Enables Robust Proteomic Profiles by Mass Spectrometry. Journal of Histochemistry and Cytochemistry 2009, 57, 849-860.

(84) Prieto, D. A.; Hood, B. L.; Darfler, M. M.; Guiel, T. G.; Lucas, D. A.; Conrads, T. P.; Veenstra, T. D.; Krizman, D. B. Liquid Tissue™: proteomic profiling of formalin-fixed tissues. Mass Spectrometry 2005.

(85) Nirmalan, N. J.; Hughes, C.; Peng, J.; McKenna, T.; Langridge, J.; Cairns, D. A.; Harnden, P.; Selby, P. J.; Banks, R. E. Initial Development and Validation of a Novel Extraction Method for Quantitative Mining of the Formalin-Fixed, Paraffin-Embedded Tissue Proteome for Biomarker Investigations. Journal of Proteome Research 2011, 10, 896-906.

(86) Nirmalan, N. J.; Harnden, P.; Selby, P. J.; Banks, R. E. Development and validation of a novel protein extraction methodology for quantitation of protein expression in formalin-fixed paraffin-embedded tissues using western blotting. The Journal of Pathology 2009, 217, 497-506.

(87) Ikeda, K.; Monden, T.; Kanoh, T.; Tsujie, M.; Izawa, H.; Haba, A.; Ohnishi, T.; Sekimoto, M.; Tomita, N.; Shiozaki, H.; Monden, M. Extraction and Analysis of Diagnostically Useful Proteins from Formalin-fixed, Paraffin-embedded Tissue Sections. Journal of Histochemistry & Cytochemistry 1998, 46, 397-403.

(88) Metz, B.; Kersten, G. F. A.; Hoogerhout, P.; Brugghe, H. F.; Timmermans, H. A. M.; de Jong, A.; Meiring, H.; Hove, J. t.; Hennink, W. E.; Crommelin, D. J. A.; Jiskoot, W. Identification of Formaldehyde-induced Modifications in Proteins: REACTIONS WITH MODEL PEPTIDES. Journal of Biological Chemistry 2004, 279, 6235-6243.

(89) Ostasiewicz, P.; Zielinska, D. F.; Mann, M.; Wiśniewski, J. R. Proteome, Phosphoproteome, and N-Glycoproteome Are Quantitatively Preserved in Formalin-Fixed Paraffin-Embedded Tissue and Analyzable by High-Resolution Mass Spectrometry. Journal of Proteome Research 2010, 9, 3688-3700.

(90) Wiśniewski, J. R.; Duś, K.; Mann, M. Proteomic workflow for analysis of archival formalin-fixed and paraffin-embedded clinical samples to a depth of 10 000 proteins. PROTEOMICS – Clinical Applications 2013, 7, 225-233.

(91) Tanca, A.; Abbondio, M.; Pisanu, S.; Pagnozzi, D.; Uzzau, S.; Addis, M. F. Critical comparison of sample preparation strategies for shotgun proteomic analysis of formalin-fixed, paraffin-embedded samples: insights from liver tissue. Clinical Proteomics 2014, 11, 1-12.

(92) Wiśniewski, J. R.; Ostasiewicz, P.; Mann, M. High Recovery FASP Applied to the Proteomic Analysis of Microdissected Formalin Fixed Paraffin Embedded Cancer Tissues Retrieves Known Colon Cancer Markers. Journal of Proteome Research 2011, 10, 3040-3049.

(93) Alkhas, A.; Hood, B. L.; Oliver, K.; Teng, P.-n.; Oliver, J.; Mitchell, D.; Hamilton, C. A.; Maxwell, G. L.; Conrads, T. P. Standardization of a Sample Preparation and Analytical Workflow for Proteomics of Archival Endometrial Cancer Tissue. Journal of Proteome Research 2011, 10, 5264-5271.

(94) Nazarian, J.; Santi, M.; Hathout, Y.; MacDonald, T. J. Protein profiling of formalin fixed paraffin embedded tissue: Identification of potential biomarkers for pediatric brainstem glioma. PROTEOMICS – Clinical Applications 2008, 2, 915-924.

(95) Ly, L.; Barnett, M. H.; Zheng, Y. Z.; Gulati, T.; Prineas, J. W.; Crossett, B. Comprehensive Tissue Processing Strategy for Quantitative Proteomics of Formalin-fixed Multiple Sclerosis Lesions. Journal of Proteome Research 2011, 10, 4855-4868.

(96) Wakabayashi, M.; Yoshihara, H.; Masuda, T.; Tsukahara, M.; Sugiyama, N.; Ishihama, Y. Phosphoproteome Analysis of Formalin-Fixed and Paraffin-Embedded Tissue Sections Mounted on Microscope Slides. Journal of Proteome Research 2014, 13, 915-924.

(97) Manza, L. L.; Stamer, S. L.; Ham, A.-J. L.; Codreanu, S. G.; Liebler, D. C. Sample preparation and digestion for proteomic analyses using spin filters. PROTEOMICS 2005, 5, 1742-1745.

(98) Wiśniewski, J. R.; Zougman, A.; Nagaraj, N.; Mann, M. Universal sample preparation method for proteome analysis. Nat Meth 2009, 6, 359-362.

(99) Paulo, J. A.; Lee, L. S.; Banks, P. A.; Steen, H.; Conwell, D. L. Proteomic analysis of formalin-fixed paraffin-embedded pancreatic tissue using liquid chromatography tandem mass spectrometry (LC-MS/MS). Pancreas 2012, 41, 175-185.

(100) Sprung, R. W.; Brock, J. W. C.; Tanksley, J. P.; Li, M.; Washington, M. K.; Slebos, R. J. C.; Liebler, D. C. Equivalence of Protein Inventories Obtained from Formalin-fixed Paraffin-embedded and Frozen Tissue in Multidimensional Liquid Chromatography-Tandem Mass Spectrometry Shotgun Proteomic Analysis. Molecular & Cellular Proteomics 2009, 8, 1988-1998.

(101) Bell, L. N.; Saxena, R.; Mattar, S. G.; You, J.; Wang, M.; Chalasani, N. Utility of formalin-fixed, paraffin-embedded liver biopsy specimens for global proteomic analysis in nonalcoholic steatohepatitis. Proteomics. Clinical applications 2011, 5, 397-404.

(102) León, I. R.; Schwämmle, V.; Jensen, O. N.; Sprenger, R. R. Quantitative Assessment of In-solution Digestion Efficiency Identifies Optimal Protocols for Unbiased Protein Analysis. Molecular & Cellular Proteomics : MCP 2013, 12, 2992-3005.

(103) Andrews, G. L.; Dean, R. A.; Hawkridge, A. M.; Muddiman, D. C. Improving Proteome Coverage on a LTQ-Orbitrap Using Design of Experiments. J. Am. Soc. Mass Spectrom. 2011, 22, 773-783.

(104) Paez, J. G.; Jänne, P. A.; Lee, J. C.; Tracy, S.; Greulich, H.; Gabriel, S.; Herman, P.; Kaye, F. J.; Lindeman, N.; Boggon, T. J.; Naoki, K.; Sasaki, H.; Fujii, Y.; Eck, M. J.; Sellers, W. R.; Johnson, B. E.; Meyerson, M. EGFR Mutations in Lung Cancer: Correlation with Clinical Response to Gefitinib Therapy. Science 2004, 304, 1497-1500.

(105) Massarelli, E.; Varella-Garcia, M.; Tang, X.; Xavier, A. C.; Ozburn, N. C.; Liu, D. D.; Bekele, B. N.; Herbst, R. S.; Wistuba, I. I. KRAS Mutation Is an Important Predictor of Resistance to Therapy with Epidermal Growth Factor Receptor Tyrosine Kinase Inhibitors in Non–Small-Cell Lung Cancer. Clinical Cancer Research 2007, 13, 2890-2896.

(106) Alfaro, J. A.; Sinha, A.; Kislinger, T.; Boutros, P. C. Onco-proteogenomics: cancer proteomics joins forces with genomics. Nat Meth 2014, 11, 1107-1113.

(107) Beer, D. G.; Kardia, S. L. R.; Huang, C.-C.; Giordano, T. J.; Levin, A. M.; Misek, D. E.; Lin, L.; Chen, G.; Gharib, T. G.; Thomas, D. G.; Lizyness, M. L.; Kuick, R.; Hayasaka, S.; Taylor, J. M. G.; Iannettoni, M. D.; Orringer, M. B.; Hanash, S. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med 2002, 8, 816-824.

(108) Chang, Y. S.; Wang, L.; Liu, D.; Mao, L.; Hong, W. K.; Khuri, F. R.; Lee, H.-Y. Correlation between Insulin-like Growth Factor-binding Protein-3 Promoter Methylation and Prognosis of Patients with Stage I Non-Small Cell Lung Cancer. Clinical Cancer Research 2002, 8, 3669-3675.

(109) Knights, A. J.; Funnell, A. P. W.; Crossley, M.; Pearson, R. C. M. Holding Tight: Cell Junctions and Cancer Spread. Trends in cancer research 2012, 8, 61-69.

(110) Takeichi, M. Cadherins in cancer: implications for invasion and metastasis. Current Opinion in Cell Biology 1993, 5, 806-811.

(111) Ma, Z.-Q.; Dasari, S.; Chambers, M. C.; Litton, M. D.; Sobecki, S. M.; Zimmerman, L. J.; Halvey, P. J.; Schilling, B.; Drake, P. M.; Gibson, B. W.; Tabb, D. L. IDPicker 2.0: Improved Protein Assembly with High Discrimination Peptide Identification Filtering. Journal of proteome research 2009, 8, 3872-3881.

(112) Love, M. I.; Huber, W.; Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 2014, 15, 550.

(113) Gordon, A.; Glazko, G.; Qiu, X.; Yakovlev, A. Control of the mean number of false discoveries, Bonferroni and stability of multiple testing. 2007, 179-190.

(114) Holman, J. D.; Dasari, S.; Tabb, D. L. Informatics of Protein and Posttranslational Modification Detection via Shotgun Proteomics. Methods in molecular biology (Clifton, N.J.) 2013, 1002, 167-179.

(115) Baxter, R. C. IGF binding proteins in cancer: mechanistic and clinical insights. Nat Rev Cancer 2014, 14, 329-341.

(116) Takaoka, M.; Kim, S.-H.; Okawa, T.; Michaylira, C. Z.; Stairs, D. B.; Johnstone, C. N.; Andl, C. D.; Rhoades, B.; Lee, J. J.; Klein-Szanto, A. J. P.; El-Deiry, W. S.; Nakagawa, H. IGFBP-3 Regulates Esophageal Tumor Growth Through IGF-Dependent and Independent Mechanisms. Cancer biology & therapy 2007, 6, 534-540.

(117) Ibanez de Caceres, I.; Cortes-Sempere, M.; Moratilla, C.; Machado-Pinilla, R.; Rodriguez-Fanjul, V.; Manguan-Garcia, C.; Cejas, P.; Lopez-Rios, F.; Paz-Ares, L.; de Castro Carpeno, J.; Nistal, M.; Belda-Iniesta, C.; Perona, R. IGFBP-3 hypermethylation-derived deficiency mediates cisplatin resistance in non-small-cell lung cancer. Oncogene 2010, 29, 1681-1690.

(118) Satelli, A.; Li, S. Vimentin as a potential molecular target in cancer therapy Or Vimentin, an overview and its potential as a molecular target for cancer therapy. Cellular and molecular life sciences : CMLS 2011, 68, 3033-3046.

(119) Al-Saad, S.; Al-Shibli, K.; Donnem, T.; Persson, M.; Bremnes, R. M.; Busund, L. T. The prognostic impact of NF-κB p105, vimentin, E-cadherin and Par6 expression in epithelial and stromal compartment in non-small-cell lung cancer. British Journal of Cancer 2008, 99, 1476-1483.

(120) Clayton, D. F.; George, J. M. The synucleins: a family of proteins involved in synaptic function, plasticity, neurodegeneration and disease. Trends in Neurosciences 1998, 21, 249-254.

(121) Liu, C.; Dong, B.; Lu, A.; Qu, L.; Xing, X.; Meng, L.; Wu, J.; Eric Shi, Y.; Shou, C. Synuclein gamma predicts poor clinical outcome in colon cancer with normal levels of carcinoembryonic antigen. BMC Cancer 2010, 10, 1-10.

(122) Liu, H.; Liu, W.; Wu, Y.; Zhou, Y.; Xue, R.; Luo, C.; Wang, L.; Zhao, W.; Jiang, J.-D.; Liu, J. Loss of Epigenetic Control of Synuclein-γ Gene as a Molecular Indicator of Metastasis in a Wide Range of Human Cancers. Cancer Research 2005, 65, 7635-7643.

(123) Gandhi, M.; Smith, B. A.; Bovellan, M.; Paavilainen, V.; Daugherty-Clarke, K.; Gelles, J.; Lappalainen, P.; Goode, B. L. GMF Is a Cofilin Homolog that Binds Arp2/3 Complex to Stimulate Filament Debranching and Inhibit Actin Nucleation. Current Biology 2010, 20, 861-867.

(124) Zuo, P.; Ma, Y.; Huang, Y.; Ye, F.; Wang, P.; Wang, X.; Zhou, C.; Lu, W.; Kong, B.; Xie, X. High GMFG expression correlates with poor prognosis and promotes cell migration and invasion in epithelial ovarian cancer. Gynecologic Oncology 2014, 132, 745-751.

(125) Moll, R.; Franke, W. W.; Volc-Platzer, B.; Krepler, R. Different keratin polypeptides in epidermis and other epithelia of human skin: a specific cytokeratin of molecular weight 46,000 in epithelia of the pilosebaceous tract and basal cell epitheliomas. The Journal of Cell Biology 1982, 95, 285-295.

(126) Ide, M.; Kato, T.; Ogata, K.; Mochiki, E.; Kuwano, H.; Oyama, T. Keratin 17 Expression Correlates with Tumor Progression and Poor Prognosis in Gastric Adenocarcinoma. Annals of Surgical Oncology 2012, 19, 3506-3514.

(127) Kim, H.-S.; Lee, J.-J.; Do, S.-I.; Kim, K.; Do, I.-G.; Kim, D.-H.; Chae, S. W.; Sohn, J. H. Overexpression of cytokeratin 17 is associated with the development of papillary thyroid carcinoma and the presence of lymph node metastasis. International Journal of Clinical and Experimental Pathology 2015, 8, 5695-5701.

(128) Wang, Y.-F.; Lang, H.-Y.; Yuan, J.; Wang, J.; Wang, R.; Zhang, X.-H.; Zhang, J.; Zhao, T.; Li, Y.-R.; Liu, J.-Y.; Zeng, L.-H.; Guo, G.-Z. Overexpression of keratin 17 is associated with poor prognosis in epithelial ovarian cancer. Tumor Biology 2013, 34, 1685-1689.

(129) DiTommaso, T.; Cottle, D. L.; Pearson, H. B.; Schlüter, H.; Kaur, P.; Humbert, P. O.; Smyth, I. M. Keratin 76 Is Required for Tight Junction Function and Maintenance of the Skin Barrier. PLoS Genet 2014, 10, e1004706.

(130) Ambatipudi, S.; Bhosale, P. G.; Heath, E.; Pandey, M.; Kumar, G.; Kane, S.; Patil, A.; Maru, G. B.; Desai, R. S.; Watt, F. M.; Mahimkar, M. B. Downregulation of Keratin 76 Expression during Oral Carcinogenesis of Human, Hamster and Mouse. PLoS ONE 2013, 8, e70688.

(131) Penning, T. M. AKR1B10: A New Diagnostic Marker of Non–Small Cell Lung Carcinoma in Smokers. Clinical Cancer Research 2005, 11, 1687-1690.

(132) Frycz, B. A.; Murawa, D.; Borejsza-Wysocki, M.; Wichtowski, M.; Spychała, A.; Marciniak, R.; Murawa, P.; Drews, M.; Jagodziński, P. P. Transcript level of AKR1C3 is down-regulated in gastric cancer. Biochemistry and Cell Biology 2015, 94, 138-146.

(133) Yang, L.; Zhang, J.; Zhang, S.; Dong, W.; Lou, X.; Liu, S. Quantitative Evaluation of Aldo–keto Reductase Expression in Hepatocellular Carcinoma (HCC) Cell Lines. Genomics, Proteomics & Bioinformatics 2013, 11, 230-240.

(134) Ohkura-Hada, S.; Kondoh, N.; Hada, A.; Arai, M.; Yamazaki, Y.; Shindoh, M.; Kitagawa, Y.; Takahashi, M.; Ando, T.; Sato, Y.; Yamamoto, M. Carbonyl Reductase 3 (CBR3) Mediates 9-cis-Retinoic Acid-Induced Cytostatis and is a Potential Prognostic Marker for Oral Malignancy. The Open Dentistry Journal 2008, 2, 78-88.

(135) Takenaka, K.; Ogawa, E.; Oyanagi, H.; Wada, H.; Tanaka, F. Carbonyl Reductase Expression and Its Clinical Significance in Non–Small-Cell Lung Cancer. Cancer Epidemiology Biomarkers & Prevention 2005, 14, 1972-1975.

(136) Park, K.-S.; Kim, H.-K.; Lee, J.-H.; Choi, Y.-B.; Park, S.-Y.; Yang, S.-H.; Kim, S.-Y.; Hong, K.-M. Transglutaminase 2 as a cisplatin resistance marker in non-small cell lung cancer. Journal of Cancer Research and Clinical Oncology 2010, 136, 493-502.

(137) Ai, L.; Kim, W.-J.; Demircan, B.; Dyer, L. M.; Bray, K. J.; Skehan, R. R.; Massoll, N. A.; Brown, K. D. The transglutaminase 2 gene (TGM2), a potential molecular marker for chemotherapeutic drug sensitivity, is epigenetically silenced in breast cancer. Carcinogenesis 2008, 29, 510-518.

(138) Miyoshi, N.; Ishii, H.; Mimori, K.; Tanaka, F.; Hitora, T.; Tei, M.; Sekimoto, M.; Doki, Y.; Mori, M. TGM2 Is a Novel Marker for Prognosis and Therapeutic Target in Colorectal Cancer. Annals of Surgical Oncology 2010, 17, 967-972.

(139) Moyers, J. S.; Bilan, P. J.; Reynet, C.; Kahn, C. R. Overexpression of Rad Inhibits Glucose Uptake in Cultured Muscle and Fat Cells. Journal of Biological Chemistry 1996, 271, 23111-23116.

(140) Downward, J. Regulatory mechanisms for ras proteins. BioEssays 1992, 14, 177-184.

(141) Zhu, J.; Reynet, C.; Caldwell, J. S.; Kahn, C. R. Characterization of Rad, a New Member of Ras/GTPase Superfamily, and Its Regulation by a Unique GTPase-activating protein (GAP)-like Activity. Journal of Biological Chemistry 1995, 270, 4805-4812.

(142) Mo, Y.; Midorikawa, K.; Zhang, Z.; Zhou, X.; Ma, N.; Huang, G.; Hiraku, Y.; Oikawa, S.; Murata, M. Promoter hypermethylation of Ras-related GTPase gene RRAD inactivates a tumor suppressor function in nasopharyngeal carcinoma. Cancer Letters 2012, 323, 147-154.

(143) Dusek, R. L.; Attardi, L. D. Desmosomes: new perpetrators in tumour suppression. Nature reviews. Cancer 2011, 11, 317-323.

(144) Chidgey, M.; Dawson, C. Desmosomes: a role in cancer? British Journal of Cancer 2007, 96, 1783-1787.

(145) Chen, Y. J.; Chang, J. T.; Lee, L.; Wang, H. M.; Liao, C. T.; Chiu, C. C.; Chen, P. J.; Cheng, A. J. DSG3 is overexpressed in head neck cancer and is a potential molecular target for inhibition of oncogenesis. Oncogene 2006, 26, 467-476.

(146) Brennan, D.; Mahoney, M. G. Increased expression of Dsg2 in malignant skin carcinomas: A tissue-microarray based study. Cell Adhesion & Migration 2009, 3, 148-154.

(147) Kowalczyk, A. P.; Bornslaeger, E. A.; Borgwardt, J. E.; Palka, H. L.; Dhaliwal, A. S.; Corcoran, C. M.; Denning, M. F.; Green, K. J. The Amino-terminal Domain of Desmoplakin Binds to Plakoglobin and Clusters Desmosomal Cadherin–Plakoglobin Complexes. The Journal of Cell Biology 1997, 139, 773-784.

(148) Karnovsky, A.; Klymkowsky, M. W. Anterior axis duplication in Xenopus induced by the over-expression of the cadherin-binding protein plakoglobin. Proceedings of the National Academy of Sciences of the United States of America 1995, 92, 4522-4526.

(149) Conacci-Sorrell, M. E.; Ben-Yedidia, T.; Shtutman, M.; Feinstein, E.; Einat, P.; Ben-Ze'ev, A. Nr-CAM is a target gene of the β-catenin/LEF-1 pathway in melanoma and colon cancer and its expression enhances motility and confers tumorigenesis. Genes & Development 2002, 16, 2058-2072.

(150) Chang, H.-H.; Dreyfuss, J. M.; Ramoni, M. F. A Transcriptional Network Signature Characterizes Lung Cancer Subtypes. Cancer 2011, 117, 353-360.

(151) Kikuchi, T.; Hassanein, M.; Amann, J. M.; Liu, Q.; Slebos, R. J. C.; Rahman, S. M. J.; Kaufman, J. M.; Zhang, X.; Hoeksema, M. D.; Harris, B. K.; Li, M.; Shyr, Y.; Gonzalez, A. L.; Zimmerman, L. J.; Liebler, D. C.; Massion, P. P.; Carbone, D. P. In-depth Proteomic Analysis of Nonsmall Cell Lung Cancer to Discover Molecular Targets and Candidate Biomarkers. Molecular & Cellular Proteomics 2012, 11, 916-932.

(152) Shackelford, D. B.; Shaw, R. J. The LKB1-AMPK pathway: metabolism and growth control in tumour suppression. Nat Rev Cancer 2009, 9, 563-575.

(153) Liang, J.; Shao, S. H.; Xu, Z.-X.; Hennessy, B.; Ding, Z.; Larrea, M.; Kondo, S.; Dumont, D. J.; Gutterman, J. U.; Walker, C. L.; Slingerland, J. M.; Mills, G. B. The energy sensing LKB1-AMPK pathway regulates p27kip1 phosphorylation mediating the decision to enter autophagy or apoptosis. Nat Cell Biol 2007, 9, 218-224.

(154) Alessi, D. R.; Sakamoto, K.; Bayascas, J. R. LKB1-Dependent Signaling Pathways. Annual Review of Biochemistry 2006, 75, 137-163.

(155) Faubert, B.; Vincent, E. E.; Griss, T.; Samborska, B.; Izreig, S.; Svensson, R. U.; Mamer, O. A.; Avizonis, D.; Shackelford, D. B.; Shaw, R. J.; Jones, R. G. Loss of the tumor suppressor LKB1 promotes metabolic reprogramming of cancer cells via HIF-1α. Proceedings of the National Academy of Sciences 2014.

(156) Lizcano, J. M.; Göransson, O.; Toth, R.; Deak, M.; Morrice, N. A.; Boudeau, J.; Hawley, S. A.; Udd, L.; Mäkelä, T. P.; Hardie, D. G.; Alessi, D. R. LKB1 is a master kinase that activates 13 kinases of the AMPK subfamily, including MARK/PAR‐1. The EMBO Journal 2004, 23, 833-843.

(157) Ji, H.; Ramsey, M. R.; Hayes, D. N.; Fan, C.; McNamara, K.; Kozlowski, P.; Torrice, C.; Wu, M. C.; Shimamura, T.; Perera, S. A.; Liang, M.-C.; Cai, D.; Naumov, G. N.; Bao, L.; Contreras, C. M.; Li, D.; Chen, L.; Krishnamurthy, J.; Koivunen, J.; Chirieac, L. R.; Padera, R. F.; Bronson, R. T.; Lindeman, N. I.; Christiani, D. C.; Lin, X.; Shapiro, G. I.; Jänne, P. A.; Johnson, B. E.; Meyerson, M.; Kwiatkowski, D. J.; Castrillon, D. H.; Bardeesy, N.; Sharpless, N. E.; Wong, K.-K. LKB1 modulates lung cancer differentiation and metastasis. Nature 2007, 448, 807-810.

(158) van Veelen, W.; Korsse, S. E.; van de Laar, L.; Peppelenbosch, M. P. The long and winding road to rational treatment of cancer associated with LKB1/AMPK/TSC/mTORC1 signaling. Oncogene 2011, 30, 2289-2303.

(159) Zhao, R.-X.; Xu, Z.-X. Targeting the LKB1 Tumor Suppressor. Current drug targets 2014, 15, 32-52.

(160) Laplante, M.; Sabatini, D. M. mTOR Signaling in Growth Control and Disease. Cell 2012, 149, 274-293.

(161) Mehenni, H.; Lin-Marq, N.; Buchet-Poyau, K.; Reymond, A.; Collart, M. A.; Picard, D.; Antonarakis, S. E. LKB1 interacts with and phosphorylates PTEN: a functional link between two proteins involved in cancer predisposing syndromes. Human Molecular Genetics 2005, 14, 2209-2219.

(162) Partanen, J. I.; Nieminen, A. I.; Mäkelä, T. P.; Klefstrom, J. Suppression of oncogenic properties of c-Myc by LKB1-controlled epithelial organization. Proceedings of the National Academy of Sciences 2007, 104, 14694-14699.

(163) Shaw, R. J. Tumor Suppression by LKB1: SIK-ness Prevents Metastasis. Science Signaling 2009, 2, pe55-pe55.

(164) Phanstiel, D. H.; Brumbaugh, J.; Wenger, C. D.; Tian, S.; Probasco, M. D.; Bailey, D. J.; Swaney, D. L.; Tervo, M. A.; Bolin, J. M.; Ruotti, V.; Stewart, R.; Thomson, J. A.; Coon, J. J. Proteomic and phosphoproteomic comparison of human ES and iPS cells. Nat Meth 2011, 8, 821-827.

(165) Bodenmiller, B.; Mueller, L. N.; Mueller, M.; Domon, B.; Aebersold, R. Reproducible isolation of distinct, overlapping segments of the phosphoproteome. Nat Meth 2007, 4, 231-237.

(166) Li, Q.-r.; Ning, Z.-b.; Tang, J.-s.; Nie, S.; Zeng, R. Effect of Peptide-to-TiO2 Beads Ratio on Phosphopeptide Enrichment Selectivity. Journal of Proteome Research 2009, 8, 5375-5381.

(167) Kettenbach, A. N.; Gerber, S. A. Rapid and Reproducible Single-Stage Phosphopeptide Enrichment of Complex Peptide Mixtures: Application to General and Phosphotyrosine-Specific Phosphoproteomics Experiments. Analytical Chemistry 2011, 83, 7635-7644.

(168) Cox, J.; Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotech 2008, 26, 1367-1372.

(169) Cox, J.; Matic, I.; Hilger, M.; Nagaraj, N.; Selbach, M.; Olsen, J. V.; Mann, M. A practical guide to the MaxQuant computational platform for SILAC-based quantitative proteomics. Nat. Protocols 2009, 4, 698-705.

(170) Bhatia, V. N.; Perlman, D. H.; Costello, C. E.; McComb, M. E. Software Tool for Researching Annotations of Proteins (STRAP): Open-Source Protein Annotation Software with Data Visualization. Analytical chemistry 2009, 81, 9819-9823.

(171) Ficarro, S. B.; Zhang, Y.; Carrasco-Alfonso, M. J.; Garg, B.; Adelmant, G.; Webber, J. T.; Luckey, C. J.; Marto, J. A. Online Nanoflow Multidimensional Fractionation for High Efficiency Phosphopeptide Analysis. Molecular & Cellular Proteomics : MCP 2011, 10, O111.011064.

(172) Dickhut, C.; Feldmann, I.; Lambert, J.; Zahedi, R. P. Impact of Digestion Conditions on Phosphoproteomics. Journal of Proteome Research 2014, 13, 2761-2770.

(173) Colaert, N.; Helsens, K.; Martens, L.; Vandekerckhove, J.; Gevaert, K. Improved visualization of protein consensus sequences by iceLogo. Nat Meth 2009, 6, 786-787.

(174) Olsen, J. V.; Blagoev, B.; Gnad, F.; Macek, B.; Kumar, C.; Mortensen, P.; Mann, M. Global, In Vivo, and Site-Specific Phosphorylation Dynamics in Signaling Networks. Cell 2006, 127, 635-648.

(175) Chen, H.; Huang, S.; Han, X.; Zhang, J.; Shan, C.; Tsang, Y. H.; Ma, H. T.; Poon, R. Y. C. Salt-inducible kinase 3 is a novel mitotic regulator and a target for enhancing antimitotic therapeutic-mediated cell death. Cell Death & Disease 2014, 5, e1177.

(176) Charoenfuprasert, S.; Yang, Y. Y.; Lee, Y. C.; Chao, K. C.; Chu, P. Y.; Lai, C. R.; Hsu, K. F.; Chang, K. C.; Chen, Y. C.; Chen, L. T.; Chang, J. Y.; Leu, S. J.; Shih, N. Y. Identification of salt-inducible kinase 3 as a novel tumor antigen associated with tumorigenesis of ovarian cancer. Oncogene 2011, 30, 3570-3584.

(177) Clark, K.; MacKenzie, K. F.; Petkevicius, K.; Kristariyanto, Y.; Zhang, J.; Choi, H. G.; Peggie, M.; Plater, L.; Pedrioli, P. G. A.; McIver, E.; Gray, N. S.; Arthur, J. S. C.; Cohen, P. Phosphorylation of CRTC3 by the salt-inducible kinases controls the interconversion of classically activated and regulatory macrophages. Proceedings of the National Academy of Sciences 2012, 109, 16986-16991.

(178) Lingwood, D.; Simons, K. Lipid Rafts As a Membrane-Organizing Principle. Science 2009, 327, 46-50.

(179) Martinez-Outschoorn, U. E.; Sotgia, F.; Lisanti, M. P. Caveolae and signalling in cancer. Nat Rev Cancer 2015, 15, 225-237.

(180) Tsai, C.-F.; Wang, Y.-T.; Yen, H.-Y.; Tsou, C.-C.; Ku, W.-C.; Lin, P.-Y.; Chen, H.-Y.; Nesvizhskii, A. I.; Ishihama, Y.; Chen, Y.-J. Large-scale determination of absolute phosphorylation stoichiometries in human cells by motif-targeting quantitative proteomics. Nat Commun 2015, 6.

(181) Zhang, G.; Fang, B.; Liu, R. Z.; Lin, H.; Kinose, F.; Bai, Y.; Oguz, U.; Remily-Wood, E. R.; Li, J.; Altiok, S.; Eschrich, S.; Koomen, J.; Haura, E. B. Mass spectrometry mapping of epidermal growth factor receptor phosphorylation related to oncogenic mutations and tyrosine kinase inhibitor sensitivity. Journal of proteome research 2011, 10, 305-319.

(182) Tong, J.; Taylor, P.; Moran, M. F. Proteomic Analysis of the Epidermal Growth Factor Receptor (EGFR) Interactome and Post-translational Modifications Associated with Receptor Endocytosis in Response to EGF and Stress. Molecular & Cellular Proteomics : MCP 2014, 13, 1644-1658.

(183) Carretero, J.; Shimamura, T.; Rikova, K.; Jackson, A. L.; Wilkerson, M. D.; Borgman, C. L.; Buttarazzi, M. S.; Sanofsky, B. A.; McNamara, K. L.; Brandstetter, K. A.; Walton, Z. E.; Gu, T.-L.; Silva, J. C.; Crosby, K.; Shapiro, G. I.; Maira, M.; Ji, H.; Castrillon, D. H.; Kim, C. F.; García-Echeverría, C.; Bardeesy, N.; Sharpless, N. E.; Hayes, N. D.; Kim, W. Y.; Engelman, J. A.; Wong, K.-K. Integrative genomic and proteomic analyses identify targets for Lkb1 deficient metastatic lung tumors. Cancer cell 2010, 17, 547-559.

(184) Laffin, B.; Petrash, J. M. Expression of the Aldo-Ketoreductases AKR1B1 and AKR1B10 in Human Cancers. Frontiers in Pharmacology 2012, 3, 104.

(185) Wang, H.-W.; Lin, C.-P.; Chiu, J.-H.; Chow, K.-C.; Kuo, K.-T.; Lin, C.-S.; Wang, L.-S. Reversal of inflammation-associated dihydrodiol dehydrogenases (AKR1C1 and AKR1C2) overexpression and drug resistance in nonsmall cell lung cancer cells by wogonin and chrysin. International Journal of Cancer 2007, 120, 2019-2027.

(186) Kumar, A.; Xu, J.; Brady, S.; Gao, H.; Yu, D.; Reuben, J.; Mehta, K. Tissue transglutaminase promotes drug resistance and invasion by inducing mesenchymal transition in mammary epithelial cells. PLoS One 2010, 5.

(187) Park, K.-S.; Kim, H.-K.; Lee, J.-H.; Choi, Y.-B.; Park, S.-Y.; Yang, S.-H.; Kim, S.-Y.; Hong, K.-M. Transglutaminase 2 as a cisplatin resistance marker in non-small cell lung cancer. Journal of Cancer Research and Clinical Oncology 2010, 136, 493-502.

(188) Su, J.-L.; Yang, C.-Y.; Shih, J.-Y.; Wei, L.-H.; Hsieh, C.-Y.; Jeng, Y.-M.; Wang, M.-Y.; Yang, P.-C.; Kuo, M.-L. Knockdown of Contactin-1 Expression Suppresses Invasion and Metastasis of Lung Adenocarcinoma. Cancer Research 2006, 66, 2553-2561.

(189) Yan, J.; Wong, N.; Hung, C.; Chen, W. X.-Y.; Tang, D. Contactin-1 Reduces E-Cadherin Expression Via Activating AKT in Lung Cancer. PLoS ONE 2013, 8, e65463.

(190) Soriano, C.; Mukaro, V.; Hodge, G.; Ahern, J.; Holmes, M.; Jersmann, H.; Moffat, D.; Meredith, D.; Jurisevic, C.; Reynolds, P. N.; Hodge, S. Increased proteinase inhibitor-9 (PI-9) and reduced granzyme B in lung cancer: Mechanism for immune evasion? Lung Cancer 2012, 77, 38-45.

Appendix A: The Chromatograms and the Change in Protein Sequence Coverage by Different High pH Fractionation

Figure A.1: The multidimensional liquid chromatograms for (a) 3-fraction, (b) 5-fraction, (c) 10-fraction, and (d) 15-fraction LC/LC runs. The numbering indicates the chromatogram acquired for each high pH fraction followed by low pH separation.

Figure A.2: An example indicating the effect of multi-fractionation on protein sequence coverage. The selected protein is Annexin A5. The 15-fraction run yields the maximum coverage of the selected protein sequence.

Appendix B: The Tissue H&E Slides Indicating the Tumor Margins

Figure B.1: The H&E slides indicating the tumor margins of glass slide specimens. The highlighted area contains malignant tissue. The tumor adjacent tissue was macrodissected.

Appendix C: The 1505 Patient Data

Data category       WU 1505 patient
Filtered spectra    7543
Distinct matches    3212
Peptides            2393
Proteins            1222

Table C.1: The data obtained by the MyriMatch search algorithm. The peptide FDR is 1% and the protein FDR is 2.84%.
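The false discovery rates quoted for Table C.1 are the kind of value typically estimated with a target-decoy search. The following is a minimal sketch of one common convention (decoy hits divided by target hits), not necessarily the formula used by the software in this work. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the 1505 patient data.

```python
def target_decoy_fdr(n_target: int, n_decoy: int) -> float:
    """Estimate the FDR as decoy hits / target hits (one common convention)."""
    if n_target <= 0:
        raise ValueError("n_target must be positive")
    if n_decoy < 0:
        raise ValueError("n_decoy must be non-negative")
    return n_decoy / n_target


# Hypothetical counts for illustration only (not from Table C.1).
fdr = target_decoy_fdr(n_target=1000, n_decoy=10)
print(f"{fdr:.2%}")  # prints 1.00%
```

Other conventions exist (for example, doubling the decoy count and dividing by all hits), so the estimator should be matched to the search tool's documentation before results are compared.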

Accession                 Normalized spectral counts
sp|P08670|VIME_HUMAN      2563.19
sp|P04792|HSPB1_HUMAN     2291.33
sp|P02538|K2C6A_HUMAN     1067.99
sp|P48668|K2C6C_HUMAN     1055.05
sp|P04259|K2C6B_HUMAN     951.49
sp|P63261|ACTG_HUMAN      757.31
sp|P13647|K2C5_HUMAN      757.31
sp|P01834|IGKC_HUMAN      724.94
sp|Q6S8J3|POTEE_HUMAN     576.07
sp|P04406|G3P_HUMAN       556.65
sp|P68032|ACTC_HUMAN      459.56
sp|P10412|H14_HUMAN       446.62
sp|P16402|H13_HUMAN       440.14
sp|P16403|H12_HUMAN       440.14
sp|P63267|ACTH_HUMAN      427.2
sp|A5A3E0|POTEF_HUMAN     420.73
sp|P08727|K1C19_HUMAN     375.42
sp|P08779|K1C16_HUMAN     375.42
sp|Q9BQE3|TBA1C_HUMAN     330.11
sp|Q562R1|ACTBL_HUMAN     310.69
sp|P02545|LMNA_HUMAN      297.74
sp|Q04695|K1C17_HUMAN     297.74
sp|P19012|K1C15_HUMAN     284.8
sp|P16401|H15_HUMAN       252.44
sp|P06733|ENOA_HUMAN      245.96
sp|P35579|MYH9_HUMAN      245.96
sp|P62805|H4_HUMAN        239.49
sp|P21333|FLNA_HUMAN      233.02
sp|P13929|ENOB_HUMAN      226.54
sp|P09104|ENOG_HUMAN      226.54
sp|Q15084|PDIA6_HUMAN     213.6
sp|P13639|EF2_HUMAN       207.13
sp|P01024|CO3_HUMAN       194.18
sp|Q05682|CALD1_HUMAN     194.18
sp|P14625|ENPL_HUMAN      187.71
sp|P60174|TPIS_HUMAN      181.24
sp|P07737|PROF1_HUMAN     181.24
sp|P04264|K2C1_HUMAN      181.24
sp|P04083|ANXA1_HUMAN     174.76
sp|P14618|KPYM_HUMAN      168.29
sp|Q8NBJ5|GT251_HUMAN     161.82
sp|Q5XKE5|K2C79_HUMAN     155.34
sp|O95678|K2C75_HUMAN     155.34
sp|Q9C0C2|TB182_HUMAN     148.87
sp|P06899|H2B1J_HUMAN     148.87
sp|Q09666|AHNK_HUMAN      142.4
sp|P02792|FRIL_HUMAN      135.93
sp|P01857|IGHG1_HUMAN     129.45
sp|P67809|YBOX1_HUMAN     129.45
sp|P0C0L4|CO4A_HUMAN      129.45
sp|P0C0L5|CO4B_HUMAN      129.45
sp|P02533|K1C14_HUMAN     129.45
sp|Q16543|CDC37_HUMAN     122.98
sp|P35555|FBN1_HUMAN      122.98
sp|P16989|YBOX3_HUMAN     122.98
sp|O43707|ACTN4_HUMAN     122.98
sp|Q9P2E9|RRBP1_HUMAN     122.98
sp|Q14980|NUMA1_HUMAN     116.51
sp|Q6PEY2|TBA3E_HUMAN     116.51
sp|P12814|ACTN1_HUMAN     116.51
sp|P14314|GLU2B_HUMAN     110.04
sp|Q8NC51|PAIRB_HUMAN     110.04
sp|P0CG38|POTEI_HUMAN     110.04
sp|O00151|PDLI1_HUMAN     103.56
sp|P23528|COF1_HUMAN      103.56
sp|Q9H2U2|IPYR2_HUMAN     103.56
sp|O00515|LAD1_HUMAN      97.09
sp|P22314|UBA1_HUMAN      97.09
sp|P59665|DEF1_HUMAN      97.09
sp|Q02539|H11_HUMAN       97.09
sp|P42167|LAP2B_HUMAN     90.62
sp|P13646|K1C13_HUMAN     90.62
sp|P02751|FINC_HUMAN      90.62
sp|P07355|ANXA2_HUMAN     90.62
sp|Q07955|SRSF1_HUMAN     90.62
sp|Q14764|MVP_HUMAN       90.62
sp|P23246|SFPQ_HUMAN      84.15
sp|P42166|LAP2A_HUMAN     84.15
sp|P01860|IGHG3_HUMAN     84.15
sp|P62328|TYB4_HUMAN      84.15
sp|P68104|EF1A1_HUMAN     84.15
sp|P10809|CH60_HUMAN      84.15
sp|Q14195|DPYL3_HUMAN     84.15
sp|P05787|K2C8_HUMAN      84.15
sp|P63104|1433Z_HUMAN     84.15
sp|P05164|PERM_HUMAN      84.15
sp|P50454|SERPH_HUMAN     77.67
sp|P26038|MOES_HUMAN      77.67
sp|P02042|HBD_HUMAN       77.67
sp|P04075|ALDOA_HUMAN     77.67
sp|P0CG39|POTEJ_HUMAN     77.67
sp|P62917|RL8_HUMAN       71.2

Table C.2: The top 97 protein identifications and respective spectral counts obtained for WU 1505 patient data.
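The spectral counts in Table C.2 are reported as normalized values. The normalization procedure is not restated in this appendix, so the following is only a minimal sketch that assumes a simple total-count scaling, in which every raw count in a sample is multiplied by a common factor so that the sample total matches a chosen target. The function name, the target total, and the toy counts are hypothetical and are not taken from the WU 1505 data.

```python
def normalize_spectral_counts(raw_counts, target_total):
    """Scale raw spectral counts so that the sample total equals target_total.

    Assumption: total-count scaling. This is one common convention and is
    not confirmed to be the normalization used for Table C.2.
    """
    total = sum(raw_counts.values())
    if total == 0:
        raise ValueError("sample has no spectral counts")
    factor = target_total / total
    # Every protein in the sample shares the same scaling factor.
    return {acc: round(count * factor, 2) for acc, count in raw_counts.items()}


# Hypothetical raw counts for three placeholder proteins.
raw = {"protein_A": 30, "protein_B": 10, "protein_C": 10}
normalized = normalize_spectral_counts(raw, target_total=100.0)
for accession, value in normalized.items():
    print(accession, value)
```

Under this assumption, the normalized values within a sample remain proportional to the raw counts, which allows samples with different total spectral counts to be compared on a common scale.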


Appendix D: Protein Sequence Alignments of AKR Family Proteins

Figure D.1: Sequence alignments of (a) AKR1C1 and AKR1C2 and (b) AKR1C2 and AKR1C3, generated with the UniProt sequence alignment tool. The sequence similarity is 97% for AKR1C1 and AKR1C2 and 90% for AKR1C2 and AKR1C3. The highlighted amino acids indicate the sequence mismatches.
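As an illustration of how a pairwise figure such as those quoted for Figure D.1 can be derived from an alignment, the sketch below counts identical columns in two already-aligned sequences and lists the mismatched positions. It assumes that the percentage is computed as identical columns divided by alignment length, which is one common convention and not necessarily the definition used by the UniProt tool. The two short sequences are hypothetical placeholders, not the AKR sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Return (percent identity, 1-based mismatch positions) for two aligned sequences.

    Assumption: identity = identical columns / alignment length * 100.
    Columns containing a gap ('-') in either sequence count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    mismatches = [
        position
        for position, (res_a, res_b) in enumerate(zip(aligned_a, aligned_b), start=1)
        if res_a != res_b or res_a == "-"
    ]
    identical = len(aligned_a) - len(mismatches)
    return 100.0 * identical / len(aligned_a), mismatches


# Hypothetical aligned fragments (not real AKR1C sequences).
seq_a = "MDSKYQ-VKLN"
seq_b = "MDSKHQCVKLN"
identity, mismatch_positions = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identical; mismatches at {mismatch_positions}")
```

The mismatch positions returned here correspond to the kind of residues that are highlighted in the alignment figures.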


Figure D.2: The sequence alignment of AKR1C2 and AKR1B10. The UniProt sequence alignment tool was used. The highlighted amino acids indicate the sequence mismatches.

Sample group            Non-recurrent                                                                             Recurrent
Tumor differentiation   PD        MD        MD        MD        PD        PD        MD        MD        MD        MD        MD        PD
Patient #               1502      1503      1504      1506      1507      1512      1513      1518      1520      1509      1514      1517
sp|O75828|CBR3_HUMAN    32.23     37.12     24.09     44.63     45.52     61.14     467.74    14.03     55.23     7.38      14.98     15.77
sp|Q01546|K22O_HUMAN    83.68     179.39    143.98    159       296.88    182.35    16.82     267.2     249.23    1.23      9.18      7.39
sp|P12035|K2C3_HUMAN    88.77     184.25    152.55    178.53    375.05    185.01    25.23     269.36    246.4     8.61      10.15     6.41
sp|P35052|GPC1_HUMAN    14.13     2.21      2.68      1.39      3.96      7.97      2.1       5.94      0         0         0.97      1.48
sp|Q9H3D4|P63_HUMAN     5.65      2.65      4.82      1.39      6.93      0.53      2.1       3.78      3.54      0         0         0.49
sp|P36952|SPB5_HUMAN    14.13     61.86     8.56      13.95     3.96      5.85      4.2       82.05     22.66     2.46      0.48      0
sp|Q14574|DSC3_HUMAN    24.88     19.88     21.95     1.39      50.47     13.29     32.58     18.35     6.37      0         0         0
sp|P04406|G3P_HUMAN     589.15    396.34    461.38    578.82    672.92    1012.76   238.6     267.74    395.79    279.22    533.54    761.32
sp|P14923|PLAK_HUMAN    59.93     83.51     70.12     18.13     68.28     36.15     188.15    54.52     46.02     4.92      14.98     3.94
sp|P15924|DESP_HUMAN    196.19    213.86    175.03    68.34     306.77    84.53     278.54    276.92    191.88    94.71     31.41     22.67
sp|P16152|CBR1_HUMAN    83.11     90.58     59.41     89.26     78.18     140.88    758.9     32.39     257.02    28.29     29.48     27.59
sp|Q04828|AK1C1_HUMAN   328.5     217.83    133.81    26.5      119.74    47.85     215.48    258.03    157.18    0         7.25      26.12
sp|P52895|AK1C2_HUMAN   114.78    68.49     50.31     19.53     49.48     24.99     101.96    69.63     68.68     0         4.83      14.78
sp|O60218|AK1BA_HUMAN   23.18     97.65     38.54     13.95     7.92      31.37     87.24     17.27     83.55     0         0         9.36
sp|Q04695|K1C17_HUMAN   23.18     1092.71   657.29    613.69    3243.87   442.32    1408.49   1361.92   300.21    27.06     14.02     7.39
sp|P32926|DSG3_HUMAN    24.88     37.56     33.19     25.11     59.38     19.14     36.79     31.31     16.99     0         0         0
sp|Q13835|PKP1_HUMAN    36.19     81.3      41.21     13.95     95        32.96     129.29    50.74     37.53     0         0         0
sp|P55042|RAD_HUMAN     0.57      0.44      0.54      0         0         0         0         0.54      0         7.38      2.42      1.48
sp|O76070|SYUG_HUMAN    1.7       0         0.54      0         0         0         2.1       0         0         4.92      6.28      5.42
sp|P21980|TGM2_HUMAN    37.32     66.28     21.95     55.79     19.79     49.44     19.97     31.31     53.1      285.37    199.11    135.02

Table D.1: Normalized spectral counts for the selected signature proteins in squamous cell carcinoma samples. PD, poorly differentiated; MD, moderately differentiated; WD, well differentiated.


Figure D.3: The fragment ion spectra for two selected peptides of DSC3, a desmosomal protein. The y ions and b ions identified by the SEQUEST search algorithm are labeled in blue and red, respectively.

Appendix E: The Normalized A549 Data Plots

Figure E.1: The PCA plots for A549 total protein and phosphoprotein data.

Figure E.2: The phosphorylation site intensity change in the vector (darker color) vs. wild-type (lighter color) samples for the Myc and MEK proteins.


Appendix F: Phosphoprotein and Protein Network Analysis of A549 Cell Lines

Figure F.1: The pathway map of differentially expressed phosphoproteins with a fold change ≥ 2 and p-value < 0.05. All proteins up-regulated in the LKB1 wild-type sample compared to the LKB1-vector sample are presented in pink, and all down-regulated proteins are presented in green. The solid lines indicate direct relationships between proteins, and the dotted lines indicate indirect protein interactions.


Figure F.2: The pathway map of differentially expressed phosphoproteins with a fold change ≥ 2 and p-value < 0.05. All proteins up-regulated in the LKB1 wild-type sample compared to the LKB1-vector sample are presented in pink, and all down-regulated proteins are presented in green. The solid lines indicate direct relationships between proteins, and the dotted lines indicate indirect protein interactions.


Figure F.3: The pathway map of differentially expressed proteins with a fold change ≥ 2 and p-value < 0.05. All proteins up-regulated in the LKB1 wild-type sample compared to the LKB1-vector sample are presented in pink, and all down-regulated proteins are presented in green. The solid lines indicate direct relationships between proteins, and the dotted lines indicate indirect protein interactions.
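The selection criteria stated in the captions of this appendix (fold change of at least 2 and p-value below 0.05) can be sketched as a simple filter. The example below is a hedged illustration only: it assumes that fold change is the ratio of wild-type to vector abundance, that down-regulation is the symmetric case (ratio of 0.5 or less), and that proteins with a non-positive abundance in either sample are skipped. The record layout, function name, and values are hypothetical and do not come from the A549 data.

```python
def classify_regulation(records, min_fold=2.0, max_p=0.05):
    """Split records into up- and down-regulated names (wild-type vs. vector).

    Each record is (name, wild_type_abundance, vector_abundance, p_value).
    Assumption: fold change = wild-type / vector, with symmetric thresholds.
    """
    up_regulated, down_regulated = [], []
    for name, wild_type, vector, p_value in records:
        if p_value >= max_p:
            continue  # not significant at the chosen threshold
        if wild_type <= 0 or vector <= 0:
            continue  # ratio undefined; a simplification made in this sketch
        ratio = wild_type / vector
        if ratio >= min_fold:
            up_regulated.append(name)
        elif ratio <= 1.0 / min_fold:
            down_regulated.append(name)
    return up_regulated, down_regulated


# Hypothetical abundances and p-values for placeholder proteins.
example = [
    ("protein_A", 400.0, 100.0, 0.01),   # ratio 4.0, significant
    ("protein_B", 50.0, 200.0, 0.03),    # ratio 0.25, significant
    ("protein_C", 300.0, 100.0, 0.20),   # ratio 3.0, not significant
    ("protein_D", 120.0, 100.0, 0.001),  # ratio 1.2, below the fold threshold
]
up, down = classify_regulation(example)
print("up-regulated:", up)
print("down-regulated:", down)
```

In the terms of the figure legends, the first list would correspond to the proteins drawn in pink and the second to those drawn in green.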
