
Comparison of Label and Label-free Quantitative Liquid Chromatography Tandem Mass Spectrometry for Biomarker Discovery

THESIS

Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University

By

Bei Zhao

Graduate Program in Chemistry

The Ohio State University

2010

Master's Examination Committee:

Dr. Michael A. Freitas, Advisor

Dr. Susan V. Olesik, Co-advisor

Copyright by

Bei Zhao

2010

ABSTRACT

Mass spectrometry-based protein quantification approaches are powerful tools for biomarker discovery. In this thesis, a spectral counting-based label-free analysis platform and a tandem mass tag labeling quantification platform are described and evaluated for detecting differential protein abundances of an in vivo system in response to UV radiation. Sample preparation and separation of global protein digests were optimized for both label-free and labeling methods. Statistical evaluations such as principal component analysis and Venn diagrams were used to evaluate the reproducibility of the analytical system. For spectral counting analysis, several normalization and modification methods, such as ratios of normalized spectral counts, normalized spectral abundance factors, and the spectral index, were applied and the results were compared. The results from spectral counting and tandem mass tag labeling were also correlated. Potential protein biomarkers related to DNA damage repair in the in vivo system are proposed based on both labeling and label-free approaches.


DEDICATION

This document is dedicated to my parents


ACKNOWLEDGMENTS

I wish to express sincere thanks to my advisors, Dr. Michael A. Freitas and Dr. Susan Olesik, for their valuable guidance and support. I also want to thank all of the group members and collaborators in this research. I would especially like to thank Dr. Liwen Wang for training me in the lab; Dr. George Heine and Dr. Jeffrey Parvin, the collaborators for the project in this thesis; and Nan Kleinholz from the Campus Chemical Instrument Center, who offered a lot of help with instrumentation. I would also like to thank Mr. John Shapiro, Jonathan Clark, and Josh Dettman for discussion.

Special thanks to my dearest parents and my best friend Renan Cabrera, who supported me mentally and financially without any reservations.

The study was funded by the Ohio State University.


VITA

June 2003 ...... B.S. Marine Chemistry, Ocean University of China, China

March 2006 ...... M.S. Physical Chemistry, University of Windsor, Canada

2006 to present ...... Graduate Teaching and Research Assistant, Analytical Chemistry, Department of Chemistry, The Ohio State University

PUBLICATIONS

1. Bei Zhao, Jichang Wang. “Chemical Oscillations during the Photoreduction of 1,4-benzoquinone in Acidic Bromate Solution” J. Photochem. Photobiol. A: Chem 2007, 192, 204-210.

2. Bei Zhao, Jichang Wang. “Photomediated bromate-1,4-benzoquinone reaction: A novel photochemical oscillator” Chem. Phys. Lett. 2006, 430, 1-3, 41-44.

3. Bei Zhao, Jichang Wang. “Stirring-Controlled Bifurcations in the 1,4-Cyclohexanedione-Bromate Reaction” J. Phys. Chem. A. 2005, 109, 16, 3647-3651.

4. Jichang Wang, Krishan Yadav, Bei Zhao, QingYu Gao, Do Sung Huh. “Photo-Controlled Oscillatory Dynamics in the Bromate-1,4-Cyclohexanedione Reaction” J. Chem. Phys. 2004, 121, 10138-10144.

FIELDS OF STUDY

Major Field: Chemistry


TABLE OF CONTENTS

ABSTRACT

DEDICATION

ACKNOWLEDGMENTS

VITA

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

CHAPTER 1 INTRODUCTION

1.1 Overview of quantification by mass spectrometry

1.2 Isotope-labeled mass spectrometry

1.2.1 Isotope-labeled mass spectrometry

1.2.2 Advantages and limitations of isotope-labeled mass spectrometry

1.3 Label-free mass spectrometry approaches

1.3.1 Label-free mass spectrometry approaches

1.3.2 Peptide chromatographic peak intensity measurements

1.3.3 Spectral counting quantification

1.3.4 Advantages and limitations of label-free approaches

1.4 Summary

CHAPTER 2 DEVELOPMENT OF ROBUST LABEL-FREE PROTEOMICS TO DETERMINE PROTEIN CHANGES IN UV-INDUCED DNA DAMAGE

2.1 Introduction

2.1.1 Spectral counting label-free quantification approach

2.1.2 The TMT quantification approach

2.2 Experimental

2.2.1 Material

2.2.2 Cell culture and treatments

2.2.3 Sample preparation for label-free analysis

2.2.4 Sample preparation of TMT isotope-labeled

2.2.5 LC-MS/MS analysis

2.2.6 Database search

2.3 Results and discussion

2.3.1 Separation gradient optimization

2.3.2 Evaluation of the reproducibility of the

2.3.3 Comparison analysis of the number of identified

2.3.4 Data Reduction

2.3.5 Normalization of label-free spectral count data

2.3.6 Analysis with spectral counting method

2.3.7 Analysis by use of Spectral Index (SI)

2.3.8 Analysis by use of normalized spectral abundance factors

2.3.9 Normalization for TMT data

2.3.10 Analysis of the TMT data

2.3.11 Correlation of label-free spectral counting data and TMT data

2.3.12 Cluster analysis

2.3.13 Potential biomarker selection

2.4 Conclusion

CHAPTER 3 SUMMARY

REFERENCES

APPENDICES

Appendix A: ZipTip clean-up procedure

Appendix B: Tables

LIST OF TABLES

Table 2.1 Potential protein biomarkers for HeLa cell samples.

Table 2.2 Potential protein biomarkers for MCF-10A cell samples.

Table B.1 Ratios and p-values for the t-test of log2(NSAF) for selected HeLa cell protein biomarkers.

Table B.2 Ratios and p-values for the t-test of log2(NSAF) for selected MCF-10A cell protein biomarkers.

LIST OF FIGURES

Figure 2.1 Workflow for the label-free spectral counting quantification strategy.

Figure 2.2 The molecular structure of TMT labeling reagents.

Figure 2.3 Workflow for protein quantification via TMT labeling.

Figure 2.4 Experimental design of label-free spectral counting quantification experiments to study cellular protein abundance changes under UV treatment.

Figure 2.5 Experimental design of tandem mass tag labeling quantification for cellular protein abundance changes under UV treatment.

Figure 2.6 The relationship of the number of homologous proteins identified at 95% confidence and gradient length for the analysis of HeLa global digests with LC-MS/MS.

Figure 2.7 Baseline chromatography of HeLa cell global digests with an illustration of a gradient.

Figure 2.8 The comparison of baseline chromatography of HeLa cell label-free experiments for the technical replicate runs of the same control sample.

Figure 2.9 The comparison of baseline chromatography of HeLa cell label-free experiments for the technical replicate runs of the same 1h recovered sample.

Figure 2.10 The comparison of baseline chromatography of HeLa cell label-free experiments for an analysis of different biological control samples.

Figure 2.11 The comparison of baseline chromatography of HeLa cell label-free experiments for an analysis of one sample from different treatments.

Figure 2.12 The comparison of baseline chromatography of MCF-10A cell label-free experiments for the technical replicate runs of the same control sample.

Figure 2.13 The comparison of baseline chromatography of MCF-10A cell label-free experiments for the technical replicate runs of the same 2h recovered sample.

Figure 2.14 The comparison of baseline chromatography of MCF-10A cell label-free experiments for an analysis of different biological 1h recovered samples.

Figure 2.15 The comparison of baseline chromatography of MCF-10A cell label-free experiments for an analysis of one sample from different treatments.

Figure 2.16 The comparison of baseline chromatography of TMT experiments for technical replicate runs of the same sample TMT1.

Figure 2.17 The comparison of baseline chromatography of TMT experiments for technical replicate runs of the same sample TMT2.

Figure 2.18 The comparison of baseline chromatography of TMT experiments for technical replicate runs of the same sample TMT3.

Figure 2.19 The comparison of baseline chromatography of TMT experiments for an analysis of different samples.

Figure 2.20 Principal component analysis of all of the LC-MS data of HeLa label-free samples by Chaorder: (a) data points colored in terms of the treatments; (b) data points colored in terms of the analysis batches on different days.

Figure 2.21 Principal component analyses of the LC-MS data of HeLa label-free samples for different treatments: (a) HeLa control samples, Red-HeLa ctrl1, Blue-HeLa ctrl2, Black-HeLa ctrl3; (b) HeLa 1h recovered samples, Red-HeLa 1h1, Blue-HeLa 1h2, Black-HeLa 1h3; and (c) HeLa 2h recovered samples, Red-HeLa 2h1, Blue-HeLa 2h2, Black-HeLa 2h3.

Figure 2.22 Principal component analyses of all of the LC-MS data of MCF-10A label-free samples by Chaorder: (a) data points colored in terms of the treatments; (b) data points colored in terms of the analysis batch on different days.

Figure 2.23 Principal component analyses of the LC-MS data of MCF label-free samples for different treatments: (a) MCF control samples, Red-MCF ctrl1, Blue-MCF ctrl2, Black-MCF ctrl3; (b) MCF 1h recovered samples, Red-MCF 1h1, Blue-MCF 1h2, Black-MCF 1h3; and (c) MCF 2h recovered samples, Red-MCF 2h1, Blue-MCF 2h2, Black-MCF 2h3.

Figure 2.24 Principal component analyses of all of the LC-MS data of TMT samples by Chaorder: (a) data points colored in terms of different TMT samples; and (b) data points colored in terms of the analysis batch on different days.

Figure 2.25 Comparison of protein IDs for each HeLa label-free sample with technical replicates.

Figure 2.26 Comparison of the average protein IDs in each technical analysis, total protein IDs for each biological sample, and the overlapped protein IDs for each biological sample for HeLa label-free samples.

Figure 2.27 Comparison of the average protein IDs in each technical analysis within each treatment, average of total protein IDs for each biological sample, and the overlapped protein IDs for each biological sample within each treatment for HeLa label-free samples.

Figure 2.28 Comparison of protein IDs for each MCF-10A label-free sample with technical replicates.

Figure 2.29 Comparison of the average protein IDs in each technical analysis, total protein IDs for each biological sample, and the overlapped protein IDs for each biological sample for the MCF-10A label-free samples.

Figure 2.30 Comparison of the average protein IDs in each technical analysis within each treatment, average of total protein IDs for each biological sample, and the overlapped protein IDs for each biological sample within each treatment for MCF-10A label-free samples.

Figure 2.31 Comparison of protein IDs for each TMT sample with technical replicates.

Figure 2.32 Comparison of the average protein IDs in each technical analysis, total protein IDs, and the overlapped protein IDs of each sample for the TMT samples.

Figure 2.33 Comparison of protein IDs of a collectively searched dataset for HeLa label-free biological replicates receiving the same treatment.

Figure 2.34 Comparison of protein IDs of a collectively searched dataset for MCF label-free biological replicates receiving the same treatment.

Figure 2.35 Comparison of the average protein IDs of the HeLa and MCF label-free biological samples for each treatment, total protein IDs, and the overlapped protein IDs for each treatment.

Figure 2.36 Comparison of protein IDs for the HeLa and MCF label-free samples with different treatments.

Figure 2.37 Comparison of protein IDs of a collectively searched dataset for different TMT samples.

Figure 2.38 Comparison of the total protein IDs between the HeLa and MCF-10A samples with the label-free approach and the TMT approach.

Figure 2.39 Comparison of the total protein IDs between the label-free approach and the TMT approach for the HeLa and MCF-10A samples.

Figure 2.40 The number of proteins identified vs. technical replicates of the control samples for HeLa cells.

Figure 2.41 Number of spectral counts normalized by technical run numbers for label-free experiments.

Figure 2.42 Histograms of raw spectral counts for each biological sample for HeLa cell label-free experiments.

Figure 2.43 Histograms of raw spectral counts for each biological sample for MCF-10A cell label-free experiments.

Figure 2.44 Histograms of ion abundance of each biological sample in TMT labeling experiments.

Figure 2.45 Correlations of raw spectral counts for different biological samples within the same treatment for the HeLa cell label-free experiments.

Figure 2.46 Correlations of raw spectral counts for different biological samples within the same treatment for the MCF-10A cell label-free experiments.

Figure 2.47 Correlations of raw spectral counts for the biological samples with different treatments for the HeLa cell label-free experiments.

Figure 2.48 Correlations of raw spectral counts for the biological samples with different treatments for the MCF-10A cell label-free experiments.

Figure 2.49 Correlations of the average raw spectral counts for the HeLa and MCF-10A cell label-free biological samples receiving the same treatment.

Figure 2.50 Correlations of ion abundance for the biological samples with the same treatment for the HeLa cell TMT labeling experiments.

Figure 2.51 Correlations of ion abundance for the biological samples with the same treatment for the MCF-10A cell TMT labeling experiments.

Figure 2.52 Volcano plots for raw spectral counts of the HeLa label-free samples.

Figure 2.52 Volcano plots for normalized spectral counts of the HeLa label-free samples.

Figure 2.53 Volcano plots for raw spectral counts of the MCF label-free samples.

Figure 2.54 Volcano plots for normalized spectral counts of the MCF label-free samples.

Figure 2.55 Volcano plots of SI for normalized spectral counts of HeLa label-free samples.

Figure 2.56 Volcano plots of SI for normalized spectral counts of MCF label-free samples.

Figure 2.57 Distribution of SI for HeLa cell label-free analysis.

Figure 2.58 Distribution of SI for MCF cell label-free analysis.

Figure 2.59 Histograms of NSAF of each biological HeLa cell sample in label-free experiments.

Figure 2.60 Histograms of log2 transformed NSAF of each biological HeLa cell sample in label-free experiments.

Figure 2.61 Volcano plots of NSAF with log2 transformation for the HeLa label-free samples.

Figure 2.62 Volcano plots of NSAF without log2 transformation for the HeLa label-free samples.

Figure 2.63 Volcano plots of NSAF with log2 transformation for the MCF label-free samples.

Figure 2.64 Volcano plots of NSAF without log2 transformation for the MCF label-free samples.

Figure 2.65 Distribution of ratios of averaged normalized ion intensities between treatments for HeLa TMT analysis.

Figure 2.66 Distribution of ratios of averaged normalized ion intensities between treatments for MCF TMT analysis.

Figure 2.67 Scatter plots of the ratios of protein abundances from replicate samples of the label-free normalized spectral counts and TMT analyses.

Figure 2.68 Scatter plots of the ratios of protein abundances from replicate samples of the label-free NSAF and TMT analyses.

Figure 2.69 Heat maps from the cluster analysis of the normalized SC for both the HeLa cell and MCF-10A cell label-free experiments.

Figure 2.70 Heat maps from the cluster analysis of the NSAF for both the HeLa cell and MCF-10A cell label-free experiments.

Figure 2.71 Heat maps from the cluster analysis of the normalized TMT ion abundance for both the HeLa cell and MCF-10A cell TMT labeling experiments.

Figure 2.72 Bar plots of raw SC, normalized SC, SAF, NSAF, TMT ion abundance, normalized TMT ion abundance, ratio of normalized SC, ratio of NSAF, SI, and ratio of normalized TMT ion abundance for protein IPI00001159.1 of the HeLa samples.

Figure 2.73 Bar plots of raw SC, normalized SC, SAF, NSAF, TMT ion abundance, normalized TMT ion abundance, ratio of normalized SC, ratio of NSAF, SI, and ratio of normalized TMT ion abundance for protein IPI00219169.3 of the MCF samples.

CHAPTER 1

INTRODUCTION

1.1 Overview of quantification by mass spectrometry

Mass spectrometry has become a powerful tool for identifying proteins and profiling global protein abundances in biological matrices, which can provide important information for the study of cellular responses to disturbance and disease. The top-down protein profiling approach often involves the use of two-dimensional gel electrophoresis, which lacks resolution and sensitivity. It works well only for abundant proteins and is inefficient for the analysis of insoluble proteins and those with very high or very low mass. Alternatively, the bottom-up shotgun approach, which has demonstrated better sensitivity and reproducibility than the 2D electrophoresis-based method, involves digestion of the proteins in a complex mixture followed by peptide separation and mapping with HPLC-mass spectrometry. For shotgun protein quantification with mass spectrometry, there are two main kinds of strategies: isotope-labeled and label-free methods [1-9].

1.2 Isotope-labeled mass spectrometry

1.2.1 Isotope-labeled mass spectrometry

This category of quantitative methods utilizes stable isotope tags, introduced to amino acids as internal standards, for quantification between samples with different treatments or from patients at different stages of progression. Isotope labels can be introduced (a) metabolically, (b) enzymatically, or (c) chemically. Metabolic stable-isotope labeling introduces mass tags by culturing cells in media that are isotopically depleted or isotopically enriched by using 15N salts or 13C-labeled amino acids (e.g., stable-isotope labeling by amino acids in cell culture, SILAC) [10, 11]. Enzymatic labeling usually incorporates 18O at the C-terminus of peptides either during or after digestion [12-14]. Examples of isotope tagging by chemical reaction include isotope-coded affinity tags [15, 16], isobaric tags for relative and absolute quantification [17], and tandem mass tags [18].

1.2.2 Advantages and limitations of isotope-labeled mass spectrometry

These isotope-labeled methods are compatible with shotgun mass spectrometry, providing flexible tools to study protein changes in complex biological matrices with relatively good accuracy and precision. The stable isotope-labeled strategies are considered to be more sensitive for differential detection than the label-free methods [4, 19, 20]. However, most of these approaches have potential limitations such as high cost of reagents and time, complicated sample preparation, incomplete labeling, the requirement for larger sample amounts, and the need for sophisticated software for data analysis.

1.3 Label-free mass spectrometry approaches

1.3.1 Label-free mass spectrometry approaches

To overcome the disadvantages of isotope-labeling mass spectrometry, label-free methods have been developed. Label-free approaches do not require any additional chemical labeling, which simplifies the sample preparation step and reduces the cost in time and reagents. There is no limit on the number of samples that can be compared, which adds flexibility to the experimental design. The label-free methods are also compatible with shotgun mass spectrometry, which is suitable for high-throughput analysis of complex biological mixtures. Moreover, compared with the labeling approaches, whose dynamic range spans about two orders of magnitude, label-free methods can provide a larger dynamic range of quantitation, up to four orders of magnitude [21].

There are primarily two different label-free strategies: peptide chromatographic peak intensity measurements and spectral counting [22]. The common steps of the label-free strategies are: (i) sample preparation including protein extraction, reduction, alkylation, and digestion; (ii) chromatographic separation and mass spectrometry analysis; (iii) data analysis including peptide/protein identification, quantification, and statistical analysis.

1.3.2 Peptide chromatographic peak intensity measurements

It was reported in 1999 that electrospray ionization (ESI) signal intensity and ion concentration are correlated [23]. The peak area of the extracted ion chromatogram (XIC) for each peptide was also found to increase with increased peptide concentration [24]. Therefore, relative protein concentrations can be directly compared by the normalized peak intensity of the XIC. This approach is considered to be a simple and effective method for quantifying relative peptide or protein abundance. However, shifts in retention time and m/z can cause large variability and inaccuracy in the direct comparison of multiple LC-MS analyses. Therefore, highly reproducible LC-MS and sophisticated software are needed for automated aligned peak comparisons of complex biological samples [21].

1.3.3 Spectral counting quantification

Dr. John Yates’s group studied the correlation between relative protein abundance and spectral counts. They found a strong linear correlation over a dynamic range of more than two orders of magnitude, which revealed the potential of spectral counts for quantifying relative protein abundance [25]. Washburn’s group showed that there is a strong correlation between the spectral counting method and the stable isotope method for quantification [26].

Compared with quantification by peptide chromatographic peak intensity measurements, the spectral counting method does not require sophisticated computer algorithms for data reduction; however, normalization and statistical analyses are necessary to accurately detect protein abundance changes. The simplest approach is to normalize spectral counts by the total spectral counts to correct for run-to-run variation [27]. The spectral counting method is also more tolerant of LC-MS variability, while peptide chromatographic peak intensity quantification performs best when excellent reproducibility is obtained.
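As a minimal sketch of this simplest normalization, the Python snippet below divides each protein's spectral count by the total spectral count of its run and rescales by the mean total across runs, so the values stay on a count-like scale. The protein IDs, the counts, and the rescaling to the mean total are illustrative assumptions; they are not data or procedures taken from this study.

```python
from typing import Dict

Run = Dict[str, int]  # protein ID -> raw spectral count in one LC-MS/MS run


def normalize_spectral_counts(runs: Dict[str, Run]) -> Dict[str, Dict[str, float]]:
    """Scale every run so that its total spectral count equals the mean total."""
    # Assumes every run contains at least one identified spectrum.
    totals = {name: sum(counts.values()) for name, counts in runs.items()}
    mean_total = sum(totals.values()) / len(totals)
    return {
        name: {prot: sc * mean_total / totals[name] for prot, sc in counts.items()}
        for name, counts in runs.items()
    }


# Hypothetical counts for two runs that differ only in sampling depth.
runs = {
    "run1": {"P1": 10, "P2": 30, "P3": 60},    # total 100
    "run2": {"P1": 20, "P2": 60, "P3": 120},   # total 200
}
normalized = normalize_spectral_counts(runs)
```

In this toy case the two runs have identical relative composition, so each protein receives the same normalized count in both runs once the difference in total counts is removed.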

Zhang et al. compared five statistical tests to evaluate the significance of quantification by spectral counts. They reported that Student’s t-test outperformed Fisher’s exact test, the goodness-of-fit test (G-test), and the AC test when three or more replicates were available [28]. Fu et al. developed a spectral index (SI), which combines relative protein abundance by SC with the number of replicates under the same treatment containing detectable peptides for each protein. The SI approach was applied to differential protein detection in cystic fibrosis and was shown to outperform the other statistical tests [29]. In the spectral counting approach, proteins with large molecular weight/length tend to generate more spectral counts than proteins with small molecular weight/length; therefore, the detected relative abundances tend to be biased toward the large proteins [30-32]. Washburn’s group introduced the normalized spectral abundance factor (NSAF), which is defined as the number of spectral counts (SC) for a certain protein divided by the protein length (L), then divided by the sum of SC/L for all proteins in the analysis [33-35]. A similar approach is to correct the SC with the molecular weight [36]. Griffin et al. recently developed an approach, also called spectral index, that combines SC with the cumulative fragment ion abundance of each significantly identified peptide giving rise to a protein; it is claimed to be more advantageous than the other spectral counting methods [37]. This method requires additional software to extract peptide ion abundance from the LC-MS file.
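The NSAF definition given above, NSAF = (SC/L) / Σ(SC/L), can be written out as a short Python sketch. The function name, protein IDs, counts, and lengths are hypothetical and serve only to illustrate the length correction.

```python
from typing import Dict


def nsaf(spectral_counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Return NSAF = (SC / L) divided by the sum of SC / L over all proteins."""
    # Spectral abundance factor: spectral count corrected for protein length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


# Hypothetical proteins with equal counts; B is twice as long as A.
counts = {"A": 20, "B": 20}
lengths = {"A": 100, "B": 200}
result = nsaf(counts, lengths)
```

With equal spectral counts, the shorter protein A receives twice the NSAF of protein B, which is the correction for the length bias described in the text; the NSAF values of one analysis always sum to 1.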

1.3.4 Advantages and limitations of label-free approaches

The label-free quantification approaches are considered to be simple and cost-effective and show good reproducibility and linearity for both peptide and protein quantification. Due to these advantages, the SC quantification strategies have been applied to a broad range of bio-applications. Examples include profiling specific pathological stages of multiple sclerosis [38], comparing protein expression in wild-type versus defective mutant cell lines or mice [39, 40], detecting differential protein expression related to disease progression [41], comparing differential protein expression under multiple growth conditions [28], identifying the extent of phosphorylation over a time course in response to signaling events [42], and creating proteomic maps of subcellular localization or functional categorization [43-45].

The limitations of spectral counting arise from the inherent nature of mass spectrometry. Due to the limited sampling duty cycle and the use of data-dependent acquisition with dynamic exclusion, the mass spectrometer tends to pick up the peptide ions from the abundant proteins. Therefore, quantification by spectral counting is strongly biased towards abundant proteins, and the visibility of low-abundance proteins is constrained [46]. Another limitation of spectral counting is the saturation of SCs at high protein abundance levels, due to limitations in the ion-trapping capacity and ionization efficiency of the mass spectrometer, which compresses the dynamic range of quantifiable proteins to two or three orders of magnitude [46, 47]. For estimating high-fold changes, some studies showed that SC performs better than stable-isotope labeling in terms of accuracy [48]. For detecting low-fold changes, SC showed lower sensitivity, especially for proteins with low SC values [49].

1.4 Summary

The discovery and quantification of biomarkers in different physiological states of a biological system remains an important and challenging technical task in proteomics. A handful of mass spectrometry-based quantification methods have emerged, and all of them have their strengths and limitations. Each method can provide a partial quantification solution for complex protein matrices. Significant further improvements to experimental strategies and statistical approaches are considered necessary to allow for more comprehensive investigations of biological phenomena.

CHAPTER 2

DEVELOPMENT OF ROBUST LABEL-FREE PROTEOMICS TO DETERMINE PROTEIN CHANGES IN UV-INDUCED DNA DAMAGE

2.1 Introduction

Ultraviolet (UV) radiation plays an important role in skin carcinogenesis, leading to a complex transcriptional response of the cells that results in the regulation of DNA damage repair, cell cycle progression, and apoptosis [50-52]. The molecular mechanisms of the cellular UV responses still need to be understood in further detail. Mass spectrometry-based proteomics is a popular approach for the quantification and identification of proteins [2-4, 8]. The two main strategies of quantitative mass spectrometry are stable isotopic labeling and label-free quantification. Both strategies have pros and cons. The labeling approaches offer better sensitivity but require additional sample processing steps, which increase cost. The label-free approaches are popular alternatives due to their simplicity and flexibility for large-scale quantification [21, 22]. In this study, we applied both the label-free spectral counting approach and the tandem mass tag (TMT) labeling method to study the global protein responses of human cells subjected to UV radiation, to provide important clues regarding cellular response to DNA damage.

2.1.1 Spectral counting label-free quantification approach

The development and limitations of the spectral-counting strategy were discussed in Chapter 1. The general experimental workflow is shown in Figure 2.1. The global protein sample is digested into peptides that are separated by an HPLC and then detected by a high-resolution tandem mass spectrometer. Under the data-dependent acquisition mode with dynamic exclusion for the settings of the ion-trap mass spectrometer, in each duty cycle, one full MS survey scan is followed by ten MS/MS scans. Then, the output is analyzed by a database-searching algorithm to assign detected spectra to each protein. The number of spectra for each protein is compared to obtain the relative abundance of the protein in the samples.
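The final counting step of this workflow can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the input is a hypothetical list of (scan ID, protein ID) assignments already filtered by the database search, each scan is counted once, and issues such as peptides shared between proteins are ignored.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_spectra(psms: Iterable[Tuple[str, str]]) -> Counter:
    """Count identified MS/MS spectra per protein.

    Each item is a (scan_id, protein_id) pair; a scan is counted only once.
    """
    seen = set()
    counts = Counter()
    for scan_id, protein_id in psms:
        if scan_id in seen:
            continue  # skip repeated assignments of the same spectrum
        seen.add(scan_id)
        counts[protein_id] += 1
    return counts


# Hypothetical search results for a control and a treated sample.
control = count_spectra([("s1", "P1"), ("s2", "P1"), ("s3", "P2")])
treated = count_spectra(
    [("s1", "P1"), ("s2", "P1"), ("s3", "P1"), ("s4", "P1"), ("s5", "P2")]
)
ratio_p1 = treated["P1"] / control["P1"]  # relative abundance of P1, treated vs. control
```

In practice the raw counts would be normalized before ratios are taken, since total counts differ from run to run.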


Figure 2.1 Workflow for the label-free spectral counting quantification strategy.


2.1.2 The TMT quantification approach

TMT quantification is an analog of iTRAQ and is compatible with shotgun proteomics, enabling the simultaneous identification and quantification of up to six samples. After enzymatic digestion, the peptides are labeled with isobaric tags on the primary amine groups at the N-terminus and on internal lysine (K) side chains through N-hydroxysuccinimide (NHS) chemistry [53, 54]. These isobaric tags contain three groups: a peptide reactive group that attaches the tag to the peptide, a reporter group that reports the abundance of a given peptide in MS/MS mode, and a mass balance group that ensures every tag adds the same mass to the peptide (see Figure 2.2). When fragmentation occurs in MS/MS mode, the reporters are released from the cleavable linker and provide relative quantification for the given peptide through their respective ion abundances [55].
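The reporter-based relative quantification can be illustrated with a small Python sketch. It assumes a sixplex experiment with channels labeled by their nominal reporter masses (126 to 131), sums reporter intensities over all MS/MS spectra assigned to a protein, and expresses each channel relative to a reference channel. The intensities, the choice of reference channel, and the summation rule are illustrative assumptions, not the procedure used in this study.

```python
from typing import Dict, List

# Nominal reporter channels of a sixplex TMT experiment (used here as labels only).
CHANNELS = ["126", "127", "128", "129", "130", "131"]


def protein_reporter_ratios(
    peptide_spectra: List[Dict[str, float]], reference: str = "126"
) -> Dict[str, float]:
    """Sum reporter intensities over a protein's spectra and divide each
    channel by the reference channel."""
    summed = {ch: sum(spec[ch] for spec in peptide_spectra) for ch in CHANNELS}
    return {ch: summed[ch] / summed[reference] for ch in CHANNELS}


# Hypothetical reporter intensities from two MS/MS spectra of one protein.
spectra = [
    {"126": 100.0, "127": 110.0, "128": 200.0, "129": 190.0, "130": 50.0, "131": 55.0},
    {"126": 300.0, "127": 290.0, "128": 600.0, "129": 610.0, "130": 150.0, "131": 145.0},
]
ratios = protein_reporter_ratios(spectra)
```

Summing intensities before taking ratios weights intense spectra more heavily than weak ones; averaging per-spectrum ratios is an alternative design choice with different noise behavior.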

Figure 2.2 The molecular structure of TMT labeling reagents.


The procedure of the TMT labeling approach is illustrated in Figure 2.3.

Figure 2.3 Workflow for protein quantification via TMT labeling

Unlike SILAC, which can only be implemented via metabolic incorporation, the TMT and iTRAQ methods are relatively straightforward and can be applied to both in vivo and in vitro systems, including cell lines and organ tissue from experimental animals or of human origin [56].

2.2 Experimental

2.2.1 Material

The RapiGest™ SF protein solubilization reagent was purchased from Waters (Milford, MA). Tandem mass tag isobaric mass tagging kits and reagents were purchased from Thermo Scientific (Waltham, MA). Modified sequencing grade trypsin (V511) was purchased from Promega (St. Louis, MO). Phosphate buffered saline (PBS) solution and ammonium bicarbonate (ABC) were obtained from Fisher Scientific (Pittsburgh, PA). All chemicals were analytical or HPLC grade.

2.2.2 Cell culture and treatments

All of the cells were cultured and treated in Dr. Jeffrey Parvin’s lab (OSU Medical Center). The HeLa and MCF-10A cells were maintained according to American Type Culture Collection recommendations [57]. For UV treatment, about 1×10^6 human cells on each 10 cm plate, covered in 1 ml of phosphate buffered saline, were exposed to an instantaneous burst of UV radiation at 254 nm and 25 J/m² (Hoefer UVC-500) and allowed to recover in the incubator for 1 or 2 hours; fresh media was added for the recovery period. A separate control set was prepared in the same manner, but without UV radiation. For the label-free experiments, triplicate samples of HeLa and MCF-10A cells were prepared for each treatment.

2.2.3 Sample preparation for label-free analysis

To harvest the cells, the cell plates were washed twice with PBS buffer (3.2 mM Na2HPO4, 0.5 mM KH2PO4, 1.3 mM KCl, 135 mM NaCl, pH 7.4). The cells were scraped into micro-centrifuge tubes in PBS buffer, and the mixtures were centrifuged for 15 s at ~14,000 rpm. The supernatants were removed and the cell pellets were resuspended in lysis buffer (50 mM Tris-HCl buffer pH 8.0, 150 mM NaCl, 0.5% NP40, 1 mM EDTA, 5% glycerol) with the addition of protease inhibitors. Five volumes of lysis buffer were added to one volume of cell pellet, and the solution was incubated for 15-30 min on ice. The supernatants were then obtained by centrifuging in a cold room at 14,000 rpm for 30 min. The protein concentrations were measured by Bradford assay. For each sample, approximately 100 µg of protein was precipitated from the whole cell lysate by the addition of trichloroacetic acid (TCA) to 10% of the total volume and incubation overnight. The detailed TCA precipitation procedure is as follows.

One volume of trichloroacetic acid (100% w/v TCA) was added to nine volumes of protein lysate for each sample, followed by sodium deoxycholate (SDC, 100% w/v) carrier solution at 0.02% of the total volume. The samples were incubated on ice for two hours. The sample tubes were centrifuged in a micro-centrifuge at 14,000 rpm for 30 min in a cold room. The supernatants were removed, and the pellets were gently washed with 400 µl of cold acetone and incubated on ice for 10 min. The samples were then centrifuged again at 14,000 rpm for 10 min. After the supernatants were removed, the pellets were dried by leaving the tubes open for 10 min to evaporate the residual acetone.
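The mixing ratio above can be checked with a short calculation. The sketch below assumes simple volume additivity; the function names and the 90 µl lysate volume are illustrative and are not taken from the protocol.

```python
def final_percent(stock_percent, stock_volume, sample_volume):
    """Final concentration (%) after adding a stock to a sample,
    assuming the volumes are simply additive."""
    return stock_percent * stock_volume / (stock_volume + sample_volume)

def tca_volume_for(lysate_volume_ul, volumes_of_lysate_per_tca=9):
    """Volume of 100% w/v TCA to add for the 1:9 ratio used in the text."""
    return lysate_volume_ul / volumes_of_lysate_per_tca

# One volume of 100% TCA plus nine volumes of lysate gives 10% TCA.
print(final_percent(100.0, 1.0, 9.0))
# Hypothetical example: 90 µl of lysate needs 10 µl of 100% TCA.
print(tca_volume_for(90.0))
```

This agrees with the statement that TCA is added to 10% of the total volume.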

The pellets were suspended in 50 mM NH4HCO3 (pH 7.9) with 0.5% Rapigest™ and then sonicated for one hour. The disulfide bonds were reduced with dithiothreitol (DTT) at a final concentration of 5 mM, with incubation for 30 min at 60°C. The solution was cooled, and iodoacetamide was added to a final concentration of 15 mM to alkylate the sulfhydryl groups. The samples were incubated for 30 min in the dark, and the proteins were then digested twice for 8 hours at 37°C, with trypsin added at a 20:1 ratio each time. To decompose Rapigest™, formic acid was added to a final concentration of 2% (pH ~2.0), and the samples were incubated at 37°C for two hours. The samples were then centrifuged at 14,000 rpm for 30 min to remove the water-immiscible degradation product. The aqueous fractions were concentrated with a SpeedVac concentrator (Savant SPD1010, Thermo Scientific) and then resuspended in water.

Before the LC-MS/MS analysis, the samples were cleaned with ZipTip C18 (Millipore, Billerica, MA) using the procedure described in Appendix A. The eluents were then concentrated with a SpeedVac concentrator and resuspended in HPLC water to reach a final concentration of approximately 1 g/l. The protein concentrations were measured again with Bradford assay to optimize the sample load for LC-MS/MS analysis. The experimental design for the label-free approach is demonstrated in Figure 2.4.


Figure 2.4 Experiment design of label-free spectral counting quantification experiments to study cellular protein abundance changes under UV treatment.


2.2.4 Sample preparation of TMT isotope-labeled peptides

The cell lysates of the HeLa cells and MCF-10A cells were obtained and TCA precipitated with the procedure as described for the label-free approach in 2.2.3.

Approximately 100 µg of protein per condition were obtained and their concentration was determined by . The cell pellets were resuspended in 100 µl of

100 mM triethylammonium bicarbonate (TEAB) with 0.5% Rapigest™ and sonicated for one hour. Tris-(2-carboxyethyl)phosphine (TCEP) was added to each sample to a final concentration of 10 mM to reduce the disulfide bonds, with incubation at 60 °C for one hour. Iodoacetamide was then added to a final concentration of 18 mM, and the samples were left in the dark for 30 min. The proteins were digested with trypsin at a ratio of 20:1 at 37°C for eight hours, and then the same amount of trypsin was added again for another eight hours of incubation at 37°C.

The labeling process was performed after the digested samples had cooled to room temperature. Acetonitrile (~41 µl) was added to each tube containing 0.8 mg of labeling reagent, and the reagent was allowed to dissolve for five min with occasional vortexing and brief spinning. Then 41 µl of TMT label reagent was pipetted into each sample, followed by 1 h of incubation at room temperature. To terminate the labeling process, 8 µl of 5% hydroxylamine was added to each sample, followed by incubation for 15 min.

Formic acid was added to each sample to reach a final concentration of 6% in order to precipitate out Rapigest™, with incubation at 37°C for four hours. After centrifugation at 14,000 rpm for 30 min, the supernatants were collected and combined.

The experimental design is shown in Figure 2.5. The TMT1 sample was prepared by mixing two identical MCF-10A cell samples of control, 1h recovered, and 2h recovered. The TMT3 sample was prepared by mixing two identical HeLa cell samples of control, 1h recovered, and 2h recovered. The TMT2 sample was prepared by mixing one HeLa cell sample and one MCF-10A cell sample of control, 1h recovered, and 2h recovered.

The samples were cleaned with a cation exchange column (SCX, Applied Biosystems Inc., Carlsbad, CA) using the following procedure. The SCX column was cleaned by injecting 1 ml of cleaning buffer (detailed composition unknown, purchased from Applied Biosystems Inc.), and the column was then washed with 2 ml of load buffer (detailed composition unknown, purchased from Applied Biosystems Inc.). The sample was slowly injected with a 500 µl syringe (Hamilton, Reno, NV) at a rate of 1 drop per second. Then 1 ml of load buffer was added to wash the column, and the sample was eluted with 500 µl of elute buffer (detailed composition unknown, purchased from Applied Biosystems Inc.). The eluent was collected in a clean vial. The samples were dried with a SpeedVac concentrator.


Since a significant amount of salt was observed in the samples, desalting steps were performed with PepClean™ C-18 spin columns (Pierce, Rockford, IL) as follows:

To prepare the spin columns, 200 µl of activation solution (50% ACN in dd H2O) was added to each column, and spun at 1700 rpm for one min to let the solution flow through. The above procedure was repeated one more time. Then 200 µl equilibration solution (0.5% TFA in 5% ACN) was added and spun at 1700 rpm for one min. This step was repeated once more.

To bind the samples, 25 µl of sample was mixed with 8.3 µl of sample buffer (2% TFA in 20% ACN), added to the column, and spun at 1700 rpm for one min. The flow-through was then recovered. The above steps were repeated four times. Then 200 µl of wash solution (0.5% TFA in 5% ACN) was added and spun at 1700 rpm for one min. The washing step was repeated three times.

To elute the samples, 20 µl of elution buffer (70% ACN in dd H2O) was added to the column, which was spun at 1700 rpm for one min. The eluent was then collected. This procedure was repeated four times to collect all of the eluents in a vial. Afterward, the samples were dried in a SpeedVac and resuspended in 0.1% formic acid at approximately 1 µg/µl. The protein concentrations were estimated using Bradford assay prior to LC-MS/MS analysis. The experimental design for the TMT labeling approach is demonstrated in Figure 2.5.


Figure 2.5 Experimental design of tandem mass tag labeling quantification for cellular protein abundance changes under UV treatment.


2.2.5 LC-MS/MS analysis

The global protein digests were separated by a capillary HPLC instrument (UltiMate® 3000 HPLC, Dionex, Sunnyvale, CA) with a reversed-phase C18 column (Michrom Magic C18AQ, 200 µm ID, 15 cm, 3 µm, 200 Å) at a flow rate of 2 µL/min. The peptides were then detected on a high resolution Orbitrap tandem mass spectrometer (Thermo Finnigan, San Jose, CA) interfaced with a Michrom Advance ESI source (Auburn, CA).

For label-free HeLa cell samples, an optimized 200 min gradient of mobile phase A (0.1% formic acid in water) and mobile phase B (0.1% formic acid in acetonitrile) was used. Mobile phase B increased linearly from 2% to 10% in 5 min, then to 32% in 175 min, to 60% in 10 min, and to 90% in 10 min, was held at 90% for 10 min, and decreased to 2% in 5 min. The column was washed for half an hour with a 20 min wash gradient, followed by a 40 min double blank gradient between each run. The injection volume was optimized based on the concentration; approximately 2 µg of peptides were injected for each analysis. Each HeLa cell sample was analyzed at least three times.

For label-free MCF-10A cell samples, an optimized 300 min gradient was used, for which mobile phase B increased linearly from 2% to 10% in 5 min, then to 32% in 275 min, to 60% in 10 min, and to 90% in 10 min, was held at 90% for 10 min, and decreased to 2% in 5 min. The other separation conditions were the same as the ones for label-free HeLa cell samples. Each MCF-10A cell sample was analyzed twice with LC-MS/MS.

For TMT samples, a long gradient of 480 min was applied, for which mobile phase B increased linearly from 2% to 10% in 5 min, then to 32% in 455 min and to 60% in 10 min, to 90% in 10 min, held at 90% for 10 min, and decreased to 2% in 5 min. The other separation conditions were the same as the ones for label-free samples. Each TMT sample was analyzed three times.
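The three gradient programs share one shape and differ only in the length of the 10% to 32% B segment. As a minimal sketch (the function names are illustrative, and linear interpolation between the stated breakpoints is assumed), the programs can be written as breakpoint tables and evaluated at any time point:

```python
from bisect import bisect_right

def make_gradient(long_ramp_min):
    """Breakpoints (time in min, %B) for the gradient shape described in the text.

    long_ramp_min is the length of the 10% -> 32% B segment:
    175 (HeLa label-free), 275 (MCF-10A label-free) or 455 (TMT).
    """
    segments = [            # (duration in min, %B at the end of the segment)
        (5, 10),
        (long_ramp_min, 32),
        (10, 60),
        (10, 90),
        (10, 90),           # hold at 90% B
        (5, 2),             # return to starting conditions
    ]
    points = [(0.0, 2.0)]
    for duration, target in segments:
        points.append((points[-1][0] + duration, float(target)))
    return points

def percent_b(points, t):
    """Linearly interpolate %B at time t (min); clamp outside the program."""
    times = [p[0] for p in points]
    if t <= times[0]:
        return points[0][1]
    if t >= times[-1]:
        return points[-1][1]
    i = bisect_right(times, t)
    (t0, b0), (t1, b1) = points[i - 1], points[i]
    return b0 + (b1 - b0) * (t - t0) / (t1 - t0)

def time_to_reach(points, level):
    """First breakpoint time at which %B is at or above level."""
    for t, b in points:
        if b >= level:
            return t
    return None

for name, ramp in [("HeLa", 175), ("MCF-10A", 275), ("TMT", 455)]:
    pts = make_gradient(ramp)
    print(name, time_to_reach(pts, 90), pts[-1][0])
```

Under this reading, 90% B is reached at 200, 300 and 480 min, which matches the nominal gradient lengths quoted above; the hold and the return to 2% B add a further 15 min to each program.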

The mass spectrometer for the label-free method was set up with Collision Induced Dissociation (CID). Detection was initiated with a full mass scan in the range of 200-2000 Da, followed by ten data-dependent tandem mass scans for the ten most intense precursor ions. The electrospray voltage was maintained at 2 kV and the capillary temperature was set at 175°C. For the mass spectrometric detection of TMT samples, the ions were fragmented by Pulsed-Q Dissociation (PQD). The relative ion intensities for each mass tag were later compared for quantitative analysis. The other settings were the same as the ones for the label-free samples.
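The "ten most intense precursor ions" rule can be illustrated with a minimal sketch. It only ranks peaks of one full scan by intensity; instrument-side details that are not described in the text (for example dynamic exclusion or charge-state filtering) are deliberately left out, and the function name and peak list are hypothetical.

```python
def select_precursors(full_scan, top_n=10):
    """Pick the top_n most intense peaks from a full scan.

    full_scan is a list of (m/z, intensity) tuples; the result is
    ordered from most to least intense.
    """
    ranked = sorted(full_scan, key=lambda peak: peak[1], reverse=True)
    return ranked[:top_n]

# Hypothetical full scan with four peaks; select the two most intense.
scan = [(445.12, 1.0e5), (652.33, 8.0e6), (785.84, 3.5e6), (1021.50, 2.0e4)]
print(select_precursors(scan, top_n=2))
```

Peaks that fall outside the top ten in every scan are never fragmented, which is why less intense co-eluting peptides can go unidentified.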

2.2.6 Database search

The generated product ions were analyzed by the in-house developed search algorithm, MassMatrix. The search parameter profile was set up according to the suggested values for the LTQ-Orbitrap mass spectrometer. The data obtained from the LC-MS/MS analysis were in the RAW format. Database IPI HUMAN v3.65 was selected, and a randomized decoy database was appended to the search in order to determine the false positive rate. The enzyme option was left at the default setting, which is “Trypsin no P rule: R, K-X”. The precursor ion tolerance was set at 10 ppm and the product ion tolerance at 0.8 Da, which are considered to be compatible parameters for data from the LTQ-Orbitrap mass spectrometer. The maximum number of missed cleavages was set at 3, and the monoisotopic option was selected for mass type. The minimum and maximum peptide lengths were set at 6 AA and 40 AA. The minimum pp score was set at 5.0 and the minimum pptag score was set at 1.3. CID was chosen as the fragmentation method for label-free data, while for TMT data, quantification with TMT 6-plex and robust linear regression for quantification statistics were selected.
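To make the two tolerance settings concrete, the sketch below shows how a relative (ppm) precursor tolerance and an absolute (Da) product ion tolerance are typically evaluated. It is not MassMatrix code; the function names and the example masses are assumptions for illustration.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def precursor_matches(observed_mz, theoretical_mz, tol_ppm=10.0):
    """Relative tolerance, as used for the precursor ion (10 ppm)."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm

def product_matches(observed_mz, theoretical_mz, tol_da=0.8):
    """Absolute tolerance, as used for the product ions (0.8 Da)."""
    return abs(observed_mz - theoretical_mz) <= tol_da

# Hypothetical precursor at m/z 1000: 0.008 off is about 8 ppm, 0.012 about 12 ppm.
print(precursor_matches(1000.008, 1000.0))
print(precursor_matches(1000.012, 1000.0))
# Hypothetical product ion: 0.5 Da off is within the 0.8 Da window.
print(product_matches(500.5, 500.0))
```

At m/z 1000, a 10 ppm window corresponds to ±0.01, far narrower than the 0.8 Da product ion window, which reflects the different mass accuracy of the Orbitrap full scan and the ion trap MS/MS scans.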

The datasets of technical replicates for the same biological sample were searched both individually and collectively. After the search, the information for the non-homologous proteins (only the highest-scoring protein of each protein family was reported) and the homologous proteins (all of the proteins within the protein family matching the same peptide sequences were reported) was extracted to spreadsheets. The protein information was aligned across different datasets for comparison. A series of statistical analyses was performed to quantify the proteins identified in the study.


2.3 Results and discussion

2.3.1 Separation gradient optimization

Due to the limited duty cycle and dynamic range of mass spectrometers, only the highest intensity ions of very complex samples are selected by the mass spectrometer and fragmented in data-dependent acquisition, which reduces the probability of identifying the less intense co-eluting peptide ions [58]. Therefore, it is critical to develop a proper separation with a large peak capacity to resolve the co-eluting peptides. By injecting 1.5 µg of HeLa cell global protein digest into the LC-MS/MS with different gradient lengths, an approximately linear relationship was observed between the gradient length and the number of proteins identified at 95% confidence with the homologous search summary, as shown in Figure 2.6.

[Plot: number of proteins identified at 95% confidence (y-axis, 0 to 1400) versus gradient length in min (x-axis, 0 to 350)]

Figure 2.6 The relationship of the number of homologous proteins identified at 95% confidence and gradient length for the analysis of HeLa global digests with LC-MS/MS.


Figure 2.7 illustrates a gradient with a baseline chromatography of HeLa cell global digests and the peptide peaks spanning over the gradient range, indicating that all of the gradient space was efficiently used.

Figure 2.7 Baseline chromatography of HeLa cell global digests with an illustration of a gradient


2.3.2 Evaluation of the reproducibility of the chromatography

Reproducibility is very important to the quality of mass spectrometry-based quantification [49]. For highly complex global protein digest samples, column degradation from the buildup of heavy peptide residues or other sample waste over time can cause retention time shifts from run to run. In our study, it was therefore critical to run multiple washing gradients and blanks between analyses to recover the column; otherwise, there would be a significant increase in column pressure.

Instrument availability was another uncontrollable factor, which prevented the label-free samples from being analyzed in the same batch and inevitably introduced variability in the instrument conditions, such as the HPLC tubing connections, mass spectrometer tune conditions, or ESI spray conditions.

In this study, the baseline chromatography of the label-free HeLa cell samples, MCF-10A cell samples, and TMT samples is demonstrated in Figure 2.8 ~ Figure 2.11, Figure 2.12 ~ Figure 2.15, and Figure 2.16 ~ Figure 2.19, respectively. Some of the figures show excellent reproducibility for the repeated analysis of the same sample, such as Figure 2.9, Figure 2.12, Figure 2.16 and Figure 2.17. In Figure 2.8, Figure 2.13 and Figure 2.18, slight retention time shifts and subtle differences in the abundance of some peak ions can be observed. Figure 2.14 shows good agreement of the baseline chromatography of the different biological MCF-10A cell samples for the same treatment, while Figure 2.10 shows differences in the chromatography of the different biological HeLa cell samples for the same treatment. For the chromatography of the samples with different treatments in Figure 2.11, Figure 2.15 and Figure 2.19, obvious differences can be observed for all of the label-free samples and TMT samples. It seems that the between-treatment variability is larger than the between-sample variability and the between-run variability.

The reproducibility of chromatography was also evaluated with the Chaorder software [59], a tool producing a visual representation of the similarities in a set of experiments using principal component analysis. In a three-dimensional graph of the Chaorder analysis output, the distance between two points indicates the difference between the two experiments in terms of the chromatography retention time shift, peak ion abundance, and other factors.
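The internals of Chaorder are not described here, so the following is not its algorithm and is not a principal component analysis. As a much simpler stand-in for the same idea (one number that summarizes how similar two runs are), this sketch computes the Pearson correlation between two intensity traces sampled on a common retention time grid. The function name and the toy traces are assumptions for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally sampled intensity traces."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("traces must have the same length (at least 2 points)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant trace has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

# Toy traces: run_b repeats run_a at twice the intensity,
# run_c is the same peak shifted by one time bin.
run_a = [0, 1, 5, 1, 0, 0]
run_b = [0, 2, 10, 2, 0, 0]
run_c = [0, 0, 1, 5, 1, 0]
print(pearson(run_a, run_b))  # same shape, so the correlation is close to 1
print(pearson(run_a, run_c))  # shifted peak, so the correlation is much lower
```

A retention time shift lowers the correlation even when the peak shape is unchanged, which is the kind of run-to-run difference that shows up as distance between points in the Chaorder plots.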

The outputs of the principal component analysis by Chaorder were graphed as three-dimensional xyz scatter plots for easier visualization, as shown in Figure 2.20 ~ Figure 2.24. For the HeLa cell label-free PCA shown in Figure 2.20a, the points, each representing one analysis and colored by treatment, are randomly mixed with each other. In other words, the data do not cluster by treatment in the PCA of the chromatography. In Figure 2.20b, the data points colored by batch are clearly grouped with each other. Batch 1 data were obtained in early November 2008, and Batch 2 and Batch 3 data were obtained consecutively from late December 2008 to the beginning of January 2009. In Figure 2.20b, the points representing Batch 2 and Batch 3 are closer to each other and far away from the Batch 1 data points. This could be due to the longer time interval between Batch 1 and Batch 2, which introduced more variability in the instrument conditions. In Figure 2.20, all of the points fall within a range of 0.8 on the axis scales, except for three data points of the controls, and most points fall within a range of 0.5. According to the Chaorder analysis, this indicates a bias in the experimental setup for the technical replicates. Nevertheless, the analysis indicated good overall reproducibility despite the clustering by batch.

Figure 2.21 displays the PCA results for the HeLa label-free samples of each treatment in individual 3D scatter plots, which did not show any grouping of the technical replicates of the same sample. This indicates that all of the samples in each plot look similar and that there is no significant difference from one biological sample to another within the same treatment.

Figure 2.22 and Figure 2.23 demonstrate the PCA analysis of the chromatography for the MCF-10A label-free samples. In Figure 2.22a, all of the data points fall within a range of 0.5, without grouping by treatment. This indicates good reproducibility and similarity of the chromatography. Clusters were also observed for the different batches in Figure 2.22b, indicating a bias arising from the change of instrument conditions, which agrees with Figure 2.20b. Figure 2.23 also agrees with Figure 2.21, indicating no difference between the biological replicate samples receiving the same treatment.

Figure 2.24 shows the PCA analysis of the three TMT samples with triplicate analyses. All of the data points fall in the range of 0.3, showing even better reproducibility, which could be due to the fact that the analyses were actually conducted in a consecutive long batch that minimized the instrument bias. However, if we arbitrarily set the first half of the batch as Batch 1 and the remainder as Batch 2, the grouping could still be observed in terms of batches in Figure 2.24b. This agrees with the fact that the conditions of the instruments changed over time. Figure 2.24a shows that the data points of the same sample are closer to each other. This grouping behavior can be seen only when there is excellent reproducibility.


Figure 2.8 The comparison of baseline chromatography of HeLa cell label-free experiments for the technical replicate runs of the same control sample.

Figure 2.9 The comparison of baseline chromatography of HeLa cell label-free experiments for the technical replicate runs of the same 1h recovered sample.


Figure 2.10 The comparison of baseline chromatography of HeLa cell label-free experiments for an analysis of different biological control samples.

Figure 2.11 The comparison of baseline chromatography of HeLa cell label-free experiments for an analysis of one sample from different treatments.


Figure 2.12 The comparison of baseline chromatography of MCF-10A cell label-free experiments for the technical replicate runs of the same control sample.

Figure 2.13 The comparison of baseline chromatography of MCF-10A cell label-free experiments for the technical replicate runs of the same 2h recovered sample.


Figure 2.14 The comparison of baseline chromatography of MCF-10A cell label-free experiments for an analysis of different biological 1h recovered samples.

Figure 2.15 The comparison of baseline chromatography of MCF-10A cell label-free experiments for an analysis of one sample from different treatments.


Figure 2.16 The comparison of baseline chromatography of TMT experiments for technical replicate runs of the same sample TMT1.

Figure 2.17 The comparison of baseline chromatography of TMT experiments for technical replicate runs of the same sample TMT2.


Figure 2.18 The comparison of baseline chromatography of TMT experiments for technical replicate runs of the same sample TMT3.

Figure 2.19 The comparison of baseline chromatography of TMT experiments for an analysis of different samples.


Figure 2.20 Principal component analysis of all of the LC-MS data of HeLa label-free samples by Chaorder: (a) data points colored in terms of the treatments; (b) data points colored in terms of the analysis batches on different days.


Figure 2.21 Principal component analyses of the LC-MS data of HeLa label-free samples for different treatments: (a) HeLa control samples, Red-HeLa ctrl1, Blue-HeLa ctrl2, Black-HeLa ctrl3; (b) HeLa 1h recovered samples, Red-HeLa 1h1, Blue-HeLa 1h2, Black-HeLa 1h3; and (c) HeLa 2h recovered samples, Red-HeLa 2h1, Blue-HeLa 2h2, Black-HeLa 2h3.


Figure 2.22 Principal component analyses of all of the LC-MS data of MCF-10A label-free samples by Chaorder: (a) data points colored in terms of the treatments; (b) data points colored in terms of the analysis batch on different days.


Figure 2.23 Principal component analyses of the LC-MS data of MCF label-free samples for different treatments: (a) MCF control samples, Red-MCF ctrl1, Blue-MCF ctrl2, Black-MCF ctrl3; (b) MCF 1h recovered samples, Red-MCF 1h1, Blue-MCF 1h2, Black-MCF 1h3; and (c) MCF 2h recovered samples, Red-MCF 2h1, Blue-MCF 2h2, Black-MCF 2h3.


Figure 2.24 Principal component analyses of all of the LC-MS data of TMT samples by Chaorder: (a) data points colored in terms of different TMT samples; and (b) data points colored in terms of the analysis batch on different days.


2.3.3 Comparison analysis of the number of identified proteins

Comparison analysis of the number of identified proteins was performed on all of the outputs of the database search algorithm MassMatrix. The protein IDs were cut off at the 95% confidence level by checking the false positives. The results are illustrated in Figure 2.25 ~ Figure 2.36. The comparisons of the number of identified homologous proteins (at the 95% confidence level with decoy database searching) for each sample with technical replicates are demonstrated in Figure 2.25 (for the HeLa cell label-free samples), Figure 2.28 (for the MCF-10A cell label-free samples) and Figure 2.31 (for the TMT samples). The average number of identified homologous proteins of each technical analysis, the total proteins identified, and the overlapped proteins identified for each biological sample with multiple analyses are summarized for all of the label-free samples and TMT samples in Figure 2.26, Figure 2.29 and Figure 2.32.
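The exact way MassMatrix converts decoy hits into a confidence level is not detailed in this section. As a generic illustration only, the sketch below shows the common target-decoy estimate, in which the false discovery rate at a score threshold is the number of decoy hits divided by the number of target hits above that threshold. Treating the 95% confidence level as a 5% false discovery rate, along with the function names and the toy hit list, is an assumption.

```python
def fdr_at_threshold(hits, threshold):
    """Estimate FDR as decoys / targets among hits scoring at or above threshold.

    hits is a list of (score, is_decoy) tuples.
    """
    targets = sum(1 for score, is_decoy in hits if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in hits if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0

def lowest_threshold_for_fdr(hits, max_fdr=0.05):
    """Lowest observed score whose estimated FDR does not exceed max_fdr."""
    best = None
    for score in sorted({s for s, _ in hits}, reverse=True):
        if fdr_at_threshold(hits, score) <= max_fdr:
            best = score
    return best

# Toy example: one decoy hit scoring 3.0 among three target hits.
hits = [(5.0, False), (4.0, False), (3.0, True), (2.0, False)]
print(fdr_at_threshold(hits, 3.0))
print(lowest_threshold_for_fdr(hits, max_fdr=0.05))
```

In this toy list, including the decoy at score 3.0 raises the estimate above 5%, so the cutoff stays at the next higher score.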

For the HeLa label-free samples shown in Figure 2.25 and Figure 2.26, the total proteins identified for each biological sample fall in the range of 1,983 to 4,401, and the overlapped protein IDs for each biological sample across the technical replicates range from 543 to 1,062. The relatively large variation is due to the inclusion of protein isoforms, which amplified the magnitude of the variation, since the change of one protein hit could correspond to the change of multiple protein isoforms.


The average number of protein IDs across the multiple analyses of each biological sample is in the range of 1,270 to 2,352, with a standard deviation of 6% to 26% within each sample. The overlapped protein IDs within each biological sample account for 17% to 32% of the total protein IDs for that sample, and for 34% to 50% of the average protein IDs of its technical replicates.
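The quantities compared in this section (total, overlapped and average protein IDs, and the two overlap percentages) follow directly from set operations on the protein ID lists of the technical replicates. A minimal sketch, with hypothetical protein identifiers and an illustrative function name:

```python
def replicate_summary(replicates):
    """Summarize protein IDs across the technical replicates of one sample.

    replicates is a non-empty list of collections of protein IDs, one per run.
    """
    runs = [set(r) for r in replicates]
    if not runs:
        raise ValueError("at least one replicate is required")
    total = set().union(*runs)            # identified in any run
    overlapped = set.intersection(*runs)  # identified in every run
    average = sum(len(r) for r in runs) / len(runs)
    return {
        "total": len(total),
        "overlapped": len(overlapped),
        "average": average,
        "overlap_vs_total_pct": 100.0 * len(overlapped) / len(total),
        "overlap_vs_average_pct": 100.0 * len(overlapped) / average,
    }

# Hypothetical protein IDs from three technical replicates.
runs = [{"P1", "P2", "P3"}, {"P2", "P3", "P4"}, {"P3", "P5"}]
print(replicate_summary(runs))
```

Because the overlap is an intersection over all runs, it can only stay the same or shrink as replicates are added, while the total can only grow; this is one reason samples with fewer technical replicates show higher overlap percentages.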

For the MCF-10A label-free samples, in Figure 2.28 and Figure 2.29, the total protein IDs for each biological sample fall in the range of 2,578 to 4,352, which is at the same level as the HeLa samples, and the overlapped protein IDs for each biological sample across the technical replicates range from 1,387 to 1,979.

Comparing the results of the MCF-10A samples to the HeLa samples, the average protein IDs of each technical analysis for the MCF-10A label-free samples (2,301~2,850) are higher than those of the HeLa samples (1,270~2,352). This may be due to the fact that a longer gradient was used for the separation of the MCF-10A label-free samples. The percentages of overlapped protein IDs in the total protein IDs for each biological sample for the MCF-10A label-free samples (34%~69%) are higher than the percentages for the HeLa samples. The percentages of the overlapped protein IDs with respect to the average protein IDs of technical replicates for each MCF-10A label-free sample (55% to 71%) are also higher than the percentages for the HeLa samples. This is due to the good reproducibility and the lower number of technical replicates (two or three technical replicates) for the MCF-10A label-free samples.

Figure 2.27 and Figure 2.30 show the variations among the biological samples within the same treatment. In Figure 2.27, the average protein IDs (1,464~1,941) in each HeLa label-free technical analysis within each treatment have a standard deviation of around 21%~26%. The average total protein IDs for each biological sample within each treatment are 3,834 for the control samples, 2,534 for the 1h recovered samples and 3,104 for the 2h recovered samples, while the average overlapped protein IDs for each biological sample within each treatment are 707, 610 and 856, respectively. The 1h recovered samples showed the lowest values among the treatments in both the average protein IDs in each analysis and the total protein IDs for each biological sample. This may be caused by the inhibition of protein synthesis by the UV light combined with a shorter recovery time than in the 2h recovery treatment.

For the MCF-10A label-free samples in Figure 2.30, the average protein IDs for each technical analysis are about the same, around 2,443 to 2,658, with a standard deviation of around 6.8% to 14% within each treatment. The total identified proteins of the 1h recovery samples (3,193) are slightly lower than those of the controls (3,719) and the 2h recovered samples (3,335). Similar behavior was observed for the average overlapped protein IDs.

For HeLa and MCF-10A TMT samples shown in Figure 2.31 and Figure 2.32, the total protein IDs for each biological sample is in the range of 1,889 to 2,390, and the overlapped protein IDs for each biological sample across the technical replicates range from 1,383 to 1,635.

The average of the protein IDs of multiple analyses within each TMT sample is in the range of 1,625 to 2,092, with a standard deviation of 5.5% to 6.7% within each sample. The relatively small deviation may be due to the excellent reproducibility resulting from a single-batch analysis. Even with a very long gradient (480 min), fewer proteins were identified in each TMT technical run than in the label-free analyses, which may be due to sample loss in the complex sample handling process of the TMT approach.

The percentage of the overlapped protein IDs within each TMT sample to the total protein IDs for each biological sample ranges from 62% to 73%. The percentage of the overlapped protein IDs within each TMT sample over the average protein IDs of technical replicates for each biological sample ranges from 78% to 85%. This high percentage may be due to the good reproducibility.


Figure 2.25 Comparison of protein IDs for each HeLa label-free sample with technical replicates.


Figure 2.26 Comparison of the average protein IDs in each technical analysis, total protein IDs for each biological sample, and the overlapped protein IDs for each biological sample for HeLa label-free samples.


Figure 2.27 Comparison of the average protein IDs in each technical analysis within each treatment, average of total protein IDs for each biological sample, and the overlapped protein IDs for each biological sample within each treatment for HeLa label-free samples.


Figure 2.28 Comparison of protein IDs for each MCF-10A label-free sample with technical replicates.


Figure 2.29 Comparison of the average protein IDs in each technical analysis, total protein IDs for each biological sample, and the overlapped protein IDs for each biological sample for the MCF-10A label-free samples.


Figure 2.30 Comparison of the average protein IDs in each technical analysis within each treatment, average of total protein IDs for each biological sample and the overlapped protein IDs for each biological sample within each treatment for MCF-10A label-free samples.


Figure 2.31 Comparison of protein IDs for each TMT sample with technical replicates.

Figure 2.32 Comparison of the average protein IDs in each technical analysis, total protein IDs, and the overlapped protein IDs of each sample for the TMT samples.


In the database search step, the peptide hits from all technical replicates of one biological sample were assembled collectively as one dataset and searched against the target database. This increased the sensitivity of the experiments and allowed more low-abundance proteins with fewer peptide hits to be identified. The collectively searched outputs could be used to represent each biological sample; they are compared by Venn diagrams in Figure 2.33 ~ Figure 2.34 and summarized as a bar plot in Figure 2.35.

For the HeLa label-free samples, Figure 2.33 and Figure 2.35 demonstrate that the protein IDs at the 95% confidence level for each biological sample range from 1,864 to 3,932, with a standard deviation of 13.5% to 35% within each treatment. The average protein IDs in the same treatment range from 2,242 to 3,261, and the total ID counts for each treatment are 4,408, 3,266, and 4,315 for the control, 1h recovered and 2h recovered samples, respectively. The overlapped protein counts for each treatment are 2,200, 1,377, and 1,682, respectively.

For the MCF-10A label-free samples, Figure 2.34 and Figure 2.35 demonstrate that the protein counts at the 95% confidence level for each biological sample range from 2,771 to 3,866, with a standard deviation of 3.6% to 15.6% within each treatment. The average protein IDs in the same treatment range from 2,987 to 3,306, and the total protein IDs for each treatment are 4,665, 4,581, and 4,128 for the control, 1h recovered and 2h recovered samples, respectively. The overlapped protein counts for each treatment are 2,113, 2,131, and 1,980, respectively.

From Figure 2.35, we can also tell that the deviations of the collectively searched datasets for the MCF-10A label-free samples are smaller than the ones for the HeLa label-free samples, indicating better reproducibility for the analysis of the MCF-10A cell samples. These collectively searched datasets were used for quantification in a later step.

Figure 2.36 shows a comparison of the total protein counts of each treatment for the collectively searched HeLa and MCF-10A label-free data in Venn diagrams. The total protein IDs are 5,352 for the HeLa label-free experiments and 5,910 for the MCF-10A label-free experiments. The overlapped protein IDs among the treatments are 2,803 for the HeLa label-free experiments and 3,210 for the MCF-10A label-free experiments.

In Figure 2.37, three different TMT samples in total identified 3,161 proteins from the collectively searched datasets with 1,416 overlapped proteins in different samples.

Figure 2.38 shows that, with 3,405 overlapped proteins, the MCF-10A label-free experiments identified 500 more proteins than the HeLa label-free experiments, which was due to the longer gradient used for the MCF-10A label-free separation. For the TMT analysis, with the same gradient length and 1,460 overlapped proteins, the HeLa experiments identified 139 more proteins than the MCF-10A experiments.

From Figure 2.39, we can see that with 2,145 overlapped proteins, the label-free experiments for the HeLa samples identified 2,728 more proteins than the TMT experiments. For the MCF-10A samples, the label-free experiments identified 3,866 more proteins than the TMT method with 1,886 overlapped proteins.

Figure 2.40 shows that the number of identified proteins increases sharply with additional technical replicate analyses, from about 1,000 to around 1,400 after three analyses, and then increases gradually until it reaches a plateau at about 1,600 around the tenth analysis. This indicates that more than three replicate analyses of the same sample are preferable in the label-free experiments. The observation agrees with a 2007 literature report [43].
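The saturation behavior described for Figure 2.40 can be reproduced from per-run identification lists. The following is a minimal sketch, assuming each technical replicate is available as a set of protein IDs; the function name and the toy data are hypothetical and are not taken from the thesis workflow.

```python
from typing import Iterable, List, Set


def cumulative_protein_ids(runs: Iterable[Set[str]]) -> List[int]:
    """Return the number of distinct protein IDs accumulated after each run."""
    seen: Set[str] = set()
    totals: List[int] = []
    for ids in runs:
        seen |= ids              # add the IDs found in this technical replicate
        totals.append(len(seen))
    return totals


# Hypothetical toy data: one set of protein IDs per technical replicate.
runs = [{"P1", "P2", "P3"}, {"P2", "P3", "P4"}, {"P1", "P4", "P5"}, {"P2", "P5"}]
print(cumulative_protein_ids(runs))  # [3, 4, 5, 5]
```

A curve that flattens, as in the last two values here, suggests that further replicates add few new identifications.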

Figure 2.33 Comparison of protein IDs of a collectively searched dataset for HeLa label-free biological replicates receiving the same treatment.

Figure 2.34 Comparison of protein IDs of a collectively searched dataset for MCF label-free biological replicates receiving the same treatment.

Figure 2.35 Comparison of the average protein IDs of the HeLa and MCF label-free biological samples for each treatment, total protein IDs, and the overlapped protein IDs for each treatment.

Figure 2.36 Comparison of protein IDs for the HeLa and MCF label-free samples with different treatments.

Figure 2.37 Comparison of protein IDs of collectively searched dataset for different TMT samples.

Figure 2.38 Comparison of the total protein IDs between the HeLa and MCF-10A samples with the label-free approach and the TMT approach.

Figure 2.39 Comparison of the total protein IDs between the label-free approach and the TMT approach for the HeLa and MCF-10A samples.

Figure 2.40 The number of proteins identified vs. technical replicates of the control samples for HeLa cells.

2.3.4 Data Reduction

All of the collectively searched datasets are aligned according to the protein IPI number on a spreadsheet for easy comparison across the datasets. Raw SCs are normalized across experiments to correct for differences in total SC. Previous research pointed out that even under the most carefully controlled conditions, the variation of the total SC between samples can be very large [60]. Standard deviations of 50% or more between biological samples were observed in a previous study [43]. Figure 2.41 shows that the standard deviation of the spectral counts across biological replicates, normalized by the number of technical replicates, is 26.23% for the HeLa cell label-free experiments and 28.44% for the MCF-10A cell label-free experiments.
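The percentage figures above can be read as a relative standard deviation of the total spectral counts per technical replicate. The sketch below illustrates that reading; it assumes the sample standard deviation divided by the mean, which the text does not state explicitly, and the function name and inputs are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def relative_sd_percent(total_sc: Sequence[float], n_tech_reps: Sequence[int]) -> float:
    """Relative standard deviation (%) of total SC per technical replicate.

    total_sc[i] is the summed spectral count of biological sample i and
    n_tech_reps[i] is the number of technical replicates pooled for it.
    """
    per_replicate = [sc / n for sc, n in zip(total_sc, n_tech_reps)]
    return 100.0 * stdev(per_replicate) / mean(per_replicate)


# Hypothetical totals for three biological samples, three technical replicates each.
print(relative_sd_percent([3, 6, 9], [3, 3, 3]))  # 50.0
```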


Figure 2.41 Number of spectral counts normalized by number of technical replicates for label-free experiments.

Figure 2.42 and Figure 2.43 show the histograms of raw spectral counts of each biological sample for the label-free experiments. The distribution of the spectral counts is skewed toward low spectral counts. From the figures, we can see that the spectral counts for most identified proteins are below 40; identified proteins with spectral counts larger than 50 are quite rare.

Figure 2.44 shows histograms of the ion abundance of each biological sample in the TMT labeling experiments. The distribution of the TMT ion abundance is closer to a normal distribution than the spectral count distributions in Figure 2.42 and Figure 2.43. The range of ion abundance is between 0.1 and 0.3.

Figure 2.42 Histograms of raw spectral counts for each biological sample for HeLa cell label-free experiments.

Figure 2.43 Histograms of raw spectral counts for each biological sample for MCF-10A cell label-free experiments.

Figure 2.44 Histograms of ion abundance of each biological sample in TMT labeling experiments.

In Figure 2.45, the correlations of raw spectral counts for different biological samples within the same treatment for the HeLa cell label-free experiments fall between 0.889 and 0.967. In Figure 2.46, the corresponding correlations for the MCF-10A cell label-free experiments fall between 0.951 and 0.969. The points in the scatter plots of the log-transformed spectral counts of one biological sample versus another are mostly distributed symmetrically around the identity line. The only obvious deviation is for the third biological sample of the 2h treatment for the HeLa cell samples. This agrees with Figure 2.41, in which this sample has the highest number of spectral counts, significantly different from the other samples. Figure 2.47 and Figure 2.48 show a strong correlation of the raw spectral counts for the biological samples with different treatments for the HeLa cell and the MCF-10A cell label-free samples, which indicates that the global protein digests are similar in general, regardless of the treatment. Figure 2.49 demonstrates that the correlations of average raw spectral counts for the same treatment between the HeLa and MCF-10A cell label-free experiments are much weaker than those within the same cell line. In Figure 2.50 and Figure 2.51, the scatter plots of ion abundance for the biological samples with the same treatment for the HeLa cell and MCF-10A cell TMT labeling experiments tend to converge to one point, which indicates an obvious correlation.
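A correlation coefficient between two biological samples can be computed as sketched below. This is an illustration only: it assumes a Pearson coefficient on log2-transformed counts with a pseudo-count of 0.5 substituted for zeros, whereas the text does not specify whether the reported coefficients were computed on raw or log-transformed values. All names are hypothetical.

```python
import math
from typing import Sequence


def pearson_log_counts(x: Sequence[float], y: Sequence[float], pseudo: float = 0.5) -> float:
    """Pearson correlation of log2-transformed spectral counts.

    Zeros are replaced by `pseudo` before the logarithm (an assumption).
    """
    lx = [math.log2(v if v > 0 else pseudo) for v in x]
    ly = [math.log2(v if v > 0 else pseudo) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var_x = sum((a - mx) ** 2 for a in lx)
    var_y = sum((b - my) ** 2 for b in ly)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical counts for the same four proteins in two biological samples.
sample_1 = [1, 2, 4, 8]
sample_2 = [2, 4, 8, 16]   # exactly twice sample_1, so the log-log relation is linear
print(pearson_log_counts(sample_1, sample_2))
```

A constant twofold offset between samples, as in this toy input, shifts the points off the identity line without lowering the correlation, which is why the scatter plots and the coefficients are examined together.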

Figure 2.45 Correlations of raw spectral counts for different biological samples within the same treatment for the HeLa cell label-free experiments.

Figure 2.46 Correlations of raw spectral counts for different biological samples within the same treatment for the MCF-10A cell label-free experiments.

Figure 2.47 Correlations of raw spectral counts for the biological samples with different treatments for the HeLa cell label-free experiments.

Figure 2.48 Correlations of raw spectral counts for the biological samples with different treatments for the MCF-10A cell label-free experiments.

Figure 2.49 Correlations of the average raw spectral counts for the HeLa and MCF-10A cell label-free biological samples receiving the same treatment.

Figure 2.50 Correlations of ion abundance for the biological samples with the same treatment for the HeLa cell TMT labeling experiments.

Figure 2.51 Correlations of ion abundance for the biological samples with the same treatment for the MCF-10A cell TMT labeling experiments.

2.3.5 Normalization of label-free spectral count data

The variation in total spectral count between samples can be large; therefore, it is advisable to transform raw SCs to relative abundances by normalizing the spectral count of each protein in one fraction to the total spectral count in the corresponding fraction [43]. The basic normalization method follows the 2006 literature: the protein spectral count in a particular experiment is divided by the average spectral count across all of the proteins in that experiment [28], so that all experiments have the same global average count. In this way, differences in sample load can be corrected [61]. The choice of reference analysis is generally arbitrary, and the results are independent of the chosen file.

Some low-abundance proteins may have zero spectral counts in some experiments and may display the most dramatic changes in abundance. To avoid invalid values in the mathematical handling, a common practice is to replace the zero values with an arbitrarily low number of spectral counts (0.5 in this project). To ensure reliability, proteins were excluded if their total spectral counts across the experiments were less than 10.
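The normalization steps in this section can be sketched as follows. The order of operations (filter on raw totals, then replace zeros, then divide by the per-experiment mean) is an assumption, since the text does not fix it; the function and parameter names are hypothetical.

```python
from typing import Dict, List


def normalize_spectral_counts(
    raw: Dict[str, List[float]], min_total: float = 10, zero_fill: float = 0.5
) -> Dict[str, List[float]]:
    """Normalize spectral counts; raw maps protein ID -> counts per experiment."""
    # 1. Drop proteins whose summed raw count across experiments is below the cutoff.
    kept = {p: c for p, c in raw.items() if sum(c) >= min_total}
    if not kept:
        return {}
    n_exp = len(next(iter(kept.values())))
    # 2. Replace zero counts with a small value so that ratios and logs stay defined.
    filled = {p: [v if v > 0 else zero_fill for v in c] for p, c in kept.items()}
    # 3. Divide by the mean count of each experiment, giving every experiment
    #    the same global average (1.0).
    means = [sum(c[j] for c in filled.values()) / len(filled) for j in range(n_exp)]
    return {p: [c[j] / means[j] for j in range(n_exp)] for p, c in filled.items()}


# Hypothetical counts for three proteins in two experiments; the second
# experiment has twice the sample load of the first.
raw = {"A": [10, 20], "B": [30, 60], "C": [1, 2]}
print(normalize_spectral_counts(raw))  # {'A': [0.5, 0.5], 'B': [1.5, 1.5]}
```

Protein C is removed by the cutoff, and the twofold load difference between the two experiments disappears after normalization.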


2.3.6 Analysis with spectral counting method

After the spectral counts were normalized, the average of the spectral counts of every protein over the three biological replicates in each treatment was taken. Previous research shows that the t-test outperformed other statistical comparison methods [28]. Therefore, the t-test was applied in the current study.

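The ratio of spectral counts between different treatments (RSC) is calculated for each protein k as follows:

$$\mathrm{RSC}_k = \frac{\overline{SC}_{k,A}}{\overline{SC}_{k,B}}$$

where $\overline{SC}_{k,A}$ and $\overline{SC}_{k,B}$ are the spectral counts of protein k averaged over the biological replicates of the two treatments being compared, A and B.

The p-values were calculated for each protein across the biological replicates and were used to select the data points with p < 0.05.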

The RSC and p-values were obtained for both raw spectral counts and normalized spectral counts. The volcano plots of -log(p-value) vs. log2(RSC) are displayed in Figure 2.52 through Figure 2.55. The red highlighted points indicate the proteins with at least a twofold change between treatments at the 95% confidence interval. Figure 2.52 and Figure 2.54 are volcano plots for raw spectral counts of the HeLa and MCF-10A cells. Figure 2.52 clearly displays unbalanced shapes compared to Figure 2.53 for the normalized spectral counts of the HeLa cells, due to variations in sample load and HPLC conditions. Figure 2.54 shows more balanced shapes compared to Figure 2.55 of normalized spectral counts for MCF-10A cells, due to better reproducibility. Comparing the volcano plots for raw and normalized spectral counts shows that normalization is required; otherwise, false detections will occur.
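The coordinates of a volcano plot point and the highlighting rule (at least a twofold change with p < 0.05) can be sketched as below. A base-10 logarithm is assumed for the p-value axis, which is consistent with a cutoff of 1.301 for p = 0.05; the function name and thresholds as parameters are hypothetical.

```python
import math
from typing import Tuple


def volcano_point(
    ratio: float, p_value: float, fold: float = 2.0, alpha: float = 0.05
) -> Tuple[float, float, bool]:
    """Return (log2 ratio, -log10 p, highlighted?) for one protein."""
    x = math.log2(ratio)
    y = -math.log10(p_value)
    # Highlight proteins that change at least `fold` in either direction
    # and whose p-value is below `alpha`.
    highlighted = abs(x) >= math.log2(fold) and p_value < alpha
    return x, y, highlighted


print(volcano_point(4.0, 0.01))          # fourfold increase, p = 0.01: highlighted
print(volcano_point(1.5, 0.01))          # below twofold: not highlighted
print(round(-math.log10(0.05), 3))       # 1.301, the horizontal cutoff line
```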

Figure 2.52 Volcano plots for raw spectral counts of the HeLa label-free samples.

Figure 2.53 Volcano plots for normalized spectral counts of the HeLa label-free samples.

Figure 2.54 Volcano plots for raw spectral counts of the MCF label-free samples.

Figure 2.55 Volcano plots for normalized spectral counts of the MCF label-free samples.

2.3.7 Analysis by use of Spectral Index (SI)

The Spectral Index (SI) method was developed by Fu et al. in 2007. It combines the relative protein abundance from spectral counting with the number of replicates in a group that have detectable peptides for a given protein, and it was reported to outperform other statistical tests in sensitivity by correctly identifying the largest number of differentially expressed proteins [29].

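The SI for each protein is calculated using the following equation:

$$\mathrm{SI} = \left(\frac{\bar{S}_T}{\bar{S}_T + \bar{S}_C}\right)\frac{N_T}{n_T} - \left(\frac{\bar{S}_C}{\bar{S}_T + \bar{S}_C}\right)\frac{N_C}{n_C}$$

$\bar{S}_T$ is the mean spectral count for a given protein among treated samples;

$\bar{S}_C$ is the mean spectral count for a given protein among control samples;

$N_T$ and $N_C$ are the numbers of treated samples and control samples in which the protein of interest is detected;

$n_T$ and $n_C$ are the total numbers of treated and control subjects.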
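A minimal sketch of the SI calculation follows. It assumes the form SI = [S_T/(S_T + S_C)](N_T/n_T) - [S_C/(S_T + S_C)](N_C/n_C), built from the four quantities defined above (mean counts, detection counts, and group sizes), and it treats a protein as detected in a sample when its raw count is above zero. The function name is hypothetical.

```python
from typing import Sequence


def spectral_index(treated: Sequence[float], control: Sequence[float]) -> float:
    """Spectral index of one protein from raw per-sample spectral counts."""
    n_t, n_c = len(treated), len(control)
    mean_t = sum(treated) / n_t
    mean_c = sum(control) / n_c
    if mean_t + mean_c == 0:
        return 0.0                      # protein not observed in either group
    detected_t = sum(1 for v in treated if v > 0)
    detected_c = sum(1 for v in control if v > 0)
    share_t = mean_t / (mean_t + mean_c)
    share_c = mean_c / (mean_t + mean_c)
    return share_t * detected_t / n_t - share_c * detected_c / n_c


print(spectral_index([4, 6, 5], [0, 0, 0]))   # 1.0: seen only in treated samples
print(spectral_index([2, 2, 2], [2, 2, 2]))   # 0.0: no difference between groups
```

Under this form the index ranges from -1 (detected only in controls) to 1 (detected only in treated samples). Detection should be judged on raw counts, before any zero replacement.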

Figure 2.56 and Figure 2.57 show the volcano plots of SI for the normalized spectral counts of the HeLa cell and the MCF-10A cell label-free samples. The x-axis represents the SI, and the y-axis represents -log(p-value), where the p-value is obtained from the t-test of normalized spectral counts between treatments. A cutoff of 1.301 for -log(p-value), corresponding to p = 0.05, was used to select the data at the 95% confidence interval. Fu et al. selected the proteins with |SI| ≥ 0.75 at the 99% confidence interval from the permutation test as potential biomarkers for cystic fibrosis. Following the literature, the green lines on the x-axis at 0.75 in the figures indicate the approximate 99% confidence interval of the permutation test. However, at this cutoff point, almost all of the data points are excluded for this project. Therefore, the 95% confidence interval of SI is used for the primary analysis of potential protein biomarkers. Figure 2.58 and Figure 2.59 show the distribution of SI for the HeLa and MCF cell label-free analyses.

The permutation tests were performed pairwise for each comparison. For the comparison of the 2h recovered and control data, values of SI ≤ -0.49 or SI ≥ 0.38 are significant at the 95% confidence interval. For the 1h recovered and control samples, the corresponding limits are SI ≤ -0.56 or SI ≥ 0.41. For the 2h recovered and 1h recovered samples, the limits are SI ≤ -0.60 or SI ≥ 0.58.
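One way such asymmetric limits can arise is from percentiles of a null distribution built by shuffling sample labels. The sketch below illustrates this idea only; the permutation scheme, the two-sided 2.5%/97.5% percentiles, the SI form, and all names are assumptions, since the thesis does not describe the procedure in detail.

```python
import random
from typing import Dict, List, Sequence, Tuple


def spectral_index(treated: Sequence[float], control: Sequence[float]) -> float:
    """SI from raw counts (assumed form; detection means count > 0)."""
    mean_t, mean_c = sum(treated) / len(treated), sum(control) / len(control)
    if mean_t + mean_c == 0:
        return 0.0
    frac_t = sum(1 for v in treated if v > 0) / len(treated)
    frac_c = sum(1 for v in control if v > 0) / len(control)
    return (mean_t * frac_t - mean_c * frac_c) / (mean_t + mean_c)


def si_permutation_limits(
    counts: Dict[str, List[float]], n_treated: int,
    n_perm: int = 200, alpha: float = 0.05, seed: int = 0,
) -> Tuple[float, float]:
    """Lower and upper SI limits from a label-permutation null distribution.

    counts maps protein ID -> counts, treated samples first, then controls.
    """
    rng = random.Random(seed)           # fixed seed for reproducibility
    n_samples = len(next(iter(counts.values())))
    null: List[float] = []
    for _ in range(n_perm):
        order = list(range(n_samples))
        rng.shuffle(order)              # random reassignment of group labels
        t_idx, c_idx = order[:n_treated], order[n_treated:]
        for row in counts.values():
            null.append(spectral_index([row[i] for i in t_idx],
                                       [row[i] for i in c_idx]))
    null.sort()
    lower = null[int((alpha / 2) * (len(null) - 1))]
    upper = null[int((1 - alpha / 2) * (len(null) - 1))]
    return lower, upper


# Hypothetical counts: three treated samples followed by three controls.
data = {"P1": [5, 6, 4, 1, 0, 2], "P2": [0, 1, 0, 3, 4, 2],
        "P3": [2, 2, 3, 2, 3, 2], "P4": [0, 0, 7, 0, 0, 0]}
print(si_permutation_limits(data, n_treated=3))
```

Proteins whose observed SI falls below the lower limit or above the upper limit would then be treated as significant at the chosen level.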

Figure 2.56 Volcano plots of SI for normalized spectral counts of HeLa label-free samples.

Figure 2.57 Volcano plots of SI for normalized spectral counts of MCF label-free samples.

Figure 2.58 Distribution of SI for HeLa cell label-free analysis.

Figure 2.59 Distribution of SI for MCF cell label-free analysis.

2.3.8 Analysis by use of normalized spectral abundance factors

In previous studies, it was observed that proteins with lower molecular weight produce fewer spectra than proteins with higher molecular weight [30, 31], which can bias the quantification of proteins with different molecular weights. To correct this problem, Washburn’s group developed the normalized spectral abundance factor (NSAF), in which the SC of each protein is divided by the protein sequence length and then normalized by the sum of these length-corrected SCs over all proteins in the analysis [33-35, 62].

For this project, protein lengths were obtained from the online International Protein Index (IPI) database [63]. The NSAF is calculated from the SC using the following equation:

$$(\mathrm{NSAF})_k = \frac{(SC/L)_k}{\sum_{i=1}^{N} (SC/L)_i}$$

SC is the spectral count for protein k;

L is the length of protein k in amino acids (AA);

N is the total number of proteins identified in the analysis.

In the paper published by Zybailov et al. in 2006, each NSAF was transformed to ln(NSAF) and then analyzed with a t-test [33]. This transformation is performed because the histogram of the log-transformed values is closer to a normal distribution than the histogram of the untransformed NSAF, as evidenced in Figure 2.59 and Figure 2.60. A normal distribution is assumed in the t-test; therefore, the p-values calculated from the t-test of log2(NSAF) values describe the dataset more accurately.

Figure 2.59 Histograms of NSAF of each biological HeLa cell sample in label-free experiments.

Figure 2.60 Histograms of log2 transformed NSAF of each biological HeLa cell sample in label-free experiments.

To avoid errors from taking the logarithm of zero, zero spectral count values were replaced with 0.16. The analysis here follows two strategies: with log2 transformation and without log2 transformation. The volcano plots for NSAF are shown in Figure 2.61 through Figure 2.64 for both HeLa and MCF cell label-free analyses. The red highlighted points represent the proteins with more than a twofold change at the 95% confidence interval. In the protein selection step, primary analysis showed that applying the cutoff with the p-value from the t-test of log2(NSAF) provided more valid results than the p-value from the t-test of NSAF. Therefore, the former p-value was used to avoid the possible loss of potential protein biomarkers in the selection.
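The NSAF conversion and the log2 transformation can be sketched as follows. The sketch assumes the usual definition in which SC/L for each protein is divided by the sum of SC/L over all proteins in the same analysis, with zero counts replaced by 0.16 as stated above; the function names and toy values are hypothetical.

```python
import math
from typing import Dict


def nsaf(spectral_counts: Dict[str, float], lengths: Dict[str, int],
         zero_fill: float = 0.16) -> Dict[str, float]:
    """NSAF for every protein in one analysis."""
    # Spectral abundance factor: count divided by sequence length in amino acids.
    saf = {
        p: (sc if sc > 0 else zero_fill) / lengths[p]
        for p, sc in spectral_counts.items()
    }
    total = sum(saf.values())
    # Normalizing by the summed SAF makes the values of one analysis sum to 1.
    return {p: v / total for p, v in saf.items()}


def log2_values(values: Dict[str, float]) -> Dict[str, float]:
    """log2 transform applied before the t-test."""
    return {p: math.log2(v) for p, v in values.items()}


# Hypothetical counts and lengths for two proteins.
result = nsaf({"A": 10, "B": 20}, {"A": 100, "B": 400})
print(result)               # A is about 0.667 and B about 0.333
print(log2_values(result))
```

Although protein B has twice the spectral count of protein A, its NSAF is lower because it is four times longer, which is the length bias the factor is meant to remove.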

Figure 2.61 Volcano plots of NSAF with log2 transformation for the HeLa label-free samples.

Figure 2.62 Volcano plots of NSAF without log2 transformation for the HeLa label-free samples.

Figure 2.63 Volcano plots of NSAF with log2 transformation for the MCF label-free samples.

Figure 2.64 Volcano plots of NSAF without log2 transformation for the MCF label-free samples.

2.3.9 Normalization for TMT data

As shown in Figure 2.5, each TMT sample was prepared by mixing six biological samples in equal amounts. To correct for differences in the amounts of the mixed biological samples in each TMT sample, a median normalization was applied by dividing the ion abundance by the median ion abundance of an arbitrarily selected biological sample. The calculation of normalized ion abundance for protein k in a biological sample is shown as follows:


2.3.10 Analysis of the TMT data

After median normalization, the ratios of averaged ion abundance for each protein between treatments were calculated as follows:

The distribution of the ratios is shown in Figure 2.65 and Figure 2.66. The proteins with at least a twofold change were considered as potential biomarkers.
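The two TMT steps, median normalization and the ratio of averaged ion abundances, can be illustrated as below. This is one possible reading and not necessarily the calculation used in the thesis: it assumes each biological sample is rescaled so that its median ion abundance matches that of the arbitrarily chosen reference sample. All names and values are hypothetical.

```python
from statistics import mean, median
from typing import Dict, Sequence

Abundances = Dict[str, Dict[str, float]]   # sample name -> {protein: ion abundance}


def median_normalize(samples: Abundances, reference: str) -> Abundances:
    """Scale every sample so that its median equals the reference median."""
    ref_median = median(samples[reference].values())
    normalized: Abundances = {}
    for name, abundances in samples.items():
        scale = ref_median / median(abundances.values())
        normalized[name] = {p: v * scale for p, v in abundances.items()}
    return normalized


def treatment_ratio(normalized: Abundances, group_a: Sequence[str],
                    group_b: Sequence[str], protein: str) -> float:
    """Ratio of the averaged normalized ion abundance, group A over group B."""
    a = mean(normalized[s][protein] for s in group_a)
    b = mean(normalized[s][protein] for s in group_b)
    return a / b


# Hypothetical data: the treated sample was loaded at twice the control amount.
samples = {
    "control": {"P1": 1.0, "P2": 2.0, "P3": 3.0},
    "treated": {"P1": 2.0, "P2": 4.0, "P3": 12.0},
}
norm = median_normalize(samples, reference="control")
print(treatment_ratio(norm, ["treated"], ["control"], "P3"))  # 2.0
```

After normalization the apparent fourfold change of P3 reduces to twofold, because half of the raw difference came from the unequal sample amounts.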

Figure 2.65 Distribution of ratios of averaged normalized ion intensities between treatments for HeLa TMT analysis.

Figure 2.66 Distribution of ratios of averaged normalized ion intensities between treatments for MCF TMT analysis.

2.3.11 Correlation of label-free spectral counting data and TMT data

Figure 2.67 and Figure 2.68 show the scatter plots of the ratios of protein abundances from replicate samples of the label-free normalized spectral counts/NSAF and TMT analyses. The data are clustered around the ideal ratio of 1:1, indicating excellent agreement between the TMT and label-free quantification results.

Figure 2.67 Scatter plots of the ratios of protein abundances from replicate samples of the label-free normalized spectral counts and TMT analyses.

Figure 2.68 Scatter plots of the ratios of protein abundances from replicate samples of the label-free NSAF and TMT analyses.

2.3.12 Cluster analysis

To ascertain whether changes in protein response to UV-induced DNA damage are correlated with the time of recovery, hierarchical cluster analyses were performed on the biological replicates for normalized SC, NSAF, and normalized TMT ion abundance of the HeLa and MCF-10A samples. Only proteins with a p-value < 0.10 were used. The heat maps are shown in Figure 2.69 through Figure 2.71. The cluster analysis showed that, for the normalized SC/NSAF and normalized TMT ion abundance results, the majority of the controls and the 1h recovery samples fell in one cluster and the 2h recovery samples in the other.

Figure 2.69 Heat maps from the cluster analysis of the normalized SC for both the HeLa cell and MCF-10A cell label-free experiments.

Figure 2.70 Heat maps from the cluster analysis of the NSAF for both the HeLa cell and MCF-10A cell label-free experiments.

Figure 2.71 Heat maps from the cluster analysis of the normalized TMT ion abundance for both the HeLa cell and MCF-10A cell TMT labeling experiments.

2.3.13 Potential biomarker selection

The initial criterion was to select the proteins with at least a twofold change in NSAF. The 95% confidence interval was chosen, with the p-value calculated using the t-test of log2(NSAF). For each protein that met this criterion, we plotted and compared raw SC, normalized SC, SAF, NSAF, TMT ion abundance, normalized TMT ion abundance, ratio of normalized SC, ratio of NSAF, SI, and ratio of normalized TMT ion abundance. A positive selection was made when the patterns of raw SC, normalized SC, SAF, and NSAF agreed with the normalized TMT ion abundance. The Spectral Index was also checked; however, it was not restricted to a 95% confidence interval, since the calculation of SI does not take the molecular weight into consideration. We observed that about 80% of the selected proteins have an SI within the 95% confidence interval.
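The selection logic can be approximated in code as shown below. This is a simplification: in the thesis the agreement of patterns was judged from bar plots, whereas the sketch reduces it to agreement in the direction of change between each label-free measure and the normalized TMT ion abundance. The function name and arguments are hypothetical.

```python
import math
from typing import Sequence


def passes_selection(
    nsaf_ratio: float,
    p_value_log2_nsaf: float,
    label_free_ratios: Sequence[float],
    tmt_ratio: float,
    fold: float = 2.0,
    alpha: float = 0.05,
) -> bool:
    """Return True if a protein meets the (simplified) selection criteria.

    label_free_ratios holds the treatment ratios from raw SC, normalized SC,
    SAF, and NSAF; tmt_ratio is the ratio of normalized TMT ion abundance.
    """
    # Initial criterion: at least a twofold NSAF change with p < alpha.
    if p_value_log2_nsaf >= alpha:
        return False
    if abs(math.log2(nsaf_ratio)) < math.log2(fold):
        return False
    # Agreement check: every label-free measure changes in the same
    # direction as the TMT measurement.
    tmt_up = tmt_ratio > 1.0
    return all((r > 1.0) == tmt_up for r in label_free_ratios)


print(passes_selection(2.5, 0.01, [2.0, 2.2, 2.4, 2.5], 1.3))  # True
print(passes_selection(2.5, 0.01, [2.0, 2.2, 2.4, 2.5], 0.8))  # False: TMT disagrees
```

The magnitude of the TMT change is deliberately not tested here, only its direction.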

Figure 2.72 shows an example of a possible selection of protein markers, and Figure 2.73 illustrates an example of a discarded selection. In Figure 2.72, the protein abundance in the 1h recovered sample is higher than in both the control sample and the 2h recovered sample, as shown in the bar plots of normalized SC, NSAF, and normalized TMT ion abundance. This indicates protein up-regulation within the first hour after UV radiation treatment, followed by down-regulation after one hour.


In Figure 2.73, the bar plot of normalized TMT ion abundance shows abundance changes opposite to those in the bar plots of normalized SC and NSAF, which is the reason for discarding that protein.

Figure 2.72 Bar plots of raw SC, normalized SC, SAF, NSAF, TMT ion abundance, normalized TMT ion abundance, ratio of normalized SC, ratio of NSAF, SI, and ratio of normalized TMT ion abundance for protein IPI00001159.1 of the HeLa samples.

Figure 2.73 Bar plots of raw SC, normalized SC, SAF, NSAF, TMT ion abundance, normalized TMT ion abundance, ratio of normalized SC, ratio of NSAF, SI, and ratio of normalized TMT ion abundance for protein IPI00219169.3 of the MCF samples.

The selected potential protein biomarkers for the HeLa and MCF-10A cell samples are summarized in Table 2.1 and Table 2.2. About 60 potential protein biomarkers are listed for the HeLa cell samples and around 30 for the MCF-10A cell samples. There is barely any overlap between the biomarker sets of the two cell types, which may indicate that the two types of cells follow different DNA damage repair mechanisms. We also found that the response of the same protein to UV radiation can differ between the cell types. The ratios and p-values for each protein are listed in Appendix B.

The average spectral counts of the proteins listed above are between 3 and 50, and about 80% of them have fewer than 10 spectral counts, indicating that most potential biomarkers are at very low abundance. We observed that the TMT approach identified 50% fewer proteins than the label-free approach, as shown in Figure 2.39. Therefore, for many selected proteins in the low concentration range, the bar plot of normalized TMT ion abundance may show a similar pattern to the bar plot of NSAF. However, most of the TMT ion abundances do not show the same fold change magnitude; they usually show much smaller changes. The low sensitivity of TMT could be due to the loss of low-abundance proteins in the complicated TMT sample cleanup process, low labeling efficiency, and the low fragmentation efficiency of PQD, among other factors. Thus, the absolute magnitude of fold changes in the TMT approach was not used for the primary selection of low-abundance proteins.

Table 2.1 Potential protein biomarkers for HeLa cell samples.

ID  Description
IPI00021812.2  AHNAK Neuroblast differentiation-associated protein AHNAK
IPI00303476.1  ATP5B ATP synthase subunit beta; mitochondrial
IPI00216308.5  VDAC1 Voltage-dependent anion-selective channel protein 1
IPI00220301.5  PRDX6 Peroxiredoxin-6
IPI00872379.1  ANXA5 Putative uncharacterized protein ANXA5 (Fragment)
IPI00299573.1  RPL7A 60S ribosomal protein L7a
IPI00794211.1  UBC;RPS27A;UBB 18 kDa protein
IPI00797082.1  10 kDa protein
IPI00215914.5  ARF1 ADP-ribosylation factor 1
IPI00215917.3  ARF3 ADP-ribosylation factor 3
IPI00930263.1  ARF1 cDNA FLJ61099; highly similar to ADP-ribosylation factor 1
IPI00008524.1  H-PABPC1 Isoform 1 of Polyadenylate-binding protein 1
IPI00886833.1  LOC100129958 similar to hCG1643231
IPI00556482.1  HSP90B3P Heat shock protein 94c
IPI00555876.1  HSP90AA5P Putative heat shock protein HSP 90-alpha A5
IPI00937169.1  LOC100293160 similar to TRIMCyp
IPI00888053.1  LOC100129958 similar to hCG1643231
IPI00853059.2  FUBP1 Isoform 2 of Far upstream element-binding protein 1
IPI00465233.1  EIF3L Eukaryotic translation initiation factor 3; subunit E interacting protein
IPI00219219.3  LGALS1 Galectin-1
IPI00477663.1  RTN4 Isoform 4 of Reticulon-4
IPI00001159.1  GCN1L1 Translational activator GCN1
IPI00004860.2  RARS Isoform Complexed of Arginyl-tRNA synthetase; cytoplasmic
IPI00012750.3  RPS25 40S ribosomal protein S25
IPI00301154.3  PABPC3 Polyadenylate-binding protein 3
IPI00008167.1  ATP1B3 Sodium/potassium-transporting ATPase subunit beta-3
IPI00021805.1  MGST1 Microsomal glutathione S-transferase 1
IPI00556364.1  ILF3 Interleukin enhancer binding factor 3 isoform c variant (Fragment)
IPI00910980.1  IARS IARS protein
IPI00903251.2  14 kDa protein
IPI00644127.2  IARS Isoleucyl-tRNA synthetase; cytoplasmic
IPI00027834.3  HNRNPL Heterogeneous nuclear ribonucleoprotein L
IPI00018349.5  MCM4 DNA replication licensing factor MCM4
IPI00795318.2  MCM4 cDNA FLJ54365; highly similar to DNA replication licensing factor MCM4
IPI00007074.5  YARS Tyrosyl-tRNA synthetase; cytoplasmic
IPI00328840.9  THOC4 THO complex 4
IPI00943181.1  PSME2 29 kDa protein
IPI00297579.4  CBX3 Chromobox protein homolog 3
IPI00216694.3  PLS3 Plastin-3
IPI00216951.2  DARS Aspartyl-tRNA synthetase; cytoplasmic
IPI00940656.1  ANP32A;LOC723972 Putative uncharacterized protein ANP32A
IPI00917509.1  CBX3 Putative uncharacterized protein CBX3
IPI00946353.1  ATP1B3 18 kDa protein
IPI00219148.2  CSDA Isoform 3 of DNA-binding protein A
IPI00938044.1  LOC100291405 hypothetical protein XP_002347581
IPI00555698.1  CSDA CSDA protein variant (Fragment)
IPI00026271.5  RPS14 40S ribosomal protein S14
IPI00479480.5  18 kDa protein
IPI00910662.1  DNM1L cDNA FLJ59504; highly similar to Dynamin-1-like protein
IPI00414860.6  RPL37A 60S ribosomal protein L37a
IPI00555747.1  PABPC4 Isoform 2 of Polyadenylate-binding protein 4
IPI00915917.1  MTAP 5 kDa protein
IPI00218831.4  GSTM1 Isoform 1 of Glutathione S-transferase Mu 1
IPI00640363.1  GSTM1 Putative uncharacterized protein GSTM1
IPI00291005.8  MDH1 Malate dehydrogenase; cytoplasmic
IPI00792715.1  ENO2 37 kDa protein
IPI00945930.1  PLS1 29 kDa protein
IPI00909251.1  DDB1 cDNA FLJ51165; highly similar to DNA damage-binding protein 1
IPI00759613.2  TTN titin isoform N2-A
IPI00375499.2  TTN titin isoform novex-2
IPI00797082.1  10 kDa protein
IPI00644127.2  Isoleucyl-tRNA synthetase; cytoplasmic
IPI00784704.2  FANCI Isoform 2 of Fanconi anemia group I protein
IPI00021347.1  UBE2L3 Ubiquitin-conjugating enzyme E2 L3

Table 2.2 Potential protein biomarkers for MCF-10A cell samples.

ID  Description
IPI00021812.2  AHNAK Neuroblast differentiation-associated protein AHNAK
IPI00795292.1  NME2;NME1-NME2 Isoform 3 of Nucleoside diphosphate kinase B
IPI00396378.3  HNRNPA2B1 Isoform B1 of Heterogeneous nuclear ribonucleoproteins A2/B1
IPI00169383.3  PGK1 Phosphoglycerate kinase 1
IPI00872379.1  ANXA5 Putative uncharacterized protein ANXA5 (Fragment)
IPI00302925.4  CCT8 59 kDa protein
IPI00555744.6  RPL14 Ribosomal protein L14 variant
IPI00759596.1  HNRNPC Isoform 4 of Heterogeneous nuclear ribonucleoproteins C1/C2
IPI00902463.1  ESYT1 cDNA FLJ46898 fis; clone UTERU3022168; highly similar to Protein FAM62A
IPI00789337.3  YWHAZ cDNA FLJ51775; highly similar to 14-3-3 protein zeta/delta
IPI00027569.1  HNRNPCL1 Heterogeneous nuclear ribonucleoprotein C-like 1
IPI00910666.1  cDNA FLJ52993; highly similar to Heterogeneous nuclear ribonucleoprotein C
IPI00013452.1  EPRS Bifunctional aminoacyl-tRNA synthetase
IPI00853059.2  FUBP1 Isoform 2 of Far upstream element-binding protein 1
IPI00025842.1  RPS10P5 Putative 40S ribosomal protein S10-like protein
IPI00456758.4  RPL27A 60S ribosomal protein L27a
IPI00031169.1  RAB2A Ras-related protein Rab-2A
IPI00877802.2  RBBP4 retinoblastoma binding protein 4 isoform c
IPI00645329.1  RBBP4 46 kDa protein
IPI00873632.1  RAB2A 24 kDa protein
IPI00642732.2  PDHA1 cDNA FLJ52314; highly similar to Pyruvate dehydrogenase E1 component alpha subunit; somatic form; mitochondrial
IPI00642880.2  BAT1 HLA-B associated transcript 1
IPI00798089.1  RAB2A 21 kDa protein
IPI00794027.1  RAB2A Protein
IPI00107531.1  RAD50 Isoform 3 of DNA repair protein RAD50
IPI00292387.6  NOLC1 Isoform Alpha of Nucleolar phosphoprotein p130
IPI00910666.1  cDNA FLJ52993; highly similar to Heterogeneous nuclear ribonucleoprotein C

Among the selected proteins, we observed that some are related to DNA damage repair, such as DDB1, Pyruvate dehydrogenase E1, FANCI Isoform 2, RAD50 Isoform 3, and UBE2L3 (ubiquitin-conjugating enzyme E2 L3). Previous studies have noted that BRCA1/2, E3, and ubiquitin-related proteins play significant roles in DNA damage repair [50, 64-66]. In this study, these proteins were identified by mass spectrometry with low spectral counts, which did not yield very reliable quantification results. Thus, studying the role of these proteins in the DNA damage repair mechanism with mass spectrometry requires further enrichment of the proteins and greater separation power, such as MudPIT [67] or off-gel electrophoresis [68], prior to the mass spectrometry analysis.


2.4 Conclusion

In this chapter, we developed label-free spectral counting and TMT labeling methodologies to quantitatively analyze the global protein response of HeLa and MCF-10A cells to UV radiation. A series of statistical tests was performed to evaluate the reproducibility and correlation of the LC-MS/MS results. PCA revealed that day-to-day changes in instrument condition may contribute to the chromatographic variations. We also observed that using a longer gradient and more technical replicates helps to increase the number of proteins identified.

A few modified spectral counting approaches, such as normalized spectral counting, NSAF, and SI, were applied to the analysis of the label-free data sets. The outcomes correlated with the results of the TMT labeling approach. However, the label-free approach identified over 50% more proteins than the TMT labeling approach. For the proteins identified by both the spectral counting and TMT methods, the absolute fold change magnitude in spectral counting is larger than in the TMT approach. However, both methods faced difficulty in quantifying some ultra-low-abundance proteins, and the reliability of quantification was significantly reduced for the proteins with very low spectral counts.

Normalization turned out to be important for both the label-free and TMT labeling methods. For the label-free method, the NSAF seems to be the better option, since it accounts for the bias caused by the protein length. For the TMT labeling method, the number of samples that could be compared at the same time limited the statistical analysis.

According to recent literature, more work could be done on normalization with more advanced statistical tools. For example, Griffin et al. suggested not only applying total SC and protein length normalization but also considering the fragment ion abundance for each peptide ion, which combines the two main streams of label-free approaches into one method [37]. However, this approach needs sophisticated software support, which can be explored in the future.

A list of potential biomarkers that may be related to DNA damage repair was proposed based on fold changes and statistical confidence intervals. The proteins with fewer than three average spectral counts were filtered out for better reliability. Unfortunately, some well-known biomarkers fall below the filtering threshold, which shows the limitation of the shotgun proteomics strategy for global protein quantification. Shotgun proteomics is a powerful tool for protein biomarker discovery to investigate the global abundance changes of proteins under biological perturbation, while sample enrichment and greater separation power are sometimes necessary to improve the sensitivity of identification and the reliability of quantification for low-abundance proteins.


CHAPTER 3

SUMMARY

Stable isotope labeling and label-free quantification approaches are two major LC-MS quantification strategies for differential protein detection in biological systems. Both methods have advantages and limitations.

In Chapter 2, both label-free spectral counting and TMT labeling approaches were used to study protein abundance changes in human cells treated with UV radiation. Our correlation study shows that the two methods give similar results in general, while the label-free spectral counting approach identified 50% more proteins than the TMT labeling approach. The former method also showed better sensitivity and a larger dynamic range. Both methods face challenges in quantifying abundance changes of proteins at low concentrations. Using statistical analysis, potential protein biomarkers were selected by the spectral counting method and verified by the TMT method. Some of these biomarkers may provide useful information in the study of DNA damage repair.

REFERENCES

1. Canas, B., et al., Mass spectrometry technologies for proteomics. Briefings in Functional Genomics and Proteomics, 2006. 4(4): p. 295-320.

2. Domon, B. and R. Aebersold, Mass spectrometry and protein analysis. Science, 2006. 312(5771): p. 212-7.

3. Kolker, E., R. Higdon, and J.M. Hogan, Protein identification and expression analysis using mass spectrometry. Trends Microbiol, 2006. 14(5): p. 229-35.

4. Nesvizhskii, A.I., O. Vitek, and R. Aebersold, Analysis and validation of proteomic data generated by . Nat Methods, 2007. 4(10): p. 787-97.

5. Ong, S.E. and M. Mann, Mass spectrometry-based proteomics turns quantitative. Nat Chem Biol, 2005. 1(5): p. 252-62.

6. Plumb, R.S., et al., Generation of Ultrahigh Peak Capacity LC Separations via Elevated Temperatures and High Linear Mobile-Phase Velocities. , 2006. 78(20): p. 7278-83.

7. Ranish, J.A., et al., The study of macromolecular complexes by . Nat Genet, 2003. 33(3): p. 349-55.

8. Yan, W. and S.S. Chen, Mass spectrometry-based quantitative proteomic profiling. Brief Funct Genomic Proteomic, 2005. 4(1): p. 27-38.

9. Yates, J.R., C.I. Ruse, and A. Nakorchevsky, Proteomics by mass spectrometry: approaches, advances, and applications. Annu Rev Biomed Eng, 2009. 11: p. 49- 79.

112

10. Everley, P.A., et al., Quantitative Cancer Proteomics: Stable Isotope Labeling with Amino Acids in Cell Culture (SILAC) as a Tool for Prostate Cancer Research. Molecular & Cellular Proteomics, 2004. 3(7): p. 729-35.

11. Ong, S.-E., et al., Stable Isotope Labeling by Amino Acids in Cell Culture, SILAC, as a Simple and Accurate Approach to Expression Proteomics. Molecular & Cellular Proteomics, 2002. 1(5): p. 376-86.

12. Zang, L., et al., Proteomic Analysis of Ductal Carcinoma of the Breast Using Laser Capture Microdissection, LC-MS, and 16O/18O Isotopic Labeling. Journal of Proteome Research, 2004. 3(3): p. 604-12.

13. Brown, K.J. and C. Fenselau, Investigation of Doxorubicin Resistance in MCF-7 Breast Cancer Cells Using Shot-Gun Comparative Proteomics with Proteolytic 18O Labeling. Journal of Proteome Research. 3(3): p. 455-62.

14. Qian, W.-J., et al., Quantitative Proteome Analysis of Human Plasma following in Vivo Lipopolysaccharide Administration Using 16O/18O Labeling and the Accurate Mass and Time Tag Approach. Molecular & Cellular Proteomics, 2005. 4(5): p. 700-9.

15. Smolka, M.B., et al., Optimization of the isotope-coded affinity tag-labeling procedure for quantitative proteome analysis. Anal Biochem, 2001. 297(1): p. 25- 31.

16. Han, D.K., et al., Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry. Nat Biotech, 2001. 19(10): p. 946-51.

17. Choe, L., et al., 8-plex quantitation of changes in cerebrospinal fluid protein expression in subjects undergoing intravenous immunoglobulin treatment for Alzheimer's disease. Proteomics, 2007. 7(20): p. 3651-60.

18. Dayon, L., et al., Relative Quantification of Proteins in Human Cerebrospinal Fluids by MS/MS Using 6-Plex Isobaric Tags. Analytical Chemistry. 80(8): p. 2921-31.

19. Hendrickson, E.L., et al., Comparison of spectral counting and metabolic stable isotope labeling for use with quantitative microbial proteomics. The Analyst. 131(12): p. 1335-41.

20. Usaite, R., et al., Characterization of Global Yeast Quantitative Proteome Data Generated from the Wild-Type and Glucose Repression Saccharomyces cerevisiae 113

Strains: The Comparison of Two Quantitative Methods. Journal of Proteome Research, 2008. 7(1): p. 266-75.

21. Zhu, W., J.W. Smith, and C.M. Huang, Mass spectrometry-based label-free quantitative proteomics. J Biomed Biotechnol. 2010: p. 840518.

22. Wang, M., et al., Label-free mass spectrometry-based protein quantification technologies in proteomic analysis. Brief Funct Genomic Proteomic, 2008. 7(5): p. 329-39.

23. Voyksner, R.D. and H. Lee, Investigating the use of an octupole ion guide for ion storage and high-pass mass filtering to improve the quantitative performance of electrospray ion trap mass spectrometry. Rapid Communications in Mass Spectrometry. 13(14): p. 1427-37.

24. Chelius, D. and P.V. Bondarenko, Quantitative Profiling of Proteins in Complex Mixtures Using Liquid Chromatography and Mass Spectrometry. Journal of Proteome Research, 2002. 1(4): p. 317-23.

25. Liu, H., R.G. Sadygov, and J.R. Yates, 3rd, A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal Chem, 2004. 76(14): p. 4193-201.

26. Zybailov, B., et al., Correlation of Relative Abundance Ratios Derived from Peptide Ion Chromatograms and Spectrum Counting for Quantitative Proteomic Analysis Using Stable Isotope Labeling. Analytical Chemistry. 77(19): p. 6218- 24.

27. Dong, M.-Q., et al., Quantitative Mass Spectrometry Identifies Insulin Signaling Targets in C. elegans. Science, 2007. 317(5838): p. 660-63.

28. Zhang, B., et al., Detecting differential and correlated protein expression in label- free shotgun proteomics. J Proteome Res, 2006. 5(11): p. 2909-18.

29. Fu, X., et al., Spectral index for assessment of differential protein expression in shotgun proteomics. J Proteome Res, 2008. 7(3): p. 845-54.

30. Cox, B., T. Kislinger, and A. Emili, Integrating gene and protein expression data: pattern analysis and profile mining. Methods, 2005. 35(3): p. 303-14.

31. Gramolini, A.O., et al., Comparative proteomics profiling of a phospholamban mutant mouse model of dilated cardiomyopathy reveals progressive intracellular stress responses. Mol Cell Proteomics, 2008. 7(3): p. 519-33.

114

32. Sanders, S.L., et al., Proteomics of the Eukaryotic Transcription Machinery: Identification of Proteins Associated with Components of Yeast TFIID by Multidimensional Mass Spectrometry. Mol. Cell. Biol., 2002. 22(13): p. 4723-38.

33. Zybailov, B., et al., Statistical analysis of membrane proteome expression changes in Saccharomyces cerevisiae. J Proteome Res, 2006. 5(9): p. 2339-47.

34. Paoletti, A.C., et al., Quantitative Proteomic Analysis of Distinct Mammalian Mediator Complexes Using Normalized Spectral Abundance Factors. Proceedings of the National Academy of Sciences of the United States of America, 2006. 103(50): p. 18928-33.

35. Zybailov, B.L., L. Florens, and M.P. Washburn, Quantitative shotgun proteomics using a protease with broad specificity and normalized spectral abundance factors. Molecular BioSystems, 2007. 3(5): p. 354-60.

36. Wu, L., et al., Global Survey of Human T Leukemic Cells by Integrating Proteomics and Transcriptomics Profiling. Molecular & Cellular Proteomics, 2007. 6(8): p. 1343-53.

37. Griffin, N.M., et al., Label-free, normalized quantification of complex mass spectrometry data for proteomic analysis. Nat Biotechnol, 2010. 28(1): p. 83-9.

38. Han, M.H., et al., Proteomic analysis of active multiple sclerosis lesions reveals therapeutic targets. Nature, 2008. 451(7182): p. 1076-81.

39. Burande, C.F., et al., A Label-free Quantitative Proteomics Strategy to Identify E3 Ubiquitin Ligase Substrates Targeted to Proteasome Degradation. Molecular & Cellular Proteomics, 2009. 8(7): p. 1719-27.

40. Qian, M., et al., Proteomics Analysis of Serum from Mutant Mice Reveals Lysosomal Proteins Selectively Transported by Each of the Two Mannose 6- Phosphate Receptors. Molecular & Cellular Proteomics, 2008. 7(1): p. 58-70.

41. Gramolini, A.O., et al., Comparative Proteomics Profiling of a Phospholamban Mutant Mouse Model of Dilated Cardiomyopathy Reveals Progressive Intracellular Stress Responses. Molecular & Cellular Proteomics, 2008. 7(3): p. 519-33.

42. Mayya, V., et al., Quantitative Phosphoproteomic Analysis of T Cell Receptor Signaling Reveals System-Wide Modulation of Protein-Protein Interactions. Sci. Signal., 2009. 2(84): p. ra46-.

115

43. Wu, L., et al., Global survey of human T leukemic cells by integrating proteomics and transcriptomics profiling. Mol Cell Proteomics, 2007. 6(8): p. 1343-53.

44. Kislinger, T., et al., Global Survey of Organ and Organelle Protein Expression in Mouse: Combined Proteomic and Transcriptomic Profiling. Cell. 125(1): p. 173- 86.

45. Gilchrist, A., et al., Quantitative Proteomics Analysis of the Secretory Pathway. Cell. 127(6): p. 1265-81.

46. Liu, H., R.G. Sadygov, and J.R. Yates, A Model for Random Sampling and Estimation of Relative Protein Abundance in Shotgun Proteomics. Analytical Chemistry, 2004. 76(14): p. 4193-201.

47. Lu, P., et al., Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nat Biotech, 2007. 25(1): p. 117-24.

48. Asara, J.M., et al., A label-free quantification method by MS/MS TIC compared to SILAC and spectral counting in a proteomics screen. Proteomics, 2008. 8(5): p. 994-99.

49. Old, W.M., et al., Comparison of Label-free Methods for Quantifying Human Proteins by Shotgun Proteomics. Molecular & Cellular Proteomics, 2005. 4(10): p. 1487-502.

50. Heine, G.F., A.A. Horwitz, and J.D. Parvin, Multiple mechanisms contribute to inhibit transcription in response to DNA damage. J Biol Chem, 2008. 283(15): p. 9555-61.

51. Sinha, R.P. and D.P. Hader, UV-induced DNA damage and repair: a review. Photochem Photobiol Sci, 2002. 1(4): p. 225-36.

52. Shibata, T. and T. Ando, Repair of UV-induced DNA damage in recombination- deficient strains of Bacillus subtilis. Mutat Res, 1975. 30(2): p. 177-90.

53. Thompson, A., et al., Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS. Anal Chem, 2003. 75(8): p. 1895-904.

54. Dayon, L., et al., Relative Quantification of Proteins in Human Cerebrospinal Fluids by MS/MS Using 6-Plex Isobaric Tags. Analytical Chemistry, 2008. 80(8): p. 2921-31.

116

55. Ross, P.L., et al., Multiplexed Protein Quantitation in Saccharomyces cerevisiae Using Amine-reactive Isobaric Tagging Reagents. Molecular & Cellular Proteomics, 2004. 3(12): p. 1154-69.

56. Pichler, P., et al., Peptide Labeling with Isobaric Tags Yields Higher Identification Rates Using iTRAQ 4-Plex Compared to TMT 6-Plex and iTRAQ 8- Plex on LTQ Orbitrap. Analytical Chemistry. 82(15): p. 6549-58.

57. http://www.atcc.org/.

58. Picotti, P., R. Aebersold, and B. Domon, The Implications of Proteolytic Background for Shotgun Proteomics. Molecular & Cellular Proteomics, 2007. 6(9): p. 1589-98.

59. Prakash, A., et al., Assessing Bias in Experiment Design for Large Scale Mass Spectrometry-based Quantitative Proteomics. Molecular & Cellular Proteomics, 2007. 6(10): p. 1741-48.

60. Lundgren, D.H., et al., Role of spectral counting in quantitative proteomics. Expert Rev Proteomics. 7(1): p. 39-53.

61. Pham, T.V., et al., On the beta-binomial model for analysis of spectral count data in label-free tandem mass spectrometry-based proteomics. Bioinformatics. 26(3): p. 363-9.

62. Pavelka, N., et al., Statistical Similarities between Transcriptomics and Quantitative Shotgun Proteomics Data. Molecular & Cellular Proteomics, 2008. 7(4): p. 631-44.

63. http://www.ebi.ac.uk/IPI/IPIhelp.html.

64. Parvin, J.D. and S. Sankaran, The BRCA1 E3 ubiquitin ligase controls centrosome dynamics. Cell Cycle, 2006. 5(17): p. 1946-50.

65. Parvin, J.D., BRCA1 at a branch point. Proc Natl Acad Sci U S A, 2001. 98(11): p. 5952-4.

66. Heine, G.F. and J.D. Parvin, BRCA1 control of steroid receptor ubiquitination. Sci STKE, 2007. 2007(391): p. pe34.

67. Washburn, M.P., D. Wolters, and J.R. Yates, Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotech, 2001. 19(3): p. 242-47.

117

68. Michel, P., et al., Protein fractionation in a multicompartment device using Off- Gel isoelectric focusing. Electrophoresis, 2003. 24(1-2): p. 3 - 11.

118

APPENDICES

Appendix A: ZipTip clean-up procedure

To equilibrate the ZipTip pipette tip for sample binding, depress the pipettor plunger to a dead stop. Using the maximum volume setting of 10 µl, aspirate wetting solution (100% acetonitrile) into the tip and dispense to waste; repeat this step two times. Then aspirate equilibration solution (0.1% TFA in ddH2O) into the ZipTip, dispense to waste, and repeat two times.

To bind and wash the peptides or proteins, fully depress the pipettor plunger to a dead stop, then aspirate and dispense 10 µl of sample for 10 cycles to maximize binding of complex mixtures. Then aspirate 10 µl of wash solution (0.1% TFA in ddH2O) into the tip and dispense to waste; repeat four times.

To elute the peptides, dispense 10 µl of elution solution (0.1% TFA/50% ACN) into a clean vial using a standard pipette tip, and repeat 10 times.

Appendix B: Tables

Table B.1 Ratios and P-values for the t-test of log2(NSAF) for selected HeLa cell protein biomarkers

ID               Ratio  P-value    Ratio  P-value    Ratio  P-value
               2h/ctrl  2h/ctrl  1h/ctrl  1h/ctrl    2h/1h    2h/1h
IPI00021812.2    0.626    0.214    0.699    0.139    0.896    0.657
IPI00303476.1    0.833    0.176    0.691    0.002    1.206    0.193
IPI00216308.5    0.511    0.094    1.002    0.896    0.510    0.041
IPI00220301.5    1.620    0.027    0.863    0.497    1.878    0.086
IPI00872379.1    1.751    0.002    1.853    0.094    0.945    0.929
IPI00299573.1    0.681    0.325    0.349    0.011    1.953    0.320
IPI00794211.1    1.068    0.955    1.525    0.152    0.700    0.279
IPI00797082.1    2.062    0.042    1.622    0.138    1.271    0.408
IPI00215914.5    2.224    0.018    2.280    0.259    0.975    0.812
IPI00215917.3    2.224    0.018    2.280    0.259    0.975    0.812
IPI00930263.1    2.157    0.032    2.280    0.259    0.946    0.877
IPI00008524.1    0.779    0.443    0.098    0.057    7.921    0.056
IPI00886833.1    0.378    0.031    0.336    0.215    1.124    0.544
IPI00556482.1    1.024    0.699    0.347    0.303    2.949    0.226
IPI00555876.1    0.797    0.242    0.395    0.002    2.021    0.027
IPI00937169.1    0.293    0.193    1.288    0.445    0.227    0.096
IPI00888053.1    0.399    0.073    0.321    0.177    1.244    0.567
IPI00853059.2    0.575    0.178    0.454    0.024    1.266    0.672
IPI00465233.1    0.611    0.107    0.797    0.381    0.767    0.446
IPI00219219.3    0.535    0.860    1.334    0.492    0.401    0.048
IPI00477663.1    0.938    0.927    0.074    0.008   12.636    0.004
IPI00001159.1    1.733    0.079    1.979    0.047    0.876    0.462
IPI00004860.2    0.442    0.179    0.373    0.112    1.183    0.858
IPI00012750.3    1.316    0.227    0.647    0.003    2.034    0.034
IPI00301154.3    0.736    0.243    0.114    0.030    6.434    0.042
IPI00008167.1    0.301    0.137    1.364    0.319    0.221    0.103
IPI00021805.1    0.805    0.880    1.627    0.293    0.495    0.042
IPI00556364.1    1.633    0.069    0.504    0.276    3.240    0.170
IPI00910980.1    1.025    0.894    0.326    0.201    3.140    0.195
IPI00903251.2    1.316    0.227    0.647    0.003    2.034    0.034
IPI00644127.2    0.913    0.700    0.279    0.174    3.270    0.189
IPI00027834.3    0.516    0.029    1.125    0.619    0.459    0.024
IPI00018349.5    0.694    0.296    0.161    0.049    4.309    0.704
IPI00795318.2    0.694    0.296    0.161    0.049    4.309    0.704
IPI00007074.5    0.908    0.608    0.363    0.085    2.505    0.118
IPI00328840.9    0.425    0.022    0.330    0.015    1.288    0.328
IPI00943181.1    0.601    0.183    1.441    0.221    0.417    0.015
IPI00297579.4    0.350    0.003    0.437    0.288    0.802    0.732
IPI00216694.3    0.438    0.022    0.305    0.007    1.438    0.189
IPI00216951.2    2.000    0.047    0.971    0.544    2.059    0.288
IPI00940656.1    0.746    0.344    0.178    0.109    4.201    0.133
IPI00917509.1    0.089    0.027    0.469    0.300    0.191    0.284
IPI00946353.1    1.440    0.472    1.825    0.023    0.789    0.441
IPI00219148.2    0.741    0.430    0.293    0.136    2.526    0.678
IPI00938044.1    0.554    0.316    0.104    0.020    5.349    0.371
IPI00555698.1    0.741    0.430    0.293    0.136    2.526    0.678
IPI00026271.5    1.380    0.340    2.080    0.001    0.663    0.176
IPI00479480.5    0.293    0.193    1.134    0.565    0.258    0.108
IPI00910662.1    2.856    0.016    0.786    0.360    3.633    0.146
IPI00021347.1    2.010    0.102    1.604    0.852    1.253    0.481
IPI00414860.6    1.082    0.709    2.249    0.063    0.481    0.031
IPI00555747.1    1.079    0.615    0.153    0.042    7.058    0.307
IPI00915917.1    2.483    0.265    0.939    0.579    2.643    0.014
IPI00218831.4    0.037    0.004    0.825    0.467    0.045    0.129
IPI00640363.1    0.043    0.003    0.954    0.538    0.045    0.129
IPI00291005.8    1.706    0.999    4.774    0.007    0.357    0.231
IPI00792715.1    0.408    0.074    1.062    0.967    0.384    0.089
IPI00784704.2    0.451    0.003    0.631    0.254    0.714    0.703
IPI00945930.1    0.289    0.208    0.229    0.122    1.263    0.839
IPI00909251.1    0.455    0.250    0.052    0.000    8.668    0.196
IPI00759613.2    1.267    0.647    2.466    0.028    0.514    0.352
IPI00375499.2    2.458    0.406    5.270    0.136    0.466    0.323

Table B.2 Ratios and P-values for the t-test of log2(NSAF) for selected MCF-10A cell protein biomarkers

ID               Ratio  P-value    Ratio  P-value    Ratio  P-value
               2h/ctrl  2h/ctrl  1h/ctrl  1h/ctrl    2h/1h    2h/1h
IPI00021812.2    1.006    0.984    0.924    0.493    1.089    0.532
IPI00795292.1    0.291    0.222    0.919    0.966    0.317    0.228
IPI00396378.3    2.045    0.152    1.876    0.013    1.090    0.953
IPI00169383.3    1.820    0.893    2.885    0.266    0.631    0.414
IPI00872379.1    1.019    0.418    3.186    0.167    0.320    0.018
IPI00302925.4    0.249    0.180    0.718    0.249    0.347    0.232
IPI00555744.6    0.411    0.066    0.538    0.164    0.765    0.655
IPI00759596.1    8.721    0.079   12.826    0.066    0.680    0.231
IPI00902463.1    0.951    0.697    0.594    0.006    1.602    0.075
IPI00789337.3    2.414    0.007    2.186    0.008    1.105    0.087
IPI00027569.1    3.267    0.172    4.666    0.137    0.700    0.243
IPI00910666.1    5.965    0.104    8.083    0.088    0.738    0.141
IPI00013452.1    1.350    0.391    2.290    0.264    0.589    0.539
IPI00853059.2    0.411    0.032    1.180    0.974    0.348    0.134
IPI00025842.1    0.622    0.039    1.354    0.237    0.459    0.026
IPI00456758.4    0.304    0.031    0.227    0.129    1.341    0.493
IPI00031169.1    1.074    0.573    0.448    0.021    2.398    0.030
IPI00877802.2    0.581    0.269    0.227    0.084    2.560    0.765
IPI00645329.1    0.581    0.269    0.227    0.084    2.560    0.765
IPI00873632.1    1.089    0.470    0.411    0.076    2.647    0.066
IPI00642732.2    1.535    0.307    0.391    0.229    3.932    0.007
IPI00642880.2    0.389    0.009    1.185    0.793    0.329    0.066
IPI00798089.1    1.173    0.536    0.444    0.217    2.643    0.003
IPI00794027.1    1.285    0.420    0.404    0.269    3.184    0.204
IPI00107531.1    0.702    0.140    0.211    0.096    3.329    0.140
IPI00292387.6   13.566    0.002    9.172    0.180    1.479    0.414
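As an illustration of how values like those in Tables B.1 and B.2 could be derived, the sketch below computes normalized spectral abundance factors (NSAF) and then a t statistic on log2(NSAF) values. For NSAF, each protein's spectral count is divided by its length and then by the sum of such length-normalized counts over all proteins in the run. The function names, the example numbers, and the choice of Welch's unequal-variance form are assumptions; the t-test variant used for the tables is not stated here, and converting t to a P-value would require a t-distribution function that is not shown.

```python
import math
from statistics import mean, variance


def nsaf(spectral_counts, lengths):
    """NSAF for one run: (SpC / L) of each protein divided by the sum of SpC / L."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


def welch_t(a, b):
    """Welch t statistic and degrees of freedom for two groups of replicates."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical run with two proteins: counts and sequence lengths are made up.
run = nsaf({"A": 10, "B": 20}, {"A": 100, "B": 200})

# Hypothetical NSAF replicates of one protein in control and 2 h samples.
ctrl = [math.log2(x) for x in (0.010, 0.012, 0.011)]
uv2h = [math.log2(x) for x in (0.006, 0.007, 0.008)]
ratio = 2 ** (mean(uv2h) - mean(ctrl))  # fold change on the original scale
t_stat, dof = welch_t(uv2h, ctrl)
```

Because log2 is undefined at zero, a protein with no spectra in a replicate would need a pseudocount or similar adjustment before this test; how that case was handled for the tables is not specified here.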
