Applicability of Multidimensional Fractionation to
Affinity Purification Mass Spectrometry Samples and
Protein Phosphatase 4 Substrate Identification
by
Wade Hampton Dunham
A thesis submitted in conformity with the requirements for the degree of Master of Science
Department of Molecular Genetics
University of Toronto
© Copyright by Wade Dunham 2012
Applicability of Multidimensional Fractionation to Affinity Purification Mass Spectrometry Samples and Protein
Phosphatase 4 Substrate Identification
Wade Dunham
Master of Science
Department of Molecular Genetics
University of Toronto
2012
Abstract
Affinity purification coupled to mass spectrometry (AP-MS) is gaining widespread use for the identification of protein-protein interactions. It is unclear, however, whether typical AP sample complexity limits the identification of all protein components using standard one-dimensional LC-MS/MS. Multidimensional sample separation is a useful approach for reducing sample complexity prior to MS analysis and increases peptide and protein coverage of complex samples, yet the applicability of this approach to AP-MS samples remains unknown. Here I present work showing that multidimensional separation of AP-MS samples is not a cost-effective method for increasing peptide or protein coverage in these sample types. As such, this approach was not adopted for the identification of putative protein phosphatase 4 (PP4c) substrates. Instead, affinity purification coupled to one-dimensional LC-MS/MS was used to identify putative PP4c substrates, and semiquantitative methods were applied to identify possible PP4c-targeted phosphosites in cells in which PP2A subfamily phosphatases were inhibited by okadaic acid treatment.
Acknowledgments
I would like to acknowledge and thank everyone in the Gingras lab, in particular my supervisor, Anne-Claude, for allowing me to take on complex projects and to be an integral part of the pursuit of high-impact publications. I would also like to thank our lab manager, Marilyn, for patiently telling me where to find much-needed reagents (initially, I likely asked her the same question at least once a month), and Brett, for his initial training and work, which helped me secure my first publication. I must also thank both of my committee members, Drs. Ben Blencowe and Thomas Kislinger, for their helpful insight and guidance over the course of my projects. The past two and a half years have definitely been a great learning experience. Thanks to everybody!
Table of Contents
Chapter 1: Introduction
1.1 General Introduction and thesis overview
1.2 Identification of Proteins by Mass Spectrometry
1.3 Affinity-purification Coupled to Mass Spectrometry (AP-MS)
1.4 Affinity-purification using epitope tags
1.5 Background contaminants in AP-MS
1.5.1 Strategies to remove the contaminants from the sample before mass spectrometry
1.5.2 Strategies to remove the contaminants during or after MS analysis: Label-free approaches
1.6 Fractionation of mass spectrometry samples
1.7 Utilizing MS for identification of protein phosphorylation
1.7.1 Phosphopeptide enrichment approaches and identification
1.7.2 Label free phosphopeptide quantification and phosphosite localization
1.8 PP2A subfamily phosphatases
1.8.1 PP4c biology and regulation
1.8.2 PP4c interactions and substrate identification
1.8.3 PP4c regulation of mRNA transcription and splicing
1.9 Thesis objectives
Chapter 2: A cost-benefit analysis of multidimensional fractionation of affinity purification-mass spectrometry samples
2.1 Methods
2.1.1 Generation and culture of stably transfected Flp-In T-REx 293 cell lines
2.1.2 Affinity purification
2.1.3 One dimensional (1D) LC-MS/MS analysis
2.1.4 Multidimensional LC-MS/MS analysis
2.1.4.1 MudPIT
2.1.4.2 RP/RP
2.1.4.3 GeLC
2.1.5 Data Analysis
2.2 Results
2.2.1 Reproducibility of protein identifications made by AP-MS
2.2.2 Effect of fractionating affinity purified samples on spectral count, unique peptide, and protein identification
2.2.3 Effect of fractionating affinity purified samples on protein complex component identification
2.2.4 Applying Significance Analysis of INTeractome (SAINT) to fractionated affinity purified samples
2.3 Discussion
2.3.1 Multidimensional fractionation of AP-MS samples appears to allow for a better depth of coverage of low level background proteins and not core protein complex components
2.3.2 Primary benefits and disadvantages of multidimensional fractionation of AP-MS samples
2.3.3 Applicability of multidimensional fractionation of AP-MS samples to the expansion of the PP4c network and substrate identification
2.3.4 Conclusions
Chapter 3: PP4c interactor/subunit phosphosite identification
3.1 Methods
3.1.1 Generation and culture of stably transfected Flp-In T-REx 293 cell lines
3.1.2 Affinity purification
3.1.3 Enrichment of phosphopeptides
3.1.4 LC-MS/MS analysis
3.1.5 Data Analysis
3.2 Results
3.2.1 PP4c interactor/subunit phosphosite identification
3.2.2 Phosphosite detection reproducibility
3.2.3 Changes in PP4c interactor phosphorylation upon PP2A subfamily phosphatase inhibition
3.2.4 Reproducibility of phosphosite quantification across biological replicates
3.3 Discussion
3.3.1 Discerning whether PP4c interactors are possible substrates for the enzyme
Chapter 4: Thesis Summary and Future Directions
4.1 Thesis Summary
4.2 Future Directions
4.2.1 PP4c substrate identification
4.3 Conclusions
References
List of Tables
Table 1-1. Select peptide, protein, and dual affinity tags successfully used for purification of recombinant proteins in AP-MS studies
Table 1-2. Classification of human protein phosphatases
Table 1-3. Gene and protein identifiers for PP4c, PP4c regulatory subunits, and interacting proteins investigated and discussed in detail in this thesis
Table 2-1. Summary of the mass spectrometry data for this project
Table 2-2. Spectral counts, unique peptides, and non-redundant protein identification (A) for all proteins identified in COPS5 samples after background contaminant removal, (B) for the COPS5 interactors reported in BioGRID and detected in our samples, or (C) for all proteins prior to background contaminant removal
Table 2-3. Spectral counts, unique peptides, and non-redundant protein identification for two biological replicate analyses of EIF4A2 and RAF1 (A) for all proteins after background contaminant removal, (B) for the interaction partners reported in BioGRID, or (C) for all protein hits prior to background contaminant removal
Table 2-4. Spectral counts, unique peptides, and non-redundant protein identifications for MEPCE samples (A) for all proteins identified after background contaminant removal, (B) for the interactors reported in BioGRID, and (C) for all protein hits prior to background contaminant removal
Table 2-5. Paired t-test analysis comparing enrichment of spectra, unique peptides and protein identification by RP-RP analysis of FLAG-eIF4A2, RAF1, and MEPCE samples, after background removal (A), or for BioGRID annotated interactors only (B)
Table 2-6. Proteins identified in "Core" interaction network of FLAG-COPS5 purifications that are not annotated as COPS5 interactors in BioGRID
Table 2-7. Fold increase in spectral counts (A), or unique peptides (B) for BioGRID-annotated COPS5 interactors
Table 2-8. A) Fold increase in spectral counts or unique peptides shown as the ratio of 2D/1D for BioGRID-annotated EIF4A2 interactors for each of the biological replicates analyzed (replicates annotated 1 and 2). B) Fold increase in spectral counts or unique peptides shown as the ratio of 2D/1D for BioGRID-annotated RAF1 interactors for each of the biological replicates analyzed (replicates annotated 1 and 2)
Table 2-9. Fold increase in spectral counts or unique peptides shown as the ratio of 2D/1D for BioGRID-annotated MEPCE interactors
Table 3-1. FLAG tagged constructs and stable cell lines generated for identification of putative PP4c substrates
Table 3-2. Summary table listing phosphopeptides identified for PP4c interacting proteins or regulatory subunits
Table 3-3. Quantification of phosphopeptides identified in biological replicate analysis of FLAG-DHX38
Table 3-4. Quantification of phosphopeptides identified in biological replicate analysis of FLAG-HTATSF1
Table 3-5. Quantification of SUPT5H phosphopeptides identified in biological replicate analysis of FLAG-SUPT4H
Table 3-6. Fold change in DHX38 phosphosite abundance upon okadaic acid treatment
Table 3-7. Fold change in HTATSF1 phosphosite abundance upon okadaic acid treatment
Table 3-8. Fold change in SUPT5H phosphosite abundance upon okadaic acid treatment
Table 3-9. Reproducibility of DHX38 phosphopeptide quantification across biological replicates
Table 3-10. Reproducibility of HTATSF1 phosphopeptide quantification across biological replicates
Table 3-11. Reproducibility of SUPT5H phosphopeptide quantification across biological replicates
List of Figures
Figure 1-1. Schematic overview of protein identification by mass spectrometry
Figure 1-2. Affinity purification coupled to mass spectrometry
Figure 1-3. Peptide fragmentation induced by collision-induced dissociation (CID)
Figure 1-4. Proteome Discoverer PhosphoRS algorithm phosphosite localization and scoring
Figure 1-5. Phylogenetic tree of PPP family of phosphatases
Figure 1-6. PP4c regulatory subunits
Figure 1-7. PP4c Network highlighting selective high-confidence interactions (SAINT > 0.8)
Figure 1-8. Network highlighting the interactions of SUPT5H-SUPT4H-RNGTT and PP4c-PP4R2-PP4R3A complex
Figure 1-9. Network highlighting the interactions of DHX38-PRP19-U5 snRNP and PP4c-PP4R2-PP4R3A complex
Figure 1-10. Network highlighting the interactions of SLC4A1AP-HTATSF1-U2 snRNP and PP4c-PP4R2-PP4R3A complex
Figure 1-11. PP4R3A targeting of transcription and splicing factors occurs through its EVH1 domain
Figure 1-12. Real-time PCR analysis of FOSB, JUNB mRNA expression
Figure 2-1. Sample preparation
Figure 2-2. Venn diagrams showing protein identification overlap for COPS5
Figure 2-3. Venn diagram of SAINT result overlap in (A) EIF4A2 and (B) RAF1 sample analysis
Figure 3-1. Experimental workflow for PP4c interacting protein or regulatory subunit phosphopeptide identification
Figure 3-2. Venn diagrams showing phosphosite identification overlap for DHX38, HTATSF1, and SUPT5H
Figure 3-3. Venn diagrams showing PhosphoSitePlus phosphosite identification overlap for DHX38, HTATSF1, and SUPT5H
Figure 3-4. DHX38 protein diagram illustrating phosphosites identified in biological replicate analysis of FLAG-DHX38
Figure 3-5. HTATSF1 protein diagram illustrating phosphosites identified in biological replicate analysis of FLAG-HTATSF1
Figure 3-6. SUPT5H protein diagram illustrating phosphosites identified in biological replicate analysis of FLAG-SUPT4H
Figure 3-7. Quantification of phosphopeptides reproducibly identified in biological replicate analysis of FLAG-DHX38
Figure 3-8. Quantification of phosphopeptides reproducibly identified in biological replicate analysis of FLAG-HTATSF1
Figure 3-9. Quantification of SUPT5H phosphopeptides reproducibly identified in biological replicate analysis of FLAG-SUPT4H
Figure 3-10. Gel shift assay for identifying putative PP4c substrates by monitoring protein dephosphorylation in the absence of PP4c
List of Appendices
Appendix 1. Effect of siRNA directed depletion of PP4c on gene expression and RNA splicing
A1.1 Materials and Methods
A1.1.1 Cell culture and siRNA directed PP4c depletion
A1.1.2 Western blotting
A1.1.3 RNA isolation, reverse transcription, PCR, and qRT-PCR
A1.1.4 RNAseq analysis of alternate exon inclusion
A1.1.5 Splicing Assay
A1.1.6 Data Analysis
A1.2 Results
A1.2.1 Effect of PP4c depletion on gene expression
A1.2.2 Effect of PP4c depletion on RNA splicing
A1.3 Discussion
A1.3.1 siRNA off-target effects and proper siRNA controls
A1.4 Conclusions and Future Directions
A1.4.1 PP4c regulation of Transcription and Splicing
Appendix 1. List of Tables
Table A1-1. PCR primers used for gene expression analyses by qRT-PCR
Table A1-2. RNAseq genes demonstrated to be differentially expressed upon siPP4c ± EGF treatment selected for validation by qRT-PCR
Table A1-3. qRT-PCR validation of genes observed to be differentially expressed upon PP4c depletion (by RNAseq) in the presence or absence of epidermal growth factor (EGF) treatment
Table A1-4. Raw data from qRT-PCR validation of genes observed to be differentially expressed upon PP4c depletion (by RNAseq) in the presence or absence of epidermal growth factor (EGF) treatment
Table A1-5. RNAseq genes demonstrated to be differentially spliced upon siPP4c ± EGF selected for validation by RT-PCR assay
Appendix 1. List of Figures
Figure A1-1. PP4c protein and mRNA levels after siRNA and EGF treatment
Figure A1-2. FOSB expression upon siRNA and epidermal growth factor (EGF) treatment
Figure A1-3. JUNB expression upon siRNA and epidermal growth factor (EGF) treatment
Figure A1-4. NUMA1 and WBSCR1 alternate exon inclusion after siRNA and EGF treatment
Figure A1-5. NUMA1 alternate exon inclusion after siRNA and EGF treatment
Figure A1-6. WBSCR1 alternate exon inclusion after siRNA and EGF treatment
Appendix 2. A cost-benefit analysis of multidimensional fractionation of affinity purification-mass spectrometry samples
A2.1 Supplementary Tables
Table S2-1. A) List of the proteins removed because they are listed as "frequent fliers" in our internal FLAG AP-MS HEK293 cell database (contains >1000 AP-MS analyses). B) Proteins removed from subsequent analysis because they were detected in FLAG alone negative controls
Table S2-2. Proteins identified in FLAG-COPS5 purifications after background removal
Table S2-3. Proteins identified in FLAG-EIF4A2 purifications after background removal
Table S2-4. Proteins identified in FLAG-RAF1 purifications after background removal
Table S2-5. Proteins identified in FLAG-MEPCE purifications after background removal
Table S2-6. SAINT analysis of proteins identified in FLAG-EIF4A2 or FLAG-RAF1 purifications
Chapter 1 Introduction
1.1 General Introduction and Thesis Overview
In 2005, the Aebersold lab, where Dr. Gingras was a postdoctoral fellow, discovered a novel mammalian trimeric complex containing protein phosphatase 4 (PP4c), a serine-threonine phosphatase conserved throughout eukaryotic evolution and involved in resistance to cisplatin, one of the oldest anticancer drugs [1]. This trimeric complex was shown to consist of PP4c and the PP4c regulatory subunit PP4R2, together with one of two novel proteins, PP4R3A or PP4R3B, now known to function in PP4c substrate targeting. Gingras et al. [1] further demonstrated that a similar complex, composed of the PP4c ortholog Pph3, the PP4R2 ortholog Ybl046w and the PP4R3A ortholog Psy2, is functional in yeast, and that its deletion reduces cell viability following cisplatin-induced DNA damage. In addition, they showed that mammalian PP4R3A was able to reverse the cisplatin hypersensitivity of a psy2∆ yeast strain, indicating that the human and yeast proteins are functionally equivalent, and that a reduction in PP4R3A activity in Drosophila (flfl encodes the fly homolog of the PP4R3A and Psy2 proteins) renders flies hypersensitive to cisplatin. Together, these findings suggest that this PP4c-containing trimeric complex may play a conserved role in DNA damage repair from yeast to higher eukaryotes, in addition to facilitating cisplatin resistance in mammalian cells. Hereafter this PP4c trimeric complex will be referred to as PP4c-PP4R2-PP4R3A.
To begin to understand how the PP4c-PP4R2-PP4R3A complex is linked to the cisplatin resistance phenotype and to uncover its physiological role in mammalian cells, Ginny Chen, a graduate student in the Gingras lab, extensively characterized the cellular context in which the components of this complex reside using affinity purification coupled to mass spectrometry (AP-MS). Ginny discovered that the components of the PP4c-PP4R2-PP4R3A trimeric complex, especially PP4R3A, associate with components of the splicing and transcription elongation machineries. PP4R3A was also observed to localize to nuclear speckles (thought to be storage sites for the transcription and splicing machineries) in a transcription-dependent manner.
Additionally, she demonstrated a positive role for PP4c in regulating mRNA transcription following epidermal growth factor (EGF) stimulation, as well as a role for PP4c in regulating RNA splicing (both explained in more detail in section 1.8). Based on these discoveries, I postulated that, in concert with its associated partners PP4R2 and PP4R3A, PP4c may serve as a master regulator of splicing and transcription.
I begin this thesis with a general overview of protein identification by mass spectrometry and AP-MS (most of which is part of a review in Proteomics; in press), as these methods were used to generate the data presented in chapters 2 and 3, and were used by Ginny Chen to generate the PP4c interaction network presented in section 1.8. Next, I discuss the applicability of multidimensional fractionation methods, which have been shown to increase peptide and protein coverage of complex MS samples, to the identification of additional protein complex components in AP-MS samples (presented in chapter 2 and published in Proteomics [2]). At the outset of my studies, I was interested in testing whether fractionation of AP-MS samples could uncover new components of protein complexes. If this were the case, I would have re-interrogated the PP4c interactome in an attempt to expand the PP4c interaction network generated by Ginny Chen and further our understanding of PP4c regulation of transcription and splicing. I then discuss methods for identifying protein phosphosites using mass spectrometry and for label-free quantification of changes in protein phosphorylation, techniques I used as an initial step towards determining the enzyme-substrate relationship between PP4c and its interaction partners (presented in chapter 3). I end this introduction with background on PP4c and a summary of the work done by Ginny Chen, setting the stage for my thesis rationale and a presentation of my thesis objectives. My thesis summary and future directions are presented in chapter 4. The role of PP4c in regulating mRNA transcription and splicing is further investigated using PCR-based assays and presented in appendix 1.
1.2 Identification of Proteins by Mass Spectrometry
Mass spectrometry (MS) has become the analytical technique of choice for the identification of proteins in biological samples. In general terms, mass spectrometers