The Study of Macromolecular Complexes by Quantitative Proteomics

Jeff Ranish

Proteomics: the systematic study of the protein complement of the cell

Patterson and Aebersold, Nat. Gen., 2003

Many cellular functions are carried out by proteins in complexes

Response to environmental signals (Koretzky, G., Myung, P., Nature Immunology Rev., 2001)

Nuclear transport (Rout, M.P., Aitchison, J.D., J.B.C., 2001)

RNA synthesis (Shilatifard, A., FASEB J, 1998)

Transcription factor complexes orchestrate the control of gene expression

[Figure labels: Chromatin remodeling complexes; Activators/Repressors; Pol II machinery; mRNA]

Distinct transcription complexes regulate expression of specific genes

Outline

I. Introduction to the study of macromolecular complexes by mass spectrometry

II. Analysis of macromolecular complexes using quantitative MS

 A human sequence-specific DNA binding transcription factor

 An RNA polymerase II transcription complex

 A new component of the transcription machinery

 Changes in transcription factor complex composition during development

III. Future directions

 Transcription complexes in chromatin

 Targeted MS approaches to study complexes

IV. Conclusions

Macromolecular complex analysis by mass spectrometry

Step 1. Purify complex from cell extracts

Separate based on physical, chemical and/or biochemical properties

e.g., ion exchange chromatography, gel filtration chromatography, density gradients, affinity interaction chromatography (antibodies, nucleic acids, …)

Step 2. Analyze purified sample by MS

Step 3. Evaluate results using informatics tools

Kumar and Snyder, Nature, 2002

Isolating interacting proteins by affinity chromatography: epitope tags

[Figure: epitope tag fused to the protein of interest]

Epitope tag                  Composition         Affinity matrix
FLAG                         DYKDDDDK            FLAG antibody
HA                           YPYDVPDYA           HA antibody
c-MYC                        EQKLISEEDL          c-MYC antibody
6XHIS                        HHHHHH              Immobilized metal affinity (IMAC)
Biotinylation signal         78 amino acids      avidin/streptavidin
Strep binding                10-50 amino acids   avidin/streptavidin
Protein A                    137 amino acids     IgG
Calmodulin binding peptide   26 amino acids      Calmodulin

Isolating interacting proteins: Tandem affinity purification (TAP)

[Figure: TAP tag (Protein A tag, TEV protease cleavage site, Calmodulin-BD) fused to the protein of interest; affinity steps on IgG-sepharose and Calmodulin-sepharose]

TAP purification of U1snRNP complex

[Figure: gel, lanes 1-8, comparing extract, TAP, and WT samples with and without IgG beads, TEV cleavage, and calmodulin beads; MW marker lane and TEV protease band indicated]

Large scale studies have characterized protein interactions associated with >2000 yeast proteins

Gavin, Nature, 2002; Ho, Nature, 2002; Krogan, Nature, 2006

Current limitations of mass spectrometry-based protein complex analysis

• Difficult to distinguish specific complex components from non-specific proteins without extensive purification

• Loosely associated factors may be lost during extensive purification

• Static representation of complex composition

• No information about subunit stoichiometry

NCDIR - APPROACH

Chait, Aitchison & Rout, Nature Methods, 2007

• Protein A Tag Macromolecular Complexes

• Complexes Preserved by Freezing & Cryolysis

• Rapid Isolation (5-30 minutes) Preserves Complexes

• MS analysis to determine complex:
  COMPOSITION
  MODIFICATIONS

A quantitative MS approach to complex characterization

Ranish, Nat. Gen., 2003

a) control purification vs. specific purification
b) purification from cell state 1 vs. purification from cell state 2

Both workflows follow the same steps:
differentially label (isotopically normal vs. isotopically heavy)
combine
proteolysis
sample complexity reduction
LC-MS/MS
identify peptides and quantify their relative levels by measuring peak ratios

[Figure: paired peaks, relative abundance (0-100) vs. m/z. Panel a: complex-specific (enrichment) vs. non-specific peptides. Panel b: condition-specific vs. invariant peptides.]

a) distinguish specific complex components from co-purifying proteins
b) detect changes in complex abundance or composition

I-DIRT

Tackett, JPR, 2005

Mousson, F. (2008) Mol. Cell. Proteomics 7: 845-852

Quantitation: Labeling vs. Label-free

L. Muller et al. Nat. Methods 2007

Isotopic labeling strategies for MS-based quantitative proteomics

[Figure: MS1 spectrum, intensity vs. m/z. Modified from CEBI web site]

Stable Isotope Tagging by Metabolic Labeling (e.g., SILAC)

 Cells are grown on media containing isotopically heavy or normal nutrients, e.g., lysine and arginine

Strengths
• Simple in vivo labeling protocols
• Minimal sample handling
• Potentially all peptides labeled

Weaknesses
• Compatible with selected species, samples only
• Label potentially metabolized
• Labeling potentially perturbs biological system
• No inherent sample enrichment

 detect changes in complex composition (purification from cell state 1 vs. purification from cell state 2)

Stable Isotope Tagging by Chemical Reaction (e.g., ICAT, ICPL, N-isotag)

Labeling at the protein or peptide level, after cell lysis. Most reagents target amines or sulfhydryl groups

[Figure: structure of the ICAT reagent, showing the biotin tag, the linker (X = hydrogen or deuterium), and the thiol-specific reactive group. Δmass = 8 daltons]
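As a rough illustration of what the 8 dalton mass difference means in a spectrum, the sketch below computes the expected m/z spacing between the light and heavy forms of an ICAT-labeled peptide. It assumes the nominal 8 Da shift per labeled cysteine quoted for the reagent and ignores the small mass defect of deuterium. The function name and the example charge states are illustrative and do not come from any published tool.

```python
def icat_pair_spacing(n_cysteines: int, charge: int, delta_mass: float = 8.0) -> float:
    """Expected m/z gap between the light and heavy ICAT forms of one peptide.

    delta_mass is the nominal mass difference per labeled cysteine (assumed 8 Da).
    """
    if n_cysteines < 1 or charge < 1:
        raise ValueError("need at least one labeled cysteine and a positive charge")
    return n_cysteines * delta_mass / charge


# Spacing for a single-cysteine peptide at common charge states
for z in (1, 2, 3):
    print(f"1 Cys, charge {z}+: {icat_pair_spacing(1, z):.2f} m/z")
```

Under these assumptions, a pair of co-eluting peaks about 2.7 m/z apart would be consistent with a triply charged peptide carrying a single labeled cysteine.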

[Figure: N-isotag reagent structures, GABA NHS and tBoc-Leucine NHS (heavy or normal). X = C12 or C13, *N = N14 or N15. Δmass = 7 daltons]

Stable Isotope Tagging by Chemical Reaction (e.g., ICAT, ICPL, N-isotag)

Strengths
• Compatible with any protein source
• Different specificities can be designed into reagent
• Selective tagging reduces sample complexity

Weaknesses
• Chemical reactions required
• Sample handling
• Tag might interfere with MS or MS/MS
• Potential for side reactions, incomplete reactions

 monitoring complex enrichment (control purification, e.g., control antibody, vs. specific purification, e.g., specific antibody)

iTRAQ-isobaric tags

 MS2-based quantification

Ross, P. L. (2004) Mol. Cell. Proteomics 3: 1154-1169

Stable Isotope Tagging by Chemical Reaction-iTRAQ

Strengths
• Multiplexed, up to 8
• Compatible with any protein source
• Sample complexity reduction in the MS1 dimension
• Not necessary to reconstruct ion chromatograms for quantification

Weaknesses
• Chemical reactions required
• Sample handling
• Need to detect reporter ions in low m/z range
• Ion must be selected for CID to quantify

Label-free Method: Spectral Counting

N unique = 5 N spectra = 14

protein sequence

MESSPFNRRQWTSLSLRVTAKELSLVNKNKSSAIVEIFSKYQKAAEETNMEKKRSNTENLSQHFRKGTLTVLKKKWENP
GLGAESHTDSLRNSSTEIRHRADHPPAEVTSHAASGAKADQEEQIHPRSRLRSPPEALVQGRYPHIKDGEDLKDHSTES
KKMENCLGESRHEVEKSEISENTDASGKIEKYNVPLNRLKMMFEKGEPTQTKILRAQSRSASGRKISENSYSLDDLEIG
PGQLSSSTFDSEKNESRRNLELPRLSETSIKDRMAKYQAAVSKQSSSTNYTNELKASGGEIKIHKMEQKENVPPGPEVC
ITHQEGEKISANENSLAVRSTPAEDDSRDSQVKSEVQQPVHPKPLSPDSRASSLSESSPPKAMKKFQAPARETCVECQK
TVYPMERLLANQQVFHISCFRCSYCNNKLSLGTYASLHGRIYCKPHFNQLFKSKGNYDEGFGHRPHKDLWASKNENEEI
LERPAQLANARETPHSPGVEDAPIAKVGVLAASMEAKASSQQEKEDKPAETKKLRIAWPPPTELGSSGSALEEGIKMSK
PKWPPEDEISKPEVPEDVDLDLKKLRRSSSLKERSRPFTVAASFQSTSVKSPKTVSPPIRKGWSMSEQSEESVGGRVAE
RKQVENAKASKKNGNVGKTTWQNKESKGETGKRSKEGHSLEMENENLVENGADSDEDDNSFLKQQSPQEPKSLNWSSFVD
NTFAEEFTTQNQKSQDVELWEGEVVKELSVEEQIKRNRYYDEDEDEE

Label-free approaches: Spectral counting

• normalized count of the number of correctly identified spectra per protein (Zybailov et al., J. Proteome Res., 2006; Choi et al., MCP, 2008)

• robust for high abundance proteins and simple mixtures
• simple to perform
• can compare any appropriately matched samples

• indirect assessment of abundance
• accuracy may be compromised by duty cycle limitations
• sample handling

Label-free approaches: Intensity-based quantification

• Quantification is performed computationally by comparing ion intensities in sequential runs
• Any appropriately matched samples can be compared

• Computationally intensive
• Accurate measurements require:
  • multiple MS runs
  • high mass accuracy
  • reproducible chromatography

Intensity-based quantification

• Open source platforms are available
  • msInspect (Bioinformatics, 2006. 22(15): p. 1902-9)
  • MZmine (Bioinformatics, 2006. 22(5): p. 634-6)
  • SpecArray (Mol Cell Proteomics. 2005 (9):1328-40)
  • SuperHirn (Proteomics, 2007 (19):3470-80)
• Platforms available through instrument vendors
  • Waters Protein Expression Informatics (Anal. Chem. 2005, 77, 2187-2200)

How is the Muscle Creatine Kinase gene regulated?

[Figure: map of the MCK gene (0-14 kb scale) showing exons 1-8, MR-1, the enhancer (Enh) and promoter (P), and the 206 nt enhancer with its control elements, including the E-boxes]

Transcription factor binding to the MCK enhancer results in muscle-specific expression of the MCK gene

Charis Himeda, Steve Hauschka

Experimental strategy

incubate HeLa nuclear extract with DNA-coupled beads (wt Trex and mt Trex)
apply magnet, remove unbound proteins, wash
elute bound proteins with high salt
differentially label with ICAT reagents
analyze by quantitative mass spectrometry

[Diagram label: TrexBF]

Himeda, Mol. Cell. Biol., 2004

Protein composition and activity of DNA affinity purified samples

[Figure: SDS-PAGE (molecular weight markers from 6.5 to 180 kD) and gel-shift assay of the DNA affinity purified samples; TrexBF indicated]

Distribution of abundance ratios

[Figure: histogram of number of proteins (0-200) vs. log10[abundance ratio (specific/control)], x-axis from -0.30 to >0.35]

 868 proteins or protein groups identified using SEQUEST, Peptide Prophet and Protein Prophet (P > 0.90), and quantified by Xpress
 3 proteins with abundance ratios >2 are enriched in the Trex specific purification

Trex binding factor candidates

Protein       Peptide                d8/d0
annexin a7    CYQSEFGRDLEK           2.7 : 1
CNBP (+1)     CGESGHLAK              2.1 : 1
              DCDLQEDACYNCGR         2.6 : 1
              GFQFVSSSLPDICYR        2.9 : 1
              CYSCGEFGHIQK           10.7 : 1
              CGESGHLAR              7.0 : 1
              CGETGHVAINCSK          9.2 : 1
Six4          YVLDGMVDTVCEDLETDKK    2.4 : 1

TrexBF in mouse skeletal myocytes is Six4
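The peptide-level d8/d0 ratios listed for the Trex binding factor candidates can be rolled up to one value per protein and compared with the >2 enrichment cutoff used for the abundance ratio distribution. The sketch below is only one way to do this. Using the median is an assumption made here for illustration (the slides state that Xpress was used for quantification but not how peptide ratios were combined), and the names `peptide_ratios` and `summarize` are hypothetical.

```python
import math
from statistics import median

# d8/d0 peptide ratios as listed in the candidate table
peptide_ratios = {
    "annexin a7": [2.7],
    "CNBP (+1)": [2.1, 2.6, 2.9, 10.7, 7.0, 9.2],
    "Six4": [2.4],
}


def summarize(ratios, threshold=2.0):
    """Return (median ratio, log10 of the median, passes the threshold?)."""
    med = median(ratios)
    return med, math.log10(med), med > threshold


summary = {name: summarize(r) for name, r in peptide_ratios.items()}
for name, (med, log_med, enriched) in summary.items():
    print(f"{name:12s} median={med:.2f}  log10={log_med:.2f}  enriched={enriched}")
```

A ratio of 2 corresponds to log10 of about 0.30, so all three candidates fall in the right-hand tail of a log10 ratio histogram under this roll-up.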

[Figure: gel-shift assay with the Trex probe, lanes 1-12, using mouse skeletal myocyte extract and Six4; competitors: none, Trex, MEF3, mt; antibodies: none, αSix4, CTL; TrexBF indicated]

Six4 stimulates transcription from the Trex site

[Figure: relative CAT activity with and without Cand.3. Panels: MM14 skeletal myocytes (two panels) and neonatal rat myocardiocytes. Reporter constructs: (+Trex)6TKCAT, (-M1)5TKCAT, (-enh)80MCKCAT, (-enh-M1)80MCKCAT]

Quantitative mass spectrometry permits the identification of specific complex components in partially purified samples

[Figure: SDS-PAGE (molecular weight markers from 6.5 to 180 kD) with Six4, CNBP, and Annexin a7 indicated]

DNA affinity purification/qMS
Himeda CL, et al. Mol Cell Biol. 2008
Qi Y, et al. Proc Natl Acad Sci U S A. 2008
Rubio ED, et al. Proc Natl Acad Sci U S A. 2008

The RNA Polymerase II core machinery

[Figure: the Pol II core machinery at a TATA-containing promoter, with an activator and chromatin remodeling complexes. Polypeptide counts as labeled in the figure:]

Component     Polypeptides
SRB/Meds      24
TAFs          14
Pol II/IIF    15
TBP           1
TFIIA         2
TFIIH         9
TFIIB         1
TFIIE         2

 Responsible for expression of all mRNAs in the nucleus
 ~68 polypeptides are thought to be recruited to promoters as part of the core transcription machinery

Use of a TBP mutant extract and ICAT to guide identification of pol II core transcription factors

[Figure: immobilized promoter templates with activator (ACT) binding sites, a TATA box, and a Pst I site, incubated with TBP(I143N) nuclear extract and activator, with or without rTBP; holoenzyme, TAFs, and IIA shown on the template]

Pst I elute
isotopically label (H or L)
combine
proteolyze
fractionate
μLC/MS/MS

Ranish et al., Nature Gen., 2003

Protein composition after affinity purification

[Figure: A, silver stain (lanes 1-2, molecular weight markers from 10 to 220 kD); B, western blot (lanes 1-3) with ASAPratio values]

Protein            ASAPratio
TBP                51
TOA2 (TFIIA)       >24
TFIIB              5.0
SRB4 (mediator)    2.0
KIN28 (TFIIH)      5.4

Li et al., Anal. Chem., 2003

• 252 proteins quantified by ASAPratio
• 47/57 proteins with enrichment ratios > 2 and p < 0.1 are known core PIC components (70%)
• 17/18 GTFs, 7/7 Pol II-specific subunits, 10/14 TAFs, 12/24 mediator subunits
• 5 proteins with annotated roles in Pol II transcription
• 5 potential new PIC components

Comparison of amine labeling approaches

Experiment       PIC proteins identified       Total proteins identified
                 number    percent of PIC      number    percent PIC
ACE1-N-isotag    99        90.8%               287       34.5%
GCN4-N-isotag    94        86.2%               239       39.3%
ACE1-iTRAQ       105       96.3%               418       25.1%
GCN4-iTRAQ       105       96.3%               418       25.1%
TOTAL            108       99.1%               530       20.4%
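The two percentage columns in the comparison can be recomputed from the counts as a consistency check. The sketch below assumes a reference set of 109 PIC proteins; that number is not stated on the slide and is inferred here from 99 proteins corresponding to 90.8%. The variable and function names are illustrative.

```python
# (experiment, PIC proteins identified, total proteins identified), from the table
rows = [
    ("ACE1-N-isotag", 99, 287),
    ("GCN4-N-isotag", 94, 239),
    ("ACE1-iTRAQ", 105, 418),
    ("GCN4-iTRAQ", 105, 418),
    ("TOTAL", 108, 530),
]

# Assumption: size of the reference PIC protein set, inferred from 99 / 0.908
N_REFERENCE_PIC = 109


def percentages(n_pic, n_total, n_reference=N_REFERENCE_PIC):
    """Return ("percent of PIC", "percent PIC"), each rounded to one decimal."""
    coverage = 100.0 * n_pic / n_reference  # share of the reference PIC set found
    purity = 100.0 * n_pic / n_total        # share of all identifications that are PIC
    return round(coverage, 1), round(purity, 1)


for name, n_pic, n_total in rows:
    coverage, purity = percentages(n_pic, n_total)
    print(f"{name:14s} {coverage:5.1f}% of PIC   {purity:5.1f}% PIC")
```

Read this way, the first percentage measures how much of the known machinery each experiment recovers, and the second measures how much of each identification list is known machinery.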

Jie Luo

TSP: a new component of the transcription machinery?

• Sequest identifies 2 overlapping cysteine-containing tryptic peptides from TSP

[Figure: ion chromatograms (relative abundance vs. scan number) at m/z = 568.1 and m/z = 570.8]

Xpress ratios, Light : Heavy = 1.0 : 2.0 (0.23)

Composition of core TFIIH in the absence of TSP

John Leppard

[Figure: SILAC labeling of TFB2-FLG strains (Δarg4, Δlys1): tspΔ extract (light) vs. TSP extract (heavy; d10ARG, d8LYS)]

Detect changes in complex composition by quantitative MS

[Figure: Xpress quantification of core TFIIH subunits (TFB1, TFB2, TFB4, SSL1, SSL2, TSP) and core TFIIH architecture; label "TSP (9)"]

Chromatin Remodeling Complexes

BAF chromatin remodeling complexes contain one of two ATPases, Brg or Brm (Brahma), and ~10 core subunits

Required for pluripotency and self renewal in ES cells but not for proliferation of fibroblasts and other cell types.

Hypothesize the existence of BAF complexes with distinct subunit composition in different cell types

From Gary M. Halliday et al. Int J Biochem Cell Biol (2008)

Lena Ho, Gerald Crabtree, Stanford
Alexey Nesvizhskii, Univ. Michigan

Embryonic Stem Cell BAFs

Goal: to understand the role of BAF complexes in pluripotency

Step 1: determine the composition of BAF complexes in ES cells

Compared complexes purified from nuclear extracts from:

Mouse Embryonic Stem Cells (ES)
Mouse Embryonic Fibroblast (MEF)
P0 Mouse Brain, Neurons (Neurons)

Identification of BAF components

Nuclear extracts from mouse Embryonic Stem Cells, Mouse Embryonic Fibroblast, Neurons (brain)

Affinity Purification of BAF complexes with anti-Brg/Brm antibody

Trypsin digestion

Strong Cation Exchange Fractionation

ESI-MS/MS (LTQ Orbitrap)

Peptide/protein identification (Trans-Proteomic Pipeline)

Spectral counting-based quantification and computational analysis

Identification of BAF components

Identification of BAF components by AP-MS

Comparison of BAF-associated proteins reveals common and cell-type specific components in ES, MEF, Neurons

Quantification of BAF Complex Components by Spectral Counting

Protein abundance is estimated by the number of spectra acquired for each protein, normalized to account for protein length and to the total spectra in each dataset

Immunoblotting vs. Adjusted Spectral Counts

Immunoblotting confirms differential expression of BAF155 and BAF170 in ES, MEFs, and Brain

[Figure panels: immunoblotting; spectral counting]

Good agreement between immunoblotting and spectral count-based quantification

esBAF Complex

Spectral quantification of core BAF components

(normalized for protein length, and total number of spectra in each dataset)
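One common way to carry out this kind of normalization is the normalized spectral abundance factor (NSAF): each protein's spectral count is divided by its length, and the result is divided by the sum of those length-corrected values over the dataset. The sketch below uses NSAF as a stand-in. Whether the BAF analysis used exactly this formula is an assumption, and the protein names, counts, and lengths are hypothetical values chosen only to show the effect of the length correction.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein in one dataset.

    spectral_counts: protein -> number of identified spectra
    lengths: protein -> protein length in amino acids
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


# Hypothetical counts and lengths, for illustration only
counts = {"protein_A": 14, "protein_B": 30, "protein_C": 6}
lengths = {"protein_A": 700, "protein_B": 1500, "protein_C": 300}

values = nsaf(counts, lengths)
for protein, value in values.items():
    print(f"{protein}: NSAF = {value:.3f}")
```

In this made-up example the three proteins have very different raw counts but identical spectra-per-residue, so they receive the same NSAF, which is the behavior one wants when comparing subunits of different sizes.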

Composition of BAF complexes from ES cells

Transcription complexes in chromatin

Challenges for the comprehensive analysis of gene-specific transcription complexes

• Need an efficient way to isolate the complexes in a form that is amenable to MS

• Gene-specific complexes are present at ~single copy per cell

• only ~100 fmol in 3 liters of cells

• the limit of detection in a “shotgun” MS experiment is ~10 fmol

• the complexes are dynamic

Chromatin isolation
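The yield figures quoted for gene-specific complexes (about one copy per cell, about 100 fmol from 3 liters, and a detection limit near 10 fmol) can be related to each other with a short calculation. The sketch below is back-of-the-envelope arithmetic only. The implied cell density is derived here and is not stated in the slides, recovery is assumed to be lossless, and the function name is illustrative.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def cells_for_femtomoles(fmol, copies_per_cell=1.0):
    """Number of cells needed to supply `fmol` femtomoles of a complex."""
    return fmol * 1e-15 * AVOGADRO / copies_per_cell


cells = cells_for_femtomoles(100)      # ~100 fmol at single copy per cell
density_per_ml = cells / 3000.0        # spread over 3 liters of culture
headroom = 100 / 10                    # material relative to a ~10 fmol limit

print(f"cells needed: {cells:.2e}")
print(f"implied density: {density_per_ml:.2e} cells/mL")
print(f"fold above detection limit: {headroom:.0f}")
```

Under these assumptions, 3 liters of culture provide only about a tenfold margin over the shotgun detection limit before any purification losses, which is why an efficient isolation step matters.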

[Figure: two plasmid maps, each with ARS1, TRP1, and LacO bound by 3XFLAG-LacI; inserts labeled "gene X" and "promoter gene X-GFP"]

Grow cells under specific environmental conditions in presence of isotopically heavy or light lysine and arginine

Anti-FLAG immuno-purify

Elute bound complexes

Analyze composition and changes in composition by quantitative MS

Kinetochore purification scheme

Bungo Akiyoshi, Sue Biggins

[Figure: centromeric minichromosome carrying CEN3, TRP1, ARS1, and LacO, bound by LacI-3XFLAG]

5 liters of cells
Cryolysis
Resuspend in buffer
Ultracentrifuge
Extract
Anti-FLAG IP
Wash beads
Elute with FLAG peptide or SDS
SCX fractionate and analyze by MS/MS

Proteins >3-fold enriched on centromeric minichromosomes

[Table contents not recovered; columns: Name, Function, %Coverage, #unique, enrichment ratio]

*Detected >90% of known core kinetochore proteins by MS

Akiyoshi, Genes Dev., 2009

Targeted MS using Selected Reaction Monitoring

[Figure: selected reaction monitoring. The first m/z selection corresponds to peptide x; after fragmentation, the second m/z selection corresponds to a fragment from peptide x, giving a diagnostic signal for protein X. Domon et al., Science (2006)]

• Two levels of mass selection: high selectivity
• Selective scanning, short duty cycle: high sensitivity, reproducibility
• The most sensitive mass spectrometry method known (low amol)

The transcription factor SRM assay

After optimization and validation our assay now includes
- 420 proteins
- 1539 peptides
- 4615 transitions (Q1/Q3), 3 transitions/peptide
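To make the Q1/Q3 terminology concrete, the sketch below builds three candidate transitions for one peptide: Q1 is the precursor m/z and each Q3 is the m/z of a singly charged y fragment. This is a minimal illustration and not the assay's actual selection procedure, which the slide describes as optimized and validated. The peptide ELVISLIVESK is hypothetical, choosing the three largest y ions is an arbitrary placeholder rule, and cysteine is left out of the mass table because its mass depends on how it was modified.

```python
# Monoisotopic residue masses in daltons (cysteine omitted on purpose)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def precursor_mz(peptide, charge):
    """Q1: m/z of the intact peptide at the given charge."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge


def y_ion_mz(peptide, n):
    """Q3: m/z of the singly charged y ion made of the last n residues."""
    return sum(RESIDUE_MASS[aa] for aa in peptide[-n:]) + WATER + PROTON


def transitions(peptide, charge=2, n_transitions=3):
    """Return (Q1, fragment label, Q3) tuples for the largest y ions."""
    q1 = round(precursor_mz(peptide, charge), 3)
    sizes = range(len(peptide) - 1, len(peptide) - 1 - n_transitions, -1)
    return [(q1, f"y{n}", round(y_ion_mz(peptide, n), 3)) for n in sizes]


for q1, label, q3 in transitions("ELVISLIVESK"):  # hypothetical peptide
    print(f"Q1 {q1:9.3f}  ->  {label:3s} Q3 {q3:9.3f}")
```

With three transitions per peptide, a list of this shape for 1539 peptides would hold on the order of 4600 rows, which matches the scale of the assay described above.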

Hamid Mirzaei, Paola Picotti, Ruedi Aebersold

Control of FLO11 expression

[Figure: nutrient/environment signals and Ras2 acting through three pathways on FLO11 regulators]

Pathway                Regulators
Snf1 kinase pathway    Nrg1, Nrg2
cAMP/PKA pathway       Flo8, Sfl1
Kss1/MAPK pathway      Ste12/Tec1

Rupp et al., EMBO, 1999

            FLO11 repressed       FLO11 activated
            (round form cells)    (elongation and invasion)
Haploid     high glucose          low glucose
Diploid     high nitrogen         low nitrogen

Application of the assay to identify potential regulators of the yeast FLO11 gene

[Figure: regulators plotted against promoter segments; scale from promoter enriched to control enriched]

Systematically measured the binding preference of 222 TFs for each promoter segment by SRM

Advantages of quantitative mass spectrometry for complex analysis

 Quantitative measurement increases confidence in identification of complex components

 Permits identification of complex components without the need for extensive purification

 Permits detection of quantitative changes in complex composition and abundance

 Stoichiometry measurements possible with absolute quantification

Limitations of quantitative mass spectrometry for complex analysis

Labeling
 Extra steps needed to incorporate labels may result in loss of sensitivity and sample handling errors

 Low signal-to-noise ratios or incomplete resolution of ions can limit accuracy of quantification (high resolution instruments improve accuracy of quantification)

Label-free
 Multiple reproducible MS runs needed for accurate intensity-based measurements

 Duty cycle limitations can hinder quantification by spectral counting

 Sample handling errors

 Ion suppression issues

Acknowledgements

ISB: Bong Kim, John Leppard, Jie Luo, Hamid Mirzaei, Jimmy Eng, Andrew Keller, Xiao-jun Li, David Shteynberg, Tim Galitski, Theo Knijnenburg, Ilya Shmulevich, John Aitchison

ETH Zurich: Paola Picotti, Ruedi Aebersold

University of Washington: Charis Himeda, Steve Hauschka

University of Michigan: Alexey Nesvizhskii

Fred Hutchinson Cancer Research Center: Steve Hahn (Pol II), Bungo Akiyoshi (kinetochore), Sue Biggins

Stanford: Lena Ho, Jerry Crabtree