MS-based Proteomics in Biosciences

Markku Varjosalo, Ph.D.
HiLIFE & Institute of Biotechnology, University of Helsinki

Overview

• Proteins: large biomolecules consisting of one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, replicating DNA, responding to stimuli, and transporting molecules from one location to another.

• Proteomics: the study of the complete complement of proteins present in a cell or system of cells
• Includes:
  – structures, quantities, functions, locations
  – post-translational modifications
  – protein interactions and complexes
  – how each of the above changes with time and in response to stimuli

Source: Wikipedia

Making proteins: DNA -> RNA

Making proteins: RNA -> protein

Protein structures

Proteome

• Entire set of proteins expressed by:
  – genome
  – cell
  – tissue
  – organism

250,000 – 1 million proteins, highly dynamic

Why proteomics?

• DNA (genomics): “what possibly”
• RNA (transcriptomics): “what probably”
• Proteins (proteomics): “what really happens”

Why proteomics?

• Protein alterations cannot be fully deduced from DNA
• RNA expression does not always reflect protein levels:
  – translational control
  – degradation
  – turnover
• Some tissues are not suitable for RNA expression analysis
• Proteins are the physiologically and pathologically active key players
• General goal:
  – better understanding of the genesis and progression of diseases
• Clinical goals:
  – early disease detection (biomarkers)
  – identification of therapeutic targets
  – therapy monitoring

Challenges in Proteomics

• Proteome is constantly changing
  – depends on external regulation, post-translational modification...

• Sample degradation
  – especially with tissue biopsy samples

• Extremely wide range of concentrations
  – up to 10^9-fold difference (nine orders of magnitude)

Sample preparation for proteomic analyses

LC-MS/MS analysis overview

Measurable unit: peptides

→ most common enzyme: trypsin, which cleaves after Lys (K) and Arg (R)
→ tryptic peptides ionize well thanks to two protonatable amines

Peptide separation by RP-HPLC

• C18 reversed-phase separation
• Sorts peptides according to their hydrophobicity
• Gradient of acetonitrile from 5% to 35%
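As a rough illustration of tryptic cleavage and of ordering peptides by hydrophobicity, the sketch below digests a sequence after every Lys (K) and Arg (R) and ranks the resulting peptides by their mean Kyte-Doolittle hydropathy. Several things here are assumptions rather than part of the lecture: the protein sequence is hypothetical, the Kyte-Doolittle scale is only a crude proxy for reversed-phase retention, and the commonly applied "no cleavage before proline" exception is left out because it is not mentioned above.

```python
# Kyte-Doolittle hydropathy values. Using them as a stand-in for C18 retention
# is an assumption of this sketch; real retention-time models are more elaborate.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after every Lys (K) or Arg (R); no missed cleavages."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Keep the C-terminal peptide that does not end in K or R.
        peptides.append(protein[start:])
    return peptides


def mean_hydropathy(peptide: str) -> float:
    """Average Kyte-Doolittle value over all residues of the peptide."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


# Hypothetical sequence, chosen only for illustration.
protein = "MKWVTFISLLLLFSSAYSRGVFRR"
peptides = tryptic_digest(protein)

# Least hydrophobic first: a rough guess at elution order as acetonitrile rises.
ranked = sorted(peptides, key=mean_hydropathy)
for peptide in ranked:
    print(f"{peptide:20s} {mean_hydropathy(peptide):6.2f}")
```

Under this simple model the most hydrophilic peptides come first, which mirrors the idea that they leave the C18 column at low acetonitrile concentrations while hydrophobic peptides elute later in the gradient.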

HPLC: high-pressure liquid chromatography

Tandem mass spectrometry

• 1 MS1 scan, 10 MS2 scans; cycle time < 1 s
• Data-dependent acquisition (DDA), “shotgun MS”
• MS2 spectra contain sequence information

Database search of peptide spectra
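A database search compares each measured MS2 spectrum with in silico spectra predicted from candidate peptide sequences. The sketch below shows one simple way to predict such a spectrum: it computes singly charged b- and y-ion m/z values from standard monoisotopic residue masses. The function name and the example peptide are hypothetical, and real search engines also consider modifications, higher charge states and further ion types.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton in Da
WATER = 18.010565   # monoisotopic mass of H2O in Da


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Return m/z lists (b1..b(n-1), y1..y(n-1)) for singly charged fragments."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]

    # b ions: N-terminal fragments, sum of residue masses plus one proton.
    b_ions = []
    running = 0.0
    for mass in masses[:-1]:
        running += mass
        b_ions.append(running + PROTON)

    # y ions: C-terminal fragments, sum of residue masses plus water plus one proton.
    y_ions = []
    running = 0.0
    for mass in reversed(masses[1:]):
        running += mass
        y_ions.append(running + WATER + PROTON)

    return b_ions, y_ions


# Hypothetical example peptide.
b_ions, y_ions = fragment_ions("PEPTIDE")
for index, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{index} = {b:9.4f}   y{index} = {y:9.4f}")
```

Because consecutive b ions (or y ions) differ by exactly one residue mass, the ladder of fragment peaks encodes the amino acid sequence, which is why MS2 spectra can be matched to peptides.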

Yeast lysate: ≈ 10,000 MS1 scans, ≈ 40,000 MS2 scans, ≈ 20,000 spectrum matches

Statistical modeling: probability cutoff 0.9, false discovery rate 1%
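The link between a probability cutoff and a false discovery rate can be sketched as follows. The example assumes that each peptide-spectrum match (PSM) carries a posterior probability of being correct, as produced by mixture-model approaches. In that case the expected FDR among the accepted matches is the mean of (1 − p). The probabilities below are hypothetical, and the lecture does not state which software or model produced its numbers.

```python
def filter_by_probability(probabilities: list[float], cutoff: float = 0.9):
    """Keep PSMs with probability >= cutoff and estimate the FDR of that set.

    Assumes each value is the posterior probability that the match is correct,
    so (1 - p) is the chance that an accepted match is wrong.
    """
    accepted = [p for p in probabilities if p >= cutoff]
    if not accepted:
        return accepted, 0.0
    estimated_fdr = sum(1.0 - p for p in accepted) / len(accepted)
    return accepted, estimated_fdr


# Hypothetical PSM probabilities.
psm_probabilities = [0.99, 0.98, 0.95, 0.92, 0.60, 0.30]
accepted, fdr = filter_by_probability(psm_probabilities, cutoff=0.9)
print(f"accepted {len(accepted)} of {len(psm_probabilities)} PSMs, "
      f"estimated FDR = {fdr:.1%}")
```

In practice the cutoff is chosen so that the estimated FDR stays at or below the target. How a cutoff of 0.9 maps onto a 1% FDR depends on how the probabilities are distributed in the actual data set.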

Result: ≈ 10,000 peptides, ≈ 1,500 proteins

In silico spectra are generated from a protein sequence database (FASTA)

Mass spectrometry-based proteomics vs. gel-based proteomics

• 2-DE: tens to hundreds of proteins identified and quantified

• MS: thousands of peptides and hundreds to thousands of proteins

Summary

→ peptides, not proteins, are measured by LC-MS/MS

→ peptides are separated by hydrophobicity using RP-HPLC

→ fragment ions are generated from the most abundant MS1 signals

→ fragment spectra of peptide ions contain sequence information

→ fragment spectra can be searched against a sequence database

Major Applications of Proteomics

1. Quantitative proteomics

2. Phosphoproteomics

3. Interactomics

Quantitative proteomics

Incorporation of stable isotopes into proteins

• Metabolic labeling
• Chemical labeling

Quantitative proteomics

Label-free quantification by MS1 alignment

• SuperHirn
• OpenMS
• Progenesis LC-MS
• MaxQuant

Quantitative proteomics

Selected Reaction Monitoring (SRM)

Quantitative proteomics goes clinical

• Develop new biomarkers for disease diagnosis and early detection

• Identify new targets for drugs

• Better evaluate the therapeutic effect of possible drugs

Open clinical question

• Approximately one-third of people with diabetes develop some degree of diabetes-related eye damage or retinopathy.

• Diabetic retinopathy is the leading cause of vision loss among working-age adults

IDF, Diabetes Atlas, 6th Edition

Quantitative proteomics pipeline

Loukovaara, 2015 JPR

Differences between the non-PDR and PDR vitreous proteomes

• Out of these, 260 proteins differed significantly (p<0.001) between non-PDR and PDR (proliferative diabetic retinopathy)

[Figure 3A: vitreous proteomes, non-PDR vs. PDR, plotted per protein]

[Figure 3B: fold change per protein (non-PDR/PDR and PDR/non-PDR). Highlighted protein groups: alpha glycoproteins (4), apolipoproteins (9), cilia (8), collagens & fibrinogens (6), complement system (15), immunoglobulins (6), inter-α-trypsin inhibitors (5), serpin family proteins (9). Legible protein labels include PRDX2, KERA, CD14, NOS1, KLKB1, CBPB2 and A2MG.]
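The comparison between the non-PDR and PDR groups comes down to a fold change and a significance test for each protein. The sketch below computes a log2 fold change and a Welch t statistic for one protein. The intensities are hypothetical, and the choice of Welch's test is an assumption of this example, since the statistical test behind the p<0.001 threshold is not stated here.

```python
import math
from statistics import mean, stdev


def log2_fold_change(group_a: list[float], group_b: list[float]) -> float:
    """log2 of the ratio of mean intensities (group_a over group_b)."""
    return math.log2(mean(group_a) / mean(group_b))


def welch_t_statistic(group_a: list[float], group_b: list[float]) -> float:
    """Welch's t statistic for two samples with possibly unequal variances."""
    var_a = stdev(group_a) ** 2 / len(group_a)
    var_b = stdev(group_b) ** 2 / len(group_b)
    return (mean(group_a) - mean(group_b)) / math.sqrt(var_a + var_b)


# Hypothetical intensities of one protein in three samples per group.
pdr_intensities = [400.0, 420.0, 380.0]
non_pdr_intensities = [100.0, 110.0, 90.0]

lfc = log2_fold_change(pdr_intensities, non_pdr_intensities)
t_stat = welch_t_statistic(pdr_intensities, non_pdr_intensities)
print(f"log2 fold change (PDR/non-PDR) = {lfc:.2f}, Welch t = {t_stat:.1f}")
```

Turning the t statistic into a p-value requires the t distribution, for example via `scipy.stats.ttest_ind(..., equal_var=False)`. Testing thousands of proteins at once additionally calls for a multiple-testing correction.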

Loukovaara, 2015 JPR

Pathways behind the PDR

• KEGG pathway enrichment

• 31 proteins enriched in PDR were involved in Coagulation, Kinin-Kallikrein or Complement cascades

• Covers >40% of all 64 cascade components
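Pathway enrichment of this kind is commonly assessed with a hypergeometric (one-sided Fisher) test. The sketch below is one way to set it up, not necessarily the method used in the study. The 31 hits and the 64 cascade components come from the text above. The background size and the number of PDR-enriched proteins are hypothetical placeholders, because those totals are not given here.

```python
from math import comb


def hypergeom_enrichment_p(hits: int, selected: int,
                           pathway_size: int, background: int) -> float:
    """P(X >= hits) when `selected` proteins are drawn at random from a
    background of `background` proteins, `pathway_size` of which belong
    to the pathway."""
    max_hits = min(selected, pathway_size)
    total = comb(background, selected)
    tail = sum(
        comb(pathway_size, k) * comb(background - pathway_size, selected - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


pathway_hits = 31        # from the text: PDR-enriched proteins in the cascades
pathway_size = 64        # from the text: all cascade components
background_size = 2500   # hypothetical: proteins quantified in total
selected_size = 200      # hypothetical: proteins enriched in PDR

coverage = pathway_hits / pathway_size
p_value = hypergeom_enrichment_p(pathway_hits, selected_size,
                                 pathway_size, background_size)
print(f"coverage = {coverage:.0%}, enrichment p-value = {p_value:.2e}")
```

The coverage of 31 out of 64 components is about 48%, consistent with the ">40%" stated above. The p-value depends entirely on the placeholder totals and is shown only to illustrate the calculation.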

Loukovaara, 2015 JPR

Protein phosphorylation

[Diagram: SIGNAL → PROTEIN ⇄ PROTEIN-P → OUTPUT]

• Phosphorylation (kinase; ATP → ADP): covalent attachment of a phosphate group (P) to one of three amino acids that have a free hydroxyl group: serine (S), threonine (T) or tyrosine (Y)
• Mass shift of +79.97 Da (monoisotopic mass of HPO3)
• Dephosphorylation (phosphatase): removal of the phosphate group
• OUTPUT: changed enzyme activity, cellular location, or association with other proteins

Phosphoproteomics
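Phosphopeptides are recognised in the mass spectrometer by the mass added per phosphate group. The sketch below shows how that shift appears in m/z at different charge states. It uses the monoisotopic mass of HPO3 (79.96633 Da), since the figure quoted above is rounded. The peptide mass and the function names are hypothetical.

```python
PHOSPHO_MONO = 79.96633   # monoisotopic mass of HPO3 in Da
PROTON = 1.007276         # mass of a proton in Da


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of a positively charged ion carrying `charge` protons."""
    return (neutral_mass + charge * PROTON) / charge


def phospho_mz_shift(n_sites: int, charge: int) -> float:
    """m/z difference between a peptide and its n-fold phosphorylated form."""
    return n_sites * PHOSPHO_MONO / charge


peptide_mass = 1000.0  # hypothetical neutral monoisotopic peptide mass in Da
for charge in (1, 2, 3):
    plain = mz(peptide_mass, charge)
    phospho = mz(peptide_mass + PHOSPHO_MONO, charge)
    print(f"z={charge}: {plain:.4f} -> {phospho:.4f} "
          f"(shift {phospho_mz_shift(1, charge):.4f})")
```

The observed shift shrinks with charge state: a singly phosphorylated, doubly charged peptide moves by about 39.98 m/z units rather than the full mass of the phosphate group.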

www.pepscope.com

Phosphoprofiling of the mammalian Hedgehog (SHH) signaling pathway (NIH-3T3 cells, 0–24 h)

Number of unique phosphorylation sites (FDR < 1%)

86 sites, 65 sites, 126 sites, 121 sites

Interaction proteomics

Protein complexes

• the functional units of cells!

• stable and/or unstable

• composition: homomultimeric and/or heteromultimeric

Experimental methods for protein interaction analysis

1. Screening methods
   – Y2H (yeast two-hybrid)
   – BiFC (bimolecular fluorescence complementation)
   – Protein microarrays (PPI)

2. Affinity purification coupled to MS (AP-MS)

The pipeline for cell line generation, protein complex purification and MS analysis

Flp-In HEK293 cells Flp-In HeLa cells

Varjosalo et al., 2013 Nature Methods; Varjosalo et al., 2013 Cell Reports

Defining the molecular context of protein kinases

[Figure: labeled kinases: ILK, CSK, STK1, STK25, STK3, STK24, PBK, OXSR1, EIF2AK2, MST4, PAK1, GSK3B, CSNK2A2, TP53RK, CSNK1E, CSNK1A1, VRK1, CDK4, MAPK14, CDK9, ADRBK1, RPS6KA1, CDK5, CDK2, PRKACA, EEF2K, PRKACB, PRKCD, RIOK2, PRKAA1, PHKG2, CAMK1. Other labels in the figure: CMGC, CORE.]

Varjosalo et al., 2013 Nature Methods; Varjosalo et al., 2013 Cell Reports

Global view on kinase signaling networks

[Figure: interaction network of protein kinases and their interacting proteins; several hundred gene-name node labels omitted]

Varjosalo et al., 2013 Cell Reports; Varjosalo et al., 2013 Nature Methods

Detailed information of individual complexes

[Figure: Mediator (MED) complex subunits shown: MED1, MED4, MED6, MED7, MED8, MED9, MED10, MED11, MED12, MED13, MED13L, MED14, MED15, MED16, MED17, MED18, MED20, MED21, MED22, MED23, MED24, MED27, MED28, MED29, MED30, MED31, together with CDK8, CDK19 and CCNC]

Varjosalo et al., 2013 Cell Reports; Varjosalo et al., 2013 Nature Methods

Interactions do matter: MED12 examples

MED12 mutations in uterine leiomyomas

MED12 is altered (G44D) in 70% of the tumors. Kampjärvi, et al., 2011, Science

Turunen, et al., 2014, Cell Reports

G44D mutation disrupts MED12 interactions with Cyclin C and the kinase complex

Turunen, et al., 2014, Cell Reports

Thank you!