SUMO targets in human induced pluripotent stem cells

Identification of SUMO targets required to maintain human stem cells in the pluripotent state.

Barbara Mojsa1, Michael H. Tatham1, Lindsay Davidson2, Magda Liczmanska1, Emma Branigan1, and Ronald T. Hay1*

1Division of Regulation and Expression, 2Division of Cell and Developmental Biology, School of Life Sciences, University of Dundee, Dundee, UK

*Corresponding author [email protected]

Supplementary Figures and Figure Legends S1-S6

Supplementary data files

This article contains supplemental data, provided as separate data files containing the following information:

Supplementary data file 1. Summary of the quantitative data from the proteomics experiment to study changes to the cellular proteome during ML792 treatment of ChiPS4 cells.

Supplementary data file 2. Summary of the quantitative data from the proteomics experiment to study differences in the cellular proteome between wild-type ChiPS4 cells and cells expressing 6His-SUMO1-KGG-mCherry or 6His-SUMO2-KGG-mCherry.

Supplementary data file 3. Summary of the quantitative data from the proteomics experiment to identify SUMO1 and SUMO2 targets from ChiPS4 cells.


[Figure: western blots (WB: anti-His, anti-SUMO1, anti-SUMO2, anti-tubulin) of ChiPS4 WT, SUMO1-KGG and SUMO2-KGG cells with (+) or without (-) heat shock at 42°C for 15 min; molecular weight markers 15-250 kDa.]

Supplementary Figure 1. Exogenous 6His-SUMOKGG constructs do not significantly affect pluripotency or the cellular proteome of hiPSCs.

ChiPS4 WT, SUMO1-KGG and SUMO2-KGG expressing cell lines were exposed to heat shock at 42°C for 15 minutes, and total lysates were analysed by western blot using anti-SUMO1, anti-SUMO2/3, anti-His and anti-tubulin (loading control) antibodies.


[Figure panels: A, cell cycle profiles (PI-A) and B, NANOG expression (Alexa Fluor 488-NANOG), shown as histograms normalized to mode for WT, SUMO1-KGG and SUMO2-KGG cells. C, immunofluorescence (anti-NANOG, anti-OCT4, anti-SOX2, anti-TRA-1-60 / DAPI) of ChiPS4 WT, ChiPS4 SUMO1-KGG and ChiPS4 SUMO2-KGG cells. D, lineage markers (endoderm: anti-CYTOKERATIN17 / DAPI; ectoderm: anti-NESTIN / DAPI; mesoderm: anti-SMA / DAPI) in ChiPS4 SUMO1-KGG and ChiPS4 SUMO2-KGG cells.]

Supplementary Figure 2. hiPSCs expressing 6His-SUMOKGG constructs do not show any cell cycle, pluripotency or differentiation defects.


A. Flow cytometry analysis of the cell cycle. B. Flow cytometry analysis of NANOG expression. C. Immunofluorescence analysis of pluripotency-associated markers (NANOG, SOX2, OCT4, TRA-1-60) in ChiPS4 WT, SUMO1-KGG and SUMO2-KGG expressing cell lines. D. In vitro differentiation potential of ChiPS4 SUMO1-KGG and SUMO2-KGG expressing cell lines, assessed by immunofluorescence staining with DAPI and specific antibodies against CYTOKERATIN 17 (endoderm), NESTIN (ectoderm) and SMA (mesoderm). C-D. Immunofluorescence (IF) images were obtained using a Leica DM-IRB microscope equipped with a Hamamatsu CCD camera and a 20x 0.3 C-Plan lens. All images contain a 100 µm scale bar.


[Figure panels: A, workflow: whole cell protein extracts from induced pluripotent stem cells expressing 6His-SUMO1-KGG or 6His-SUMO2-KGG (replicates 1-3 for each cell type) are analysed as whole cell extract (WCE), after Nickel-NTA affinity purification, and after FASP digestion (LysC and GluC) with Gly-Gly-K peptide enrichment; peptides are analysed by LC-MS/MS, MaxQuant and Perseus to give protein-level and site-level differences. B, LysC digestion of a SUMO1- or SUMO2-modified target protein (example sequence ...YFVPPKEDIKPLKRPRDED...) yields an identical GG-K branched peptide (EDIKPLK). C, principal component analyses for WCE (Component 1, 25%; Component 2, 11.6%), NiNTA purification (Component 1, 72.8%; Component 2, 22.7%) and GG-K IP (Component 1, 56.5%; Component 2, 20.1%), showing SUMO1 and SUMO2 samples from Experiments 1 and 2. D, log2 SUMO1/SUMO2 (Experiment 1, x-axis) versus log2 SUMO1/SUMO2 (Experiment 2, y-axis) for WCE, NiNTA purification and GG-K IP.]

Supplementary Figure 3. Overview of experimental design and proteomic data relating to SUMO1 and SUMO2 site identification in hiPSCs.

A. Overview of the proteomics experiment to identify hiPSC-specific SUMO1 and SUMO2 substrates. Two experimental runs were performed with two different hiPSC lines (expressing 6His-SUMO1-KGG or 6His-SUMO2-KGG), each in triplicate. Three protein fractions were analysed: whole cell extracts (WCE), NiNTA column elutions (6His) and GlyGly-K immunoprecipitated peptide elutions (GG-K IP). All peptides were analysed by LC-MS/MS and the data processed by MaxQuant. B. SUMO1-KGG and SUMO2-KGG leave identical GG adducts on substrates after LysC digestion; therefore, peptide intensity differences between cell types can be used to infer site-specific SUMO paralogue preference. C. Principal component analyses of MS data from the three cell fractions. D. Comparison of SUMO1/SUMO2 ratios between experimental runs for each of the three cell fractions.
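Because both constructs leave the same GG remnant, each GG-K peptide can be compared directly between the two cell lines. The sketch below illustrates that comparison, assuming per-site intensities for the three replicates of each cell line are already available (for example, exported from MaxQuant). The function names, the choice to average replicates before taking the ratio, and all intensity values are hypothetical and not taken from the study. It computes a log2 SUMO1/SUMO2 ratio per site and the Pearson correlation between the two runs, the kind of comparison plotted in panel D.

```python
import math
from statistics import mean


def log2_sumo1_sumo2(sumo1_reps, sumo2_reps):
    """Log2 ratio of mean SUMO1-KGG to mean SUMO2-KGG intensity for one GG-K site.

    Positive values indicate SUMO1 preference, negative values SUMO2 preference.
    Averaging replicates before taking the ratio is an assumption of this sketch.
    """
    return math.log2(mean(sumo1_reps) / mean(sumo2_reps))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical triplicate intensities: site -> (SUMO1-KGG replicates, SUMO2-KGG replicates)
experiment_1 = {
    "site_a": ([8.0e6, 9.0e6, 7.0e6], [1.0e6, 1.2e6, 0.8e6]),
    "site_b": ([1.0e6, 1.1e6, 0.9e6], [4.0e6, 4.4e6, 3.6e6]),
    "site_c": ([2.0e6, 2.0e6, 2.0e6], [2.0e6, 2.0e6, 2.0e6]),
}
experiment_2 = {
    "site_a": ([4.0e6, 4.0e6, 4.0e6], [1.0e6, 1.0e6, 1.0e6]),
    "site_b": ([1.0e6, 1.0e6, 1.0e6], [8.0e6, 8.0e6, 8.0e6]),
    "site_c": ([3.0e6, 3.0e6, 3.0e6], [3.0e6, 3.0e6, 3.0e6]),
}

# Only sites quantified in both runs can be compared between experiments.
shared_sites = sorted(experiment_1.keys() & experiment_2.keys())
ratios_1 = [log2_sumo1_sumo2(*experiment_1[s]) for s in shared_sites]
ratios_2 = [log2_sumo1_sumo2(*experiment_2[s]) for s in shared_sites]
print("Experiment 1:", ratios_1)
print("Experiment 2:", ratios_2)
print("Pearson r between runs:", pearson_r(ratios_1, ratios_2))
```

In a real dataset, missing values and normalisation between cell lines would need handling before this step; the sketch deliberately leaves both out.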


[Figure: network schematics of selected SUMO substrates in hiPSCs, with protein nodes labelled by name (TRIM, TOP1, ZNF106, RANGAP1, SUZ12, RSL1D1, MKI67, VRTN, NPM1, ZMYND8, ZNF462, ZNF532, ZNF687, ZMYM4, WIZ, SETDB1, SALL4, CBX1, TOP2B, NOP56, MBD1, MGA, SUMO1, GTF2I, DKC1, ZNF451, ZMYM2, NOP58, BEND3, TOP2A, CTCF, DNMT3A, SAFB2, POLR3D, CHD4, SUMO3, TRIM28, DNMT3B, EZH2, UBB, SUMO2/3, RSF1 and SUMO2) and site nodes labelled by residue number. Key: number of experiments significantly differing (0-2), log2 ratio SUMO1/SUMO2 (-5 to 5) and log10 intensity (5 to 11).]


Supplementary Figure 4. Schematic representations summarising the SUMO1 and SUMO2 proteomic data for a selection of substrates in hiPSCs.

Schematic representations of selected substrates found to be SUMO-modified in hiPSCs. Protein nodes are labelled by name and site nodes by residue number. SUMO preference is represented by colour, considering all GGK peptides (protein nodes) or individual site peptides (site nodes). Peptide intensity is represented by the size of the node (see key). The border thickness of each site node represents the number of experiments in which that site showed a significant SUMO preference. Edges linking sites to proteins are positioned according to the site's position in the linear protein sequence, with the first and last residues at the top of the protein node. Substrates are ordered from generally SUMO1-preferential (top) to SUMO2-preferential (bottom).
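The edge layout described above can be reproduced by mapping each site's residue number onto the circumference of its protein node. The sketch below is a minimal illustration, assuming a clockwise direction, a circle of unit radius and spacing proportional to sequence position; the function name site_anchor and the example protein length are hypothetical, and the published figure may have been drawn with different conventions.

```python
import math


def site_anchor(position, protein_length, radius=1.0):
    """Return (x, y) on a circular protein node for a 1-based residue position.

    Residue 1 and the last residue both map to the top of the circle (angle 0);
    residues in between are spread clockwise in proportion to their position.
    The clockwise direction and unit radius are assumptions for illustration.
    """
    if not 1 <= position <= protein_length:
        raise ValueError("position must lie within the protein sequence")
    fraction = 0.0 if protein_length == 1 else (position - 1) / (protein_length - 1)
    angle = 2 * math.pi * fraction  # 0 at the top, increasing clockwise
    # With y pointing up, sin gives the horizontal offset and cos the vertical one.
    return radius * math.sin(angle), radius * math.cos(angle)


# Hypothetical 1,001-residue protein with SUMO sites at several lysines.
for site in (1, 251, 501, 1001):
    x, y = site_anchor(site, 1001)
    print(f"K{site}: x={x:+.2f}, y={y:+.2f}")
```

A site halfway along the sequence lands at the bottom of the node and a site a quarter of the way along lands on the right-hand side, so N-terminal and C-terminal clusters of sites remain visually separated even though both termini meet at the top.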


[Figure panels: A, log2 SUMO1/SUMO2 (y-axis, SUMO1-preferential above and SUMO2-preferential below) plotted against rank (x-axis, 0-739), with TRIM33-K776 and ZBTB17-K251 indicated. D, pLogo sequence logos for six rank groups (average log2 SUMO1/SUMO2 in brackets): Rank 1-123 (3.07), Rank 124-246 (1.89), Rank 247-369 (1.18), Rank 370-492 (0.51), Rank 493-615 (-0.32), Rank 616-739 (-1.95). E, net peptide charge at pH 7.4 (21-residue sequence window) plotted against log2 SUMO1/SUMO2; r(737) = 0.300, p < 0.0001; moving average by ratio rank (40 peptides).]

Panel B. 31-residue sequence windows (log2 SUMO1/SUMO2 in brackets):

TRIM33-K776 (4.88)    AEKTSLSFKSDQVKVKQEPGTEDEICSFSGG
ZBTB17-K251 (-4.42)   EQEEQEEEGAGPAEVKEEGSQLENGEAPEEN

Panel C. Residue over-representation (positive) and under-representation (negative) by rank group. Columns are rank ranges of log2 SUMO1/SUMO2, from SUMO1-preferential (left) to SUMO2-preferential (right); positions are relative to the target lysine (0).

Pos  Res   1-123  63-185 124-246 186-308 247-369 309-431 370-492 432-554 493-615 555-677 616-739
 -7  S      1.47    1.80    0.73    0.97    3.66    4.56    1.80    0.51    0.73    0.73    0.73
 -5  S     -0.04   -0.46    0.84    1.96    0.61   -0.02    1.08    4.37    2.67   -0.23   -0.02
 -2  D      0.86    2.70    1.87    1.87    3.65    2.70    2.27    3.65    3.65    4.17    5.89
 -1  E     -2.21   -1.36   -2.18   -4.25   -3.40   -1.73   -1.36   -2.18   -2.73   -1.36   -0.54
 -1  V      7.30    6.76    3.08    2.66    3.53    5.04   11.50   17.13   13.02    6.76    8.01
 -1  I     11.02   11.90   13.55    6.15    3.84    4.38    3.84    4.94    7.46   11.90    8.14
  0  K      0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
  1  Q      6.79    2.02    1.64    2.44    0.68    0.41    0.41    0.16    1.29    0.68    0.16
  1  E      0.80    1.06    0.82    2.51    2.18    0.02   -0.17    1.06    2.18    5.46    4.98
  2  E     26.99   17.31   17.31   25.28   29.20   25.28   26.24   31.24   33.34   31.24   24.33
  3  P      6.19    3.53    4.55    5.66    4.55    2.22    3.07    5.66    7.50    5.66    2.22
  5  E      1.06    0.85    0.21    1.63    3.39    2.27    1.63    3.00    4.26    2.62    2.27
  6  E      0.16   -0.44    0.38    0.80    0.38    0.18    1.57    3.70    6.08    6.08    3.29
  7  E      0.56    0.80   -0.02    0.58    1.03    0.58    1.03    2.18    3.28    4.13    5.06
  8  E      0.59    1.32    1.91    0.61    1.91    1.32    0.83    2.58    2.95    4.20    4.66
 10  E     -0.38   -0.60    0.25    0.67   -0.15    0.45    1.14    0.67    0.25    1.70    3.93

Supplementary Figure 5. Detailed sequence logo analysis of SUMO modification sites.

A. Distribution by rank of the log2 SUMO1/SUMO2 ratios for 739 sites. Examples of SUMO1-preferential (TRIM33 K776) and SUMO2-preferential (ZBTB17 K251) sites are indicated. B. 31-residue sequence windows for TRIM33 K776 and ZBTB17 K251. C. Residue over- and under-representation within the sequence windows of SUMO sites grouped by log2 SUMO1/SUMO2 ratio rank as shown in A. Only amino-acid positions with at least one significant over- or under-representation are included. Values are -log10 odds of the binomial probability calculated by pLogo using the human proteome as background. Positive values (red fill) show over-representation and negative values (green fill) show under-representation. Boxes with black borders are statistically significant (p<0.05). D. Sequence logos generated by pLogo (ref) for 6 groups of target lysines grouped by log2 SUMO1/SUMO2 as shown in A and used to generate the table in C. The rank range is indicated and the average log2 SUMO1/SUMO2 is shown in brackets. E. Relationship between log2 SUMO1/SUMO2 and net charge at pH 7.4 of the 21-residue sequence window for each of the 739 sites. The Pearson correlation is indicated, and the pink line shows a 40-peptide moving average by rank of log2 SUMO1/SUMO2.
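Panels C and E rest on three simple calculations: a binomial over- or under-representation score for a residue at a given position, the net charge of the 21-residue window around each site, and a moving average along the ratio ranking. The sketch below illustrates all three in plain Python. It is not the authors' pipeline or pLogo's implementation: the pKa values, the decision to ignore peptide termini, and the reading of "-log10 odds of the binomial probability" as -log10[P(X >= k) / P(X <= k)] are assumptions, and all function names are hypothetical. The two 31-residue windows are taken from panel B.

```python
import math

# Assumed side-chain pKa values (one common textbook set); the parameters used
# for the published net-charge calculation are not stated in the legend.
ACIDIC_PKA = {"D": 3.65, "E": 4.25, "C": 8.18, "Y": 10.07}
BASIC_PKA = {"H": 6.00, "K": 10.53, "R": 12.48}


def central_window(seq, width=21):
    """Return the central `width` residues of an odd-length window centred on a lysine."""
    centre = len(seq) // 2
    half = width // 2
    if len(seq) % 2 == 0 or seq[centre] != "K" or centre < half:
        raise ValueError("expected an odd-length window centred on the target lysine")
    return seq[centre - half:centre + half + 1]


def net_charge(seq, ph=7.4):
    """Henderson-Hasselbalch net side-chain charge; peptide termini are ignored (assumption)."""
    positive = sum(1 / (1 + 10 ** (ph - BASIC_PKA[aa])) for aa in seq if aa in BASIC_PKA)
    negative = sum(1 / (1 + 10 ** (ACIDIC_PKA[aa] - ph)) for aa in seq if aa in ACIDIC_PKA)
    return positive - negative


def moving_average(values, width=40):
    """Sliding-window mean over values already ordered by log2 SUMO1/SUMO2 rank."""
    if not 0 < width <= len(values):
        raise ValueError("width must be between 1 and len(values)")
    total = sum(values[:width])
    averages = [total / width]
    for i in range(width, len(values)):
        total += values[i] - values[i - width]
        averages.append(total / width)
    return averages


def log10_binomial_odds(k_obs, n, p):
    """-log10[P(X >= k_obs) / P(X <= k_obs)] for X ~ Binomial(n, p).

    Positive values indicate over-representation and negative values
    under-representation relative to the background frequency p.
    Tail sums are taken in log space to avoid underflow for large n.
    """
    def log_pmf(k):
        return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log1p(-p))

    def log_sum_exp(terms):
        top = max(terms)
        return top + math.log(sum(math.exp(t - top) for t in terms))

    log_upper = log_sum_exp([log_pmf(k) for k in range(k_obs, n + 1)])
    log_lower = log_sum_exp([log_pmf(k) for k in range(0, k_obs + 1)])
    return (log_lower - log_upper) / math.log(10)


# 31-residue windows from panel B, centred on TRIM33 K776 and ZBTB17 K251.
windows = {
    "TRIM33-K776": "AEKTSLSFKSDQVKVKQEPGTEDEICSFSGG",
    "ZBTB17-K251": "EQEEQEEEGAGPAEVKEEGSQLENGEAPEEN",
}
charges = {name: net_charge(central_window(seq)) for name, seq in windows.items()}
for name, charge in charges.items():
    print(f"{name}: net charge at pH 7.4 = {charge:.2f}")
```

Applied to the panel B windows, the sketch gives a more negative net charge for the SUMO2-preferential ZBTB17 K251 window than for the SUMO1-preferential TRIM33 K776 window, in line with the positive correlation reported in E. To reproduce a curve like the pink line, per-site charges would first be sorted by log2 SUMO1/SUMO2 rank and then passed to moving_average with width=40; for panel C-style scores, k_obs would be the count of a residue at one position within a rank group of n sites and p its background frequency in the human proteome.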

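[Figure panels: A, PDB 3UIP (SUMO-1); B, PDB 3UIO (SUMO-2). Each panel shows Ubc9 with SUMO-1 or SUMO-2 as an electrostatic potential surface (top) and Ubc9, SUMO-1 or SUMO-2 and RanGAP1 as a cartoon (bottom).]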

Supplementary Figure 6. Comparison of the electrostatic potential surface of SUMO-1 and SUMO-2 in complex with Ubc9.

A. Electrostatic potential surface generated at pH 7.0 for Ubc9 and SUMO1, with the accompanying electrostatic potential range bar (top panel), and cartoon representation of Ubc9, SUMO1 and RanGAP1 (bottom panel), based on PDB 3UIP. For clarity, RanGAP1 and RanBP2 have been omitted from the top panel and RanBP2 from the bottom panel. A black oval in the top panel highlights the electrostatic potential surface in the active site of Ubc9 and SUMO1 that is presented to the substrate. B. As in A, but for PDB 3UIO, which contains SUMO2 instead of SUMO1.
