Supplementary information for ‘Disseminating cells in tumours acquire an EMT stem cell state that is predictive of metastasis’, Youssef et al.

Supplementary Figure S1

Figure S1 – No difference in self-renewal ability between the 4 EMT sub-populations. Results of two independent experiments assessing the ability of single seeded cells to populate a well after single-cell sorting of the 4 EMT sub-populations from the CA1 and LUC4 oral cancer cell lines into 96-well plates. All 96 seeded wells were assessed after 30 days of culture and scored as more than half full, containing cells (+ve), or devoid of cells (-ve). For technical reasons, the EpCAM-CD24+ sub-population was not sorted from CA1 in the first experiment. All sub-populations self-renewed with similar efficiency, whether scored by wells more than half full or by total +ve wells.
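The two read-outs in Figure S1 are nested: a well scored as more than half full also contains cells, so it counts towards the total +ve wells as well. The Python sketch below illustrates that tally. It assumes each of the 96 wells is recorded with one of three hypothetical labels (">half", "pos", "neg"); the function name, labels and example counts are illustrative only and are not the authors' analysis or data.

```python
from collections import Counter

# Hypothetical labels: ">half" = more than half full, "pos" = contains cells
# but not more than half full, "neg" = no cells.
VALID_SCORES = {">half", "pos", "neg"}

def self_renewal_summary(scores, n_wells=96):
    """Return (fraction of wells > 1/2 full, fraction of all +ve wells)."""
    if len(scores) != n_wells:
        raise ValueError(f"expected {n_wells} scores, got {len(scores)}")
    counts = Counter(scores)
    unknown = set(counts) - VALID_SCORES
    if unknown:
        raise ValueError(f"unexpected scores: {unknown}")
    over_half = counts[">half"]
    # Wells more than half full also contain cells, so they count as +ve too.
    total_positive = counts[">half"] + counts["pos"]
    return over_half / n_wells, total_positive / n_wells

# Invented example for one sorted sub-population (not data from the study).
example = [">half"] * 24 + ["pos"] * 12 + ["neg"] * 60
print(self_renewal_summary(example))  # (0.25, 0.375)
```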

Supplementary Figure S2

Figure S2 – Tumour characteristics for the 12 metastatic tumours (tumours 1–12) and 12 non-metastatic tumours (tumours 13–24) that were sectioned and stained for EpCAM, CD24 and Vimentin.

Supplementary Figure S3

Figure S3 – The mean ratio of FPKM-UQ RNA-seq values for progressing versus progression-free tumours from the TCGA OSCC cohort, for all genes that were specifically upregulated in CA1, EMT-stem or EMT-restricted cells in our expression microarray dataset. A ratio greater than 1 denotes genes that are upregulated in progressing tumours. The y-axis is shown on a log10 scale. The four most upregulated genes in EMT-stem are labelled.
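The caption does not state whether the "mean ratio" is a ratio of group means or a mean of per-tumour ratios. The Python sketch below assumes the former: for each gene, the mean FPKM-UQ value in progressing tumours divided by the mean in progression-free tumours, with log10 of the ratio giving the plotted position and the largest ratios picked out for labelling. Gene names, values and function names are hypothetical placeholders, not TCGA data.

```python
import math
from statistics import mean

def progression_ratios(expr):
    """expr maps gene -> (progressing values, progression-free values).

    Assumed definition: ratio of mean FPKM-UQ (progressing / progression-free).
    A ratio > 1 means higher expression in progressing tumours.
    """
    return {gene: mean(prog) / mean(free) for gene, (prog, free) in expr.items()}

def top_upregulated(ratios, n=4):
    """Genes with the n largest ratios, highest first."""
    return sorted(ratios, key=ratios.get, reverse=True)[:n]

# Hypothetical FPKM-UQ values for three placeholder genes.
expr = {
    "GENE_A": ([12.0, 18.0, 15.0], [10.0, 8.0, 12.0]),  # ratio 1.5
    "GENE_B": ([4.0, 6.0], [10.0, 10.0]),               # ratio 0.5
    "GENE_C": ([30.0, 30.0], [10.0, 10.0]),             # ratio 3.0
}
ratios = progression_ratios(expr)
# Position of each gene on a log10 y-axis (0 corresponds to a ratio of 1).
log10_ratios = {gene: math.log10(r) for gene, r in ratios.items()}
print(top_upregulated(ratios, n=2))  # ['GENE_C', 'GENE_A']
```

Plotting log10 of the ratio puts up- and down-regulated genes symmetrically around 0, which is why a log scale suits this kind of comparison.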

Supplementary Figure S4 – Displayed on the following two pages

Figure S4 – Gene Ontology (GO) term enrichment analysis and functional clustering for the 10 clusters in Figure 6B. GO term enrichments and functional clustering diagrams were generated by entering each cluster's gene list into the STRING online tool. In the functional clustering diagrams, lines connecting genes indicate functional interactions (confirmed or predicted), with line thickness reflecting the strength of evidence for an interaction. Non-interacting genes are excluded from the functional clustering diagrams. Where too few GO term enrichments were returned, manual annotation was performed by identifying the function of genes within the main functional clusters.

Cluster 1: CA1 and EMT-restricted > EMT-stem (63 genes)

Genes: GAS6, FAM96B, HIST2H2BE, TSC22D1, C1orf24, ST3GAL4, PLAC8, LOC729208, IMPA2, IRF7, PHF15, RARRES3, APRT, MX1, C10orf54, GLTSCR2, LOC100134273, HS.574590, IFITM1, PSMB10, MFSD3, LOC645157, HS.25318, LOC643272, IFITM3, LOC390345, CUEDC1, MRPL23, EPSTI1, BCAT1, ITPRIP, EFNA1, SLC15A3, ITPR3, MYL6B, LOC653156, CFB, NCRNA00219, LOC388532, VEGFC, NPM3, IFIT2, MSLN, FAM129A, SREBF1, PODXL, EGR1, OAS2, IFITM2, ARL2, HMGA1, SAMD9L, IFI6, STAT2, C17orf45, OAS3, DDIT4, PSMB9, LOC387825, OAS1, IFIT1, ZFP36, LOC728453

Top 5 GO terms (biological processes):
GO-term | Description | Count in gene set | False discovery rate
GO:0060337 | type I interferon signaling pathway | 13 of 65 | 1.20E-17
GO:0051607 | defense response to virus | 12 of 181 | 2.23E-11
GO:0009615 | response to virus | 13 of 270 | 6.85E-11
GO:0098542 | defense response to other organism | 14 of 438 | 1.25E-09
GO:0035455 | response to interferon-alpha | 6 of 22 | 1.81E-08

Cluster 2: CA1 > EMT-stem > EMT-restricted (47 genes)

Genes: AFAP1L2, ZBED2, CAV2, UGT1A6, FAM3C, NAMPT, LITAF, PDE6D, MTSS1, BMP2, SYK, GNA15, C20orf27, TEAD2, C20orf30, CTSH, PPP1R14C, SDCCAG10, LIPG, GSDMC, SH3KBP1, ANKRD57, CRIP1, LEPREL1, NAT5, MRPS26, LOC642975, SIRPA, SLC27A3, CORO2A, CTSL2, KRT6B, ATP9A, IRX2, TXNIP, PERP, ANXA8, SNRPB2, CREG1, FASTKD5, CSAG1, KRT17, SH3PXD2A, LOC100129759, LOC652846, SNRPB, COMMD7

Top 5 GO terms (biological processes): no significant GO term enrichments detected for biological processes.

Manual annotations:
KRT17 and KRT6B are involved in proliferation of epithelial cells.
High spliceosome activity (metabolically active).

Cluster 3: CA1 > EMT-restricted > EMT-stem (79 genes)

Genes: C17orf61, LOC653381, AVPI1, CAMK2N1, SAMD9, SCD, C16orf74, VAMP8, JUND, MAL2, FASN, S100A2, HPS6, DHCR7, ECGF1, ISG15, FRAT2, PHGDH, LOC388564, PDLIM1, MMP7, RPL22, FGFBP1, C6orf136, NME4, ACTA2, PARP12, FOS, DDX58, AKR1A1, NIPSNAP1, IRF9, SEMA3A, MIR1974, MIR1978, MMS19L, TIMP3, PHF11, CEBPD, TLCD1, ARHGEF2, HERC6, ITGA2, ISG20, C1orf66, TMEM41B, IGFBP6, MMS19, C20orf199, PSCA, CDKN1C, TMBIM1, TFCP2L1, COL17A1, IFIH1, LGTN, RPS2, FER1L3, TRIM8, SNHG6, FERMT1, RSAD2, IRF1, CTDSP2, PARP14, INSIG1, ESRRAP2, MYOF, GJA1, MBOAT2, NMB, STAT1, TP53AP1, IFIT3, FAM83A, IFI44, CDT1, TIGA1, LAMB3

Top 5 GO terms (biological processes):
GO-term | Description | Count in gene set | False discovery rate
GO:0060337 | type I interferon signaling pathway | 7 of 65 | 6.61E-06
GO:0009615 | response to virus | 11 of 270 | 6.61E-06
GO:0051607 | defense response to virus | 9 of 181 | 7.76E-06
GO:0051707 | response to other organism | 15 of 835 | 5.62E-05
GO:0071310 | cellular response to organic substance | 22 of 2219 | 0.00083

Other notable GO terms with high significance:
GO:0034329 | cell junction assembly | 4 of 135 | 0.0374

Manual annotations: Fos and Jun.

Cluster 4: CA1 > EMT-stem and EMT-restricted (236 genes)

Genes: CDR2L, FGFR3, ARHGEF3, COMTD1, ACSF2, LOC643287, ACSL3, IDH1, B4GALT5, LRP11, KLK5, RRAD, ASCL2, SERPINB7, MPZL2, CTDSPL, LOC100128918, VAT1, C1orf106, RNY4, CBLC, BCAS4, KLK10, UPK2, CALML3, LOC387934, HAMP, OLR1, ROD1, TP63, LAD1, PPL, HES4, TP73L, FST, SLC44A2, EDG4, IGSF3, WNT10A, TACSTD2, STARD10, TMEM51, SERPINB13, ARHGEF16, BIK, PSTPIP2, GRHL2, EPHA1, SAT1, RHOD, PLCG2, SCEL, CCDC24, KRT80, LCN2, CENTA1, SQLE, DHRS3, TMEM30B, MAPK13, C20orf55, KLK7, CDH1, AKR1B10, CD70, SPINK6, DSG3, CKMT1A, TMEM54, FDFT1, LGALS7B, LOC100132240, HES2, LAMA3, PRRG2, TACSTD1, VARS2, CAPN1, PI3, TNIP1, SPINT1, PYCARD, SLC29A2, HMGCR, C10orf47, DFNA5, CRABP2, CNN3, CXCL16, RBM47, JUP, PYGL, ECM1, TNFSF10, TSPAN13, C1orf116, CD24, C1orf172, DENND2D, PTGES, S100A8, IL1RN, MXD1, ST14, PTK6, KRT86, DDX60, THBD, C1orf74, SC4MOL, ELF3, TASP1, MAP7, A2ML1, SLC44A3, ZNF165, SH2D3A, LOC650369, KLK11, HBEGF, SYTL3, IFI27, TNF, IRF6, ETS2, ZC3HAV1, LEMD1, LOC729645, CDS1, MBP, SPRR1A, ESR1, SFN, HSPB1, RAPGEFL1, SBSN, C10orf58, UCA1, FOXA1, LOC649970, RNF144B, BHLHB2, ANO1, LOC728910, MYO1D, ELF1, KRT34, CD86, SERINC2, PIM1, RBM35A, GPR56, BTG2, TRIM22, PROM2, PCSK9, CD82, LOC100130919, CDH3, SPRR2A, SOX15, KRT15, LGALS7, EPCAM, MAPK3, SLC37A2, TMEM154, SPRR1B, RAP1GAP, DSG2, TYMP, GJB5, ANXA8L2, F3, HMGCS1, LOC100129882, COBLL1, GRAMD2, PRSS8, SLC1A3, CMTM6, SRPX2, SPRR2D, PDZD2, CSRP2, MVD, CGN, LYN, FAM84B, CXXC5, DSC2, S100A14, OSBP2, MMP28, PNLIPRP3, TMEM16A, LOC730278, PTPRF, CLDN7, SLC2A3, RAET1G, OASL, EHF, FXYD3, JUN, PRIC285, RAB25, KRT13, LOC442727, CD9, SPRR2F, NRIP1, RALA, UPLP, PTGS2, SEL1L3, VIL2, S100P, CLDN12, OVOL2, NMU, AP1M2, SLC25A23, SH3YL1, SDC1, CRB3, LOC653499, S100A9, CNN2, DMKN, HSPBL2, TGM1, FOXQ1, KRTCAP3, ZNF277, TINAGL1

Top 5 GO terms (biological processes):
GO-term | Description | Count in gene set | False discovery rate
GO:0008544 | epidermis development | 37 of 403 | 1.11E-18
GO:0070268 | cornification | 22 of 112 | 1.38E-16
GO:0030855 | epithelial cell differentiation | 42 of 649 | 1.38E-16
GO:0030216 | keratinocyte differentiation | 28 of 267 | 1.97E-15
GO:0009913 | epidermal cell differentiation | 29 of 306 | 4.63E-15

Cluster 5: EMT-restricted > CA1 > EMT-stem (21 genes)

Genes: DPP4, HIST1H2AC, M6PRBP1, CDH11, LOC347292, GBP2, LOC653879, NAV2, KYNU, CD320, ATOH8, NTSR1, RAB31, GLRX, MESP1, RAMP1, ANTXR2, APOOL, ABCC2, C10orf90, LOC100133511

Top 5 GO terms (biological processes):
GO-term | Description | Count in gene set | False discovery rate
GO:0006952 | defense response | 9 of 1234 | 4.24E-05
GO:0045087 | innate immune response | 5 of 676 | 0.0416
GO:0002376 | immune system process | 8 of 2370 | 0.0416
GO:0034341 | response to interferon-gamma | 3 of 176 | 0.0465

Cluster 6: EMT-restricted > EMT-stem > CA1 (78 genes)

Genes: BIN1, GNG11, AKR1C3, IGF2BP3, CXCL6, MPP1, DPY19L1, AKR1C4, SLIT2, FTHL11, HIST1H1C, FTHL2, ICAM3, COL23A1, C19orf10, RNF144A, CYP1B1, SERPINE2, SNORA61, TIMM44, FAM109A, LOC100129599, XAGE2B, C3, LOC645638, PARP4, HSPB8, SH2B3, ARHGEF6, LOC100134134, CDH2, LOC127295, C1QTNF1, IFI44L, IL13RA2, SAA1, SDPR, SAA2, TWIST1, GBP1, DOCK10, COL8A1, DPP9, TRNP1, TMEM136, LOC390530, CA12, ABLIM3, DCN, KRT81, DPYSL3, AOX1, HIST1H2BK, SNORA16A, LOC88523, HS.444329, TSHZ1, LOC100134370, FLJ43879, SMAD6, SQSTM1, PTMS, HIST2H2AA3, SLC7A2, PRKCDBP, ABCC3, HIST2H2AA4, CPA4, MAGED1, EMP3, TIMP1, SPRY2, S100A3, TNFAIP6, DPYD, S100A4, CIRBP, DPYSL2

Top 5 GO terms (biological processes):
GO-term | Description | Count in gene set | False discovery rate
GO:2000145 | regulation of cell motility | 13 of 807 | 0.0052
GO:0051270 | regulation of cellular component movement | 14 of 886 | 0.0052
GO:0048583 | regulation of response to stimulus | 30 of 3882 | 0.0053
GO:0030334 | regulation of cell migration | 12 of 753 | 0.0065
GO:0035295 | tube development | 12 of 793 | 0.0078

Cluster 7: EMT-stem > CA1 > EMT-restricted (33 genes)

Top 5 GO terms (biological processes):
GO-term | Description | Count in gene set | False discovery rate
GO:0032642 | regulation of chemokine production | 4 of 74 | 0.0111
GO:2001240 | negative regulation of extrinsic apoptotic signaling pathway in absence of ligand | 2 of 37 | 0.0402
GO:2000778 | positive regulation of interleukin-6 secretion | 2 of 25 | 0.0402
GO:1904407 | positive regulation of nitric oxide metabolic process | 3 of 42 | 0.0402
GO:1903034 | regulation of response to wounding | 3 of 148 | 0.0402

Cluster 8: EMT-restricted and EMT-stem > CA1 (80 genes)

Top 5 GO terms (biological processes):
GO-term | Description | Count in gene set | False discovery rate
GO:1902624 | positive regulation of neutrophil migration | 4 of 34 | 0.0175
GO:0033993 | response to lipid | 12 of 825 | 0.0175
GO:0002685 | regulation of leukocyte migration | 6 of 175 | 0.0185
GO:0002688 | regulation of leukocyte chemotaxis | 5 of 112 | 0.0207
GO:1901700 | response to oxygen-containing compound | 15 of 1427 | 0.0278

Manual annotations (Clusters 7 and 8): Inflammatory response to wounding; Tissue repair.

Genes in Clusters 7 and 8: CA2, C6orf141, FOSL1, ESPN, H19, FTHL3, IGFBP3, ANKRD1, FAM46A, ALS2CR4, CTGF, VIM, IL1B, BAMBI, FHOD1, GJB6, GLIPR1, CCL2, SPANXA1, ANGPT1, METTL7B, PPP2R2C, C6orf191, GBE1, IL1A, FTHL16, RNF144, NRIP3, CHSY1, C5orf5, SPANXE, ID2, CMBL, DKK3, MMP3, LOC643479, FAM133A, PCNT, LOC100129673, LIX1L, TNIK, SRXN1, C5orf13, ERRFI1, RFTN1, LOC642567, EML1, SERPINB2, SOD2, RICH2, EDN1, C1orf53, FGFRL1, FAM25A, KIAA0672, MAGEA6, MFSD1, KCNIP3, CRLS1, COL6A1, MEGF6, TH, CCDC99, IGFBP4, LIMA1, IL8, LOC643161, MYO1B, C12orf24, NEFL, MARCKSL1, C1S, CXCL1, ULBP2, LOC728888, LOC647993, FKBP1A, LOC729009, SUSD2, LOC729200, MUC13, CD99L2, NOP56, FJX1, GOLGA8B, SLC4A11, WNT5A, PDE4A, LOC730167, CHES1, SPANXB2, FAM83D, NNMT, TYMS, SPANXD, C20orf127, CAPRIN2, VISA, MTE, SEPP1, QSOX1, LOC643150, STXBP5, SRPX, PGBD3, IQCG, TMTC1, LOC647987, CAP2, LOXL1, CXCL2, ZNF428

Cluster 9: EMT-stem > EMT-restricted and CA1 (75 genes)

Cluster 10: EMT-stem > EMT-restricted > CA1 (101 genes)

Top 5 GO terms (biological processes), Clusters 9 and 10:
No significant GO term enrichments detected for biological processes.
Only one significant GO term enrichment detected for biological processes.
Pathway | Description | Count in gene set | False discovery rate
HSA-5660526 | Response to metal ions | 3 of 14 | 0.0154

Manual annotations (Clusters 9 and 10): p53 downregulation; downregulation of apoptosis; survival of oxidative stress.

Genes in Clusters 9 and 10: ACMSD, EOMES, C9orf116, LOC400578, HLA-DPA1, LOC440160, IMPAD1, LOC730313, ADAMTS1, M160, ACER3, ORC6L, LOC642477, EEF1A2, HYLS1, KRT16, ZNF77, LOC100131139, GNG12, SLC9A9, FLJ20718, TMEM2, GAS1, IRX5, PTHLH, FBXL16, PMP22, JAM3, GPT2, BTG3, NFIL3, HERPUD1, DDX51, MIOS, LOC643031, MAFB, MAGEC1, CIAPIN1, LPXN, MGC102966, DLL1, MAEA, THBS2, SIPA1L2, HS.579631, SYNCRIP, CLCA2, IL1R1, PM20D2, LOC652683, RGL1, WRB, MAGEC2, NUDT11, NRGN, COPS8, CCDC55, AADACL1, LOC653419, UAP1, FAM46C, SPANXC, LOX, TTC14, EIF5A2, COL5A1, AMPH, CATSPER2, PPAP2B, GPR137, D4S234E, LOC653071, IRX3, TSPAN3, HS3ST3A1, LOC100132585, IRS2, KDELC2, LRAP, TNC, PDPN, ODC1, SGK1, C16orf63, TMEM158, DNER, C1GALT1, HIF1A, ADORA2B, PHTF1, CRYBA2, LOC100132391, HEG1, SRPRB, LRRC38, SGTA, ZFR, CTSL1, BCYRN1, ISCA1, DDAH1, GCNT1, LOC647954, LOC100130835, LOC645895, LOC100133772, CREBZF, SLC22A4, OBFC2A, LOC652545, MRPS6, STC2, BASP1, LOC728809, TSPAN9, ENC1, COQ9, LOC100128505, BMS1P5, GINS3, TRAM2, HS.128753, PRPF3, FHL1, CSRP1, KIAA0367, RSPRY1, LOC100133171, NOP58, MT1X, KCTD12, LOC100130516, LOC729252, ALDH3A1, DENND2A, PHCA, MGC16121, MT1G, BBS2, SLC16A9, LOC100128084, POP1, NTF3, SOX18, SPTAN1, ZNF259, GJB2, RRM2, ASAP1, CYR61, SGK, NFIX, ELL2, FLJ44124, CRISPLD2, PCOLCE2, MAGEA1, MAP2, PRKCA, ACAA2, LOC643911, DEPDC6, SFRS1, FAM175A, LMOD3, CNTNAP1, TMEM156, SPANXB1, SYNJ2, SNHG9, LOC441019, RTN1, PRICKLE2, SPANXA2, SLC30A1, LOC729603

Supplementary Figure S5

Figure S5 – The ability of EMT sub-populations to repopulate the epithelial tumour population is not associated with a spectrum of epithelial/mesenchymal gene expression amongst classical EMT regulators. A, B, Quantitative RT-PCR analysis of gene (A) and miRNA (B) expression for the indicated targets in the 4 sorted EMT sub-populations from the CA1 and LUC4 cell lines. Data are displayed as fold change relative to the epithelial population, on a log scale, with 1 indicating no change. Mean ± SEM is displayed for each EMT sub-population, and the number of biological repeats is indicated next to each set of graphs. P-values for comparisons between EMT sub-populations were calculated using a two-sided t-test and are displayed where the comparison showed a statistically significant difference in both cell lines.

Primer sequences (mRNA) and assay codes (miRNA) for quantitative PCR

Vimentin

F: CCCTCACCTGTGAAGTGGAT R: GACGAGCCATTTCCTCCTTC

E-cadherin

F: GAACGCATTGCCACATACAC R: AGCACCTTCCATGACAGACC

Twist

F: TCTGGAGGATGGAGGGGG R: CTGTCCATTTTCTCCTTCTCTGGA

Zeb1

F: GTCCAAGAACCACCCTTGAA R: TTTTTGGGCGGTGTAGAATC

Snail

F: CAAGGAATACCTCAGCCTGG R: CATCTGAGTGGGTCTGGAGG

Slug

F: AGATGCATATTCGGACCCAC R: GCAGTGAGGGCAAGAAAAAG

Prrx1

F: CTGATGCTTTTGTGCGAGAA R: ACTTGGCTCTTCGGTTCTGA

FSP1

F: TACTCGGGCAAAGAGGGTGA R: TTGTCCCTGTTGCTGTCCAA

GAPDH

F: CATGGGGAAGGTGAAGGTCG R: AGTTAAAAGCAGCCCTGGTGA

MiR34a miScript primer assay MS00003318 (Qiagen)

MiR200c miScript primer assay MS00003752 (Qiagen)

RNU6-2 miScript primer assay MS00033740 (Qiagen)
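The fold-change values in Figure S5 are expressed relative to the epithelial population, but the normalisation procedure is not spelled out in this supplement. A common approach, and purely an assumption in this sketch, is the 2^-ΔΔCt method, using GAPDH as the reference for mRNA targets and RNU6-2 for miRNAs (both are listed above, but their role as normalisers is assumed here). The function name and Ct values below are hypothetical and are not data from the study.

```python
from statistics import mean

def fold_change_ddct(target_ct, reference_ct, calib_target_ct, calib_reference_ct):
    """Relative expression by the 2^-ddCt method (assumed, not confirmed by the source).

    target_ct, reference_ct: Ct values for the sample (e.g. a sorted EMT sub-population).
    calib_target_ct, calib_reference_ct: Ct values for the calibrator
    (the epithelial population). The reference assay is assumed to be
    GAPDH for mRNA targets or RNU6-2 for miRNAs.
    """
    d_ct_sample = mean(target_ct) - mean(reference_ct)
    d_ct_calib = mean(calib_target_ct) - mean(calib_reference_ct)
    dd_ct = d_ct_sample - d_ct_calib
    return 2 ** (-dd_ct)  # 1.0 means no change relative to the calibrator

# Hypothetical Ct values (technical triplicates), not data from the study.
fc = fold_change_ddct(
    target_ct=[22.0, 22.0, 22.0],           # e.g. Vimentin in a sorted sub-population
    reference_ct=[18.0, 18.0, 18.0],        # reference assay in the same sample
    calib_target_ct=[25.0, 25.0, 25.0],     # Vimentin in the epithelial population
    calib_reference_ct=[18.0, 18.0, 18.0],  # reference assay in the epithelial population
)
print(fc)  # 8.0: target detected three cycles earlier after normalisation
```

Because the method yields ratios around 1, plotting on a log scale (as in Figure S5) shows increases and decreases of the same magnitude symmetrically.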