Supplementary Appendix

Genomic Analysis of Thymic Epithelial Tumors Identifies Novel Subtypes Associated with Distinct Clinical Features

Hyun-Sung Lee¹, Hee-Jin Jang¹, Rohan Shah¹, David Yoon¹, Masatsugu Hamaji², Ori Wald¹, Ju-Seog Lee³, David J. Sugarbaker¹, Bryan M. Burt¹

Table of Contents

Supplementary Methods

Supplementary Figures

Figure S1. Genomic positions of amplifications and deletions in TCGA TETs
Figure S2. Hierarchical Clustering Analysis of mRNA Expression Data in Patients with TET (n=120)
Figure S3. Distribution of TET molecular subtypes and 3-D plot of principal component analysis and molecular subtypes of TETs
Figure S4. Survival of TCGA TET cohort based on Masaoka staging system to check the quality of clinical data
Figure S5. GTF2I mutation and mRNA expression of GTF2I across 30 types of TCGA cancer

Supplementary Tables

Table S1. Subtype-specific mRNAs in thymic epithelial tumors
Table S2. Clinicopathologic Characteristics of Patients
Table S3. Univariable Cox Regression Analysis of molecular subtypes and Disease-Free Survival in the TCGA TET Sets (n=120)
Table S4. List of 220 GTF2I target genes identified using known transcription factor binding site motifs within the TRANSFAC® predicted transcription factor targets dataset

SUPPLEMENTARY METHODS

Justification of the Decision-Tree Approach through Mutual Exclusivity and Principal Component Analysis in Molecular Subtypes of TETs

A test for mutual exclusivity applied to groups defined by GTF2I mutation, T cell signaling signature, and somatic copy number alteration (SCNA) showed that the three subgroups were mutually exclusive (P = 3.7 × 10⁻¹⁴) (Supplementary Figure S2A). Similarly, a 3D plot of the principal component analysis (PCA) showed that the molecular subgroups were distinctly separated by GTF2I mutation, T cell signaling, and SCNA (Supplementary Figure S2B). The distribution tails of GTF2I mutation and the T cell signaling signature were mutually exclusive (P = 1.4 × 10⁻⁶), which identified the samples belonging to the GTF2I and TS groups, respectively. T cell signaling and SCNA were also mutually exclusive (P = 2.4 × 10⁻⁸), whereas the exclusivity between GTF2I mutation and SCNA was weaker (P = 0.056). We therefore prioritized GTF2I mutational status to generate a clinically relevant model, since unsupervised clustering and PCA of the mRNA data indicated that GTF2I mutation plays a crucial role in classifying TETs. Accordingly, a decision-tree algorithm was created to categorize the 120 TET samples into four subtypes using an approach that could be more readily applied to TETs in clinical care (Figure 1A).
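A pairwise mutual-exclusivity P value of the kind reported above can be illustrated with a small sketch. The appendix does not state which test was used; the code below assumes a one-sided Fisher's exact test on a 2×2 table of binary calls (for example, GTF2I mutated versus T cell signaling high). The function name and the example counts are hypothetical and are not taken from the study.

```python
from math import comb

def mutual_exclusivity_p(both, only_a, only_b, neither):
    """One-sided Fisher's exact test for mutual exclusivity (assumed test).

    Returns the hypergeometric probability of observing `both` or fewer
    samples carrying feature A and feature B together, given the margins.
    A small value suggests the two features co-occur less than expected.
    """
    n = both + only_a + only_b + neither   # total samples
    n_a = both + only_a                    # samples with feature A
    n_b = both + only_b                    # samples with feature B
    lowest = max(0, n_a + n_b - n)         # smallest possible overlap
    total = comb(n, n_b)                   # all ways to place feature B
    tail = sum(
        comb(n_a, k) * comb(n - n_a, n_b - k)
        for k in range(lowest, both + 1)
    )
    return tail / total

# Hypothetical counts for 120 samples, for illustration only.
p = mutual_exclusivity_p(both=1, only_a=45, only_b=30, neither=44)
print(f"one-sided P = {p:.3g}")
```

Only the lower tail is summed because mutual exclusivity corresponds to fewer co-occurrences than chance would produce; a two-sided test would also count excess co-occurrence.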

Supplementary Table S1. Subtype-specific mRNAs in thymic epithelial tumors.

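Columns: Expression (UP or DOWN), Rank, then gene symbol and fold change for each subtype in the order GTF2I, TS, CS, CIN. When a subtype's gene list is shorter than the current rank, its gene and fold-change entries are omitted from the row.

Expression  Rank  GTF2I  Fold Change  TS  Fold Change  CS  Fold Change  CIN  Fold Change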

UP 1 TBX1 138.89 PTCRA 27.03 UPF0639 10.00 PRLR 14.49

2 AACSL 107.53 DNTT 26.32 FCRLA 9.09 MEGF11 13.51

3 KLK8 83.33 CD1B 21.28 CHST8 8.33 TMEM40 13.33

4 DKFZp686A1627 76.92 RAG2 20.00 FCRL2 8.33 FOXA1 10.75

5 FBN3 76.92 PRKCG 19.61 SLITRK4 8.33 LANCL3 10.64

6 AQP5 71.43 CCR9 18.87 CHL1 7.69 PITX1 10.20

7 IRX4 71.43 ARPP21 18.52 F5 7.69 CKMT1B 10.10

8 PHEX 66.67 SLC7A3 17.86 BARX2 6.67 CBLC 10.00

9 IRX2 55.56 C19orf77 16.13 CD22 6.25 GPR144 10.00

10 C1QTNF9B 50.00 ARL5C 15.87 FCRL1 6.25 ENTPD3 9.09

11 SEMA3D 50.00 KCNJ4 15.63 HAS2 6.25 PHF21B 9.09

12 CLCNKB 41.67 CD1A 15.38 SELE 6.25 WDR69 9.09

13 SPOCK3 37.04 CD1E 15.15 FAM13C 5.26 MYO3B 8.33

14 WDR72 34.48 CHRNA3 14.93 PLXNA4 5.26 PDZK1IP1 8.33

15 COL9A3 33.33 C10orf129 14.29 ADAMTS14 5.00 DHDPSL 7.69

16 GPR81 33.33 DPPA4 14.08 ABCA6 4.76 GRHL3 7.69

17 ZNF560 32.26 CFC1B 13.51 CLEC17A 4.76 MOCOS 7.69

18 TYRP1 30.30 MAL 13.33 RGS7 4.76 CKMT1A 7.14

19 GPR158 28.57 UGT3A2 13.33 SLC22A3 4.76 CPLX3 7.14

20 SCUBE3 27.78 TMIGD2 13.16 KCNJ15 4.55 FAM196B 7.14

21 NTF3 27.03 AOX2P 12.99 LPPR4 4.55 PIK3C2G 7.14

22 SEMA3E 27.03 HHIP 12.35 POU2AF1 4.17 POF1B 7.14

23 ADAMTS20 26.32 ZP1 12.20 CSF3 4.00 SMOC1 7.14

24 GJB7 26.32 PRL 11.90 VSIG1 3.85 EVPL 6.67

25 SCUBE1 25.64 RORC 11.90 ECM2 3.57 NCAM2 6.67

26 VGLL3 25.00 CELF5 11.63 HSD11B1 3.57 SRPX2 6.67

27 CLDN8 24.39 PON1 11.63 STAP1 3.57 ALDH1A1 6.25

28 TMEM59L 23.81 CD1C 11.36 ADAMTS4 3.45 C1orf175 6.25

29 TULP1 23.81 MGC29506 11.11 GIPR 3.45 CEND1 6.25

30 SOHLH1 22.73 HMHB1 10.64 SLC12A8 3.45 CYP4F3 6.25

31 PDGFRL 22.22 CXorf49B 10.00 WNT7A 3.45 DGCR9 6.25

32 PRAME 21.28 ELOVL4 10.00 CDKN2B 3.33 DNAH5 6.25

33 PCDHGA11 20.83 GPR44 10.00 CNR2 3.33 KCND3 6.25

34 ANKRD26P1 20.41 LCT 10.00 PTGFR 3.33 NWD1 6.25

35 AQP6 20.00 PRMT8 10.00 RGS16 3.33 AIM1L 5.88

36 THSD4 19.23 TMSB15A 10.00 EBF2 3.23 C12orf54 5.88

37 C5orf38 18.52 VPREB1 10.00 PDE4B 3.13 C19orf45 5.88

38 GDF5 18.52 HEMGN 9.09 C6 2.94 CADPS 5.88

39 GPR98 18.18 HRK 9.09 LBP 2.94 DCST2 5.88

40 SLITRK5 18.18 PADI4 9.09 SERPINE2 2.94 IGSF9 5.88

41 COL11A1 17.86 SH2D1A 9.09 TLR10 2.94 MARK1 5.88

42 MYH6 17.86 SMPD3 9.09 IL33 2.86 OSBPL6 5.88

43 COL2A1 17.24 STK32B 9.09 P2RY10 2.86 PGBD5 5.88

44 GPR88 17.24 ABCG4 8.33 ADAM19 2.78 RAP1GAP 5.88

45 SLCO1A2 17.24 ASPDH 8.33 SIGLEC6 2.70 RIMKLA 5.88

46 CCDC144NL 16.95 CD3D 8.33 PDE3B 2.56 TMEM163 5.88

47 FAM132A 16.95 CPA5 8.33 USP26 2.56 ZBP1 5.88

48 SLPI 16.95 FSD1 8.33 DOK3 2.50 FAM155B 5.56

49 FOXI2 16.67 GLYATL1 8.33 PALLD 2.50 GPT 5.56

50 RBM46 16.67 HKDC1 8.33 CXCR7 2.44 IL1RN 5.56

51 DLK2 16.13 HOXA10 8.33 DLEC1 2.44 NCKAP5 5.56

52 KLK7 16.13 NR5A1 8.33 HIF1A 2.44 S1PR5 5.56

53 SGCA 16.13 ODZ1 8.33 IRAK2 2.44 WFDC5 5.56

54 NTS 15.87 PRSS3 8.33 MAP1LC3C 2.44 ALDH4A1 5.26

55 TDGF1 15.87 TSHR 8.33 PALMD 2.44 EMX1 5.26

56 C16orf89 15.63 APOA1 7.69 CD180 2.38 GPR37 5.26

57 IL17RD 15.63 C17orf67 7.69 GSDMC 2.38 MAPK8IP2 5.26

58 LPHN3 15.63 C2orf85 7.69 SP140 2.38 STC2 5.26

59 GFRA3 15.38 CMTM2 7.69 SLFN12L 2.33 TMEM132A 5.26

60 KRT14 15.38 CPLX1 7.69 CD44 2.27 TMEM145 5.26

61 MAB21L2 15.15 7.69 LDB2 2.27 ANKRD2 5.00

62 C3orf32 14.93 HIST1H3G 7.69 CYSLTR1 2.22 C3orf14 5.00

63 NGEF 14.93 KIAA1257 7.69 EVI2B 2.22 FAM84A 5.00

64 PCDHGA12 14.93 MCHR1 7.69 SAMD9L 2.22 GRID1 5.00

65 CRLF1 14.71 SLAMF1 7.69 CCDC88C 2.17 SRPK3 5.00

66 SYT1 14.71 C2orf48 7.14 EMB 2.08 CAPNS2 4.76

67 WNT2 14.49 CREB3L3 7.14 NCOA7 2.08 LPIN3 4.76

68 CYS1 14.29 CXorf65 7.14 BLNK 2.04 OAS1 4.76

69 EDAR 14.29 FGFBP2 7.14 PCDHA1 4.76

70 FBN2 14.29 GRAP2 7.14 PLIN4 4.76

71 FOLR1 14.29 HOXA9 7.14 POU3F1 4.76

72 GDF1 14.29 KIAA0087 7.14 TRAM1L1 4.76

73 IGFBP6 14.08 LEF1 7.14 ACOT11 4.55

74 KC6 14.08 SH3GL3 7.14 AHNAK2 4.55

75 BCAM 13.89 SPTA1 7.14 ARNT2 4.55

76 FCN2 13.89 TOX2 7.14 DIO2 4.55

77 CHRM3 13.51 ADA 6.67 MCOLN2 4.55

78 KLK10 13.33 C10orf50 6.67 C14orf50 4.35

79 SPINK5 13.33 C1orf228 6.67 CECR2 4.35

80 C8orf31 12.66 CD38 6.67 CHRNB2 4.35

81 GAL3ST3 12.66 EVPLL 6.67 CYP4F11 4.35

82 CXCL1 12.50 HIST1H3C 6.67 EGF 4.35

83 UTS2R 12.50 HIST1H3F 6.67 FAM78B 4.35

84 CNTN4 12.05 HIST2H2AC 6.67 HCN1 4.35

85 LASS1 12.05 JAKMIP2 6.67 MSLNL 4.35

86 AMOTL2 11.90 MYB 6.67 TMEM151B 4.35

87 TDGF3 11.90 MYO7B 6.67 TUFT1 4.35

88 ERBB4 11.76 NEIL3 6.67 ABCA9 4.17

89 AOX1 11.49 SIT1 6.67 C1orf53 4.17

90 C8orf84 11.49 SPSB4 6.67 CNTNAP4 4.17

91 CRYAB 11.49 TCF7 6.67 ENAH 4.17

92 NEBL 11.49 XPNPEP2 6.67 HDAC9 4.17

93 C14orf39 11.36 AURKB 6.25 MANEAL 4.17

94 ZNF208 11.36 BFSP2 6.25 SPIRE2 4.17

95 CLDN10 11.24 CD247 6.25 4.00

96 CPAMD8 11.24 CD3E 6.25 OASL 4.00

97 PRSS12 11.24 CD8B 6.25 PTK6 4.00

98 COL28A1 11.11 CENPA 6.25 LRRC16B 3.85

99 SLC16A12 11.11 CRABP1 6.25 LUZP2 3.85

100 SNCAIP 11.11 FAM57B 6.25 MPV17L 3.85

101 FN1 10.87 GFI1 6.25 NKAIN1 3.85

102 ITGB8 10.87 HIST1H2AJ 6.25 SPR 3.85

103 LIPH 10.87 RETN 6.25 TMEM38A 3.85

104 MLXIPL 10.87 SLC23A1 6.25 GPRIN1 3.70

105 SLC34A2 10.87 TMPRSS9 6.25 KIAA1211 3.70

106 GJB6 10.75 TREML2 6.25 LMAN1L 3.70

107 ZSCAN4 10.75 ARRDC5 5.88 LYPD5 3.70

108 TFAP2A 10.64 BIRC5 5.88 PTPRH 3.70

109 CYP39A1 10.53 C17orf50 5.88 SLC13A3 3.70

110 ICAM5 10.31 C22orf15 5.88 TCTE1 3.70

111 MAOA 10.31 CAMK4 5.88 C1orf182 3.57

112 SCN4B 10.31 CCL17 5.88 C2orf66 3.57

113 ANKRD56 10.20 CYP2U1 5.88 FAAH2 3.57

114 GAP43 10.20 GSTM5 5.88 FAM195A 3.57

115 HTR2C 10.20 GTSF1 5.88 FRK 3.57

116 FLRT2 10.10 HIST1H2AH 5.88 RAPGEFL1 3.57

117 CXCL6 10.00 HIST1H3J 5.88 TACR2 3.57

118 ID4 10.00 LCNL1 5.88 ANKRD9 3.45

119 MAGEL2 10.00 LIPC 5.88 CADPS2 3.45

120 PCDHGA6 10.00 MME 5.88 CSRP1 3.45

121 SPERT 10.00 SPNS3 5.88 MYRIP 3.45

122 C4orf31 9.09 TLX2 5.88 SRD5A1 3.45

123 CTGF 9.09 BEST3 5.56 TLR5 3.45

124 CYP27C1 9.09 C14orf64 5.56 ACCN1 3.33

125 FAM71E2 9.09 CA6 5.56 B3GNT8 3.33

126 GPM6A 9.09 CACNA1F 5.56 CDC42EP4 3.33

127 GRM4 9.09 CAPSL 5.56 DSC1 3.33

128 KANK4 9.09 CD3G 5.56 FOSL2 3.33

129 KCNS1 9.09 CDC20 5.56 GAS2L1 3.33

130 LYPD1 9.09 CDC25C 5.56 MST1R 3.33

131 PCDH15 9.09 CHI3L2 5.56 STAP2 3.33

132 PCDHGB7 9.09 CNIH2 5.56 C10orf55 3.23

133 SCARA3 9.09 FAM64A 5.56 DAB1 3.23

134 SGCE 9.09 HIST1H2AL 5.56 NEURL1B 3.23

135 SLC5A12 9.09 HIST1H2BL 5.56 PPEF1 3.23

136 SLITRK2 9.09 HIST1H3B 5.56 SHOX2 3.23

137 SP6 9.09 KCNF1 5.56 SLCO3A1 3.23

138 SYT17 9.09 LINGO4 5.56 TMEM144 3.23

139 TMEM130 9.09 LRRC26 5.56 ADAM15 3.13

140 TRIL 9.09 MKRN3 5.56 BATF2 3.13

141 WNT2B 9.09 MND1 5.56 C8orf73 3.13

142 ADAMTS5 8.33 NDST3 5.56 EPHB2 3.13

143 C10orf82 8.33 NECAB1 5.56 STOML3 3.13

144 CCDC68 8.33 PRSS1 5.56 CARD14 3.03

145 DCHS2 8.33 PVRIG 5.56 HSD3B7 3.03

146 DZIP1 8.33 SEPT1 5.56 IQSEC2 3.03

147 FAM83A 8.33 SLC22A16 5.56 PLCH2 3.03

148 KLHL14 8.33 SLIT1 5.56 PYGO1 3.03

149 LEPR 8.33 SPC25 5.56 RGMA 3.03

150 LRP4 8.33 TRAT1 5.56 SIK1 3.03

151 MN1 8.33 UMODL1 5.56 ANK1 2.94

152 NRK 8.33 UPB1 5.56 RNF148 2.94

153 PCDHGA2 8.33 ADORA2A 5.26 SLC34A1 2.94

154 PP14571 8.33 C12orf42 5.26 TNFSF10 2.94

155 PRUNE2 8.33 C21orf128 5.26 FAM108C1 2.86

156 PTPRZ1 8.33 C22orf34 5.26 FGFBP1 2.86

157 PVRL3 8.33 CDCA3 5.26 OAS2 2.86

158 RGS6 8.33 CDKN2D 5.26 APOBEC3C 2.78

159 SCPEP1 8.33 CENPV 5.26 CAMKK1 2.78

160 TRHDE 8.33 GMFG 5.26 DDX58 2.78

161 VASN 8.33 HIST1H2AB 5.26 MAPK8IP1 2.78

162 AFAP1L2 7.69 KIAA0748 5.26 PBX3 2.78

163 AGAP11 7.69 LAT 5.26 SPTBN4 2.78

164 AQP12B 7.69 LCK 5.26 TPD52 2.78

165 CYP17A1 7.69 MYO1A 5.26 ZBTB7B 2.78

166 GABRA4 7.69 PIF1 5.26 DIAPH2 2.70

167 GJD3 7.69 PITPNM2 5.26 HCN3 2.70

168 GULP1 7.69 POU4F1 5.26 NR6A1 2.70

169 IRX3 7.69 PTPRCAP 5.26 PLA2G4C 2.70

170 KAZALD1 7.69 TTK 5.26 PPM1H 2.70

171 PCDH18 7.69 UBE2C 5.26 PRDM12 2.70

172 PLA2R1 7.69 ZDHHC15 5.26 PTPRN 2.70

173 PRPH2 7.69 ATP8B3 5.00 RAB3A 2.70

174 SGCZ 7.69 C3orf52 5.00 FXN 2.63

175 SGMS2 7.69 CD2 5.00 NECAB3 2.63

176 SMAD9 7.69 CD52 5.00 TPST2 2.63

177 TMEM195 7.69 CD8A 5.00 PDLIM1 2.56

178 TRIM55 7.69 CENPM 5.00 RGS2 2.56

179 VWA2 7.69 CHRNB4 5.00 SLC39A14 2.56

180 WFDC2 7.69 DLGAP5 5.00 EEF2K 2.50

181 ZNF578 7.69 DNAJC12 5.00 NPC1L1 2.50

182 ZNF676 7.69 ERVFRDE1 5.00 TBC1D8 2.50

183 ACSS3 7.14 HSF5 5.00 ZFP92 2.50

184 C1QTNF4 7.14 IL9R 5.00 ANKRD34A 2.44

185 CCNA1 7.14 KCNG3 5.00 FAM186B 2.44

186 FRZB 7.14 MYBL2 5.00 FCRLB 2.44

187 GDA 7.14 NPPC 5.00 RAB11FIP4 2.44

188 IGF2BP2 7.14 OR13A1 5.00 SLC6A11 2.44

189 KLK6 7.14 PDE6G 5.00 AKT3 2.38

190 LRP2 7.14 SCARNA2 5.00 ANKRD5 2.38

191 LRRTM4 7.14 SKAP1 5.00 SMPDL3A 2.33

192 MYL9 7.14 TBC1D10C 5.00 TMCO4 2.33

193 NRP2 7.14 UHRF1 5.00 ARG2 2.27

194 NTN4 7.14 F11R 2.27

195 ODZ2 7.14 FAM57A 2.27

196 PEG10 7.14 PCNXL2 2.27

197 PREX2 7.14 PLEKHG5 2.27

198 RAMP1 7.14 PRRT3 2.27

199 RGL3 7.14 SLC2A1 2.27

200 RUNX1T1 7.14 ELL2 2.22

201 SALL1 7.14 FAM71F1 2.22

202 AKAP12 6.67 FLAD1 2.22

203 CA4 6.67 FLVCR1 2.22

204 CA8 6.67 GRIN3A 2.22

205 CAP2 6.67 NFE2L3 2.22

206 CCDC67 6.67 ARHGAP26 2.17

207 DKK2 6.67 C1orf21 2.17

208 EPHA3 6.67 C1QTNF6 2.17

209 F3 6.67 COQ2 2.17

210 GPR83 6.67 HOOK2 2.17

211 GREM2 6.67 MAPRE3 2.17

212 H2BFM 6.67 MRPL24 2.17

213 KANK2 6.67 SPIRE1 2.17

214 KIRREL 6.67 SRC 2.17

215 LEPREL1 6.67 TMEM198 2.17

216 LGR5 6.67 GLRX2 2.13

217 MYLK2 6.67 GPI 2.13

218 NID2 6.67 MAP3K14 2.13

219 NTN1 6.67 MR1 2.13

220 PCDH7 6.67 PKM2 2.13

221 PEG3 6.67 PRICKLE3 2.13

222 PRIMA1 6.67 RAB7L1 2.13

223 RPRM 6.67 SRXN1 2.13

224 S100A1 6.67 C14orf147 2.08

225 SNED1 6.67 CPNE2 2.08

226 STOX1 6.67 ENTPD7 2.08

227 THBS4 6.67 IRAK1 2.08

228 TMEM98 6.67 MAPK6 2.08

229 TSPAN18 6.67 TBC1D24 2.08

230 VSIG2 6.67 CORO2A 2.04

231 WDR87 6.67 OR13J1 2.04

232 ABO 6.25 PCBD1 2.04

233 ARHGAP29 6.25 PUS7 2.04

234 ATP1A2 6.25 SLC35E4 2.04

235 BICC1 6.25

236 C4orf38 6.25

237 C6orf154 6.25

238 CPE 6.25

239 CPZ 6.25

240 DACT3 6.25

241 F2R 6.25

242 GBX2 6.25

243 GLIS2 6.25

244 LEPREL2 6.25

245 LIN7A 6.25

246 NAV3 6.25

247 PALM 6.25

248 PTPRR 6.25

249 SLC16A9 6.25

250 SLC7A10 6.25

251 SLC9A2 6.25

252 SVEP1 6.25

253 TSPAN12 6.25

254 WNT5A 6.25

255 WWC2 6.25

256 ABCA4 5.88

257 ANO1 5.88

258 CCDC3 5.88

259 CCDC85A 5.88

260 CDH11 5.88

261 CX3CR1 5.88

262 DENND2A 5.88

263 DNAH7 5.88

264 EBF3 5.88

265 ECHDC3 5.88

266 EFNB3 5.88

267 FAM107A 5.88

268 FMOD 5.88

269 FSTL1 5.88

270 GPC4 5.88

271 HOXA4 5.88

272 INTU 5.88

273 LRP1 5.88

274 MATN3 5.88

275 MKX 5.88

276 MYLK3 5.88

277 NALCN 5.88

278 NPR3 5.88

279 NRP1 5.88

280 OLFML1 5.88

281 P4HA3 5.88

282 PI15 5.88

283 PLD5 5.88

284 PLSCR4 5.88

285 POTEF 5.88

286 POU6F2 5.88

287 PRRX1 5.88

288 ROR2 5.88

289 SYT2 5.88

290 TRPM3 5.88

291 WNT5B 5.88

292 BMP4 5.56

293 C3orf70 5.56

294 CACHD1 5.56

295 DTX4 5.56

296 ERRFI1 5.56

297 FAM171A2 5.56

298 FHDC1 5.56

299 FOXC1 5.56

300 FRRS1 5.56

301 FZD7 5.56

302 GALNT3 5.56

303 GAS6 5.56

304 GNAL 5.56

305 GPR126 5.56

306 GRHL1 5.56

307 HEY1 5.56

308 KCNC3 5.56

309 LEFTY2 5.56

310 LRP1B 5.56

311 LRRC2 5.56

312 LYPD6B 5.56

313 MDFI 5.56

314 MXRA8 5.56

315 PGM5 5.56

316 PTPN21 5.56

317 RARRES2 5.56

318 RASSF8 5.56

319 SERTAD4 5.56

320 SORCS1 5.56

321 TREM2 5.56

322 ZNF521 5.56

323 ADAMTS2 5.26

324 ARHGAP36 5.26

325 ATOH8 5.26

326 B3GNT7 5.26

327 B4GALNT3 5.26

328 BHMT2 5.26

329 BOK 5.26

330 C11orf9 5.26

331 C2orf14 5.26

332 C3 5.26

333 CCDC158 5.26

334 CDH8 5.26

335 CLDN6 5.26

336 COL9A2 5.26

337 CPNE8 5.26

338 DKKL1 5.26

339 EDNRA 5.26

340 ELFN1 5.26

341 EPHA2 5.26

342 FBXL7 5.26

343 FRAS1 5.26

344 GGT5 5.26

345 GLRB 5.26

346 HTR1F 5.26

347 MEIS3P1 5.26

348 PACRG 5.26

349 PCDHGA4 5.26

350 PDE3A 5.26

351 PLEKHH2 5.26

352 PRDM6 5.26

353 SDC4 5.26

354 SERPINF1 5.26

355 SLC16A4 5.26

356 SYT12 5.26

357 TIAM2 5.26

358 WBSCR17 5.26

359 ZG16B 5.26

DOWN 1 ATP10B -55.55 CNTN1 -6.98 NKAIN4 -12.24 SLC6A3 -19.67

2 DSG3 -46.10 ALDH1A3 -6.42 MYBPC2 -11.48 RAG1 -13.58

3 GPR87 -41.01 PARD3B -6.13 FGF17 -8.90 BMPR1B -10.96

4 NEFL -36.52 EPPK1 -6.03 SLC44A5 -8.15 PPP1R1C -10.54

5 SPOCK1 -33.35 GOLIM4 -5.59 GALNT9 -7.62 PLCH1 -8.68

6 CXCL14 -22.59 NBPF16 -5.54 DDX25 -5.67 NKAPL -7.38

7 C4orf50 -22.20 NBPF10 -5.50 PSD2 -5.07 LRRN2 -7.17

8 CYP11A1 -19.71 GLIS3 -5.37 LDHD -4.64 DACT2 -6.83

9 TIMD4 -19.42 MYOF -5.35 C1QL1 -4.47 TSKS -6.79

10 C1orf168 -18.67 ATP8B1 -5.30 RGS20 -4.47 CPXM1 -6.27

11 NUPR1 -17.79 GPR109A -5.24 TNNC2 -4.37 CACNA1C -5.80

12 STAR -17.69 SOX9 -5.09 GLDN -4.33 ACSBG1 -5.75

13 IRS4 -17.62 ADORA2B -4.22 MFAP4 -5.70

14 IL31RA -17.10 FAM71E1 -4.06 ZNF492 -5.47

15 BCHE -16.18 TCEAL5 -3.89 PCDH9 -5.46

16 AKR1B10 -16.05 UPK2 -3.81 PHYHD1 -5.12

17 CD5L -15.39 ITPKA -3.70 AXIN2 -4.99

18 SOD3 -15.18 ZNF541 -3.67 KIAA0319 -4.89

19 KRT31 -15.04 C22orf31 -3.46 PCSK5 -4.79

20 PIR -14.37 TCF7L1 -3.26 C21orf130 -4.69

21 PTGES -13.26 S100A5 -3.23 GIPC3 -4.64

22 CCL18 -13.19 OLAH -3.11 MPPED2 -4.58

23 ADAM23 -12.81 MESP1 -2.99 ZNF423 -4.56

24 CALN1 -12.63 GSTM2 -2.87 GJB2 -4.55

25 ENPP3 -11.71 ACCN2 -2.85 TMEM26 -4.54

26 TNFRSF17 -11.33 NPAS1 -2.81 RHBDL3 -4.47

27 PPP2R2C -10.57 ARL4D -2.76 DACH1 -4.42

28 RNF128 -10.38 DYNLRB2 -2.72 C1orf133 -4.41

29 HMGCS2 -10.29 RASIP1 -2.72 ZNF229 -4.40

30 INSM1 -9.62 SNTA1 -2.54 TOX -4.39

31 RHBDL1 -9.59 ZNF662 -2.54 KBTBD11 -4.34

32 HAO2 -9.51 NCRNA00085 -2.41 ZNF439 -4.31

33 CYP3A4 -9.44 ZNF300 -2.37 C20orf203 -4.22

34 GBP6 -9.40 INCA1 -2.34 SH3RF3 -4.20

35 CYP3A5 -9.27 SIM2 -2.30 HPGDS -4.12

36 ZYG11A -9.27 PRRT1 -2.29 AGPAT4 -4.11

37 SPIC -8.98 SPAG8 -2.29 NRIP2 -4.02

38 SGIP1 -8.64 ABCA17P -2.24 PKDCC -4.00

39 HGD -8.62 B3GNT1 -2.20 NEDD4 -3.98

40 NRXN1 -8.56 TCTN2 -2.20 PTPLA -3.92

41 CLEC4GP1 -8.55 TPRG1 -2.19 SCN4A -3.89

42 PLXNB3 -8.50 ARHGAP10 -2.18 SNCA -3.88

43 VWC2 -8.50 LASS4 -2.16 ZNF469 -3.88

44 MYO18B -8.49 LRRC48 -2.14 AFF3 -3.82

45 PDE1A -8.47 AK3L1 -2.13 ATP1B2 -3.76

46 CAPN6 -8.36 C16orf48 -2.13 SATB1 -3.62

47 TFF3 -8.07 TRIP10 -2.13 C22orf24 -3.59

48 TEX101 -8.00 ABHD8 -2.12 HPDL -3.52

49 KCNC1 -7.97 BEX2 -2.12 ZNF883 -3.50

50 S100A9 -7.84 PRAM1 -2.08 TMEM200A -3.48

51 GUCY2F -7.81 MORN1 -2.07 C16orf74 -3.47

52 MGAT5B -7.81 FAM86B1 -2.05 FBLN5 -3.46

53 ESRRG -7.79 ZNF579 -2.05 SLC16A10 -3.44

54 OLIG1 -7.71 CA11 -2.04 MTMR9L -3.32

55 CCL19 -7.45 PDE6B -2.02 AKR1E2 -3.30

56 GGT6 -7.45 ARPM1 -2.01 ZNF788 -3.29

57 HAGHL -7.43 FGFR1 -3.27

58 TP53AIP1 -7.41 ZNF69 -3.26

59 FCER2 -7.38 GALNT7 -3.24

60 FAM189A1 -7.26 RTKN2 -3.24

61 F12 -7.11 DNAH10 -3.21

62 IGFBPL1 -7.10 ID3 -3.16

63 CA9 -7.03 NMUR1 -3.11

64 KRT34 -7.03 C18orf1 -3.09

65 KIAA1543 -6.99 ARHGEF10 -3.08

66 SIRT4 -6.92 CAMK1D -3.07

67 PCSK1 -6.87 MYO15B -3.00

68 MGST1 -6.84 LAYN -2.98

69 PCDH19 -6.81 TET1 -2.97

70 CHRNA2 -6.79 IGSF22 -2.92

71 MTUS2 -6.77 ZNF534 -2.92

72 CHST4 -6.73 MAP6D1 -2.91

73 FABP6 -6.72 MLC1 -2.91

74 WDR63 -6.70 ZNF677 -2.90

75 PAH -6.68 KCNMB4 -2.89

76 MYH15 -6.67 ZNF415 -2.88

77 C14orf180 -6.66 ITM2C -2.85

78 NPFFR2 -6.54 ZNF826 -2.84

79 LRRC17 -6.39 APBB1 -2.83

80 BBOX1 -6.37 GAS7 -2.83

81 CYP4F12 -6.36 AEBP1 -2.80

82 UGT2B7 -6.35 MAPRE2 -2.77

83 GPNMB -6.34 C20orf195 -2.75

84 KLHL4 -6.34 GATS -2.72

85 MORN4 -6.23 ZNF781 -2.72

86 NEURL3 -6.17 DDAH2 -2.66

87 INSRR -6.16 FAM101B -2.66

88 CYP2C18 -6.14 SLC43A3 -2.62

89 DGKG -6.14 ZNF681 -2.57

90 LRRN3 -6.13 KLHL32 -2.55

91 ODF3L1 -6.12 TBC1D1 -2.54

92 UBD -6.11 CTNNAL1 -2.53

93 SV2B -6.07 FAM78A -2.51

94 HIST1H1D -6.05 FAR2 -2.51

95 IFI27 -6.02 PTK7 -2.50

96 SPOCD1 -6.01 C13orf15 -2.48

97 C6orf105 -5.99 FIGNL2 -2.48

98 KCNH2 -5.98 LZTFL1 -2.46

99 IFT172 -5.97 MYADM -2.46

100 AMY1A -5.95 CDR2 -2.45

101 HSPA2 -5.80 GLCCI1 -2.42

102 NKAIN2 -5.80 GPSM3 -2.41

103 RPL39L -5.79 RRN3P1 -2.41

104 CCDC78 -5.75 C17orf72 -2.40

105 PDCD1LG2 -5.74 KLF12 -2.39

106 PCYT1B -5.72 PGM2L1 -2.39

107 HIST1H1B -5.71 RRN3P3 -2.37

108 CHIT1 -5.65 RECK -2.36

109 DGCR5 -5.65 PTPLAD2 -2.35

110 CLVS1 -5.62 HDAC7 -2.33

111 TXLNB -5.62 MOBKL1A -2.32

112 SGCD -5.58 SNRK -2.30

113 DRP2 -5.57 TRDMT1 -2.30

114 RDM1 -5.56 USP6 -2.30

115 NTRK1 -5.47 PIK3C2B -2.27

116 DRD1 -5.31 C13orf18 -2.23

117 HIST1H2BH -5.31 HIST1H2BJ -2.22

118 CYP2J2 -5.28 ZNF709 -2.21

119 ANKRD55 -5.27 NMT2 -2.20

120 GLIS1 -5.26 CMTM3 -2.19

121 VAV3 -5.22 SCN11A -2.19

122 HSPB8 -5.19 SLC11A2 -2.19

123 CMBL -5.15 C7orf31 -2.18

124 ARHGEF4 -5.04 ACVR2B -2.17

125 GDF11 -2.16

126 TMEM44 -2.15

127 BST1 -2.14

128 FAM69A -2.14

129 KAT2B -2.14

130 NOTCH3 -2.14

131 TFAP2E -2.14

132 C11orf95 -2.13

133 DBNDD2 -2.13

134 GKAP1 -2.13

135 VILL -2.13

136 CCDC88A -2.12

137 LBR -2.10

138 DGKD -2.09

139 AKD1 -2.08

140 EFHC1 -2.08

141 TSPYL4 -2.07

142 LRRC34 -2.06

143 STX2 -2.05

144 SUSD1 -2.05

145 LRCH1 -2.02

146 ZNF280D -2.02

147 SSBP3 -2.01

Supplementary Table S2. Clinicopathologic Characteristics of Patients.

| Variables | Category | Discovery cohort (TCGA) | Validation cohort 1 (GSE57892) | Validation cohort 2 (GSE29695) |
|---|---|---|---|---|
| Number of patients | | 120 | 22 | 36 |
| Sex | Men | 63 (52.5%) | 12 (54.5%) | NA |
| | Women | 57 (47.5%) | 10 (45.5%) | NA |
| Age [median years (range)] | | 59.5 (17-84) | 55.5 (37-76) | NA |
| Myasthenia gravis | | 31 (25.8%) | NA | NA |
| Neoadjuvant chemotherapy | | 2 (1.7%) | NA | NA |
| Adjuvant chemotherapy | | 4 (3.3%) | NA | NA |
| Adjuvant radiotherapy | | 41 (34.2%) | NA | NA |
| WHO classification | A | 17 (14.2%) | 5 (22.7%) | 2 (5.5%) |
| | AB | 35 (29.2%) | 2 (9.1%) | 8 (22.2%) |
| | B1 | 14 (11.7%) | 1 (4.6%) | 11 (30.6%) |
| | B2 | 31 (25.8%) | 3 (13.6%) | 9 (25.0%) |
| | B3 | 12 (10.0%) | 5 (22.7%) | 6 (16.7%) |
| | C | 11 (9.1%) | 6 (27.3%) | - |
| Masaoka stage | I | 36 (30.0%) | - | I & II: 21 (58.3%) |
| | IIa | 41 (34.2%) | 7 (31.8%) | |
| | IIb | 20 (16.7%) | 5 (22.7%) | |
| | III | 15 (12.5%) | 1 (4.6%) | III & IV: 11 (30.6%) |
| | IV | 6 (5.0%) | 9 (40.9%) | |
| | NA | 2 (1.6%) | - | 4 (11.1%) |
| GTF2I mutation | | 46 (38.3%) | 7 (31.8%) | NA |

* NA = not available.

Supplementary Table S3. Univariable Cox Regression Analysis of molecular subtypes and Disease-Free Survival in the TCGA TET Sets (n=120).

| Variables | Category | Univariable hazard ratio (95% CI) | P value |
|---|---|---|---|
| Sex | M | 1 | |
| | F | 1.18 (0.77-1.82) | 0.446 |
| Age (years) | <60 | 1 | |
| | ≥60 | 1.25 (0.53-2.96) | 0.618 |
| Myasthenia gravis | No | 1 | |
| | Yes | 1.04 (0.40-2.73) | 0.933 |
| Adjuvant radiotherapy | No | 1 | |
| | Yes | 1.24 (0.51-3.01) | 0.632 |
| WHO classification | A/AB/B1 | 1 | 0.134* |
| | B2 | 1.43 (0.47-4.41) | 0.531 |
| | B3/C | 3.05 (0.96-9.68) | 0.059 |
| Masaoka stage | I/II | 1 | |
| | III/IV | 2.35 (0.90-6.13) | 0.082 |
| Molecular subtype | GTF2I | 1 | 0.019* |
| | TS/CS | 2.17 (0.58-8.21) | 0.252 |
| | CIN | 5.57 (1.50-20.68) | 0.010 |

Age was dichotomized at the median value. Variables with a P value of less than 0.2 were analyzed in the multivariable analysis. CI denotes confidence interval.

* Overall P value for variables with more than two subgroups.
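As a reading aid for the hazard ratios in this table, the sketch below shows how a hazard ratio and its 95% confidence interval relate to a Cox coefficient, its standard error, and a two-sided P value. It assumes the intervals are Wald intervals on the log scale and that a normal approximation applies; the appendix does not state which test produced the reported P values, so the function name and this assumption are illustrative only.

```python
from math import erfc, log, sqrt

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval

def wald_from_hr(hr, ci_low, ci_high):
    """Recover (beta, SE, z, two-sided P) implied by HR and its 95% CI.

    Assumes CI = exp(beta +/- Z_95 * SE), i.e. a Wald interval on log(HR).
    """
    beta = log(hr)                                  # Cox coefficient
    se = (log(ci_high) - log(ci_low)) / (2 * Z_95)  # standard error of beta
    z = beta / se                                   # Wald statistic
    p = erfc(abs(z) / sqrt(2))                      # two-sided normal tail
    return beta, se, z, p

# Hazard ratios and intervals copied from the molecular subtype rows.
for label, hr, low, high in [("TS/CS", 2.17, 0.58, 8.21),
                             ("CIN", 5.57, 1.50, 20.68)]:
    beta, se, z, p = wald_from_hr(hr, low, high)
    print(f"{label}: beta={beta:.3f} SE={se:.3f} z={z:.2f} P={p:.3f}")
```

Under these assumptions the implied P values fall close to the tabulated 0.252 and 0.010, which suggests the reported values are consistent with Wald tests, although small differences can arise from rounding of the published intervals.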

Supplementary Table S4. List of 220 GTF2I target genes identified using known transcription factor binding site motifs within the TRANSFAC® predicted transcription factor targets dataset.

GTF2I Target Gene Set Name

ABCA4 ATP-binding cassette, sub-family A (ABC1), member 4

ACBD4 acyl-CoA binding domain containing 4

ACTA1 actin, alpha 1, skeletal muscle

ADNP2 ADNP homeobox 2

AHNAK2 AHNAK nucleoprotein 2

ALG10 ALG10, alpha-1,2-glucosyltransferase

ALG10B ALG10B, alpha-1,2-glucosyltransferase

ANTXR2 anthrax toxin receptor 2

APBA1 amyloid beta (A4) precursor protein-binding, family A, member 1

ARHGEF1 Rho guanine nucleotide exchange factor (GEF) 1

ASMT acetylserotonin O-methyltransferase

BAAT bile acid CoA: amino acid N-acyltransferase

BCL2L1 BCL2-like 1

BRI3 brain protein I3

C15ORF26 chromosome 15 open reading frame 26

C15ORF27 chromosome 15 open reading frame 27

C17ORF99 chromosome 17 open reading frame 99

C2ORF82 chromosome 2 open reading frame 82

C8ORF34 chromosome 8 open reading frame 34

C9 complement component 9

CACNB4 calcium channel, voltage-dependent, beta 4 subunit

CACYBP calcyclin binding protein

CALHM3 calcium homeostasis modulator 3

CC2D1B coiled-coil and C2 domain containing 1B

CD5L CD5 molecule-like

CDC73 cell division cycle 73

CDKN1A cyclin-dependent kinase inhibitor 1A (p21, Cip1)

CFL1 cofilin 1 (non-muscle)

COG8 component of oligomeric golgi complex 8

COL9A2 collagen, type IX, alpha 2

COX7A2L cytochrome c oxidase subunit VIIa polypeptide 2 like

CPSF6 cleavage and polyadenylation specific factor 6, 68kDa

CRYGD crystallin, gamma D

CSAG1 chondrosarcoma associated gene 1

CXCL11 chemokine (C-X-C motif) ligand 11

CYP17A1 cytochrome P450, family 17, subfamily A, polypeptide 1

DCAF4L2 DDB1 and CUL4 associated factor 4-like 2

DCLK3 doublecortin-like kinase 3

DCTPP1 dCTP pyrophosphatase 1

DLG5 discs, large homolog 5 (Drosophila)

DNTT DNA nucleotidylexotransferase

DOCK7 dedicator of cytokinesis 7

E2F2 E2F transcription factor 2

EBNA1BP2 EBNA1 binding protein 2

EBP emopamil binding protein (sterol isomerase)

EGLN1 egl-9 family hypoxia-inducible factor 1

EN2 engrailed homeobox 2

ENAH enabled homolog (Drosophila)

ENPP4 ectonucleotide pyrophosphatase/phosphodiesterase 4 (putative)

EPS15 epidermal growth factor receptor pathway substrate 15

ERCC2 excision repair cross-complementation group 2

ESYT1 extended synaptotagmin-like protein 1

ETV2 ets variant 2

EXOC8 exocyst complex component 8

EXTL2 exostosin-like glycosyltransferase 2

FAM104B family with sequence similarity 104, member B

FAM161A family with sequence similarity 161, member A

FAM21C family with sequence similarity 21, member C

FANCB Fanconi anemia, complementation group B

FER fer (fps/fes related) tyrosine kinase

FGF3 fibroblast growth factor 3

FMN2 formin 2

FN3KRP fructosamine 3 kinase related protein

FOS FBJ murine osteosarcoma viral oncogene homolog

FOXE3 forkhead box E3

FRMD1 FERM domain containing 1

FXYD5 FXYD domain containing ion transport regulator 5

GALP galanin-like peptide

GATS GATS, stromal antigen 3 opposite strand

GCH1 GTP cyclohydrolase 1

GDPD2 glycerophosphodiester phosphodiesterase domain containing 2

GFRA3 GDNF family receptor alpha 3

GNL1 guanine nucleotide binding protein-like 1

GOLT1B golgi transport 1B

GPR137B G protein-coupled receptor 137B

GPR78 G protein-coupled receptor 78

GPX1 glutathione peroxidase 1

GRM4 glutamate receptor, metabotropic 4

GSK3A glycogen synthase kinase 3 alpha

GTF2F1 general transcription factor IIF, polypeptide 1, 74kDa

GTF2H5 general transcription factor IIH, polypeptide 5

GUCY1B3 guanylate cyclase 1, soluble, beta 3

HDAC9 histone deacetylase 9

HIST1H1D histone cluster 1, H1d

HRNR hornerin

IK IK cytokine, down-regulator of HLA II

IKBKAP inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase complex-associated protein

IL1B interleukin 1, beta

IMPG2 interphotoreceptor matrix proteoglycan 2

INO80E INO80 complex subunit E

INPP5K inositol polyphosphate-5-phosphatase K

ITIH1 inter-alpha-trypsin inhibitor heavy chain 1

IVL involucrin

KCNT1 potassium channel, sodium activated subfamily T, member 1

KIF18A kinesin family member 18A

KIF21B kinesin family member 21B

KIF5B kinesin family member 5B

KLHL26 kelch-like family member 26

KLHL5 kelch-like family member 5

KRT79 keratin 79, type II

LCE4A late cornified envelope 4A

LEO1 Leo1, Paf1/RNA polymerase II complex component, homolog (S. cerevisiae)

LIN7B lin-7 homolog B (C. elegans)

LNX2 ligand of numb-protein X 2

LRRC1 leucine rich repeat containing 1

LRRC70 leucine rich repeat containing 70

MAPKAP1 mitogen-activated protein kinase associated protein 1

MARCO macrophage receptor with collagenous structure

MDGA2 MAM domain containing glycosylphosphatidylinositol anchor 2

MED29 mediator complex subunit 29

MEX3C mex-3 RNA binding family member C

MRPL43 mitochondrial ribosomal protein L43

MTUS1 microtubule associated tumor suppressor 1

MUCL1 mucin-like 1

MUS81 MUS81 structure-specific endonuclease subunit

MYO1C myosin IC

NDUFC1 NADH dehydrogenase (ubiquinone) 1, subcomplex unknown, 1, 6kDa

NFS1 NFS1 cysteine desulfurase

NLGN1 neuroligin 1

NLGN4Y neuroligin 4, Y-linked

NPM1 nucleophosmin (nucleolar phosphoprotein B23, numatrin)

NR1I2 nuclear receptor subfamily 1, group I, member 2

NRCAM neuronal cell adhesion molecule

OLFM4 olfactomedin 4

OLIG3 oligodendrocyte transcription factor 3

OTUD5 OTU deubiquitinase 5

PABPC5 poly(A) binding protein, cytoplasmic 5

PASK PAS domain containing serine/threonine kinase

PBX3 pre-B-cell leukemia homeobox 3

PCDH20 protocadherin 20

PCMT1 protein-L-isoaspartate (D-aspartate) O-methyltransferase

PDE1A phosphodiesterase 1A, calmodulin-dependent

PEX13 peroxisomal biogenesis factor 13

PIK3R5 phosphoinositide-3-kinase, regulatory subunit 5

PLAT plasminogen activator, tissue

PLEKHA1 pleckstrin homology domain containing, family A (phosphoinositide binding specific) member 1

PLEKHH1 pleckstrin homology domain containing, family H (with MyTH4 domain) member 1

PNPLA4 patatin-like phospholipase domain containing 4

PODNL1 podocan-like 1

POLR1D polymerase (RNA) I polypeptide D, 16kDa

PPP1R7 protein phosphatase 1, regulatory subunit 7

PPP2R5A protein phosphatase 2, regulatory subunit B', alpha

PRAMEF10 PRAME family member 10

PRAMEF20 PRAME family member 20

PRDM1 PR domain containing 1, with ZNF domain

PRKAG3 protein kinase, AMP-activated, gamma 3 non-catalytic subunit

RAG1 recombination activating gene 1

RALGPS1 Ral GEF with PH domain and SH3 binding motif 1

RAP1B RAP1B, member of RAS oncogene family

RBMS1 RNA binding motif, single stranded interacting protein 1

RC3H2 ring finger and CCCH-type domains 2

REEP6 receptor accessory protein 6

REG3A regenerating islet-derived 3 alpha

RERE arginine-glutamic acid dipeptide (RE) repeats

RGMA repulsive guidance molecule family member a

RHEBL1 Ras homolog enriched in brain like 1

RIPK3 receptor-interacting serine-threonine kinase 3

RNF113B ring finger protein 113B

RNF128 ring finger protein 128, E3 ubiquitin protein ligase

RNF167 ring finger protein 167

RNF20 ring finger protein 20, E3 ubiquitin protein ligase

RNPS1 RNA binding protein S1, serine-rich domain

RUSC1 RUN and SH3 domain containing 1

SCHIP1 schwannomin interacting protein 1

SDCCAG8 serologically defined colon cancer antigen 8

SEH1L SEH1-like (S. cerevisiae)

SEPN1 selenoprotein N, 1

SERTAD4 SERTA domain containing 4

SETBP1 SET binding protein 1

SH2B2 SH2B adaptor protein 2

SH2D3C SH2 domain containing 3C

SHROOM2 shroom family member 2

SIX2 SIX homeobox 2

SLC12A6 solute carrier family 12 (potassium/chloride transporter), member 6

SLC13A3 solute carrier family 13 (sodium-dependent dicarboxylate transporter), member 3

SLC15A4 solute carrier family 15 (oligopeptide transporter), member 4

SLC22A18 solute carrier family 22, member 18

SLC37A1 solute carrier family 37 (glucose-6-phosphate transporter), member 1

SLC38A3 solute carrier family 38, member 3

SLITRK4 SLIT and NTRK-like family, member 4

SMYD2 SET and MYND domain containing 2

SOCS5 suppressor of cytokine signaling 5

SOHLH1 spermatogenesis and oogenesis specific basic helix-loop-helix 1

SPEM1 spermatid maturation 1

SSTR4 somatostatin receptor 4

STS steroid sulfatase (microsomal), isozyme S

SULF2 sulfatase 2

SYBU syntabulin (syntaxin-interacting)

SYNPO2L synaptopodin 2-like

TANC1 tetratricopeptide repeat, ankyrin repeat and coiled-coil containing 1

TBC1D10B TBC1 domain family, member 10B

TCEAL5 transcription elongation factor A (SII)-like 5

TGFBRAP1 transforming growth factor, beta receptor associated protein 1

TINF2 TERF1 (TRF1)-interacting nuclear factor 2

TIPARP TCDD-inducible poly(ADP-ribose) polymerase

TLR5 toll-like receptor 5

TLX1 T-cell leukemia homeobox 1

TMEM159 transmembrane protein 159

TMEM56 transmembrane protein 56

TMOD4 tropomodulin 4 (muscle)

TYMP thymidine phosphorylase

UBE2G1 ubiquitin-conjugating enzyme E2G 1

USP3 ubiquitin specific peptidase 3

VPS35 vacuolar protein sorting 35 homolog (S. cerevisiae)

WAS Wiskott-Aldrich syndrome

WASF1 WAS protein family, member 1

WNT10A wingless-type MMTV integration site family, member 10A

WNT9A wingless-type MMTV integration site family, member 9A

WT1 Wilms tumor 1

XPO1 exportin 1

YEATS2 YEATS domain containing 2

ZC3H6 zinc finger CCCH-type containing 6

ZER1 zyg-11 related, cell cycle regulator

ZNF154 zinc finger protein 154

ZNF326 zinc finger protein 326

ZNF502 zinc finger protein 502

ZNF510 zinc finger protein 510

ZNF652 zinc finger protein 652

ZNF681 zinc finger protein 681

ZNF837 zinc finger protein 837
