Androgen and FoxA1 Interaction Study

Marko Laakso Biswajyoti Sahu Kristian Ovaska Olli A. Jänne Sampsa Hautaniemi

July 10, 2011

Abstract

This Anduril analysis compares androgen receptor (AR) and FoxA1 binding sites in LNCaP-1F5 prostate cells. The binding site data have been integrated with expression profiles obtained for the DHT stimulus. This document was generated automatically by Anduril (Engine 1.2.2).

Contents

1 Expression analysis for AR
2 Summary of DEGs in AR
3 List of differentially expressed genes
3.1 Gene set: fcC fcOver
3.2 Gene set: fcC fcUnder
3.3 Gene set: fcA1 fcOver
3.4 Gene set: fcA1 fcUnder
3.5 Gene set: fcE fcOver
3.6 Gene set: fcE fcUnder
4 Expression analysis for GR
5 Summary of DEGs in GR
6 List of differentially expressed genes
6.1 Gene set: fcsiCx fcOver
6.2 Gene set: fcsiCx fcUnder
6.3 Gene set: fcsiFx fcOver
6.4 Gene set: fcsiFx fcUnder
7 Gene set overlaps
8 Gene set comparison
9 Candidate report for Unique DEGs for AR
9.1 Moksiskaan candidate pathway
9.1.1 GO enrichment of the candidate pathway
9.2 Candidate genes
9.2.1 GO enrichment of all candidates
10 Candidate report for Unique DEGs for AR with siFOXA1
10.1 Moksiskaan candidate pathway
10.1.1 GO enrichment of the candidate pathway
10.2 Candidate genes
10.2.1 GO enrichment of all candidates
11 Candidate report for Common DEGs for AR and siFOXA1
11.1 Moksiskaan candidate pathway
11.1.1 GO enrichment of the candidate pathway
11.2 Candidate genes
11.2.1 GO enrichment of all candidates
12 ChIP-seq peaks
12.1 AR DHT binding sites
12.2 FoxA1 binding sites
12.3 AR binding sites (siFOXA1)
12.4 FoxA1 binding sites (siFOXA1)
12.5 AR and FoxA1 binding site overlaps
12.6 AR and FoxA1 binding site overlaps (up)
12.7 AR and FoxA1 binding site overlaps (down)
12.8 AR without FoxA1 binding site overlaps
12.9 AR without FoxA1 binding site overlaps (up)
12.10 AR without FoxA1 binding site overlaps (down)
12.11 FoxA1 without AR binding site overlaps
12.12 FoxA1 without AR binding site overlaps (up)
12.13 FoxA1 without AR binding site overlaps (down)
12.14 AR and FoxA1 siFOXA1 binding site overlaps
12.15 AR and FoxA1 siFOXA1 binding site overlaps (up)
12.16 AR and FoxA1 siFOXA1 binding site overlaps (down)
12.17 AR without FoxA1 siFOXA1 binding site overlaps
12.18 AR without FoxA1 siFOXA1 binding site overlaps (up)
12.19 AR without FoxA1 siFOXA1 binding site overlaps (down)
12.20 FoxA1 without AR siFOXA1 binding site overlaps
12.21 FoxA1 without AR siFOXA1 binding site overlaps (up)
12.22 FoxA1 without AR siFOXA1 binding site overlaps (down)
12.23 AR and AR siFOXA1 binding site overlaps
12.24 AR and AR siFOXA1 binding site overlaps (up)
12.25 AR and AR siFOXA1 binding site overlaps (down)
12.26 AR without AR siFOXA1 binding site overlaps
12.27 AR without AR siFOXA1 binding site overlaps (up)
12.28 AR without AR siFOXA1 binding site overlaps (down)
12.29 AR siFOXA1 without AR binding site overlaps
12.30 AR siFOXA1 without AR binding site overlaps (up)
12.31 AR siFOXA1 without AR binding site overlaps (down)
12.32 FoxA1 and FoxA1 siFOXA1 binding site overlaps
12.33 FoxA1 and FoxA1 siFOXA1 binding site overlaps (up)
12.34 FoxA1 and FoxA1 siFOXA1 binding site overlaps (down)
12.35 FoxA1 without FoxA1 siFOXA1 binding site overlaps
12.36 FoxA1 without FoxA1 siFOXA1 binding site overlaps (up)
12.37 FoxA1 without FoxA1 siFOXA1 binding site overlaps (down)
12.38 FoxA1 siFOXA1 without FoxA1 binding site overlaps
12.39 FoxA1 siFOXA1 without FoxA1 binding site overlaps (up)
12.40 FoxA1 siFOXA1 without FoxA1 binding site overlaps (down)
12.41 AR and FoxA1 binding site overlaps (stable)
12.42 AR without FoxA1 binding site overlaps (stable)
12.43 FoxA1 without AR binding site overlaps (stable)
12.44 AR and AR siFOXA1 binding site overlaps (stable)
12.45 AR without AR siFOXA1 binding site overlaps (stable)
12.46 AR siFOXA1 without AR binding site overlaps (stable)
12.47 FoxA1 and FoxA1 siFOXA1 binding site overlaps (stable)
12.48 FoxA1 without FoxA1 siFOXA1 binding site overlaps (stable)
12.49 FoxA1 siFOXA1 without FoxA1 binding site overlaps (stable)
12.50 AR and FoxA1 siFOXA1 binding site overlaps (stable)
12.51 AR without FoxA1 siFOXA1 binding site overlaps (stable)
12.52 FoxA1 without AR siFOXA1 binding site overlaps (stable)
12.53 GR DEX binding sites
12.54 GR DEX binding sites (siFOXA1)
12.55 GR and GR siFOXA1 binding site overlaps
12.56 GR without GR siFOXA1 binding site overlaps
12.57 GR siFOXA1 without GR binding site overlaps
12.58 AR and FOXA1 overlaps unique for AR parental
13 AR versus AR siFOXA1
14 Expression comparison
14.1 Box plots
15 System configuration

1 Expression analysis for AR

Group  Definition          Description
ce     median(ce1, ce2)    AR controls for FoxA1 study
cd     median(cd1, cd2)    AR cases (DHT) for FoxA1 study
a1e    median(a1e1, a1e2)  siFOXA1 controls
a1d    median(a1d1, a1d2)  siFOXA1 cases
fcC    ratio(cd/ce)        AR DHT samples divided by their controls
fcA1   ratio(a1d/a1e)      siFOXA1 DHT samples divided by their controls
fcE    ratio(ce/a1e)       Parental cell controls divided by siFOXA1 controls

Table 1: Sample groups

[Figure: sample group structure. Replicates ce1, ce2, cd1, cd2, a1e1, a1e2, a1d1 and a1d2 are combined into the medians ce, cd, a1e and a1d, which feed the ratios fcC, fcA1 and fcE.]
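The group definitions in Table 1 reduce each pair of replicate arrays to a median and then divide medians to obtain fold changes. Below is a minimal Python sketch of that arithmetic for a single gene. It assumes linear-scale (not log-transformed) intensities, and the replicate values are invented for illustration, not taken from this study:

```python
from statistics import median

# Hypothetical single-gene intensities (linear scale assumed); the names follow
# Table 1, but the numbers are invented for illustration.
replicates = {
    "ce1": 100.0, "ce2": 120.0,    # parental controls
    "cd1": 400.0, "cd2": 380.0,    # parental DHT cases
    "a1e1": 60.0, "a1e2": 80.0,    # siFOXA1 controls
    "a1d1": 150.0, "a1d2": 170.0,  # siFOXA1 DHT cases
}


def group_medians(values):
    """Collapse each pair of replicates into its group median (Table 1)."""
    return {
        group: median([values[f"{group}1"], values[f"{group}2"]])
        for group in ("ce", "cd", "a1e", "a1d")
    }


def fold_changes(groups):
    """Ratios exactly as defined in Table 1."""
    return {
        "fcC": groups["cd"] / groups["ce"],     # DHT effect in parental cells
        "fcA1": groups["a1d"] / groups["a1e"],  # DHT effect after siFOXA1
        "fcE": groups["ce"] / groups["a1e"],    # parental vs siFOXA1 controls
    }


groups = group_medians(replicates)
print(groups)  # {'ce': 110.0, 'cd': 390.0, 'a1e': 70.0, 'a1d': 160.0}
print(fold_changes(groups))
```

With only two replicates the median equals their mean. If the intensities were log2-transformed, the ratios would instead be differences of the group medians.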

2 Summary of DEGs in AR

Gene set      Size
fcC fcOver    195
fcC fcUnder   106
fcA1 fcOver   242
fcA1 fcUnder  175
fcE fcOver    188
fcE fcUnder   199

3 List of differentially expressed genes

Overexpressed genes are sorted with the most overexpressed first, and underexpressed genes with the most underexpressed first. Genes are ordered down each column first and then across rows.
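The ordering rule can be made concrete with a short example. The Python sketch below splits hypothetical fold changes into over- and underexpressed lists, sorts them as described, and lays a list out column by column. The 2-fold cutoff and the gene values are assumptions for illustration. The thresholds Anduril applied in this analysis are not stated in this section.

```python
import math

# Hypothetical ratios; the 2-fold cutoff is an assumption for illustration only.
example_ratios = {"KLK3": 3.1, "FKBP5": 5.2, "TMPRSS2": 2.4,
                  "CBLN2": 0.2, "DDC": 0.45, "GAPDH": 1.0}
CUTOFF = 2.0


def split_degs(ratios, cutoff):
    """Return (over, under) lists sorted as described in the text."""
    over = [g for g, r in ratios.items() if r >= cutoff]
    under = [g for g, r in ratios.items() if r <= 1.0 / cutoff]
    over.sort(key=lambda g: ratios[g], reverse=True)  # largest ratio first
    under.sort(key=lambda g: ratios[g])               # smallest ratio first
    return over, under


def column_major(genes, n_cols):
    """Fill a grid down each column first, then move to the next column."""
    n_rows = math.ceil(len(genes) / n_cols)
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for i, gene in enumerate(genes):
        grid[i % n_rows][i // n_rows] = gene
    return grid


over, under = split_degs(example_ratios, CUTOFF)
print(over)   # ['FKBP5', 'KLK3', 'TMPRSS2']
print(under)  # ['CBLN2', 'DDC']
for row in column_major(over + under, n_cols=2):
    print(" ".join(f"{g:<8}" for g in row))
```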

3.1 Gene set: fcC fcOver

Number of genes: 195

PHGR1 C1orf116 LPAR3 ATAD2 PYGB HSD17B11 RP11-546D6.2 SMPD2 FKBP5 SPDEF HES1 PPFIBP2 PAQR4 AC068353.1 CAMKK2 PTGER4 PGC DBI KLK3 JAG1 ACAD8 AL121833.1 MKLN1 MRPS18A SLC45A3 MBOAT2 NBL1 ELOVL2 HIVEP1 GDF15 VWF HMG20B S100P STK39 CEBPD BRP44 APP WDYHV1 CEBPG TNS3 NCAPD3 EXTL2 CNTNAP2 SLC2A12 SHROOM3 MTOR VIPR1 BMPR1B SAT1 PECI TMPRSS2 RP11-312J18.5 DHCR24 CTD-2048F20.1 CORO1B TACC2 MICAL1 TUBA3D TBC1D4 KLK2 HEBP2 ZBTB24 STRA13 TIPARP TSKU ZMIZ1 CTD-2653M23.1 DNAJB9 ABHD3 SGK1 CBLL1 MERTK PMEPA1 KRT8 MLPH PFKFB2 KIF22 TEX2 GLRX SETBP1 ST6GALNAC1 TRPM8 C9orf152 PRKCH MAPK6 REPS2 C15orf23 SEMA4A ALDH1A3 TM4SF1 ISG20 SMS CRIP2 AC020915.4 UGT2B28 SEPP1 CLDN8 RP11-529H2.2 HOMER2 CENPN HERC5 ATP1A1 RAB3IP UBE2G1 SORD NFKBIA FAM43A TRPM4 AL162497.1 SNORD88C PTPRCAP YIPF1 ST3GAL4 ERRFI1 PEA15 RIPK4 IRS2 BTG1 ACACA TACSTD2 ELL2P1 C1orf21 ELOVL5 VPS26B ACSL3 MAP7D1 MPZL1 AF127936.7 ELL2 CREB3L4 SLC43A1 ADAMTS1 APOD COBLL1 HSD17B4 C3orf25 SOCS2 RAB3B EAF2 BCAP29 EML1 TBC1D8 DNM1L SPATA2 SLC41A1 TMEFF2 SLC35F2 PRR15L KCNN2 FADS2 PLEKHF1 GTF3C6 ODC1 TMEM79 GNMT C4orf34 CYP11A1 UGDH SUN2 C12orf44 ABCC4 WWC1 GPT2 GUCY1A3 HPGD LONRF1 PPAPDC1B AC061975.8 LRIG1 MYBPC1 PDIA5 C3orf58 TRIB3 ABCC1 RGS2 FAM174B DNASE2B ANKRD37 GLRX2 ARID5B INPP4B NDRG1 CAP2 PPAP2A LIMCH1 C17orf48 IDH1 ADRB2 LCP1 ZBTB16 SASH1 CAPZB UAP1 MPHOSPH9 KCNMA1

3.2 Gene set: fcC fcUnder

Number of genes: 106

CBLN2 NIPSNAP3A GALNTL4 GCG TRIB1 C6orf192 TSPAN3 C1orf53 FAM198B BCHE KIF5B FFAR2 SELENBP1 IL27RA TMEM14A CYBASC3 LRRN1 DDC DEGS2 SOX9 FAM47E RP11-510N19.5 HCP5 IRX3 UGT2B17 CAMK2N1 AC100826.1 GAS6 STBD1 ADORA2B HCP5 MAPKAPK3 CXCR7 REXO2 CALD1 NETO1 FAM113B ZNF462 HCP5 INSIG2 COLEC12 RP11-181C21.4 TSPAN8 CDH26 MAPK1IP1L SLC30A3 HCP5 PNMA3 LMO4 SLITRK3 BAMBI MT1X FIGNL1 TULP4 ZMYND15 TMEM123 CNKSR3 ATP1B1 SLC44A1 ANKRD16 RNFT2 TGM3 SRSF7 ELOVL6 TMEM144 RASL11B REEP1 NUCB2 PRKD1 MT1A DIO1 ACPP ST7 GLDC ZNF503 PRSS1 C6orf64 NAP1L3 GOLIM4 AC010170.1 TOX2 CCDC28B TSHZ3 POLR3GL C20orf177 ATP12A TMEM158 LGMN FZD2 CDK8 GPER HOXC13 SERPINI1 C5orf30 WNT5A COL16A1 CITED2 ZIC2 RP11-611D20.2 C11orf92 SDC4 PITX1 C5orf13 TNFRSF21 NAB1 P2RX4

3.3 Gene set: fcA1 fcOver

Number of genes: 242

FKBP5 MBOAT2 TSKU TACC1 C16orf93 Z82214.1 FSTL1 DLG5 PGC TBC1D8 FOXO1 SLC43A1 CTD-2653M23.1 DHCR24 AC068353.1 CDC25A CRISPLD2 SLC16A6 KLF9 MICAL1 SHROOM3 FZD8 CTD-2048F20.1 ARHGAP44 TIPARP EDN2 PMEPA1 KRT8 SAT1 MAP7D1 OLAH ST3GAL5 ACTA2 ZNF589 SORD C7orf63 SH3PXD2B ABCC8 CTDSPL DCXR SNAI2 CAMP ATAD2 CLDN8 TMEM14C PEA15 TAX1BP3 SHC1 AP1S2 RHOB HIST1H4H NDRG1 TPM1 ZSWIM6 PIK3CD KIAA0040 ITPR3 FADS1 KLF15 UAP1 DNASE2B CD68 ARRDC1 ADAM9 SGK1 ACPL2 C15orf23 GALNTL4 TPD52L1 PIK3C2A TSPAN13 RAP2A CAP2 IRX3 ZDHHC8P1 BRP44 KLK3 SSR3 GSTM3 PHKA1 ST3GAL4 ODC1 PACSIN1 C12orf44 SLC35F2 LCP1 AUTS2 PTGER4 HOMER2 LDLRAD3 C3orf58 HARS FADS2 TMEM100 AC073873.1 DHCR7 CEBPD JUN RBBP7 STK39 PLCXD1 SUN2 RGS4 GUCY1A3 SPDEF AL162497.1 KCNMA1 WDR41 ZMIZ1 IL6R IL18R1 CCDC17 ALDH1A3 IRS2 JAG1 TSC22D1 STEAP2 NPPC PRPS2 KRT19 ELOVL5 CDKN1C USP10 MKLN1 PHLDA2 GLUD1 BCAP29 STEAP1 ITPRIP PAK1IP1 CTGF EEF2K RHBDF1 KIF22 CEP70 RP11-529H2.2 CITED1 TMPRSS2 S100P CD24P4 TMEFF2 PTPN1 CREB3L4 CR759786.11 ERRFI1 EVL GLRX PLXNB1 CCND3 SLC5A6 HS1BP3 AL844527.3 CYR61 SERPINA3 AL590369.1 PPAP2A GNMT NFIL3 CRK CR759817.7 MPZL2 C1orf116 PTGES MYBPC1 FSIP1 GBE1 COBLL1 AL662827.4 PLEKHF1 ATP1A1 AC061975.8 FAM107B WHAMM KRTAP13-2 RIT1 B3GALT4 ELOVL2 ETS2 EZR ETNK2 CLDN12 ACAA1 ANXA5 CRNDE NUDT11 DUSP1 C6orf52 SLC44A3 IRX5 LACTB2 PTPRCAP KRT16 SYBU MPZL1 KLK2 RNF24 ABHD3 FAM8A1 CBLL1 C17orf91 SLCO2A1 TUBA3D ALDH3A2 TSC22D3 PPM1H PARP4 NCAPD3 UGDH C9orf152 C11orf75 OBFC2A VCL F2RL1 HIST2H2BE SOCS2 PGF S100A10 C4orf34 RBM24 TM4SF1 AL662827.3 ELL2P1 PIAS1 NET1 TGFBR2 HES1 CDC42EP4 CR759817.9 ELL2 WIPI1 C1orf21 REN MAK GK5 CR759786.5 SLC45A3 NFKBIA CAPN2 PYGB TTLL12 TMEM149 BX000343.1

3.4 Gene set: fcA1 fcUnder

Number of genes: 175

AC010170.1 LRRN1 ZG16B FAM111A PHLDA1 KAZALD1 FHOD1 NIPSNAP3A TMEM158 NAP1L3 RP11-510N19.5 PRKD1 C7orf23 CDR2L METRN LIMA1 COLEC12 SERPINI1 SIPA1L2 MMP15 PLEKHG4 BARD1 CFD TSPAN9 DEPTOR LMO4 MATN2 CCDC28B ATP12A TMEM144 GC LIN7A PIK3IP1 C6orf192 CBLN2 HBQ1 TSPAN7 Z83851.1 ADORA2B NUAK2 TNFRSF21 ENPP4 CNKSR3 AMOT POLB PROM2 TRIB1 NCRNA00245 YPEL2 RP11-181C21.4 MT1F REXO2 TMEM123 MUC13 NDUFS2 TP53INP1 MT1A TSPAN8 TRIM45 GAL PDE2A ILDR1 KLF13 CGNL1 SLITRK3 TSPAN15 OAF IGFBP2 ENPP5 FHOD3 NCOR2 NTPCR ADRA2C VASN PIM1 SLC16A10 SRSF7 LRBA RASL11B MYT1 ASS1 C1orf115 HSD11B2 ALDH3B2 CCDC14 FOLH1 FFAR2 COL17A1 BTG2 ZNF503 GLDC TMEM51 SHROOM1 B4GALNT4 SLC27A2 STAMBPL1 BAMBI LGALS4 NUP54 LRRC26 P2RX4 LGALS3BP ELF3 ANKRD43 BCHE ALDH4A1 HSD3B7 C11orf52 PPP1R14B ZNF239 AP3M2 AIDA HS3ST1 ODAM DDC DPYSL2 TSPAN1 MAPKAPK3 PDGFRL RSL24D1 CAMK2N1 SLPI ELOVL6 MT1X LINGO1 KLHDC5 OGDHL GRAMD4 PPP2R2A LRRTM4 TMEM116 SUSD4 PYGL CSPG5 ANXA9 SPTBN5 FAM47E SEMA3F RARRES3 ALCAM PLXDC2 HDDC3 POLR3GL IL17C STBD1 CMBL F12 GATA2 CGN FYCO1 TLCD1 RCOR2 GOLIM4 AIF1L RAI14 CYP2J2 HIBADH ACPP CCNG2 ALG13 EXOG IRF1 HOXC13 CHTF18 TANC1 MAL2 MYO6 FKBP1A CBLB SLC20A1 MOSC2 GFOD1 CYFIP2 CA11 NCAM2

3.5 Gene set: fcE fcOver

Number of genes: 188

ACPP TMEM144 RRM2 PRUNE2 AL935042.3 SLC44A4 CR933877.1 TMEM42 DDC SLC45A3 RRM2P3 CENPM CR759798.5 ARHGEF26 WDR54 RASL11B UGT2B17 C1orf116 MELK AMACR BX908719.2 NUP93 KIAA0101 VPS37C HMGCS2 RWDD2A ERGIC1 TSPAN3 CR936913.3 AC004381.6 GRB10 TUBA1C AL589734.1 TNFRSF19 GINS2 TJP1 BX088556.5 CCNB2 PIGQ CENPF SPON2 UGT2B7 LGMN KIF20A AL662845.2 TMEM170A SUV39H1 SMYD2 AC100826.1 CDC45 STEAP2 ENSA HLA-DMB ZCCHC24 CAP2 NIPSNAP3A HES6 GCG PRC1 GLDC POLA2 OIP5 ZMYND12 LPCAT1 FAM198B TOP2A PBK REEP1 FANCE STEAP1 CHEK1 MCM5 CNKSR3 RAP1GAP C6orf64 PLEKHH1 FZD2 CTXN1 C16orf75 RFC3 TMEFF2 SLC2A12 MAP7D1 DEGS1 AL391319.1 AP2S1 C12orf75 DIO1 TFF3 ST7 FGFRL1 NONO SMOC2 RDH11 TMEM14A NUSAP1

SPDEF TMSB15A MCM2 FIGNL1 MCM4 TM4SF1 POC1A CTSL2 CALD1 FOXA1 ANKRD16 BX088650.6 RAD51AP1 BAAT1 AC063976.6 NCRNA00287 C7orf63 TYMS ARL2 BX248088.1 KIF5B PASK P4HA2 MB CDCA5 ZIC2 BCL9 KIFC1 C5orf30 PLDN KIF2C KCNK5 RP11-45B20.3 C9orf152 APOD CTSF OSBPL5 BBS4 DLGAP5 SMUG1 GZMH HIST1H4C KBTBD11 PNPLA6 AC099759.1 SLC22A23 RPS6KA4 MCL1 TRIM68 TMPRSS2 COBL BCAT2 AL662834.7 EML3 CDH26 FAM64A MIPEP FAM83D UGT2B11 C1orf85 CR936237.3 CLDN12 TIMELESS SORD MAST4 RAMP1 TOX2 ASF1B CR388202.1 TRIB1 ST3GAL1 TK1 GNG5 CDC20 AC010296.1 AL844853.11 AURKB FZD9 AMHR2 MIR25 IGF2R SEMA6A BX005460.2 UBE2T CCNA2 TGM3 MIR93 NCAPG CR752645.1 CR759784.2 PITX1 MCM6

3.6 Gene set: fcE fcUnder

Number of genes: 199

SLPI MOSC1 MT1A NUPR1 FTH1P3 HIBCH TMEM47 YAP1 IGFBP3 S1PR3 THRSP SIPA1L2 NFKBIZ CITED2 CCDC85A VASN GDE1 TMEM100 C13orf15 TBC1D8 CDKN1C SLC16A14 TRIB3 KLF13 AC010170.1 ADRA2C ODAM PDE2A BAMBI C3orf58 HSPA5 IMMP2L TMEM158 HS3ST1 DBI BCHE MGAT1 FTH1P12 RP11-529H2.2 EIF1AY THBS1 CKS2 AC093162.5 FHOD3 PTGER4 COL17A1 DDIT4 OAT CA4 CBLB RETSAT EXOG SREBF1 TMEM116 MPZL2 ANKRD43 YPEL2 TLL2 IRF6 TNFRSF21 PRPH ORMDL3 RIT1 PYGB SCGN IDH1 GC TRIB2 DENND5A CRYAB GPRC5B ZBTB16 CAMK2N1 LGALS4 SH3BP4 SOCS2 AP3M2 TBX3 TUFT1 TWF2 FAM47E HLX MMP15 NFIB CSRP1 RP11-298P3.4 NAP1L3 IFI6 STBD1 PODXL CRABP2 SCARB2 RHOB CCNG2 GRPEL2 C1orf21 DEPTOR PHLDA1 DUSP10 TMEM2 GPT CMTM8 ELOVL4 ATP1B1 PHGR1 PIK3IP1 LRRTM4 PPIC SHROOM1 SHROOM3 DEGS2 AL139819.1 SYT17 AMOT CITED1 MUC13 GFOD1 KRT8 ATP7A SCD ALDH4A1 SOX9 EDN2 S100A11 CEBPG FADS2 HIST1H2AC SNORD88C ASS1 CITED4 DDIT3 LINGO1 PIM1 MAT2A MAP1B PLXDC2 AC012621.2 BTG1 LGALS3BP ENPP4 LRRN1 TMEM51 FAM43A TRAK2 POLB STX3 CD24P4 LPIN1 LIMCH1 CANX MERTK SERPINI1 LCN2 SLCO2A1 TLE1 KLK3 NUP54 IRF1 CAPN2 PCK2 TMOD1 MOSC2 PDIA4 GADD45A SCGB2A1 ERO1L HTR3A PIGR C1orf115 CENPN MOCOS ENPP5 FTH1 FAM102A MATN2 IFI44 CRIP2 ST8SIA2 PELI2 U6atac.22 JARID2 NEURL3 ITPR1 PDGFC FAM174B RARRES3 CSRNP1 ALCAM DNASE1L3 TEAD2 SCRG1 PRDM8 SEMA3F MFSD6 DNAJB9 OSBP HSD11B2 FTH1P11 SDC2

4 Expression analysis for GR

Group   Definition                  Description
siCEx   median(siCEx_1, siCEx_2)    GR controls for the parental cells
siCDx   median(siCDx_1, siCDx_2)    GR cases (dexamethasone) for the parental cells
siFEx   median(siFEx_1, siFEx_2)    GR controls for siFOXA1
siFDx   median(siFDx_1, siFDx_2)    GR cases (dexamethasone) for siFOXA1
fcsiCx  ratio(siCDx/siCEx)          GR dexamethasone samples divided by their controls
fcsiFx  ratio(siFDx/siFEx)          GR dexamethasone siFOXA1 samples divided by their controls

Table 8: Sample groups

[Figure: sample group structure. Replicates siCEx_1, siCEx_2, siCDx_1, siCDx_2, siFEx_1, siFEx_2, siFDx_1 and siFDx_2 are combined into the medians siCEx, siCDx, siFEx and siFDx, which feed the ratios fcsiCx and fcsiFx.]

5 Summary of DEGs in GR

Gene set        Size
fcsiCx fcOver   471
fcsiCx fcUnder  120
fcsiFx fcOver   532
fcsiFx fcUnder  269

6 List of differentially expressed genes

Overexpressed genes are sorted with the most overexpressed first, and underexpressed genes with the most underexpressed first. Genes are ordered down each column first and then across rows.

6.1 Gene set: fcsiCx fcOver

Number of genes: 471

PGC NFKBIA ABCD3 ITGA1 KCNK1 TNKS1BP1 RHBDF1 TMED9 TUBA3D RP5-955M13.3 CYP1A2 PYGB KLF6 KIAA0513 CYP11A1 ZCCHC6 S100P SLC22A23 LCP1 JAK2 MYL12A STT3A ZBTB25 PHF17 TUBA3E CDC42EP4 FAM129B GBE1 ACSL1 PSMD8 GOLGA2 TGFBR3 CST1 SLC10A1 DHCR7 PITX1 AC061975.8 TNFRSF12A CD99 KLF9 TIPARP KRT72 WIPI1 PHACTR2 AGPAT6 DGCR2 C1orf21 ATOH8 TUBA3C ATP1A1 NCRNA00287 FAM65B C6orf81 EMILIN3 MLPH PSME4 RASD1 FADS1 TRNP1 HARS OSBPL5 THOC5 ELF1 IQGAP1

TMEM56 PPAP2A OFD1 IYD BBS10 TRIM24 SLC33A1 C5orf46 ACTA2 ABCC8 SPINK13 MRFAP1 TMED5 EDEM1 TMED10 NEDD4L FKBP5 SPSB1 CTD-2048F20.1 LRRFIP2 SEC23B PDZD2 EFNA1 RP11-802E16.3 CEBPD AL358252.1 CAPN13 ATG16L1 F2RL1 HAPLN3 EGFLAM ACTN1 PNLIP CLEC16A SPOCK1 PTPLA CKB ZNF189 ENPP1 IL6R LEPREL1 MUM1L1 AC217785.3 PDIA5 C1orf122 DERA GPR89C KLK3 SGK1 TUBB2A AL713999.2 LRRC8A C13orf15 GORASP1 SLC29A2 SLC41A2 CTGF PEBP4 GBA PLXNB2 CALU MEAF6 SSX2IP ARL8B SLC25A18 PHACTR3 ZDHHC8P1 TTC21A TMEM43 PRKCE ZFAND5 GFPT1 DUSP1 SLC4A7 STX12 TSC22D3 CSRNP1 SORT1 WDYHV1 TSC2 CRISPLD2 C9orf152 TDRD9 C17orf48 ZBTB24 CNN2 VPS37B JUN FOS AL162497.1 SLC5A6 C1orf172 ENPP3 IP6K3 ZC3H12A AC138649.1 ERRFI1 IRS2 BIRC3 AL391319.1 AZGP1 BRP44 GIPC1 SGSM2 CHST3 KLF15 FERMT2 SMOC2 B3GAT1 ARSA GPR89A RP11-61N20.3 LONRF1 ETNK2 ITGB1 ZBTB16 AZI2 SOD2 DEXI GOLGB1 SNAI2 PRKAB2 MAFB KIAA1826 C17orf91 SURF4 ACAT2 RRAS OGFRL1 TSKU PPL SQLE STK35 AL593848.10 BIRC2 PTPRG C17orf80 TSPYL2 TPM1 RP11-529H2.2 AC093162.5 GMNN NAT1 TSC22D1 ELL2P1 FBXO31 FAM107A FBRSL1 RETSAT SLC2A1 SLC30A7 LZTR1 ELL2 KRT8 SLC38A2 CA4 PDE9A RRBP1 KIAA1191 CLIC4 PTGER4 PHGR1 DNAJB9 SOCS2 KIAA0232 HSD17B11 ASB13 WDFY2 SRD5A1 SC4MOL AC093323.3 AGT EEF2K ZCCHC11 MICAL1 CLINT1 CYR61 RP11-397D21.1 RP11-125K10.4 HEXDC LASS6 AP001468.2 SRPRB PER1 FOXO1 SLC44A3 RCL1 AC023024.1 KLHL36 CLMN DHX36 GOLGA5 FAM105A LRIG1 TEDDM1 MED23 CLPTM1L GLUD1 PEA15 MOCOS PHLDA1 ZFP36 SLC27A3 GSTM3 PPTC7 C8orf84 ACTB RPN1 PDLIM1 CAP2 CAPN2 CPEB4 HSD17B4 NANS CAB39 GOT1 STK39 RNASE12 PALLD TP53I3 CLGN BTD MAP1LC3B ALDH3A2 SLC39A14 NPC1 USP10 JAG1 PDHA1 SRPR BBX PNPLA8 KRT73 SLCO4A1 TBC1D8 SEC61A1 SSR3 SUN2 FAM114A1 KIF5C SLC30A1 ARG1 SHROOM3 GATSL3 PACSIN2 INSIG1 C14orf147 MGAT1 GNMT DOCK5 SLC31A2 RP1-130H16.18 DNER HOMER2 ZMYND8 ATP11B RGS2 OLAH SERPINA3 SAR1B TRAM1 MYBPC1 CEBPB CRY1 ST3GAL4 PHLDA2 STRBP OSBPL11 SYBU SPRY2 TMEM39A MOGS CITED1 PAK1IP1 PROS1 TOR1AIP2 HMGCS1 TACC2 AC004893.11 TXNDC11 SLC39A11 GADD45B NUDT16 ODC1 SSR1 CTD-2653M23.1 TRRAP PELO APOD
TPD52L1 STAT3 SNX33 FADS2 HYOU1 PPA1 MBNL1 CDKN1C ATAD2 ZFHX3 TMEM61 RNASE11 GPR89B CD9 WI2-3658N16.1 MBOAT2 PQLC1 SLC16A3 SEC24D CCDC6 SOCS1 LSS MAN2B2 GBP2 KCNMA1 PLEKHF1 IDI1 OLFML2B SAT1 TNFAIP3 BCAP31 SCNN1G CREB3L2 AASS CALD1 TMEM91 ABHD3 DOLK C15orf23 MFSD6 ELOVL5 RNASE4 GLRX NFIL3 CCBL1 ACSL3 ANKRA2 NSDHL CTDSPL SCAP CR759786.11 RNF115 FAM190B PPP2CB XPR1 ANO6 MSX1 AP001468.1 AL844527.3 TACSTD2 ELOVL1 RP1-96H9.6 F5 RHOB CH17-12M21.1 ANG CR759817.7 STEAP4 PHLPP2 TNFRSF1A TP53INP2 COL6A2 CST3 NET1 AL662827.4 ANXA5 MCEE MAP3K6 NKTR CES4A SLPI LRRC16A B3GALT4 RP11-458D21.5 ZDHHC9 HIPK2 AMOT GHR ZNF185 DBI TGM1 AP3S1 DDX19A C1orf198 AC063976.6 LIFR DHCR24 MERTK TMEM49 RBM24 FXYD3 AL118511.2 P4HA2 COL6A1 CA12 ASIP ACAA1 KIAA1324L TWF2 WDR60 USP38 VCL GBAP1 DLG5 TNFRSF10A FIG4 C3orf70 GOLPH3

6.2 Gene set: fcsiCx fcUnder

Number of genes: 120

TGM3 AL589734.1 SLITRK3 HOXC9 NUP93 MT1A LRRC26 AC008738.1 RASL11B SEPT3 COL16A1 C12orf48 RP11-611D20.2 NIPSNAP3A C20orf27 SPTBN5 ACPP CCDC59 DCTPP1 PRR7 MCM6 CFD C5orf30 RAI14 BAMBI TRIB1 RP11-118B22.3 SLC25A19 HJURP CKAP2L ID2 RP3-470B24.5 CARM1 TUBB4 RP11-267J23.4 C9orf140 DHRS13 RP11-1198D22.1 Z83851.1 FIGNL1 DDC F12 ZIC2 CCDC85B AIF1L FKBP1C TOX3 WDR54 AC100826.1 GATA2 NFIX HOXB13 CTB-161M19.3 HIST3H2A IRF2BPL GUSBP3 CAMK2N1 GINS2 CBLN2 CDCA5 PTMA CXCL16 DBP EFHC2 PAQR4 BCHE LMO4 CPNE4 FOLH1 CXXC5 FZD2 LMNB1 FAM198B MAST4 C13orf38-SOHLH2 MAPKAPK3 LYSMD2 ANKRD16 ZMYND15 RP11-543B16.1 MESP1 CXCR7 CACNB3 NKD2 KBTBD11 LRRN1 CMBL ELL3 HOXC13 AMACR HES6 COLEC12 ISG15 GUSBL1 FKBP1A C12orf24 HBQ1 TUBA4A AC010170.1 TRIM68 PSMB10 CLDN3 TYMS SIGIRR RP11-181C21.4 AC005017.2 TMEM158 DEGS2 MT1F AC073465.3 MCM2 PRKD1 HMGCS2 GLDC TERC UGT2B17 SLC7A5 TRAP1 PPP1R14B

6.3 Gene set: fcsiFx fcOver

Number of genes: 532

TUBA3D MBOAT2 DHCR7 PRDX6 AZI2 B3GAT1 LAMB1 COPS8 TUBA3E AC217785.3 MERTK UGP2 MICAL1 AC135995.3 KRT16 GEMIN8 RASD1 AL713999.2 PRKAB2 DLX1 SAA2 AC126339.2 PEA15 OSBPL10 TIPARP GBA CTDSPL AC010296.1 C15orf23 AC010724.5 TTC39B WTAP TUBA3C AL391319.1 PEBP4 SEMA6A AUTS2 AC136698.2 CLDN12 PDZD8 TMEM56 SMOC2 PTPRG MUM1L1 AC073873.1 AC044860.4 HSD17B11 MORF4L2 S100P OLFML2B ZBTB24 RBM24 C6orf52 AC004383.4 TXN KIAA0232 PGC HEXDC SYBU C6orf81 NFIL3 ARSA RP11-802E16.3 AC063943.1 FKBP5 TSPYL2 ODC1 GIPC1 TNFAIP3 DOCK5 C16orf93 NFE2L2 CEBPD KLF15 OFD1 TRIM24 EGFLAM C12orf44 FDFT1 CCDC84 C17orf80 DERA SOCS2 AL590369.1 THOC5 MEAF6 TFAP2C IRAK1 CITED1 DHCR24 SF3B5 PTGES PLXNB1 C14orf147 CCNF EFNA1 CRISPLD2 SLCO4A1 PDHA1 MAPRE3 MYBPC1 GOLPH3 ITPK1 SAMD8 FOS HBB KRT8 AGPAT6 PDE9A MAP7D1 SAP30 AC135995.4 CTGF ATAD2 LCP1 AASS TP53INP2 NAT2 PELO AC010724.6 PDLIM1 STK39 GADD45B SLC29A2 IQGAP1 TWF2 RGS19 RNFT1 DUSP1 HOMER2 SQLE RNF115 TOR1AIP2 HSD17B4 RAB32 TSKU CES4A SOCS1 HSPB8 FADS2 PRKCE CCDC107 PHKA1 C5 OGFRL1 SLC39A11 JUN PARP12 SLC9A2 FERMT2 PIAS1 MFSD6 TUBB2A ETNK2 KCNMA1 DLL1 CAPN2 ELF1 ZFYVE20 C5orf51 CYR61 CALD1 PIR PANX1 DPP4 KIAA0040 AC023024.1 UBQLN1 RP5-955M13.3 GPD1L OLAH ZNF185 TNKS1BP1 FGFRL1 FAS TAF1B CHST3 VCL GHR ZNF189 CST1 NCKIPSD FAM189B ENPP3 SGK1 CLPTM1L RP1L1 SNX33 GSTM3 TMEM63C EBP SUN2 SPOCK1 AC093323.3 AP3S1 TMEM61 PLXNA2 DNAJC3 COBLL1 FAM199X PITX1 PQLC1 TNFRSF12A SH3PXD2B SOX8 PHF17 AC126339.3 AKR1C3 ERRFI1 PALLD USP10 MRFAP1 KIAA1191 ASIP KIAA0913 LGMN COL6A2 FBXO31 PPL AP001468.1 DHX36 KLHL36 C7orf53 TLN2 TRNP1 ELOVL5 C17orf91 NUDT16 C17orf48 MRPL11 PLCXD1 DBI C8orf84 NFKBIA SCAP PHLDA2 ZFHX3 TGM1 OSBPL5 CYP51A1 ELL2P1 FRMPD3 EPAS1 SHC1 FAM107A DNAJB9 HMGCS1 ALDH3A2 ELL2 STX12 NAT1 CH17-12M21.1 EVL FDPS PPA1 HAPLN3 SERPINA3 ABCC8 CTD-2048F20.1 HPCAL4 GLRX KCNG1 C5orf32 DYNLL1 ACTA2 SLC30A1 LRRC16A ELOVL1 RP11-61N20.3 ABHD2 GNRH2 CCBL1 PGF TSPAN3 NCRNA00287 CPEB4 PDZD2 NAV2 DNER MAPK6 COL6A1 GBP2 NUDT11 GNMT XPR1 BBS12 WDR55 DAPK3 SNAI2 KLF6 PTPLA
ERICH1 LSS EIF2AK3 STK17B BTBD10 SOD2 NET1 CNKSR3 SAA1 TMEM120A MYL12A UAP1 DVL2 SRD5A1 CST3 GBE1 TTC21A P4HA2 ATG16L1 RP1-96H9.6 ABCF3 SLC39A14 EMILIN3 RP3-437C15.1 SWT1 RP11-458D21.5 SEC14L1 TNFRSF1A AC061975.8 SLC4A7 ITGB1 CLEC16A GMNN FTH1P3 CKB IP6K3 PNMA1 TPD52L1 C1orf172 SLC22A23 WIPI1 CR759786.11 CIDEC TCEAL4 FTH1P12 SLC44A3 ITPKA C1orf198 ATF6 AL844527.3 AP001468.2 RBMS3 PCYT2 FADS1 SLC5A6 AL118511.2 MAP1LC3B CR759817.7 EIF2B2 CYorf15A HBP1 PNLIP SLCO2A1 SC4MOL ACAT2 AL662827.4 RHBDF1 HYOU1 PPP3CC LONRF1 S100A10 RP11-397D21.1 MRPS23 B3GALT4 AC010335.1 WWC3 PLCG1 ATP1A1 SLC25A18 CDKN1C SLC6A6 MGST1 ELL TAX1BP3 CTDP1 PAK1IP1 RHOB HARS C5orf46 NRCAM MAP3K6 AC110814.1 PHKG2 C9orf152 ABCD3 PHLDA1 FBRSL1 LRRC8A STAT3 SCAMP3 C1orf122 SPINK13 PHACTR2 FIG4 SLC27A3 FLNA DOLK WI2-3658N16.1 PTPRN2 PLEKHF1 AL162497.1 KIAA1826 DLG5 CLGN PPP2CB DUSP23 TACSTD2 FOXO1 IRS2 CCDC6 RAD51C C8orf42 MED23 KRT16P3 AC092037.3 ACPL2 JAK2 IDI1 RBBP7 RP11-529H2.2 AC010889.1 HS1BP3 MANF LIFR PROS1 BBX BBS10 ASCC1 CYorf15B ACSL3 RRAS NSDHL RP11-125K10.4 DGCR2 DDIT4 COPS3 VEZF1 SHISA2 C6orf62 TEDDM1 RCL1 STRBP HSD17B2 JAG1 EZR ELMOD2 IFRD1 PHACTR3 FAM105A ZFP36 RNASE11 PSMD8 LIN7B AC044860.3 SLC22A5 CAP2 PPAP2A PLXNB2 FAM65B YWHAQ TP53BP2 ELK1 UBE2Q2P2 ST3GAL4 SLC31A2 CA12 C3orf70 TRAPPC2P1 APEX2 AMDHD2 RAB40A QSOX1 TSC22D3 ACSL1 TPM1 OSBPL11 CAB39 PCYOX1L CAP1 LEPREL1 CRYAB TBC1D8 MIPEP CNN2 SORT1 C1orf85 BRP44 CDC42EP4 SCNN1G LASS6 EPHB6 GOT1 BZW1P2 USP3 AACS GBAP1 FAM129B NPC1 TMEM43 PYGB STEAP4 GBP1 ZDHHC9 SPSB1 OVGP1 GOLGA5 PTGER4 SPON2 ENPP1 LTV1 AL358252.1 STK35 SSR3 SEC23B AC063976.6 IGF2R VILL RNASE12 ZDHHC8P1 AP1S2 TTLL12 PPP1R14C FTH1P11 CLMN ANO6 TDRD9 ACAA1 Z82214.1 MYH10 ANXA5 GPSM2

6.4 Gene set: fcsiFx fcUnder

Number of genes: 269

POLB RP11-1280N14.4 MT1F RP11-611D20.2 GFOD1 EFNB3 IMMP2L GPT2 LRRTM4 RP11-98J23.2 CXCR7 PIM1 CGNL1 INSL5 CRABP2 CCDC28B COLEC12 RP11-497H16.5 C4orf11 TSPAN9 RPSA RP3-470B24.5 ELL3 SIPA1L2 DEPTOR GUSBP1 LYSMD2 GOLIM4 PLAT CCDC85A SCNN1D NLGN1 RP11-181C21.4 RP11-1415C14.3 NAP1L3 DHRS13 SELENBP1 CDK1 SLC39A8 MBLAC2

HOXC13 AC108108.1 FAM174B TYMS RXRA LUZP2 FLOT2 AC093510.1 BAMBI RP11-823P9.1 SH3RF1 NOTCH1 KLHDC5 NCRNA00219 DHRS4L2 CPNE3 RP11-543B16.1 SEMA3F SLPI PRMT3 RP11-31K13.1 NUPR1 DKC1 MEF2C AP3M2 PRPH PPP1R14B VDAC3 ADA CMBL HCP5 ZMYND15 ASS1 CLDN3 PSMB10 WDR54 AMACR SLC7A5 HCP5 GINS2 AC010170.1 CBLN2 HTR3A HIST3H2A ZBTB43 LMO4 HCP5 SNORD14D TMEM158 FAM47E FOLH1 PIK3IP1 HOXA5 SLC16A14 HCP5 SLC25A19 MYC STBD1 ABCC4 TBL1XR1 AIDA TARS RP11-781P14.3 ANKRD46 LRRC26 CAMK2N1 GLDC ACOX2 ELF3 TOP2A FAM102A TMEM123 MESP1 NETO2 PAQR4 MYO6 C1orf115 CSPG5 C5orf30 SLC2A4RG ZG16B GUSBL1 FKBP1A EFCAB4A C1orf116 PRR7 SPTBN5 SOX9 VASN TLL2 LINGO1 PIK3AP1 KBTBD11 SDC4 RP11-497H16.8 ADORA2B Z83851.1 SYT17 IRF2BPL DEGS2 PPP2R2A NDUFAF4 XXbac-BPG55C20.1 SERPINI1 PRKD1 COL16A1 AC008738.2 TUBA4A AC012621.2 CACNB3 MSRB2 ANKRD43 ACPP RP11-510N19.5 CEBPA LGALS4 PARD6A GDF15 ZNF467 NFKBIZ GATA2 ADRA2C CFD CXCL16 NEFH HIBADH RAI14 TMEM117 SLITRK3 TRIB1 HBQ1 AIF1L CITED2 BRIX1 MCM6 ADAP2 RP11-1198D22.1 C12orf48 RP11-589F5.4 CGN MANEA GPT ABHD11 CDCA5 FOXA1 GUSBP3 RP11-1415C14.4 CHTF18 IL1RAPL1 PRAC PROM2 AQP11 LRRN1 CBLB FKBP1C TGIF2 CDK5R2 TP53INP1 AC016712.2 TCEA2 F12 TSPAN15 HOXC9 PODXL YPEL2 ATP12A CCDC14 HNRNPA1P10 ZNF503 C9orf140 CCDC59 GDE1 H2AFY2 FERMT1 PCBD1 MUC13 MT1A ENPP4 MOSC2 SDSL KLHL35 GFOD2 GULP1 NKD2 HLX DDC ILDR1 C19orf48 EFR3B RAP1GAP SHROOM1 HJURP NUP54 TOB1 RSL24D1 SNORD88B MAPKAPK3 RIT2 TLCD1 BARD1 SEPT3 MAL2 FAM65A PRR15L MOSC1 ISG15 THNSL1 GRAMD1C ODAM HIST1H4C TSPAN8 TRIB2 ADAMTS1 RALYL PPP1R16A CCNG2 TOX3 HSD3B7 PPFIA2 YEATS4 RNF144A LIN7A BCHE PELI2 CYFIP2 SERHL2 RHPN2 HS3ST1 MYLK

Table 13: Pathway effects of AR DHT (S1) and siFOXA1 DHT (S2) as predicted by SPIA.

ID  Name  S1 p  S1 status  S2 p  S2 status
hsa05221  Acute myeloid leukemia  0.7815  Activated  0.9992  Inhibited
hsa04920  Adipocytokine signaling pathway  0.3595  Inhibited  0.9992  Inhibited
hsa05143  African trypanosomiasis  0.7537  Activated
hsa04960  Aldosterone-regulated sodium reabsorption  0.3595  Activated  0.2230  Activated
hsa05010  Alzheimer’s disease  0.8946  Inhibited  0.9992  Activated
hsa05146  Amoebiasis  0.9992  Inhibited
hsa04612  Antigen processing and presentation  0.6735  Inhibited
hsa04210  Apoptosis  0.9998  Inhibited  0.9992  Inhibited
hsa04360  Axon guidance  0.9998  Activated  0.9992  Inhibited
hsa04662  B cell receptor signaling pathway  0.9998  Inhibited  0.9992  Inhibited
hsa05100  Bacterial invasion of epithelial cells  0.9998  Inhibited  0.4870  Activated
hsa05217  Basal cell carcinoma  0.5737  Inhibited  0.9992  Activated
hsa04976  Bile secretion  0.5803  Inhibited  0.9992  Inhibited
hsa05219  Bladder cancer  0.9992  Inhibited
hsa04020  Calcium signaling pathway  0.9998  Inhibited  0.9992  Inhibited
hsa04973  Carbohydrate digestion and absorption  0.9998  Inhibited  0.9992  Inhibited
hsa04110  Cell cycle  0.9992  Inhibited
hsa05142  Chagas disease (American trypanosomiasis)  0.9998  Inhibited  0.5575  Activated
hsa04062  Chemokine signaling pathway  0.9998  Inhibited  0.9992  Activated
hsa05220  Chronic myeloid leukemia  0.9998  Inhibited  0.4870  Inhibited
hsa05210  Colorectal cancer  0.7537  Activated
hsa04610  Complement and coagulation cascades  0.9998  Inhibited  0.9992  Inhibited
hsa04060  Cytokine-cytokine receptor interaction  0.9998  Inhibited  0.9992  Inhibited
hsa04623  Cytosolic DNA-sensing pathway  0.9998  Inhibited  0.9992  Inhibited
hsa05414  Dilated cardiomyopathy  0.9992  Inhibited
hsa04320  Dorso-ventral axis formation  0.9992  Inhibited
hsa04512  ECM-receptor interaction  0.9998  Activated
hsa05213  Endometrial cancer  0.9992  Inhibited
hsa05120  Epithelial cell signaling in Helicobacter pylori infection  0.9998  Inhibited  0.9992  Activated
hsa04012  ErbB signaling pathway  0.9998  Activated  0.7525  Activated
hsa04664  Fc epsilon RI signaling pathway  0.9992  Activated
hsa04666  Fc gamma R-mediated phagocytosis  0.9998  Inhibited  0.9216  Activated
hsa04510  Focal adhesion  0.9998  Activated  0.6440  Activated
hsa04540  Gap junction  0.9998  Inhibited  0.9992  Inhibited
hsa04971  Gastric acid secretion  0.9998  Inhibited  0.9992  Inhibited
hsa05214  Glioma  0.9998  Inhibited  0.9992  Activated
hsa04912  GnRH signaling pathway  0.9992  Inhibited
hsa04340  Hedgehog signaling pathway  0.9998  Inhibited
hsa05160  Hepatitis C  0.9998  Inhibited  0.8857  Activated
hsa05016  Huntington’s disease  0.9998  Inhibited  0.9992  Inhibited
hsa04910  Insulin signaling pathway  0.5737  Inhibited  0.0803  Inhibited
hsa04630  Jak-STAT signaling pathway  0.6000  Inhibited  0.4870  Inhibited
hsa05140  Leishmaniasis  0.9998  Inhibited  0.9992  Inhibited
hsa04670  Leukocyte transendothelial migration  0.9998  Inhibited  0.9992  Activated
hsa04730  Long-term depression  0.9998  Inhibited  0.9992  Inhibited
hsa04720  Long-term potentiation  0.9992  Inhibited
hsa04142  Lysosome  0.9998  Inhibited  0.9992  Inhibited
hsa04010  MAPK signaling pathway  0.9998  Inhibited  0.9992  Activated
hsa05144  Malaria  0.9998  Inhibited
hsa04950  Maturity onset diabetes of the young  0.3595  Inhibited  0.7537  Inhibited
hsa04916  Melanogenesis  0.6735  Inhibited  0.9992  Activated
hsa05218  Melanoma  0.9992  Inhibited
hsa04621  NOD-like receptor signaling pathway  0.9998  Inhibited  0.9992  Inhibited
hsa04650  Natural killer cell mediated cytotoxicity  0.9992  Activated
hsa04080  Neuroactive ligand-receptor interaction  0.5737  Inhibited  0.9992  Inhibited
hsa04722  Neurotrophin signaling pathway  0.9998  Activated  0.7525  Activated
hsa05223  Non-small cell lung cancer  0.9992  Inhibited
hsa04330  Notch signaling pathway  0.8036  Inhibited  0.8561  Inhibited
hsa04114  Oocyte meiosis  0.9992  Inhibited
hsa04380  Osteoclast differentiation  0.9998  Inhibited  0.9992  Activated
hsa03320  PPAR signaling pathway  0.9133  Inhibited  0.9992  Inhibited
hsa05212  Pancreatic cancer  0.9992  Activated
hsa04972  Pancreatic secretion  0.8036  Inhibited  0.9992  Inhibited
hsa05012  Parkinson’s disease  0.9998  Inhibited  0.9992  Inhibited
hsa05130  Pathogenic Escherichia coli infection  0.3595  Activated  0.4870  Activated
hsa05200  Pathways in cancer  0.9998  Inhibited  0.8857  Inhibited
hsa05020  Prion diseases  0.9992  Inhibited
hsa04914  Progesterone-mediated oocyte maturation  0.9992  Activated
hsa05215  Prostate cancer  0.5737  Inhibited  0.7537  Inhibited
hsa04141  Protein processing in endoplasmic reticulum  0.9998  Inhibited  0.9992  Activated
hsa04622  RIG-I-like receptor signaling pathway  0.9998  Inhibited  0.9992  Inhibited
hsa03018  RNA degradation  0.9998  Inhibited  0.9992  Inhibited
hsa03013  RNA transport  0.9992  Inhibited
hsa04810  Regulation of actin cytoskeleton  0.9992  Activated
hsa05211  Renal cell carcinoma  0.7537  Activated
hsa05323  Rheumatoid arthritis  0.7525  Activated
hsa04970  Salivary secretion  0.3595  Activated  0.7903  Inhibited
hsa05131  Shigellosis  0.9998  Inhibited  0.9992  Inhibited
hsa05222  Small cell lung cancer  0.9998  Inhibited  0.9992  Inhibited
hsa05150  Staphylococcus aureus infection  0.9992  Inhibited
hsa05322  Systemic lupus erythematosus  0.0003  Inhibited
hsa04660  T cell receptor signaling pathway  0.9998  Inhibited  0.9992  Activated
hsa04350  TGF-beta signaling pathway  0.9998  Inhibited  0.9992  Activated
hsa04742  Taste transduction  0.9992  Inhibited
hsa04530  Tight junction  0.9310  Activated  0.4870  Activated
hsa04620  Toll-like receptor signaling pathway  0.8210  Inhibited  0.9992  Inhibited
hsa05145  Toxoplasmosis  0.9998  Inhibited  0.9992  Activated
hsa04930  Type II diabetes mellitus  0.3595  Inhibited  0.5575  Inhibited
hsa04370  VEGF signaling pathway  0.9998  Inhibited  0.9992  Inhibited
hsa04270  Vascular smooth muscle contraction  0.5761  Activated  0.8406  Inhibited
hsa04962  Vasopressin-regulated water reabsorption  0.9799  Activated  0.9992  Activated
hsa04310  Wnt signaling pathway  0.9998  Inhibited  0.9992  Inhibited
hsa04150  mTOR signaling pathway  0.7815  Inhibited  0.9992  Inhibited
hsa04115  p53 signaling pathway  0.9992  Inhibited
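Table 13 reports a p-value and an activation status for each KEGG pathway, as produced by SPIA. The SPIA Bioconductor package, as we understand its default Fisher-style combination, merges the over-representation p-value (pNDE) and the perturbation p-value (pPERT) as c − c·ln(c) with c = pNDE·pPERT. It labels a pathway Activated when its total perturbation accumulation tA is positive and Inhibited otherwise, and it then corrects the combined p-values for multiple testing. The report does not state which column (raw or adjusted) is shown above, so the Benjamini-Hochberg step and all input numbers in this Python sketch are assumptions:

```python
import math


def combine_fisher(p_nde, p_pert):
    """Combine two independent p-values as c - c*ln(c), with c = p_nde * p_pert."""
    c = p_nde * p_pert
    if c == 0.0:
        return 0.0  # limit of c - c*ln(c) as c -> 0
    return c - c * math.log(c)


def status(t_a):
    """The sign of the total perturbation accumulation decides the label."""
    return "Activated" if t_a > 0 else "Inhibited"


def benjamini_hochberg(pvals):
    """Step-up FDR adjustment; returns adjusted values in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical pathway-level inputs: (pNDE, pPERT, tA).
pathways = {"pathway A": (0.01, 0.20, 3.5), "pathway B": (0.40, 0.70, -1.2)}
raw = {name: combine_fisher(p1, p2) for name, (p1, p2, _) in pathways.items()}
fdr = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for name, (_, _, t_a) in pathways.items():
    print(name, round(fdr[name], 4), status(t_a))
```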

7 Gene set overlaps

[Figure 1: three-set Venn diagrams, panels (a) to (d)]

Figure 1: Intersections of sets: (a) AR, up and AR siFOXA1 (b) FOXA1, up and AR siFOXA1 (c) AR, down and AR siFOXA1 (d) FOXA1, down and AR siFOXA1.
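Each Figure 1 panel counts the elements that fall into the seven regions of a three-set Venn diagram. Below is a minimal Python sketch of that counting. It assumes each set has already been reduced to comparable identifiers; how the pipeline maps binding sites to genes is not stated in this section, and the identifiers below are hypothetical:

```python
from itertools import product


def venn3_regions(a, b, c):
    """Count elements in each of the seven regions of a three-set Venn diagram.

    Keys are membership tuples (in_a, in_b, in_c); (False, False, False) is
    omitted because elements outside all three sets are not drawn.
    """
    counts = {key: 0 for key in product((True, False), repeat=3) if any(key)}
    for x in a | b | c:
        counts[(x in a, x in b, x in c)] += 1
    return counts


# Hypothetical identifiers standing in for the three sets of one panel.
ar_sites = {"g1", "g2", "g3", "g4"}
dht_up = {"g3", "g4", "g5"}
ar_sifoxa1 = {"g4", "g6"}
for key, n in venn3_regions(ar_sites, dht_up, ar_sifoxa1).items():
    print(key, n)
```

The region counts always sum to the size of the union, which is a quick consistency check for any panel.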

[Figure 2: two-set Venn diagrams, panels (a) to (d)]

Figure 2: Intersections of sets: (a) DHT up and DHT up (siFOXA1) (b) DHT down and DHT down (siFOXA1) (c) DEX up and DEX up (siFOXA1) (d) DEX down and DEX up (siFOXA1).

[Figure 3: two-set Venn diagrams, panels (a) to (c)]

Figure 3: Intersections of sets: (a) GR and GR siFOXA1 (b) GR and AR (c) GR siFOXA1 unique and AR siFOXA1 unique.
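Comparisons such as GR versus GR siFOXA1 correspond to the binding site overlap sections listed in the contents ("GR and GR siFOXA1 binding site overlaps", "GR without GR siFOXA1 binding site overlaps"). Each of these splits one peak set by whether its peaks overlap another set. Below is a minimal Python sketch, assuming peaks are (chromosome, start, end) tuples with half-open coordinates and that any shared base counts as an overlap. The criterion the pipeline actually uses (for example a minimum overlap or a distance window) is not stated here. On BED files, bedtools intersect with -u or -v performs the same split.

```python
from collections import defaultdict


def index_by_chrom(peaks):
    """Group (chrom, start, end) intervals by chromosome."""
    index = defaultdict(list)
    for chrom, start, end in peaks:
        index[chrom].append((start, end))
    return index


def split_by_overlap(a_peaks, b_peaks):
    """Split A into peaks overlapping any B peak and peaks overlapping none.

    Coordinates are treated as half-open [start, end), so intervals that only
    touch at an endpoint do not count as overlapping.
    """
    b_index = index_by_chrom(b_peaks)
    with_b, without_b = [], []
    for chrom, start, end in a_peaks:
        hit = any(s < end and start < e for s, e in b_index.get(chrom, ()))
        (with_b if hit else without_b).append((chrom, start, end))
    return with_b, without_b


# Hypothetical peaks, not taken from this study.
ar = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
foxa1 = [("chr1", 150, 300), ("chr2", 80, 90)]
ar_and_foxa1, ar_without_foxa1 = split_by_overlap(ar, foxa1)
print(ar_and_foxa1)      # [('chr1', 100, 200)]
print(ar_without_foxa1)  # [('chr1', 500, 600), ('chr2', 50, 80)]
```

This brute-force check is O(n·m) per chromosome. Genome-scale tools sort the intervals and sweep, or use interval trees, to handle tens of thousands of peaks efficiently.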

8 Gene set comparison

[Figure 4: three-set Venn diagrams, panels (a) and (b)]

Figure 4: Intersections of sets: (a) Unique DEGs for AR, Common DEGs for AR and siFOXA1 and Unique DEGs for AR with siFOXA1 (b) red, blue and green.

Vertex fill colors: no matching annotations; Unique DEGs for AR; Unique DEGs for AR and Unique DEGs for AR with siFOXA1; Unique DEGs for AR with siFOXA1; Unique DEGs for AR with siFOXA1 and Common DEGs for AR and siFOXA1; Common DEGs for AR and siFOXA1; Unique DEGs for AR and Common DEGs for AR and siFOXA1; Unique DEGs for AR, Unique DEGs for AR with siFOXA1 and Common DEGs for AR and siFOXA1.

Edge types: dephosphorylation, phosphorylation, gene expression, gene repression, pathway precedence, protein activation, protein binding, protein dissociation, protein inhibition.

Figure 5: Edge types and gene colors used in the candidate pathway shown in Figure 6. Green and blue borders indicate up- and down-regulated genes, respectively. Light grey marks stably expressed genes. Known regulations are drawn with bold borders and predictions with thin borders.

[Candidate pathway graph]

Figure 6: Candidate pathway for Gene set comparison. Graph notations are described in Figure 5.

9 Candidate report for Unique DEGs for AR

9.1 Moksiskaan candidate pathway

[Candidate gene relationship graph]

Edge types: phosphorylation, gene expression, gene repression, pathway precedence, protein activation, protein binding, protein dissociation, protein-protein interaction.

Figure 7: Known relationships between the candidate genes. Candidate genes with only outgoing connections are shown in red. The ratio of incoming to outgoing connections determines how light a gene is drawn, and genes with only incoming connections are completely white. At most one intermediate gene is allowed between two candidate genes, and such intermediate genes are shown in gray. Green and blue borders indicate up- and down-regulated genes, respectively. Light grey marks stably expressed genes. Known regulations are drawn with bold borders and predictions with thin borders.
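The Figure 7 caption describes two rules. First, the shade of a candidate gene follows the balance of its incoming and outgoing edges. Second, a non-candidate gene is drawn only when it bridges two candidates in a single step. The sketch below illustrates both rules under stated assumptions. Lightness is taken as in/(in + out), because the caption does not give the exact formula. Edges are treated as directed (source, target) pairs. The helper names `red_shade` and `intermediate_genes` are hypothetical and are not Moksiskaan's API.

```python
from collections import defaultdict
from typing import Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]  # (source gene, target gene)


def red_shade(in_degree: int, out_degree: int) -> Optional[Tuple[int, int, int]]:
    """RGB fill: pure red for output-only genes, white for input-only genes.

    Assumes lightness = in / (in + out). Genes without edges return None
    because the caption does not say how they are drawn.
    """
    total = in_degree + out_degree
    if total == 0:
        return None
    lightness = in_degree / total
    level = round(255 * lightness)  # 0 keeps full red, 255 gives white
    return (255, level, level)


def intermediate_genes(edges: Iterable[Edge], candidates: Set[str]) -> Set[str]:
    """Non-candidate genes lying on a candidate -> gene -> candidate path."""
    preds = defaultdict(set)
    succs = defaultdict(set)
    for src, dst in edges:
        succs[src].add(dst)
        preds[dst].add(src)
    bridges = set()
    for gene in set(preds) | set(succs):
        if gene in candidates:
            continue
        sources = preds[gene] & candidates
        targets = succs[gene] & candidates
        # Assumption: the gene must link two distinct candidates.
        if any(s != t for s in sources for t in targets):
            bridges.add(gene)
    return bridges


edges = [("C1", "X"), ("X", "C2"), ("C1", "C2"), ("Y", "C1"), ("C2", "Z")]
print(intermediate_genes(edges, {"C1", "C2"}))  # {'X'}
print(red_shade(0, 3), red_shade(3, 0), red_shade(1, 1))
```

Y and Z each touch only one candidate, so only X qualifies as an intermediate gene.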

Table 14: Descriptions of the intermediate genes between the candidate genes. Studies that have reported results about the candidate genes are listed, and studies with negative evidence are prefixed with a hyphen. This table has 72 rows.

name | description | studies
AC008810.1 | 5’-AMP-activated protein kinase catalytic subunit alpha-1 [Source:UniProtKB/Swiss-Prot;Acc:Q13131] locus=5:40759481-40798476 |
AC023024.1 | Selenoprotein S [Source:UniProtKB/Swiss-Prot;Acc:Q9BQE4] locus=15:101811022-101817705 | tcgaGliomaGE
ACACB | acetyl-CoA carboxylase beta [Source:HGNC Symbol;Acc:85] locus=12:109554400-109706031 |
AKT1 | v-akt murine thymoma viral oncogene homolog 1 [Source:HGNC Symbol;Acc:391] locus=14:105235689-105262080 | cosmicRecurrent, tcgaGliomaGE, tscapeMelanomad
AKT2 | v-akt murine thymoma viral oncogene homolog 2 [Source:HGNC Symbol;Acc:392] locus=19:40736224-40791302 | tcgaGliomaGE
AKT3 | v-akt murine thymoma viral oncogene homolog 3 (protein kinase B, gamma) [Source:HGNC Symbol;Acc:393] locus=1:243651535-244013430 | tcgaGliomaGE, tscapeBCa, tscapeMelanomaa, tscapeNSCLCa, tscapeOvariana
AMD1 | adenosylmethionine decarboxylase 1 [Source:HGNC Symbol;Acc:457] locus=6:111195973-111216916 | tscapeCRCd, tscapeOvariand
APBB1 | amyloid beta (A4) precursor protein-binding, family B, member 1 (Fe65) [Source:HGNC Symbol;Acc:581] locus=11:6416354-6440644 | tcgaGliomaGE, tscapeBCd
ARRB1 | arrestin, beta 1 [Source:HGNC Symbol;Acc:711] locus=11:74975226-75062873 | tcgaGliomaGE
ARRB2 | arrestin, beta 2 [Source:HGNC Symbol;Acc:712] locus=17:4613784-4624795 | snp3dMetastasis, tcgaGliomaGE
ASAH1 | N-acylsphingosine amidohydrolase (acid ceramidase) 1 [Source:HGNC Symbol;Acc:735] locus=8:17913934-17942494 | tcgaGliomaGE
ASAH2 | N-acylsphingosine amidohydrolase (non-lysosomal ceramidase) 2 [Source:HGNC Symbol;Acc:18860] locus=10:51884446-52039568 |
B3GNT6 | UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 6 (core 3 synthase) [Source:HGNC Symbol;Acc:24141] locus=11:76745385-76753096 | tscapeOvariana
CNOT7 | CCR4-NOT transcription complex, subunit 7 [Source:HGNC Symbol;Acc:14101] locus=8:17086737-17104387 |
CNOT8 | CCR4-NOT transcription complex, subunit 8 [Source:HGNC Symbol;Acc:9207] locus=5:154237113-154256353 | tscapeRCCa
CPT1A | carnitine palmitoyltransferase 1A (liver) [Source:HGNC Symbol;Acc:2328] locus=11:68522088-68609399 | tcgaBreastGE, tscapeSCLCa
CPT1B | carnitine palmitoyltransferase 1B (muscle) [Source:HGNC Symbol;Acc:2329] locus=22:51007290-51017899 | tscapeBCd, tscapeHCCd, tscapeNSCLCd, tscapeOvariand, tscapeSCLCd
CPT1C | carnitine palmitoyltransferase 1C [Source:HGNC Symbol;Acc:18540] locus=19:50194373-50216988 | tcgaGliomaGE, tscapeNSCLCd, tscapeOvariand
CYP17A1 | cytochrome P450, family 17, subfamily A, polypeptide 1 [Source:HGNC Symbol;Acc:2593] locus=10:104590288-104597290 | tscapeBCd, tscapeCRCd, tscapeOvariand
CYP1A1 | cytochrome P450, family 1, subfamily A, polypeptide 1 [Source:HGNC Symbol;Acc:2595] locus=15:75011883-75017951 | snp3dLungC
CYP21A2 | cytochrome P450, family 21, subfamily A, polypeptide 2 [Source:HGNC Symbol;Acc:2600] locus=6:32006042-32009447 |
CYP2D6 | cytochrome P450, family 2, subfamily D, polypeptide 6 [Source:HGNC Symbol;Acc:2625] locus=22:42522501-42540472 | tscapeGliomad
CYP3A4 | cytochrome P450, family 3, subfamily A, polypeptide 4 [Source:HGNC Symbol;Acc:2637] locus=7:99354604-99381888 | tscapeNSCLCa
CYP3A43 | cytochrome P450, family 3, subfamily A, polypeptide 43 [Source:HGNC Symbol;Acc:17450] locus=7:99425636-99463718 | tscapeNSCLCa
CYP3A5 | cytochrome P450, family 3, subfamily A, polypeptide 5 [Source:HGNC Symbol;Acc:2638] locus=7:99245817-99277621 | tcgaBreastGE, tscapeNSCLCa
CYP3A7 | cytochrome P450, family 3, subfamily A, polypeptide 7 [Source:HGNC Symbol;Acc:2640] locus=7:99293368-99332819 | tscapeNSCLCa
GLI1 | GLI family zinc finger 1 [Source:HGNC Symbol;Acc:4317] locus=12:57853918-57866045 |
GLI2 | GLI family zinc finger 2 [Source:HGNC Symbol;Acc:4318] locus=2:121493199-121750229 | tcgaBreastGE, tscapeProstated, tscapeRCCa
GLI3 | GLI family zinc finger 3 [Source:HGNC Symbol;Acc:4319] locus=7:42000548-42277469 |
GUSB | glucuronidase, beta [Source:HGNC Symbol;Acc:4696] locus=7:65425671-65447301 | tcgaGliomaGE
HSD17B2 | hydroxysteroid (17-beta) dehydrogenase 2 [Source:HGNC Symbol;Acc:5211] locus=16:82068866-82132139 |
HSD17B6 | hydroxysteroid (17-beta) dehydrogenase 6 homolog (mouse) [Source:HGNC Symbol;Acc:23316] locus=12:57157108-57181574 | tcgaBreastGE, tcgaGliomaGE
HSD17B8 | hydroxysteroid (17-beta) dehydrogenase 8 [Source:HGNC Symbol;Acc:3554] locus=6:33172419-33174608 |
HSD3B1 | hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid delta-isomerase 1 [Source:HGNC Symbol;Acc:5217] locus=1:120049821-120057681 | tscapeBCa, tscapeNSCLCd, tscapeSCLCd
HSD3B2 | hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid delta-isomerase 2 [Source:HGNC Symbol;Acc:5218] locus=1:119957554-119965658 | tscapeBCa, tscapeNSCLCd, tscapeSCLCd
IDH3A | isocitrate dehydrogenase 3 (NAD+) alpha [Source:HGNC Symbol;Acc:5384] locus=15:78441663-78464291 | tcgaGliomaGE
IDH3B | isocitrate dehydrogenase 3 (NAD+) beta [Source:HGNC Symbol;Acc:5385] locus=20:2639041-2644865 |
IDH3G | isocitrate dehydrogenase 3 (NAD+) gamma [Source:HGNC Symbol;Acc:5386] locus=X:153051221-153059978 | tcgaGliomaGE, tscapeBCa, tscapeNSCLCa
KL | klotho [Source:HGNC Symbol;Acc:6344] locus=13:33590207-33640282 | tcgaGliomaGE, tscapeBCd, tscapeHCCd, tscapeOvariand, tscapeSCLCd
LRP5 | low density lipoprotein receptor-related protein 5 [Source:HGNC Symbol;Acc:6697] locus=11:68080077-68216743 | snp3dDiabetes, tcgaGliomaGE, tscapeSCLCa
LRP6 | low density lipoprotein receptor-related protein 6 [Source:HGNC Symbol;Acc:6698] locus=12:12268959-12419946 | cosmicMetastasis, tscapeProstated
MARCH6 | membrane-associated ring finger (C3HC4) 6 [Source:HGNC Symbol;Acc:30550] locus=5:10353815-10435491 | tcgaGliomaGE
NAE1 | NEDD8 activating enzyme E1 subunit 1 [Source:HGNC Symbol;Acc:621] locus=16:66836778-66864900 | tscapeOvariand
PABPC1 | poly(A) binding protein, cytoplasmic 1 [Source:HGNC Symbol;Acc:8554] locus=8:101698044-101735037 | tcgaGliomaGE, tscapeNSCLCa
PABPC1L | poly(A) binding protein, cytoplasmic 1-like [Source:HGNC Symbol;Acc:15797] locus=20:43538703-43587676 | tcgaGliomaGE
PABPC1L2A | poly(A) binding protein, cytoplasmic 1-like 2A [Source:HGNC Symbol;Acc:27989] locus=X:72297115-72299351 | tcgaGliomaGE
PABPC1L2B | poly(A) binding protein, cytoplasmic 1-like 2B [Source:HGNC Symbol;Acc:31852] locus=X:72223352-72225551 | tcgaBreastGE
PABPC3 | poly(A) binding protein, cytoplasmic 3 [Source:HGNC Symbol;Acc:8556] locus=13:25670300-25673389 | tscapeBCd, tscapeHCCd, tscapeSCLCd
PABPC4 | poly(A) binding protein, cytoplasmic 4 (inducible form) [Source:HGNC Symbol;Acc:8557] locus=1:40026488-40042462 | tcgaGliomaGE, tscapeNSCLCa, tscapeOvariana
PABPC4L | poly(A) binding protein, cytoplasmic 4-like [Source:HGNC Symbol;Acc:31955] locus=4:135121062-135122903 |
PABPC5 | poly(A) binding protein, cytoplasmic 5 [Source:HGNC Symbol;Acc:13629] locus=X:90689594-90693583 |
PIK3C2A | phosphoinositide-3-kinase, class 2, alpha polypeptide [Source:HGNC Symbol;Acc:8971] locus=11:17099277-17229530 | cosmicRecurrent
PIK3C2B | phosphoinositide-3-kinase, class 2, beta polypeptide [Source:HGNC Symbol;Acc:8972] locus=1:204391756-204463852 | tcgaGliomaGE, tscapeBCa, tscapeGliomaa, tscapeProstatea
PIK3C2G | phosphoinositide-3-kinase, class 2, gamma polypeptide [Source:HGNC Symbol;Acc:8973] locus=12:18400548-18801348 | tscapeHCCd
PIK3C3 | phosphoinositide-3-kinase, class 3 [Source:HGNC Symbol;Acc:8974] locus=18:39535171-39667794 |
PIK3CA | phosphoinositide-3-kinase, catalytic, alpha polypeptide [Source:HGNC Symbol;Acc:8975] locus=3:178865902-178957881 | cosmicPrimary, tcgaOvarianGE, tscapeBCa, tscapeOvariana
PIK3CB | phosphoinositide-3-kinase, catalytic, beta polypeptide [Source:HGNC Symbol;Acc:8976] locus=3:138372860-138553780 | tcgaGliomaGE
PIK3CD | phosphoinositide-3-kinase, catalytic, delta polypeptide [Source:HGNC Symbol;Acc:8977] locus=1:9711790-9789172 | tscapeBCd, tscapeCRCd, tscapeHCCd, tscapeNSCLCd, tscapeOvariana, tscapeOvariand, tscapeRCCd
PIK3CG | phosphoinositide-3-kinase, catalytic, gamma polypeptide [Source:HGNC Symbol;Acc:8978] locus=7:106505723-106547590 | cosmicMetastasis, cosmicRecurrent, tcgaGliomaGE
PIK3R1 | phosphoinositide-3-kinase, regulatory subunit 1 (alpha) [Source:HGNC Symbol;Acc:8979] locus=5:67511548-67597649 | tcgaBreastGE, tcgaGliomaGE, tscapeBCd, tscapeCRCd, tscapeNSCLCd, tscapeOvariand, tscapeProstated, tscapeSCLCd
PIK3R2 | phosphoinositide-3-kinase, regulatory subunit 2 (beta) [Source:HGNC Symbol;Acc:8980] locus=19:18263928-18281343 | tcgaBreastGE
PIK3R3 | phosphoinositide-3-kinase, regulatory subunit 3 (gamma) [Source:HGNC Symbol;Acc:8981] locus=1:46505812-46642160 | tcgaBreastGE, tscapeSCLCa
PIK3R5 | phosphoinositide-3-kinase, regulatory subunit 5 [Source:HGNC Symbol;Acc:30035] locus=17:8782228-8869024 | tcgaGliomaGE
PIKFYVE | phosphoinositide kinase, FYVE finger containing [Source:HGNC Symbol;Acc:23785] locus=2:209130991-209223475 | tcgaOvarianGE
PRKAA2 | protein kinase, AMP-activated, alpha 2 catalytic subunit [Source:HGNC Symbol;Acc:9377] locus=1:57110995-57181008 | tcgaGliomaGE
PRKAB1 | protein kinase, AMP-activated, beta 1 non-catalytic subunit [Source:HGNC Symbol;Acc:9378] locus=12:120105558-120119435 | tcgaGliomaGE
PRKAB2 | protein kinase, AMP-activated, beta 2 non-catalytic subunit [Source:HGNC Symbol;Acc:9379] locus=1:146626685-146644129 | tscapeCRCa
PRKAG1 | protein kinase, AMP-activated, gamma 1 non-catalytic subunit [Source:HGNC Symbol;Acc:9385] locus=12:49396058-49412592 | tscapeRCCa
PRKAG2 | protein kinase, AMP-activated, gamma 2 non-catalytic subunit [Source:HGNC Symbol;Acc:9386] locus=7:151253210-151574210 | tcgaGliomaGE, tscapeMelanomaa, tscapeOvariana, tscapeOvariand
PRKAG3 | protein kinase, AMP-activated, gamma 3 non-catalytic subunit [Source:HGNC Symbol;Acc:9387] locus=2:219687106-219696809 | tscapeRCCd
SRM | spermidine synthase [Source:HGNC Symbol;Acc:11296] locus=1:11114641-11120081 | tscapeBCd, tscapeCRCd, tscapeHCCd, tscapeNSCLCd, tscapeOvariana, tscapeOvariand, tscapeRCCd
UGDH | UDP-glucose 6-dehydrogenase [Source:HGNC Symbol;Acc:12525] locus=4:39500375-39529931 | tcgaGliomaGE

Table 15: KEGG [2] pathways supporting the relationships between the genes shown in Figure 7. The edges column gives the number of edges taken from each pathway.

name | edges | genes
Steroid hormone biosynthesis | 96 | CYP11A1, CYP17A1, CYP1A1, CYP21A2, CYP3A4, CYP3A43, CYP3A5, CYP3A7, HSD17B2, HSD17B6, HSD17B8, HSD3B1, HSD3B2, UGT2B17, UGT2B28
Prostate cancer | 78 | AKT1, AKT2, AKT3, MTOR, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Non-small cell lung cancer | 72 | AKT1, AKT2, AKT3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Melanoma | 72 | AKT1, AKT2, AKT3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Influenza A | 72 | AKT1, AKT2, AKT3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Measles | 72 | AKT1, AKT2, AKT3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Chagas disease (American trypanosomiasis) | 72 | AKT1, AKT2, AKT3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Neurotrophin signaling pathway | 72 | AKT1, AKT2, AKT3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Fc gamma R-mediated phagocytosis | 72 | AKT1, AKT2, AKT3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
VEGF signaling pathway | 72 | AKT1, AKT2, AKT3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Chemokine signaling pathway | 72 | AKT1, AKT2, AKT3, ARRB1, ARRB2, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Insulin signaling pathway | 58 | AC008810.1, ACACA, ACACB, AKT1, AKT2, AKT3, MTOR, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, PRKAA2, PRKAB1, PRKAB2, PRKAG1, PRKAG2, PRKAG3
Acute myeloid leukemia | 54 | AKT1, AKT2, AKT3, MTOR, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
ErbB signaling pathway | 54 | AKT1, AKT2, AKT3, MTOR, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Small cell lung cancer | 48 | AKT1, AKT2, AKT3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Chronic myeloid leukemia | 48 | AKT1, AKT2, AKT3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Cholinergic synapse | 48 | AKT1, AKT2, AKT3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
T cell receptor signaling pathway | 48 | AKT1, AKT2, AKT3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Focal adhesion | 48 | AKT1, AKT2, AKT3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
RNA degradation | 36 | BTG1, CNOT7, CNOT8, PABPC1, PABPC1L, PABPC1L2A, PABPC1L2B, PABPC3, PABPC4, PABPC4L, PABPC5
Adipocytokine signaling pathway | 33 | AC008810.1, ACACB, ACSL3, AKT1, AKT2, AKT3, CAMKK2, CPT1A, CPT1B, CPT1C, MTOR, PRKAA2, PRKAB1, PRKAB2, PRKAG1, PRKAG2, PRKAG3
Colorectal cancer | 24 | AKT1, AKT2, AKT3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Jak-STAT signaling pathway | 24 | AKT1, AKT2, AKT3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Osteoclast differentiation | 24 | AKT1, AKT2, AKT3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Drug metabolism - cytochrome P450 | 20 | CYP2D6, CYP3A4, CYP3A43, CYP3A5, CYP3A7, UGT2B17, UGT2B28
Phosphatidylinositol signaling system | 18 | INPP4B, PIK3C2A, PIK3C2B, PIK3C2G, PIK3C3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, PIKFYVE
Retinol metabolism | 18 | CYP1A1, CYP3A4, CYP3A43, CYP3A5, CYP3A7, UGT2B17, UGT2B28
Pathways in cancer | 12 | AKT1, AKT2, AKT3, FZD2, GLI1, GLI2, GLI3, MTOR, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, WNT5A
Basal cell carcinoma | 10 | FZD2, GLI1, GLI2, GLI3, WNT5A
Hedgehog signaling pathway | 9 | GLI1, GLI2, GLI3, WNT5A, ZIC2
mTOR signaling pathway | 7 | AC008810.1, AKT1, AKT2, AKT3, MTOR, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, PRKAA2
Sphingolipid metabolism | 7 | ASAH1, ASAH2, DEGS2, SMPD2
Glioma | 6 | AKT1, AKT2, AKT3, MTOR, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Inositol phosphate metabolism | 6 | INPP4B, PIK3C2A, PIK3C2B, PIK3C2G, PIK3C3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIKFYVE
Starch and sucrose metabolism | 6 | GUSB, KL, UGDH, UGT2B17, UGT2B28
Pentose and glucuronate interconversions | 6 | GUSB, KL, UGDH, UGT2B17, UGT2B28
Citrate cycle (TCA cycle) | 6 | IDH1, IDH3A, IDH3B, IDH3G
Wnt signaling pathway | 5 | FZD2, LRP5, LRP6, WNT5A
Alzheimer’s disease | 4 | APBB1, APP, NAE1
Endocytosis | 4 | ADRB2, ARRB1, ARRB2
Protein processing in endoplasmic reticulum | 4 | AC023024.1, MARCH6, UBE2G1
Arginine and proline metabolism | 4 | AMD1, SMS, SRM
Mucin type O-Glycan biosynthesis | 3 | B3GNT6, GALNTL4, ST6GALNAC1
Fatty acid metabolism | 3 | ACSL3, CPT1A, CPT1B, CPT1C
Melanogenesis | 2 | FZD2, WNT5A
Drug metabolism - other | 2 | CYP3A4, CYP3A43, CYP3A5, CYP3A7, GUSB, UGT2B17, UGT2B28
Porphyrin and chlorophyll metabolism | 2 | GUSB, UGT2B17, UGT2B28
Cysteine and methionine metabolism | 2 | AMD1, SMS, SRM
Fatty acid biosynthesis | 2 | ACACA, ACACB
Endocrine and other factor-regulated calcium reabsorption | 1 | ATP1B1, KL
Glutathione metabolism | 1 | IDH1, SMS, SRM
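Table 15 credits each KEGG pathway with the number of candidate-graph edges it contributed. The following is a minimal sketch of that bookkeeping. It assumes every edge carries the set of pathway names that support it, which is a hypothetical input format because the underlying Moksiskaan data model is not described in this report. The function name `pathway_support` is also hypothetical. Rows are ordered by edge count, as in the table, and ties are broken by name.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple

AnnotatedEdge = Tuple[str, str, Set[str]]  # (gene A, gene B, supporting pathways)


def pathway_support(edges: Iterable[AnnotatedEdge]) -> List[Tuple[str, int, List[str]]]:
    """Return (pathway, edge count, sorted genes) rows, most edges first."""
    edge_counts = defaultdict(int)
    genes = defaultdict(set)
    for gene_a, gene_b, pathways in edges:
        for pathway in pathways:
            edge_counts[pathway] += 1
            genes[pathway].update((gene_a, gene_b))
    rows = [(p, edge_counts[p], sorted(genes[p])) for p in edge_counts]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Hypothetical toy edges: the gene symbols come from Table 15,
# but these particular edges are invented for illustration.
toy_edges = [
    ("PIK3CA", "AKT1", {"Prostate cancer", "Glioma"}),
    ("AKT1", "MTOR", {"Prostate cancer"}),
    ("IDH1", "IDH3A", {"Citrate cycle (TCA cycle)"}),
]
for name, count, members in pathway_support(toy_edges):
    print(f"{name}\t{count}\t{', '.join(members)}")
```

An edge supported by several pathways is counted once for each of them. This is consistent with the many pathways in Table 15 that share the same PI3K/AKT gene list.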

9.1.1 GO enrichment of the candidate pathway
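Table 16 keeps GO terms whose FDR-corrected p-value is at most 0.01. For each term it reports the fraction of the gene set annotated with that term. The following is a minimal sketch of this filtering step under several assumptions. The raw per-term p-values are taken as already available, and the FDR correction is assumed to be Benjamini-Hochberg, because the report does not name the procedure used by the enrichment tool [1]. The ratio denominator is simply the size of the gene set passed in. The function names and toy inputs are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enriched_terms(
    term_pvalues: Dict[str, float],
    term_genes: Dict[str, Set[str]],
    gene_set: Set[str],
    alpha: float = 0.01,
) -> List[Tuple[str, float, float, List[str]]]:
    """Return (term, ratio, adjusted p, genes) rows with adjusted p <= alpha."""
    names = sorted(term_pvalues)
    qvalues = bh_adjust([term_pvalues[n] for n in names])
    rows = []
    for name, q in zip(names, qvalues):
        if q <= alpha:
            hits = term_genes[name] & gene_set
            ratio = len(hits) / len(gene_set)  # assumed denominator
            rows.append((name, ratio, q, sorted(hits)))
    rows.sort(key=lambda row: row[2])  # ordered by adjusted p, as in Table 16
    return rows


# Hypothetical toy inputs.
genes = {"A", "B", "C", "D"}
rows = enriched_terms(
    {"t1": 0.001, "t2": 0.5, "t3": 0.002},
    {"t1": {"A", "B"}, "t2": {"C"}, "t3": {"A", "B", "C", "X"}},
    genes,
)
for row in rows:
    print(row)
```

Here t2 is dropped because its adjusted p-value stays at 0.5. Gene X does not count toward t3's ratio because it lies outside the gene set.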

Table 16: Enriched GO terms [1] (FDR-corrected p ≤ 0.01). Ratio is the proportion of the gene set annotated with the term. The list is sorted by FDR-corrected p-value. Green and blue borders indicate up- and down-regulated genes, respectively.

Ratio | Type | Description | Genes
0.500 | BP | lipid metabolic process | AC008810.1, ACACA, ACACB, ACSL3, AKT1, AKT2, ASAH1, ASAH2, CPT1A, CPT1B, CPT1C, CYP11A1, CYP17A1, CYP1A1, CYP21A2, CYP2D6, CYP3A4, CYP3A5, DEGS2, HSD17B2, HSD17B6, HSD17B8, HSD3B1, HSD3B2, IDH1, MTOR, PIK3C2A, PIK3C2B, PIK3C2G, PIK3C3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIKFYVE, PRKAA2, PRKAB1, PRKAB2, PRKAG1, PRKAG2, PRKAG3, SMPD2, UGT2B17
0.250 | BP | lipid modification | AC008810.1, ACACB, AKT1, AKT2, CPT1A, CPT1B, MTOR, PIK3C2A, PIK3C2B, PIK3C2G, PIK3C3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PRKAA2, PRKAB1, PRKAB2, PRKAG1, PRKAG2, PRKAG3
0.138 | CC | phosphatidylinositol 3-kinase complex | MTOR, PIK3C2A, PIK3C2B, PIK3C2G, PIK3C3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3
0.148 | BP | regulation of fatty acid oxidation | AC008810.1, ACACB, AKT1, AKT2, CPT1A, CPT1B, MTOR, PRKAA2, PRKAB1, PRKAB2, PRKAG1, PRKAG2, PRKAG3
0.109 | MF | 1-phosphatidylinositol-3-kinase activity | PIK3C2A, PIK3C2B, PIK3C2G, PIK3C3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R2, PIK3R3
0.375 | BP | cellular response to chemical stimulus | AC008810.1, AC023024.1, AKT1, AKT2, ARRB2, CYP11A1, CYP17A1, CYP1A1, CYP2D6, CYP3A4, CYP3A43, CYP3A5, CYP3A7, FZD2, GLI2, KL, LRP6, MTOR, PIK3C3, PIK3CA, PIK3CB, PIK3R1, PIK3R2, PIK3R3, PRKAA2, PRKAB1, PRKAB2, PRKAG1, PRKAG2, PRKAG3, UGDH, UGT2B28, WNT5A
0.193 | BP | insulin receptor signaling pathway | AC008810.1, AKT1, AKT2, KL, MTOR, PIK3C3, PIK3CA, PIK3CB, PIK3R1, PIK3R2, PIK3R3, PRKAA2, PRKAB1, PRKAB2, PRKAG1, PRKAG2, PRKAG3
0.368 | CC | cell fraction | ACACA, ACSL3, ADRB2, AKT1, AKT2, APP, ARRB1, CPT1A, CPT1B, CYP1A1, CYP2D6, CYP3A4, CYP3A43, CYP3A5, CYP3A7, GLI2, GUSB, HSD17B6, HSD17B8, HSD3B1, HSD3B2, IDH1, KL, LRP6, MTOR, NAE1, PIK3C2B, PIK3R1, SMPD2, UGT2B17, UGT2B28, WNT5A
0.102 | BP | phosphatidylinositol phosphorylation | PIK3C2A, PIK3C2B, PIK3C2G, PIK3C3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1
0.207 | CC | microsome | ACSL3, ADRB2, AKT2, CPT1A, CPT1B, CYP1A1, CYP2D6, CYP3A4, CYP3A43, CYP3A5, CYP3A7, GUSB, HSD17B6, HSD3B1, HSD3B2, PIK3C2B, UGT2B17, UGT2B28
0.250 | BP | lipid biosynthetic process | AC008810.1, ACACA, ACACB, ACSL3, AKT1, CYP11A1, CYP17A1, CYP21A2, DEGS2, HSD17B2, HSD17B6, HSD17B8, HSD3B1, HSD3B2, PIK3C2A, PRKAA2, PRKAB1, PRKAB2, PRKAG1, PRKAG2, PRKAG3, SMPD2
0.136 | BP | phosphatidylinositol-mediated signaling | AKT1, MTOR, PIK3C2A, PIK3C2B, PIK3C2G, PIK3C3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2
0.069 | CC | AMP-activated protein kinase complex | AC008810.1, PRKAA2, PRKAB1, PRKAB2, PRKAG1, PRKAG2
0.182 | BP | steroid metabolic process | AC008810.1, CYP11A1, CYP17A1, CYP1A1, CYP21A2, CYP2D6, CYP3A4, CYP3A5, HSD17B2, HSD17B6, HSD17B8, HSD3B1, HSD3B2, PRKAA2, PRKAG2, UGT2B17
0.136 | BP | fatty acid biosynthetic process | AC008810.1, ACACA, ACACB, ACSL3, DEGS2, HSD17B8, PRKAA2, PRKAB1, PRKAB2, PRKAG1, PRKAG2, PRKAG3
0.080 | BP | intracellular lipid transport | ACACB, CPT1A, CPT1B, LRP6, PRKAA2, PRKAB2, PRKAG2
0.068 | BP | carnitine shuttle | ACACB, CPT1A, CPT1B, PRKAA2, PRKAB2, PRKAG2
0.239 | BP | carbohydrate metabolic process | AC008810.1, AKT1, AKT2, B3GNT6, CNOT7, CPT1A, GUSB, IDH1, IDH3A, IDH3G, KL, LRP5, MTOR, PIK3CA, PIK3R1, PRKAB1, PRKAG1, PRKAG2, PRKAG3, ST6GALNAC1, UGDH
0.125 | BP | steroid biosynthetic process | AC008810.1, CYP11A1, CYP17A1, CYP21A2, HSD17B2, HSD17B6, HSD17B8, HSD3B1, HSD3B2, PRKAA2, PRKAG2
0.130 | MF | electron carrier activity | CYP11A1, CYP17A1, CYP1A1, CYP21A2, CYP2D6, CYP3A4, CYP3A43, CYP3A5, CYP3A7, HSD17B6, IDH3B, UGDH
0.109 | MF | oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor | HSD17B2, HSD17B6, HSD17B8, HSD3B1, HSD3B2, IDH1, IDH3A, IDH3B, IDH3G, UGDH
0.159 | BP | energy derivation by oxidation of organic compounds | ACACA, ACACB, AKT1, AKT2, IDH1, IDH3A, IDH3B, IDH3G, KL, MTOR, PRKAA2, PRKAB2, PRKAG2, PRKAG3
0.098 | MF | monooxygenase activity | CYP11A1, CYP17A1, CYP1A1, CYP21A2, CYP2D6, CYP3A4, CYP3A43, CYP3A5, CYP3A7
0.207 | CC | endoplasmic reticulum membrane | AC023024.1, ACSL3, CPT1C, CYP17A1, CYP1A1, CYP2D6, CYP3A4, CYP3A43, CYP3A5, CYP3A7, DEGS2, HSD17B2, HSD3B1, HSD3B2, MARCH6, MTOR, UGT2B17, UGT2B28
0.109 | MF | oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen | CYP11A1, CYP17A1, CYP1A1, CYP21A2, CYP2D6, CYP3A4, CYP3A43, CYP3A5, CYP3A7, DEGS2
0.068 | BP | androgen metabolic process | CYP17A1, CYP3A4, HSD17B6, HSD17B8, HSD3B1, HSD3B2
0.114 | BP | xenobiotic metabolic process | CYP11A1, CYP17A1, CYP1A1, CYP2D6, CYP3A4, CYP3A43, CYP3A5, CYP3A7, UGDH, UGT2B28
0.065 | MF | oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, reduced flavin or flavoprotein as one donor, and incorporation of one atom of oxygen | CYP1A1, CYP2D6, CYP3A4, CYP3A43, CYP3A5, CYP3A7
0.125 | BP | glucose metabolic process | AC008810.1, AKT1, AKT2, CPT1A, LRP5, MTOR, PIK3CA, PIK3R1, PRKAG1, PRKAG2, UGDH
0.370 | MF | binding | AC008810.1, ACACA, ACACB, ACSL3, AKT1, AKT2, AKT3, CAMKK2, IDH1, IDH3A, IDH3B, IDH3G, MTOR, PABPC1, PABPC1L, PABPC1L2A, PABPC1L2B, PABPC3, PABPC4, PABPC5, PIK3C2A, PIK3C2B, PIK3C2G, PIK3C3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIKFYVE, PRKAA2, PRKAG1, PRKAG2, UBE2G1, UGDH
0.114 | BP | energy reserve metabolic process | ACACA, ACACB, AKT1, AKT2, KL, MTOR, PRKAA2, PRKAB2, PRKAG2, PRKAG3
0.043 | MF | AMP-activated protein kinase activity | AC008810.1, PRKAA2, PRKAG2, PRKAG3
0.043 | MF | isocitrate dehydrogenase activity | IDH1, IDH3A, IDH3B, IDH3G
0.043 | MF | phosphatidylinositol-4,5-bisphosphate 3-kinase activity | PIK3CA, PIK3CB, PIK3CD, PIK3CG
0.261 | BP | response to external stimulus | AC023024.1, ACSL3, ADRB2, AKT1, AKT2, APBB1, APP, ARRB1, ARRB2, CYP11A1, CYP17A1, FZD2, GLI2, GLI3, HSD17B2, LRP6, MTOR, PIK3C3, PIK3CB, PIK3R1, SMPD2, WNT5A, ZIC2
0.098 | MF | heme binding | CYP11A1, CYP17A1, CYP1A1, CYP21A2, CYP2D6, CYP3A4, CYP3A43, CYP3A5, CYP3A7
0.045 | BP | isocitrate metabolic process | IDH1, IDH3A, IDH3B, IDH3G
0.125 | BP | platelet activation | AKT1, APP, ARRB1, ARRB2, PIK3CA, PIK3CB, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
0.080 | BP | fibroblast growth factor receptor signaling pathway | KL, PIK3C3, PIK3CA, PIK3CB, PIK3R1, PIK3R2, WNT5A
0.250 | BP | protein phosphorylation | AC008810.1, ADRB2, AKT1, AKT2, AKT3, APP, ARRB1, ARRB2, CAMKK2, LRP5, LRP6, MTOR, PIK3C3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PRKAA2, PRKAG1, PRKAG2, WNT5A
0.057 | BP | C21-steroid hormone biosynthetic process | CYP11A1, CYP17A1, CYP21A2, HSD3B1, HSD3B2
0.054 | MF | aromatase activity | CYP1A1, CYP2D6, CYP3A43, CYP3A5, CYP3A7
0.065 | MF | oxygen binding | CYP17A1, CYP1A1, CYP2D6, CYP3A4, CYP3A5, CYP3A7
0.307 | BP | catabolic process | AC023024.1, AKT1, AKT2, ARRB1, ARRB2, CNOT7, CNOT8, CPT1A, CPT1B, CYP2D6, CYP3A4, CYP3A5, GUSB, HSD17B6, IDH1, IDH3A, IDH3B, IDH3G, LRP5, MTOR, PABPC1, PABPC4, PIK3C3, PRKAG1, PRKAG2, UBE2G1, WNT5A
0.068 | BP | acetyl-CoA metabolic process | ACACA, ACACB, IDH1, IDH3A, IDH3B, IDH3G
0.182 | BP | regulation of protein phosphorylation | AC008810.1, ADRB2, AKT1, AKT2, APP, ARRB1, ARRB2, CAMKK2, LRP5, LRP6, MTOR, PIK3CB, PIK3R1, PRKAG1, PRKAG2, WNT5A
0.045 | BP | androgen biosynthetic process | CYP17A1, HSD17B6, HSD3B1, HSD3B2
0.159 | BP | regulation of kinase activity | AC008810.1, ADRB2, AKT1, APP, ARRB1, CAMKK2, LRP5, LRP6, MTOR, PIK3CB, PIK3R1, PRKAG1, PRKAG2, WNT5A
0.043 | MF | phosphatidylinositol phosphate kinase activity | PIK3C2A, PIK3C2B, PIK3C2G, PIKFYVE
0.045 | BP | regulation of fatty acid beta-oxidation | AKT1, AKT2, CPT1A, MTOR
0.054 | MF | steroid dehydrogenase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor | HSD17B2, HSD17B6, HSD17B8, HSD3B1, HSD3B2
0.068 | BP | protein kinase B signaling cascade | AKT1, AKT2, ARRB2, MTOR, PIK3C2B, PIK3CA
0.091 | BP | positive regulation of T cell activation | AKT1, GLI2, GLI3, MTOR, PIK3CA, PIK3R1, PIK3R2, PIK3R3
0.068 | BP | T cell costimulation | AKT1, MTOR, PIK3CA, PIK3R1, PIK3R2, PIK3R3
0.068 | BP | lymphocyte costimulation | AKT1, MTOR, PIK3CA, PIK3R1, PIK3R2, PIK3R3
0.033 | MF | isocitrate dehydrogenase (NAD+) activity | IDH3A, IDH3B, IDH3G
0.033 | MF | phosphatidylinositol-4-phosphate 3-kinase activity | PIK3C2A, PIK3C2B, PIK3C2G
0.125 | BP | response to nutrient levels | ACSL3, AKT1, CYP11A1, CYP17A1, FZD2, HSD17B2, LRP6, MTOR, PIK3C3, PIK3R1, WNT5A
0.273 | BP | positive regulation of metabolic process | AC008810.1, ACACA, ACACB, ADRB2, AKT1, AKT2, APBB1, APP, ARRB1, CAMKK2, CNOT7, CPT1A, FZD2, GLI1, GLI2, GLI3, LRP5, LRP6, MTOR, PABPC1, PIK3R1, PRKAG2, WNT5A, ZIC2
0.034 | BP | alkaloid catabolic process | CYP2D6, CYP3A4, CYP3A5
0.148 | BP | regulation of protein kinase activity | AC008810.1, ADRB2, AKT1, APP, ARRB1, CAMKK2, LRP5, LRP6, MTOR, PIK3CB, PRKAG1, PRKAG2, WNT5A
0.057 | BP | regulation of generation of precursor metabolites and energy | AKT1, AKT2, MTOR, PRKAG1, PRKAG2
0.043 | MF | steroid hydroxylase activity | CYP11A1, CYP17A1, CYP21A2, CYP3A4
0.174 | MF | enzyme binding | AC023024.1, ADRB2, AKT1, ARRB1, ARRB2, BTG1, CYP3A4, GLI3, MARCH6, PIK3R1, PRKAB1, PRKAB2, PRKAG1, PRKAG2, PRKAG3, UBE2G1
0.033 | MF | carnitine O-palmitoyltransferase activity | CPT1A, CPT1B, CPT1C
0.102 | BP | response to organic cyclic compound | AC008810.1, ACACA, ACSL3, CPT1A, CYP17A1, GLI2, IDH1, PRKAA2, WNT5A
0.261 | BP | positive regulation of cellular metabolic process | ACACA, ACACB, ADRB2, AKT1, AKT2, APBB1, APP, ARRB1, CAMKK2, CNOT7, CPT1A, FZD2, GLI1, GLI2, GLI3, LRP5, LRP6, MTOR, PABPC1, PIK3R1, PRKAG2, WNT5A, ZIC2
0.102 | BP | response to nutrient | ACSL3, CYP11A1, CYP17A1, FZD2, HSD17B2, LRP6, MTOR, PIK3R1, WNT5A
0.045 | BP | 2-oxoglutarate metabolic process | IDH1, IDH3A, IDH3B, IDH3G
0.227 | BP | positive regulation of biosynthetic process | AC008810.1, ADRB2, AKT1, AKT2, APBB1, APP, ARRB1, CAMKK2, CNOT7, FZD2, GLI1, GLI2, GLI3, LRP5, LRP6, MTOR, PABPC1, PIK3R1, WNT5A, ZIC2
0.250 | MF | adenyl ribonucleotide binding | AC008810.1, ACACA, ACACB, ACSL3, AKT1, AKT2, AKT3, CAMKK2, IDH3G, MTOR, PIK3C2A, PIK3C2B, PIK3C2G, PIK3C3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIKFYVE, PRKAA2, PRKAG1, PRKAG2, UBE2G1
0.033 | MF | phosphatidylinositol 3-kinase regulator activity | PIK3R1, PIK3R2, PIK3R3
0.033 | MF | testosterone 17-beta-dehydrogenase activity | HSD17B2, HSD17B6, HSD17B8
0.352 | BP | negative regulation of cellular process | AC008810.1, AC023024.1, ADRB2, AKT1, AKT2, APBB1, APP, ARRB1, ARRB2, BTG1, CNOT8, CYP2D6, GLI1, GLI2, GLI3, LRP5, LRP6, MTOR, NAE1, PIK3CA, PIK3CG, PIK3R1, PIK3R2, PRKAA2, PRKAB1, PRKAB2, PRKAG1, PRKAG2, PRKAG3, WNT5A, ZIC2
0.216 | BP | positive regulation of macromolecule biosynthetic process | ADRB2, AKT1, AKT2, APBB1, APP, ARRB1, CAMKK2, CNOT7, FZD2, GLI1, GLI2, GLI3, LRP5, LRP6, MTOR, PABPC1, PIK3R1, WNT5A, ZIC2
0.159 | BP | regulation of cell cycle | AC008810.1, AKT1, APBB1, APP, LRP5, LRP6, NAE1, PRKAA2, PRKAB1, PRKAB2, PRKAG1, PRKAG2, PRKAG3, WNT5A
0.054 | MF | NAD binding | IDH1, IDH3A, IDH3B, IDH3G, UGDH
0.057 | BP | regulation of glucose metabolic process | AKT1, AKT2, MTOR, PRKAG1, PRKAG2
0.239 | MF | ATP binding | AC008810.1, ACACA, ACACB, ACSL3, AKT1, AKT2, AKT3, CAMKK2, IDH3G, MTOR, PIK3C2A, PIK3C2B, PIK3C2G, PIK3C3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIKFYVE, PRKAA2, PRKAG2, UBE2G1
0.057 | BP | ceramide metabolic process | AC008810.1, ASAH1, ASAH2, DEGS2, SMPD2
0.034 | BP | hindgut morphogenesis | GLI2, GLI3, WNT5A
0.034 | BP | positive regulation of sodium ion transport | ADRB2, AKT1, AKT2
0.045 | BP | tricarboxylic acid cycle | IDH1, IDH3A, IDH3B, IDH3G
0.216 | BP | positive regulation of cellular biosynthetic process | ADRB2, AKT1, AKT2, APBB1, APP, ARRB1, CAMKK2, CNOT7, FZD2, GLI1, GLI2, GLI3, LRP5, LRP6, MTOR, PABPC1, PIK3R1, WNT5A, ZIC2
0.057 | BP | glycogen metabolic process | AKT1, AKT2, MTOR, PRKAG2, PRKAG3
0.034 | BP | NADH metabolic process | IDH3A, IDH3B, IDH3G
0.045 | BP | drug metabolic process | CYP1A1, CYP2D6, CYP3A4, CYP3A5
0.193 | BP | positive regulation of transcription | ADRB2, AKT1, AKT2, APBB1, APP, ARRB1, CAMKK2, CNOT7, FZD2, GLI1, GLI2, GLI3, LRP5, LRP6, PIK3R1, WNT5A, ZIC2
0.033 | MF | poly(A) RNA binding | PABPC1, PABPC3, PABPC4
0.045 | BP | glycogen biosynthetic process | AKT1, AKT2, MTOR, PRKAG3
0.045 | BP | positive regulation of ossification | ADRB2, KL, LRP6, WNT5A
0.205 | BP | regulation of apoptosis | AC008810.1, AC023024.1, ADRB2, AKT1, AKT2, APBB1, APP, ARRB2, BTG1, GLI3, LRP6, NAE1, PIK3CA, PIK3CG, PIK3R1, PIK3R2, SMPD2, WNT5A
0.034 | BP | oxidative demethylation | CYP2D6, CYP3A4, CYP3A5
0.068 | BP | neural tube development | FZD2, GLI2, GLI3, LRP6, WNT5A, ZIC2
0.034 | BP | biosynthetic process | CYP17A1, HSD3B1, HSD3B2
0.033 | MF | estradiol 17-beta-dehydrogenase activity | HSD17B2, HSD17B6, HSD17B8
0.034 | BP | cochlea morphogenesis | FZD2, GLI2, WNT5A
0.034 | BP | drug catabolic process | CYP2D6, CYP3A4, CYP3A5
0.057 | BP | regulation of lipid biosynthetic process | AC008810.1, AKT1, PRKAA2, PRKAB2, PRKAG2
0.045 | BP | regulation of glucose import | AKT1, AKT2, PIK3R1, PRKAG2
0.033 | MF | insulin receptor binding | PIK3CA, PIK3CB, PIK3R1
0.114 | BP | MAPKKK cascade | AC008810.1, ADRB2, AKT1, AKT2, ARRB1, ARRB2, CAMKK2, KL, PIK3CB, WNT5A
0.034 | BP | negative regulation of interleukin-6 production | AC023024.1, ARRB1, ARRB2
0.034 | BP | polyamine biosynthetic process | AMD1, SMS, SRM
0.087 | MF | kinase binding | ARRB2, BTG1, PIK3R1, PRKAB1, PRKAB2, PRKAG1, PRKAG2, PRKAG3
0.045 | BP | embryonic digit morphogenesis | GLI2, GLI3, LRP6, WNT5A
0.034 | BP | desensitization of G-protein coupled receptor protein signaling pathway | ADRB2, ARRB1, ARRB2
0.034 | BP | planar cell polarity pathway involved in neural tube closure | FZD2, LRP6, WNT5A
0.034 | BP | regulation of establishment of planar polarity involved in neural tube closure | FZD2, LRP6, WNT5A
0.034 | BP | ventricular septum morphogenesis | FZD2, LRP6, WNT5A
0.068 | BP | canonical Wnt receptor signaling pathway | FZD2, GLI1, GLI3, LRP5, LRP6, WNT5A
0.034 | BP | vitamin D metabolic process | CYP11A1, CYP1A1, CYP3A4
0.045 | BP | regulation of fat cell differentiation | AKT1, LRP5, LRP6, WNT5A
0.102 | BP | cell cycle arrest | AC008810.1, APBB1, NAE1, PRKAA2, PRKAB1, PRKAB2, PRKAG1, PRKAG2, PRKAG3
0.022 | MF | acetyl-CoA carboxylase activity | ACACA, ACACB
0.057 | BP | dorsal/ventral pattern formation | GLI1, GLI2, GLI3, LRP6, WNT5A
0.022 | MF | spermidine synthase activity | SMS, SRM
0.103 | CC | soluble fraction | ACACA, AKT1, AKT2, ARRB1, GUSB, IDH1, KL, MTOR, PIK3R1
0.023 | BP | cardiac right atrium morphogenesis | LRP6, WNT5A
0.023 | BP | desensitization of G-protein coupled receptor protein signaling pathway by arrestin | ADRB2, ARRB2
0.023 | BP | non-canonical Wnt receptor signaling pathway involved in heart development | LRP6, WNT5A
0.023 | BP | notochord regression | GLI1, GLI2
0.023 | BP | pericardium morphogenesis | LRP6, WNT5A
0.023 | BP | planar cell polarity pathway involved in cardiac muscle tissue morphogenesis | LRP6, WNT5A
0.023 | BP | planar cell polarity pathway involved in cardiac right atrium morphogenesis | LRP6, WNT5A
0.023 | BP | planar cell polarity pathway involved in heart morphogenesis | LRP6, WNT5A
0.023 BP planar cell polarity pathway involved in LRP6, WNT5A
outflow tract morphogenesis 0.023 BP planar cell polarity pathway involved in LRP6, WNT5A pericardium morphogenesis 0.023 BP planar cell polarity pathway involved in LRP6, WNT5A ventricular septum morphogenesis 0.080 BP regulation of establishment of protein AKT1, APBB1, GLI3, MTOR, PIK3C3, PIK3R1, WNT5A localization 0.023 BP smoothened signaling pathway involved in GLI2, GLI3 spinal cord motor neuron cell fate specification 0.023 BP smoothened signaling pathway involved in GLI2, GLI3 ventral spinal cord interneuron specification 0.023 BP spermine biosynthetic process AMD1, SMS 0.045 BP mammary gland morphogenesis GLI2, GLI3, LRP6, WNT5A 0.057 CC mitochondrial outer membrane ACSL3, CPT1A, CPT1B, CPT1C, MTOR 0.057 BP regulation of canonical Wnt receptor GLI1, GLI3, LRP5, LRP6, WNT5A signaling pathway 0.076 MF protein kinase binding ARRB2, PIK3R1, PRKAB1, PRKAB2, PRKAG1, PRKAG2, PRKAG3 0.125 BP positive regulation of transcription from ADRB2, AKT1, AKT2, APBB1, APP, ARRB1, CNOT7, GLI1, GLI2, GLI3, PIK3R1 RNA polymerase II 0.034 BP T cell apoptosis AKT1, GLI3, WNT5A 0.080 BP regulation of MAPKKK cascade ADRB2, AKT1, AKT2, ARRB1, ARRB2, KL, WNT5A 0.034 BP regulation of glycogen biosynthetic process AKT1, AKT2, MTOR 0.136 BP cell migration AKT1, AKT2, APBB1, ARRB2, ATP1B1, BTG1, LRP6, PIK3CA, PIK3CB, PIK3R1, PIK3R2, WNT5A 0.022 MF 1-phosphatidylinositol-3-kinase regulator PIK3R1, PIK3R3 activity 0.022 MF beta-glucuronidase activity GUSB, KL 0.022 MF steroid delta-isomerase activity HSD3B1, HSD3B2 0.022 MF vitamin D 24-hydroxylase activity CYP1A1, CYP3A4 0.046 CC caveola ADRB2, ATP1B1, LRP6, SMPD2 0.114 BP development ACSL3, APP, CYP11A1, FZD2, GLI1, GLI2, GLI3, LRP6, WNT5A, ZIC2 Continued on next page. . .

23 Ratio Type Description Genes 0.023 BP smoothened signaling pathway involved in GLI2, GLI3 dorsal/ventral neural tube patterning 0.068 BP respiratory system development ASAH1, GLI1, GLI2, GLI3, LRP6, WNT5A 0.159 BP positive regulation of response to stimulus ADRB2, AKT2, APBB1, ARRB1, ARRB2, GLI1, KL, LRP6, MTOR, PIK3CA, PIK3CB, PIK3R1, PIK3R2, WNT5A 0.080 BP developmental growth ADRB2, AKT1, APP, GLI2, GLI3, LRP6, WNT5A 0.034 BP ceramide biosynthetic process AC008810.1, DEGS2, SMPD2 0.045 BP cerebellum development CYP11A1, GLI1, GLI2, LRP6 0.080 BP nerve growth factor receptor signaling AKT1, MTOR, PIK3CA, PIK3CB, PIK3R1, PIK3R2, SMPD2 pathway 0.102 BP developmental process involved in AKT1, ARRB1, CYP17A1, FZD2, GLI2, IDH1, LRP6, MTOR, WNT5A reproduction 0.045 BP diencephalon development GLI1, GLI2, LRP6, WNT5A 0.034 BP gastrulation with mouth forming second LRP6, UGDH, WNT5A 0.136 BP macromolecule catabolic process AC023024.1, AKT1, ARRB1, ARRB2, CNOT7, CNOT8, GUSB, MTOR, PABPC1, PABPC4, UBE2G1, WNT5A 0.091 BP negative regulation of cell differentiation APBB1, APP, GLI2, GLI3, LRP5, LRP6, PIK3R1, WNT5A 0.023 BP negative regulation of plasma membrane AKT1, AKT2 long-chain fatty acid transport 0.023 BP smoothened signaling pathway involved in GLI1, GLI2 regulation of cerebellar granule cell precursor cell proliferation 0.045 BP camera-type eye morphogenesis GLI3, LRP5, LRP6, WNT5A 0.045 BP positive regulation of lipid metabolic AC008810.1, AKT1, AKT2, CPT1A process 0.045 BP regulation of striated muscle tissue ADRB2, BTG1, LRP6, PIK3R1 development 0.023 CC smooth endoplasmic reticulum membrane HSD3B1, HSD3B2 0.091 BP positive regulation of protein kinase AC008810.1, ADRB2, AKT1, ARRB1, PIK3CB, PRKAG1, PRKAG2, WNT5A activity 0.068 BP positive regulation of protein AKT1, AKT2, MTOR, PIK3R1, PRKAG2, WNT5A phosphorylation 0.034 BP nuclear-transcribed mRNA poly(A) tail CNOT7, CNOT8, PABPC1 shortening 0.034 BP positive regulation of glucose import AKT1, AKT2, PIK3R1 
0.114 BP negative regulation of apoptosis AC008810.1, AC023024.1, AKT1, AKT2, LRP6, PIK3CA, PIK3CG, PIK3R1, PIK3R2, WNT5A 0.080 BP protein oligomerization AC008810.1, ACACA, ACACB, CPT1A, PRKAA2, PRKAB1, PRKAG1 0.057 BP positive regulation of MAPKKK cascade ADRB2, ARRB1, ARRB2, KL, WNT5A 0.080 BP regulation of transcription regulator AKT1, ARRB1, ARRB2, FZD2, LRP6, WNT5A, ZIC2 activity 0.023 BP estrogen biosynthetic process HSD17B8, HSD3B1 0.023 BP spermidine biosynthetic process AMD1, SRM 0.023 BP ventral midline development GLI1, GLI2 0.045 BP response to growth factor stimulus AKT1, FZD2, PIK3R1, WNT5A 0.054 MF phosphatidylinositol binding AKT1, PIK3C2A, PIK3C2B, PIK3C2G, PIK3R1 0.057 BP response to organic nitrogen AC008810.1, MTOR, PIK3C3, PIK3R1, PRKAA2 0.034 BP positive regulation of mesenchymal cell LRP5, LRP6, WNT5A proliferation 0.034 BP regulation of fatty acid biosynthetic PRKAA2, PRKAB2, PRKAG2 process 0.022 MF angiotensin receptor binding ARRB1, ARRB2 0.022 MF biotin binding ACACA, ACACB 0.022 MF biotin carboxylase activity ACACA, ACACB 0.057 BP positive regulation of transcription AKT1, FZD2, LRP6, WNT5A, ZIC2 regulator activity 0.045 BP protein heterooligomerization AC008810.1, PRKAA2, PRKAB1, PRKAG1 0.034 BP cerebellum morphogenesis GLI1, GLI2, LRP6 0.023 BP dopaminergic neuron differentiation LRP6, WNT5A 0.023 BP monoterpenoid metabolic process CYP2D6, CYP3A4 0.023 BP primitive streak formation LRP6, WNT5A 0.034 BP mammary gland duct morphogenesis GLI2, LRP6, WNT5A 0.034 BP proximal/distal pattern formation GLI1, GLI2, GLI3 0.045 BP response to acid LRP6, MTOR, PIK3C3, PIK3R1 0.057 BP negative regulation of cell size AKT1, APBB1, BTG1, MTOR, WNT5A 0.068 BP leukocyte migration ATP1B1, PIK3CA, PIK3CB, PIK3R1, PIK3R2, WNT5A 0.022 MF cAMP-dependent protein kinase activity AC008810.1, PRKAG1 0.091 BP regulation of cell development AKT1, APBB1, APP, BTG1, GLI2, LRP6, PIK3R1, WNT5A 0.034 BP peptidyl-threonine phosphorylation MTOR, PRKAG2, WNT5A 0.023 BP 
mammary gland formation GLI3, LRP6 0.023 BP mineralocorticoid biosynthetic process HSD3B1, HSD3B2 0.023 BP positive regulation of establishment of AKT1, PIK3R1 protein localization in plasma membrane 0.023 BP positive regulation of fatty acid AKT2, CPT1A beta-oxidation 0.023 BP positive regulation of myoblast BTG1, PIK3R1 differentiation 0.057 BP lung development ASAH1, GLI1, GLI2, GLI3, WNT5A 0.045 BP peptidyl-serine phosphorylation AKT1, AKT2, MTOR, WNT5A 0.034 BP response to activity AC008810.1, AKT2, PRKAA2 0.034 BP regulation of smoothened signaling GLI1, GLI2, GLI3 pathway 0.114 BP positive regulation of apoptosis ADRB2, AKT1, APBB1, APP, ARRB2, LRP6, NAE1, PIK3R1, SMPD2, WNT5A 0.043 MF protein binding ARRB1, ARRB2, PIK3R1, UBE2G1 0.023 BP NFAT protein import into nucleus MTOR, PIK3R1 0.149 CC cell projection ADRB2, AKT1, AKT2, APBB1, APP, ARRB1, CYP17A1, FZD2, GLI1, GLI2, GLI3, MTOR, PIK3CA 0.034 BP regulation of cartilage development GLI2, GLI3, WNT5A 0.045 BP negative regulation of Wnt receptor GLI1, GLI3, LRP6, WNT5A signaling pathway 0.045 BP negative regulation of response to external AC023024.1, ADRB2, MTOR, WNT5A stimulus 0.033 MF phosphoprotein binding ARRB1, MTOR, PIK3R1 0.045 BP T cell receptor signaling pathway PIK3CA, PIK3CB, PIK3R1, PIK3R2 0.080 BP regulation of protein serine/threonine AC008810.1, AKT1, ARRB1, LRP5, LRP6, PIK3CB, WNT5A kinase activity 0.057 BP response to vitamin CYP17A1, FZD2, HSD17B2, LRP6, WNT5A 0.046 CC lamellipodium AKT1, AKT2, APBB1, PIK3CA 0.023 BP G-protein coupled receptor internalization ARRB1, ARRB2 0.045 BP positive regulation of cell cycle AKT1, APP, LRP5, LRP6 0.023 BP positive regulation of peptidyl-threonine PRKAG2, WNT5A phosphorylation 0.023 BP prostatic bud formation GLI2, WNT5A

[Figure 8 graph: cellular_component terms, including AMP-activated protein kinase complex, phosphatidylinositol 3-kinase complex, endoplasmic reticulum membrane, mitochondrial outer membrane, cell fraction, caveola, cell projection, smooth endoplasmic reticulum membrane, microsome, soluble fraction and lamellipodium]

Figure 8: Relationships between the enriched cellular component Gene Ontology terms listed in Table 16. The darkness of the red reflects the significance of the enrichment, and the thickness of each edge is proportional to the number of genes sharing the following annotation.

[Figure 9 graph: hierarchy of the enriched molecular_function terms]

Figure 9: Relationships between the enriched molecular function Gene Ontology terms listed in Table 16. The darkness of the red reflects the significance of the enrichment, and the thickness of each edge is proportional to the number of genes sharing the following annotation.

[Figure 10 graph: hierarchy of the enriched biological_process terms]

Figure 10: Relationships between the enriched biological process Gene Ontology terms listed in Table 16. The darkness of the red reflects the significance of the enrichment, and the thickness of each edge is proportional to the number of genes sharing the following annotation.
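Figures 8 to 10 use two visual encodings: the red fill encodes enrichment significance and the edge width encodes gene overlap. The snippet below is a minimal sketch of the edge-width part only, assuming that the width of an edge between a broader and a more specific term is proportional to the number of candidate genes annotated to both. The gene sets are copied from Table 16; the names `term_genes`, `EDGES`, `edge_weight` and `edge_widths`, and the linear scaling to `max_width`, are hypothetical illustrations and not Anduril's actual implementation.

```python
# Gene sets copied from Table 16 (candidate genes annotated to each GO term).
term_genes = {
    "glycogen metabolic process": {"AKT1", "AKT2", "MTOR", "PRKAG2", "PRKAG3"},
    "glycogen biosynthetic process": {"AKT1", "AKT2", "MTOR", "PRKAG3"},
    "regulation of glucose import": {"AKT1", "AKT2", "PIK3R1", "PRKAG2"},
    "positive regulation of glucose import": {"AKT1", "AKT2", "PIK3R1"},
    "regulation of MAPKKK cascade": {"ADRB2", "AKT1", "AKT2", "ARRB1", "ARRB2", "KL", "WNT5A"},
    "positive regulation of MAPKKK cascade": {"ADRB2", "ARRB1", "ARRB2", "KL", "WNT5A"},
}

# Illustrative (broader term, more specific term) pairs; not read from the figures.
EDGES = [
    ("glycogen metabolic process", "glycogen biosynthetic process"),
    ("regulation of glucose import", "positive regulation of glucose import"),
    ("regulation of MAPKKK cascade", "positive regulation of MAPKKK cascade"),
]


def edge_weight(parent, child, genes):
    """Count the genes annotated to both terms of an edge."""
    return len(genes[parent] & genes[child])


def edge_widths(edges, genes, max_width=5.0):
    """Scale shared-gene counts linearly so the heaviest edge gets max_width."""
    weights = {edge: edge_weight(*edge, genes) for edge in edges}
    top = max(weights.values(), default=0) or 1  # avoid division by zero
    return {edge: max_width * w / top for edge, w in weights.items()}


if __name__ == "__main__":
    for (parent, child), width in edge_widths(EDGES, term_genes).items():
        print(f"{parent} -> {child}: width {width:.1f}")
```

With these three pairs the MAPKKK edge is drawn widest (5 shared genes), the glycogen edge next (4) and the glucose-import edge narrowest (3).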

9.2 Candidate genes

Table 17: Descriptions of the candidate genes. Studies that have reported results about the candidate genes are listed; studies with negative evidence are prefixed with a hyphen. The S column gives the status of each gene (a = absent, d = down-regulated, u = up-regulated, s = stable) and contains an at sign, shown as * in this text, if the gene is part of the candidate pathway. This table has 169 rows.

S name locus description studies u ABCC1 16:16043434-16236931 ATP-binding cassette, sub-family C (CFTR/MRP), member 1 [Source:HGNC Symbol;Acc:51], tcgaGliomaGE 16p13.11 type=protein coding, GO=[leukotriene biosynthetic process; prostanoid metabolic process; ATP catabolic process; hormone biosynthetic process; ATPase activity, coupled to transmembrane movement of substances; fatty acid biosynthetic process; ATP metabolic process; fatty acid metabolic process; response to drug; monocarboxylic acid metabolic process; purine ribonucleoside triphosphate metabolic process] u ABCC4 13:95672083-95953687 ATP-binding cassette, sub-family C (CFTR/MRP), member 4 [Source:HGNC Symbol;Acc:55], tcgaGliomaGE, tscapeBCa 13q32.1 type=processed transcript,protein coding, GO=[15-hydroxyprostaglandin dehydrogenase (NAD+) activity; platelet dense granule membrane; ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism; chloride channel activity; platelet degranulation; ATPase activity, coupled to transmembrane movement of substances; platelet activation] u AC020915.4 19:58816715-58827023 [undefined], type=processed transcript,retained 19q13.43 d AC100826.1 15:69854059-69863779 [undefined], type=lincRNA 15q23 u* ACACA 17:35441923-35766902 acetyl-CoA carboxylase alpha [Source:HGNC Symbol;Acc:84], type=protein coding, tscapeBCd, tscapeOvariand 17q12 GO=[acetyl-CoA carboxylase activity; multicellular organismal protein metabolic process; biotin binding; biotin carboxylase activity; long-chain fatty-acyl-CoA biosynthetic process; long-chain fatty-acyl-CoA metabolic process; protein homotetramerization; triglyceride biosynthetic process; acetyl-CoA metabolic process; protein tetramerization; lipid homeostasis; triglyceride metabolic process; glycerol ether metabolic process; fatty acid biosynthetic process; energy reserve metabolic process; response to organic cyclic compound; fatty acid metabolic process; response to drug; soluble fraction; monocarboxylic acid 
metabolic process; generation of precursor metabolites and energy] u ACAD8 11:134123389- acyl-CoA dehydrogenase family, member 8 [Source:HGNC Symbol;Acc:87], tcgaGliomaGE, tscapeBCd, 134135749 type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[acyl-CoA tscapeNSCLCd 11q25 dehydrogenase activity; branched chain family catabolic process; flavin adenine dinucleotide binding] u* ACSL3 2:223725652-223809357 acyl-CoA synthetase long-chain family member 3 [Source:HGNC Symbol;Acc:3570], tscapeRCCd 2q36.1 type=protein coding,retained intron, GO=[fatty-acyl-CoA synthase activity; long-chain fatty acid-CoA ligase activity; long-chain fatty-acyl-CoA biosynthetic process; long-chain fatty-acyl-CoA metabolic process; peroxisomal membrane; triglyceride biosynthetic process; peroxisomal part; microbody part; triglyceride metabolic process; mitochondrial outer membrane; peroxisome; glycerol ether metabolic process; organelle outer membrane; fatty acid biosynthetic process; response to organic cyclic compound; response to nutrient; microsome; fatty acid metabolic process; response to nutrient levels; response to extracellular stimulus; monocarboxylic acid metabolic process; brain development; central nervous system development] u ADAMTS1 21:28208606-28217728 ADAM metallopeptidase with thrombospondin type 1 motif, 1 [Source:HGNC Symbol;Acc:217], cosmicMetastasis, 21q21.3 type=protein coding,retained intron, GO=[heart trabecula formation; ovulation from ovarian tcgaBreastGE, tscapeBCd follicle; integrin-mediated signaling pathway; basement membrane; female gonad development; metalloendopeptidase activity; heparin binding; female sex differentiation; renal system development; gonad development; urogenital system development; reproductive structure development; developmental process involved in reproduction; gamete generation; sexual reproduction; multicellular organismal reproductive process; multicellular organism reproduction] u* ADRB2 
5:148206156-148208196 adrenergic, beta-2-, receptor, surface [Source:HGNC Symbol;Acc:286], type=protein coding, snp3dObesity, 5q32 GO=[beta2-adrenergic receptor activity; positive regulation of skeletal muscle tissue growth; tcgaBreastGE diaphragm contraction; desensitization of G-protein coupled receptor protein signaling pathway by arrestin; vasodilation by norepinephrine-epinephrine involved in regulation of systemic arterial blood pressure; norepinephrine binding; adenylate cyclase binding; neuronal cell body membrane; diet induced thermogenesis; epinephrine binding; negative regulation of smooth muscle contraction; positive regulation of potassium ion transport; positive regulation of sodium ion transport; activation of transmembrane receptor protein tyrosine kinase activity; negative regulation of calcium ion transport via voltage-gated calcium channel activity; negative regulation of multicellular organism growth; dopamine binding; ionotropic glutamate receptor binding; negative regulation of muscle contraction; heat generation; regulation of sensory perception of pain; positive regulation of vasodilation; positive regulation of heart contraction; endosome to lysosome transport; negative regulation of ossification; respiratory system process; positive regulation of bone mineralization; negative regulation of G-protein coupled receptor protein signaling pathway; potassium channel regulator activity; brown fat cell differentiation; positive regulation of ossification; response to cold; bone resorption; vacuolar transport; regulation of skeletal muscle tissue development; regulation of bone mineralization; negative regulation of inflammatory response; activation of adenylate cyclase activity by G-protein signaling pathway; caveola; dendritic spine; drug binding; sarcolemma; receptor-mediated endocytosis; G-protein signaling, coupled to cAMP nucleotide second messenger; fat cell differentiation; positive regulation of MAPKKK cascade; receptor complex; G-protein 
signaling, coupled to cyclic nucleotide second messenger; apical plasma membrane; muscle tissue development; apical part of cell; divalent metal ion transport; lysosome; lytic vacuole; microsome; protein homodimerization activity; regulation of protein kinase activity; regulation of kinase activity; positive regulation of transcription from RNA polymerase II promoter] u AF127936.7 21:16195290-16254296 [undefined], type=processed transcript 21q11.2 u AL121833.1 6:90036344-90039278 [undefined], type=processed transcript 6q15 d ANKRD16 10:5903689-5931869 ankyrin repeat domain 16 [Source:HGNC Symbol;Acc:23471], tcgaBreastGE, 10p15.1 type=nonsense mediated decay,protein coding,retained intron tcgaGliomaGE u ANKRD37 4:186317175-186321782 ankyrin repeat domain 37 [Source:HGNC Symbol;Acc:29593], tscapeCRCd, tscapeHCCd, 4q35.1 type=nonsense mediated decay,processed transcript,protein coding,retained intron tscapeMelanomad, tscapeNSCLCd, tscapeProstated, tscapeRCCd u APOD 3:195295573-195311076 apolipoprotein D [Source:HGNC Symbol;Acc:612], tcgaBreastGE, 3q29 type=nonsense mediated decay,protein coding,retained intron, GO=[retinoid binding; lipid tcgaGliomaGE, tscapeBCa, transporter activity; extracellular space] tscapeNSCLCd, tscapeOvariana u* APP 21:27252861-27543446 amyloid beta (A4) precursor protein [Source:HGNC Symbol;Acc:620], snp3dCRC, snp3dDementia, 21q21.3 type=processed transcript,protein coding,retained intron, GO=[smooth endoplasmic reticulum tscapeBCd calcium ion homeostasis; acetylcholine receptor binding; collateral sprouting in absence of injury; PTB domain binding; synaptic growth at neuromuscular junction; neuron remodeling; axon midline choice point recognition; spindle midzone; ciliary rootlet; G2 phase of mitotic cell cycle; ionotropic glutamate receptor signaling pathway; cellular copper ion homeostasis; suckling behavior; mRNA polyadenylation; axon cargo transport; mating behavior; regulation of epidermal growth factor receptor activity; neuron 
maturation; neuron recognition; dendritic shaft; positive regulation of mitotic cell cycle; regulation of synapse structure and activity; visual learning; neuromuscular junction; peptidase activator activity; microtubule-based transport; neuromuscular process controlling balance; platelet alpha granule lumen; negative regulation of neuron differentiation; coated pit; dendritic spine; adult locomotory behavior; Notch signaling pathway; dendrite development; platelet degranulation; positive regulation of cell cycle; serine-type endopeptidase inhibitor activity; synaptosome; heparin binding; extracellular matrix organization; locomotory behavior; neuron apoptosis; muscle tissue development; apical part of cell; platelet activation; perinuclear region of cytoplasm; cell surface; brain development; regulation of protein kinase activity; regulation of kinase activity; positive regulation of transcription from RNA polymerase II promoter; central nervous system development] u ARID5B 10:63661059-63856703 AT rich interactive domain 5B (MRF1-like) [Source:HGNC Symbol;Acc:17362], tcgaBreastGE, 10q21.2 type=protein coding, GO=[fat pad development; fibroblast migration; face morphogenesis; face tscapeProstatea development; adrenal gland development; platelet-derived growth factor receptor signaling pathway; palate development; post-embryonic development; female gonad development; female sex differentiation; fat cell differentiation; positive regulation of activity; positive regulation of transcription regulator activity; renal system development; positive regulation of binding; gonad development; urogenital system development; reproductive structure development; transcription repressor activity; skeletal system development; developmental process involved in reproduction; negative regulation of transcription, DNA-dependent; negative regulation of transcription] Continued on next page. . .

d* ATP1B1 1:169074935-169101960 (1q24.2) ATPase, Na+/K+ transporting, beta 1 polypeptide [Source:HGNC Symbol;Acc:804], type=processed transcript,protein coding, GO=[sodium:potassium-exchanging ATPase complex; sodium:potassium-exchanging ATPase activity; ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism; caveola; ATP biosynthetic process; ATPase activity, coupled to transmembrane movement of substances; ATP metabolic process; apical plasma membrane; response to hypoxia; leukocyte migration; apical part of cell; purine ribonucleoside triphosphate metabolic process]; studies: tcgaGliomaGE
u BMPR1B 4:95679119-96079599 (4q22.3) bone morphogenetic protein receptor, type IB [Source:HGNC Symbol;Acc:1077], type=protein coding, GO=[ovarian cumulus expansion; transforming growth factor beta receptor activity, type I; retinal ganglion cell axon guidance; cartilage condensation; positive regulation of bone mineralization; positive regulation of ossification; positive regulation of osteoblast differentiation; regulation of bone mineralization; SMAD binding; receptor signaling protein serine/threonine kinase activity; retina development in camera-type eye; BMP signaling pathway; female gonad development; female sex differentiation; receptor complex; gonad development; transmembrane receptor protein serine/threonine kinase signaling pathway; reproductive structure development; skeletal system development; developmental process involved in reproduction; gamete generation; protein serine/threonine kinase activity; sexual reproduction; multicellular organismal reproductive process; multicellular organism reproduction]; studies: tscapeHCCd
u* BTG1 12:92534074-92539673 (12q21.33) B-cell translocation gene 1, anti-proliferative [Source:HGNC Symbol;Acc:1130], type=protein coding, GO=[positive regulation of endothelial cell differentiation; positive regulation of myoblast differentiation; regulation of myoblast differentiation; positive regulation of epithelial cell differentiation; regulation of skeletal muscle tissue development; myoblast differentiation; positive regulation of angiogenesis; negative regulation of ; negative regulation of cell size; response to oxidative stress; muscle tissue development; kinase binding; response to peptide hormone stimulus; protein binding transcription factor activity; gamete generation; sexual reproduction; multicellular organismal reproductive process; multicellular organism reproduction]; studies: tcgaGliomaGE
d C11orf92 11:111164114-111175770 (11q23.1) chromosome 11 open reading frame 92 [Source:HGNC Symbol;Acc:33789], type=processed transcript,protein coding; studies: tcgaBreastGE
u C17orf48 17:10600931-10614550 (17p13.1) chromosome 17 open reading frame 48 [Source:HGNC Symbol;Acc:30925], type=nonsense mediated decay,protein coding,retained intron, GO=[CDP-glycerol diphosphatase activity; ADP-ribose diphosphatase activity; ADP-sugar diphosphatase activity]
d C1orf53 1:197871777-197876497 (1q31.3) chromosome 1 open reading frame 53 [Source:HGNC Symbol;Acc:30003], type=processed transcript,protein coding; studies: tcgaBreastGE
d C20orf177 20:58508819-58523735 (20q13.33) chromosome 20 open reading frame 177 [Source:HGNC Symbol;Acc:16170], type=processed transcript,protein coding; studies: tcgaGliomaGE
u C3orf25 3:129120164-129147494 (3q21.3) chromosome 3 open reading frame 25 [Source:HGNC Symbol;Acc:28061], type=protein coding,retained intron, GO=[calcium ion binding]
d C5orf13 5:110998318-111333161 (5q22.1) chromosome 5 open reading frame 13 [Source:HGNC Symbol;Acc:16834], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[axon regeneration; regulation of transforming growth factor beta receptor signaling pathway; transforming growth factor beta receptor signaling pathway; regulation of transmembrane receptor protein serine/threonine kinase signaling pathway; transmembrane receptor protein serine/threonine kinase signaling pathway]; studies: tcgaBreastGE, tcgaOvarianGE, tscapeCRCd, tscapeNSCLCd, tscapeOvariand, tscapeProstated
d C5orf30 5:102594403-102614361 (5q21.1) chromosome 5 open reading frame 30 [Source:HGNC Symbol;Acc:25052], type=protein coding; studies: tcgaGliomaGE, tscapeCRCd, tscapeNSCLCd, tscapeOvariand, tscapeProstated
d C6orf64 6:39071840-39082965 (6p21.2) chromosome 6 open reading frame 64 [Source:HGNC Symbol;Acc:21025], type=processed transcript,protein coding; studies: tcgaBreastGE, tcgaGliomaGE, tscapeOvariana
d CALD1 7:134429003-134655479 (7q33) caldesmon 1 [Source:HGNC Symbol;Acc:1441], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[actin cap; tropomyosin binding; positive regulation of protein binding; myosin binding; actin filament; dendritic spine; postsynaptic density; focal adhesion; calmodulin binding; positive regulation of binding; actin binding]; studies: snp3dGlioma, tcgaGliomaGE
u* CAMKK2 12:121675497-121736111 (12q24.31) calcium/calmodulin-dependent protein kinase kinase 2, beta [Source:HGNC Symbol;Acc:1470], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[calmodulin-dependent protein kinase activity; calcium-mediated signaling; protein autophosphorylation; protein tyrosine kinase activity; calmodulin binding; regulation of protein kinase activity; regulation of kinase activity; protein serine/threonine kinase activity; calcium ion binding]; studies: tcgaGliomaGE
u CAPZB 1:19665267-19812066 (1p36.13) capping protein (actin filament) muscle Z-line, beta [Source:HGNC Symbol;Acc:1491], type=processed transcript,protein coding, GO=[F-actin capping protein complex; WASH complex; actin filament capping; lamellipodium assembly; regulation of actin filament polymerization; lamellipodium; actin binding]; studies: tscapeBCd, tscapeCRCd, tscapeNSCLCd, tscapeOvariand, tscapeRCCd
d CDH26 20:58533471-58609066 (20q13.33) cadherin 26 [Source:HGNC Symbol;Acc:15902], type=processed transcript,protein coding, GO=[homophilic cell adhesion; calcium ion binding]
d CDK8 13:26828276-26979375 (13q12.13) cyclin-dependent kinase 8 [Source:HGNC Symbol;Acc:1779], type=processed transcript,protein coding, GO=[RNA polymerase II carboxy-terminal domain kinase activity; cyclin-dependent protein kinase activity; mediator complex; protein serine/threonine kinase activity]; studies: tscapeBCd, tscapeCRCa, tscapeSCLCd
u CEBPG 19:33864609-33873591 (19q13.11) CCAAT/enhancer binding protein (C/EBP), gamma [Source:HGNC Symbol;Acc:1837], type=protein coding, GO=[enucleate erythrocyte differentiation; positive regulation of interferon-gamma biosynthetic process; positive regulation of DNA repair; natural killer cell mediated cytotoxicity; natural killer cell mediated immunity; regulation of interferon-gamma production; B cell differentiation; liver development; negative regulation of transcription factor activity; negative regulation of DNA binding; positive regulation of transcription factor activity; positive regulation of transcription regulator activity; double-stranded DNA binding; positive regulation of binding; protein heterodimerization activity; transcription factor binding]; studies: tcgaBreastGE, tcgaGliomaGE
u CENPN 16:81040103-81066719 (16q23.2) centromere protein N [Source:HGNC Symbol;Acc:30873], type=protein coding, GO=[CenH3-containing nucleosome assembly at centromere; DNA replication-independent nucleosome assembly; condensed chromosome kinetochore; mitotic prometaphase; chromosome, centromeric region]; studies: tcgaBreastGE, tcgaGliomaGE, tscapeProstated
d CITED2 6:139693393-139695757 (6q24.1) Cbp/p300-interacting transactivator, with Glu/Asp-rich carboxy-terminal domain, 2 [Source:HGNC Symbol;Acc:1987], type=protein coding, GO=[LBD domain binding; embryonic process involved in female pregnancy; response to fluid shear stress; trophectodermal cell differentiation; decidualization; positive regulation of transforming growth factor beta receptor signaling pathway; positive regulation of cell-cell adhesion; adrenal gland development; vasculogenesis; embryonic placenta development; regulation of transforming growth factor beta receptor signaling pathway; determination of left/right symmetry; liver development; negative regulation of cell migration; nuclear ; positive regulation of cell cycle; transforming growth factor beta receptor signaling pathway; regulation of transmembrane receptor protein serine/threonine kinase signaling pathway; transcription corepressor activity; positive regulation of gene-specific transcription from RNA polymerase II promoter; response to hypoxia; transmembrane receptor protein serine/threonine kinase signaling pathway; transcription coactivator activity; anti-apoptosis; transcription repressor activity; transcription activator activity; regulation of gene-specific transcription from RNA polymerase II promoter; protein binding transcription factor activity; developmental process involved in reproduction; negative regulation of transcription from RNA polymerase II promoter; positive regulation of transcription from RNA polymerase II promoter; negative regulation of transcription, DNA-dependent; multicellular organismal reproductive process; multicellular organism reproduction; central nervous system development; negative regulation of transcription]; studies: tscapeBCa
u CNTNAP2 7:145813453-148118090 (7q35, 7q36.1) contactin associated protein-like 2 [Source:HGNC Symbol;Acc:13830], type=processed transcript,protein coding, GO=[cell body fiber; protein localization to juxtaparanode region of axon; clustering of voltage-gated potassium channels; superior temporal gyrus development; juxtaparanode region of axon; thalamus development; axolemma; striatum development; neuron projection membrane; neuron maturation; neuron recognition; diencephalon development; limbic system development; voltage-gated potassium channel complex; brain development; central nervous system development; calcium ion binding]; studies: cosmicPrimary, tcgaBreastGE, tcgaGliomaGE, tscapeMelanomaa, tscapeNSCLCd, tscapeOvariana
d COL16A1 1:32117848-32169920 (1p35.2) collagen, type XVI, alpha 1 [Source:HGNC Symbol;Acc:2193], type=processed transcript,protein coding,retained intron, GO=[collagen type XVI; integrin-mediated signaling pathway; integrin binding]; studies: cosmicPrimary, tscapeBCd, tscapeOvariand
u CORO1B 11:67205519-67211292 (11q13.2) coronin, actin binding protein, 1B [Source:HGNC Symbol;Acc:2253], type=nonsense mediated decay,processed transcript,protein coding, GO=[actin binding]; studies: tcgaBreastGE, tcgaGliomaGE
u CRIP2 14:105939309-105946507 (14q32.33) cysteine-rich protein 2 [Source:HGNC Symbol;Acc:2361], type=protein coding; studies: tcgaBreastGE, tscapeMelanomad
d CXCR7 2:237476430-237491001 (2q37.3) chemokine (C-X-C motif) receptor 7 [Source:HGNC Symbol;Acc:23692], type=protein coding, GO=[interspecies interaction between organisms]; studies: tcgaGliomaGE, tscapeBCd, tscapeNSCLCd, tscapeOvariand, tscapeRCCd

d CYBASC3 11:61116226-61129771 (11q12.2) cytochrome b, ascorbate dependent 3 [Source:HGNC Symbol;Acc:23014], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[late endosome membrane; lysosomal membrane; late endosome; electron transport chain; lysosome; lytic vacuole; generation of precursor metabolites and energy]; studies: tcgaGliomaGE
u* CYP11A1 15:74630100-74660081 (15q24.1) cytochrome P450, family 11, subfamily A, polypeptide 1 [Source:HGNC Symbol;Acc:2590], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[cholesterol monooxygenase (side-chain-cleaving) activity; granulosa cell differentiation; mitochondrial crista; vitamin D metabolic process; mating behavior; cholesterol binding; C21-steroid hormone biosynthetic process; response to cadmium ion; sterol binding; cerebellum development; response to cAMP; response to hydrogen peroxide; hormone biosynthetic process; cholesterol metabolic process; heme binding; steroid biosynthetic process; xenobiotic metabolic process; response to estrogen stimulus; electron carrier activity; response to oxidative stress; response to nutrient; response to nutrient levels; response to extracellular stimulus; brain development; central nervous system development]
u DBI 2:120124497-120130126 (2q14.2) diazepam binding inhibitor (GABA receptor modulator, acyl-CoA binding protein) [Source:HGNC Symbol;Acc:2690], type=processed transcript,protein coding,retained intron, GO=[benzodiazepine receptor binding; fatty-acyl-CoA binding; skin development; hair follicle development; triglyceride metabolic process; glycerol ether metabolic process]; studies: tcgaGliomaGE, tscapeCRCa
d* DEGS2 14:100612753-100626012 (14q32.2) degenerative spermatocyte homolog 2, lipid desaturase (Drosophila) [Source:HGNC Symbol;Acc:20113], type=protein coding, GO=[sphingosine hydroxylase activity; sphingolipid delta-4 desaturase activity; sphinganine metabolic process; ceramide biosynthetic process; fatty acid biosynthetic process; fatty acid metabolic process; monocarboxylic acid metabolic process]; studies: tscapeMelanomad
d DIO1 1:54356912-54376759 (1p32.3) deiodinase, iodothyronine, type I [Source:HGNC Symbol;Acc:2883], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[thyroxine 5'-deiodinase activity; thyroid hormone generation; selenium binding; hormone biosynthetic process; microsome]
u DNAJB9 7:108210012-108215294 (7q31.1) DnaJ (Hsp40) homolog, subfamily B, member 9 [Source:HGNC Symbol;Acc:6968], type=processed transcript,protein coding,retained intron, GO=[misfolded protein binding; ER-associated protein catabolic process; heat shock protein binding; unfolded protein binding; nucleolus]
u DNM1L 12:32832137-32905700 (12p11.21) dynamin 1-like [Source:HGNC Symbol;Acc:2973], type=protein coding, GO=[mitochondrial fragmentation involved in apoptosis; cis-Golgi network; mitochondrial membrane organization; cellular component disassembly involved in apoptosis; ubiquitin protein ligase binding; mitochondrial outer membrane; peroxisome; organelle outer membrane; GTPase activity; microsome; GTP binding; purine ribonucleoside triphosphate metabolic process]; studies: tcgaBreastGE, tcgaGliomaGE, tcgaOvarianGE, tscapeNSCLCa
u EAF2 3:121554030-121605373 (3q13.33) ELL associated factor 2 [Source:HGNC Symbol;Acc:23115], type=nonsense mediated decay,processed transcript,protein coding, GO=[negative regulation of epithelial cell proliferation involved in prostate gland development; epithelial cell proliferation involved in prostate gland development; negative regulation of reproductive process; negative regulation of epithelial cell proliferation; nuclear speck; negative regulation of cell growth; negative regulation of cell size; urogenital system development; reproductive structure development; transcription activator activity; developmental process involved in reproduction]
u EML1 14:100239936-100408393 (14q32.2) echinoderm microtubule associated protein like 1 [Source:HGNC Symbol;Acc:3330], type=protein coding, GO=[microtubule; calcium ion binding]; studies: tscapeMelanomad
u EXTL2 1:101337943-101361554 (1p21.2) exostoses (multiple)-like 2 [Source:HGNC Symbol;Acc:3516], type=processed transcript,protein coding, GO=[alpha-1,4-N-acetylgalactosaminyltransferase activity; glucuronyl-galactosyl-proteoglycan 4-alpha-N-acetylglucosaminyltransferase activity; UDP-N-acetylgalactosamine metabolic process; N-acetylglucosamine metabolic process; glucosamine metabolic process]; studies: tscapeBCd, tscapeNSCLCd, tscapeProstated
d FAM113B 12:47610052-47630439 (12q13.11) family with sequence similarity 113, member B [Source:HGNC Symbol;Acc:28255], type=protein coding; studies: tcgaBreastGE, tscapeRCCa, tscapeSCLCa
u FAM174B 15:93160678-93199031 (15q26.1) family with sequence similarity 174, member B [Source:HGNC Symbol;Acc:34339], type=protein coding; studies: tcgaGliomaGE
d FAM198B 4:159045626-159094470 (4q32.1) family with sequence similarity 198, member B [Source:HGNC Symbol;Acc:25312], type=processed transcript,protein coding, GO=[Golgi membrane]
u FAM43A 3:194406622-194409762 (3q29) family with sequence similarity 43, member A [Source:HGNC Symbol;Acc:26888], type=protein coding; studies: tscapeBCa, tscapeMelanomad, tscapeNSCLCd, tscapeOvariana
d FIGNL1 7:50511831-50518088 (7p12.1) fidgetin-like 1 [Source:HGNC Symbol;Acc:13286], type=protein coding, GO=[ATP metabolic process; magnesium ion binding; purine ribonucleoside triphosphate metabolic process]; studies: tscapeGliomaa
d* FZD2 17:42634827-42636907 (17q21.31) frizzled homolog 2 (Drosophila) [Source:HGNC Symbol;Acc:4040], type=protein coding, GO=[muscular septum morphogenesis; hard palate development; membranous septum morphogenesis; G-protein signaling, coupled to cGMP nucleotide second messenger; cellular response to vitamin D; positive regulation of cGMP metabolic process; cochlea morphogenesis; planar cell polarity pathway involved in neural tube closure; regulation of establishment of planar polarity involved in neural tube closure; ventricular septum morphogenesis; Wnt receptor activity; inner ear receptor cell development; outflow tract morphogenesis; neuron projection membrane; Wnt-protein binding; Wnt receptor signaling pathway, calcium modulating pathway; cellular response to vitamin; cellular response to growth factor stimulus; palate development; PDZ domain binding; cellular response to nutrient levels; positive regulation of transcription regulator activity; canonical Wnt receptor signaling pathway; G-protein signaling, coupled to cyclic nucleotide second messenger; cellular response to extracellular stimulus; gonad development; response to nutrient; apical part of cell; reproductive structure development; protein heterodimerization activity; cellular response to peptide hormone stimulus; response to nutrient levels; response to extracellular stimulus; regulation of gene-specific transcription from RNA polymerase II promoter; response to peptide hormone stimulus; developmental process involved in reproduction; sensory perception of smell; brain development; central nervous system development]; studies: tcgaBreastGE, tscapeProstated
d* GALNTL4 11:11292423-11643552 (11p15.3) UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase-like 4 [Source:HGNC Symbol;Acc:30488], type=processed transcript,protein coding, GO=[polypeptide N-acetylgalactosaminyltransferase activity; sugar binding; Golgi membrane]; studies: tcgaGliomaGE
d GAS6 13:114523524-114567046 (13q34) growth arrest-specific 6 [Source:HGNC Symbol;Acc:4168], type=processed transcript,protein coding, GO=[receptor agonist activity; macrophage cytokine production; peptidyl-glutamic acid carboxylation; apoptotic cell clearance; Golgi lumen; cytokine production involved in immune response; positive regulation of fibroblast proliferation; positive regulation of ERK1 and ERK2 cascade; platelet alpha granule lumen; cellular response to growth factor stimulus; protein kinase B signaling cascade; neuron migration; platelet degranulation; endoplasmic reticulum lumen; cellular response to nutrient levels; positive regulation of MAPKKK cascade; cellular response to extracellular stimulus; post-translational protein modification; leukocyte migration; platelet activation; response to nutrient levels; response to extracellular stimulus; calcium ion binding; extracellular space]; studies: tcgaGliomaGE, tscapeBCd, tscapeHCCa, tscapeNSCLCa, tscapeNSCLCd, tscapeSCLCa
d GCG 2:162999392-163008914 (2q24.2) glucagon [Source:HGNC Symbol;Acc:4191], type=protein coding,retained intron, GO=[glucagon receptor binding; negative regulation of appetite; negative regulation of response to nutrient levels; negative regulation of response to extracellular stimulus; regulation of response to food; cellular response to glucagon stimulus; hormone activity; G-protein signaling, coupled to cAMP nucleotide second messenger; G-protein signaling, coupled to cyclic nucleotide second messenger; energy reserve metabolic process; cellular response to peptide hormone stimulus; response to nutrient levels; soluble fraction; response to extracellular stimulus; response to peptide hormone stimulus; generation of precursor metabolites and energy; extracellular space]; studies: snp3dObesity
u GDF15 19:18496968-18499986 (19p13.11) growth differentiation factor 15 [Source:HGNC Symbol;Acc:30142], type=protein coding, GO=[transforming growth factor beta receptor signaling pathway; growth factor activity; cytokine activity; transmembrane receptor protein serine/threonine kinase signaling pathway; extracellular space]; studies: snp3dProstateC, tcgaBreastGE, tcgaGliomaGE
u GLRX2 1:193065598-193075244 (1q31.2) glutaredoxin 2 [Source:HGNC Symbol;Acc:16065], type=processed transcript,protein coding, GO=[arsenate reductase (glutaredoxin) activity; DNA protection; glutathione disulfide oxidoreductase activity; protein thiol-disulfide exchange; response to redox state; 2 iron, 2 sulfur cluster binding; protein disulfide oxidoreductase activity; glutathione metabolic process; iron-sulfur cluster binding; cell redox homeostasis; response to hydrogen peroxide; cellular response to extracellular stimulus; electron transport chain; electron carrier activity; response to oxidative stress; response to extracellular stimulus; generation of precursor metabolites and energy]; studies: tcgaBreastGE
d GPER 7:1121844-1133451 (7p22.3) G protein-coupled estrogen receptor 1 [Source:HGNC Symbol;Acc:4485], type=protein coding, GO=[Golgi membrane]; studies: tscapeOvariand
u GPT2 16:46918290-46965201 (16q11.2) glutamic pyruvate transaminase (alanine aminotransferase) 2 [Source:HGNC Symbol;Acc:18062], type=protein coding, GO=[L-alanine metabolic process; alanine metabolic process; L-alanine:2-oxoglutarate aminotransferase activity; 2-oxoglutarate metabolic process; pyridoxal phosphate binding; cellular amino acid biosynthetic process]; studies: tscapeBCd, tscapeMelanomad, tscapeNSCLCd
u GTF3C6 6:111279763-111289093 (6q21) general transcription factor IIIC, polypeptide 6, alpha 35kDa [Source:HGNC Symbol;Acc:20872], type=processed transcript,protein coding, GO=[transcription factor TFIIIC complex; tRNA transcription from RNA polymerase III promoter; 5S class rRNA transcription from RNA polymerase III type 1 promoter; RNA polymerase III transcription factor activity]; studies: tscapeCRCd
d HCP5 HLA complex P5 [Source:HGNC Symbol;Acc:21659], type=processed transcript
d HCP5 HLA complex P5 [Source:HGNC Symbol;Acc:21659], type=processed transcript
d HCP5 HLA complex P5 [Source:HGNC Symbol;Acc:21659], type=processed transcript

d HCP5 6:31368479-31445283 (6p21.33) HLA complex P5 [Source:HGNC Symbol;Acc:21659], type=processed transcript; studies: tcgaGliomaGE
u HEBP2 6:138724668-138734310 (6q23.3) heme binding protein 2 [Source:HGNC Symbol;Acc:15716], type=nonsense mediated decay,protein coding; studies: tscapeBCa
u HERC5 4:89378268-89427314 (4q22.1) hect domain and RLD 5 [Source:HGNC Symbol;Acc:24368], type=processed transcript,protein coding,retained intron, GO=[negative regulation of type I interferon production; regulation of cyclin-dependent protein kinase activity; response to virus; acid-amino acid ligase activity; regulation of protein serine/threonine kinase activity; perinuclear region of cytoplasm; regulation of protein kinase activity; regulation of kinase activity]; studies: tcgaGliomaGE, tcgaOvarianGE, tscapeHCCd
u HIVEP1 6:12008995-12165232 (6p24.1) human immunodeficiency virus type I enhancer binding protein 1 [Source:HGNC Symbol;Acc:4920], type=nonsense mediated decay,protein coding, GO=[transcription repressor activity; negative regulation of transcription from RNA polymerase II promoter; negative regulation of transcription, DNA-dependent; negative regulation of transcription]; studies: tscapeBCa, tscapeOvariana, tscapeSCLCd
u HMG20B 19:3572775-3579086 (19p13.3) high-mobility group 20B [Source:HGNC Symbol;Acc:5002], type=nonsense mediated decay,protein coding,retained intron; studies: tcgaBreastGE, tcgaGliomaGE, tscapeHCCd, tscapeRCCd
u HPGD 4:175411328-175444305 (4q34.1) hydroxyprostaglandin dehydrogenase 15-(NAD) [Source:HGNC Symbol;Acc:5154], type=nonsense mediated decay,protein coding,retained intron, GO=[15-hydroxyprostaglandin dehydrogenase (NAD+) activity; lipoxygenase pathway; prostaglandin E receptor activity; NAD+ binding; parturition; prostaglandin metabolic process; prostanoid metabolic process; very long-chain fatty acid metabolic process; NAD binding; transforming growth factor beta receptor signaling pathway; transmembrane receptor protein serine/threonine kinase signaling pathway; fatty acid metabolic process; monocarboxylic acid metabolic process; negative regulation of cell cycle; protein homodimerization activity]; studies: tscapeBCd, tscapeMelanomad, tscapeProstated, tscapeRCCd
u HSD17B11 4:88257762-88312538 (4q22.1) hydroxysteroid (17-beta) dehydrogenase 11 [Source:HGNC Symbol;Acc:22960], type=processed transcript,protein coding,retained intron, GO=[androgen catabolic process; estradiol 17-beta-dehydrogenase activity; steroid biosynthetic process]; studies: tcgaBreastGE, tscapeHCCd
u HSD17B4 5:118788138-118878028 (5q23.1) hydroxysteroid (17-beta) dehydrogenase 4 [Source:HGNC Symbol;Acc:5213], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[3alpha,7alpha,12alpha-trihydroxy-5beta-cholest-24-enoyl-CoA hydratase activity; long-chain-enoyl-CoA hydratase activity; 3-hydroxyacyl-CoA dehydrogenase activity; Sertoli cell development; estradiol 17-beta-dehydrogenase activity; fatty acid beta-oxidation using acyl-CoA oxidase; sterol transporter activity; bile acid biosynthetic process; sterol binding; peroxisomal matrix; very long-chain fatty acid metabolic process; fatty acid beta-oxidation; sterol transport; peroxisomal part; microbody part; lipid transporter activity; peroxisome; steroid biosynthetic process; gonad development; reproductive structure development; fatty acid metabolic process; monocarboxylic acid metabolic process; developmental process involved in reproduction]; studies: tscapeBCd, tscapeNSCLCd, tscapeOvariand, tscapeProstated
u* IDH1 2:209100951-209130798 (2q34) isocitrate dehydrogenase 1 (NADP+), soluble [Source:HGNC Symbol;Acc:5382], type=protein coding,retained intron, GO=[isocitrate dehydrogenase (NADP+) activity; glyoxylate cycle; isocitrate metabolic process; NADPH regeneration; 2-oxoglutarate metabolic process; tricarboxylic acid cycle; peroxisomal matrix; glutathione metabolic process; NADP binding; NAD binding; acetyl-CoA metabolic process; peroxisomal part; microbody part; female gonad development; peroxisome; female sex differentiation; magnesium ion binding; gonad development; response to organic cyclic compound; response to oxidative stress; reproductive structure development; soluble fraction; monocarboxylic acid metabolic process; developmental process involved in reproduction; protein homodimerization activity; generation of precursor metabolites and energy]; studies: cosmicRecurrent, tcgaGliomaGE
d IL27RA 19:14142262-14164026 (19p13.12) interleukin 27 receptor, alpha [Source:HGNC Symbol;Acc:17290], type=protein coding, GO=[interleukin-27 receptor activity; negative regulation of type 2 immune response; positive regulation of T-helper 1 type immune response; regulation of isotype switching to IgG isotypes; positive regulation of interferon-gamma production; defense response to Gram-positive bacterium; regulation of interferon-gamma production; regulation of production of molecular mediator of immune response]; studies: fileBC2brain
u* INPP4B 4:142944313-143768585 (4q31.21) inositol polyphosphate-4-phosphatase, type II, 105kDa [Source:HGNC Symbol;Acc:6075], type=nonsense mediated decay,processed transcript,protein coding, GO=[phosphatidylinositol-4,5-bisphosphate 4-phosphatase activity; phosphatidylinositol-3,4-bisphosphate 4-phosphatase activity]; studies: tcgaGliomaGE
d INSIG2 2:118846028-118868573 (2q14.2) insulin induced gene 2 [Source:HGNC Symbol;Acc:20452], type=nonsense mediated decay,processed transcript,protein coding, GO=[SREBP-SCAP-Insig complex; ER-nuclear sterol response pathway; cranial suture morphogenesis; negative regulation of fatty acid biosynthetic process; negative regulation of steroid biosynthetic process; middle ear morphogenesis; response to fatty acid; cholesterol biosynthetic process; bone morphogenesis; palate development; triglyceride metabolic process; glycerol ether metabolic process; cholesterol metabolic process; fatty acid biosynthetic process; steroid biosynthetic process; response to insulin stimulus; fatty acid metabolic process; transcription factor binding; skeletal system development; response to peptide hormone stimulus; monocarboxylic acid metabolic process]
d IRX3 16:54317216-54320378 (16q12.2) iroquois homeobox 3 [Source:HGNC Symbol;Acc:14360], type=protein coding
u ISG20 15:89181974-89199714 (15q26.1) interferon stimulated exonuclease gene 20kDa [Source:HGNC Symbol;Acc:6130], type=protein coding, GO=[exoribonuclease II activity; single-stranded DNA specific 3'-5' exodeoxyribonuclease activity; DNA catabolic process, exonucleolytic; 3'-5'-exoribonuclease activity; PML body; RNA catabolic process; response to virus]; studies: tcgaBreastGE, tcgaGliomaGE, tscapeNSCLCa
u KCNN2 5:113696666-113832321 (5q22.3) potassium intermediate/small conductance calcium-activated channel, subfamily N, member 2 [Source:HGNC Symbol;Acc:6291], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[small conductance calcium-activated potassium channel activity; calcium activated cation channel activity; calmodulin binding]; studies: tscapeBCd, tscapeNSCLCd, tscapeOvariand, tscapeProstated
d KIF5B 10:32297938-32345359 (10p11.22) kinesin family member 5B [Source:HGNC Symbol;Acc:6324], type=processed transcript,protein coding, GO=[stress granule disassembly; cytoplasm organization; ciliary rootlet; vesicle transport along microtubule; kinesin complex; microtubule-based transport; vesicle localization; microtubule motor activity; microtubule; perinuclear region of cytoplasm]; studies: tscapeProstated
d LGMN 14:93170157-93215047 (14q32.12) legumain [Source:HGNC Symbol;Acc:9472], type=protein coding, GO=[negative regulation of multicellular organism growth; vitamin D metabolic process; cysteine-type endopeptidase activity; response to acid; negative regulation of neuron apoptosis; hormone biosynthetic process; late endosome; neuron apoptosis; apical part of cell; lysosome; lytic vacuole; protein serine/threonine kinase activity]; studies: tscapeMelanomad
u LIMCH1 4:41361624-41702061 (4p13) LIM and calponin domains 1 [Source:HGNC Symbol;Acc:29191], type=processed transcript,protein coding, GO=[actomyosin structure organization; actin binding]; studies: tcgaGliomaGE
u LONRF1 8:12579403-12613582 (8p23.1) LON peptidase N-terminal domain and ring finger 1 [Source:HGNC Symbol;Acc:26302], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[ATP-dependent peptidase activity; ubiquitin ligase complex; ubiquitin-protein ligase activity; acid-amino acid ligase activity]; studies: tcgaBreastGE
u LPAR3 1:85277285-85358896 (1p22.3) lysophosphatidic acid receptor 3 [Source:HGNC Symbol;Acc:14298], type=processed transcript,protein coding, GO=[bleb assembly; G-protein alpha-subunit binding; lysosphingolipid and lysophosphatidic acid receptor activity; elevation of cytosolic calcium ion concentration involved in G-protein signaling coupled to IP3 second messenger; positive regulation of calcium ion transport; positive regulation of MAPKKK cascade; G-protein signaling, coupled to cyclic nucleotide second messenger; phospholipid binding; regulation of MAP kinase activity; divalent metal ion transport; regulation of protein serine/threonine kinase activity; regulation of protein kinase activity; regulation of kinase activity]; studies: tcgaGliomaGE
u LRIG1 3:66429221-66551687 (3p14.1) leucine-rich repeats and immunoglobulin-like domains 1 [Source:HGNC Symbol;Acc:17360], type=processed transcript,protein coding,retained intron; studies: tscapeNSCLCd, tscapeProstated, tscapeSCLCd
d MAPK1IP1L 14:55518362-55536912 (14q22.3) mitogen-activated protein kinase 1 interacting protein 1-like [Source:HGNC Symbol;Acc:19840], type=protein coding
u MAPK6 15:52311417-52358462 (15q21.2) mitogen-activated protein kinase 6 [Source:HGNC Symbol;Acc:6879], type=protein coding, GO=[MAP kinase activity; receptor signaling protein serine/threonine kinase activity; protein serine/threonine kinase activity]; studies: tcgaGliomaGE
u MERTK 2:112656056-112787138 (2q13) c-mer proto-oncogene tyrosine kinase [Source:HGNC Symbol;Acc:7027], type=nonsense mediated decay,protein coding,retained intron, GO=[natural killer cell differentiation; vagina development; apoptotic cell clearance; substrate adhesion-dependent cell spreading; photoreceptor outer segment; protein kinase B signaling cascade; transmembrane receptor protein tyrosine kinase activity; negative regulation of lymphocyte activation; protein tyrosine kinase activity; leukocyte migration; reproductive structure development; visual perception; platelet activation; soluble fraction; developmental process involved in reproduction; gamete generation; protein serine/threonine kinase activity; sexual reproduction; multicellular organismal reproductive process; multicellular organism reproduction]
u MLPH 2:238394071-238463961 (2q37.3) melanophilin [Source:HGNC Symbol;Acc:29643], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[myosin V binding; microtubule plus-end binding; melanosome localization; myosin binding; melanocyte differentiation; Rab GTPase binding; developmental pigmentation; vesicle localization; melanosome; actin binding]; studies: tscapeBCd, tscapeMelanomad, tscapeNSCLCd, tscapeOvariand, tscapeRCCd
u MPHOSPH9 12:123636867-123706445 (12q24.31) M-phase phosphoprotein 9 [Source:HGNC Symbol;Acc:7215], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[Golgi membrane]

u | MRPS18A | 6:43639040-43655545 (6p21.1) | mitochondrial ribosomal protein S18A [Source:HGNC Symbol;Acc:14515], type=protein coding, GO=[mitochondrial small ribosomal subunit; organellar small ribosomal subunit; structural constituent of ribosome] | tscapeNSCLCa
u* | MTOR | 1:11166592-11322608 (1p36.22) | mechanistic target of rapamycin (serine/threonine kinase) [Source:HGNC Symbol;Acc:3942], type=processed transcript,protein coding, GO=[mTOR-FKBP12-rapamycin complex; regulation of carbohydrate utilization; carbohydrate utilization; TORC2 complex; TORC1 complex; negative regulation of NFAT protein import into nucleus; negative regulation of macroautophagy; positive regulation of lamellipodium assembly; regulation of Rac GTPase activity; ruffle organization; negative regulation of response to nutrient levels; negative regulation of response to extracellular stimulus; phosphatidylinositol 3-kinase complex; regulation of fatty acid beta-oxidation; regulation of response to food; positive regulation of stress fiber assembly; regulation of glycogen biosynthetic process; lamellipodium assembly; TOR signaling cascade; positive regulation of actin filament polymerization; positive regulation of protein kinase B signaling cascade; peptidyl-threonine phosphorylation; phosphoprotein binding; positive regulation of endothelial cell proliferation; positive regulation of translation; response to amino acid stimulus; fatty acid beta-oxidation; protein kinase B signaling cascade; T cell costimulation; lymphocyte costimulation; regulation of actin filament polymerization; response to acid; positive regulation of peptidyl-tyrosine phosphorylation; peptidyl-serine phosphorylation; phosphatidylinositol-mediated signaling; mitochondrial outer membrane; protein autophosphorylation; cellular response to nutrient levels; organelle outer membrane; negative regulation of cell size; cellular response to extracellular stimulus; insulin receptor signaling pathway; energy reserve metabolic process; response to nutrient; nerve growth factor receptor signaling pathway; lysosome; lytic vacuole; response to insulin stimulus; cellular response to peptide hormone stimulus; fatty acid metabolic process; response to nutrient levels; soluble fraction; response to extracellular stimulus; response to peptide hormone stimulus; monocarboxylic acid metabolic process; developmental process involved in reproduction; Golgi membrane; generation of precursor metabolites and energy; gamete generation; regulation of protein kinase activity; purine ribonucleoside triphosphate metabolic process; regulation of kinase activity; protein serine/threonine kinase activity; sexual reproduction; multicellular organismal reproductive process; multicellular organism reproduction] | cosmicRecurrent
d | NAB1 | 2:191511472-191557492 (2q32.2) | NGFI-A binding protein 1 (EGR1 binding protein 1) [Source:HGNC Symbol;Acc:7626], type=processed transcript,protein coding, GO=[Schwann cell differentiation; endochondral ossification; regulation of epidermis development; endochondral bone morphogenesis; bone morphogenesis; myelination; transcription repressor activity; skeletal system development; negative regulation of transcription] |
u | NBL1 | 1:19967048-19984945 (1p36.13) | neuroblastoma, suppression of tumorigenicity 1 [Source:HGNC Symbol;Acc:7650], type=protein coding, GO=[positive regulation of neuron differentiation] | tcgaGliomaGE, tscapeBCd, tscapeCRCd, tscapeNSCLCd, tscapeOvariand, tscapeRCCd
d | NETO1 | 18:70409671-70535184 (18q22.3) | neuropilin (NRP) and tolloid (TLL)-like 1 [Source:HGNC Symbol;Acc:13823], type=protein coding, GO=[excitatory synapse; regulation of long-term neuronal synaptic plasticity; visual learning; memory; postsynaptic density; postsynaptic membrane] | tcgaBreastGE, tcgaGliomaGE, tscapeCRCd, tscapeProstated
d | NUCB2 | 11:17229700-17371521 (11p15.1) | nucleobindin 2 [Source:HGNC Symbol;Acc:8044], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[Golgi medial cisterna; nuclear outer membrane; ER-Golgi intermediate compartment; organelle outer membrane; calcium ion binding; extracellular space] |
u | PAQR4 | 16:3019246-3023490 (16p13.3) | progestin and adipoQ receptor family member IV [Source:HGNC Symbol;Acc:26386], type=protein coding | tcgaBreastGE, tscapeHCCd, tscapeOvariand
u | PDIA5 | 3:122785909-122944074 (3q21.1) | protein disulfide isomerase family A, member 5 [Source:HGNC Symbol;Acc:24811], type=nonsense mediated decay,processed transcript,protein coding, GO=[protein disulfide isomerase activity; protein disulfide oxidoreductase activity; cell redox homeostasis; endoplasmic reticulum lumen; glycerol ether metabolic process; electron carrier activity] |
u | PECI | 6:4115923-4135831 (6p25.2) | peroxisomal D3,D2-enoyl-CoA isomerase [Source:HGNC Symbol;Acc:14601], type=nonsense mediated decay,protein coding,retained intron, GO=[dodecenoyl-CoA delta-isomerase activity; fatty-acyl-CoA binding; peroxisomal matrix; peroxisomal part; microbody part; peroxisome; fatty acid metabolic process; monocarboxylic acid metabolic process] | tcgaGliomaGE, tscapeOvariana, tscapeSCLCd
u | PFKFB2 | 1:207222801-207254369 (1q32.2) | 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 2 [Source:HGNC Symbol;Acc:8873], type=processed transcript,protein coding, GO=[6-phosphofructo-2-kinase activity; positive regulation of glucokinase activity; regulation of glucokinase activity; lactate metabolic process; fructose 2,6-bisphosphate metabolic process; fructose-2,6-bisphosphate 2-phosphatase activity; pyruvate metabolic process; positive regulation of insulin secretion; glycolysis; response to glucose stimulus; kinase binding; monocarboxylic acid metabolic process; generation of precursor metabolites and energy; regulation of kinase activity] | tscapeProstatea
u | PHGR1 | 15:40643234-40648632 (15q15.1) | proline/histidine/glycine-rich 1 [Source:HGNC Symbol;Acc:37226], type=protein coding |
d | PITX1 | 5:134363424-134370503 (5q31.1) | paired-like homeodomain 1 [Source:HGNC Symbol;Acc:9004], type=processed transcript,protein coding,retained intron, GO=[branchiomeric skeletal muscle development; myoblast cell fate commitment; embryonic hindlimb morphogenesis; myoblast differentiation; diencephalon development; muscle tissue development; skeletal system development; protein binding transcription factor activity; brain development; nucleolus; positive regulation of transcription from RNA polymerase II promoter; central nervous system development] | tcgaBreastGE
d | PNMA3 | X:152224766-152228827 (Xq28) | paraneoplastic antigen MA3 [Source:HGNC Symbol;Acc:18742], type=nonsense mediated decay,protein coding, GO=[nucleolus] | tcgaBreastGESurv, tcgaGliomaGE
u | PPAPDC1B | 8:38120648-38126761 (8p11.23) | phosphatidic acid phosphatase type 2 domain containing 1B [Source:HGNC Symbol;Acc:25026], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[phosphatidate phosphatase activity] | tcgaBreastGE, tcgaGliomaGE, tscapeCRCa, tscapeHCCd, tscapeNSCLCd
u | PPFIBP2 | 11:7534529-7678358 (11p15.4) | PTPRF interacting protein, binding protein 2 (liprin beta 2) [Source:HGNC Symbol;Acc:9250], type=processed transcript,protein coding,retained intron, GO=[integrase activity; DNA integration] |
u | PRKCH | 14:61788435-62017698 (14q23.1) | protein kinase C, eta [Source:HGNC Symbol;Acc:9403], type=protein coding, GO=[protein kinase C activity; platelet activation; protein serine/threonine kinase activity] |
u | PRR15L | 17:46029336-46035110 (17q21.32) | proline rich 15-like [Source:HGNC Symbol;Acc:28149], type=protein coding |
d | PRSS1 | 7:142457319-142460923 (7q34) | protease, serine, 1 (trypsin 1) [Source:HGNC Symbol;Acc:9475], type=protein coding,retained intron, GO=[serine-type endopeptidase activity; extracellular space] | tcgaBreastGE, tscapeMelanomaa, tscapeOvariand
u | RAB3B | 1:52373628-52456436 (1p32.3) | RAB3B, member RAS oncogene family [Source:HGNC Symbol;Acc:9778], type=protein coding, GO=[peptidyl-cysteine methylation; regulation of exocytosis; synaptic vesicle; GTPase activity; GTP binding; purine ribonucleoside triphosphate metabolic process] | tcgaGliomaGE
u | RAB3IP | 12:70132642-70211157 (12q15) | RAB3A interacting protein (rabin3) [Source:HGNC Symbol;Acc:16508], type=nonsense mediated decay,protein coding,retained intron, GO=[Golgi to plasma membrane transport; microtubule basal body; cilium assembly; guanyl-nucleotide exchange factor activity; centrosome] | tscapeRCCa
d | REEP1 | 2:86441116-86565206 (2p11.2) | receptor accessory protein 1 [Source:HGNC Symbol;Acc:25786], type=processed transcript,protein coding,retained intron, GO=[olfactory receptor binding; protein insertion into membrane] | tcgaGliomaGE
u | REPS2 | X:16964814-17171395 (Xp22.13, Xp22.2) | RALBP1 associated Eps domain containing 2 [Source:HGNC Symbol;Acc:9963], type=processed transcript,protein coding,retained intron, GO=[calcium ion binding] | snp3dProstateC, tcgaBreastGE, tcgaGliomaGE, tscapeHCCa
u | RGS2 | 1:192778169-192781403 (1q31.2) | regulator of G-protein signaling 2, 24kDa [Source:HGNC Symbol;Acc:9998], type=processed transcript,protein coding, GO=[positive regulation of cardiac muscle contraction; relaxation of cardiac muscle; negative regulation of phospholipase activity; negative regulation of cardiac muscle hypertrophy; negative regulation of muscle contraction; negative regulation of G-protein coupled receptor protein signaling pathway; brown fat cell differentiation; negative regulation of MAP kinase activity; fat cell differentiation; negative regulation of protein kinase activity; calmodulin binding; regulation of MAP kinase activity; regulation of protein serine/threonine kinase activity; gamete generation; regulation of protein kinase activity; regulation of kinase activity; sexual reproduction; multicellular organismal reproductive process; multicellular organism reproduction] | fileBC2brain, tcgaBreastGE
u | RIPK4 | 21:43159529-43187266 (21q22.3) | receptor-interacting serine-threonine kinase 4 [Source:HGNC Symbol;Acc:496], type=protein coding, GO=[protein serine/threonine kinase activity] |
d | RNFT2 | 12:117176096-117291436 (12q24.22) | ring finger protein, transmembrane 2 [Source:HGNC Symbol;Acc:25905], type=protein coding, GO=[iron-sulfur cluster binding; electron carrier activity] | tcgaBreastGE, tcgaGliomaGE
u | RP11-312J18.5 | 1:160864770-160865866 (1q23.3) | [undefined], type=processed pseudogene,pseudogene |
u | RP11-546D6.2 | 12:123641057-123728561 (12q24.31) | [undefined], type=nonsense mediated decay,protein coding,retained intron |
d | RP11-611D20.2 | 9:139440664-139444345 (9q34.3) | [undefined], type=processed transcript |
u | SASH1 | 6:148593440-148873186 (6q24.3) | SAM and SH3 domain containing 1 [Source:HGNC Symbol;Acc:19182], type=processed transcript,protein coding | tscapeOvariand

d | SDC4 | 20:43953928-43977064 (20q13.12) | syndecan 4 [Source:HGNC Symbol;Acc:10661], type=protein coding, GO=[thrombospondin receptor activity; positive regulation of focal adhesion assembly; costamere; positive regulation of stress fiber assembly; fibronectin binding; protein kinase C binding; ureteric bud development; focal adhesion; kinase binding; cell surface; regulation of protein kinase activity; regulation of kinase activity] | tscapeNSCLCd
d | SELENBP1 | 1:151336778-151345209 (1q21.3) | selenium binding protein 1 [Source:HGNC Symbol;Acc:10719], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[selenium binding; nucleolus] | tcgaGliomaGE, tscapeHCCa, tscapeMelanomaa, tscapeOvariana, tscapeSCLCa
u | SEMA4A | 1:156117157-156147543 (1q22) | sema domain, immunoglobulin domain (Ig), transmembrane domain (TM) and short cytoplasmic domain, (semaphorin) 4A [Source:HGNC Symbol;Acc:10729], type=processed transcript,protein coding, GO=[visual perception] | tcgaGliomaGE, tscapeHCCa
u | SEPP1 | 5:42799982-42887494 (5p12) | selenoprotein P, plasma, 1 [Source:HGNC Symbol;Acc:10751], type=processed transcript,protein coding,retained intron, GO=[selenium compound metabolic process; response to selenium ion; selenium binding; post-embryonic development; locomotory behavior; brain development; sexual reproduction; central nervous system development; extracellular space] |
u | SETBP1 | 18:42260138-42648475 (18q12.3) | SET binding protein 1 [Source:HGNC Symbol;Acc:15573], type=protein coding | tcgaBreastGE
u | SLC2A12 | 6:134309835-134373774 (6q23.2) | solute carrier family 2 (facilitated glucose transporter), member 12 [Source:HGNC Symbol;Acc:18067], type=protein coding, GO=[D-glucose transmembrane transporter activity; carbohydrate transport; perinuclear region of cytoplasm] | tcgaGliomaGE, tscapeBCa, tscapeCRCa
d | SLC30A3 | 2:27476552-27498685 (2p23.3) | solute carrier family 30 (zinc transporter), member 3 [Source:HGNC Symbol;Acc:11014], type=protein coding,retained intron, GO=[zinc transporting ATPase activity; regulation of sequestering of zinc ion; zinc ion transport; synaptic vesicle membrane; synaptic vesicle; late endosome; divalent metal ion transport] | tcgaGliomaGE
u | SLC41A1 | 1:205758221-205782876 (1q32.1) | solute carrier family 41, member 1 [Source:HGNC Symbol;Acc:19429], type=processed transcript,protein coding, GO=[magnesium ion transmembrane transporter activity; magnesium ion transport; divalent metal ion transport] | tscapeBCa
d | SLC44A1 | 9:108006903-108201452 (9q31.1, 9q31.2) | solute carrier family 44, member 1 [Source:HGNC Symbol;Acc:18798], type=nonsense mediated decay,protein coding, GO=[choline transmembrane transporter activity; choline transport; mitochondrial outer membrane; organelle outer membrane] |
u* | SMPD2 | 6:109761966-109765122 (6q21) | sphingomyelin phosphodiesterase 2, neutral membrane (neutral sphingomyelinase) [Source:HGNC Symbol;Acc:11121], type=protein coding,retained intron, GO=[sphingomyelin phosphodiesterase activity; sphingomyelin metabolic process; ceramide biosynthetic process; caveola; response to mechanical stimulus; induction of apoptosis by extracellular signals; nerve growth factor receptor signaling pathway] | tcgaBreastGE, tscapeCRCd, tscapeNSCLCd, tscapeOvariand, tscapeSCLCd
u* | SMS | X:21958691-22025798 (Xp22.11) | spermine synthase [Source:HGNC Symbol;Acc:11123], type=processed transcript,protein coding, GO=[spermine synthase activity; spermidine synthase activity; spermine biosynthetic process; methionine metabolic process] |
u | SNORD88C | 19:51305585-51305675 (19q13.33) | small nucleolar RNA, C/D box 88C [Source:HGNC Symbol;Acc:32749], type=snoRNA |
d | SOX9 | 17:70117161-70122561 (17q24.3) | SRY (sex determining region Y)-box 9 [Source:HGNC Symbol;Acc:11204], type=protein coding, GO=[epithelial cell proliferation involved in prostatic bud elongation; male germ-line sex determination; primary sex determination, germ-line; intestinal epithelial structure maintenance; Sertoli cell development; negative regulation of myoblast differentiation; negative regulation of bone mineralization; negative regulation of chondrocyte differentiation; epithelial cell proliferation involved in prostate gland development; branch elongation of an epithelium; regulation of myoblast differentiation; positive regulation of epithelial cell differentiation; negative regulation of ossification; cartilage condensation; cellular response to retinoic acid; endochondral bone morphogenesis; regulation of skeletal muscle tissue development; myoblast differentiation; regulation of cartilage development; cellular response to vitamin; regulation of bone mineralization; neural crest cell development; oligodendrocyte differentiation; bone morphogenesis; epithelial to mesenchymal transition; cell fate specification; negative regulation of epithelial cell proliferation; hair follicle development; specific RNA polymerase II transcription factor activity; cellular response to nutrient levels; cellular response to extracellular stimulus; gonad development; urogenital system development; promoter binding; muscle tissue development; response to nutrient; reproductive structure development; transcription activator activity; response to nutrient levels; skeletal system development; response to extracellular stimulus; developmental process involved in reproduction; gamete generation; positive regulation of transcription from RNA polymerase II promoter; sexual reproduction; negative regulation of transcription, DNA-dependent; multicellular organismal reproductive process; multicellular organism reproduction; central nervous system development; negative regulation of transcription] | tcgaBreastGESurv, tcgaGliomaGE, tscapeCRCd, tscapeNSCLCa
u | SPATA2 | 20:48519928-48532080 (20q13.13) | spermatogenesis associated 2 [Source:HGNC Symbol;Acc:14681], type=protein coding, GO=[gamete generation; sexual reproduction; multicellular organismal reproductive process; multicellular organism reproduction] | tcgaGliomaGE, tscapeHCCa
u* | ST6GALNAC1 | 17:74620847-74639894 (17q25.1) | ST6 (alpha-N-acetyl-neuraminyl-2,3-beta-galactosyl-1,3)-N-acetylgalactosaminide alpha-2,6-sialyltransferase 1 [Source:HGNC Symbol;Acc:23614], type=protein coding, GO=[alpha-N-acetylgalactosaminide alpha-2,6-sialyltransferase activity; integral to Golgi membrane; protein glycosylation; Golgi membrane] | tcgaBreastGE
d | ST7 | 7:116593292-116870157 (7q31.2) | suppression of tumorigenicity 7 [Source:HGNC Symbol;Acc:11351], type=nonsense mediated decay,processed transcript,protein coding,retained intron | tscapeMelanomaa, tscapeMelanomad, tscapeNSCLCa
u | STRA13 | 17:79976579-79980794 (17q25.3) | stimulated by retinoic acid 13 homolog (mouse) [Source:HGNC Symbol;Acc:11422], type=protein coding, GO=[Fanconi anaemia nuclear complex; chromosome, centromeric region] | tcgaBreastGE, tscapeMelanomaa, tscapeNSCLCa, tscapeOvariana
u | TACC2 | 10:123748689-124014060 (10q26.13) | transforming, acidic coiled-coil containing protein 2 [Source:HGNC Symbol;Acc:11523], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[nuclear binding] | tcgaGliomaGE, tscapeBCd, tscapeGliomad, tscapeNSCLCd
u | TACSTD2 | 1:59041099-59043166 (1p32.1) | tumor-associated calcium signal transducer 2 [Source:HGNC Symbol;Acc:11530], type=protein coding, GO=[visual perception] |
u | TBC1D4 | 13:75858808-76056250 (13q22.2) | TBC1 domain family, member 4 [Source:HGNC Symbol;Acc:19165], type=processed transcript,protein coding, GO=[Rab GTPase activator activity; regulation of Rab GTPase activity; purine ribonucleoside triphosphate metabolic process] | tcgaBreastGESurv, tcgaGliomaGE, tscapeBCd, tscapeProstated
u | TEX2 | 17:62224796-62340653 (17q23.3) | testis expressed 2 [Source:HGNC Symbol;Acc:30884], type=protein coding | tcgaGliomaGE
d | TGM3 | 20:2276647-2321724 (20p13) | transglutaminase 3 (E polypeptide, protein-glutamine-gamma-glutamyltransferase) [Source:HGNC Symbol;Acc:11779], type=processed transcript,protein coding, GO=[cell envelope organization; protein-glutamine gamma-glutamyltransferase activity; extrinsic to internal side of plasma membrane; hair follicle morphogenesis; GDP binding; peptide cross-linking; keratinization; protein tetramerization; hair follicle development; keratinocyte differentiation; magnesium ion binding; GTPase activity; GTP binding; purine ribonucleoside triphosphate metabolic process; calcium ion binding] |
d | TMEM14A | 6:52535907-52551386 (6p12.2) | transmembrane protein 14A [Source:HGNC Symbol;Acc:21076], type=protein coding | tcgaBreastGE, tscapeNSCLCa
u | TMEM79 | 1:156252726-156262976 (1q22) | transmembrane protein 79 [Source:HGNC Symbol;Acc:28196], type=processed transcript,protein coding | tcgaBreastGE, tscapeHCCa
u | TNS3 | 7:47314752-47622156 (7p12.3) | tensin 3 [Source:HGNC Symbol;Acc:21616], type=processed transcript,protein coding,retained intron, GO=[lung alveolus development; focal adhesion; dephosphorylation] | tcgaGliomaGE
d | TOX2 | 20:42543492-42698256 (20q13.12) | TOX high mobility group box family member 2 [Source:HGNC Symbol;Acc:16095], type=protein coding, GO=[transcription activator activity; positive regulation of transcription from RNA polymerase II promoter] |
u | TRIB3 | 20:361261-378203 (20p13) | tribbles homolog 3 (Drosophila) [Source:HGNC Symbol;Acc:16228], type=processed transcript,protein coding, GO=[ubiquitin-protein ligase regulator activity; ligase regulator activity; negative regulation of fatty acid biosynthetic process; positive regulation of protein binding; negative regulation of fat cell differentiation; protein kinase inhibitor activity; regulation of glucose transport; ubiquitin protein ligase binding; positive regulation of ubiquitin-protein ligase activity; phosphatidylinositol-mediated signaling; fat cell differentiation; fatty acid biosynthetic process; negative regulation of protein kinase activity; carbohydrate transport; transcription corepressor activity; positive regulation of binding; insulin receptor signaling pathway; regulation of MAP kinase activity; nerve growth factor receptor signaling pathway; kinase binding; response to insulin stimulus; cellular response to peptide hormone stimulus; fatty acid metabolic process; regulation of protein serine/threonine kinase activity; response to peptide hormone stimulus; protein binding transcription factor activity; monocarboxylic acid metabolic process; regulation of protein kinase activity; regulation of kinase activity; protein serine/threonine kinase activity; negative regulation of transcription] | cosmicRecurrent, tcgaBreastGE, tcgaGliomaGE, tscapeBCd, tscapeNSCLCd

u | TRPM4 | 19:49661052-49715091 (19q13.33) | transient receptor potential cation channel, subfamily M, member 4 [Source:HGNC Symbol;Acc:17993], type=protein coding, GO=[dendritic cell chemotaxis; regulation of T cell cytokine production; calcium activated cation channel activity; protein sumoylation; cytokine production involved in immune response; positive regulation of canonical Wnt receptor signaling pathway; calcium ion transmembrane transport; regulation of production of molecular mediator of immune response; calcium channel activity; regulation of canonical Wnt receptor signaling pathway; canonical Wnt receptor signaling pathway; calmodulin binding; leukocyte migration; divalent metal ion transport] | tcgaBreastGE, tscapeNSCLCd, tscapeOvariand
u | TRPM8 | 2:234826043-234928166 (2q37.1) | transient receptor potential cation channel, subfamily M, member 8 [Source:HGNC Symbol;Acc:17961], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[thermoception; protein homotrimerization; response to cold; protein homotetramerization; calcium ion transmembrane transport; protein tetramerization; calcium channel activity; external side of plasma membrane; divalent metal ion transport; cell surface; protein homodimerization activity] | tscapeBCd, tscapeMelanomad, tscapeNSCLCd, tscapeOvariand, tscapeRCCd
d | TSHZ3 | 19:31765851-31840431 (19q12) | teashirt zinc finger homeobox 3 [Source:HGNC Symbol;Acc:30700], type=protein coding, GO=[kidney smooth muscle cell differentiation; ureteric peristalsis; ureter smooth muscle contraction; sensory perception of touch; ureter smooth muscle cell differentiation; ureter smooth muscle development; positive regulation of smooth muscle cell differentiation; regulation of respiratory gaseous exchange by neurological system process; respiratory system process; kidney morphogenesis; metanephros development; growth cone; ureteric bud development; renal system development; urogenital system development; muscle tissue development; transcription repressor activity; negative regulation of transcription] | tcgaBreastGE
d | TSPAN3 | 15:77336359-77363570 (15q24.3) | tetraspanin 3 [Source:HGNC Symbol;Acc:17752], type=protein coding | tcgaOvarianGE
d | TULP4 | 6:158733692-158932860 (6q25.3) | tubby like protein 4 [Source:HGNC Symbol;Acc:15530], type=protein coding, GO=[response to nutrient; response to nutrient levels; response to extracellular stimulus] | tcgaGliomaGE, tscapeBCd, tscapeOvariand, tscapeProstated, tscapeSCLCd
u* | UBE2G1 | 17:4172512-4269969 (17p13.2) | ubiquitin-conjugating enzyme E2G 1 (UBC7 homolog, yeast) [Source:HGNC Symbol;Acc:12482], type=protein coding, GO=[protein K63-linked ubiquitination; protein K48-linked ubiquitination; ubiquitin protein ligase binding; post-translational protein modification; ubiquitin-protein ligase activity; acid-amino acid ligase activity] |
d* | UGT2B17 | 4:69402902-69434245 (4q13.2) | UDP glucuronosyltransferase 2 family, polypeptide B17 [Source:HGNC Symbol;Acc:12547], type=protein coding, GO=[glucuronosyltransferase activity; microsome] |
u* | UGT2B28 | 4:70146217-70160768 (4q13.2) | UDP glucuronosyltransferase 2 family, polypeptide B28 [Source:HGNC Symbol;Acc:13479], type=protein coding, GO=[glucuronosyltransferase activity; xenobiotic metabolic process; microsome] |
u | VIPR1 | 3:42530791-42579059 (3p22.1) | vasoactive intestinal peptide receptor 1 [Source:HGNC Symbol;Acc:12694], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[vasoactive intestinal polypeptide receptor activity; G-protein signaling, coupled to cyclic nucleotide second messenger] | tcgaGliomaGE
u | VPS26B | 11:134094539-134117686 (11q25) | vacuolar protein sorting 26 homolog B (S. pombe) [Source:HGNC Symbol;Acc:28119], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[retromer complex; vacuolar transport] | tcgaGliomaGE, tscapeBCd, tscapeNSCLCd
u | VWF | 12:6058040-6233936 (12p13.31) | von Willebrand factor [Source:HGNC Symbol;Acc:12726], type=processed transcript,protein coding,retained intron, GO=[Weibel-Palade body; immunoglobulin binding; blood coagulation, intrinsic pathway; protease binding; chaperone binding; collagen binding; platelet alpha granule lumen; glycoprotein binding; integrin binding; liver development; protein N-terminus binding; platelet degranulation; serine-type endopeptidase inhibitor activity; external side of plasma membrane; growth factor activity; platelet activation; cell surface; protein homodimerization activity] | tcgaGliomaGE
u | WDYHV1 | 8:124428965-124479470 (8q24.13) | WDYHV motif containing 1 [Source:HGNC Symbol;Acc:25490], type=nonsense mediated decay,processed transcript,protein coding, GO=[protein N-terminal glutamine amidohydrolase activity] | tcgaBreastGE

d* | WNT5A | 3:55499743-55523973 (3p14.3) | wingless-type MMTV integration site family, member 5A [Source:HGNC Symbol;Acc:12784], type=processed transcript,protein coding,retained intron, GO=[receptor tyrosine kinase-like orphan receptor binding; metestrus; cervix development; positive regulation of cytokine secretion involved in immune response; positive regulation of cell-cell adhesion mediated by cadherin; regulation of cell-cell adhesion mediated by cadherin; planar cell polarity pathway involved in pericardium morphogenesis; planar cell polarity pathway involved in cardiac muscle tissue morphogenesis; planar cell polarity pathway involved in cardiac right atrium morphogenesis; planar cell polarity pathway involved in ventricular septum morphogenesis; planar cell polarity pathway involved in outflow tract morphogenesis; planar cell polarity pathway involved in heart morphogenesis; non-canonical Wnt receptor signaling pathway involved in heart development; lateral sprouting involved in mammary gland duct morphogenesis; positive regulation of type I interferon-mediated signaling pathway; hypophysis morphogenesis; cell-cell adhesion mediated by cadherin; optic cup formation involved in camera-type eye development; pericardium morphogenesis; cardiac right atrium morphogenesis; negative regulation of mesenchymal cell proliferation; epithelial cell proliferation involved in mammary gland duct elongation; negative regulation of prostatic bud formation; positive regulation of protein kinase C signaling cascade; regulation of protein kinase C signaling cascade; development; negative regulation of synaptogenesis; positive regulation of T cell chemotaxis; regulation of T cell chemotaxis; T cell chemotaxis; type B pancreatic cell development; positive regulation of thymocyte apoptosis; positive regulation of T cell apoptosis; positive regulation of macrophage cytokine production; regulation of branching involved in mammary gland duct morphogenesis; mammary gland branching involved in thelarche; positive regulation of meiosis; thelarche; primitive streak formation; dopaminergic neuron differentiation; positive regulation of cartilage development; convergent extension involved in organogenesis; frizzled-2 binding; hindgut morphogenesis; macrophage cytokine production; development; negative regulation of fibroblast growth factor receptor signaling pathway; cellular response to transforming growth factor beta stimulus; hemopoietic stem cell proliferation; mesenchymal-epithelial cell signaling; negative regulation of axon extension involved in axon guidance; positive regulation of chemokine biosynthetic process; positive regulation of peptidyl-threonine phosphorylation; activation of protein kinase B activity; vagina development; positive regulation of cGMP metabolic process; cellular response to calcium ion; anterior/posterior axis specification, embryo; tripartite regional subdivision; midgut development; cochlea morphogenesis; branch elongation of an epithelium; planar cell polarity pathway involved in neural tube closure; regulation of establishment of planar polarity involved in neural tube closure; ventricular septum morphogenesis; negative chemotaxis; tail morphogenesis; response to hyperoxia; dorsal/ventral axis specification; outflow tract morphogenesis; frizzled binding; positive regulation of cell-cell adhesion; face development; negative regulation of fat cell differentiation; Wnt receptor signaling pathway, calcium modulating pathway; positive regulation of interferon-gamma production; response to testosterone stimulus; positive regulation of peptidyl-serine phosphorylation; positive regulation of endothelial cell migration; positive regulation of mesenchymal cell proliferation; cytokine production involved in immune response; negative regulation of reproductive process; cellular response to retinoic acid; positive regulation of JNK cascade; negative regulation of BMP signaling pathway; regulation of synapse structure and activity; positive regulation of ossification; activation of JUN kinase activity; peptidyl-threonine phosphorylation; genitalia development; lung alveolus development; positive regulation of fibroblast proliferation; regulation of cartilage development; heart looping; positive regulation of endothelial cell proliferation; embryonic digit morphogenesis; cellular response to vitamin; lens development in camera-type eye; regulation of interferon-gamma production; cellular response to growth factor stimulus; cellular response to lipopolysaccharide; somitogenesis; embryonic pattern specification; regulation of production of molecular mediator of immune response; epithelial to mesenchymal transition; palate development; negative regulation of canonical Wnt receptor signaling pathway; positive regulation of angiogenesis; positive regulation of protein catabolic process; diencephalon development; negative regulation of epithelial cell proliferation; cellular response to interferon-gamma; determination of left/right symmetry; positive regulation of NF-kappaB transcription factor activity; peptidyl-serine phosphorylation; BMP signaling pathway; regulation of canonical Wnt receptor signaling pathway; keratinocyte differentiation; specific transcriptional repressor activity; embryonic skeletal system development; response to estradiol stimulus; female sex differentiation; fat cell differentiation; cellular response to nutrient levels; positive regulation of MAPKKK cascade; negative regulation of cell growth; positive regulation of transcription factor activity; positive regulation of transcription regulator activity; regulation of transmembrane receptor protein serine/threonine kinase signaling pathway; negative regulation of cell size; response to glucocorticoid stimulus; canonical Wnt receptor signaling pathway; cellular response to extracellular stimulus; renal system development; response to estrogen stimulus; positive regulation of binding; negative regulation of gene-specific transcription from RNA polymerase II promoter; gonad development; urogenital system development; promoter binding; leukocyte migration; cytokine activity; transmembrane receptor protein serine/threonine kinase signaling pathway; regulation of MAP kinase activity; response to organic cyclic compound; muscle tissue development; response to nutrient; reproductive structure development; regulation of protein serine/threonine kinase activity; transcription repressor activity; transcription activator activity; response to nutrient levels; skeletal system development; response to extracellular stimulus; regulation of gene-specific transcription from RNA polymerase II promoter; cell surface; developmental process involved in reproduction; negative regulation of transcription from RNA polymerase II promoter; brain development; regulation of protein kinase activity; regulation of kinase activity; negative regulation of transcription, DNA-dependent; multicellular organismal reproductive process; multicellular organism reproduction; central nervous system development; extracellular space; negative regulation of transcription] | tcgaBreastGESurv, tcgaGliomaGE
u | WWC1 | 5:167718656-167899308 (5q34) | WW and C2 domain containing 1 [Source:HGNC Symbol;Acc:29435], type=TEC,processed transcript,protein coding,retained intron, GO=[regulation of hippo signaling cascade; ruffle membrane; positive regulation of MAPKKK cascade; transcription coactivator activity; perinuclear region of cytoplasm; protein binding transcription factor activity] | tcgaGliomaGE, tscapeNSCLCa, tscapeRCCa
u | YIPF1 | 1:54317392-54356407 (1p32.3) | Yip1 domain family, member 1 [Source:HGNC Symbol;Acc:25231], type=nonsense mediated decay,processed transcript,protein coding, GO=[transport vesicle] | tcgaBreastGE
u | ZBTB16 | 11:113930315-114121398 (11q23.2) | zinc finger and BTB domain containing 16 [Source:HGNC Symbol;Acc:12930], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[male germ-line stem cell division; germ-line stem cell division; leg morphogenesis; mesonephros development; embryonic hindlimb morphogenesis; forelimb morphogenesis; embryonic digit morphogenesis; transcriptional repressor complex; negative regulation of myeloid cell differentiation; embryonic pattern specification; PML body; specific transcriptional repressor activity; nuclear speck; double-stranded DNA binding; protein C-terminus binding; renal system development; urogenital system development; transcription repressor activity; skeletal system development; developmental process involved in reproduction; protein homodimerization activity; gamete generation; sexual reproduction; negative regulation of transcription, DNA-dependent; multicellular organismal reproductive process; multicellular organism reproduction; central nervous system development; negative regulation of transcription] | tcgaBreastGE, tscapeBCd, tscapeMelanomad, tscapeProstated
u | ZBTB24 | 6:109783797-109804440 (6q21) | zinc finger and BTB domain containing 24 [Source:HGNC Symbol;Acc:21143], type=protein coding | tscapeCRCd, tscapeSCLCd
d* | ZIC2 | 13:100634026-100639018 (13q32.3) | Zic family member 2 (odd-paired homolog, Drosophila) [Source:HGNC Symbol;Acc:12873], type=processed transcript,protein coding, GO=[chromatin DNA binding; retinal ganglion cell axon guidance; developmental pigmentation; positive regulation of transcription factor activity; positive regulation of transcription regulator activity; positive regulation of binding; visual perception; transcription repressor activity; transcription activator activity; brain development; central nervous system development; negative regulation of transcription] | tscapeSCLCa
d | ZMYND15 | 17:4643319-4649411 (17p13.2) | zinc finger, MYND-type containing 15 [Source:HGNC Symbol;Acc:20997], type=protein coding |
d | ZNF462 | 9:109625378-109775915 (9q31.2) | zinc finger protein 462 [Source:HGNC Symbol;Acc:21684], type=processed transcript,protein coding, GO=[negative regulation of DNA binding; positive regulation of transcription from RNA polymerase II promoter] | tcgaGliomaGE

9.2.1 GO enrichment of all candidates

Table 18: Enriched Gene Ontology terms [1] (FDR-corrected p ≤ 0.05). Ratio is the proportion of annotated genes in the whole gene set. The list is sorted by FDR-corrected p-value. Green and blue borders refer to up- and down-regulated genes, respectively.

Ratio | Type | Description | Genes
0.106 | BP | developmental process involved in reproduction | ADAMTS1, ARID5B, BMPR1B, CITED2, EAF2, FZD2, HSD17B4, IDH1, MERTK, MTOR, SOX9, WNT5A, ZBTB16
0.081 | BP | reproductive structure development | ADAMTS1, ARID5B, BMPR1B, EAF2, FZD2, HSD17B4, IDH1, MERTK, SOX9, WNT5A
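Tables 18 and 21 keep terms whose FDR-corrected p-value passes a cutoff and report Ratio as the share of the gene set carrying the annotation. The Python sketch below computes both quantities under assumptions the report does not state. The correction is assumed to be Benjamini-Hochberg, the raw p-values are taken as given because the underlying enrichment test is not named here, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def annotation_ratio(annotated: int, gene_set_size: int) -> float:
    """Share of the gene set that carries a GO annotation (the 'Ratio' column)."""
    if gene_set_size <= 0:
        raise ValueError("gene set must be non-empty")
    return annotated / gene_set_size


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (assumed FDR method), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest raw p-value to the smallest so the adjusted
    # values stay monotone: no term gets a larger q than a worse-ranked one.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_terms(raw_p: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Terms with adjusted p <= alpha, sorted by adjusted p-value as in the tables."""
    names = list(raw_p)
    adjusted = benjamini_hochberg([raw_p[n] for n in names])
    kept = [(q, n) for q, n in zip(adjusted, names) if q <= alpha]
    return [n for _, n in sorted(kept)]
```

With the 0.05 cutoff of Table 18, `enriched_terms` keeps fewer terms than with a looser cutoff. Table 21 uses `alpha=0.01`.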

[Figure 11 graph. Nodes: biological_process, developmental process involved in reproduction, reproductive structure development.]

Figure 11: Relationships between the enriched biological process Gene Ontology terms listed in Table 18. The darkness of the red reflects the significance of the enrichment, and edge thickness is proportional to the number of genes sharing the following annotation.
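Figure 11 scales edge thickness by the number of genes that the connected terms share. The sketch below computes that count from the two gene lists in Table 18. It makes three illustrative choices that are not Anduril's documented behaviour: the two terms are assumed to be joined by a single edge in the figure, the pixel scaling in the hypothetical `line_width` is arbitrary, and the red shading by significance is not modelled.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def edge_weights(edges: Iterable[Edge], term_genes: Dict[str, Set[str]]) -> Dict[Edge, int]:
    """Weight each (parent, child) GO edge by the number of genes annotated to both terms."""
    return {(parent, child): len(term_genes[parent] & term_genes[child])
            for parent, child in edges}


def line_width(weight: int, max_weight: int, max_px: float = 6.0) -> float:
    """Hypothetical drawing width, proportional to the shared-gene count."""
    return max_px * weight / max_weight


# Gene lists copied from Table 18.
term_genes = {
    "developmental process involved in reproduction": {
        "ADAMTS1", "ARID5B", "BMPR1B", "CITED2", "EAF2", "FZD2", "HSD17B4",
        "IDH1", "MERTK", "MTOR", "SOX9", "WNT5A", "ZBTB16"},
    "reproductive structure development": {
        "ADAMTS1", "ARID5B", "BMPR1B", "EAF2", "FZD2", "HSD17B4", "IDH1",
        "MERTK", "SOX9", "WNT5A"},
}
# Assumed edge: the two terms are drawn as linked in Figure 11.
edges = [("developmental process involved in reproduction",
          "reproductive structure development")]
weights = edge_weights(edges, term_genes)
```

All ten genes of the narrower term are also annotated to the broader one, so this edge would be drawn at the maximum width.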

10 Candidate report for Unique DEGs for AR with siFOXA1

10.1 Moksiskaan candidate pathway

[Figure 12 graph. Nodes: candidate and intermediate genes. Edge legend: dephosphorylation, phosphorylation, gene expression, gene repression, pathway precedence, protein activation, protein binding, protein dissociation, protein inhibition, protein-protein interaction.]

Figure 12: Known relationships between the candidate genes. Candidate genes with only output connections are shown in red. The ratio of input to output connections determines how light a gene is drawn, and genes with only input connections are completely white. At most one intermediate gene is allowed between two candidate genes, and these intermediate genes are shown in gray. Green and blue borders refer to up- and down-regulated genes, respectively. Light gray emphasizes stably expressed genes. Known regulations are drawn with bold borders, whereas predictions are drawn thin.
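The caption allows at most one intermediate gene between two candidates and shades candidates by their balance of input and output connections. The sketch below expresses both rules over a directed interaction list. It is not Moksiskaan's implementation, and `connect_candidates`, `lightness` and the toy gene names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def connect_candidates(interactions: Iterable[Edge],
                       candidates: Set[str]) -> Tuple[Set[Edge], Set[str]]:
    """Keep candidate-to-candidate links that are direct or pass through one other gene."""
    out_edges: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in interactions:
        out_edges[src].add(dst)

    kept: Set[Edge] = set()
    intermediates: Set[str] = set()
    for a in candidates:
        for b in out_edges.get(a, set()):
            if b in candidates:
                kept.add((a, b))                    # direct link between candidates
                continue
            for c in out_edges.get(b, set()):
                if c in candidates and c != a:      # one-step detour through b
                    kept.update({(a, b), (b, c)})
                    intermediates.add(b)            # drawn in gray
    return kept, intermediates


def lightness(n_in: int, n_out: int) -> float:
    """0.0 = full red (only outputs), 1.0 = white (only inputs)."""
    total = n_in + n_out
    if total == 0:
        raise ValueError("gene has no connections")
    return n_in / total


kept, mids = connect_candidates(
    [("A", "X"), ("X", "B"), ("A", "C"), ("Y", "A"), ("B", "Z")],
    {"A", "B", "C"})
```

In this toy input, `A -> X -> B` is kept with `X` marked as an intermediate. `B -> Z` is dropped because `Z` leads to no candidate.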

Table 19: Descriptions of the intermediate genes between the candidate genes. Studies that have reported results about these genes are listed, and those with negative evidence are prefixed with a hyphen. This table has 26 rows.

name | description | studies
ABL1 | c-abl oncogene 1, non-receptor tyrosine kinase [Source:HGNC Symbol;Acc:76] locus=9:133589268-133763062 | cosmicRecurrent, tcgaGliomaGE
ADCY2 | adenylate cyclase 2 (brain) [Source:HGNC Symbol;Acc:233] locus=5:7396321-7830194 | tcgaGliomaGE
ADCY4 | adenylate cyclase 4 [Source:HGNC Symbol;Acc:235] locus=14:24787559-24804277 | tcgaBreastGE, tscapeHCCa
ADCY5 | adenylate cyclase 5 [Source:HGNC Symbol;Acc:236] locus=3:123001143-123168605 | tcgaGliomaGE
ADCY6 | adenylate cyclase 6 [Source:HGNC Symbol;Acc:237] locus=12:49159975-49177877 | tscapeRCCa
ADCY7 | adenylate cyclase 7 [Source:HGNC Symbol;Acc:238] locus=16:50300462-50352046 | tscapeBCd
ADCY9 | adenylate cyclase 9 [Source:HGNC Symbol;Acc:240] locus=16:4012657-4166186 | tcgaGliomaGE, tcgaOvarianGE, tscapeHCCd, tscapeOvariand
DOCK2 | dedicator of cytokinesis 2 [Source:HGNC Symbol;Acc:2988] locus=5:169064251-169510386 | tcgaGliomaGE, tscapeNSCLCa, tscapeRCCa
ERBB2 | v-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian) [Source:HGNC Symbol;Acc:3430] locus=17:37844393-37884915 | cosmicRecurrent, snp3dBC, snp3dLungC, snp3dMetastasis, snp3dProstateC, tcgaGliomaGE, tscapeBCa, tscapeCRCa, tscapeCRCd, tscapeNSCLCa, tscapeOvariand, tscapeSCLCd
FLT1 | fms-related tyrosine kinase 1 (vascular endothelial growth factor/vascular permeability factor receptor) [Source:HGNC Symbol;Acc:3763] locus=13:28874489-29069265 | tscapeBCd, tscapeSCLCd
FLT4 | fms-related tyrosine kinase 4 [Source:HGNC Symbol;Acc:3767] locus=5:180028506-180076624 | tscapeBCa, tscapeNSCLCa, tscapeOvariand, tscapeRCCa
GNAO1 | guanine nucleotide binding protein (G protein), alpha activating activity polypeptide O [Source:HGNC Symbol;Acc:4389] locus=16:56225302-56391354 | tcgaGliomaGE
IGF1R | insulin-like growth factor 1 receptor [Source:HGNC Symbol;Acc:5465] locus=15:99192200-99507759 | snp3dBC, tscapeCRCa, tscapeMelanomaa, tscapeNSCLCa
KDR | kinase insert domain receptor (a type III receptor tyrosine kinase) [Source:HGNC Symbol;Acc:6307] locus=4:55944644-55991756 | tscapeNSCLCa
MET | met proto-oncogene (hepatocyte growth factor receptor) [Source:HGNC Symbol;Acc:7029] locus=7:116312248-116438440 | snp3dLungC, snp3dMetastasis, snp3dProstateC, tcgaGliomaGE, tscapeMelanomaa, tscapeNSCLCa, tscapeOvariand
NCAM1 | neural cell adhesion molecule 1 [Source:HGNC Symbol;Acc:7656] locus=11:112831997-113149158 | tcgaGliomaGE, tscapeBCd, tscapeMelanomad, tscapeProstated, tscapeRCCd
PDGFRA | platelet-derived growth factor receptor, alpha polypeptide [Source:HGNC Symbol;Acc:8803] locus=4:55095264-55164414 | cosmicPrimary, tcgaBreastGE, tcgaGliomaGE, tscapeNSCLCa
PDGFRB | platelet-derived growth factor receptor, beta polypeptide [Source:HGNC Symbol;Acc:8804] locus=5:149493400-149535423 | tcgaGliomaGE
PLCB1 | phospholipase C, beta 1 (phosphoinositide-specific) [Source:HGNC Symbol;Acc:15917] locus=20:8112824-8949003 | tcgaBreastGE, tcgaGliomaGE
PLCB2 | phospholipase C, beta 2 [Source:HGNC Symbol;Acc:9055] locus=15:40580100-40600174 | tscapeCRCd, tscapeMelanomad, tscapeNSCLCd, tscapeOvariand
PLCB3 | phospholipase C, beta 3 (phosphatidylinositol-specific) [Source:HGNC Symbol;Acc:9056] locus=11:64018995-64036622 | tcgaGliomaGE
PLCB4 | phospholipase C, beta 4 [Source:HGNC Symbol;Acc:9059] locus=20:9049410-9461889 | tcgaGliomaGE
PLG | plasminogen [Source:HGNC Symbol;Acc:9071] locus=6:161123270-161174338 | cosmicPrimary
RAPGEF1 | Rap guanine nucleotide exchange factor (GEF) 1 [Source:HGNC Symbol;Acc:4568] locus=9:134452157-134615461 | tcgaOvarianGE
STAT4 | signal transducer and activator of transcription 4 [Source:HGNC Symbol;Acc:11365] locus=2:191894302-192016322 |
STAT6 | signal transducer and activator of transcription 6, interleukin-4 induced [Source:HGNC Symbol;Acc:11368] locus=12:57489195-57505196 | tcgaGliomaGE

Table 20: KEGG [2] pathways supporting the relationships between the genes shown in Figure 12. The edges column gives the number of edges taken from each pathway.

name | edges | genes
Focal adhesion | 26 | CCND3, CRK, ERBB2, FLT1, FLT4, IGF1R, JUN, KDR, MET, PDGFRA, PDGFRB, PGF, PIK3CD, RAPGEF1, SHC1
Phosphatidylinositol signaling system | 24 | ITPR3, PIK3C2A, PIK3CD, PLCB1, PLCB2, PLCB3, PLCB4
Purine metabolism | 18 | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, NTPCR, PDE2A
Endocytosis | 10 | CBLB, FLT1, IGF1R, KDR, MET, PDGFRA
Glioma | 9 | IGF1R, PDGFRA, PDGFRB, PIK3CD, SHC1
Long-term depression | 9 | GNAO1, IGF1R, ITPR3, PLCB1, PLCB2, PLCB3, PLCB4
GnRH signaling pathway | 8 | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, ITPR3, JUN, PLCB1, PLCB2, PLCB3, PLCB4
Long-term potentiation | 8 | ITPR3, PLCB1, PLCB2, PLCB3, PLCB4
Gap junction | 8 | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, ITPR3, PDGFRA, PDGFRB, PLCB1, PLCB2, PLCB3, PLCB4
Calcium signaling pathway | 8 | ADCY2, ADCY4, ADCY7, ADCY9, ERBB2, ITPR3, PDGFRA, PDGFRB, PHKA1, PLCB1, PLCB2, PLCB3, PLCB4
Jak-STAT signaling pathway | 7 | CBLB, CCND3, IL6R, PIAS1, PIK3CD, PIM1, STAT4, STAT6
Melanogenesis | 6 | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, FZD8, GNAO1, PLCB1, PLCB2, PLCB3, PLCB4
Cholinergic synapse | 6 | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, GNAO1, ITPR3, PIK3CD, PLCB1, PLCB2, PLCB3, PLCB4
Glutamatergic synapse | 6 | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, GNAO1, ITPR3, PLCB1, PLCB2, PLCB3, PLCB4
Pathways in cancer | 5 | ABL1, CBLB, CRK, ERBB2, FZD8, IGF1R, JUN, MET, PDGFRA, PDGFRB, PGF, PIAS1, PIK3CD
Prostate cancer | 4 | ERBB2, IGF1R, PDGFRA, PDGFRB, PIK3CD
Chagas disease (American trypanosomiasis) | 4 | GNAO1, JUN, PIK3CD, PLCB1, PLCB2, PLCB3, PLCB4
Cell adhesion molecules (CAMs) | 4 | NCAM1, NCAM2
Inositol phosphate metabolism | 4 | PIK3C2A, PIK3CD, PLCB1, PLCB2, PLCB3, PLCB4
Measles | 3 | CBLB, CCND3, PIK3CD
Insulin signaling pathway | 3 | CBLB, CRK, PHKA1, PIK3CD, PYGL, RAPGEF1, SHC1
Fc gamma R-mediated phagocytosis | 3 | CRK, DOCK2, PIK3CD
Chronic myeloid leukemia | 2 | ABL1, CBLB, CRK, PIK3CD, SHC1
Bacterial invasion of epithelial cells | 2 | CBLB, CRK, MET, PIK3CD, SHC1
Neurotrophin signaling pathway | 2 | ABL1, CRK, JUN, PIK3CD, RAPGEF1, SHC1
Rheumatoid arthritis | 1 | FLT1, JUN, PGF
Non-small cell lung cancer | 1 | ERBB2, PIK3CD
Renal cell carcinoma | 1 | CRK, JUN, MET, PGF, PIK3CD, RAPGEF1
Toxoplasmosis | 1 | GNAO1, PIK3CD
Complement and coagulation cascades | 1 | F12, PLG
VEGF signaling pathway | 1 | KDR, PIK3CD
ErbB signaling pathway | 1 | ABL1, CBLB, CRK, ERBB2, JUN, PIK3CD, SHC1
Drug metabolism - cytochrome P450 | 1 | ALDH3B2, GSTM3
Starch and sucrose metabolism | 1 | GBE1, PYGL
Arginine and proline metabolism | 1 | ALDH3A2, ALDH4A1, GLUD1
Valine, leucine and isoleucine degradation | 1 | ALDH3A2, HIBADH
Alanine, aspartate and glutamate metabolism | 1 | ALDH4A1, GLUD1

10.1.1 GO enrichment of the candidate pathway

Table 21: Enriched Gene Ontology terms [1] (FDR-corrected p ≤ 0.01). Ratio is the proportion of annotated genes in the whole gene set. The list is sorted by FDR-corrected p-value. Green and blue borders refer to up- and down-regulated genes, respectively.

Ratio | Type | Description | Genes
0.510 | BP | protein phosphorylation | ABL1, ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, CCND3, CRK, ERBB2, FLT1, FLT4, FZD8, IGF1R, IL6R, JUN, KDR, MET, PDGFRA, PDGFRB, PHKA1, PIK3CD, PIM1, RAPGEF1, SHC1, STAT4
0.373 | BP | transmembrane receptor protein tyrosine kinase signaling pathway | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, CRK, ERBB2, FLT1, FLT4, IGF1R, ITPR3, KDR, MET, PDGFRA, PDGFRB, PGF, RAPGEF1, SHC1
0.373 | BP | regulation of protein phosphorylation | ABL1, ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, CCND3, CRK, ERBB2, FLT1, FZD8, IL6R, JUN, MET, PDGFRB, PIM1, RAPGEF1, SHC1
0.294 | BP | positive regulation of protein kinase activity | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, CCND3, CRK, ERBB2, FLT1, IL6R, MET, PIM1, RAPGEF1, SHC1
0.529 | BP | intracellular signal transduction | ABL1, ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, CBLB, CRK, ERBB2, FLT1, GNAO1, IGF1R, IL6R, JUN, MET, NCAM1, PIAS1, PIK3C2A, PIK3CD, PLCB1, PLCB2, PLCB3, PLCB4, RAPGEF1, SHC1, STAT4
0.115 | MF | adenylate cyclase activity | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9
0.569 | BP | response to chemical stimulus | ABL1, ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, CRK, DOCK2, ERBB2, F12, FLT1, GNAO1, GSTM3, IGF1R, IL6R, ITPR3, JUN, KDR, MET, NCAM1, PDGFRA, PDGFRB, PGF, PIAS1, PLCB2, SHC1, STAT4, STAT6
0.154 | MF | transmembrane receptor protein tyrosine kinase activity | ERBB2, FLT1, FLT4, IGF1R, KDR, MET, PDGFRA, PDGFRB
0.118 | BP | activation of protein kinase A activity | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9
0.431 | BP | response to organic substance | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, CRK, ERBB2, F12, GNAO1, GSTM3, IGF1R, IL6R, JUN, MET, NCAM1, PDGFRA, PGF, PIAS1, SHC1, STAT4, STAT6
0.196 | BP | energy reserve metabolic process | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, GBE1, ITPR3, PHKA1, PYGL
0.096 | MF | vascular endothelial growth factor receptor activity | FLT1, FLT4, KDR, PDGFRA, PDGFRB
0.423 | MF | adenyl ribonucleotide binding | ABL1, ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, ERBB2, FLT1, FLT4, GLUD1, IGF1R, KDR, MET, NTPCR, PDE2A, PDGFRA, PDGFRB, PIK3C2A, PIK3CD, PIM1, PYGL
0.314 | BP | cellular response to organic substance | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, CRK, IGF1R, IL6R, MET, NCAM1, PGF, PIAS1, SHC1, STAT4, STAT6
0.314 | BP | response to endogenous stimulus | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, CRK, ERBB2, GNAO1, GSTM3, IGF1R, IL6R, MET, PDGFRA, PGF, SHC1
0.404 | MF | ATP binding | ABL1, ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, ERBB2, FLT1, FLT4, GLUD1, IGF1R, KDR, MET, NTPCR, PDGFRA, PDGFRB, PIK3C2A, PIK3CD, PIM1, PYGL
0.294 | BP | response to hormone stimulus | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, CRK, ERBB2, GSTM3, IGF1R, IL6R, MET, PDGFRA, PGF, SHC1
0.154 | MF | growth factor binding | ERBB2, FLT1, FLT4, IGF1R, IL6R, KDR, PDGFRA, PDGFRB
0.157 | BP | activation of phospholipase C activity | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, ITPR3, PLCB2
0.157 | BP | protein autophosphorylation | ERBB2, IGF1R, JUN, MET, PDGFRA, PDGFRB, PHKA1, PIM1
0.118 | BP | water transport | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9
0.196 | BP | nerve growth factor receptor signaling pathway | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, CRK, ITPR3, RAPGEF1, SHC1
0.118 | BP | cellular response to glucagon stimulus | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9
0.216 | BP | second-messenger-mediated signaling | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, ERBB2, IGF1R, NCAM1, PIK3C2A, PIK3CD
0.118 | BP | inhibition of adenylate cyclase activity by G-protein signaling pathway | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9
0.118 | BP | activation of adenylate cyclase activity by G-protein signaling pathway | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9
0.373 | BP | cell proliferation | CBLB, CCND3, DOCK2, ERBB2, FLT1, FLT4, IGF1R, IL6R, JUN, KDR, MET, PDGFRA, PDGFRB, PGF, PIM1, PLG, SHC1, STAT4, STAT6
0.600 | CC | plasma membrane | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, CRK, ERBB2, F12, FLT1, FLT4, FZD8, GNAO1, IGF1R, IL6R, ITPR3, KDR, MET, NCAM1, NCAM2, PDGFRA, PDGFRB, PHKA1, PIK3C2A, PIK3CD, PIM1, PLCB2, PLCB3, PLG, SHC1
0.231 | MF | protein serine/threonine kinase activity | ABL1, CCND3, ERBB2, FLT1, FLT4, IGF1R, KDR, MET, PDGFRA, PDGFRB, PHKA1, PIM1
0.098 | BP | vascular endothelial growth factor receptor signaling pathway | FLT1, FLT4, KDR, PDGFRA, PGF
0.412 | BP | cell communication | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, CBLB, ERBB2, FLT1, GLUD1, IGF1R, IL6R, ITPR3, JUN, MET, NCAM1, PGF, PIK3C2A, PLCB1, PLCB2, PLCB3
0.725 | BP | multicellular organismal process | ABL1, ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, ALDH3A2, CBLB, CRK, DOCK2, ERBB2, F12, FLT1, FLT4, FZD8, GNAO1, GSTM3, IGF1R, IL6R, ITPR3, JUN, KDR, MET, NCAM1, PDE2A, PDGFRA, PDGFRB, PGF, PIM1, PLCB1, PLCB2, PLCB3, PLG, RAPGEF1, SHC1, STAT6
0.098 | BP | positive regulation of DNA replication | IGF1R, JUN, MET, PDGFRA, SHC1
0.176 | BP | response to cytokine stimulus | GNAO1, IL6R, JUN, MET, NCAM1, PDGFRA, PIAS1, STAT4, STAT6
0.216 | BP | positive regulation of cell proliferation | ERBB2, FLT1, FLT4, IGF1R, IL6R, JUN, KDR, PDGFRA, PDGFRB, PGF, SHC1
0.077 | MF | phosphatidylinositol phospholipase C activity | PLCB1, PLCB2, PLCB3, PLCB4
0.231 | MF | identical protein binding | ERBB2, FLT1, GLUD1, GSTM3, IGF1R, IL6R, JUN, PDE2A, PDGFRA, PGF, PLCB1, PYGL
0.078 | BP | positive regulation of smooth muscle cell proliferation | FLT1, IL6R, JUN, SHC1
0.176 | BP | MAPKKK cascade | CRK, ERBB2, FLT1, IGF1R, IL6R, JUN, MET, RAPGEF1, SHC1
0.058 | MF | aldehyde dehydrogenase [NAD(P)+] activity | ALDH3A2, ALDH3B2, ALDH4A1
0.176 | BP | vasculature development | ERBB2, FLT1, FLT4, FZD8, JUN, KDR, PGF, RAPGEF1, SHC1
0.255 | BP | cell-cell signaling | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, GLUD1, ITPR3, MET, PGF, PLCB1, PLCB2, PLCB3
0.216 | BP | cell activation | CBLB, CCND3, DOCK2, ERBB2, FZD8, ITPR3, PDE2A, PDGFRA, PIK3CD, PLG, STAT6
0.118 | BP | positive regulation of cell migration | FLT1, IGF1R, IL6R, KDR, PDGFRA, PDGFRB
0.196 | BP | transmission of nerve impulse | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, ERBB2, PLCB1, PLCB2, PLCB3
0.058 | MF | phosphatidylinositol 3-kinase binding | IGF1R, MET, PDGFRA
0.340 | CC | cytosol | ABL1, CRK, DOCK2, GBE1, JUN, PDE2A, PHKA1, PIK3C2A, PIK3CD, PLCB1, PLCB2, PLCB3, PLCB4, PYGL, RAPGEF1, SHC1, STAT6
0.176 | BP | synaptic transmission | ADCY2, ADCY4, ADCY5, ADCY6, ADCY7, ADCY9, PLCB1, PLCB2, PLCB3
0.333 | BP | regulation of response to stimulus | CBLB, CRK, DOCK2, ERBB2, F12, FLT1, IGF1R, IL6R, JUN, KDR, MET, NCAM1, PIAS1, PLG, RAPGEF1, SHC1, STAT6
0.231 | MF | receptor binding | ADCY6, DOCK2, ERBB2, GNAO1, IGF1R, IL6R, KDR, PDGFRA, PDGFRB, PGF, PIAS1, SHC1
0.038 | MF | platelet-derived growth factor receptor activity | PDGFRA, PDGFRB
0.220 | CC | cell projection | ADCY2, ADCY4, ADCY6, ERBB2, FZD8, IGF1R, ITPR3, MET, NCAM1, NCAM2, PLCB4
0.118 | BP | cellular response to cytokine stimulus | IL6R, MET, NCAM1, PIAS1, STAT4, STAT6
0.059 | BP | platelet-derived growth factor receptor signaling pathway | PDGFRA, PDGFRB, RAPGEF1
0.039 | BP | positive regulation of fibrinolysis | F12, PLG
0.077 | MF | growth factor receptor binding | IL6R, PDGFRA, PDGFRB, SHC1
0.078 | BP | phosphatidylinositol-mediated signaling | ERBB2, IGF1R, PIK3C2A, PIK3CD
0.038 | MF | 3-chloroallyl aldehyde dehydrogenase activity | ALDH3A2, ALDH3B2
0.038 | MF | calcium- and calmodulin-responsive adenylate cyclase activity | ADCY2, ADCY6
0.173 | MF | protein dimerization activity | ERBB2, IL6R, JUN, MET, PDE2A, PDGFRA, PGF, PLCB1, PYGL
0.275 | BP | immune system process | CBLB, CCND3, DOCK2, ERBB2, F12, FZD8, IGF1R, IL6R, JUN, NCAM1, PIAS1, PIK3CD, SHC1, STAT6
0.115 | MF | protein complex binding | ADCY2, DOCK2, GNAO1, IGF1R, KDR, SHC1
0.137 | BP | blood vessel development | ERBB2, FLT1, JUN, KDR, PGF, RAPGEF1, SHC1
0.137 | BP | lymphocyte activation | CBLB, CCND3, DOCK2, ERBB2, FZD8, PIK3CD, STAT6
0.098 | BP | positive regulation of protein serine/threonine kinase activity | CCND3, ERBB2, MET, PIM1, SHC1
0.118 | BP | angiogenesis | ERBB2, FLT1, JUN, KDR, PGF, SHC1
0.078 | BP | T cell proliferation | CBLB, CCND3, DOCK2, ERBB2
0.137 | BP | positive regulation of protein metabolic process | CBLB, CCND3, F12, FZD8, IL6R, MET, PIAS1
0.039 | BP | cellular response to interleukin-6 | IL6R, MET
0.039 | BP | sensory perception of bitter taste | ITPR3, PLCB2
0.038 | MF | aldehyde dehydrogenase (NAD) activity | ALDH3A2, ALDH4A1
0.135 | MF | protein homodimerization activity | IL6R, JUN, PDE2A, PDGFRA, PGF, PLCB1, PYGL
0.157 | BP | chemotaxis | ABL1, DOCK2, ERBB2, IL6R, KDR, MET, NCAM1, PDGFRB
0.059 | BP | activation of MAPKK activity | CRK, FLT1, RAPGEF1
0.140 | CC | neuron projection | ADCY2, ADCY4, IGF1R, MET, NCAM1, NCAM2, PLCB4
0.137 | BP | behavior | ADCY5, GNAO1, IL6R, JUN, KDR, MET, PLCB1
0.038 | MF | 1-phosphatidylinositol-3-kinase activity | PIK3C2A, PIK3CD
0.098 | BP | cytokine-mediated signaling pathway | IL6R, NCAM1, PIAS1, STAT4, STAT6
0.038 | MF | platelet-derived growth factor binding | PDGFRA, PDGFRB
0.038 | MF | receptor signaling protein tyrosine kinase activity | ERBB2, KDR
0.118 | BP | response to inorganic substance | GNAO1, ITPR3, JUN, MET, PDGFRA, SHC1
0.040 | CC | phosphatidylinositol 3-kinase complex | PIK3C2A, PIK3CD
0.220 | CC | cell fraction | ADCY2, ADCY6, ADCY7, GNAO1, GSTM3, IGF1R, ITPR3, MET, PLCB1, PLCB4, PYGL
0.038 | MF | Hsp90 protein binding | ERBB2, KDR
0.157 | BP | neuron projection development | ABL1, ERBB2, FZD8, GNAO1, IGF1R, MET, NCAM1, SHC1
0.059 | BP | T cell differentiation in thymus | DOCK2, ERBB2, FZD8
0.039 | BP | positive regulation of cyclin-dependent protein kinase activity | CCND3, PIM1
0.059 | BP | peripheral nervous system development | ALDH3A2, ERBB2, GSTM3
0.059 | BP | glycogen metabolic process | GBE1, PHKA1, PYGL
0.078 | BP | peptidyl-tyrosine phosphorylation | ABL1, IL6R, PDGFRA, PDGFRB
0.038 | MF | platelet-derived growth factor receptor binding | PDGFRA, PDGFRB
0.180 | CC | membrane fraction | ADCY2, ADCY6, ADCY7, GNAO1, IGF1R, ITPR3, MET, PLCB1, PLCB4
0.300 | CC | plasma membrane part | ADCY6, ADCY9, ERBB2, FLT1, FLT4, IGF1R, IL6R, ITPR3, KDR, MET, NCAM1, PDGFRA, PDGFRB, PLG, SHC1

[Figure 13 graph. Root: cellular_component. Nodes: plasma membrane, plasma membrane part, cytosol, phosphatidylinositol 3-kinase complex, cell projection, neuron projection, cell fraction, membrane fraction.]

Figure 13: Relationships between the enriched cellular component Gene Ontology terms listed in Table 21. The darkness of the red reflects the significance of the enrichment, and edge thickness is proportional to the number of genes sharing the following annotation.

[Figure 14 graph. Root: molecular_function. Nodes: the enriched molecular function terms of Table 21.]

Figure 14: Relationships between the enriched molecular function Gene Ontology terms listed in Table 21. The darkness of the red reflects the significance of the enrichment, and edge thickness is proportional to the number of genes sharing the following annotation.

[Figure 15 graph. Root: biological_process. Nodes: the enriched biological process terms of Table 21.]

Figure 15: Relationships between the enriched biological process Gene Ontology terms listed in Table 21. The darkness of the red reflects the significance of the enrichment, and edge thickness is proportional to the number of genes sharing the following annotation.

10.2 Candidate genes

Table 22: Descriptions of the candidate genes. Studies that have reported results about the candidate genes are listed, and those with negative evidence are prefixed with a hyphen. The S column contains an at sign if the gene is part of the candidate pathway. Gene statuses are a = absent, d = down-regulated, u = up-regulated, s = stable. This table has 285 rows.
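The following sketch decodes the S column using the status letters given in the caption. The caption names an at sign as the pathway marker, but this extracted copy shows an asterisk (for example `u*` before ALDH3A2), so the sketch accepts either. `parse_status` and `CandidateStatus` are hypothetical names.

```python
from dataclasses import dataclass

# Status letters as listed in the Table 22 caption.
STATUS = {"a": "absent", "d": "down regulated", "u": "up regulated", "s": "stable"}
# Assumption: accept both the at sign named in the caption and the asterisk
# that appears in this extracted copy.
PATHWAY_MARKS = frozenset("@*")


@dataclass(frozen=True)
class CandidateStatus:
    status: str
    in_pathway: bool


def parse_status(code: str) -> CandidateStatus:
    """Decode an S-column entry such as 'u', 'd' or 'u*'."""
    code = code.strip()
    if not code or code[0] not in STATUS:
        raise ValueError(f"unrecognised status code: {code!r}")
    marker = code[1:]
    # At most one trailing marker character is allowed.
    if marker and marker not in PATHWAY_MARKS:
        raise ValueError(f"unrecognised pathway marker: {code!r}")
    return CandidateStatus(STATUS[code[0]], bool(marker))
```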

S | name | locus | description | studies
u | ABCC8 | 11:17414432-17498449 (11p15.1) | ATP-binding cassette, sub-family C (CFTR/MRP), member 8 [Source:HGNC Symbol;Acc:59], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[sulfonylurea receptor activity; positive regulation of potassium ion transport; potassium ion transmembrane transporter activity; response to pH; negative regulation of insulin secretion; syntaxin binding; cellular potassium ion transport; potassium ion transmembrane transport; response to zinc ion; synaptic vesicle membrane; negative regulation of hormone secretion; sarcolemma; clathrin coated vesicle membrane; ATPase activity, coupled to transmembrane movement of substances; regulation of insulin secretion; potassium channel activity; insulin secretion; clathrin-coated vesicle; regulation of hormone secretion; energy reserve metabolic process; response to lipopolysaccharide; coated vesicle; hormone secretion; cytoplasmic vesicle membrane; vesicle membrane; response to insulin stimulus; cytoplasmic vesicle part; response to drug; response to peptide hormone stimulus; generation of precursor metabolites and energy; response to hormone stimulus; membrane-bounded vesicle; cytoplasmic vesicle; vesicle] | snp3dDiabetes, tcgaGliomaGE
u | AC073873.1 | 7:70256904-70257107 (7q11.22) | [undefined], type=protein coding |
u | ACAA1 | 3:38164201-38178733 (3p22.2) | acetyl-CoA acyltransferase 1 [Source:HGNC Symbol;Acc:82], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[acetyl-CoA C-acyltransferase activity; fatty acid beta-oxidation using acyl-CoA oxidase; peroxisomal matrix; generation of precursor metabolites and energy; lipid biosynthetic process] |
u | ACPL2 | 3:140947568-141013748 (3q23) | acid phosphatase-like 2 [Source:HGNC Symbol;Acc:26303], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[acid phosphatase activity] |
u | ACTA2 | 10:90694831-90751147 (10q23.31) | actin, alpha 2, smooth muscle, aorta [Source:HGNC Symbol;Acc:130], type=processed transcript,protein coding, GO=[smooth muscle contractile fiber; vascular smooth muscle contraction; vasoconstriction; regulation of anatomical structure size] | tcgaBreastGE, tcgaGliomaGE, tcgaOvarianGE, tscapeCRCd, tscapeSCLCd
u | ADAM9 | 8:38854388-38962663 (8p11.22) | ADAM metallopeptidase domain 9 [Source:HGNC Symbol;Acc:216], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[regulation of macrophage fusion; positive regulation of macrophage fusion; macrophage fusion; PMA-inducible membrane protein ectodomain proteolysis; regulation of keratinocyte migration; positive regulation of keratinocyte migration; monocyte activation; intrinsic to external side of plasma membrane; positive regulation of cell adhesion mediated by integrin; cell-cell adhesion mediated by integrin; positive regulation of membrane protein ectodomain proteolysis; response to manganese ion; syncytium formation by plasma membrane fusion; laminin binding; protein kinase C binding; extracellular matrix binding; collagen binding; activation of MAPKK activity; response to ; positive regulation of protein secretion; positive regulation of protein catabolic process; integrin-mediated signaling pathway; integrin binding; response to calcium ion; response to hydrogen peroxide; keratinocyte differentiation; metalloendopeptidase activity; SH3 domain binding; response to reactive oxygen species; transforming growth factor beta receptor signaling pathway; response to glucocorticoid stimulus; cell-matrix adhesion; response to corticosteroid stimulus; activation of protein kinase activity; metallopeptidase activity; transmembrane receptor protein serine/threonine kinase signaling pathway; visual perception; epidermis development; response to steroid hormone stimulus; positive regulation of protein kinase activity; regulation of anatomical structure morphogenesis; cell-cell adhesion; endopeptidase activity; MAPKKK cascade; regulation of protein kinase activity; positive regulation of developmental process; regulation of protein phosphorylation; response to hormone stimulus; regulation of phosphorylation; cell activation; regulation of phosphate metabolic process; extracellular space] | tcgaGliomaGE, tscapeCRCd, tscapeHCCd, tscapeNSCLCd
d | ADRA2C | 4:3768075-3770251 (4p16.3) | adrenergic, alpha-2C-, receptor [Source:HGNC Symbol;Acc:283], type=protein coding, GO=[alpha2-adrenergic receptor activity; adrenergic receptor signaling pathway; activation of MAPK activity by adrenergic receptor signaling pathway; epidermal growth factor receptor transactivation by G-protein coupled receptor signaling pathway; receptor transactivation; alpha-2A adrenergic receptor binding; negative regulation of norepinephrine secretion; epinephrine binding; negative regulation of epinephrine secretion; activation of protein kinase B activity; regulation of epidermal growth factor receptor activity; positive regulation of neuron differentiation; regulation of insulin secretion; insulin secretion; regulation of hormone secretion; energy reserve metabolic process; activation of protein kinase activity; hormone secretion; axon; protein heterodimerization activity; regulation of protein serine/threonine kinase activity; positive regulation of protein kinase activity; MAPKKK cascade; protein homodimerization activity; generation of precursor metabolites and energy; regulation of protein kinase activity; positive regulation of developmental process; blood coagulation; regulation of body fluid levels; regulation of protein phosphorylation; regulation of phosphorylation; regulation of phosphate metabolic process] |
d | AIDA | 1:222841355-222886552 (1q41) | axin interactor, dorsalization associated [Source:HGNC Symbol;Acc:25761], type=processed transcript,protein coding, GO=[negative regulation of JUN kinase activity; regulation of protein homodimerization activity; negative regulation of JNK cascade; negative regulation of stress-activated protein kinase signaling cascade; negative regulation of MAP kinase activity; dorsal/ventral pattern formation; negative regulation of kinase activity; regulation of protein serine/threonine kinase activity; MAPKKK cascade; regulation of protein kinase activity; regulation of protein phosphorylation; regulation of phosphorylation; regulation of phosphate metabolic process] |
d | AIF1L | 9:133971863-133998539 (9q34.12) | allograft inflammatory factor 1-like [Source:HGNC Symbol;Acc:28904], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[ruffle membrane; actin filament; actin filament binding; focal adhesion; cell-substrate junction; basolateral plasma membrane; calcium ion binding] | tcgaBreastGE
u | AL590369.1 | 9:132500629-132501902 (9q34.11) | [undefined], type=pseudogene |
u | AL662827.3 | | Ral guanine nucleotide dissociation stimulator-like 2 [Source:UniProtKB/Swiss-Prot;Acc:O15211], type=processed transcript,protein coding,retained intron, GO=[Ras guanyl-nucleotide exchange factor activity; Ras protein signal transduction; regulation of small GTPase mediated signal transduction] |
u | AL662827.4 | | Beta-1,3-galactosyltransferase 4 [Source:UniProtKB/Swiss-Prot;Acc:O96024], type=protein coding, GO=[ganglioside galactosyltransferase activity; UDP-galactose:beta-N-acetylglucosamine beta-1,3-galactosyltransferase activity; protein glycosylation; Golgi membrane; Golgi apparatus part] |
u | AL844527.3 | | Beta-1,3-galactosyltransferase 4 [Source:UniProtKB/Swiss-Prot;Acc:O96024], type=protein coding, GO=[ganglioside galactosyltransferase activity; UDP-galactose:beta-N-acetylglucosamine beta-1,3-galactosyltransferase activity; protein glycosylation; Golgi membrane; Golgi apparatus part] |
d | ALCAM | 3:105085753-105295744 (3q13.11) | activated leukocyte cell adhesion molecule [Source:HGNC Symbol;Acc:400], type=processed transcript,protein coding,retained intron, GO=[motor axon guidance; axon] |
u* | ALDH3A2 | 17:19551459-19580909 (17p11.2) | aldehyde dehydrogenase 3 family, member A2 [Source:HGNC Symbol;Acc:403], type=processed transcript,protein coding, GO=[3-chloroallyl aldehyde dehydrogenase activity; aldehyde dehydrogenase (NAD) activity; aldehyde dehydrogenase [NAD(P)+] activity; cellular aldehyde metabolic process; epidermis development; mitochondrial part] | tcgaGliomaGE, tscapeBCa
d* | ALDH3B2 | 11:67429633-67448671 (11q13.2) | aldehyde dehydrogenase 3 family, member B2 [Source:HGNC Symbol;Acc:411], type=processed transcript,protein coding,retained intron, GO=[3-chloroallyl aldehyde dehydrogenase activity; aldehyde dehydrogenase [NAD(P)+] activity; cellular aldehyde metabolic process] | tscapeSCLCa
d* | ALDH4A1 | 1:19197926-19229275 (1p36.13) | aldehyde dehydrogenase 4 family, member A1 [Source:HGNC Symbol;Acc:406], type=protein coding, GO=[1-pyrroline-5-carboxylate dehydrogenase activity; proline catabolic process; proline biosynthetic process; aldehyde dehydrogenase (NAD) activity; aldehyde dehydrogenase [NAD(P)+] activity; glutamine family amino acid biosynthetic process; glutamine family amino acid catabolic process; electron carrier activity; mitochondrial matrix; mitochondrial part] | tscapeBCd, tscapeCRCd, tscapeNSCLCd, tscapeOvariand, tscapeRCCd
d | ALG13 | X:110909043-111003877 (Xq23) | asparagine-linked glycosylation 13 homolog (S. cerevisiae) [Source:HGNC Symbol;Acc:30881], type=processed transcript,protein coding, GO=[N-acetylglucosaminyldiphosphodolichol N-acetylglucosaminyltransferase activity; lipid glycosylation; dolichol-linked oligosaccharide biosynthetic process; protein N-linked glycosylation via asparagine; peptidyl-asparagine modification; protein glycosylation; post-translational protein modification; carbohydrate binding] |

S | name | locus | description | studies
d | AMOT | X:112017731-112084043 (Xq23) | angiomotin [Source:HGNC Symbol;Acc:17810], type=processed transcript,protein coding, GO=[angiostatin binding; positive regulation of embryonic development; negative regulation of vascular permeability; cell migration involved in gastrulation; positive regulation of blood vessel endothelial cell migration; positive regulation of stress fiber assembly; gastrulation with mouth forming second; stress fiber; negative regulation of angiogenesis; actin filament; cell-cell junction assembly; vasculogenesis; actin filament bundle assembly; positive regulation of cell size; tight junction; lamellipodium; cell-cell junction; angiogenesis; in utero embryonic development; actin cytoskeleton organization; regulation of cell size; regulation of anatomical structure morphogenesis; actin filament-based process; blood vessel development; vasculature development; regulation of cellular component size; positive regulation of developmental process; regulation of anatomical structure size; membrane-bounded vesicle; cytoplasmic vesicle; vesicle] | tcgaBreastGE, tcgaGliomaGE
d | ANKRD43 | 5:132149033-132152488 (5q31.1) | ankyrin repeat domain 43 [Source:HGNC Symbol;Acc:27033], type=protein coding | tscapeBCd, tscapeNSCLCd
u | ANXA5 | 4:122589110-122618268 (4q27) | annexin A5 [Source:HGNC Symbol;Acc:543], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[phospholipase inhibitor activity; eukaryotic cell surface binding; intercalated disc; receptor tyrosine kinase binding; calcium-dependent phospholipid binding; regulation of coagulation; sarcolemma; protein homooligomerization; cell-cell junction; blood coagulation; positive regulation of apoptosis; positive regulation of cell death; regulation of body fluid levels; calcium ion binding] | tcgaGliomaGE
d | ANXA9 | 1:150954493-150968110 (1q21.3) | annexin A9 [Source:HGNC Symbol;Acc:547], type=processed transcript,protein coding, GO=[phosphatidylserine binding; acetylcholine receptor activity; calcium-dependent phospholipid binding; cell-cell adhesion; protein homodimerization activity; calcium ion binding] | tcgaBreastGE, tscapeHCCa, tscapeMelanomaa, tscapeNSCLCa, tscapeOvariana, tscapeSCLCa
u | AP1S2 | X:15843929-15873054 (Xp22.2) | adaptor-related protein complex 1, sigma 2 subunit [Source:HGNC Symbol;Acc:560], type=processed transcript,protein coding, GO=[regulation of defense response to virus by virus; AP-type membrane coat adaptor complex; coated pit; protein transporter activity; lysosomal membrane; vacuolar membrane; vacuolar part; cytoplasmic vesicle membrane; vesicle membrane; cytoplasmic vesicle part; Golgi membrane; Golgi apparatus part; membrane-bounded vesicle; cytoplasmic vesicle; vesicle] | tscapeHCCa
d | AP3M2 | 8:42010464-42029191 (8p11.21) | adaptor-related protein complex 3, mu 2 subunit [Source:HGNC Symbol;Acc:570], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[clathrin adaptor complex; AP-type membrane coat adaptor complex] | tcgaBreastGE, tcgaGliomaGE, tscapeCRCa, tscapeHCCd, tscapeNSCLCa, tscapeNSCLCd, tscapeOvariana, tscapeProstatea, tscapeRCCd, tscapeSCLCa
u | ARHGAP44 | 17:12692829-12894960 (17p12) | Rho GTPase activating protein 44 [Source:HGNC Symbol;Acc:29096], type=protein coding, GO=[GTPase activator activity; regulation of small GTPase mediated signal transduction] | tcgaGliomaGE
u | ARRDC1 | 9:140500106-140509812 (9q34.3) | arrestin domain containing 1 [Source:HGNC Symbol;Acc:28633], type=nonsense mediated decay,processed transcript,protein coding | tcgaBreastGE, tscapeCRCa
d | ASS1 | 9:133320094-133376661 (9q34.11) | argininosuccinate synthase 1 [Source:HGNC Symbol;Acc:758], type=processed transcript,protein coding, GO=[argininosuccinate synthase activity; cellular response to ammonium ion; argininosuccinate metabolic process; cellular response to oleic acid; response to mycotoxin; cell body fiber; diaphragm development; arginine biosynthetic process; cellular response to dexamethasone stimulus; toxin binding; cellular response to cAMP; urea cycle; midgut development; cellular response to amino acid stimulus; cellular response to acid; glutamine family amino acid biosynthetic process; response to growth hormone stimulus; response to zinc ion; cellular response to tumor necrosis factor; perikaryon; cellular response to glucagon stimulus; response to amino acid stimulus; acute-phase response; cellular response to lipopolysaccharide; response to lipid; response to tumor necrosis factor; response to cAMP; cellular response to interferon-gamma; liver development; response to amine stimulus; response to interferon-gamma; acute inflammatory response; mitochondrial outer membrane; response to estradiol stimulus; response to organic nitrogen; response to glucocorticoid stimulus; response to corticosteroid stimulus; response to estrogen stimulus; response to lipopolysaccharide; response to steroid hormone stimulus; cellular response to hormone stimulus; response to drug; response to peptide hormone stimulus; mitochondrial part; response to hormone stimulus] | tcgaGliomaGE
u | AUTS2 | 7:69063905-70258054 (7q11.22) | autism susceptibility candidate 2 [Source:HGNC Symbol;Acc:14262], type=nonsense mediated decay,processed transcript,protein coding,retained intron | tscapeBCd, tscapeNSCLCd
u | B3GALT4 | 6:33244917-33246602 (6p21.32) | UDP-Gal:betaGlcNAc beta 1,3-galactosyltransferase, polypeptide 4 [Source:HGNC Symbol;Acc:919], type=protein coding, GO=[ganglioside galactosyltransferase activity; UDP-galactose:beta-N-acetylglucosamine beta-1,3-galactosyltransferase activity; protein glycosylation; Golgi membrane; Golgi apparatus part] | -
d | B4GALNT4 | 11:369796-382116 (11p15.5) | beta-1,4-N-acetyl-galactosaminyl transferase 4 [Source:HGNC Symbol;Acc:26315], type=nonsense mediated decay,protein coding,retained intron, GO=[N-acetyl-beta-glucosaminyl-glycoprotein 4-beta-N-acetylgalactosaminyltransferase activity; Golgi cisterna membrane; Golgi membrane; Golgi apparatus part] | tcgaBreastGE, tcgaGliomaGE, tscapeBCd, tscapeNSCLCd, tscapeOvariand
d | BARD1 | 2:215590370-215674428 (2q35) | BRCA1 associated RING domain 1 [Source:HGNC Symbol;Acc:952], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[negative regulation of mRNA 3’-end processing; BRCA1-BARD1 complex; negative regulation of protein export from nucleus; regulation of mRNA 3’-end processing; BRCA1-A complex; protein K6-linked ubiquitination; positive regulation of protein catabolic process; ubiquitin-protein ligase activity; protein heterodimerization activity; spermatogenesis; negative regulation of cell cycle; protein homodimerization activity; positive regulation of apoptosis; positive regulation of cell death; regulation of phosphorylation; regulation of phosphate metabolic process; DNA metabolic process] | tcgaBreastGE, tscapeRCCd
d | BTG2 | 1:203274619-203278730 (1q32.1) | BTG family, member 2 [Source:HGNC Symbol;Acc:1131], type=nonsense mediated decay,protein coding, GO=[positive regulation of nuclear-transcribed mRNA poly(A) tail shortening; regulation of nuclear-transcribed mRNA poly(A) tail shortening; regulation of mRNA 3’-end processing; negative regulation of neural precursor cell proliferation; dentate gyrus development; response to electrical stimulus; positive regulation of anti-apoptosis; central nervous system neuron development; associative learning; learning; protein methylation; response to mechanical stimulus; response to organic nitrogen; anterior/posterior pattern formation; response to organic cyclic compound; response to peptide hormone stimulus; response to hormone stimulus; DNA metabolic process] | tscapeProstatea
u | BX000343.1 | - | Ral guanine nucleotide dissociation stimulator-like 2 [Source:UniProtKB/Swiss-Prot;Acc:O15211], type=processed transcript,protein coding,retained intron, GO=[Ras guanyl-nucleotide exchange factor activity; Ras protein signal transduction; regulation of small GTPase mediated signal transduction] | -
d | C11orf52 | 11:111788756-111797596 (11q23.1) | chromosome 11 open reading frame 52 [Source:HGNC Symbol;Acc:30531], type=processed transcript,protein coding,retained intron | tscapeBCd, tscapeMelanomad
u | C11orf75 | 11:93211638-93276674 (11q21) | chromosome 11 open reading frame 75 [Source:HGNC Symbol;Acc:24810], type=protein coding | -
u | C16orf93 | 16:30768744-30774031 (16p11.2) | chromosome 16 open reading frame 93 [Source:HGNC Symbol;Acc:28078], type=nonsense mediated decay,processed transcript,protein coding,retained intron | tcgaBreastGE
u | C17orf91 | 17:1614805-1619504 (17p13.3) | chromosome 17 open reading frame 91 [Source:HGNC Symbol;Acc:28219], type=lincRNA,processed transcript | -
d | C1orf115 | 1:220863187-220872499 (1q41) | chromosome 1 open reading frame 115 [Source:HGNC Symbol;Acc:25873], type=protein coding | tcgaBreastGE, tcgaGliomaGE
u | C6orf52 | 6:10671651-10695030 (6p24.2) | chromosome 6 open reading frame 52 [Source:HGNC Symbol;Acc:20881], type=processed transcript,protein coding | tcgaBreastGE
d | C7orf23 | 7:86825478-86849903 (7q21.12) | chromosome 7 open reading frame 23 [Source:HGNC Symbol;Acc:21707], type=processed transcript,protein coding,retained intron | tscapeNSCLCa
u | C7orf63 | 7:89874488-89940377 (7q21.13) | chromosome 7 open reading frame 63 [Source:HGNC Symbol;Acc:26107], type=nonsense mediated decay,processed transcript,protein coding,retained intron | -
d | CA11 | 19:49141272-49149451 (19q13.33) | carbonic anhydrase XI [Source:HGNC Symbol;Acc:1370], type=protein coding, GO=[basolateral plasma membrane] | tcgaGliomaGE, tscapeBCd, tscapeNSCLCd, tscapeOvariand
u | CAMP | 3:48264837-48266981 (3p21.31) | cathelicidin antimicrobial peptide [Source:HGNC Symbol;Acc:1472], type=protein coding, GO=[negative regulation of growth of symbiont on or near host surface; growth of symbiont on or near host; modulation of growth of symbiont on or near host; killing by host of symbiont cells; disruption by host of symbiont cells; specific granule; modification by host of symbiont morphology or physiology; defense response to Gram-negative bacterium; negative regulation of growth of symbiont in host; growth of symbiont in host; regulation of growth of symbiont in host; defense response to Gram-positive bacterium; membrane-bounded vesicle; cytoplasmic vesicle; vesicle] | tcgaBreastGE
u | CAPN2 | 1:223889295-223963720 (1q41) | calpain 2, (m/II) large subunit [Source:HGNC Symbol;Acc:1479], type=processed transcript,protein coding, GO=[myoblast fusion; syncytium formation by plasma membrane fusion; calcium-dependent cysteine-type endopeptidase activity; blastocyst development; response to hypoxia; protein heterodimerization activity; chromatin; in utero embryonic development; endopeptidase activity; soluble fraction; calcium ion binding] | snp3dDementia, tcgaGliomaGE
d* | CBLB | 3:105374305-105588396 (3q13.11) | Cas-Br-M (murine) ecotropic retroviral transforming sequence b [Source:HGNC Symbol;Acc:1542], type=protein coding,retained intron, GO=[negative regulation of alpha-beta T cell proliferation; regulation of T cell anergy; positive regulation of T cell anergy; lymphocyte anergy; positive regulation of T cell tolerance induction; regulation of T cell tolerance induction; T cell tolerance induction; negative regulation of T cell receptor signaling pathway; negative regulation of antigen receptor-mediated signaling pathway; NLS-bearing substrate import into nucleus; positive regulation of protein catabolic process; calcium ion binding; cell activation] | tcgaGliomaGE
d | CCDC14 | 3:123616152-123680564 (3q21.1) | coiled-coil domain containing 14 [Source:HGNC Symbol;Acc:25766], type=nonsense mediated decay,processed transcript,protein coding,retained intron | tcgaGliomaGE, tscapeNSCLCd, tscapeProstated
u | CCDC17 | 1:46085716-46089729 (1p34.1) | coiled-coil domain containing 17 [Source:HGNC Symbol;Acc:26574], type=nonsense mediated decay,processed transcript,protein coding,retained intron | tscapeSCLCa
u* | CCND3 | 6:41902671-42018095 (6p21.1) | cyclin D3 [Source:HGNC Symbol;Acc:1585], type=processed transcript,protein coding,retained intron, GO=[positive regulation of cyclin-dependent protein kinase activity; cyclin-dependent protein kinase holoenzyme complex; cyclin-dependent protein kinase activity; regulation of cyclin-dependent protein kinase activity; regulation of protein serine/threonine kinase activity; positive regulation of protein kinase activity; regulation of protein kinase activity; regulation of protein phosphorylation; regulation of phosphorylation; cell activation; regulation of phosphate metabolic process] | tscapeCRCa
d | CCNG2 | 4:78078304-78354542 (4q21.1) | cyclin G2 [Source:HGNC Symbol;Acc:1593], type=processed transcript,protein coding, GO=[mitosis; nuclear division; negative regulation of cell cycle] | tscapeHCCd
u | CD24P4 | Y:21154139-21154595 (Yq11.222) | CD24 molecule pseudogene 4 [Source:HGNC Symbol;Acc:1649], type=processed pseudogene,pseudogene | -
u | CD68 | 17:7482922-7485431 (17p13.1) | CD68 molecule [Source:HGNC Symbol;Acc:1693], type=protein coding, GO=[lysosomal membrane; vacuolar membrane; vacuolar part; endosome membrane] | tcgaGliomaGE, tscapeProstated
u | CDC25A | 3:48198636-48229892 (3p21.31) | cell division cycle 25 homolog A (S. pombe) [Source:HGNC Symbol;Acc:1725], type=processed transcript,protein coding, GO=[cellular response to UV; regulation of cyclin-dependent protein kinase activity; G2/M transition of mitotic cell cycle; S phase of mitotic cell cycle; protein dephosphorylation; phosphoprotein phosphatase activity; DNA replication; regulation of protein serine/threonine kinase activity; mitosis; nuclear division; negative regulation of cell cycle; regulation of protein kinase activity; regulation of protein phosphorylation; regulation of phosphorylation; regulation of phosphate metabolic process; DNA metabolic process] | tcgaBreastGE, tcgaGliomaGE
u | CDC42EP4 | 17:71279764-71308143 (17q25.1) | CDC42 effector protein (Rho GTPase binding) 4 [Source:HGNC Symbol;Acc:17147], type=protein coding, GO=[positive regulation of pseudopodium assembly; GTP-Rho binding; Rho GTPase binding; regulation of cell shape; Ras protein signal transduction] | tcgaGliomaGE
u | CDKN1C | 11:2904443-2907111 (11p15.4) | cyclin-dependent kinase inhibitor 1C (p57, Kip2) [Source:HGNC Symbol;Acc:1786], type=protein coding,retained intron, GO=[cyclin-dependent protein kinase inhibitor activity; positive regulation of transforming growth factor beta receptor signaling pathway; neuron maturation; G1 phase of mitotic cell cycle; regulation of cyclin-dependent protein kinase activity; negative regulation of epithelial cell proliferation; transforming growth factor beta receptor signaling pathway; negative regulation of kinase activity; negative regulation of gene-specific transcription; transmembrane receptor protein serine/threonine kinase signaling pathway; regulation of protein serine/threonine kinase activity; transcription repressor activity; transcription activator activity; negative regulation of cell cycle; regulation of protein kinase activity; regulation of protein phosphorylation; regulation of phosphorylation; regulation of phosphate metabolic process] | snp3dGlioma, tcgaBreastGE, tscapeBCd, tscapeNSCLCd, tscapeOvariand
d | CDR2L | 17:72983727-73001895 (17q25.1) | cerebellar degeneration-related protein 2-like [Source:HGNC Symbol;Acc:29999], type=protein coding | tcgaBreastGE, tscapeNSCLCa
u | CEP70 | 3:138213186-138313380 (3q22.3) | centrosomal protein 70kDa [Source:HGNC Symbol;Acc:29972], type=processed transcript,protein coding, GO=[G2/M transition of mitotic cell cycle] | tcgaBreastGE
d | CFD | 19:859665-863606 (19p13.3) | complement factor D (adipsin) [Source:HGNC Symbol;Acc:2771], type=protein coding, GO=[complement activation, alternative pathway; platelet alpha granule lumen; Notch signaling pathway; platelet degranulation; serine-type endopeptidase activity; platelet activation; cytoplasmic vesicle part; endopeptidase activity; blood coagulation; regulation of body fluid levels; membrane-bounded vesicle; cell activation; cytoplasmic vesicle; vesicle; extracellular space] | tcgaBreastGE, tcgaGliomaGE, tscapeBCd, tscapeHCCd, tscapeNSCLCd, tscapeRCCd
d | CGN | 1:151482986-151511168 (1q21.3) | cingulin [Source:HGNC Symbol;Acc:17429], type=protein coding,retained intron, GO=[myosin complex; tight junction; cell-cell junction] | tcgaBreastGE, tscapeBCd, tscapeHCCa, tscapeMelanomaa, tscapeOvariana
d | CGNL1 | 15:57668703-57842925 (15q21.3) | cingulin-like 1 [Source:HGNC Symbol;Acc:25931], type=protein coding, GO=[myosin complex; tight junction; cell-cell junction] | tscapeCRCd
d | CHTF18 | 16:838046-848074 (16p13.3) | CTF18, chromosome transmission fidelity factor 18 homolog (S. cerevisiae) [Source:HGNC Symbol;Acc:18435], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[DNA replication; DNA metabolic process] | tcgaBreastGE, tcgaGliomaGE, tscapeHCCd
u | CITED1 | X:71521488-71527037 (Xq13.1) | Cbp/p300-interacting transactivator, with Glu/Asp-rich carboxy-terminal domain, 1 [Source:HGNC Symbol;Acc:1986], type=protein coding, GO=[response to interleukin-9; response to interleukin-11; negative regulation of mesenchymal to epithelial transition involved in metanephros morphogenesis; LBD domain binding; response to interleukin-2; response to parathyroid hormone stimulus; response to interleukin-4; spongiotrophoblast layer development; co-SMAD binding; melanin biosynthetic process; response to interleukin-6; SMAD protein signal transduction; positive regulation of transforming growth factor beta receptor signaling pathway; negative regulation of osteoblast differentiation; embryonic axis specification; melanocyte differentiation; response to interleukin-1; SMAD binding; branching involved in ureteric bud morphogenesis; vasculogenesis; regulation of osteoblast differentiation; response to cAMP; embryonic placenta development; negative regulation of Wnt receptor signaling pathway; response to interferon-gamma; osteoblast differentiation; transforming growth factor beta receptor signaling pathway; regulation of cell morphogenesis involved in differentiation; protein C-terminus binding; response to estrogen stimulus; response to lipopolysaccharide; promoter binding; transmembrane receptor protein serine/threonine kinase signaling pathway; transcription coactivator activity; response to insulin stimulus; in utero embryonic development; transcription repressor activity; response to steroid hormone stimulus; transcription activator activity; regulation of anatomical structure morphogenesis; response to peptide hormone stimulus; blood vessel development; vasculature development; response to hormone stimulus] | tcgaBreastGE
u | CLDN12 | 7:90013035-90142716 (7q21.13) | claudin 12 [Source:HGNC Symbol;Acc:2034], type=processed transcript,protein coding,retained intron, GO=[calcium-independent cell-cell adhesion; tight junction assembly; apical junction assembly; cell-cell junction assembly; tight junction; cell-cell junction; cell-cell adhesion] | tcgaGliomaGE, tscapeNSCLCa
d | CMBL | 5:10275987-10308138 (5p15.2) | carboxymethylenebutenolidase homolog (Pseudomonas) [Source:HGNC Symbol;Acc:25090], type=processed transcript,protein coding,retained intron | -
d | COL17A1 | 10:105791044-105845760 (10q24.33, 10q25.1) | collagen, type XVII, alpha 1 [Source:HGNC Symbol;Acc:2194], type=processed transcript,protein coding,retained intron, GO=[hemidesmosome; hemidesmosome assembly; basement membrane; cell-substrate junction; cell-matrix adhesion; cell-cell junction; basolateral plasma membrane; epidermis development] | tcgaBreastGE, tscapeBCd, tscapeCRCd
u | CR759786.11 | - | Beta-1,3-galactosyltransferase 4 [Source:UniProtKB/Swiss-Prot;Acc:O96024], type=protein coding, GO=[ganglioside galactosyltransferase activity; UDP-galactose:beta-N-acetylglucosamine beta-1,3-galactosyltransferase activity; protein glycosylation; Golgi membrane; Golgi apparatus part] | -
u | CR759786.5 | - | Ral guanine nucleotide dissociation stimulator-like 2 [Source:UniProtKB/Swiss-Prot;Acc:O15211], type=processed transcript,protein coding,retained intron, GO=[Ras guanyl-nucleotide exchange factor activity; Ras protein signal transduction; regulation of small GTPase mediated signal transduction] | -
u | CR759817.7 | - | Beta-1,3-galactosyltransferase 4 [Source:UniProtKB/Swiss-Prot;Acc:O96024], type=protein coding, GO=[ganglioside galactosyltransferase activity; UDP-galactose:beta-N-acetylglucosamine beta-1,3-galactosyltransferase activity; protein glycosylation; Golgi membrane; Golgi apparatus part] | -
u | CR759817.9 | - | Ral guanine nucleotide dissociation stimulator-like 2 [Source:UniProtKB/Swiss-Prot;Acc:O15211], type=processed transcript,protein coding,retained intron, GO=[Ras guanyl-nucleotide exchange factor activity; Ras protein signal transduction; regulation of small GTPase mediated signal transduction] | -
u | CRISPLD2 | 16:84853537-84943114 (16q24.1) | cysteine-rich secretory protein LCCL domain containing 2 [Source:HGNC Symbol;Acc:25248], type=protein coding, GO=[membrane-bounded vesicle; cytoplasmic vesicle; vesicle] | tscapeProstated
u* | CRK | 17:1323983-1359552 (17p13.3) | v-crk sarcoma virus CT10 oncogene homolog (avian) [Source:HGNC Symbol;Acc:2362], type=protein coding, GO=[regulation of Rac protein signal transduction; protein phosphorylated amino acid binding; SH2 domain binding; SH3/SH2 adaptor activity; activation of MAPKK activity; insulin receptor signaling pathway; activation of protein kinase activity; nerve growth factor receptor signaling pathway; response to insulin stimulus; positive regulation of protein kinase activity; actin cytoskeleton organization; cellular response to hormone stimulus; actin filament-based process; Ras protein signal transduction; response to peptide hormone stimulus; regulation of small GTPase mediated signal transduction; MAPKKK cascade; regulation of protein kinase activity; blood coagulation; regulation of body fluid levels; regulation of protein phosphorylation; response to hormone stimulus; regulation of phosphorylation; regulation of phosphate metabolic process] | tcgaGliomaGE
u | CRNDE | 16:54952798-54962708 (16q12.2) | colorectal neoplasia differentially expressed (non-protein coding) [Source:HGNC Symbol;Acc:37078], type=lincRNA | tcgaGliomaGE
d | CSPG5 | 3:47603729-47622282 (3p21.31) | chondroitin sulfate proteoglycan 5 (neuroglycan C) [Source:HGNC Symbol;Acc:2467], type=processed transcript,protein coding, GO=[Golgi-associated vesicle membrane; Golgi-associated vesicle; growth factor activity; regulation of synaptic transmission; cytoplasmic vesicle membrane; vesicle membrane; cytoplasmic vesicle part; Golgi membrane; Golgi apparatus part; membrane-bounded vesicle; cytoplasmic vesicle; vesicle] | tcgaGliomaGE
u | CTDSPL | 3:37903451-38025960 (3p22.2) | CTD (carboxy-terminal domain, RNA polymerase II, polypeptide A) small phosphatase-like [Source:HGNC Symbol;Acc:16890], type=nonsense mediated decay,processed transcript,protein coding, GO=[phosphoprotein phosphatase activity] | tscapeMelanomad
u | CTGF | 6:132269316-132272513 (6q23.2) | connective tissue growth factor [Source:HGNC Symbol;Acc:2500], type=protein coding, GO=[positive regulation of G0 to G1 transition; regulation of G0 to G1 transition; extracellular matrix constituent secretion; organ senescence; positive regulation of cardiac muscle contraction; response to anoxia; positive regulation of collagen biosynthetic process; positive regulation of collagen metabolic process; fibronectin binding; regulation of collagen biosynthetic process; cartilage condensation; cis-Golgi network; insulin-like growth factor binding; response to mineralocorticoid stimulus; response to amino acid stimulus; response to lipid; cytosolic calcium ion transport; integrin-mediated signaling pathway; integrin binding; fibroblast growth factor receptor signaling pathway; response to amine stimulus; response to glucose stimulus; positive regulation of caspase activity; response to estradiol stimulus; heparin binding; response to organic nitrogen; cell-matrix adhesion; response to corticosteroid stimulus; cell cortex; glycosaminoglycan binding; response to estrogen stimulus; growth factor activity; response to organic cyclic compound; regulation of cell growth; epidermis development; DNA replication; angiogenesis; response to steroid hormone stimulus; regulation of cell size; perinuclear region of cytoplasm; cell-cell adhesion; response to peptide hormone stimulus; carbohydrate binding; blood vessel development; vasculature development; regulation of cellular component size; regulation of anatomical structure size; Golgi apparatus part; positive regulation of cell death; regulation of protein phosphorylation; response to hormone stimulus; regulation of phosphorylation; cell activation; regulation of phosphate metabolic process; DNA metabolic process; extracellular space] | tscapeBCa
d | CYFIP2 | 5:156693089-156822606 (5q33.3) | cytoplasmic FMR1 interacting protein 2 [Source:HGNC Symbol;Acc:13760], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[chemokine activity; synaptosome; perinuclear region of cytoplasm; cell-cell adhesion] | tcgaBreastGE, tcgaGliomaGE, tscapeRCCa
d | CYP2J2 | 1:60358980-60392462 (1p32.1) | cytochrome P450, family 2, subfamily J, polypeptide 2 [Source:HGNC Symbol;Acc:2634], type=processed transcript,protein coding, GO=[linoleic acid epoxygenase activity; arachidonic acid 14,15-epoxygenase activity; arachidonic acid 11,12-epoxygenase activity; arachidonic acid monooxygenase activity; arachidonic acid epoxygenase activity; linoleic acid metabolic process; epoxygenase P450 pathway; arachidonic acid metabolic process; long-chain fatty acid metabolic process; aromatase activity; very long-chain fatty acid metabolic process; icosanoid metabolic process; heme binding; xenobiotic metabolic process; electron carrier activity; iron ion binding; microsome; vesicular fraction] | tcgaGliomaGE
u | CYR61 | 1:86046444-86049645 (1p22.3) | cysteine-rich, angiogenic inducer, 61 [Source:HGNC Symbol;Acc:2654], type=processed transcript,protein coding, GO=[positive regulation of ceramide biosynthetic process; positive regulation of sphingolipid biosynthetic process; intussusceptive angiogenesis; apoptosis involved in heart morphogenesis; atrioventricular valve morphogenesis; atrioventricular valve development; chondroblast differentiation; atrial septum morphogenesis; atrial septum development; positive regulation of cartilage development; chorio-allantoic fusion; labyrinthine layer blood vessel development; ventricular septum development; insulin-like growth factor binding; extracellular matrix binding; positive regulation of cell-substrate adhesion; cardiac ventricle development; cardiac chamber morphogenesis; embryonic placenta development; integrin binding; positive regulation of caspase activity; osteoblast differentiation; heparin binding; positive regulation of phospholipase activity; glycosaminoglycan binding; regulation of cell growth; angiogenesis; in utero embryonic development; regulation of cell size; cell-cell adhesion; carbohydrate binding; blood vessel development; vasculature development; regulation of cellular component size; lipid biosynthetic process; positive regulation of developmental process; regulation of anatomical structure size; positive regulation of apoptosis; positive regulation of cell death] | tcgaBreastGE, tscapeGliomad
u | DCXR | 17:79993757-79995573 (17q25.3) | dicarbonyl/L-xylulose reductase [Source:HGNC Symbol;Acc:18985], type=protein coding, GO=[L-xylulose reductase (NADP+) activity; D-xylose metabolic process; xylulose metabolic process; pentose metabolic process; NADP metabolic process; brush border; glucose metabolic process] | tscapeMelanomaa, tscapeNSCLCa, tscapeOvariana
d | DEPTOR | 8:120885957-121063152 (8q24.12) | DEP domain containing MTOR-interacting protein [Source:HGNC Symbol;Acc:22953], type=processed transcript,protein coding, GO=[negative regulation of TOR signaling cascade; negative regulation of kinase activity; regulation of cell size; regulation of cellular component size; regulation of protein kinase activity; regulation of anatomical structure size; regulation of protein phosphorylation; regulation of phosphorylation; regulation of phosphate metabolic process] | tcgaGliomaGE
u | DHCR7 | 11:71139239-71163914 (11q13.4) | 7-dehydrocholesterol reductase [Source:HGNC Symbol;Acc:2860], type=protein coding,retained intron, GO=[7-dehydrocholesterol reductase activity; regulation of cholesterol biosynthetic process; nuclear outer membrane; post-embryonic development; multicellular organism growth; microsome; vesicular fraction; blood vessel development; vasculature development; lipid biosynthetic process] | fileCIN70, tcgaBreastGE
u | DLG5 | 10:79550549-79686378 (10q22.3) | discs, large homolog 5 (Drosophila) [Source:HGNC Symbol;Acc:2904], type=processed transcript,protein coding, GO=[receptor signaling complex scaffold activity; beta-catenin binding; cell-cell adhesion] | cosmicPrimary, tcgaGliomaGE, tscapeBCa, tscapeOvariana
d | DPYSL2 | 8:26371547-26515694 (8p21.2) | dihydropyrimidinase-like 2 [Source:HGNC Symbol;Acc:3014], type=processed transcript,protein coding,retained intron, GO=[dihydropyrimidinase activity; positive regulation of glutamate secretion; olfactory bulb development; response to cocaine; response to tropane; response to amphetamine; terminal button; synaptic vesicle transport; spinal cord development; growth cone; response to alkaloid; response to amine stimulus; synaptosome; response to organic nitrogen; response to organic cyclic compound; axon; dendrite; response to drug; soluble fraction] | tcgaBreastGE, tcgaGliomaGE, tscapeOvariand, tscapeProstated
u | DUSP1 | 5:172195093-172198198 (5q35.1) | dual specificity phosphatase 1 [Source:HGNC Symbol;Acc:3064], type=protein coding, GO=[protein tyrosine/threonine phosphatase activity; non-membrane spanning protein tyrosine phosphatase activity; MAP kinase tyrosine/serine/threonine phosphatase activity; endoderm formation; response to testosterone stimulus; inactivation of MAPK activity; positive regulation of anti-apoptosis; negative regulation of MAP kinase activity; response to cAMP; response to calcium ion; response to retinoic acid; response to hydrogen peroxide; response to vitamin A; response to estradiol stimulus; response to reactive oxygen species; response to glucocorticoid stimulus; negative regulation of kinase activity; response to corticosteroid stimulus; response to estrogen stimulus; protein dephosphorylation; phosphoprotein phosphatase activity; regulation of protein serine/threonine kinase activity; response to steroid hormone stimulus; cellular response to hormone stimulus; soluble fraction; MAPKKK cascade; regulation of protein kinase activity; positive regulation of apoptosis; positive regulation of cell death; regulation of protein phosphorylation; response to hormone stimulus; regulation of phosphorylation; regulation of phosphate metabolic process] | tcgaBreastGE, tscapeBCd, tscapeNSCLCa, tscapeOvariand, tscapeRCCa
u | EDN2 | 1:41944446-41950344 (1p34.2) | endothelin 2 [Source:HGNC Symbol;Acc:3177], type=processed transcript,protein coding, GO=[hormonal regulation of the force of heart contraction; bombesin receptor binding; endothelin B receptor binding; positive regulation of prostaglandin-endoperoxide synthase activity; regulation of prostaglandin-endoperoxide synthase activity; artery smooth muscle contraction; tonic smooth muscle contraction; ovarian follicle rupture; vein smooth muscle contraction; positive regulation of the force of heart contraction by chemical signal; regulation of systemic arterial blood pressure by endothelin; vascular smooth muscle contraction; positive regulation of heart rate; macrophage chemotaxis; prostaglandin biosynthetic process; positive regulation of smooth muscle contraction; prostaglandin metabolic process; regulation of systemic arterial blood pressure by hormone; positive regulation of leukocyte chemotaxis; regulation of heart rate; macrophage activation; regulation of systemic arterial blood pressure mediated by a chemical signal; neutrophil chemotaxis; activation of protein kinase C activity by G-protein coupled receptor protein signaling pathway; icosanoid biosynthetic process; negative regulation of hormone secretion; regulation of vasoconstriction; vasoconstriction; icosanoid metabolic process; inositol phosphate-mediated signaling; positive regulation of hormone secretion; calcium-mediated signaling; hormone activity; elevation of cytosolic calcium ion concentration; regulation of hormone secretion; activation of protein kinase activity; hormone secretion; positive regulation of protein kinase activity; lipid biosynthetic process; regulation of protein kinase activity; regulation of anatomical structure size; regulation of protein phosphorylation; regulation of phosphorylation; cell activation; regulation of phosphate metabolic process; extracellular space] | tcgaBreastGE, tscapeBCa, tscapeOvariana
u | EEF2K | 16:22217616-22297954 (16p12.2) | eukaryotic elongation factor-2 kinase [Source:HGNC Symbol;Acc:24615], type=protein coding, GO=[elongation factor-2 kinase activity; translation factor activity, nucleic acid binding; translational elongation; calmodulin binding; insulin receptor signaling pathway; response to insulin stimulus; cellular response to hormone stimulus; response to peptide hormone stimulus; response to hormone stimulus; calcium ion binding] | tcgaBreastGE, tcgaBreastGESurv
d | ELF3 | 1:201977073-201986316 (1q32.1) | E74-like factor 3 (ets domain transcription factor, epithelial-specific ) [Source:HGNC Symbol;Acc:3318], type=processed transcript,protein coding, GO=[mammary gland involution; mammary gland morphogenesis; blastocyst development; positive regulation of gene-specific transcription from RNA polymerase II promoter; transcription coactivator activity; epidermis development; in utero embryonic development; transcription repressor activity; transcription activator activity; regulation of gene-specific transcription from RNA polymerase II promoter] | -
d | ENPP4 | 6:46097701-46114435 (6p21.1) | ectonucleotide pyrophosphatase/phosphodiesterase 4 (putative) [Source:HGNC Symbol;Acc:3359], type=protein coding | tcgaGliomaGE

d ENPP5 6:46128152-46138708 (6p21.1) ectonucleotide pyrophosphatase/phosphodiesterase 5 (putative) [Source:HGNC Symbol;Acc:13717], type=processed transcript,protein coding. Studies: tcgaGliomaGE.
u ETNK2 1:204100190-204121307 (1q32.1) ethanolamine kinase 2 [Source:HGNC Symbol;Acc:25575], type=processed transcript,protein coding,retained intron, GO=[choline kinase activity; ethanolamine kinase activity; phosphatidylethanolamine biosynthetic process; ethanolamine metabolic process; CDP-choline pathway; post-embryonic development; multicellular organism growth; in utero embryonic development; lipid biosynthetic process]. Studies: tcgaBreastGE, tscapeBC^a, tscapeProstate^a.
u ETS2 21:40177231-40196879 (21q22.2) v-ets erythroblastosis virus E26 oncogene homolog 2 (avian) [Source:HGNC Symbol;Acc:3489], type=protein coding. Studies: tcgaBreastGE, tcgaGliomaGE, tscapeProstate^d.
u EVL 14:100438151-100610572 (14q32.2) Enah/Vasp-like [Source:HGNC Symbol;Acc:20234], type=protein coding, GO=[profilin binding; lamellipodium; focal adhesion; cell-substrate junction; SH3 domain binding; basolateral plasma membrane; actin cytoskeleton organization; actin filament-based process]. Studies: tcgaBreastGE, tcgaGliomaGE, tscapeMelanoma^d.
d EXOG 3:38537618-38583652 (3p22.2) endo/exonuclease (5’-3’), endonuclease G-like [Source:HGNC Symbol;Acc:3347], type=nonsense mediated decay,protein coding,retained intron, GO=[endonuclease activity; mitochondrial part].
u EZR 6:159186773-159240444 (6q25.3) ezrin [Source:HGNC Symbol;Acc:12691], type=processed transcript,protein coding, GO=[membrane to membrane docking; uropod; cytoskeletal anchoring at plasma membrane; microvillus membrane; establishment or maintenance of apical/basal cell polarity; ruffle membrane; leukocyte cell-cell adhesion; filopodium; cell adhesion molecule binding; microtubule basal body; actin filament; actin filament binding; actin filament bundle assembly; regulation of cell shape; extrinsic to membrane; cell cortex; apical plasma membrane; basolateral plasma membrane; actin cytoskeleton organization; cell-cell adhesion; actin filament-based process; nucleolus]. Studies: tcgaBreastGE, tscapeBC^d, tscapeOvarian^d, tscapeProstate^d.
d* F12 5:176829141-176836577 (5q35.3) coagulation factor XII (Hageman factor) [Source:HGNC Symbol;Acc:3530], type=processed transcript,protein coding,retained intron, GO=[positive regulation of plasminogen activation; Factor XII activation; plasma kallikrein-kinin cascade; kinin cascade; positive regulation of fibrinolysis; response to misfolded protein; protein autoprocessing; misfolded protein binding; positive regulation of blood coagulation; blood coagulation, intrinsic pathway; regulation of coagulation; acute inflammatory response; serine-type endopeptidase activity; endopeptidase activity; blood coagulation; regulation of body fluid levels; calcium ion binding; extracellular space]. Studies: tcgaBreastGE, tcgaGliomaGE, tscapeNSCLC^a, tscapeRCC^a.
u F2RL1 5:76114758-76131140 (5q13.3) coagulation factor II (thrombin) receptor-like 1 [Source:HGNC Symbol;Acc:3538], type=protein coding, GO=[thrombin receptor activity; positive regulation of positive chemotaxis; positive regulation of leukocyte chemotaxis; regulation of coagulation; elevation of cytosolic calcium ion concentration; blood coagulation; regulation of body fluid levels]. Studies: tcgaGliomaGE, tscapeBC^d, tscapeNSCLC^d, tscapeOvarian^d, tscapeProstate^d.
u FADS1 11:61567098-61647626 (11q12.2) fatty acid desaturase 1 [Source:HGNC Symbol;Acc:3574], type=nonsense mediated decay,processed transcript,protein coding, GO=[C-5 sterol desaturase activity; response to sucrose stimulus; arachidonic acid metabolic process; very long-chain fatty acid metabolic process; icosanoid biosynthetic process; icosanoid metabolic process; response to vitamin A; heme binding; electron transport chain; iron ion binding; response to organic cyclic compound; microsome; response to insulin stimulus; vesicular fraction; response to peptide hormone stimulus; generation of precursor metabolites and energy; lipid biosynthetic process; response to hormone stimulus].
u FAM107B 10:14560556-14816896 (10p13) family with sequence similarity 107, member B [Source:HGNC Symbol;Acc:23726], type=nonsense mediated decay,processed transcript,protein coding,retained intron.
d FAM111A 11:58910221-58922512 (11q12.1) family with sequence similarity 111, member A [Source:HGNC Symbol;Acc:24725], type=processed transcript,protein coding,retained intron, GO=[serine-type endopeptidase activity; endopeptidase activity]. Studies: tcgaGliomaGE.
u FAM8A1 6:17600583-17611950 (6p22.3) family with sequence similarity 8, member A1 [Source:HGNC Symbol;Acc:16372], type=protein coding. Studies: tcgaBreastGE, tscapeBC^a, tscapeOvarian^a.
d FHOD1 16:67263292-67281561 (16q22.1) formin homology 2 domain containing 1 [Source:HGNC Symbol;Acc:17905], type=protein coding, GO=[actin cytoskeleton organization; actin filament-based process]. Studies: tscapeOvarian^d.
d FHOD3 18:33877702-34360018 (18q12.2) formin homology 2 domain containing 3 [Source:HGNC Symbol;Acc:26178], type=protein coding, GO=[actin cytoskeleton organization; actin filament-based process]. Studies: tcgaBreastGE, tcgaGliomaGE, tscapeCRC^d.
d FKBP1A 20:1349622-1373806 (20p13) FK506 binding protein 1A, 12kDa [Source:HGNC Symbol;Acc:3711], type=processed transcript,protein coding, GO=[terminal cisterna; beta-amyloid formation; regulation of protein phosphatase type 2B activity; negative regulation of protein phosphatase type 2B activity; intracellular ligand-gated calcium channel activity; ryanodine-sensitive calcium-release channel activity; protein maturation by protein folding; fibril organization; type I transforming growth factor beta receptor binding; FK506 binding; macrolide binding; calcium-release channel activity; heart trabecula formation; SMAD protein complex assembly; regulation of ryanodine-sensitive calcium-release channel activity; activin binding; protein refolding; regulation of activin receptor signaling pathway; intracellular ligand-gated ion channel activity; transforming growth factor beta receptor activity; positive regulation of protein binding; ventricular cardiac muscle tissue morphogenesis; peptidyl-prolyl cis-trans isomerase activity; SMAD binding; cardiac ventricle development; ‘de novo’ protein folding; cardiac chamber morphogenesis; positive regulation of protein ubiquitination; positive regulation of I-kappaB kinase/NF-kappaB cascade; transmembrane receptor protein serine/threonine kinase signaling pathway; cell activation; regulation of phosphate metabolic process]. Studies: tcgaBreastGE.
d FOLH1 11:49168187-49230222 (11p11.12) folate (prostate-specific membrane antigen) 1 [Source:HGNC Symbol;Acc:3788], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[dipeptidase activity; folic acid-containing compound metabolic process; carboxypeptidase activity; metallopeptidase activity]. Studies: snp3dProstateC.
u FOXO1 13:41129817-41240734 (13q14.11) forkhead box O1 [Source:HGNC Symbol;Acc:3819], type=processed transcript,protein coding, GO=[positive regulation of gluconeogenesis; negative regulation of stress-activated MAPK cascade; negative regulation of stress-activated protein kinase signaling cascade; DNA bending activity; sequence-specific enhancer binding RNA polymerase II transcription factor activity; phosphatidylinositol-mediated signaling; specific RNA polymerase II transcription factor activity; specific transcriptional repressor activity; endocrine pancreas development; double-stranded DNA binding; insulin receptor signaling pathway; negative regulation of gene-specific transcription from RNA polymerase II promoter; structure-specific DNA binding; promoter binding; glucose metabolic process; negative regulation of gene-specific transcription; nerve growth factor receptor signaling pathway; regulation of transcription factor activity; response to insulin stimulus; transcription factor complex; transcription repressor activity; transcription activator activity; cellular response to hormone stimulus; regulation of gene-specific transcription from RNA polymerase II promoter; response to peptide hormone stimulus; blood vessel development; MAPKKK cascade; vasculature development; response to hormone stimulus]. Studies: tcgaBreastGE, tcgaGliomaGE, tscapeBC^d, tscapeHCC^d, tscapeNSCLC^d, tscapeOvarian^d, tscapeProstate^d, tscapeSCLC^d.
u FSIP1 15:39892232-40075039 (15q14) fibrous sheath interacting protein 1 [Source:HGNC Symbol;Acc:21674], type=protein coding. Studies: tcgaBreastGE, tscapeCRC^d, tscapeMelanoma^d, tscapeNSCLC^d, tscapeOvarian^d.
u FSTL1 3:120111140-120170100 (3q13.33) follistatin-like 1 [Source:HGNC Symbol;Acc:3972], type=processed transcript,protein coding,retained intron, GO=[BMP signaling pathway; heparin binding; glycosaminoglycan binding; transmembrane receptor protein serine/threonine kinase signaling pathway; carbohydrate binding; calcium ion binding; extracellular space]. Studies: tcgaGliomaGE.
d FYCO1 3:45959396-46037316 (3p21.31) FYVE and coiled-coil domain containing 1 [Source:HGNC Symbol;Acc:14673], type=protein coding,retained intron.
u* FZD8 10:35927177-35930362 (10p11.21) frizzled homolog 8 (Drosophila) [Source:HGNC Symbol;Acc:4046], type=protein coding, GO=[Wnt receptor activity; Wnt-protein binding; T cell differentiation in thymus; PDZ domain binding; canonical Wnt receptor signaling pathway; regulation of gene-specific transcription from RNA polymerase II promoter; vasculature development; regulation of protein phosphorylation; regulation of phosphorylation; cell activation; regulation of phosphate metabolic process].
d GAL 11:68451247-68458643 (11q13.3) galanin prepropeptide [Source:HGNC Symbol;Acc:4114], type=processed transcript,protein coding, GO=[regulation of glucocorticoid metabolic process; growth hormone secretion; glucocorticoid metabolic process; neuropeptide hormone activity; feeding behavior; neuropeptide signaling pathway; hormone activity; insulin secretion; response to estrogen stimulus; hormone secretion; response to insulin stimulus; response to steroid hormone stimulus; response to drug; response to peptide hormone stimulus; positive regulation of apoptosis; positive regulation of cell death; response to hormone stimulus; membrane-bounded vesicle; cell activation; cytoplasmic vesicle; vesicle]. Studies: tscapeMelanoma^d, tscapeSCLC^a.
u GALNTL4 11:11292423-11643552 (11p15.3) UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase-like 4 [Source:HGNC Symbol;Acc:30488], type=processed transcript,protein coding, GO=[polypeptide N-acetylgalactosaminyltransferase activity; carbohydrate binding; Golgi membrane; Golgi apparatus part]. Studies: tcgaGliomaGE.
d GATA2 3:128198270-128212028 (3q21.3) GATA binding protein 2 [Source:HGNC Symbol;Acc:4171], type=processed transcript,protein coding, GO=[positive regulation of phagocytosis; cell fate determination; pituitary gland development; positive regulation of angiogenesis; embryonic placenta development; positive regulation of gene-specific transcription from RNA polymerase II promoter; angiogenesis; in utero embryonic development; regulation of anatomical structure morphogenesis; regulation of gene-specific transcription from RNA polymerase II promoter; blood vessel development; vasculature development; positive regulation of developmental process; blood coagulation; regulation of body fluid levels]. Studies: tcgaBreastGE.
u* GBE1 3:81538850-81811312 (3p12.2) glucan (1,4-alpha-), branching enzyme 1 [Source:HGNC Symbol;Acc:4180], type=processed transcript,protein coding,retained intron, GO=[1,4-alpha-glucan branching enzyme activity; glycogen biosynthetic process; glycogen metabolic process; energy reserve metabolic process; glucose metabolic process; generation of precursor metabolites and energy]. Studies: cosmicPrimary, tcgaGliomaGE, tscapeNSCLC^d, tscapeSCLC^d.
d GC 4:72607410-72669758 (4q13.3) group-specific component (vitamin D binding protein) [Source:HGNC Symbol;Acc:4187], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[vitamin D binding; lysosomal lumen; vitamin D metabolic process; vitamin transporter activity; lactation; response to estradiol stimulus; female pregnancy; vacuolar part; response to estrogen stimulus; axon; response to steroid hormone stimulus; perinuclear region of cytoplasm; regulation of body fluid levels; response to hormone stimulus; extracellular space].
d GFOD1 6:13358062-13487787 (6p23, 6p24.1) glucose-fructose oxidoreductase domain containing 1 [Source:HGNC Symbol;Acc:21096], type=protein coding. Studies: tcgaGliomaGE, tscapeBC^a, tscapeOvarian^a, tscapeSCLC^d.
u GK5 3:141882414-141944449 (3q23) glycerol kinase 5 (putative) [Source:HGNC Symbol;Acc:28635], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[glycerol kinase activity; glycerol metabolic process].
u* GLUD1 10:88810243-88854627 (10q23.2) glutamate dehydrogenase 1 [Source:HGNC Symbol;Acc:4335], type=processed transcript,protein coding, GO=[leucine binding; glutamate dehydrogenase [NAD(P)+] activity; glutamate dehydrogenase activity; glutamate biosynthetic process; glutamate catabolic process; NAD+ binding; glutamine family amino acid biosynthetic process; glutamine family amino acid catabolic process; ADP binding; positive regulation of insulin secretion; NAD binding; positive regulation of hormone secretion; regulation of insulin secretion; insulin secretion; regulation of hormone secretion; hormone secretion; mitochondrial matrix; GTP binding; mitochondrial part]. Studies: tcgaGliomaGE, tscapeCRC^d.
d GRAMD4 22:46971909-47075688 (22q13.31) GRAM domain containing 4 [Source:HGNC Symbol;Acc:29113], type=processed transcript,protein coding, GO=[mitochondrial part]. Studies: tscapeBC^d, tscapeCRC^d, tscapeGlioma^d, tscapeOvarian^d, tscapeSCLC^d.
u* GSTM3 1:110276554-110284384 (1p13.3) glutathione S-transferase mu 3 (brain) [Source:HGNC Symbol;Acc:4635], type=processed transcript,protein coding, GO=[establishment of blood-nerve barrier; glutathione transferase activity; glutathione metabolic process; response to estrogen stimulus; response to steroid hormone stimulus; soluble fraction; response to hormone stimulus]. Studies: tscapeBC^d, tscapeNSCLC^d.
u HARS 5:139968980-140071609 (5q31.3) histidyl-tRNA synthetase [Source:HGNC Symbol;Acc:4816], type=nonsense mediated decay,protein coding,retained intron, GO=[histidine-tRNA ligase activity; histidyl-tRNA aminoacylation].
d HBQ1 16:230452-231180 (16p13.3) hemoglobin, theta 1 [Source:HGNC Symbol;Acc:4833], type=protein coding, GO=[oxygen transport; hemoglobin complex; oxygen transporter activity; oxygen binding; heme binding; iron ion binding]. Studies: tscapeHCC^d, tscapeNSCLC^d.
d HDDC3 15:91474148-91475799 (15q26.1) HD domain containing 3 [Source:HGNC Symbol;Acc:30522], type=protein coding,retained intron. Studies: tscapeNSCLC^a.
d* HIBADH 7:27565061-27702614 (7p15.2) 3-hydroxyisobutyrate dehydrogenase [Source:HGNC Symbol;Acc:4907], type=nonsense mediated decay,protein coding,retained intron, GO=[3-hydroxyisobutyrate dehydrogenase activity; phosphogluconate dehydrogenase (decarboxylating) activity; valine metabolic process; pentose-phosphate shunt; branched chain family amino acid catabolic process; NADP metabolic process; NAD binding; glucose metabolic process; mitochondrial matrix; mitochondrial part]. Studies: tcgaGliomaGE.
u HIST1H4H 6:26281283-26285762 (6p22.2) histone cluster 1, H4h [Source:HGNC Symbol;Acc:4788], type=protein coding, GO=[negative regulation of megakaryocyte differentiation; transcription initiation factor activity; CenH3-containing nucleosome assembly at centromere; DNA replication-independent nucleosome assembly; telomere maintenance; phosphatidylinositol-mediated signaling; nucleosome; transcription initiation, DNA-dependent; chromatin; DNA metabolic process]. Studies: tcgaBreastGE, tcgaGliomaGE.
u HIST2H2BE 1:149856010-149858232 (1q21.2) histone cluster 2, H2be [Source:HGNC Symbol;Acc:4760], type=protein coding, GO=[nucleosome; chromatin]. Studies: tcgaBreastGE, tscapeBC^a, tscapeHCC^a.
u HS1BP3 2:20760208-20850857 (2p24.1) HCLS1 binding protein 3 [Source:HGNC Symbol;Acc:24979], type=nonsense mediated decay,protein coding. Studies: tcgaBreastGE, tscapeOvarian^d.
d HS3ST1 4:11394774-11431389 (4p15.33) heparan sulfate (glucosamine) 3-O-sulfotransferase 1 [Source:HGNC Symbol;Acc:5194], type=protein coding, GO=[[heparan sulfate]-glucosamine 3-sulfotransferase 1 activity; Golgi lumen; Golgi apparatus part].
d HSD11B2 16:67465041-67471456 (16q22.1) hydroxysteroid (11-beta) dehydrogenase 2 [Source:HGNC Symbol;Acc:5209], type=protein coding, GO=[11-beta-hydroxysteroid dehydrogenase activity; regulation of blood volume by renal aldosterone; glucocorticoid biosynthetic process; glucocorticoid metabolic process; response to food; regulation of systemic arterial blood pressure by hormone; regulation of systemic arterial blood pressure mediated by a chemical signal; NAD binding; response to glucocorticoid stimulus; response to corticosteroid stimulus; female pregnancy; response to hypoxia; microsome; response to insulin stimulus; vesicular fraction; response to steroid hormone stimulus; response to drug; response to peptide hormone stimulus; lipid biosynthetic process; regulation of body fluid levels; response to hormone stimulus]. Studies: tscapeOvarian^d.
d HSD3B7 16:30996519-31000473 (16p11.2) hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid delta-isomerase 7 [Source:HGNC Symbol;Acc:18324], type=protein coding, GO=[cholest-5-ene-3-beta,7-alpha-diol 3-beta-dehydrogenase activity; 3-beta-hydroxy-delta5-steroid dehydrogenase activity; bile acid biosynthetic process; lipid biosynthetic process]. Studies: tcgaGliomaGE.
d IGFBP2 2:217497551-217529159 (2q35) insulin-like growth factor binding protein 2, 36kDa [Source:HGNC Symbol;Acc:5471], type=protein coding,retained intron, GO=[insulin-like growth factor II binding; insulin-like growth factor I binding; regulation of insulin-like growth factor receptor signaling pathway; insulin-like growth factor binding; response to lithium ion; response to retinoic acid; response to mechanical stimulus; response to vitamin A; response to estradiol stimulus; response to glucocorticoid stimulus; response to corticosteroid stimulus; female pregnancy; response to estrogen stimulus; apical plasma membrane; regulation of cell growth; response to steroid hormone stimulus; regulation of cell size; cellular response to hormone stimulus; response to drug; regulation of cellular component size; regulation of anatomical structure size; response to hormone stimulus; cytoplasmic vesicle; vesicle; extracellular space]. Studies: tcgaGliomaGE, tcgaGliomaGESurv, tscapeRCC^d.
d IL17C 16:88705001-88706884 (16q24.3) interleukin 17C [Source:HGNC Symbol;Acc:5983], type=protein coding, GO=[neutrophil differentiation; soluble fraction; extracellular space]. Studies: tscapeProstate^d.
u IL18R1 2:102927989-103015218 (2q12.1) interleukin 18 receptor 1 [Source:HGNC Symbol;Acc:5988], type=protein coding,retained intron, GO=[interleukin-1 receptor activity].
u* IL6R 1:154377669-154441926 (1q21.3) interleukin 6 receptor [Source:HGNC Symbol;Acc:6019], type=processed transcript,protein coding,retained intron, GO=[ciliary neurotrophic factor binding; interleukin-6 receptor activity; hepatic immune response; interleukin-6 receptor complex; ciliary neurotrophic factor receptor activity; interleukin-6-mediated signaling pathway; ciliary neurotrophic factor-mediated signaling pathway; negative regulation of collagen biosynthetic process; negative regulation of interleukin-8 production; positive regulation of activation of activity; regulation of activation of Janus kinase activity; interleukin-6 receptor binding; monocyte chemotaxis; neutrophil mediated immunity; response to interleukin-6; positive regulation of tyrosine phosphorylation of Stat3 protein; defense response to Gram-negative bacterium; regulation of collagen biosynthetic process; positive regulation of chemokine production; positive regulation of interleukin-6 production; positive regulation of leukocyte chemotaxis; positive regulation of smooth muscle cell proliferation; positive regulation of osteoblast differentiation; defense response to Gram-positive bacterium; response to gamma radiation; positive regulation of anti-apoptosis; acute-phase response; regulation of smooth muscle cell proliferation; regulation of osteoblast differentiation; response to cAMP; acute inflammatory response; response to ionizing radiation; response to ethanol; osteoblast differentiation; endocrine pancreas development; response to glucocorticoid stimulus; response to corticosteroid stimulus; activation of protein kinase activity; response to lipopolysaccharide; apical plasma membrane; basolateral plasma membrane; response to steroid hormone stimulus; positive regulation of protein kinase activity; response to peptide hormone stimulus; MAPKKK cascade; protein homodimerization activity; regulation of protein kinase activity; positive regulation of developmental process; regulation of protein phosphorylation; response to hormone stimulus; regulation of phosphorylation; regulation of phosphate metabolic process; extracellular space]. Studies: tscapeHCC^a, tscapeNSCLC^a.
d ILDR1 3:121706170-121741051 (3q13.33) immunoglobulin-like domain containing receptor 1 [Source:HGNC Symbol;Acc:28741], type=processed transcript,protein coding.
d IRF1 5:131817301-131826490 (5q31.1) interferon regulatory factor 1 [Source:HGNC Symbol;Acc:6116], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[CD8-positive, alpha-beta T cell differentiation; positive regulation of interleukin-12 biosynthetic process; type I interferon-mediated signaling pathway; cellular response to interferon-gamma; response to interferon-gamma; blood coagulation; regulation of body fluid levels; cell activation]. Studies: tcgaBreastGE, tscapeBC^d, tscapeNSCLC^d.
u IRX3 16:54317216-54320378 (16q12.2) iroquois homeobox 3 [Source:HGNC Symbol;Acc:14360], type=protein coding.
u IRX5 16:54964699-54968397 (16q12.2) iroquois homeobox 5 [Source:HGNC Symbol;Acc:14361], type=protein coding, GO=[retinal bipolar neuron differentiation; vitamin D binding; neuron maturation; regulation of heart rate; visual perception].
u* ITPR3 6:33588522-33664351 (6p21.31) inositol 1,4,5-triphosphate receptor, type 3 [Source:HGNC Symbol;Acc:6182], type=protein coding, GO=[inositol hexakisphosphate binding; inositol 1,4,5 trisphosphate binding; inositol 1,3,4,5 tetrakisphosphate binding; inositol 1,4,5-trisphosphate-sensitive calcium-release channel activity; sensory perception of sweet taste; sensory perception of umami taste; platelet dense tubular network membrane; sensory perception of bitter taste; calcium-release channel activity; myelin sheath; intracellular ligand-gated ion channel activity; nuclear outer membrane; brush border; protein heterooligomerization; response to calcium ion; activation of phospholipase C activity; positive regulation of phospholipase activity; regulation of insulin secretion; protein homooligomerization; insulin secretion; elevation of cytosolic calcium ion concentration; regulation of hormone secretion; energy reserve metabolic process; hormone secretion; nerve growth factor receptor signaling pathway; platelet activation; microsome; vesicular fraction; generation of precursor metabolites and energy; nucleolus; blood coagulation; regulation of body fluid levels; cell activation]. Studies: tcgaBreastGE, tcgaGliomaGE.
u ITPRIP 10:106071894-106098162 (10q25.1) inositol 1,4,5-triphosphate receptor interacting protein [Source:HGNC Symbol;Acc:29370], type=processed transcript,protein coding. Studies: tcgaGliomaGE.
u* JUN 1:59246465-59249785 (1p32.1) jun proto-oncogene [Source:HGNC Symbol;Acc:6204], type=protein coding, GO=[leading edge cell differentiation; cellular response to potassium ion starvation; positive regulation of monocyte differentiation; positive regulation by host of viral transcription; negative regulation by host of viral transcription; negative regulation of protein autophosphorylation; R-SMAD binding; cellular response to calcium ion; modification by host of symbiont morphology or physiology; SMAD protein import into nucleus; regulation of protein autophosphorylation; SMAD protein signal transduction; release of cytochrome c from mitochondria; positive regulation of neuron apoptosis; Rho GTPase activator activity; positive regulation of smooth muscle cell proliferation; positive regulation of fibroblast proliferation; positive regulation of DNA replication; transcriptional repressor complex; SMAD binding; regulation of smooth muscle cell proliferation; membrane depolarization; response to cAMP; MyD88-independent toll-like receptor signaling pathway; toll-like receptor 3 signaling pathway; toll-like receptor 1 signaling pathway; toll-like receptor 2 signaling pathway; learning; response to calcium ion; MyD88-dependent toll-like receptor signaling pathway; circadian rhythm; response to hydrogen peroxide; nuclear chromatin; sequence-specific enhancer binding RNA polymerase II transcription factor activity; Toll signaling pathway; response to mechanical stimulus; toll-like receptor 4 signaling pathway; negative regulation of DNA binding; response to reactive oxygen species; transforming growth factor beta receptor signaling pathway; double-stranded DNA binding; response to lipopolysaccharide; positive regulation of gene-specific transcription from RNA polymerase II promoter; structure-specific DNA binding; promoter binding; transmembrane receptor protein serine/threonine kinase signaling pathway; transcription coactivator activity; response to organic cyclic compound; regulation of transcription factor activity; GTPase activator activity; transcription factor complex; DNA replication; chromatin; angiogenesis; transcription repressor activity; transcription activator activity; response to drug; regulation of gene-specific transcription from RNA polymerase II promoter; blood vessel development; MAPKKK cascade; vasculature development; protein homodimerization activity; positive regulation of developmental process; positive regulation of apoptosis; positive regulation of cell death; regulation of protein phosphorylation; regulation of phosphorylation; regulation of phosphate metabolic process; DNA metabolic process]. Studies: tcgaBreastGE, tcgaGliomaGE.
d KAZALD1 10:102820999-102827888 (10q24.31) Kazal-type serine peptidase inhibitor domain 1 [Source:HGNC Symbol;Acc:25460], type=processed transcript,protein coding, GO=[interstitial matrix; insulin-like growth factor binding; regulation of cell growth; regulation of cell size; regulation of cellular component size; regulation of anatomical structure size]. Studies: tcgaBreastGE, tscapeCRC^d.
u KIAA0040 1:175126123-175162079 (1q25.1) KIAA0040 [Source:HGNC Symbol;Acc:28950], type=lincRNA,processed transcript. Studies: tcgaBreastGE, tcgaGliomaGE, tcgaGliomaGESurv.
d KLF13 15:31619058-31670102 (15q13.3) Kruppel-like factor 13 [Source:HGNC Symbol;Acc:13672], type=protein coding, GO=[regulation of erythrocyte differentiation]. Studies: tcgaGliomaGE.
u KLF15 3:126061478-126076285 (3q21.3) Kruppel-like factor 15 [Source:HGNC Symbol;Acc:14536], type=processed transcript,protein coding, GO=[glucose transport; hexose transport]. Studies: tcgaBreastGE.
u KLF9 9:72999503-73029540 (9q21.12) Kruppel-like factor 9 [Source:HGNC Symbol;Acc:1123], type=protein coding, GO=[progesterone receptor signaling pathway; embryo implantation; female pregnancy]. Studies: tcgaBreastGE.
d KLHDC5 12:27932953-27955973 (12p11.22) kelch domain containing 5 [Source:HGNC Symbol;Acc:29252], type=nonsense mediated decay,protein coding,retained intron. Studies: tcgaGliomaGE, tscapeCRC^a, tscapeSCLC^a.
u KRT16 17:39766030-39769005 (17q21.2) keratin 16 [Source:HGNC Symbol;Acc:6423], type=protein coding, GO=[intermediate filament cytoskeleton organization; structural constituent of cytoskeleton; intermediate filament; intermediate filament cytoskeleton; epidermis development]. Studies: tscapeBC^d, tscapeOvarian^d.
u KRT19 17:39679869-39684560 (17q21.2) keratin 19 [Source:HGNC Symbol;Acc:6436], type=protein coding,retained intron, GO=[costamere; cell differentiation involved in embryonic placenta development; sarcomere organization; structural constituent of muscle; Z disc; embryonic placenta development; structural constituent of cytoskeleton; sarcolemma; response to estrogen stimulus; intermediate filament; intermediate filament cytoskeleton; in utero embryonic development; response to steroid hormone stimulus; actin cytoskeleton organization; actin filament-based process; response to hormone stimulus]. Studies: snp3dMetastasis, tscapeBC^d, tscapeOvarian^d.
u KRTAP13-2 21:31743709-31744557 (21q22.11) keratin associated protein 13-2 [Source:HGNC Symbol;Acc:18923], type=protein coding, GO=[intermediate filament; intermediate filament cytoskeleton].
u LACTB2 8:71547553-71581409 (8q13.3) lactamase, beta 2 [Source:HGNC Symbol;Acc:18512], type=processed transcript,protein coding,retained intron. Studies: tscapeBC^a.
u LDLRAD3 11:35965531-36253686 (11p13) low density lipoprotein receptor class A domain containing 3 [Source:HGNC Symbol;Acc:27046], type=processed transcript,protein coding. Studies: tcgaGliomaGE, tscapeNSCLC^a.
d LGALS3BP 17:76967336-76976061 (17q25.3) lectin, galactoside-binding, soluble, 3 binding protein [Source:HGNC Symbol;Acc:6564], type=protein coding, GO=[scavenger receptor activity; cellular defense response; extracellular space]. Studies: tcgaGliomaGE.
d LGALS4 19:39292312-39303740 (19q13.2) lectin, galactoside-binding, soluble, 4 [Source:HGNC Symbol;Acc:6565], type=protein coding, GO=[carbohydrate binding]. Studies: tscapeOvarian^a.
d LIMA1 12:50569565-50677353 (12q13.12) LIM domain and actin binding 1 [Source:HGNC Symbol;Acc:24636], type=protein coding, GO=[actin monomer binding; ruffle organization; negative regulation of actin filament depolymerization; stress fiber; actin filament binding; actin filament bundle assembly; focal adhesion; cell-substrate junction; basolateral plasma membrane; actin cytoskeleton organization; actin filament-based process; regulation of cellular component size; regulation of anatomical structure size]. Studies: tcgaGliomaGE, tscapeRCC^a.
d LIN7A 12:81191175-81331694 (12q21.31) lin-7 homolog A (C. elegans) [Source:HGNC Symbol;Acc:17787], type=protein coding, GO=[synaptic vesicle transport; PDZ domain binding; tight junction; postsynaptic density; neurotransmitter secretion; synaptosome; postsynaptic membrane; cell-cell junction; basolateral plasma membrane]. Studies: tcgaGliomaGE.
d LINGO1 15:77905369-77924709 (15q24.3) leucine rich repeat and Ig domain containing 1 [Source:HGNC Symbol;Acc:21205], type=protein coding, GO=[negative regulation of oligodendrocyte differentiation; epidermal growth factor receptor binding; negative regulation of axonogenesis; central nervous system neuron development; protein kinase B signaling cascade; regulation of axonogenesis; regulation of cell morphogenesis involved in differentiation; nerve growth factor receptor signaling pathway; regulation of anatomical structure morphogenesis]. Studies: tcgaBreastGE, tcgaGliomaGE.
d LRBA 4:151185594-151936879 (4q31.3) LPS-responsive vesicle trafficking, beach and anchor containing [Source:HGNC Symbol;Acc:1742], type=processed transcript,protein coding,retained intron. Studies: tcgaBreastGE, tscapeRCC^d.
d LRRC26 9:140063210-140064503 (9q34.3) leucine rich repeat containing 26 [Source:HGNC Symbol;Acc:31409], type=protein coding, GO=[potassium channel regulator activity; voltage-gated potassium channel complex]. Studies: tcgaGliomaGE, tscapeCRC^a.
d LRRTM4 2:76974845-77820445 (2p12) leucine rich repeat transmembrane neuronal 4 [Source:HGNC Symbol;Acc:19411], type=processed transcript,protein coding. Studies: tcgaGliomaGE.
u MAK 6:10762956-10838764 (6p24.2) male germ cell-associated kinase [Source:HGNC Symbol;Acc:6816], type=processed transcript,protein coding, GO=[cyclin-dependent protein kinase activity; spermatogenesis].
d MAL2 8:120177273-120257913 (8q24.12) mal, T-cell differentiation protein 2 (gene/pseudogene) [Source:HGNC Symbol;Acc:13634], type=processed transcript. Studies: tcgaBreastGE, tcgaGliomaGE, tscapeHCC^a, tscapeNSCLC^a, tscapeOvarian^a.
d MATN2 8:98881068-99048944 (8q22.1, 8q22.2) matrilin 2 [Source:HGNC Symbol;Acc:6908], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[calcium ion binding]. Studies: cosmicPrimary, tcgaBreastGE, tcgaGliomaGE, tscapeBC^a.
d METRN 16:765162-767480 (16p13.3) meteorin, glial cell differentiation regulator [Source:HGNC Symbol;Acc:14151], type=protein coding, GO=[positive regulation of axonogenesis; regulation of axonogenesis; regulation of cell morphogenesis involved in differentiation; regulation of anatomical structure morphogenesis; positive regulation of developmental process; extracellular space]. Studies: tcgaBreastGE, tscapeHCC^d.
d MMP15 16:58059470-58080805 (16q21) matrix metallopeptidase 15 (membrane-inserted) [Source:HGNC Symbol;Acc:7161], type=protein coding, GO=[metalloendopeptidase activity; metallopeptidase activity; endopeptidase activity; calcium ion binding]. Studies: tcgaBreastGESurv.
d MOSC2 1:220921567-220958150 (1q41) MOCO sulphurase C-terminal domain containing 2 [Source:HGNC Symbol;Acc:26064], type=nonsense mediated decay,processed transcript,protein coding, GO=[molybdenum ion binding; pyridoxal phosphate binding; mitochondrial outer membrane; mitochondrial part].
u MPZL2 11:118124118-118135251 (11q23.3) myelin protein zero-like 2 [Source:HGNC Symbol;Acc:3496], type=processed transcript,protein coding,retained intron, GO=[homophilic cell adhesion; cell-cell adhesion]. Studies: tscapeBC^d.

d | MT1F | 16:56691855-56693214 (16q12.2) | metallothionein 1F [Source:HGNC Symbol;Acc:7398], type=protein coding, GO=[cadmium ion binding; copper ion binding] | tscapeMelanomad
d | MUC13 | 3:124624289-124672663 (3q21.2) | mucin 13, cell surface associated [Source:HGNC Symbol;Acc:7511], type=processed transcript,protein coding,retained intron, GO=[calcium ion binding] |
d | MYO6 | 6:76458909-76629254 (6q14.1) | myosin VI [Source:HGNC Symbol;Acc:7605], type=processed transcript,protein coding, GO=[minus-end directed microfilament motor activity; unconventional myosin complex; filamentous actin; clathrin-coated endocytic vesicle; ADP binding; ruffle membrane; actin filament; coated pit; actin filament binding; actin filament-based movement; myosin complex; DNA-directed RNA polymerase II, holoenzyme; clathrin coated vesicle membrane; DNA damage response, signal transduction by p53 class mediator; sensory perception of sound; calmodulin binding; cell cortex; clathrin-coated vesicle; coated vesicle; cytoplasmic vesicle membrane; vesicle membrane; cytoplasmic vesicle part; perinuclear region of cytoplasm; actin filament-based process; membrane-bounded vesicle; cytoplasmic vesicle; vesicle] | tcgaBreastGE
d | MYT1 | 20:62783144-62873604 (20q13.33) | myelin transcription factor 1 [Source:HGNC Symbol;Acc:7622], type=protein coding | tscapeNSCLCa
d* | NCAM2 | 21:22370633-22915650 (21q21.1) | neural cell adhesion molecule 2 [Source:HGNC Symbol;Acc:7657], type=processed transcript,protein coding, GO=[neuron cell-cell adhesion; axon; cell-cell adhesion] | tcgaBreastGE, tcgaGliomaGE, tscapeNSCLCd
d | NCOR2 | 12:124808961-125052135 (12q24.31) | nuclear receptor corepressor 2 [Source:HGNC Symbol;Acc:7673], type=protein coding,retained intron, GO=[regulation of cellular ketone metabolic process by negative regulation of transcription from an RNA polymerase II promoter; Notch binding; transcriptional repressor complex; binding; protein N-terminus binding; transcription corepressor activity; negative regulation of gene-specific transcription from RNA polymerase II promoter; negative regulation of gene-specific transcription; transcription repressor activity; regulation of gene-specific transcription from RNA polymerase II promoter] |
d | NCRNA00245 | 10:77161277-77168740 (10q22.2) | non-protein coding RNA 245 [Source:HGNC Symbol;Acc:23525], type=processed transcript,protein coding |
d | NDUFS2 | 1:161166894-161184185 (1q23.3) | NADH dehydrogenase (ubiquinone) Fe-S protein 2, 49kDa (NADH-coenzyme Q reductase) [Source:HGNC Symbol;Acc:7708], type=processed transcript,protein coding, GO=[quinone binding; 4 iron, 4 sulfur cluster binding; NADH dehydrogenase (ubiquinone) activity; NADH dehydrogenase activity; mitochondrial respiratory chain complex I; mitochondrial electron transport, NADH to ubiquinone; NAD binding; electron transport chain; electron carrier activity; generation of precursor metabolites and energy; mitochondrial part] | tcgaBreastGE
u | NET1 | 10:5454514-5500426 (10p15.1) | neuroepithelial cell transforming 1 [Source:HGNC Symbol;Acc:14592], type=processed transcript,protein coding, GO=[Rho guanyl-nucleotide exchange factor activity; Ras guanyl-nucleotide exchange factor activity; induction of apoptosis by extracellular signals; nerve growth factor receptor signaling pathway; regulation of cell growth; regulation of cell size; Ras protein signal transduction; regulation of small GTPase mediated signal transduction; induction of apoptosis; regulation of cellular component size; regulation of anatomical structure size; positive regulation of apoptosis; positive regulation of cell death] | tcgaGliomaGE, tcgaGliomaGESurv
u | NFIL3 | 9:94171327-94186144 (9q22.31) | nuclear factor, interleukin 3 regulated [Source:HGNC Symbol;Acc:7787], type=protein coding, GO=[circadian rhythm; transcription corepressor activity] | tcgaBreastGE
u | NPPC | 2:232786530-232791113 (2q37.1) | natriuretic peptide C [Source:HGNC Symbol;Acc:7941], type=protein coding, GO=[growth plate cartilage chondrocyte differentiation; growth plate cartilage chondrocyte proliferation; positive regulation of cGMP biosynthetic process; peptide hormone receptor binding; receptor guanylyl cyclase signaling pathway; positive regulation of vasodilation; positive regulation of osteoblast differentiation; regulation of vasoconstriction; negative regulation of DNA metabolic process; regulation of smooth muscle cell proliferation; vasoconstriction; regulation of osteoblast differentiation; regulation of multicellular organism growth; post-embryonic development; response to ethanol; hormone activity; osteoblast differentiation; multicellular organism growth; response to hypoxia; response to drug; protein homodimerization activity; positive regulation of developmental process; regulation of anatomical structure size; DNA metabolic process; extracellular space] | tscapeRCCd
d* | NTPCR | 1:233086351-233114548 (1q42.2) | nucleoside-triphosphatase, cancer-related [Source:HGNC Symbol;Acc:28204], type=processed transcript,protein coding | tcgaBreastGE
d | NUAK2 | 1:205271187-205290883 (1q32.1) | NUAK family, SNF1-like kinase, 2 [Source:HGNC Symbol;Acc:29558], type=protein coding, GO=[cellular response to glucose starvation; magnesium ion binding; actin cytoskeleton organization; actin filament-based process] | tcgaBreastGE, tscapeBCa
u | NUDT11 | X:51232863-51239448 (Xp11.22) | nudix (nucleoside diphosphate linked moiety X)-type motif 11 [Source:HGNC Symbol;Acc:18011], type=protein coding, GO=[diphosphoinositol-polyphosphate diphosphatase activity] | tcgaGliomaGE, tscapeOvariana
d | NUP54 | 4:77035812-77069668 (4q21.1) | nucleoporin 54kDa [Source:HGNC Symbol;Acc:17359], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[nucleocytoplasmic transporter activity; nuclear pore; regulation of glucose transport; glucose transport; hexose transport; mRNA transport] | tscapeHCCd
d | OAF | 11:120081475-120101001 (11q23.3) | OAF homolog (Drosophila) [Source:HGNC Symbol;Acc:28752], type=processed transcript,protein coding | tscapeBCd
u | OBFC2A | 2:192542794-192553251 (2q32.3) | oligonucleotide/oligosaccharide-binding fold containing 2A [Source:HGNC Symbol;Acc:26232], type=nonsense mediated decay,protein coding,retained intron, GO=[SOSS complex; G2/M transition checkpoint; double-strand break repair via homologous recombination; single-stranded DNA binding; response to ionizing radiation; structure-specific DNA binding; negative regulation of cell cycle; DNA metabolic process] |
d | ODAM | 4:71062213-71070293 (4q13.3) | odontogenic, ameloblast asssociated [Source:HGNC Symbol;Acc:26043], type=nonsense mediated decay,protein coding,retained intron, GO=[fibril; odontogenesis of dentine-containing tooth; biomineral tissue development] |
d | OGDHL | 10:50942689-50970425 (10q11.23) | oxoglutarate dehydrogenase-like [Source:HGNC Symbol;Acc:25590], type=processed transcript,protein coding, GO=[oxoglutarate dehydrogenase (succinyl-transferring) activity; thiamine pyrophosphate binding; glycolysis; glucose metabolic process; mitochondrial matrix; generation of precursor metabolites and energy; mitochondrial part] | tcgaGliomaGE, tcgaOvarianGE
u | OLAH | 10:15074226-15115851 (10p13) | oleoyl-ACP hydrolase [Source:HGNC Symbol;Acc:25625], type=processed transcript,protein coding, GO=[oleoyl-[acyl-carrier-protein] hydrolase activity; lipid biosynthetic process] |
u | PACSIN1 | 6:34433916-34503006 (6p21.31) | protein kinase C and casein kinase substrate in neurons 1 [Source:HGNC Symbol;Acc:8570], type=processed transcript,protein coding, GO=[COPI-coated vesicle; negative regulation of endocytosis; Golgi-associated vesicle; coated vesicle; actin cytoskeleton organization; actin filament-based process; Golgi apparatus part; membrane-bounded vesicle; cytoplasmic vesicle; vesicle] | tcgaBreastGE, tcgaGliomaGE
u | PAK1IP1 | 6:10694928-10710015 (6p24.2) | PAK1 interacting protein 1 [Source:HGNC Symbol;Acc:20882], type=protein coding | tcgaOvarianGE, tscapeBCa, tscapeOvariana, tscapeProstated, tscapeSCLCd
u | PARP4 | 13:24995064-25086948 (13q12.12) | poly (ADP-ribose) polymerase family, member 4 [Source:HGNC Symbol;Acc:271], type=processed transcript,protein coding,retained intron, GO=[NAD+ ADP-ribosyltransferase activity; protein ADP-ribosylation; spindle microtubule; response to drug; DNA metabolic process] | tcgaGliomaGE, tscapeBCd, tscapeSCLCd
d* | PDE2A | 11:72287185-72385635 (11q13.4) | phosphodiesterase 2A, cGMP-stimulated [Source:HGNC Symbol;Acc:8777], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[cGMP-stimulated cyclic-nucleotide phosphodiesterase activity; cGMP catabolic process; cAMP catabolic process; cGMP binding; 3',5'-cyclic-GMP phosphodiesterase activity; 3',5'-cyclic-AMP phosphodiesterase activity; cAMP binding; AMP binding; platelet activation; protein homodimerization activity; blood coagulation; regulation of body fluid levels; cell activation] | tcgaBreastGE, tcgaGliomaGE
d | PDGFRL | 8:17433942-17501580 (8p22) | platelet-derived growth factor receptor-like [Source:HGNC Symbol;Acc:8805], type=processed transcript,protein coding, GO=[platelet-derived growth factor beta-receptor activity; platelet activating factor receptor activity] |
u* | PGF | 14:75408540-75422291 (14q24.3) | placental growth factor [Source:HGNC Symbol;Acc:8893], type=protein coding, GO=[vascular endothelial growth factor receptor signaling pathway; regulation of morphogenesis of a branching structure; positive regulation of endothelial cell proliferation; positive regulation of cell division; branching involved in ureteric bud morphogenesis; heparin binding; female pregnancy; glycosaminoglycan binding; growth factor activity; response to hypoxia; protein heterodimerization activity; angiogenesis; cellular response to hormone stimulus; regulation of anatomical structure morphogenesis; response to drug; carbohydrate binding; blood vessel development; vasculature development; protein homodimerization activity; response to hormone stimulus; extracellular space] | tcgaGliomaGE
u* | PHKA1 | X:71798664-71934167 (Xq13.1, Xq13.2) | phosphorylase kinase, alpha 1 (muscle) [Source:HGNC Symbol;Acc:8925], type=protein coding, GO=[glucan 1,4-alpha-glucosidase activity; phosphorylase kinase complex; phosphorylase kinase activity; glycogen catabolic process; glycogen metabolic process; calmodulin binding; energy reserve metabolic process; glucose metabolic process; generation of precursor metabolites and energy] | tcgaBreastGE
d | PHLDA1 | 12:76419227-76425556 (12q21.2) | pleckstrin homology-like domain, family A, member 1 [Source:HGNC Symbol;Acc:8933], type=protein coding, GO=[FasL biosynthetic process; cytoplasmic vesicle membrane; vesicle membrane; cytoplasmic vesicle part; induction of apoptosis; positive regulation of apoptosis; positive regulation of cell death; membrane-bounded vesicle; cytoplasmic vesicle; vesicle] | tcgaGliomaGE
u | PHLDA2 | 11:2949503-2950685 (11p15.4) | pleckstrin homology-like domain, family A, member 2 [Source:HGNC Symbol;Acc:12385], type=protein coding | tcgaBreastGE, tcgaGliomaGE, tscapeBCd, tscapeNSCLCd, tscapeOvariand
u* | PIAS1 | 15:68346572-68480402 (15q23) | protein inhibitor of activated STAT, 1 [Source:HGNC Symbol;Acc:2752], type=protein coding, GO=[SUMO ligase activity; positive regulation of protein sumoylation; regulation of interferon-gamma-mediated signaling pathway; regulation of response to interferon-gamma; androgen receptor binding; androgen receptor signaling pathway; cellular response to interferon-gamma; response to interferon-gamma; nuclear speck; transcription corepressor activity; transcription coactivator activity] | tcgaGliomaGE
u* | PIK3C2A | 11:17099277-17229530 (11p15.1) | phosphoinositide-3-kinase, class 2, alpha polypeptide [Source:HGNC Symbol;Acc:8971], type=nonsense mediated decay,processed transcript,protein coding, GO=[phosphatidylinositol-4-phosphate 3-kinase activity; 1-phosphatidylinositol-3-kinase activity; phosphatidylinositol 3-kinase complex; phosphatidylinositol phosphorylation; phosphatidylinositol biosynthetic process; phosphatidylinositol-mediated signaling; clathrin-coated vesicle; coated vesicle; lipid biosynthetic process; membrane-bounded vesicle; cytoplasmic vesicle; vesicle] | cosmicRecurrent
u* | PIK3CD | 1:9711790-9789172 (1p36.22) | phosphoinositide-3-kinase, catalytic, delta polypeptide [Source:HGNC Symbol;Acc:8977], type=processed transcript,protein coding, GO=[phosphatidylinositol-4,5-bisphosphate 3-kinase activity; 1-phosphatidylinositol-3-kinase activity; phosphatidylinositol 3-kinase complex; phosphatidylinositol phosphorylation; B cell homeostasis; phosphatidylinositol-mediated signaling; B cell activation; cell activation] | tscapeBCd, tscapeCRCd, tscapeHCCd, tscapeNSCLCd, tscapeOvariana, tscapeOvariand, tscapeRCCd
d | PIK3IP1 | 22:31677579-31688520 (22q12.2) | phosphoinositide-3-kinase interacting protein 1 [Source:HGNC Symbol;Acc:24942], type=protein coding,retained intron |
d* | PIM1 | 6:37137979-37143202 (6p21.2) | pim-1 oncogene [Source:HGNC Symbol;Acc:8986], type=processed transcript,protein coding, GO=[positive regulation of cyclin-dependent protein kinase activity involved in G1/S; regulation of cyclin-dependent protein kinase activity involved by G1/S; positive regulation of cyclin-dependent protein kinase activity; manganese ion binding; regulation of cyclin-dependent protein kinase activity; negative regulation of transcription factor activity; negative regulation of DNA binding; regulation of transcription factor activity; regulation of protein serine/threonine kinase activity; positive regulation of protein kinase activity; regulation of protein kinase activity; regulation of protein phosphorylation; regulation of phosphorylation; regulation of phosphate metabolic process] |
u | PLCXD1 | X:192989-220023 (Xp22.33) | phosphatidylinositol-specific phospholipase C, X domain containing 1 [Source:HGNC Symbol;Acc:23148], type=processed transcript,protein coding, GO=[phospholipase C activity] | tscapeNSCLCd, tscapeSCLCd
d | PLEKHG4 | 16:67311413-67323402 (16q22.1) | pleckstrin homology domain containing, family G (with RhoGef domain) member 4 [Source:HGNC Symbol;Acc:24501], type=protein coding, GO=[Rho guanyl-nucleotide exchange factor activity; Ras guanyl-nucleotide exchange factor activity; Ras protein signal transduction; regulation of small GTPase mediated signal transduction] |
d | PLXDC2 | 10:20105168-20569286 (10p12.31) | plexin domain containing 2 [Source:HGNC Symbol;Acc:21013], type=protein coding, GO=[cell-matrix adhesion] |
u | PLXNB1 | 3:48445261-48471594 (3p21.31) | plexin B1 [Source:HGNC Symbol;Acc:9103], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[semaphorin receptor complex; semaphorin-plexin signaling pathway; semaphorin receptor binding; semaphorin receptor activity; positive regulation of axonogenesis; regulation of cell shape; regulation of axonogenesis; regulation of cell morphogenesis involved in differentiation; GTPase activator activity; regulation of anatomical structure morphogenesis; positive regulation of developmental process] |
d | POLB | 8:42195972-42229326 (8p11.21) | polymerase (DNA directed), beta [Source:HGNC Symbol;Acc:9174], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[base-excision repair, gap-filling; pyrimidine dimer repair; DNA-(apurinic or apyrimidinic site) lyase activity; DNA-directed DNA polymerase activity; spindle microtubule; damaged DNA binding; microtubule binding; DNA-dependent DNA replication; response to ethanol; DNA replication; DNA metabolic process] | tscapeCRCa, tscapeHCCd, tscapeNSCLCa, tscapeNSCLCd, tscapeOvariana, tscapeProstatea, tscapeRCCd, tscapeSCLCa
u | PPM1H | 12:63037766-63328930 (12q14.1, 12q14.2) | protein phosphatase, Mg2+/Mn2+ dependent, 1H [Source:HGNC Symbol;Acc:18583], type=protein coding, GO=[phosphoprotein phosphatase activity] | tcgaBreastGE, tcgaGliomaGE
d | PPP1R14B | 11:64011956-64014413 (11q13.1) | protein phosphatase 1, regulatory (inhibitor) subunit 14B [Source:HGNC Symbol;Acc:9057], type=protein coding, GO=[protein phosphatase inhibitor activity; protein phosphatase regulator activity; regulation of phosphorylation; regulation of phosphate metabolic process] | tcgaBreastGE, tcgaGliomaGE
d | PPP2R2A | 8:26149007-26230196 (8p21.2) | protein phosphatase 2, regulatory subunit B, alpha [Source:HGNC Symbol;Acc:9304], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[protein phosphatase type 2A regulator activity; protein phosphatase type 2A complex; response to morphine; response to isoquinoline alkaloid; protein phosphatase regulator activity; protein serine/threonine phosphatase activity; response to alkaloid; protein dephosphorylation; phosphoprotein phosphatase activity; response to organic cyclic compound] | tscapeBCd, tscapeOvariand, tscapeProstated
d | PROM2 | 2:95940201-95957056 (2q11.1) | prominin 2 [Source:HGNC Symbol;Acc:20685], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[prominosome; cilium membrane; microvillus membrane; apical plasma membrane; basolateral plasma membrane; membrane-bounded vesicle; vesicle] |
u | PRPS2 | X:12809474-12842341 (Xp22.2) | phosphoribosyl pyrophosphate synthetase 2 [Source:HGNC Symbol;Acc:9465], type=processed transcript,protein coding, GO=[AMP biosynthetic process; ribose phosphate diphosphokinase activity; ribose phosphate metabolic process; pentose metabolic process; ADP binding; GDP binding; AMP binding; organ regeneration; nucleoside metabolic process; magnesium ion binding; soluble fraction; carbohydrate binding; protein homodimerization activity] |
u | PTGES | 9:132500610-132515326 (9q34.11) | prostaglandin E synthase [Source:HGNC Symbol;Acc:9599], type=processed transcript,protein coding, GO=[prostaglandin-E synthase activity; prostaglandin metabolic process; icosanoid metabolic process] | tscapeOvariand
u | PTPN1 | 20:49126891-49201299 (20q13.13) | protein tyrosine phosphatase, non-receptor type 1 [Source:HGNC Symbol;Acc:9642], type=protein coding, GO=[negative regulation of insulin receptor signaling pathway; regulation of interferon-gamma-mediated signaling pathway; regulation of response to interferon-gamma; insulin receptor binding; regulation of type I interferon-mediated signaling pathway; type I interferon-mediated signaling pathway; cellular response to interferon-gamma; response to interferon-gamma; protein dephosphorylation; insulin receptor signaling pathway; phosphoprotein phosphatase activity; response to insulin stimulus; cellular response to hormone stimulus; response to peptide hormone stimulus; blood coagulation; regulation of body fluid levels; response to hormone stimulus; cytoplasmic vesicle; vesicle] | tcgaBreastGE, tcgaGliomaGE, tscapeNSCLCd
d* | PYGL | 14:51324609-51411248 (14q22.1) | phosphorylase, glycogen, liver [Source:HGNC Symbol;Acc:9725], type=nonsense mediated decay,protein coding,retained intron, GO=[5-phosphoribose 1-diphosphate biosynthetic process; glycogen phosphorylase activity; purine base binding; ribose phosphate metabolic process; bile acid binding; glucose binding; pentose metabolic process; glycogen catabolic process; AMP binding; pyridoxal phosphate binding; glycogen metabolic process; glucose homeostasis; carbohydrate homeostasis; energy reserve metabolic process; glucose metabolic process; soluble fraction; carbohydrate binding; protein homodimerization activity; generation of precursor metabolites and energy] | tcgaGliomaGE, tscapeBCa, tscapeProstatea
d | RAI14 | 5:34656342-34832732 (5p13.2) | retinoic acid induced 14 [Source:HGNC Symbol;Acc:14873], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[cell cortex] | tcgaBreastGE, tcgaGliomaGE
u | RAP2A | 13:98086476-98120244 (13q32.1) | RAP2A, member of RAS oncogene family [Source:HGNC Symbol;Acc:9861], type=nonsense mediated decay,protein coding, GO=[Rap protein signal transduction; positive regulation of protein autophosphorylation; regulation of protein autophosphorylation; recycling endosome membrane; actin cytoskeleton reorganization; regulation of dendrite morphogenesis; negative regulation of cell migration; negative regulation of locomotion; regulation of cell morphogenesis involved in differentiation; endosome membrane; GTPase activity; actin cytoskeleton organization; regulation of anatomical structure morphogenesis; GTP binding; actin filament-based process; Ras protein signal transduction; MAPKKK cascade; regulation of protein phosphorylation; regulation of phosphorylation; regulation of phosphate metabolic process] | tcgaGliomaGE, tscapeBCa, tscapeSCLCa
d | RARRES3 | 11:63304281-63313934 (11q12.3) | retinoic acid receptor responder (tazarotene induced) 3 [Source:HGNC Symbol;Acc:9869], type=processed transcript,protein coding | tcgaBreastGE, tcgaGliomaGE
u | RBBP7 | X:16857406-16888537 (Xp22.2) | retinoblastoma binding protein 7 [Source:HGNC Symbol;Acc:9890], type=protein coding,retained intron, GO=[cellular heat acclimation; heat acclimation; ESC/E(Z) complex; NuRD complex; CenH3-containing nucleosome assembly at centromere; DNA replication-independent nucleosome assembly; transcriptional repressor complex; regulation of cell growth; DNA replication; regulation of cell size; regulation of cellular component size; regulation of anatomical structure size; DNA metabolic process] | tcgaBreastGE, tcgaGliomaGE, tscapeHCCa
u | RBM24 | 6:17281577-17294106 (6p22.3) | RNA binding motif protein 24 [Source:HGNC Symbol;Acc:21539], type=processed transcript,protein coding | tcgaGliomaGE, tscapeBCa, tscapeOvariana
d | RCOR2 | 11:63678693-63684316 (11q13.1) | REST corepressor 2 [Source:HGNC Symbol;Acc:27455], type=processed transcript,protein coding,retained intron, GO=[transcription corepressor activity; transcription factor complex; chromatin] |
u | REN | 1:204123944-204135465 (1q32.1) | renin [Source:HGNC Symbol;Acc:9958], type=protein coding, GO=[response to cGMP; angiotensin maturation; insulin-like growth factor receptor binding; mesonephros development; aspartic-type endopeptidase activity; regulation of systemic arterial blood pressure by hormone; regulation of systemic arterial blood pressure mediated by a chemical signal; response to cAMP; male gonad development; response to drug; endopeptidase activity; MAPKKK cascade; extracellular space] | tscapeBCa, tscapeProstatea
u | RGS4 | 1:163038565-163046592 (1q23.3) | regulator of G-protein signaling 4 [Source:HGNC Symbol;Acc:10000], type=processed transcript,protein coding, GO=[inactivation of MAPK activity; negative regulation of MAP kinase activity; regulation of G-protein coupled receptor protein signaling pathway; negative regulation of kinase activity; calmodulin binding; GTPase activator activity; regulation of protein serine/threonine kinase activity; MAPKKK cascade; regulation of protein kinase activity; regulation of protein phosphorylation; regulation of phosphorylation; regulation of phosphate metabolic process] | tcgaBreastGE, tcgaGliomaGE
u | RHBDF1 | 16:108058-126354 (16p13.3) | rhomboid 5 homolog 1 (Drosophila) [Source:HGNC Symbol;Acc:20561], type=nonsense mediated decay,protein coding,retained intron, GO=[serine-type endopeptidase activity; endopeptidase activity; Golgi membrane; Golgi apparatus part] | tcgaGliomaGE, tscapeHCCd, tscapeNSCLCd
u | RHOB | 2:20646835-20649200 (2p24.1) | ras homolog gene family, member B [Source:HGNC Symbol;Acc:668], type=protein coding, GO=[transformed cell apoptosis; endosome to lysosome transport; GDP binding; positive regulation of angiogenesis; late endosome membrane; endosome membrane; GTPase activity; platelet activation; angiogenesis; regulation of anatomical structure morphogenesis; GTP binding; soluble fraction; Ras protein signal transduction; regulation of small GTPase mediated signal transduction; blood vessel development; negative regulation of cell cycle; vasculature development; positive regulation of developmental process; blood coagulation; regulation of body fluid levels; cell activation] | tscapeNSCLCd, tscapeOvariand
u | RIT1 | 1:155867599-155881195 (1q22) | Ras-like without CAAX 1 [Source:HGNC Symbol;Acc:10023], type=processed transcript,protein coding, GO=[calmodulin binding; nerve growth factor receptor signaling pathway; GTPase activity; GTP binding] | tcgaBreastGE, tcgaGliomaGE, tcgaOvarianGE, tscapeHCCa, tscapeProstated
u | RNF24 | 20:3912068-3996229 (20p13) | ring finger protein 24 [Source:HGNC Symbol;Acc:13779], type=protein coding, GO=[Golgi membrane; Golgi apparatus part] | tcgaBreastGE
d | RSL24D1 | 15:55473004-55489265 (15q21.3) | ribosomal L24 domain containing 1 [Source:HGNC Symbol;Acc:18479], type=protein coding, GO=[ribosome biogenesis; structural constituent of ribosome; ribosome; nucleolus] |
u | S100A10 | 1:151955391-151966866 (1q21.3) | S100 calcium binding protein A10 [Source:HGNC Symbol;Acc:10487], type=processed transcript,protein coding, GO=[cellular response to acid; extrinsic to plasma membrane; extrinsic to membrane; calcium ion binding] | tcgaGliomaGE, tscapeHCCa, tscapeMelanomaa
d | SEMA3F | 3:50192478-50226508 (3p21.31) | sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3F [Source:HGNC Symbol;Acc:10728], type=protein coding,retained intron, GO=[chemorepellent activity; negative regulation of axon extension involved in axon guidance; negative chemotaxis; neural crest cell migration; negative regulation of axonogenesis; regulation of axonogenesis; negative regulation of locomotion; regulation of cell morphogenesis involved in differentiation; regulation of cell growth; regulation of cell size; regulation of anatomical structure morphogenesis; regulation of cellular component size; regulation of anatomical structure size; extracellular space] | snp3dLungC, tcgaBreastGE
u | SERPINA3 | 14:95078634-95090397 (14q32.13) | serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 3 [Source:HGNC Symbol;Acc:16], type=processed transcript,protein coding,retained intron, GO=[maintenance of gastrointestinal epithelium; acute-phase response; acute inflammatory response; serine-type endopeptidase inhibitor activity] | tscapeMelanomad
u | SH3PXD2B | 5:171752185-171881527 (5q35.1) | SH3 and PX domains 2B [Source:HGNC Symbol;Acc:29242], type=processed transcript,protein coding, GO=[phosphatidylinositol-4-phosphate binding; phosphatidylinositol-5-phosphate binding; podosome; phosphatidylinositol-3,5-bisphosphate binding; phosphatidylinositol-3-phosphate binding; positive regulation of fat cell differentiation; SH2 domain binding; positive regulation of developmental process] | tcgaGliomaGE, tscapeNSCLCa, tscapeRCCa
u* | SHC1 | 1:154934774-154946871 (1q21.3) | SHC (Src homology 2 domain containing) transforming protein 1 [Source:HGNC Symbol;Acc:10840], type=processed transcript,protein coding, GO=[Shc-EGFR complex; transmembrane receptor protein tyrosine kinase adaptor activity; phosphotyrosine binding; insulin-like growth factor receptor binding; epidermal growth factor receptor binding; protein phosphorylated amino acid binding; receptor tyrosine kinase binding; regulation of epidermal growth factor receptor activity; positive regulation of vasoconstriction; response to nicotine; actin cytoskeleton reorganization; insulin receptor binding; positive regulation of smooth muscle cell proliferation; positive regulation of DNA replication; regulation of vasoconstriction; organ regeneration; regulation of smooth muscle cell proliferation; vasoconstriction; fibroblast growth factor receptor signaling pathway; response to alkaloid; response to hydrogen peroxide; response to reactive oxygen species; response to glucocorticoid stimulus; response to corticosteroid stimulus; insulin receptor signaling pathway; response to hypoxia; response to organic cyclic compound; endosome membrane; nerve growth factor receptor signaling pathway; mitochondrial matrix; response to insulin stimulus; DNA replication; angiogenesis; regulation of protein serine/threonine kinase activity; response to steroid hormone stimulus; positive regulation of protein kinase activity; actin cytoskeleton organization; cellular response to hormone stimulus; cell-cell adhesion; actin filament-based process; Ras protein signal transduction; response to peptide hormone stimulus; blood vessel development; MAPKKK cascade; vasculature development; regulation of protein kinase activity; regulation of anatomical structure size; blood coagulation; regulation of body fluid levels; regulation of protein phosphorylation; mitochondrial part; response to hormone stimulus; regulation of phosphorylation; regulation of phosphate metabolic process; DNA metabolic process] | tcgaBreastGE, tcgaGliomaGE, tscapeHCCa
d | SHROOM1 | 5:132157833-132166590 (5q31.1) | shroom family member 1 [Source:HGNC Symbol;Acc:24084], type=processed transcript,protein coding, GO=[myosin II complex; actin filament binding; actin filament bundle assembly; myosin complex; actin cytoskeleton organization; actin filament-based process] | tscapeBCd
d | SIPA1L2 | 1:232533711-232697304 (1q42.2) | signal-induced proliferation-associated 1 like 2 [Source:HGNC Symbol;Acc:23800], type=processed transcript,protein coding, GO=[GTPase activator activity; regulation of small GTPase mediated signal transduction] | tcgaBreastGE, tscapeBCa, tscapeOvariana, tscapeProstated
d | SLC16A10 | 6:111408706-111545145 (6q21) | solute carrier family 16, member 10 (aromatic amino acid transporter) [Source:HGNC Symbol;Acc:17027], type=processed transcript,protein coding, GO=[aromatic amino acid transport; amino acid transmembrane transport; amino acid transmembrane transporter activity; basolateral plasma membrane] | tscapeCRCd
u | SLC16A6 | 17:66263167-66287405 (17q24.2) | solute carrier family 16, member 6 (monocarboxylic acid transporter 7) [Source:HGNC Symbol;Acc:10927], type=protein coding | tcgaBreastGE
d | SLC20A1 | 2:113403434-113421404 (2q13) | solute carrier family 20 (phosphate transporter), member 1 [Source:HGNC Symbol;Acc:10946], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[high affinity inorganic phosphate:sodium symporter activity; inorganic phosphate transmembrane transporter activity; sodium-dependent phosphate transmembrane transporter activity; sodium-dependent phosphate transport; positive regulation of I-kappaB kinase/NF-kappaB cascade; sodium ion transport] | tcgaBreastGE
d | SLC27A2 | 15:50474393-50528592 (15q21.2) | solute carrier family 27 (fatty acid transporter), member 2 [Source:HGNC Symbol;Acc:10996], type=protein coding, GO=[pristanate-CoA ligase activity; phytanate-CoA ligase activity; very long-chain fatty acid catabolic process; very long-chain fatty acid-CoA ligase activity; fatty acid alpha-oxidation; fatty acid transporter activity; long-chain fatty acid-CoA ligase activity; long-chain fatty acid metabolic process; bile acid biosynthetic process; peroxisomal matrix; very long-chain fatty acid metabolic process; peroxisomal membrane; microsome; vesicular fraction; lipid biosynthetic process] | tcgaGliomaGE, tscapeCRCd
u | SLC44A3 | 1:95285898-95360802 (1p21.3) | solute carrier family 44, member 3 [Source:HGNC Symbol;Acc:28689], type=nonsense mediated decay,processed transcript,protein coding, GO=[choline transmembrane transporter activity; choline transport] | tscapeMelanomad, tscapeNSCLCd
u | SLC5A6 | 2:27422455-27435826 (2p23.3) | solute carrier family 5 (sodium-dependent vitamin transporter), member 6 [Source:HGNC Symbol;Acc:11041], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[pantothenate transmembrane transport; biotin transport; sodium-dependent multivitamin transmembrane transporter activity; biotin metabolic process; pantothenate metabolic process; vitamin transporter activity; brush border membrane; brush border; sodium ion transport; vesicle membrane; membrane-bounded vesicle; vesicle] | tcgaOvarianGESurv
u | SLCO2A1 | 3:133651540-133771028 (3q22.1, 3q22.2) | solute carrier organic anion transporter family, member 2A1 [Source:HGNC Symbol;Acc:10955], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[prostaglandin transmembrane transporter activity; prostaglandin transport; fatty acid transporter activity; sodium-independent organic anion transport] |
d | SLPI | 20:43880880-43883205 (20q13.12) | secretory leukocyte peptidase inhibitor [Source:HGNC Symbol;Acc:11092], type=protein coding, GO=[negative regulation of endopeptidase activity; serine-type endopeptidase inhibitor activity] | tcgaBreastGE, tscapeNSCLCd
u | SNAI2 | 8:49830249-49834299 (8q11.21) | snail homolog 2 (Drosophila) [Source:HGNC Symbol;Acc:11094], type=protein coding, GO=[ectoderm and mesoderm interaction; osteoblast differentiation; canonical Wnt receptor signaling pathway] | tcgaBreastGE, tcgaGliomaGE, tscapeBCd, tscapeNSCLCd, tscapeRCCd
d | SPTBN5 | 15:42140345-42186275 (15q15.1) | spectrin, beta, non-erythrocytic 5 [Source:HGNC Symbol;Acc:15680], type=protein coding, GO=[apical cortex; spectrin; spectrin binding; actin filament capping; negative regulation of actin filament depolymerization; actin filament binding; protein C-terminus binding; cell cortex; protein heterodimerization activity; actin cytoskeleton organization; actin filament-based process; regulation of cellular component size; regulation of anatomical structure size] | tcgaGliomaGE, tscapeCRCd, tscapeMelanomad, tscapeNSCLCd, tscapeOvariand
u | SSR3 | 3:156257929-156272973 (3q25.31) | signal sequence receptor, gamma (translocon-associated protein gamma) [Source:HGNC Symbol;Acc:11325], type=processed transcript,protein coding,retained intron, GO=[Sec61 translocon complex; cotranslational protein targeting to membrane; signal sequence binding; integral to endoplasmic reticulum membrane; microsome; vesicular fraction] |

u | ST3GAL5 | 2:86066267-86116137 (2p11.2) | ST3 beta-galactoside alpha-2,3-sialyltransferase 5 [Source:HGNC Symbol;Acc:10872], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[lactosylceramide alpha-2,3-sialyltransferase activity; neolactotetraosylceramide alpha-2,3-sialyltransferase activity; ganglioside biosynthetic process; integral to Golgi membrane; protein glycosylation; Golgi membrane; lipid biosynthetic process; Golgi apparatus part] | tcgaGliomaGE
d | STAMBPL1 | 10:90639491-90734910 (10q23.31) | STAM binding protein-like 1 [Source:HGNC Symbol;Acc:24105], type=processed transcript,protein coding, GO=[ubiquitin thiolesterase activity; metallopeptidase activity] | tcgaGliomaGE, tscapeCRCd, tscapeSCLCd
u | STEAP1 | 7:89783689-89794143 (7q21.13) | six transmembrane epithelial antigen of the prostate 1 [Source:HGNC Symbol;Acc:11378], type=nonsense mediated decay,protein coding,retained intron, GO=[iron ion transport; flavin adenine dinucleotide binding; electron transport chain; electron carrier activity; cell-cell junction; iron ion binding; endosome membrane; generation of precursor metabolites and energy] | snp3dProstateC, tscapeNSCLCa
u | STEAP2 | 7:89796904-89870091 (7q21.13) | six transmembrane epithelial antigen of the prostate 2 [Source:HGNC Symbol;Acc:17885], type=protein coding,retained intron, GO=[Golgi to plasma membrane transport; trans-Golgi network transport vesicle; regulated secretory pathway; integral to Golgi membrane; Golgi-associated vesicle; iron ion transport; flavin adenine dinucleotide binding; early endosome; clathrin-coated vesicle; electron transport chain; electron carrier activity; coated vesicle; iron ion binding; endosome membrane; vesicular fraction; Golgi membrane; generation of precursor metabolites and energy; Golgi apparatus part; response to hormone stimulus; membrane-bounded vesicle; cytoplasmic vesicle; vesicle] | tcgaGliomaGE, tscapeNSCLCa
d | SUSD4 | 1:223394161-223537544 (1q41) | sushi domain containing 4 [Source:HGNC Symbol;Acc:25470], type=processed transcript,protein coding | tcgaBreastGE
u | SYBU | 8:110586207-110704020 (8q23.2) | syntabulin (syntaxin-interacting) [Source:HGNC Symbol;Acc:26011], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[Golgi membrane; Golgi apparatus part; membrane-bounded vesicle; cytoplasmic vesicle; vesicle] | tcgaGliomaGE
u | TACC1 | 8:38585704-38710546 (8p11.22) | transforming, acidic coiled-coil containing protein 1 [Source:HGNC Symbol;Acc:11522], type=processed transcript,protein coding,retained intron, GO=[interkinetic nuclear migration; regulation of microtubule-based process; cerebral cortex development; intermediate filament cytoskeleton; microtubule cytoskeleton organization] | tcgaBreastGE, tcgaGliomaGE, tcgaOvarianGE, tscapeHCCd, tscapeNSCLCd
d | TANC1 | 2:159825146-160089170 (2q24.2) | tetratricopeptide repeat, ankyrin repeat and coiled-coil containing 1 [Source:HGNC Symbol;Acc:29364], type=protein coding,retained intron, GO=[postsynaptic density; postsynaptic membrane] | tcgaGliomaGE
u | TAX1BP3 | 17:3566196-3571976 (17p13.2) | Tax1 (human T-cell leukemia virus type I) binding protein 3 [Source:HGNC Symbol;Acc:30684], type=protein coding, GO=[negative regulation of protein localization at cell surface; activation of Cdc42 GTPase activity; beta-catenin binding; negative regulation of Wnt receptor signaling pathway; protein C-terminus binding; Ras protein signal transduction; regulation of small GTPase mediated signal transduction] | tcgaGliomaGE
u | TGFBR2 | 3:30647994-30735634 (3p24.1) | transforming growth factor, beta receptor II (70/80kDa) [Source:HGNC Symbol;Acc:11773], type=protein coding, GO=[type III transforming growth factor beta receptor binding; regulation of tolerance induction to self antigen; positive regulation of tolerance induction to self antigen; transforming growth factor beta receptor activity, type II; bronchus morphogenesis; regulation of B cell tolerance induction; positive regulation of B cell tolerance induction; B cell tolerance induction; transforming growth factor beta receptor complex; formation; positive regulation of NK T cell differentiation; regulation of NK T cell differentiation; lung lobe morphogenesis; lung lobe development; positive regulation of T cell tolerance induction; regulation of T cell tolerance induction; T cell tolerance induction; type I transforming growth factor beta receptor binding; transforming growth factor beta binding; myeloid dendritic cell differentiation; response to cholesterol; embryonic hemopoiesis; transforming growth factor beta receptor activity; pathway-restricted SMAD protein phosphorylation; positive regulation of mesenchymal cell proliferation; embryonic cranial skeleton morphogenesis; patterning of blood vessels; peptidyl-threonine phosphorylation; SMAD binding; mammary gland morphogenesis; receptor signaling protein serine/threonine kinase activity; response to lipid; palate development; caveola; vasculogenesis; smoothened signaling pathway; peptidyl-serine phosphorylation; transforming growth factor beta receptor signaling pathway; glycosaminoglycan binding; activation of protein kinase activity; transmembrane receptor protein serine/threonine kinase signaling pathway; angiogenesis; in utero embryonic development; positive regulation of protein kinase activity; response to drug; carbohydrate binding; blood vessel development; vasculature development; regulation of protein kinase activity; positive regulation of developmental process; regulation of protein phosphorylation; regulation of phosphorylation; cell activation; regulation of phosphate metabolic process] | snp3dCRC, tcgaBreastGE
d | TLCD1 | 17:27051366-27053950 (17q11.2) | TLC domain containing 1 [Source:HGNC Symbol;Acc:25177], type=protein coding | tcgaBreastGE, tscapeNSCLCa
u | TMEM100 | 17:53796990-53809482 (17q22) | transmembrane protein 100 [Source:HGNC Symbol;Acc:25607], type=protein coding | tcgaBreastGE
d | TMEM116 | 12:112369103-112450943 (12q24.13) | transmembrane protein 116 [Source:HGNC Symbol;Acc:25084], type=protein coding | (none)
u | TMEM149 | 19:36230153-36233351 (19q13.12) | transmembrane protein 149 [Source:HGNC Symbol;Acc:23620], type=protein coding | tcgaBreastGE, tcgaGliomaGE, tscapeNSCLCa
u | TMEM14C | 6:10723148-10731362 (6p24.2) | transmembrane protein 14C [Source:HGNC Symbol;Acc:20952], type=processed transcript,protein coding, GO=[heme biosynthetic process; mitochondrial part] | tscapeBCa, tscapeOvariana, tscapeProstated, tscapeSCLCd
d | TMEM51 | 1:15479028-15546974 (1p36.21) | transmembrane protein 51 [Source:HGNC Symbol;Acc:25488], type=protein coding | tcgaBreastGE, tcgaGliomaGE, tscapeBCd, tscapeCRCd, tscapeHCCd, tscapeNSCLCd, tscapeOvariand, tscapeRCCd
d | TP53INP1 | 8:95938200-95961639 (8q22.1) | tumor protein p53 inducible nuclear protein 1 [Source:HGNC Symbol;Acc:18022], type=protein coding, GO=[PML body; induction of apoptosis; negative regulation of cell cycle; positive regulation of apoptosis; positive regulation of cell death] | tcgaBreastGE, tcgaGliomaGE, tscapeBCa
u | TPD52L1 | 6:125440195-125585553 (6q22.31) | tumor protein D52-like 1 [Source:HGNC Symbol;Acc:12006], type=processed transcript,protein coding,retained intron, GO=[caspase activator activity; DNA fragmentation involved in apoptotic nuclear change; DNA catabolic process, endonucleolytic; positive regulation of JNK cascade; G2/M transition of mitotic cell cycle; protein heterodimerization activity; regulation of protein serine/threonine kinase activity; positive regulation of protein kinase activity; perinuclear region of cytoplasm; MAPKKK cascade; induction of apoptosis; protein homodimerization activity; regulation of protein kinase activity; positive regulation of apoptosis; positive regulation of cell death; regulation of protein phosphorylation; regulation of phosphorylation; regulation of phosphate metabolic process; DNA metabolic process] | (none)
u | TPM1 | 15:63334831-63364111 (15q22.2) | tropomyosin 1 (alpha) [Source:HGNC Symbol;Acc:12010], type=protein coding, GO=[positive regulation of heart rate by epinephrine; bleb; muscle thin filament tropomyosin; ruffle organization; positive regulation of ATPase activity; positive regulation of heart rate; filamentous actin; positive regulation of stress fiber assembly; sarcomere organization; ventricular cardiac muscle tissue morphogenesis; regulation of heart rate; ruffle membrane; regulation of systemic arterial blood pressure mediated by a chemical signal; stress fiber; structural constituent of muscle; muscle filament sliding; actin filament; actin filament-based movement; cardiac ventricle development; cellular response to reactive oxygen species; actin filament bundle assembly; cardiac chamber morphogenesis; structural constituent of cytoskeleton; negative regulation of cell migration; negative regulation of locomotion; response to reactive oxygen species; in utero embryonic development; actin cytoskeleton organization; actin filament-based process] | (none)
d | TRIM45 | 1:117653682-117665209 (1p13.1) | tripartite motif containing 45 [Source:HGNC Symbol;Acc:19018], type=nonsense mediated decay,processed transcript,protein coding | tcgaBreastGE, tcgaGliomaGE, tscapeMelanomad, tscapeNSCLCd, tscapeSCLCa, tscapeSCLCd
u | TSC22D1 | 13:45007655-45151283 (13q14.11) | TSC22 domain family, member 1 [Source:HGNC Symbol;Acc:16826], type=nonsense mediated decay,processed transcript,protein coding,retained intron | tcgaGliomaGE, tscapeBCd, tscapeHCCd, tscapeNSCLCd, tscapeProstated, tscapeSCLCd
u | TSC22D3 | X:106956451-107020572 (Xq22.3) | TSC22 domain family, member 3 [Source:HGNC Symbol;Acc:3051], type=protein coding, GO=[MyoD binding; MRF binding; negative regulation of skeletal muscle tissue development; response to osmotic stress; negative regulation of gene-specific transcription from RNA polymerase II promoter; negative regulation of gene-specific transcription; regulation of gene-specific transcription from RNA polymerase II promoter] | (none)
d | TSPAN1 | 1:46640745-46651630 (1p34.1) | tetraspanin 1 [Source:HGNC Symbol;Acc:20657], type=processed transcript,protein coding, GO=[lysosomal membrane; vacuolar membrane; vacuolar part] | tcgaBreastGE, tcgaOvarianGE, tscapeOvariand, tscapeSCLCa
u | TSPAN13 | 7:16793160-16824161 (7p21.1) | tetraspanin 13 [Source:HGNC Symbol;Acc:21643], type=processed transcript,protein coding | tcgaBreastGE, tcgaGliomaGE, tscapeNSCLCa

d | TSPAN15 | 10:71211229-71267425 (10q22.1) | tetraspanin 15 [Source:HGNC Symbol;Acc:23298], type=processed transcript,protein coding,retained intron | (none)
d | TSPAN7 | X:38420623-38548169 (Xp11.4) | tetraspanin 7 [Source:HGNC Symbol;Acc:11854], type=nonsense mediated decay,processed transcript,protein coding,retained intron | tcgaBreastGE, tcgaGliomaGE
d | TSPAN9 | 12:3186521-3395730 (12p13.32, 12p13.33) | tetraspanin 9 [Source:HGNC Symbol;Acc:21640], type=nonsense mediated decay,processed transcript,protein coding,retained intron | tcgaBreastGE, tcgaGliomaGE, tscapeOvariana
u | TTLL12 | 22:43562628-43583139 (22q13.2) | tubulin tyrosine ligase-like family, member 12 [Source:HGNC Symbol;Acc:28974], type=processed transcript,protein coding,retained intron, GO=[tubulin-tyrosine ligase activity] | tscapeGliomad, tscapeOvariand
u | USP10 | 16:84733555-84813525 (16q24.1) | ubiquitin specific peptidase 10 [Source:HGNC Symbol;Acc:12608], type=protein coding, GO=[cystic fibrosis transmembrane conductance regulator binding; p53 binding; ubiquitin-specific protease activity; protein deubiquitination; ubiquitin thiolesterase activity; DNA damage response, signal transduction by p53 class mediator; early endosome; intermediate filament cytoskeleton; ubiquitin-dependent protein catabolic process; DNA metabolic process] | tscapeProstated
d | VASN | 16:4421849-4433529 (16p13.3) | vasorin [Source:HGNC Symbol;Acc:18517], type=protein coding | tscapeHCCd, tscapeOvariand
u | VCL | 10:75757872-75879918 (10q22.2) | vinculin [Source:HGNC Symbol;Acc:12665], type=processed transcript,protein coding, GO=[beta-dystroglycan binding; adherens junction assembly; alpha-catenin binding; fascia adherens; epithelial cell-cell adhesion; costamere; intercalated disc; cadherin binding; lamellipodium assembly; apical junction assembly; stress fiber; cell adhesion molecule binding; Rho GTPase binding; actin filament; beta-catenin binding; cell-cell junction assembly; negative regulation of cell migration; platelet degranulation; negative regulation of locomotion; focal adhesion; cell-substrate junction; cell-matrix adhesion; cell-cell junction; basolateral plasma membrane; platelet activation; cell-cell adhesion; blood coagulation; regulation of body fluid levels; cell activation] | tcgaGliomaGE, tscapeProstatea
u | WDR41 | 5:76721795-76916436 (5q13.3, 5q14.1) | WD repeat domain 41 [Source:HGNC Symbol;Acc:25601], type=nonsense mediated decay,processed transcript,protein coding,retained intron | tscapeBCd, tscapeNSCLCd, tscapeOvariand, tscapeProstated
u | WHAMM | 15:83477973-83503613 (15q25.2) | WAS protein homolog associated with actin, golgi membranes and microtubules [Source:HGNC Symbol;Acc:30493], type=protein coding, GO=[ER-Golgi intermediate compartment; cytoplasmic vesicle membrane; vesicle membrane; cytoplasmic vesicle part; membrane-bounded vesicle; cytoplasmic vesicle; vesicle] | (none)
u | WIPI1 | 17:66417423-66453653 (17q24.2) | WD repeat domain, phosphoinositide interacting 1 [Source:HGNC Symbol;Acc:25471], type=protein coding, GO=[vesicle targeting, trans-Golgi to endosome; pre-autophagosomal structure membrane; phosphatidylinositol-3,5-bisphosphate binding; phosphatidylinositol-3-phosphate binding; autophagic vacuole membrane; estrogen receptor binding; androgen receptor binding; trans-Golgi network; autophagy; vacuolar membrane; vacuolar part; clathrin-coated vesicle; coated vesicle; endosome membrane; Golgi apparatus part; membrane-bounded vesicle; cytoplasmic vesicle; vesicle] | tcgaGliomaGE
d | YPEL2 | 17:57409053-57479095 (17q22) | yippee-like 2 (Drosophila) [Source:HGNC Symbol;Acc:18326], type=protein coding, GO=[nucleolus] | tcgaGliomaGE
u | Z82214.1 | 22:43562923-43567756 (22q13.2) | Tubulin–tyrosine ligase-like protein 12 [Source:UniProtKB/Swiss-Prot;Acc:Q14166], type=protein coding, GO=[tubulin-tyrosine ligase activity] | (none)
d | Z83851.1 | 22:42760406-42765242 (22q13.2) | [undefined], type=processed transcript | (none)
u | ZDHHC8P1 | 22:23732793-23744913 (22q11.23) | zinc finger, DHHC-type containing 8 pseudogene 1 [Source:HGNC Symbol;Acc:26461], type=processed transcript,pseudogene,transcribed unprocessed pseudogene | (none)
d | ZG16B | 16:2880170-2882285 (16p13.3) | zymogen granule protein 16 homolog B (rat) [Source:HGNC Symbol;Acc:30456], type=protein coding, GO=[carbohydrate binding] | (none)
d | ZNF239 | 10:44051792-44070066 (10q11.21) | zinc finger protein 239 [Source:HGNC Symbol;Acc:13031], type=processed transcript,protein coding | tcgaBreastGE
u | ZNF589 | 3:48282590-48340743 (3p21.31) | zinc finger protein 589 [Source:HGNC Symbol;Acc:16747], type=nonsense mediated decay,processed transcript,protein coding | (none)
u | ZSWIM6 | 5:60628100-60841997 (5q12.1) | zinc finger, SWIM-type containing 6 [Source:HGNC Symbol;Acc:29316], type=protein coding | (none)

10.2.1 GO enrichment of all candidates

Table 23: Enriched Gene Ontology terms [1] (FDR-corrected p ≤ 0.05). Ratio is the proportion of genes in the whole gene set that carry the annotation. The list is sorted by FDR-corrected p-value. Green and blue borders refer to up- and down-regulated genes, respectively.

Ratio | Type | Description | Genes
0.021 | MF | ganglioside galactosyltransferase activity | AL662827.4, AL844527.3, B3GALT4, CR759786.11, CR759817.7
0.021 | MF | UDP-galactose:beta-N-acetylglucosamine beta-1,3-galactosyltransferase activity | AL662827.4, AL844527.3, B3GALT4, CR759786.11, CR759817.7
0.070 | BP | actin cytoskeleton organization | AMOT, CRK, EVL, EZR, FHOD1, FHOD3, KRT19, LIMA1, NUAK2, PACSIN1, RAP2A, SHC1, SHROOM1, SPTBN5, TPM1
0.075 | BP | actin filament-based process | AMOT, CRK, EVL, EZR, FHOD1, FHOD3, KRT19, LIMA1, MYO6, NUAK2, PACSIN1, RAP2A, SHC1, SHROOM1, SPTBN5, TPM1
0.025 | CC | actin filament | AIF1L, AMOT, EZR, MYO6, TPM1, VCL
0.107 | BP | regulation of phosphate metabolic process | ADAM9, ADRA2C, AIDA, BARD1, CCND3, CDC25A, CDKN1C, CRK, CTGF, DEPTOR, DUSP1, EDN2, FKBP1A, FZD8, IL6R, JUN, PIM1, PPP1R14B, RAP2A, RGS4, SHC1, TGFBR2, TPD52L1
0.103 | BP | response to hormone stimulus | ABCC8, ADAM9, ASS1, BTG2, CITED1, CRK, CTGF, DUSP1, EEF2K, FADS1, FOXO1, GAL, GC, GSTM3, HSD11B2, IGFBP2, IL6R, KRT19, PGF, PTPN1, SHC1, STEAP2

[Graph: cellular_component (root) with the enriched term actin filament]

Figure 16: Relationships between the enriched cellular component Gene Ontology terms listed in Table 23. Darker red indicates more significant enrichment, and edge thickness is proportional to the number of genes sharing the downstream annotation.

[Graph: molecular_function (root) with the enriched terms ganglioside galactosyltransferase activity and UDP-galactose:beta-N-acetylglucosamine beta-1,3-galactosyltransferase activity]

Figure 17: Relationships between the enriched molecular function Gene Ontology terms listed in Table 23. Darker red indicates more significant enrichment, and edge thickness is proportional to the number of genes sharing the downstream annotation.

[Graph: biological_process (root) with the enriched terms response to hormone stimulus, actin filament-based process, regulation of phosphate metabolic process and actin cytoskeleton organization]

Figure 18: Relationships between the enriched biological process Gene Ontology terms listed in Table 23. Darker red indicates more significant enrichment, and edge thickness is proportional to the number of genes sharing the downstream annotation.

11 Candidate report for Common DEGs for AR and siFOXA1

11.1 Moksiskaan candidate pathway

[Graph: Moksiskaan candidate pathway network of candidate and intermediate genes. Edge legend: gene expression, gene repression, pathway precedence, protein binding, protein dissociation, protein-protein interaction, phosphorylation, dephosphorylation]

Figure 19: Known relationships between the candidate genes. Candidate genes with only outgoing connections are shown in red; the ratio of incoming to outgoing connections determines how light a gene is drawn, and genes with only incoming connections are completely white. At most one intermediate gene is allowed between two candidate genes, and these intermediate genes are shown in gray. Green and blue borders mark up- and down-regulated genes, respectively, and light gray emphasizes stably expressed genes. Known regulations are drawn bold, whereas predicted ones are kept thin.
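The caption describes the red-to-white shading only qualitatively. As a minimal sketch, assuming lightness grows linearly with the fraction of a gene's connections that are incoming, a fill colour could be derived as follows. The function name `node_fill` and the hex encoding are hypothetical and are not Moksiskaan's actual rendering code.

```python
def node_fill(in_degree: int, out_degree: int) -> str:
    """Hypothetical fill colour: pure red for only outgoing edges, white for only incoming."""
    total = in_degree + out_degree
    if total == 0:
        raise ValueError("gene has no connections")
    # 0.0 = only outputs (red), 1.0 = only inputs (white); linear scale is an assumption.
    lightness = in_degree / total
    # Interpolate from red (255, 0, 0) to white (255, 255, 255): only G and B change.
    level = round(255 * lightness)
    return f"#ff{level:02x}{level:02x}"
```

With this mapping, a gene with three outgoing and no incoming edges is `#ff0000`, one with only incoming edges is `#ffffff`, and a gene with one edge in each direction falls halfway at `#ff8080`.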

Table 24: Descriptions of the intermediate genes between the candidate genes. Studies that have reported results about these genes are listed, and those with negative evidence are prefixed with a hyphen. This table has 210 rows.

name | description | studies
ABP1 | amiloride binding protein 1 (amine oxidase (copper-containing)) [Source:HGNC Symbol;Acc:80] locus=7:150521715-150558592 | tscapeMelanomaa
ADCY1 | adenylate cyclase 1 (brain) [Source:HGNC Symbol;Acc:232] locus=7:45613739-45762715 | tcgaGliomaGE
ADCY10 | adenylate cyclase 10 (soluble) [Source:HGNC Symbol;Acc:21285] locus=1:167778625-167883453 | (none)
ADCY2 | adenylate cyclase 2 (brain) [Source:HGNC Symbol;Acc:233] locus=5:7396321-7830194 | tcgaGliomaGE
ADCY3 | adenylate cyclase 3 [Source:HGNC Symbol;Acc:234] locus=2:25042038-25142708 | tcgaGliomaGE
ADCY4 | adenylate cyclase 4 [Source:HGNC Symbol;Acc:235] locus=14:24787559-24804277 | tcgaBreastGE, tscapeHCCa
ADCY5 | adenylate cyclase 5 [Source:HGNC Symbol;Acc:236] locus=3:123001143-123168605 | tcgaGliomaGE
ADCY6 | adenylate cyclase 6 [Source:HGNC Symbol;Acc:237] locus=12:49159975-49177877 | tscapeRCCa
ADCY7 | adenylate cyclase 7 [Source:HGNC Symbol;Acc:238] locus=16:50300462-50352046 | tscapeBCd
ADCY8 | adenylate cyclase 8 (brain) [Source:HGNC Symbol;Acc:239] locus=8:131792547-132054672 | tscapeOvariana
ADCY9 | adenylate cyclase 9 [Source:HGNC Symbol;Acc:240] locus=16:4012657-4166186 | tcgaGliomaGE, tcgaOvarianGE, tscapeHCCd, tscapeOvariand
ADH1A | alcohol dehydrogenase 1A (class I), alpha polypeptide [Source:HGNC Symbol;Acc:249] locus=4:100197524-100212185 | tcgaBreastGE, tcgaOvarianGE, tscapeHCCd
ADH1B | alcohol dehydrogenase 1B (class I), beta polypeptide [Source:HGNC Symbol;Acc:250] locus=4:100226121-100242558 | tcgaOvarianGE, tscapeHCCd

ADH1C | alcohol dehydrogenase 1C (class I), gamma polypeptide [Source:HGNC Symbol;Acc:251] locus=4:100257649-100274184 | tcgaBreastGE, tscapeHCCd
ADH4 | alcohol dehydrogenase 4 (class II), pi polypeptide [Source:HGNC Symbol;Acc:252] locus=4:100044808-100078949 | tscapeHCCd
ADH5 | alcohol dehydrogenase 5 (class III), chi polypeptide [Source:HGNC Symbol;Acc:253] locus=4:99992132-100009952 | tcgaBreastGE, tscapeHCCd
ADH6 | alcohol dehydrogenase 6 (class V) [Source:HGNC Symbol;Acc:255] locus=4:100123795-100140694 | tscapeHCCd
ADH7 | alcohol dehydrogenase 7 (class IV), mu or sigma polypeptide [Source:HGNC Symbol;Acc:256] locus=4:100333418-100356894 | tscapeHCCd
AGMAT | agmatine ureohydrolase (agmatinase) [Source:HGNC Symbol;Acc:18407] locus=1:15898848-15911605 | tcgaBreastGE, tscapeBCd, tscapeCRCd, tscapeNSCLCd, tscapeOvariand, tscapeRCCd
AGPAT1 | 1-acylglycerol-3-phosphate O-acyltransferase 1 (lysophosphatidic acid acyltransferase, alpha) [Source:HGNC Symbol;Acc:324] locus=6:32135989-32145873 | tcgaGliomaGE
AGPAT2 | 1-acylglycerol-3-phosphate O-acyltransferase 2 (lysophosphatidic acid acyltransferase, beta) [Source:HGNC Symbol;Acc:325] locus=9:139567595-139581875 | tscapeCRCa
ALAS1 | aminolevulinate, delta-, synthase 1 [Source:HGNC Symbol;Acc:396] locus=3:52232102-52248343 | (none)
ALAS2 | aminolevulinate, delta-, synthase 2 [Source:HGNC Symbol;Acc:397] locus=X:55035488-55057497 | (none)
ASAH1 | N-acylsphingosine amidohydrolase (acid ceramidase) 1 [Source:HGNC Symbol;Acc:735] locus=8:17913934-17942494 | tcgaGliomaGE
ASAH2 | N-acylsphingosine amidohydrolase (non-lysosomal ceramidase) 2 [Source:HGNC Symbol;Acc:18860] locus=10:51884446-52039568 | (none)
CARNS1 | carnosine synthase 1 [Source:HGNC Symbol;Acc:29268] locus=11:67182439-67193078 | tcgaGliomaGE
CEL | carboxyl ester lipase (bile salt-stimulated lipase) [Source:HGNC Symbol;Acc:1848] locus=9:135937365-135947248 | tcgaBreastGE
CEPT1 | choline/ethanolamine phosphotransferase 1 [Source:HGNC Symbol;Acc:24289] locus=1:111682249-111727724 | tcgaGliomaGE, tscapeBCd, tscapeNSCLCd, tscapeSCLCa
CHPT1 | choline phosphotransferase 1 [Source:HGNC Symbol;Acc:17852] locus=12:102091417-102122846 | (none)
CLDN1 | claudin 1 [Source:HGNC Symbol;Acc:2032] locus=3:190023490-190040264 | tscapeBCa, tscapeMelanomad, tscapeOvariana
CLDN10 | claudin 10 [Source:HGNC Symbol;Acc:2033] locus=13:96085858-96231906 | tcgaGliomaGE, tscapeBCa
CLDN11 | claudin 11 [Source:HGNC Symbol;Acc:8514] locus=3:170136653-170578169 | tscapeBCa, tscapeNSCLCa, tscapeOvariana
CLDN14 | claudin 14 [Source:HGNC Symbol;Acc:2035] locus=21:37832919-37948867 | tcgaBreastGE
CLDN15 | claudin 15 [Source:HGNC Symbol;Acc:2036] locus=7:100875373-100882101 | tcgaGliomaGE, tscapeNSCLCa
CLDN16 | claudin 16 [Source:HGNC Symbol;Acc:2037] locus=3:190040330-190129932 | tscapeBCa, tscapeMelanomad, tscapeOvariana
CLDN17 | claudin 17 [Source:HGNC Symbol;Acc:2038] locus=21:31538241-31538971 | (none)
CLDN18 | claudin 18 [Source:HGNC Symbol;Acc:2039] locus=3:137717577-137752494 | (none)
CLDN19 | claudin 19 [Source:HGNC Symbol;Acc:2040] locus=1:43198764-43205925 | tcgaBreastGE, tscapeBCa, tscapeOvariana
CLDN2 | claudin 2 [Source:HGNC Symbol;Acc:2041] locus=X:106143394-106174091 | (none)
CLDN20 | claudin 20 [Source:HGNC Symbol;Acc:2042] locus=6:155585147-155597682 | tscapeOvariand, tscapeSCLCd
CLDN22 | claudin 22 [Source:HGNC Symbol;Acc:2044] locus=4:184239220-184241927 | tscapeCRCd, tscapeHCCd, tscapeNSCLCd, tscapeProstated, tscapeRCCd
CLDN23 | claudin 23 [Source:HGNC Symbol;Acc:17591] locus=8:8559448-8561616 | tscapeHCCd
CLDN3 | claudin 3 [Source:HGNC Symbol;Acc:2045] locus=7:73183328-73184600 | tscapeNSCLCd
CLDN4 | claudin 4 [Source:HGNC Symbol;Acc:2046] locus=7:73213872-73247017 | tscapeNSCLCd
CLDN5 | claudin 5 [Source:HGNC Symbol;Acc:2047] locus=22:19510547-19515068 | tcgaBreastGE, tscapeCRCa
CLDN6 | claudin 6 [Source:HGNC Symbol;Acc:2048] locus=16:3064713-3070072 | tcgaBreastGE, tscapeHCCd, tscapeOvariand
CLDN7 | claudin 7 [Source:HGNC Symbol;Acc:2049] locus=17:7163223-7166512 | tcgaBreastGE, tscapeProstated
CLDN9 | claudin 9 [Source:HGNC Symbol;Acc:2051] locus=16:3062457-3064506 | tcgaBreastGE, tscapeHCCd, tscapeOvariand
CNTFR | ciliary neurotrophic factor receptor [Source:HGNC Symbol;Acc:2170] locus=9:34551430-34590121 | tcgaGliomaGE
COMT | catechol-O-methyltransferase [Source:HGNC Symbol;Acc:2228] locus=22:19929130-19957498 | tcgaGliomaGE, tscapeCRCa
CRLF2 | cytokine receptor-like factor 2 [Source:HGNC Symbol;Acc:14281] locus=X:1314890-1331616 | tscapeNSCLCd, tscapeSCLCd
CSF2RA | colony stimulating factor 2 receptor, alpha, low-affinity (granulocyte-macrophage) [Source:HGNC Symbol;Acc:2435] locus=X:1387693-1429274 | tscapeNSCLCd, tscapeSCLCd
CSF2RB | colony stimulating factor 2 receptor, beta, low-affinity (granulocyte-macrophage) [Source:HGNC Symbol;Acc:2436] locus=22:37309670-37336491 | (none)
CSF3R | colony stimulating factor 3 receptor (granulocyte) [Source:HGNC Symbol;Acc:2439] locus=1:36931644-36948879 | (none)
CTPS | CTP synthase [Source:HGNC Symbol;Acc:2519] locus=1:41445007-41478235 | fileCIN70, tcgaOvarianGE, tscapeBCa, tscapeOvariana
CTPS2 | CTP synthase II [Source:HGNC Symbol;Acc:2520] locus=X:16606126-16731059 | tcgaBreastGE, tcgaGliomaGE, tscapeHCCa
DGAT1 | diacylglycerol O-acyltransferase 1 [Source:HGNC Symbol;Acc:2843] locus=8:145539954-145550573 | tcgaBreastGESurv, tscapeBCd, tscapeNSCLCa, tscapeOvariana
DGAT2 | diacylglycerol O-acyltransferase 2 [Source:HGNC Symbol;Acc:16940] locus=11:75470557-75512579 | tcgaGliomaGE, tscapeOvariana
DGKA | diacylglycerol kinase, alpha 80kDa [Source:HGNC Symbol;Acc:2849] locus=12:56324946-56347805 | tcgaGliomaGE
DGKB | diacylglycerol kinase, beta 90kDa [Source:HGNC Symbol;Acc:2850] locus=7:14184674-15014402 | cosmicMetastasis, tcgaGliomaGE
DGKD | diacylglycerol kinase, delta 130kDa [Source:HGNC Symbol;Acc:2851] locus=2:234263153-234380750 | tscapeBCd, tscapeNSCLCd, tscapeOvariand, tscapeRCCd
DGKE | diacylglycerol kinase, epsilon 64kDa [Source:HGNC Symbol;Acc:2852] locus=17:54911460-54946034 | tcgaGliomaGE, tscapeBCa
DGKG | diacylglycerol kinase, gamma 90kDa [Source:HGNC Symbol;Acc:2853] locus=3:185823457-186080026 | tcgaGliomaGE, tscapeBCa, tscapeOvariana
DGKH | diacylglycerol kinase, eta [Source:HGNC Symbol;Acc:2854] locus=13:42614176-42830714 | tcgaGliomaGE, tscapeBCd, tscapeHCCd, tscapeNSCLCd, tscapeProstated, tscapeSCLCd
DGKI | diacylglycerol kinase, iota [Source:HGNC Symbol;Acc:2855] locus=7:137073563-137531838 | tcgaGliomaGE
DGKQ | diacylglycerol kinase, theta 110kDa [Source:HGNC Symbol;Acc:2856] locus=4:952675-980683 | tscapeBCd
DGKZ | diacylglycerol kinase, zeta [Source:HGNC Symbol;Acc:2857] locus=11:46354455-46402104 | tcgaGliomaGE
DHCR7 | 7-dehydrocholesterol reductase [Source:HGNC Symbol;Acc:2860] locus=11:71139239-71163914 | fileCIN70, tcgaBreastGE
DYNC1H1 | dynein, cytoplasmic 1, heavy chain 1 [Source:HGNC Symbol;Acc:2961] locus=14:102430865-102517135 | cosmicPrimary, tcgaGliomaGE, tscapeMelanomad
DYNC1I1 | dynein, cytoplasmic 1, intermediate chain 1 [Source:HGNC Symbol;Acc:2963] locus=7:95401866-95739634 | tcgaGliomaGE, tscapeNSCLCa
DYNC1I2 | dynein, cytoplasmic 1, intermediate chain 2 [Source:HGNC Symbol;Acc:2964] locus=2:172543919-172604930 | (none)
DYNC1LI1 | dynein, cytoplasmic 1, light intermediate chain 1 [Source:HGNC Symbol;Acc:18745] locus=3:32567463-32612366 | tcgaBreastGE, tcgaGliomaGE
DYNC1LI2 | dynein, cytoplasmic 1, light intermediate chain 2 [Source:HGNC Symbol;Acc:2966] locus=16:66754796-66785701 | tscapeOvariand
DYNC2H1 | dynein, cytoplasmic 2, heavy chain 1 [Source:HGNC Symbol;Acc:2962] locus=11:102980160-103350591 | (none)
ENTPD1 | ectonucleoside triphosphate diphosphohydrolase 1 [Source:HGNC Symbol;Acc:3363] locus=10:97454774-97637023 | tcgaGliomaGE, tscapeCRCd
ENTPD3 | ectonucleoside triphosphate diphosphohydrolase 3 [Source:HGNC Symbol;Acc:3365] locus=3:40428647-40470110 | tcgaGliomaGE
ENTPD8 | ectonucleoside triphosphate diphosphohydrolase 8 [Source:HGNC Symbol;Acc:24860] locus=9:140328816-140336268 | tscapeCRCa
EPOR | erythropoietin receptor [Source:HGNC Symbol;Acc:3416] locus=19:11487881-11495018 | tcgaGliomaGE
EPT1 | ethanolaminephosphotransferase 1 (CDP-ethanolamine-specific) [Source:HGNC Symbol;Acc:29361] locus=2:26531415-26618759 | tcgaBreastGE, tcgaGliomaGE
ESAM | endothelial cell adhesion molecule [Source:HGNC Symbol;Acc:17474] locus=11:124622026-124632186 | tcgaGliomaGE, tscapeBCd, tscapeNSCLCd
GCAT | glycine C-acetyltransferase [Source:HGNC Symbol;Acc:4188] locus=22:38203912-38213183 | tcgaBreastGE, tscapeMelanomaa
GHR | growth hormone receptor [Source:HGNC Symbol;Acc:4263] locus=5:42423879-42721979 | tcgaBreastGE, tcgaGliomaGE
GJA1 | gap junction protein, alpha 1, 43kDa [Source:HGNC Symbol;Acc:4274] locus=6:121756791-121770873 | fileBC2brain
GOT1 | glutamic-oxaloacetic transaminase 1, soluble (aspartate aminotransferase 1) [Source:HGNC Symbol;Acc:4432] locus=10:101156627-101190393 | tcgaBreastGE, tcgaGliomaGE, tscapeBCd, tscapeCRCd, tscapeOvariand
GOT2 | glutamic-oxaloacetic transaminase 2, mitochondrial (aspartate aminotransferase 2) [Source:HGNC Symbol;Acc:4433] locus=16:58741035-58768239 | tcgaBreastGE, tcgaGliomaGE
GRM1 | glutamate receptor, metabotropic 1 [Source:HGNC Symbol;Acc:4593] locus=6:146348782-146758734 | tcgaGliomaGE, tscapeOvariand
GRM5 | glutamate receptor, metabotropic 5 [Source:HGNC Symbol;Acc:4597] locus=11:88237744-88799113 | tcgaGliomaGE
GYS1 | glycogen synthase 1 (muscle) [Source:HGNC Symbol;Acc:4706] locus=19:49471382-49496567 | tcgaGliomaGE, tscapeNSCLCd, tscapeOvariand
GYS2 | glycogen synthase 2 (liver) [Source:HGNC Symbol;Acc:4707] locus=12:21689123-21757781 | tscapeBCd
HOMER1 | homer homolog 1 (Drosophila) [Source:HGNC Symbol;Acc:17512] locus=5:78668459-78810040 | tcgaGliomaGE, tscapeBCd, tscapeNSCLCd, tscapeOvariand, tscapeProstated
HOMER3 | homer homolog 3 (Drosophila) [Source:HGNC Symbol;Acc:17514] locus=19:19040010-19052041 | tcgaBreastGE, tcgaGliomaGE
IDO1 | indoleamine 2,3-dioxygenase 1 [Source:HGNC Symbol;Acc:6059] locus=8:39759794-39785963 | tcgaGliomaGE
IDO2 | indoleamine 2,3-dioxygenase 2 [Source:HGNC Symbol;Acc:27269] locus=8:39792133-39873910 | (none)
IFNAR1 | interferon (alpha, beta and omega) receptor 1 [Source:HGNC Symbol;Acc:5432] locus=21:34696734-34732168 | (none)
IFNAR2 | interferon (alpha, beta and omega) receptor 2 [Source:HGNC Symbol;Acc:5433] locus=21:34602206-34637969 | tcgaGliomaGE
IFNGR1 | interferon gamma receptor 1 [Source:HGNC Symbol;Acc:5439] locus=6:137518621-137540586 | tscapeCRCa
IFNGR2 | interferon gamma receptor 2 (interferon gamma transducer 1) [Source:HGNC Symbol;Acc:5440] locus=21:34757299-34851655 | (none)
IL10RA | interleukin 10 receptor, alpha [Source:HGNC Symbol;Acc:5964] locus=11:117857063-117872196 | tcgaGliomaGE, tscapeBCd
IL10RB | interleukin 10 receptor, beta [Source:HGNC Symbol;Acc:5965] locus=21:34638663-34669539 | tcgaGliomaGE
IL11RA | interleukin 11 receptor, alpha [Source:HGNC Symbol;Acc:5967] locus=9:34652182-34661884 | tcgaBreastGE
IL12RB1 | interleukin 12 receptor, beta 1 [Source:HGNC Symbol;Acc:5971] locus=19:18170371-18197697 | (none)
IL12RB2 | interleukin 12 receptor, beta 2 [Source:HGNC Symbol;Acc:5972] locus=1:67773047-67862583 | tcgaGliomaGE
IL13RA1 | interleukin 13 receptor, alpha 1 [Source:HGNC Symbol;Acc:5974] locus=X:117861535-117928502 | tcgaBreastGE, tcgaGliomaGE
IL13RA2 | interleukin 13 receptor, alpha 2 [Source:HGNC Symbol;Acc:5975] locus=X:114238538-114254540 | tcgaGliomaGE, tcgaOvarianGE

IL15RA | interleukin 15 receptor, alpha [Source:HGNC Symbol;Acc:5978] locus=10:5991038-6020150 |
IL20RA | interleukin 20 receptor, alpha [Source:HGNC Symbol;Acc:6003] locus=6:137321108-137366298 | tscapeCRCa
IL20RB | interleukin 20 receptor beta [Source:HGNC Symbol;Acc:6004] locus=3:136665072-136729927 |
IL21R | interleukin 21 receptor [Source:HGNC Symbol;Acc:6006] locus=16:27413686-27463362 | tcgaBreastGE, tscapeHCCd
IL22RA1 | interleukin 22 receptor, alpha 1 [Source:HGNC Symbol;Acc:13700] locus=1:24446261-24469611 | tscapeBCd, tscapeCRCd, tscapeNSCLCd, tscapeOvariand, tscapeRCCd
IL22RA2 | interleukin 22 receptor, alpha 2 [Source:HGNC Symbol;Acc:14901] locus=6:137464968-137494785 | tscapeCRCa
IL23R | interleukin 23 receptor [Source:HGNC Symbol;Acc:19100] locus=1:67632083-67725662 |
IL28RA | interleukin 28 receptor, alpha (interferon, lambda receptor) [Source:HGNC Symbol;Acc:18584] locus=1:24480647-24514449 | tscapeBCd, tscapeCRCd, tscapeNSCLCd, tscapeOvariand, tscapeRCCd
IL2RA | interleukin 2 receptor, alpha [Source:HGNC Symbol;Acc:6008] locus=10:6052652-6104333 | tcgaGliomaGE
IL2RB | interleukin 2 receptor, beta [Source:HGNC Symbol;Acc:6009] locus=22:37515126-37571094 |
IL2RG | interleukin 2 receptor, gamma [Source:HGNC Symbol;Acc:6010] locus=X:70327254-70331958 | tcgaGliomaGE
IL3RA | interleukin 3 receptor, alpha (low affinity) [Source:HGNC Symbol;Acc:6012] locus=X:1455509-1501578 | cosmicMetastasis, tcgaOvarianGE, tscapeNSCLCd, tscapeSCLCd
IL4I1 | interleukin 4 induced 1 [Source:HGNC Symbol;Acc:19094] locus=19:50392916-50432796 | tcgaBreastGE, tscapeOvariand
IL4R | interleukin 4 receptor [Source:HGNC Symbol;Acc:6015] locus=16:27325247-27376099 | tcgaBreastGESurv, tscapeHCCd
IL5RA | interleukin 5 receptor, alpha [Source:HGNC Symbol;Acc:6017] locus=3:3111233-3168297 |
IL6R | interleukin 6 receptor [Source:HGNC Symbol;Acc:6019] locus=1:154377669-154441926 | tscapeHCCa, tscapeNSCLCa
IL6ST | interleukin 6 signal transducer (gp130, oncostatin M receptor) [Source:HGNC Symbol;Acc:6021] locus=5:55230923-55290772 | tscapeBCd, tscapeNSCLCd, tscapeOvariand, tscapeProstated
IL7R | interleukin 7 receptor [Source:HGNC Symbol;Acc:6024] locus=5:35852797-35879705 |
IL9R | interleukin 9 receptor [Source:HGNC Symbol;Acc:6030] locus=X:155227246-155251689 | tscapeBCa, tscapeNSCLCa, tscapeRCCa
INADL | InaD-like (Drosophila) [Source:HGNC Symbol;Acc:28881] locus=1:62208149-62629592 |
INSR | insulin receptor [Source:HGNC Symbol;Acc:6091] locus=19:7112266-7294011 | snp3dDiabetes, snp3dObesity
ITPA | inosine triphosphatase (nucleoside triphosphate pyrophosphatase) [Source:HGNC Symbol;Acc:6176] locus=20:3189514-3204516 | tcgaBreastGE, tcgaGliomaGE
ITPR1 | inositol 1,4,5-triphosphate receptor, type 1 [Source:HGNC Symbol;Acc:6180] locus=3:4535032-4889524 | tcgaGliomaGE, tscapeCRCd, tscapeMelanomad
ITPR2 | inositol 1,4,5-triphosphate receptor, type 2 [Source:HGNC Symbol;Acc:6181] locus=12:26490342-26986131 | tcgaGliomaGE, tscapeCRCa
ITPR3 | inositol 1,4,5-triphosphate receptor, type 3 [Source:HGNC Symbol;Acc:6182] locus=6:33588522-33664351 | tcgaBreastGE, tcgaGliomaGE
JAK1 | Janus kinase 1 [Source:HGNC Symbol;Acc:6190] locus=1:65298912-65432187 | cosmicRecurrent
JAK2 | Janus kinase 2 [Source:HGNC Symbol;Acc:6192] locus=9:4985033-5128183 | tscapeBCd
JAK3 | Janus kinase 3 [Source:HGNC Symbol;Acc:6193] locus=19:17935589-17958880 |
LEPR | leptin receptor [Source:HGNC Symbol;Acc:6554] locus=1:65886248-66107242 | cosmicPrimary, snp3dObesity, tcgaBreastGE
LIFR | leukemia inhibitory factor receptor alpha [Source:HGNC Symbol;Acc:6597] locus=5:38475065-38608456 | tcgaBreastGE
LIPC | lipase, hepatic [Source:HGNC Symbol;Acc:6619] locus=15:58702775-58861151 | tscapeCRCd, tscapeOvariand
LIPF | lipase, gastric [Source:HGNC Symbol;Acc:6622] locus=10:90424198-90438571 | tscapeCRCd, tscapeSCLCd
LIPG | lipase, endothelial [Source:HGNC Symbol;Acc:6623] locus=18:47088427-47119278 |
MAOA | monoamine oxidase A [Source:HGNC Symbol;Acc:6833] locus=X:43515467-43606068 | tcgaBreastGE, tcgaGliomaGE
MAOB | monoamine oxidase B [Source:HGNC Symbol;Acc:6834] locus=X:43625858-43741693 | tcgaBreastGE, tcgaBreastGESurv
MPDZ | multiple PDZ domain protein [Source:HGNC Symbol;Acc:7208] locus=9:13105703-13279589 | tcgaBreastGE
MPL | myeloproliferative leukemia virus oncogene [Source:HGNC Symbol;Acc:7217] locus=1:43803475-43818443 |
MPZ | myelin protein zero [Source:HGNC Symbol;Acc:7225] locus=1:161274525-161279762 | tscapeBCd
NFKB1 | nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 [Source:HGNC Symbol;Acc:7794] locus=4:103422486-103538459 | snp3dMetastasis, tcgaGliomaGE, tscapeHCCd
NFKB2 | nuclear factor of kappa light polypeptide gene enhancer in B-cells 2 (p49/p100) [Source:HGNC Symbol;Acc:7795] locus=10:104154229-104162281 | tcgaBreastGE, tscapeBCd, tscapeCRCd
NME1 | non-metastatic cells 1, protein (NM23A) expressed in [Source:HGNC Symbol;Acc:7849] locus=17:49230897-49239789 | snp3dLungC, snp3dMetastasis, tcgaBreastGE
NME1-NME2 | NME1-NME2 readthrough [Source:HGNC Symbol;Acc:33531] locus=17:49230951-49249105 |
NME2 | non-metastatic cells 2, protein (NM23B) expressed in [Source:HGNC Symbol;Acc:7850] locus=17:49230920-49249108 |
NME3 | non-metastatic cells 3, protein expressed in [Source:HGNC Symbol;Acc:7851] locus=16:1820321-1821710 | tcgaBreastGE, tscapeHCCd
NME4 | non-metastatic cells 4, protein expressed in [Source:HGNC Symbol;Acc:7852] locus=16:446725-460367 | tcgaBreastGE, tcgaGliomaGE, tscapeHCCd
NME5 | non-metastatic cells 5, protein expressed in (nucleoside-diphosphate kinase) [Source:HGNC Symbol;Acc:7853] locus=5:137450866-137475132 | tcgaGliomaGE
NME6 | non-metastatic cells 6, protein expressed in (nucleoside-diphosphate kinase) [Source:HGNC Symbol;Acc:20567] locus=3:48334754-48343175 | tcgaBreastGE
NME7 | non-metastatic cells 7, protein expressed in (nucleoside-diphosphate kinase) [Source:HGNC Symbol;Acc:20461] locus=1:169101769-169337205 |
NOS1 | nitric oxide synthase 1 (neuronal) [Source:HGNC Symbol;Acc:7872] locus=12:117636114-117799582 | tcgaGliomaGE
OCLN | occludin [Source:HGNC Symbol;Acc:8104] locus=5:68788119-68853931 | tscapeBCd, tscapeCRCd, tscapeNSCLCd, tscapeOvariand, tscapeProstated
OSMR | oncostatin M receptor [Source:HGNC Symbol;Acc:8507] locus=5:38845960-38945698 | tcgaGliomaGE, tcgaGliomaGESurv, tscapeSCLCa
PKLR | pyruvate kinase, liver and RBC [Source:HGNC Symbol;Acc:9020] locus=1:155259086-155271225 | tscapeHCCa, tscapeProstatea
PKM2 | pyruvate kinase, muscle [Source:HGNC Symbol;Acc:9021] locus=15:72491370-72524163 | tcgaBreastGE, tcgaGliomaGE
PLD1 | phospholipase D1, phosphatidylcholine-specific [Source:HGNC Symbol;Acc:9067] locus=3:171318195-171528740 | tscapeBCa, tscapeOvariana
PLD2 | phospholipase D2 [Source:HGNC Symbol;Acc:9068] locus=17:4710391-4726727 | tcgaGliomaGE
PNLIP | pancreatic lipase [Source:HGNC Symbol;Acc:9155] locus=10:118305443-118327367 | tscapeBCd, tscapeCRCd, tscapeNSCLCd
PNLIPRP1 | pancreatic lipase-related protein 1 [Source:HGNC Symbol;Acc:9156] locus=10:118349897-118368687 |
PNLIPRP2 | pancreatic lipase-related protein 2 [Source:HGNC Symbol;Acc:9157] locus=10:118380465-118404654 | tcgaBreastGE, tscapeBCd, tscapeCRCd, tscapeNSCLCd
PNLIPRP3 | pancreatic lipase-related protein 3 [Source:HGNC Symbol;Acc:23492] locus=10:118187379-118237469 | tcgaGliomaGE
PNPLA3 | patatin-like phospholipase domain containing 3 [Source:HGNC Symbol;Acc:18590] locus=22:44319619-44360368 | tscapeGliomad, tscapeOvariand
PNPT1 | polyribonucleotide nucleotidyltransferase 1 [Source:HGNC Symbol;Acc:23166] locus=2:55861400-55921045 | tcgaBreastGE, tcgaGliomaGE, tcgaOvarianGE
PRKG1 | protein kinase, cGMP-dependent, type I [Source:HGNC Symbol;Acc:9414] locus=10:52750945-54058110 | cosmicRecurrent
PRKG2 | protein kinase, cGMP-dependent, type II [Source:HGNC Symbol;Acc:9416] locus=4:82009837-82136218 | tcgaGliomaGE
PRLR | prolactin receptor [Source:HGNC Symbol;Acc:9446] locus=5:35048861-35230794 | tcgaBreastGE
RELA | v-rel reticuloendotheliosis viral oncogene homolog A (avian) [Source:HGNC Symbol;Acc:9955] locus=11:65421067-65430565 | tcgaGliomaGE
RP11-631M21.2 | Tubulin beta-8 chain [Source:UniProtKB/Swiss-Prot;Acc:Q3ZCM7] locus=10:92828-96053 |
SHANK1 | SH3 and multiple ankyrin repeat domains 1 [Source:HGNC Symbol;Acc:15474] locus=19:51165084-51222707 | tcgaGliomaGE
SHANK2 | SH3 and multiple ankyrin repeat domains 2 [Source:HGNC Symbol;Acc:14295] locus=11:70313961-70963623 | tcgaGliomaGE
SHANK3 | SH3 and multiple ankyrin repeat domains 3 [Source:HGNC Symbol;Acc:14294] locus=22:51113070-51171641 | tscapeBCd, tscapeHCCd, tscapeNSCLCd, tscapeOvariand, tscapeSCLCd
SRM | spermidine synthase [Source:HGNC Symbol;Acc:11296] locus=1:11114641-11120081 | tscapeBCd, tscapeCRCd, tscapeHCCd, tscapeNSCLCd, tscapeOvariana, tscapeOvariand, tscapeRCCd
TAT | tyrosine aminotransferase [Source:HGNC Symbol;Acc:11573] locus=16:71599563-71611033 | tscapeOvariand
TH | tyrosine hydroxylase [Source:HGNC Symbol;Acc:11782] locus=11:2185159-2193107 | tscapeBCd, tscapeNSCLCd, tscapeOvariand
TJP1 | tight junction protein 1 (zona occludens 1) [Source:HGNC Symbol;Acc:11827] locus=15:29991571-30248497 | tscapeBCd
TJP2 | tight junction protein 2 (zona occludens 2) [Source:HGNC Symbol;Acc:11828] locus=9:71736224-71870124 |
TJP3 | tight junction protein 3 (zona occludens 3) [Source:HGNC Symbol;Acc:11829] locus=19:3708382-3750810 | tcgaBreastGE, tscapeHCCd
TPH1 | tryptophan hydroxylase 1 [Source:HGNC Symbol;Acc:12008] locus=11:18039111-18063973 | tcgaGliomaGE
TPH2 | tryptophan hydroxylase 2 [Source:HGNC Symbol;Acc:20692] locus=12:72332626-72426221 | tcgaGliomaGE
TUBB | tubulin, beta [Source:HGNC Symbol;Acc:20778] locus=6:30687978-30693203 | tcgaBreastGE, tcgaOvarianGE
TUBB1 | tubulin, beta 1 [Source:HGNC Symbol;Acc:16257] locus=20:57594309-57601709 |
TUBB2B | tubulin, beta 2B [Source:HGNC Symbol;Acc:30829] locus=6:3224517-3231964 | tcgaBreastGE, tcgaGliomaGE, tscapeOvariana, tscapeSCLCd
TUBB2C | tubulin, beta 2C [Source:HGNC Symbol;Acc:20771] locus=9:140135665-140138159 | tcgaBreastGE, tscapeCRCa
TUBB3 | tubulin, beta 3 [Source:HGNC Symbol;Acc:20772] locus=16:89989687-90002505 | tscapeCRCd, tscapeNSCLCd, tscapeProstated
TUBB4 | tubulin, beta 4 [Source:HGNC Symbol;Acc:20774] locus=19:6494331-6502330 | tcgaGliomaGE, tscapeHCCd, tscapeRCCd
TUBB4Q | tubulin, beta polypeptide 4, member Q [Source:HGNC Symbol;Acc:12413] locus=8:30209389-30211034 | tcgaBreastGE, tscapeBCd, tscapeHCCd, tscapeMelanomad, tscapeNSCLCd, tscapeProstated, tscapeRCCd
TUBB6 | tubulin, beta 6 [Source:HGNC Symbol;Acc:20776] locus=18:12308070-12326568 | tcgaGliomaGE
TYK2 | tyrosine kinase 2 [Source:HGNC Symbol;Acc:12440] locus=19:10461209-10491352 | tcgaGliomaGE
TYR | tyrosinase (oculocutaneous albinism IA) [Source:HGNC Symbol;Acc:12442] locus=11:88910620-89028908 | tcgaBreastGE

UGP2 | UDP-glucose pyrophosphorylase 2 [Source:HGNC Symbol;Acc:12527] locus=2:64068074-64118696 | tcgaGliomaGE
UGT1A1 | UDP glucuronosyltransferase 1 family, polypeptide A1 [Source:HGNC Symbol;Acc:12530] locus=2:234668894-234681945 | tscapeBCd, tscapeMelanomad, tscapeNSCLCd, tscapeOvariand, tscapeRCCd, tscapeSCLCd
UGT1A10 | UDP glucuronosyltransferase 1 family, polypeptide A10 [Source:HGNC Symbol;Acc:12531] locus=2:234545100-234681951 | cosmicPrimary, tscapeBCd, tscapeMelanomad, tscapeNSCLCd, tscapeOvariand, tscapeRCCd, tscapeSCLCd
UGT1A3 | UDP glucuronosyltransferase 1 family, polypeptide A3 [Source:HGNC Symbol;Acc:12535] locus=2:234637754-234681945 | tscapeBCd, tscapeMelanomad, tscapeNSCLCd, tscapeOvariand, tscapeRCCd, tscapeSCLCd
UGT1A4 | UDP glucuronosyltransferase 1 family, polypeptide A4 [Source:HGNC Symbol;Acc:12536] locus=2:234627424-234681945 | tscapeBCd, tscapeMelanomad, tscapeNSCLCd, tscapeOvariand, tscapeRCCd, tscapeSCLCd
UGT1A5 | UDP glucuronosyltransferase 1 family, polypeptide A5 [Source:HGNC Symbol;Acc:12537] locus=2:234621638-234681945 | tscapeBCd, tscapeMelanomad, tscapeNSCLCd, tscapeOvariand, tscapeRCCd, tscapeSCLCd
UGT1A6 | UDP glucuronosyltransferase 1 family, polypeptide A6 [Source:HGNC Symbol;Acc:12538] locus=2:234600253-234681946 | tscapeBCd, tscapeMelanomad, tscapeNSCLCd, tscapeOvariand, tscapeRCCd, tscapeSCLCd
UGT1A7 | UDP glucuronosyltransferase 1 family, polypeptide A7 [Source:HGNC Symbol;Acc:12539] locus=2:234590584-234681945 | tscapeBCd, tscapeMelanomad, tscapeNSCLCd, tscapeOvariand, tscapeRCCd, tscapeSCLCd
UGT1A8 | UDP glucuronosyltransferase 1 family, polypeptide A8 [Source:HGNC Symbol;Acc:12540] locus=2:234526291-234681956 | cosmicPrimary, tscapeBCd, tscapeMelanomad, tscapeNSCLCd, tscapeOvariand, tscapeRCCd, tscapeSCLCd
UGT1A9 | UDP glucuronosyltransferase 1 family, polypeptide A9 [Source:HGNC Symbol;Acc:12541] locus=2:234580499-234681946 | tscapeBCd, tscapeMelanomad, tscapeNSCLCd, tscapeOvariand, tscapeRCCd, tscapeSCLCd
UGT2A1 | UDP glucuronosyltransferase 2 family, polypeptide A1, complex locus [Source:HGNC Symbol;Acc:12542] locus=4:70454135-70518967 |
UGT2A3 | UDP glucuronosyltransferase 2 family, polypeptide A3 [Source:HGNC Symbol;Acc:28528] locus=4:69794181-69817509 |
UGT2B10 | UDP glucuronosyltransferase 2 family, polypeptide B10 [Source:HGNC Symbol;Acc:12544] locus=4:69681711-69696914 |
UGT2B11 | UDP glucuronosyltransferase 2 family, polypeptide B11 [Source:HGNC Symbol;Acc:12545] locus=4:70065669-70080449 |
UGT2B15 | UDP glucuronosyltransferase 2 family, polypeptide B15 [Source:HGNC Symbol;Acc:12546] locus=4:69512348-69536346 |
UGT2B17 | UDP glucuronosyltransferase 2 family, polypeptide B17 [Source:HGNC Symbol;Acc:12547] locus=4:69402902-69434245 |
UGT2B28 | UDP glucuronosyltransferase 2 family, polypeptide B28 [Source:HGNC Symbol;Acc:13479] locus=4:70146217-70160768 |
UGT2B4 | UDP glucuronosyltransferase 2 family, polypeptide B4 [Source:HGNC Symbol;Acc:12553] locus=4:70345883-70391732 |
UGT2B7 | UDP glucuronosyltransferase 2 family, polypeptide B7 [Source:HGNC Symbol;Acc:12554] locus=4:69917081-69978705 |

Table 25: KEGG [2] pathways supporting the relationships between the genes shown in Figure 19. The edges column gives the number of edges taken from each pathway.

name | edges | genes
Tight junction | 592 | CLDN1, CLDN10, CLDN11, CLDN14, CLDN15, CLDN16, CLDN17, CLDN18, CLDN19, CLDN2, CLDN20, CLDN22, CLDN23, CLDN3, CLDN4, CLDN5, CLDN6, CLDN7, CLDN8, CLDN9, INADL, MPDZ, OCLN, TJP1, TJP2, TJP3
Leukocyte transendothelial migration | 462 | CLDN1, CLDN10, CLDN11, CLDN14, CLDN15, CLDN16, CLDN17, CLDN18, CLDN19, CLDN2, CLDN20, CLDN22, CLDN23, CLDN3, CLDN4, CLDN5, CLDN6, CLDN7, CLDN8, CLDN9, ESAM, OCLN
Purine metabolism | 456 | ADCY1, ADCY10, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, ENTPD1, ENTPD3, ENTPD8, GUCY1A3, ITPA, NME1, NME1-NME2, NME2, NME3, NME4, NME5, NME6, NME7, PKLR, PKM2, PNPT1, POLR3GL
Cell adhesion molecules (CAMs) | 382 | CLDN1, CLDN10, CLDN11, CLDN14, CLDN15, CLDN16, CLDN17, CLDN18, CLDN19, CLDN2, CLDN20, CLDN22, CLDN23, CLDN3, CLDN4, CLDN5, CLDN6, CLDN7, CLDN8, CLDN9, ESAM, MPZ, MPZL1, OCLN
Jak-STAT signaling pathway | 299 | CNTFR, CRLF2, CSF2RA, CSF2RB, CSF3R, EPOR, GHR, IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL10RA, IL10RB, IL11RA, IL12RB1, IL12RB2, IL13RA1, IL13RA2, IL15RA, IL20RA, IL20RB, IL21R, IL22RA1, IL22RA2, IL23R, IL28RA, IL2RA, IL2RB, IL2RG, IL3RA, IL4R, IL5RA, IL6R, IL6ST, IL7R, IL9R, JAK1, JAK2, JAK3, LEPR, LIFR, MPL, OSMR, PRLR, SOCS2, TYK2
Glycerolipid metabolism | 239 | AGPAT1, AGPAT2, CEL, DGAT1, DGAT2, DGKA, DGKB, DGKD, DGKE, DGKG, DGKH, DGKI, DGKQ, DGKZ, LIPC, LIPF, LIPG, MBOAT2, PNLIP, PNLIPRP1, PNLIPRP2, PNLIPRP3, PNPLA3, PPAP2A
Metabolism of xenobiotics by cytochrome P450 | 133 | ADH1A, ADH1B, ADH1C, ADH4, ADH5, ADH6, ADH7, ALDH1A3, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2A1, UGT2A3, UGT2B10, UGT2B11, UGT2B15, UGT2B17, UGT2B28, UGT2B4, UGT2B7
Pyrimidine metabolism | 110 | CTPS, CTPS2, ENTPD1, ENTPD3, ENTPD8, ITPA, NME1, NME1-NME2, NME2, NME3, NME4, NME5, NME6, NME7, PNPT1, POLR3GL
Glycerophospholipid metabolism | 109 | AGPAT1, AGPAT2, CEPT1, CHPT1, DGKA, DGKB, DGKD, DGKE, DGKG, DGKH, DGKI, DGKQ, DGKZ, EPT1, MBOAT2, PLD1, PLD2, PPAP2A
Tyrosine metabolism | 73 | ADH1A, ADH1B, ADH1C, ADH4, ADH5, ADH6, ADH7, ALDH1A3, COMT, DDC, GOT1, GOT2, IL4I1, MAOA, MAOB, TAT, TH, TYR
Glutamatergic synapse | 54 | ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, GRM1, GRM5, HOMER1, HOMER2, HOMER3, ITPR1, ITPR2, ITPR3, PLD1, PLD2, SHANK1, SHANK2, SHANK3
Measles | 32 | IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL2RA, IL2RB, IL2RG, JAK1, JAK2, JAK3, NFKB1, NFKBIA, RELA, TYK2
Phagosome | 30 | DYNC1H1, DYNC1I1, DYNC1I2, DYNC1LI1, DYNC1LI2, DYNC2H1, NOS1, RP11-631M21.2, TUBA3D, TUBB, TUBB1, TUBB2B, TUBB2C, TUBB3, TUBB4, TUBB4Q, TUBB6
Tryptophan metabolism | 30 | ABP1, DDC, IDO1, IDO2, IL4I1, MAOA, MAOB, TPH1, TPH2
Gap junction | 26 | ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, GJA1, GRM1, GRM5, GUCY1A3, ITPR1, ITPR2, ITPR3, PRKG1, PRKG2, RP11-631M21.2, TJP1, TUBA3D, TUBB, TUBB1, TUBB2B, TUBB2C, TUBB3, TUBB4, TUBB4Q, TUBB6
Starch and sucrose metabolism | 26 | GYS1, GYS2, PYGB, UGDH, UGP2, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2A1, UGT2A3, UGT2B10, UGT2B11, UGT2B15, UGT2B17, UGT2B28, UGT2B4, UGT2B7
Influenza A | 20 | IFNAR1, IFNAR2, IFNGR1, IFNGR2, JAK1, JAK2, NFKB1, NFKBIA, RELA, TYK2
Toxoplasmosis | 20 | IFNGR1, IFNGR2, IL10RA, IL10RB, JAK1, JAK2, NFKB1, NFKBIA, RELA, TYK2
Pentose and glucuronate interconversions | 19 | UGDH, UGP2, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2A1, UGT2A3, UGT2B10, UGT2B11, UGT2B15, UGT2B17, UGT2B28, UGT2B4, UGT2B7
Steroid hormone biosynthesis | 18 | COMT, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2A1, UGT2A3, UGT2B10, UGT2B11, UGT2B15, UGT2B17, UGT2B28, UGT2B4, UGT2B7
Drug metabolism - cytochrome P450 | 14 | ADH1A, ADH1B, ADH1C, ADH4, ADH5, ADH6, ADH7, ALDH1A3, MAOA, MAOB, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2A1, UGT2A3, UGT2B10, UGT2B11, UGT2B15, UGT2B17, UGT2B28, UGT2B4, UGT2B7
Osteoclast differentiation | 12 | IFNAR1, IFNAR2, IFNGR1, IFNGR2, JAK1, NFKB1, NFKB2, NFKBIA, RELA, TYK2
Arginine and proline metabolism | 12 | ABP1, AGMAT, CARNS1, GOT1, GOT2, MAOA, MAOB, NOS1, ODC1, SAT1, SRM
Ether lipid metabolism | 11 | CEPT1, CHPT1, EPT1, PLD1, PLD2, PPAP2A
Glycine, serine and threonine metabolism | 9 | ALAS1, ALAS2, GCAT, GLDC, GNMT, MAOA, MAOB
Phenylalanine metabolism | 8 | ALDH1A3, DDC, GOT1, GOT2, IL4I1, MAOA, MAOB, TAT
Retinol metabolism | 7 | ADH1A, ADH1B, ADH1C, ADH4, ADH5, ADH6, ADH7, DGAT1, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2A1, UGT2A3, UGT2B10, UGT2B11, UGT2B15, UGT2B17, UGT2B28, UGT2B4, UGT2B7
Glycolysis / Gluconeogenesis | 7 | ADH1A, ADH1B, ADH1C, ADH4, ADH5, ADH6, ADH7, ALDH1A3, PKLR, PKM2
Pathways in cancer | 6 | CSF2RA, CSF3R, JAK1, NFKB1, NFKB2, NFKBIA, PLD1, RELA
Long-term depression | 6 | GRM1, GRM5, GUCY1A3, ITPR1, ITPR2, ITPR3, NOS1, PRKG1, PRKG2
Adipocytokine signaling pathway | 5 | IRS2, JAK2, LEPR, NFKB1, NFKBIA, RELA
Fc gamma R-mediated phagocytosis | 5 | PLD1, PLD2, PPAP2A
Small cell lung cancer | 4 | NFKB1, NFKBIA, RELA
Chronic myeloid leukemia | 4 | NFKB1, NFKBIA, RELA
Prostate cancer | 4 | NFKB1, NFKBIA, RELA
Hepatitis C | 4 | CLDN1, CLDN10, CLDN11, CLDN14, CLDN15, CLDN16, CLDN17, CLDN18, CLDN19, CLDN2, CLDN20, CLDN22, CLDN23, CLDN3, CLDN4, CLDN5, CLDN6, CLDN7, CLDN8, CLDN9, IFNAR1, IFNAR2, JAK1, NFKB1, NFKBIA, OCLN, RELA, TYK2
Chagas disease (American trypanosomiasis) | 4 | ADCY1, IFNGR1, IFNGR2, NFKB1, NFKBIA, RELA
Leishmaniasis | 4 | IFNGR1, IFNGR2, JAK1, JAK2, NFKB1, NFKBIA, RELA
Shigellosis | 4 | NFKB1, NFKBIA, RELA
Epithelial cell signaling in Helicobacter pylori infection | 4 | NFKB1, NFKBIA, RELA, TJP1

Fat digestion and absorption | 4 | AGPAT1, AGPAT2, CEL, DGAT1, DGAT2, GOT2, LIPF, PNLIP, PNLIPRP1, PNLIPRP2, PPAP2A
Olfactory transduction | 4 | ADCY3, PRKG1, PRKG2
Neurotrophin signaling pathway | 4 | IRS2, NFKB1, NFKBIA, RELA
B cell receptor signaling pathway | 4 | NFKB1, NFKBIA, RELA
T cell receptor signaling pathway | 4 | NFKB1, NFKBIA, RELA
Cytosolic DNA-sensing pathway | 4 | NFKB1, NFKBIA, POLR3GL, RELA
RIG-I-like receptor signaling pathway | 4 | NFKB1, NFKBIA, RELA
NOD-like receptor signaling pathway | 4 | NFKB1, NFKBIA, RELA
Apoptosis | 4 | CSF2RB, IL3RA, NFKB1, NFKBIA, RELA
Chemokine signaling pathway | 4 | ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, JAK2, JAK3, NFKB1, NFKBIA, RELA
Sphingolipid metabolism | 4 | ASAH1, ASAH2, PPAP2A
Histidine metabolism | 4 | ABP1, ALDH1A3, CARNS1, DDC, MAOA, MAOB
Steroid biosynthesis | 4 | CEL, DHCR24, DHCR7
Insulin signaling pathway | 3 | GYS1, GYS2, INSR, IRS2, PKLR, PYGB, SOCS2
Tuberculosis | 2 | IFNGR1, IFNGR2, IL10RA, IL10RB, JAK1, JAK2, NFKB1, RELA
Toll-like receptor signaling pathway | 2 | IFNAR1, IFNAR2, NFKB1, NFKBIA, RELA
Type II diabetes mellitus | 1 | INSR, IRS2, PKLR, PKM2, SOCS2
Vascular smooth muscle contraction | 1 | ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, GUCY1A3, ITPR1, ITPR2, ITPR3, KCNMA1, PRKG1
Amino sugar and nucleotide sugar metabolism | 1 | UGDH, UGP2
Glutathione metabolism | 1 | ODC1, SRM
beta-Alanine metabolism | 1 | ALDH1A3, CARNS1, SRM
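The edges column in Table 25 can be read as a per-pathway tally over the network in Figure 19. The minimal Python sketch below shows such a tally. It is an illustration, not Moksiskaan's implementation. The report does not describe how edges are stored, so the record layout (source gene, target gene, set of supporting KEGG pathways) is a hypothetical assumption. It is also an assumption that an edge supported by several pathways is counted once for each of them.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical edge record: (source gene, target gene, KEGG pathways containing this relationship).
Edge = Tuple[str, str, Set[str]]


def edges_per_pathway(edges: Iterable[Edge]) -> List[Tuple[str, int]]:
    """Count how many edges each pathway contributes, largest count first."""
    counts = Counter()
    for _source, _target, pathways in edges:
        # Assumption: an edge backed by several pathways is credited to each of them.
        counts.update(pathways)
    # most_common() returns (pathway, count) pairs sorted by count, descending.
    return counts.most_common()


if __name__ == "__main__":
    # Toy input for illustration only; these are not edges taken from Figure 19.
    demo = [
        ("CLDN1", "OCLN", {"Tight junction", "Cell adhesion molecules (CAMs)"}),
        ("TJP1", "OCLN", {"Tight junction"}),
        ("JAK1", "IL6ST", {"Jak-STAT signaling pathway"}),
    ]
    for name, n in edges_per_pathway(demo):
        print(name, n)
```

Under this per-pathway counting assumption, the column totals could exceed the number of distinct edges in the figure.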

11.1.1 GO enrichment of the candidate pathway
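The two quantities in Table 26 are the per-term Ratio and the FDR-corrected p-value with its 0.01 cut-off. The minimal Python sketch below shows how such values are commonly derived. It is not Anduril's code, and three points are assumptions. First, the report does not name the correction procedure here, so Benjamini-Hochberg is used. Second, the raw per-term p-values are taken as given input, because the enrichment test itself is not described. Third, Ratio is computed as the caption describes it: the share of the gene set that carries the term. The function names are hypothetical.

```python
from typing import AbstractSet, List, Sequence


def annotation_ratio(annotated: AbstractSet[str], gene_set: AbstractSet[str]) -> float:
    """Share of the gene set that carries a given GO annotation."""
    if not gene_set:
        raise ValueError("gene set must not be empty")
    return len(annotated & gene_set) / len(gene_set)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest raw p-value down so that adjusted values stay monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_terms(terms: Sequence[str], pvalues: Sequence[float], alpha: float = 0.01) -> List[str]:
    """Terms whose adjusted p-value is <= alpha, sorted by adjusted p-value."""
    adjusted = benjamini_hochberg(pvalues)
    kept = [(q, term) for term, q in zip(terms, adjusted) if q <= alpha]
    return [term for _q, term in sorted(kept)]


if __name__ == "__main__":
    # Toy numbers for illustration only; not values from this report.
    print(annotation_ratio({"JAK1", "JAK2", "XYZ"}, {"JAK1", "JAK2", "TYK2", "RELA"}))
    print(significant_terms(["t1", "t2", "t3"], [0.001, 0.02, 0.004]))
```

Sorting the kept terms by adjusted p-value mirrors the ordering rule stated in the Table 26 caption.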

Table 26: Enriched Gene Ontology terms [1] (FDR-corrected p ≤ 0.01). Ratio is the proportion of the whole gene set that carries the annotation. The list is sorted by FDR-corrected p-value. Green and blue borders refer to up- and down-regulated genes, respectively.

Ratio | Type | Description | Genes
0.165 | MF | cytokine receptor activity | CNTFR, CSF2RA, CSF2RB, CSF3R, EPOR, GHR, IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL10RA, IL10RB, IL11RA, IL12RB1, IL12RB2, IL13RA1, IL13RA2, IL15RA, IL21R, IL22RA1, IL22RA2, IL23R, IL28RA, IL2RA, IL2RB, IL2RG, IL3RA, IL4R, IL5RA, IL6R, IL6ST, IL7R, IL9R, LEPR, LIFR, MPL, OSMR, PRLR
0.119 | CC | tight junction | CLDN1, CLDN10, CLDN11, CLDN14, CLDN15, CLDN16, CLDN17, CLDN18, CLDN19, CLDN2, CLDN20, CLDN22, CLDN23, CLDN3, CLDN4, CLDN5, CLDN6, CLDN7, CLDN8, CLDN9, ESAM, INADL, MPDZ, OCLN, TJP1, TJP2, TJP3
0.089 | BP | calcium-independent cell-cell adhesion | CLDN1, CLDN10, CLDN11, CLDN14, CLDN15, CLDN16, CLDN17, CLDN18, CLDN19, CLDN2, CLDN20, CLDN22, CLDN23, CLDN3, CLDN4, CLDN5, CLDN6, CLDN7, CLDN8, CLDN9
0.442 | BP | response to chemical stimulus | ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, ADH1A, ADH1B, ADH4, ADH5, ADH6, ADH7, AGPAT1, AGPAT2, ALAS2, ALDH1A3, ASAH1, CLDN3, CLDN4, CNTFR, COMT, CSF2RB, CSF3R, CTPS, DDC, DGKD, DGKI, DGKQ, DHCR24, GHR, GJA1, GOT1, GOT2, GUCY1A3, GYS2, IDO1, IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL12RB1, IL12RB2, IL13RA2, IL15RA, IL21R, IL23R, IL28RA, IL2RB, IL6R, IL6ST, INSR, IRS2, ITPR1, ITPR2, ITPR3, JAK1, JAK2, JAK3, KCNMA1, LIFR, LIPG, MAOA, MAOB, NFKB1, NFKBIA, NME1, NOS1, OCLN, OSMR, PKLR, PKM2, PLD1, PLD2, PNLIPRP1, RELA, SOCS2, TAT, TH, TPH2, TUBB3, TYK2, UGDH, UGP2, UGT1A1, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2A1, UGT2B11, UGT2B15, UGT2B28, UGT2B4
0.281 | BP | cellular response to chemical stimulus | ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, ADH1A, ADH1B, ADH4, ADH6, ADH7, AGPAT1, AGPAT2, CNTFR, COMT, CSF2RB, CSF3R, GHR, GOT1, IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL12RB1, IL12RB2, IL13RA2, IL15RA, IL21R, IL28RA, IL2RB, IL6R, IL6ST, INSR, IRS2, JAK1, JAK2, JAK3, LIFR, MAOA, MAOB, NOS1, PKLR, RELA, SOCS2, TPH2, TYK2, UGDH, UGP2, UGT1A1, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2B11, UGT2B15, UGT2B28, UGT2B4
0.078 | MF | glucuronosyltransferase activity | UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2A1, UGT2A3, UGT2B10, UGT2B11, UGT2B15, UGT2B17, UGT2B28, UGT2B4, UGT2B7
0.121 | BP | activation of protein kinase activity | ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, DGKA, DGKB, DGKD, DGKE, DGKG, DGKH, DGKI, DGKQ, DGKZ, GHR, GRM1, GRM5, IL23R, IL6R, INSR, JAK2, PPAP2A, PRLR
0.299 | BP | response to organic substance | ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, ADH5, ADH6, ADH7, AGPAT1, AGPAT2, ASAH1, CLDN4, CNTFR, COMT, CSF2RB, DGKD, DGKI, DGKQ, GHR, GJA1, GOT1, GOT2, GUCY1A3, GYS2, IDO1, IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL12RB1, IL12RB2, IL13RA2, IL15RA, IL21R, IL23R, IL28RA, IL2RB, IL6R, IL6ST, INSR, IRS2, JAK1, JAK2, JAK3, LIFR, MAOB, NFKBIA, NME1, NOS1, OCLN, OSMR, PKLR, PKM2, PLD1, PLD2, PNLIPRP1, RELA, SOCS2, TAT, TH, TPH2, TYK2, UGT1A1
0.133 | CC | microsome | CEL, CHPT1, COMT, DGAT1, DGAT2, DHCR7, GRM1, INSR, IRS2, ITPR1, ITPR2, ITPR3, NME1, PLD1, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2B10, UGT2B11, UGT2B15, UGT2B17, UGT2B28, UGT2B4, UGT2B7
0.379 | BP | small molecule metabolic process | ADCY1, ADCY10, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, ADH1A, ADH1B, ADH4, ADH5, ADH6, ADH7, AGPAT1, AGPAT2, ALDH1A3, CARNS1, CEL, CHPT1, COMT, CTPS, CTPS2, DDC, DGAT1, DGAT2, DGKD, DGKI, DHCR24, DHCR7, ENTPD1, ENTPD8, GCAT, GHR, GLDC, GNMT, GOT1, GOT2, GUCY1A3, GYS1, GYS2, IDO1, IDO2, IL4I1, IL6ST, INSR, IRS2, ITPA, LEPR, LIPC, LIPF, MAOA, MAOB, NFKB1, NME1, NME1-NME2, NME2, NME3, NME4, NME5, NME6, NME7, NOS1, OCLN, ODC1, PKLR, PKM2, PLD1, PNPLA3, PYGB, TAT, TH, TPH1, TPH2, TUBB3, TYR, UGDH, UGP2, UGT1A1, UGT1A3, UGT1A7, UGT1A8, UGT1A9
0.183 | BP | cellular nitrogen compound biosynthetic process | ADCY1, ADCY10, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, AGMAT, ALAS1, ALAS2, CARNS1, CTPS, CTPS2, DDC, ENTPD8, GOT1, GOT2, GUCY1A3, INSR, JAK2, NME1, NME1-NME2, NME2, NME3, NME4, NME5, NME6, NME7, NOS1, ODC1, PKLR, PKM2, SAT1, SRM, TH, TPH1, TPH2, TUBB3
0.103 | BP | energy reserve metabolic process | ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, AGPAT1, GNMT, GYS1, GYS2, IL6ST, INSR, IRS2, ITPR1, ITPR2, ITPR3, LEPR, PKLR, PYGB, UGP2
0.098 | BP | xenobiotic metabolic process | ADH1A, ADH1B, ADH4, ADH6, ADH7, COMT, MAOA, MAOB, UGDH, UGP2, UGT1A1, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2B11, UGT2B15, UGT2B28, UGT2B4
0.191 | MF | identical protein binding | ADH5, ADH7, ALDH1A3, CLDN1, CLDN10, CLDN11, CLDN14, CLDN15, CLDN16, CLDN17, CLDN18, CLDN19, CLDN2, CLDN20, CLDN22, CLDN23, CLDN3, CLDN4, CLDN5, CLDN6, CLDN7, CLDN8, CLDN9, DGKD, GHR, GYS2, IL6R, IL6ST, MAOB, NFKB1, NFKBIA, NME1, PRLR, PYGB, RELA, TYR, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A6, UGT1A7, UGT1A8, UGT1A9
0.179 | BP | cellular response to organic substance | ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, AGPAT1, AGPAT2, CNTFR, CSF2RB, GHR, GOT1, IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL12RB1, IL12RB2, IL13RA2, IL15RA, IL21R, IL28RA, IL2RB, IL6R, IL6ST, INSR, IRS2, JAK1, JAK2, JAK3, LIFR, NOS1, PKLR, RELA, SOCS2, TYK2, UGT1A1
0.043 | MF | adenylate cyclase activity | ADCY1, ADCY10, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9
0.103 | BP | cytokine-mediated signaling pathway | AGPAT1, AGPAT2, CNTFR, CSF2RB, IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL12RB1, IL12RB2, IL13RA2, IL15RA, IL21R, IL28RA, IL2RB, IL6R, IL6ST, JAK1, JAK2, JAK3, LIFR, RELA, TYK2
0.174 | BP | small molecule biosynthetic process | ADCY1, ADCY10, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, AGPAT1, AGPAT2, CARNS1, CTPS, CTPS2, DDC, ENTPD8, GNMT, GOT1, GOT2, GUCY1A3, IDO1, LIPC, NFKB1, NME1, NME1-NME2, NME2, NME3, NME4, NME5, NME6, NME7, PKLR, PKM2, PLD1, TH, TUBB3, UGDH, UGP2
0.074 | MF | growth factor binding | CSF2RB, GHR, IL10RA, IL10RB, IL2RA, IL2RB, IL2RG, IL3RA, IL4R, IL5RA, IL6R, IL6ST, IL7R, IL9R, INSR, LIFR, OSMR
0.540 | CC | integral to membrane | ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, AGPAT1, AGPAT2, ASAH2, CEPT1, CHPT1, CLDN1, CLDN10, CLDN11, CLDN14, CLDN15, CLDN16, CLDN17, CLDN18, CLDN19, CLDN2, CLDN20, CLDN22, CLDN23, CLDN3, CLDN4, CLDN5, CLDN6, CLDN7, CLDN8, CLDN9, COMT, CRLF2, CSF2RA, CSF2RB, CSF3R, DGAT1, DGAT2, DGKE, DHCR24, DHCR7, ENTPD1, ENTPD3, ENTPD8, EPOR, EPT1, ESAM, GHR, GJA1, GRM1, GRM5, HOMER1, IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL10RA, IL10RB, IL11RA, IL12RB1, IL12RB2, IL13RA1, IL13RA2, IL15RA, IL20RA, IL20RB, IL21R, IL22RA1, IL22RA2, IL23R, IL28RA, IL2RA, IL2RB, IL2RG, IL3RA, IL4R, IL5RA, IL6R, IL6ST, IL7R, IL9R, INSR, ITPR1, ITPR2, ITPR3, KCNMA1, LEPR, LIFR, MAOA, MAOB, MBOAT2, MPL, MPZ, MPZL1, OCLN, OSMR, PNPLA3, PPAP2A, PRLR, TUBB3, TYR, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2A1, UGT2A3, UGT2B10, UGT2B11, UGT2B15, UGT2B17, UGT2B28, UGT2B4, UGT2B7
0.214 | BP | lipid metabolic process | ADH4, ADH5, ADH7, AGPAT1, AGPAT2, ALDH1A3, ASAH1, ASAH2, CEL, CEPT1, CHPT1, COMT, DGAT1, DGAT2, DGKD, DGKE, DGKZ, DHCR24, DHCR7, EPT1, GHR, IL6ST, IRS2, LEPR, LIPC, LIPF, LIPG, MBOAT2, NFKB1, PLD1, PLD2, PNLIP, PNLIPRP1, PNLIPRP3, PNPLA3, PPAP2A, PRLR, UGT1A1, UGT1A3, UGT1A7, UGT1A8, UGT1A9, UGT2B10, UGT2B11, UGT2B15, UGT2B17, UGT2B4, UGT2B7
0.040 | BP | CTP biosynthetic process | CTPS, NME1, NME1-NME2, NME2, NME3, NME4, NME5, NME6, NME7
0.152 | MF | structural molecule activity | CLDN1, CLDN10, CLDN11, CLDN14, CLDN15, CLDN16, CLDN17, CLDN18, CLDN19, CLDN2, CLDN20, CLDN22, CLDN23, CLDN3, CLDN4, CLDN5, CLDN6, CLDN7, CLDN8, CLDN9, HOMER2, MPZ, MPZL1, OCLN, RP11-631M21.2, SHANK1, SHANK2, TUBA3D, TUBB, TUBB1, TUBB2B, TUBB2C, TUBB3, TUBB4, TUBB6
0.039 | MF | diacylglycerol kinase activity | DGKA, DGKB, DGKD, DGKE, DGKG, DGKH, DGKI, DGKQ, DGKZ
0.156 | BP | regulation of body fluid levels | CEL, DGKA, DGKB, DGKD, DGKE, DGKG, DGKH, DGKI, DGKQ, DGKZ, ENTPD1, ESAM, GUCY1A3, IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL10RA, IL10RB, IL20RA, IL20RB, IL22RA1, IL22RA2, IL28RA, ITPR1, ITPR2, ITPR3, JAK2, KCNMA1, MPL, NME1, NOS1, PRKG1, PRKG2, PRLR
0.179 | BP | carboxylic acid metabolic process | ADH7, AGPAT1, AGPAT2, ALDH1A3, CARNS1, CEL, COMT, CTPS, CTPS2, DDC, GCAT, GHR, GLDC, GNMT, GOT1, GOT2, IDO1, IDO2, IL4I1, IRS2, LIPC, LIPF, NOS1, OCLN, ODC1, PKLR, PKM2, PLD1, TAT, TH, TPH1, TPH2, TYR, UGDH, UGP2, UGT1A1, UGT1A3, UGT1A7, UGT1A8, UGT1A9
0.143 | BP | blood coagulation | DGKA, DGKB, DGKD, DGKE, DGKG, DGKH, DGKI, DGKQ, DGKZ, ENTPD1, ESAM, GUCY1A3, IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL10RA, IL10RB, IL20RA, IL20RB, IL22RA1, IL22RA2, IL28RA, ITPR1, ITPR2, ITPR3, JAK2, KCNMA1, MPL, NOS1, PRKG1, PRKG2
0.040 | BP | activation of protein kinase A activity | ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9
0.035 | MF | nucleoside diphosphate kinase activity | NME1, NME1-NME2, NME2, NME3, NME4, NME5, NME6, NME7
0.054 | BP | cell-cell junction assembly | CLDN14, CLDN15, CLDN17, CLDN18, CLDN19, CLDN20, CLDN22, CLDN23, CLDN9, GJA1, INADL, TJP1
0.078 | MF | carboxylic acid binding | ADH5, ALAS2, DDC, DGAT1, GNMT, GOT1, IDO1, INSR, NOS1, TAT, TH, TPH1, TPH2, UGT1A1, UGT1A3, UGT1A7, UGT1A8, UGT1A9
0.049 | BP | activation of protein kinase C activity by G-protein coupled receptor protein signaling pathway | DGKA, DGKB, DGKD, DGKE, DGKG, DGKH, DGKI, DGKQ, DGKZ, GRM5, PPAP2A
0.036 | BP | GTP biosynthetic process | NME1, NME1-NME2, NME2, NME3, NME4, NME5, NME6, NME7
0.036 | BP | UTP biosynthetic process | NME1, NME1-NME2, NME2, NME3, NME4, NME5, NME6, NME7
0.045 | BP | apical junction assembly | CLDN14, CLDN15, CLDN17, CLDN18, CLDN19, CLDN20, CLDN22, CLDN23, CLDN9, INADL
0.156 | BP | cell activation | DGKA, DGKB, DGKD, DGKE, DGKG, DGKH, DGKI, DGKQ, DGKZ, ENTPD1, GUCY1A3, IDO1, IFNAR1, IL12RB1, IL15RA, IL21R, IL23R, IL2RA, IL2RG, IL4R, IL6ST, IL7R, IRS2, ITPR1, ITPR2, ITPR3, JAK2, KCNMA1, MPL, NFKB2, NOS1, PLD2, PRKG1, PRKG2, PRLR

61 Ratio Type Description Genes 0.129 BP cellular amine metabolic process AGMAT, CARNS1, CHPT1, COMT, CTPS, CTPS2, DDC, GCAT, GHR, GLDC, GNMT, GOT1, GOT2, IDO1, IDO2, IL4I1, LIPC, MAOA, MAOB, NOS1, OCLN, ODC1, SAT1, SRM, TAT, TH, TPH1, TPH2, TYR 0.035 MF triglyceride lipase activity CEL, LIPC, LIPF, LIPG, PNLIP, PNLIPRP1, PNLIPRP3, PNPLA3 0.067 BP cellular biogenic amine metabolic process AGMAT, CHPT1, COMT, DDC, IDO1, IDO2, LIPC, MAOA, MAOB, ODC1, SAT1, SRM, TH, TPH1, TPH2 0.045 BP retinoid metabolic process ADH4, ADH5, ADH7, ALDH1A3, PNLIP, UGT1A1, UGT1A3, UGT1A7, UGT1A8, UGT1A9 0.134 BP alcohol metabolic process ADH1A, ADH1B, ADH4, ADH5, ADH6, ADH7, CEL, CHPT1, COMT, DDC, DGAT2, DHCR24, DHCR7, GNMT, GOT1, GOT2, GYS1, GYS2, INSR, IRS2, LEPR, LIPC, MAOA, MAOB, PKLR, PKM2, PYGB, TH, UGDH, UGP2 0.040 BP tight junction assembly CLDN14, CLDN15, CLDN17, CLDN18, CLDN20, CLDN22, CLDN23, CLDN9, INADL 0.062 BP activation of phospholipase C activity ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, GRM5, HOMER1, ITPR1, ITPR2, ITPR3 0.080 BP cellular aromatic compound metabolic COMT, DDC, IDO1, IDO2, IL4I1, MAOA, MAOB, TAT, TH, TPH1, TPH2, TYR, UGT1A1, UGT1A10, process UGT1A3, UGT1A7, UGT1A8, UGT1A9 0.036 BP nucleoside diphosphate metabolic process ENTPD1, ENTPD8, NME1, NME1-NME2, NME2, NME4, NME5, NME6 0.027 BP nucleoside diphosphate phosphorylation NME1, NME1-NME2, NME2, NME4, NME5, NME6 0.190 CC integral to plasma membrane ADCY3, ADCY9, CLDN1, CLDN3, CLDN4, CSF2RA, CSF2RB, CSF3R, ENTPD1, EPOR, GHR, GJA1, GRM1, GRM5, HOMER1, IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL10RB, IL11RA, IL12RB1, IL12RB2, IL13RA1, IL23R, IL28RA, IL2RB, IL2RG, IL4R, IL6R, IL6ST, IL9R, INSR, ITPR3, KCNMA1, LIFR, MPL, MPZ, MPZL1, OSMR, PPAP2A, TUBB3, UGT1A1 0.103 BP response to peptide hormone stimulus ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, GHR, GJA1, GOT1, IL6R, INSR, IRS2, JAK2, PKLR, PKM2, PLD1, PLD2, PNLIPRP1, RELA, SOCS2 0.085 BP platelet activation DGKA, DGKB, DGKD, 
DGKE, DGKG, DGKH, DGKI, DGKQ, DGKZ, ENTPD1, GUCY1A3, ITPR1, ITPR2, ITPR3, KCNMA1, MPL, NOS1, PRKG1, PRKG2 0.040 BP water transport ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9 0.248 MF ribonucleotide binding ADCY1, ADCY10, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, CARNS1, CTPS, CTPS2, DGKA, DGKB, DGKD, DGKE, DGKG, DGKH, DGKI, DGKQ, DGKZ, DYNC1H1, DYNC1LI1, DYNC1LI2, DYNC2H1, ENTPD1, ENTPD3, ENTPD8, GUCY1A3, INSR, JAK1, JAK2, JAK3, NME1, NME1-NME2, NME2, NME3, NME4, NME5, NME6, NME7, PKLR, PKM2, PRKG1, PRKG2, RP11-631M21.2, TUBA3D, TUBB, TUBB1, TUBB2B, TUBB2C, TUBB3, TUBB4, TUBB6, TYK2, UGP2 0.040 BP cellular response to glucagon stimulus ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9 0.026 MF alcohol dehydrogenase (NAD) activity ADH1A, ADH1B, ADH4, ADH5, ADH6, ADH7 0.243 MF purine ribonucleoside triphosphate binding ADCY1, ADCY10, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, CARNS1, CTPS, CTPS2, DGKA, DGKB, DGKD, DGKE, DGKG, DGKH, DGKI, DGKQ, DGKZ, DYNC1H1, DYNC1LI1, DYNC1LI2, DYNC2H1, ENTPD1, ENTPD3, ENTPD8, GUCY1A3, INSR, JAK1, JAK2, JAK3, NME1, NME1-NME2, NME2, NME3, NME4, NME5, NME6, NME7, PKLR, PKM2, PRKG1, PRKG2, RP11-631M21.2, TUBA3D, TUBB, TUBB1, TUBB2B, TUBB2C, TUBB3, TUBB4, TUBB6, TYK2 0.036 BP aromatic amino acid family metabolic IDO1, IDO2, IL4I1, TAT, TH, TPH1, TPH2, TYR process 0.083 MF binding ABP1, ADH4, ADH7, ALAS1, ALAS2, ALDH1A3, DDC, DHCR24, GCAT, GLDC, GOT1, GOT2, INSR, MAOB, NOS1, PYGB, TAT, TH, UGDH 0.031 BP retinoic acid metabolic process ADH7, ALDH1A3, UGT1A1, UGT1A3, UGT1A7, UGT1A8, UGT1A9 0.045 BP regulation of cytokine-mediated signaling AGPAT1, AGPAT2, IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL6ST, JAK1, JAK2, TYK2 pathway 0.535 CC cytoplasmic part ABP1, ADCY10, ADCY6, ADH1A, ADH1B, ADH4, ADH5, ADH6, ADH7, AGMAT, AGPAT1, AGPAT2, ALAS1, ALAS2, ASAH1, ASAH2, CEL, CEPT1, CHPT1, CLDN14, CLDN17, CLDN8, COMT, CTPS, CTPS2, DDC, DGAT1, DGAT2, DGKA, DGKD, DGKH, DGKI, DGKQ, DHCR24, DHCR7, DYNC1H1, 
DYNC1I1, DYNC1I2, DYNC1LI1, DYNC1LI2, DYNC2H1, GCAT, GHR, GJA1, GLDC, GNMT, GOT1, GOT2, GUCY1A3, GYS1, GYS2, HOMER1, IDO1, IDO2, IFNGR1, IFNGR2, IL15RA, IL22RA2, IL4I1, INADL, INSR, IRS2, ITPR1, ITPR2, ITPR3, JAK1, JAK2, JAK3, LIPF, MAOA, MAOB, NFKB1, NFKB2, NFKBIA, NME1, NME1-NME2, NME2, NME3, NME4, NME6, NOS1, OCLN, ODC1, PKLR, PKM2, PLD1, PLD2, PNPT1, PRKG1, PRKG2, RELA, SAT1, SRM, TAT, TH, TJP1, TPH1, TPH2, TUBB, TUBB2C, TUBB4, TYK2, TYR, UGDH, UGP2, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2B10, UGT2B11, UGT2B15, UGT2B17, UGT2B28, UGT2B4, UGT2B7 0.027 BP ethanol metabolic process ADH1A, ADH1B, ADH4, ADH5, ADH6, ADH7 0.027 BP ethanol oxidation ADH1A, ADH1B, ADH4, ADH5, ADH6, ADH7 0.027 BP flavonoid metabolic process UGT1A1, UGT1A10, UGT1A3, UGT1A7, UGT1A8, UGT1A9 0.040 BP inhibition of adenylate cyclase activity by ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9 G-protein signaling pathway 0.022 MF calcium- and calmodulin-responsive ADCY1, ADCY2, ADCY3, ADCY6, ADCY8 adenylate cyclase activity 0.022 MF interferon receptor activity IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL22RA1 0.040 BP activation of adenylate cyclase activity by ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9 G-protein signaling pathway 0.058 BP microtubule-based movement DYNC1H1, DYNC1I1, DYNC1I2, DYNC2H1, RP11-631M21.2, TUBA3D, TUBB, TUBB1, TUBB2B, TUBB2C, TUBB3, TUBB4, TUBB6 0.124 CC endoplasmic reticulum membrane AGPAT1, AGPAT2, CEPT1, DGAT1, DGAT2, DHCR24, DHCR7, IL15RA, ITPR1, ITPR2, ITPR3, PLD1, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2B10, UGT2B11, UGT2B15, UGT2B17, UGT2B28, UGT2B4, UGT2B7 0.022 BP flavone metabolic process UGT1A1, UGT1A10, UGT1A7, UGT1A8, UGT1A9 0.039 MF pyridoxal phosphate binding ALAS1, ALAS2, DDC, GCAT, GLDC, GOT1, GOT2, PYGB, TAT 0.049 CC postsynaptic density GRM1, GRM5, HOMER1, HOMER2, HOMER3, IFNGR1, ITPR1, MPDZ, SHANK1, SHANK2, SHANK3 0.248 CC cytosol ADCY10, ADH1A, ADH1B, 
ADH4, ADH6, ADH7, CEL, COMT, CTPS, CTPS2, DDC, DGKA, DGKQ, DHCR24, DYNC1H1, DYNC1I2, GJA1, GNMT, GOT1, GUCY1A3, GYS1, GYS2, IDO1, IDO2, IL22RA2, INSR, IRS2, JAK1, JAK2, JAK3, NFKB1, NFKB2, NFKBIA, NME1, NME1-NME2, NME2, NOS1, OCLN, ODC1, PKLR, PKM2, PRKG1, PRKG2, RELA, SAT1, SRM, TAT, TH, TPH1, TPH2, TUBB, TUBB2C, TUBB4, TYK2, UGDH, UGP2 0.058 BP peptidyl-tyrosine phosphorylation GHR, IL12RB1, IL12RB2, IL22RA2, IL23R, IL6R, IL6ST, INSR, JAK1, JAK2, JAK3, PRLR, TYK2 0.026 MF acylglycerol O-acyltransferase activity AGPAT1, AGPAT2, DGAT1, DGAT2, MBOAT2, PNPLA3 0.036 BP cellular biogenic amine biosynthetic AGMAT, DDC, ODC1, SAT1, SRM, TH, TPH1, TPH2 process 0.030 MF retinoid binding ADH4, ADH7, UGT1A1, UGT1A3, UGT1A7, UGT1A8, UGT1A9 0.045 BP JAK-STAT cascade GHR, IFNAR1, IFNAR2, IL22RA2, IL23R, IL6R, IL6ST, JAK2, PRLR, SOCS2 0.200 MF ATP binding ADCY1, ADCY10, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, CARNS1, CTPS, CTPS2, DGKA, DGKB, DGKD, DGKE, DGKG, DGKH, DGKI, DGKQ, DGKZ, DYNC1H1, DYNC1LI1, DYNC1LI2, DYNC2H1, ENTPD1, ENTPD3, ENTPD8, INSR, JAK1, JAK2, JAK3, NME1, NME1-NME2, NME2, NME3, NME4, NME5, NME6, NME7, PKLR, PKM2, PRKG1, PRKG2, TYK2 0.071 BP nerve growth factor receptor signaling ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY6, ADCY7, ADCY8, ADCY9, IRS2, ITPR1, ITPR2, pathway ITPR3, NFKB1, NFKBIA, RELA 0.156 BP multi-organism process ADCY7, ADH5, ADH7, CLDN1, CLDN4, CNTFR, COMT, EPOR, GUCY1A3, IDO1, IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL11RA, IL12RB1, IL12RB2, IL23R, IL28RA, IL2RA, IL2RB, IL2RG, IL6R, INSR, MAOB, MPDZ, NFKB1, NFKBIA, ODC1, PKLR, PLD1, PRLR, RELA, TH, UGT1A1 0.103 BP response to other organism ADH5, ADH7, COMT, GUCY1A3, IDO1, IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL12RB1, IL12RB2, IL23R, IL28RA, IL2RA, IL6R, MAOB, NFKB1, NFKBIA, ODC1, PKLR, PLD1, RELA, UGT1A1 0.022 BP activation of Janus kinase activity GHR, IL23R, IL6R, JAK2, PRLR 0.027 BP nucleobase, nucleoside and nucleotide CTPS, CTPS2, NME1, NME1-NME2, NME2, NME4 interconversion 0.027 BP 
regulation of tyrosine phosphorylation of GHR, IL22RA2, IL23R, IL6R, IL6ST, JAK2 Stat3 protein 0.091 MF protein homodimerization activity ADH5, ADH7, ALDH1A3, DGKD, GHR, GYS2, IL6R, IL6ST, MAOB, NFKB1, PRLR, PYGB, TYR, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A6, UGT1A7, UGT1A8, UGT1A9 0.017 MF alcohol dehydrogenase activity, ADH1A, ADH1B, ADH4, ADH7 zinc-dependent 0.039 MF amino acid binding ALAS2, DDC, GNMT, IDO1, NOS1, TAT, TH, TPH1, TPH2 0.017 MF ciliary neurotrophic factor receptor CNTFR, IL6R, IL6ST, LIFR activity 0.022 MF retinoic acid binding UGT1A1, UGT1A3, UGT1A7, UGT1A8, UGT1A9 0.045 BP acylglycerol metabolic process AGPAT1, AGPAT2, CEL, DGAT1, DGAT2, DGKD, IL6ST, LIPC, LIPF, PNPLA3 0.080 BP regulation of defense response IDO1, IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL12RB1, IL23R, IL28RA, IL2RA, IL6ST, JAK1, JAK2, NFKB1, NFKB2, NFKBIA, OSMR, RELA, TYK2 0.017 MF growth hormone receptor binding JAK1, JAK2, SOCS2, TYK2 0.036 BP glycogen metabolic process GNMT, GYS1, GYS2, IL6ST, INSR, IRS2, PYGB, UGP2 0.018 BP ciliary neurotrophic factor-mediated CNTFR, IL6R, IL6ST, LIFR signaling pathway 0.049 CC receptor complex CSF2RB, GHR, IL10RB, IL12RB1, IL13RA1, IL23R, IL28RA, IL6R, IL6ST, INSR, OSMR 0.049 BP positive regulation of cytokine production AGPAT1, AGPAT2, IDO1, IFNAR1, IL12RB1, IL12RB2, IL23R, IL4R, IL6R, IL6ST, JAK2 0.022 BP positive regulation of tyrosine GHR, IL23R, IL6R, IL6ST, JAK2 phosphorylation of Stat3 protein 0.026 MF transferase activity, transferring ALAS1, ALAS2, GCAT, GOT1, GOT2, TAT nitrogenous groups 0.040 BP triglyceride metabolic process AGPAT1, AGPAT2, CEL, DGAT1, DGAT2, IL6ST, LIPC, LIPF, PNPLA3 0.018 BP activation of JAK2 kinase activity GHR, IL23R, JAK2, PRLR 0.018 BP metabotropic glutamate receptor signaling GRM5, HOMER1, HOMER2, HOMER3 pathway 0.031 BP ’de novo’ posttranslational protein folding TUBA3D, TUBB1, TUBB2B, TUBB2C, TUBB3, TUBB4, TUBB6 Continued on next page. . .

62 Ratio Type Description Genes 0.022 MF oxidoreductase activity, acting on the ABP1, GLDC, IL4I1, MAOA, MAOB CH-NH2 group of donors 0.027 CC dynein complex DYNC1H1, DYNC1I1, DYNC1I2, DYNC1LI1, DYNC1LI2, DYNC2H1 0.031 BP positive regulation of lymphocyte IL12RB1, IL15RA, IL23R, IL2RA, IL2RG, IL4R, IL7R differentiation 0.066 CC microtubule DYNC1H1, DYNC1I1, DYNC1I2, DYNC1LI1, DYNC1LI2, DYNC2H1, RP11-631M21.2, TUBA3D, TUBB, TUBB1, TUBB2B, TUBB2C, TUBB3, TUBB4, TUBB6 0.075 CC soluble fraction ADCY10, ADH7, COMT, DDC, GOT1, GYS1, GYS2, IDO1, IL13RA2, INSR, NME1, OCLN, PKLR, PKM2, PYGB, SAT1, UGP2 0.040 BP response to ethanol ADCY7, ADH6, ADH7, GOT2, IL6R, INSR, MAOB, TH, UGT1A1 0.067 BP steroid metabolic process CEL, COMT, DHCR24, DHCR7, LEPR, LIPC, NFKB1, PNLIP, PRLR, UGT1A1, UGT1A8, UGT2B11, UGT2B15, UGT2B17, UGT2B4 0.018 CC cytoplasmic dynein complex DYNC1H1, DYNC1I1, DYNC1LI1, DYNC1LI2 0.040 BP amine catabolic process COMT, GLDC, GOT1, GOT2, IDO1, IDO2, MAOA, NOS1, TAT 0.013 MF inositol 1,4,5-trisphosphate-sensitive ITPR1, ITPR2, ITPR3 calcium-release channel activity 0.013 MF interleukin-2 receptor activity IL2RA, IL2RB, IL2RG 0.013 MF oncostatin-M receptor activity IL6ST, LIFR, OSMR 0.054 BP glucose metabolic process GNMT, GOT1, GOT2, GYS1, GYS2, INSR, IRS2, PKLR, PKM2, PYGB, UGDH, UGP2 0.085 BP regulation of immune response IDO1, IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL12RB1, IL23R, IL2RA, IL4R, IL6ST, IL7R, JAK1, JAK2, NFKB1, NFKB2, NFKBIA, PLD2, RELA, TYK2 0.022 BP dopamine metabolic process COMT, DDC, MAOA, MAOB,TH 0.022 BP neurotransmitter metabolic process COMT, MAOA, MAOB, NOS1,TH 0.049 BP regulation of innate immune response IFNAR1, IFNAR2, IFNGR1, IFNGR2, JAK1, JAK2, NFKB1, NFKB2, NFKBIA, RELA, TYK2 0.018 BP polyamine biosynthetic process AGMAT, ODC1, SAT1, SRM 0.022 BP drug metabolic process NOS1, UGT1A1, UGT1A7, UGT1A8, UGT1A9 0.027 BP positive regulation of T cell differentiation IL12RB1, IL23R, IL2RA, IL2RG, IL4R, IL7R 0.017 MF oxidoreductase activity, 
acting on the ABP1, IL4I1, MAOA, MAOB CH-NH2 group of donors, oxygen as acceptor 0.018 BP neurotransmitter biosynthetic process COMT, MAOA, NOS1,TH 0.013 CC I-kappaB/NF-kappaB complex NFKB1, NFKB2, NFKBIA 0.013 MF interleukin-12 receptor binding IL12RB1, IL23R, JAK2 0.013 MF primary amine oxidase activity ABP1, MAOA, MAOB 0.018 BP estrogen metabolic process COMT, UGT1A1, UGT2B11, UGT2B4 0.040 BP response to glucocorticoid stimulus GHR, GOT1, IL6R, INSR, MAOB, PNLIPRP1, TAT, TPH2, UGT1A1 0.022 BP glycogen biosynthetic process GYS1, GYS2, INSR, IRS2, UGP2 0.040 BP protein polymerization RP11-631M21.2, TUBA3D, TUBB, TUBB1, TUBB2B, TUBB2C, TUBB3, TUBB4, TUBB6 0.018 BP regulation of activated T cell proliferation IDO1, IL12RB1, IL23R, IL2RA 0.045 BP response to lipopolysaccharide ADH5, COMT, IDO1, IL12RB2, IL23R, IL6R, MAOB, NFKBIA, RELA, UGT1A1 0.027 BP glutamine family amino acid metabolic CTPS, CTPS2, GOT1, GOT2, NOS1, TAT process 0.013 BP phosphatidic acid biosynthetic process AGPAT1, AGPAT2, PLD1 0.018 BP 2-oxoglutarate metabolic process GHR, GOT1, GOT2, TAT 0.027 BP cholesterol transport CEL, LIPC, LIPG, NFKB1, NFKBIA, PNLIP 0.013 MF oxidoreductase activity, acting on paired TH, TPH1, TPH2 donors, with incorporation or reduction of molecular oxygen, reduced pteridine as one donor, and incorporation of one atom of oxygen 0.018 BP regulation of interferon-gamma-mediated IFNGR1, IFNGR2, JAK1, JAK2 signaling pathway 0.018 BP regulation of response to IFNGR1, IFNGR2, JAK1, JAK2 interferon-gamma 0.013 CC platelet dense tubular network membrane ITPR1, ITPR2, ITPR3 0.013 MF 1-acylglycerol-3-phosphate AGPAT1, AGPAT2, MBOAT2 O-acyltransferase activity 0.013 MF CDP-alcohol phosphatidyltransferase CEPT1, CHPT1, EPT1 activity 0.018 BP indolalkylamine metabolic process IDO1, IDO2, TPH1, TPH2 0.022 BP triglyceride biosynthetic process AGPAT1, AGPAT2, DGAT1, DGAT2, PNPLA3 0.018 BP cellular biogenic amine catabolic process COMT, IDO1, IDO2, MAOA 0.100 MF enzyme binding ADCY10, 
DGKI, DGKQ, DHCR24, GHR, IFNAR2, IL12RB2, IL6R, INSR, IRS2, JAK2, NFKBIA, PLD2, RELA, TUBB3, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A6, UGT1A7, UGT1A8, UGT1A9 0.075 CC neuron projection ADCY10, ADCY2, ADCY4, GOT1, GRM1, GRM5, IFNGR1, ITPR2, MPDZ, NOS1, PYGB, SHANK1, TH, TPH1, TPH2, TUBB3, TUBB4 0.045 BP response to hypoxia ALAS2, CLDN3, ITPR1, ITPR2, KCNMA1, NOS1, PKLR, PKM2, PLD2,TH 0.022 BP leukocyte mediated cytotoxicity IL12RB1, IL23R, IL7R, TUBB, TUBB2C 0.013 BP neurotransmitter catabolic process COMT, MAOA, MAOB 0.013 BP tyrosine phosphorylation of Stat1 protein IL23R, IL6ST, JAK2 0.013 MF diacylglycerol binding CHPT1, DGAT1, DGKD 0.054 BP anti-apoptosis DHCR24, IL2RB, IL6R, IL6ST, LIFR, NFKB1, NFKBIA, NME5, NME6, PRLR, RELA, SOCS2 0.022 BP negative regulation of lipid metabolic NFKB1, UGT1A1, UGT1A7, UGT1A8, UGT1A9 process 0.040 CC membrane raft ADCY3, ADCY6, CEL, GJA1, INSR, ITPR1, JAK2, KCNMA1, PLD2 0.049 BP lipid catabolic process CEL, IRS2, LIPC, LIPF, LIPG, PLD1, PLD2, PNLIP, PNLIPRP3, PNPLA3, UGT2B4 0.031 BP cellular amino acid catabolic process GLDC, GOT1, GOT2, IDO1, IDO2, NOS1, TAT 0.013 BP growth hormone receptor signaling GHR, JAK2, SOCS2 pathway 0.013 BP oxaloacetate metabolic process GHR, GOT1, GOT2 0.013 BP tyrosine metabolic process TAT, TH, TYR 0.018 BP positive regulation of interferon-gamma IFNAR1, IL12RB1, IL12RB2, IL23R production 0.043 MF coenzyme binding ADH4, ADH7, ALAS2, ALDH1A3, DHCR24, INSR, MAOB, NOS1, TH, UGDH 0.018 BP phospholipid catabolic process LIPC, LIPG, PLD1, PLD2 0.045 BP response to virus IFNAR1, IFNAR2, IFNGR1, IFNGR2, IL12RB1, IL23R, IL28RA, IL2RA, ODC1, RELA 0.009 MF 5-aminolevulinate synthase activity ALAS1, ALAS2 0.009 MF CTP synthase activity CTPS, CTPS2 0.009 MF GKAP/Homer scaffold activity HOMER2, SHANK2 0.009 MF L-aspartate:2-oxoglutarate GOT1, GOT2 aminotransferase activity 0.009 MF cGMP-dependent protein kinase activity PRKG1, PRKG2 0.009 MF glycogen (starch) synthase activity GYS1, GYS2 0.009 MF indoleamine 
2,3-dioxygenase activity IDO1, IDO2 0.009 MF interferon-gamma receptor activity IFNGR1, IFNGR2 0.009 MF interleukin-10 receptor activity IL10RA, IL10RB 0.009 MF interleukin-23 binding IL12RB1, IL23R 0.009 MF interleukin-23 receptor activity IL12RB1, IL23R 0.009 CC interleukin-23 receptor complex IL12RB1, IL23R 0.009 MF interleukin-3 receptor activity CSF2RB, IL3RA 0.009 MF interleukin-4 receptor activity IL2RG, IL4R 0.009 MF interleukin-5 receptor activity CSF2RB, IL5RA 0.009 MF interleukin-6 receptor activity IL6R, IL6ST 0.009 MF interleukin-7 receptor activity IL2RG, IL7R 0.009 MF leukemia inhibitory factor receptor IL6ST, LIFR activity 0.009 MF pyruvate kinase activity PKLR, PKM2 0.009 MF tryptophan 5-monooxygenase activity TPH1, TPH2 0.009 MF type I interferon receptor activity IFNAR1, IFNAR2 0.048 MF kinase binding DGKQ, GHR, IFNAR2, IL12RB2, INSR, IRS2, JAK2, PLD2, RELA, UGT1A10, UGT1A7 0.013 MF insulin receptor substrate binding IL4R, INSR, JAK2 0.009 BP activation of phospholipase C activity by GRM5, HOMER1 metabotropic glutamate receptor signaling pathway 0.009 BP aspartate biosynthetic process GOT1, GOT2 0.009 BP glutamate catabolic process to GOT1, GOT2 2-oxoglutarate 0.009 BP glutamate catabolic process to aspartate GOT1, GOT2 0.009 BP serotonin biosynthetic process TPH1, TPH2 0.018 BP regulation of interleukin-12 production IDO1, IL23R, NFKB1, RELA 0.018 BP regulation of type I interferon-mediated IFNAR1, IFNAR2, JAK1, TYK2 signaling pathway 0.048 MF protein heterodimerization activity DGKD, GUCY1A3, TYR, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A6, UGT1A7, UGT1A8, UGT1A9 0.061 MF guanyl nucleotide binding GUCY1A3, INSR, NME1, PRKG1, PRKG2, RP11-631M21.2, TUBA3D, TUBB, TUBB1, TUBB2B, TUBB2C, TUBB3, TUBB4, TUBB6 Continued on next page. . .

63 Ratio Type Description Genes 0.061 MF guanyl ribonucleotide binding GUCY1A3, INSR, NME1, PRKG1, PRKG2, RP11-631M21.2, TUBA3D, TUBB, TUBB1, TUBB2B, TUBB2C, TUBB3, TUBB4, TUBB6 0.013 BP positive regulation of activated T cell IL12RB1, IL23R, IL2RA proliferation

[Figure 20 graph: enriched GO terms under the cellular_component root]

Figure 20: Relationships between the enriched cellular component Gene Ontology terms listed in Table 26. The darkness of the red reflects the significance of the enrichment, and the edge thicknesses are proportional to the numbers of genes sharing the following annotation.
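The edge-thickness rule in these captions can be illustrated with a small sketch. It assumes, as one reading of the caption, that an edge's width is proportional to the number of genes shared by the two GO terms it connects. The gene sets are copied from Table 26, but the two edges, the function name `edge_widths` and the `max_width` scaling are hypothetical choices for illustration, not taken from the Anduril figures or its API.

```python
"""Hypothetical sketch: scale GO-graph edge widths by shared gene counts."""

# Gene sets copied from Table 26 (three junction-assembly terms).
TERM_GENES = {
    "cell-cell junction assembly": {
        "CLDN14", "CLDN15", "CLDN17", "CLDN18", "CLDN19", "CLDN20",
        "CLDN22", "CLDN23", "CLDN9", "GJA1", "INADL", "TJP1",
    },
    "apical junction assembly": {
        "CLDN14", "CLDN15", "CLDN17", "CLDN18", "CLDN19", "CLDN20",
        "CLDN22", "CLDN23", "CLDN9", "INADL",
    },
    "tight junction assembly": {
        "CLDN14", "CLDN15", "CLDN17", "CLDN18", "CLDN20",
        "CLDN22", "CLDN23", "CLDN9", "INADL",
    },
}

# Illustrative parent -> child edges (assumed, not read from the figure).
EDGES = [
    ("cell-cell junction assembly", "apical junction assembly"),
    ("apical junction assembly", "tight junction assembly"),
]


def edge_widths(term_genes, edges, max_width=5.0):
    """Return {edge: (shared_gene_count, width)}.

    The edge with the most shared genes gets max_width; the others are
    scaled linearly by the number of genes the two terms have in common.
    """
    shared = {(a, b): len(term_genes[a] & term_genes[b]) for a, b in edges}
    largest = max(shared.values(), default=0)
    return {
        edge: (n, max_width * n / largest if largest else 0.0)
        for edge, n in shared.items()
    }


if __name__ == "__main__":
    for (parent, child), (n, width) in edge_widths(TERM_GENES, EDGES).items():
        print(f"{parent} -> {child}: {n} shared genes, width {width:.1f}")
```

With these three gene sets the first edge shares 10 genes and the second shares 9, so the second edge is drawn at 90% of the first one's width. A real rendering would also map enrichment significance to the red fill, which this sketch leaves out because Table 26 as extracted here shows only the ratio column.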

[Figure 21 graph: enriched GO terms under the molecular_function root]

Figure 21: Relationships between the enriched molecular function Gene Ontology terms listed in Table 26. The darkness of the red reflects the significance of the enrichment, and the edge thicknesses are proportional to the numbers of genes sharing the following annotation.

[Figure 22 graph: enriched GO terms under the biological_process root]

Figure 22: Relationships between the enriched biological process Gene Ontology terms listed in Table 26. The darkness of the red reflects the significance of the enrichment, and the edge thicknesses are proportional to the numbers of genes sharing the following annotation.

11.2 Candidate genes

Table 27: Descriptions of the candidate genes. The studies column lists the studies that have reported results about each candidate gene; studies with negative evidence are prefixed with a hyphen. The S column gives the status of each gene (a = absent, d = down-regulated, u = up-regulated, s = stable) and contains an at sign if the gene is part of the candidate pathway. This table has 132 rows.
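The coding conventions in this caption can be made concrete with a small parser. This is a hypothetical sketch, not part of Anduril: `parse_candidate` and `CandidateGene` are illustrative names. Because the caption names an at sign but the extracted rows show an asterisk (ALDH1A3 appears as `u*`), the sketch assumes either character marks pathway membership.

```python
"""Hypothetical sketch: decode the S and studies columns of Table 27."""
from dataclasses import dataclass, field

# Status codes as defined in the Table 27 caption.
STATUS_CODES = {"a": "absent", "d": "down-regulated", "u": "up-regulated", "s": "stable"}

# The caption names an at sign; the extracted rows show an asterisk
# (e.g. "u*"), so both are accepted here. This is an assumption.
PATHWAY_MARKERS = ("@", "*")


@dataclass
class CandidateGene:
    name: str
    status: str
    in_pathway: bool
    supporting_studies: list = field(default_factory=list)
    negative_studies: list = field(default_factory=list)


def parse_candidate(s_field: str, name: str, studies_field: str) -> CandidateGene:
    """Split one row's S and studies cells into structured fields."""
    s_field = s_field.strip()
    in_pathway = s_field.endswith(PATHWAY_MARKERS)
    code = s_field.rstrip("".join(PATHWAY_MARKERS))
    if code not in STATUS_CODES:
        raise ValueError(f"unknown status code {code!r} for {name}")
    studies = [s.strip() for s in studies_field.split(",") if s.strip()]
    # A leading hyphen marks negative evidence; strip it from the name.
    negative = [s[1:] for s in studies if s.startswith("-")]
    supporting = [s for s in studies if not s.startswith("-")]
    return CandidateGene(name, STATUS_CODES[code], in_pathway, supporting, negative)


if __name__ == "__main__":
    print(parse_candidate("u*", "ALDH1A3", "tscapeMelanomaa, tscapeNSCLCa"))
```

For the ALDH1A3 row this yields an up-regulated gene inside the candidate pathway with two supporting studies and no negative ones. Rows with an empty studies cell, such as the pseudogenes, simply produce two empty lists.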

S | name | locus | description | studies
u | ABHD3 | 18:19230858-19284766 (18q11.2) | abhydrolase domain containing 3 [Source:HGNC Symbol;Acc:18718], type=protein coding | tcgaGliomaGE, tscapeNSCLCa
d | AC010170.1 | 3:45266617-45267519 (3p21.31) | [undefined], type=pseudogene | (none)
u | AC061975.8 | 17:26603489-26603579 (17q11.2) | [undefined], type=miRNA | (none)
u | AC068353.1 | 8:8175258-8244008 (8p23.1) | Tyrosine-protein kinase SgK223 [Source:UniProtKB/Swiss-Prot;Acc:Q86YV5], type=protein coding, GO=[non-membrane spanning protein tyrosine kinase activity] | (none)
d | ACPP | 3:132036211-132087142 (3q22.1) | acid phosphatase, prostate [Source:HGNC Symbol;Acc:125], type=processed transcript,protein coding,retained intron, GO=[choline binding; acid phosphatase activity; multivesicular body; Golgi cisterna; lysosomal membrane; stored secretory granule; apical part of cell; lysosome; lytic vacuole] | snp3dProstateC
d | ADORA2B | 17:15848231-15879060 (17p12) | adenosine A2b receptor [Source:HGNC Symbol;Acc:264], type=protein coding, GO=[positive regulation of chronic inflammatory response to non-antigenic stimulus; regulation of chronic inflammatory response to non-antigenic stimulus; chronic inflammatory response to non-antigenic stimulus; positive regulation of guanylate cyclase activity; positive regulation of norepinephrine secretion; relaxation of vascular smooth muscle; negative regulation of collagen biosynthetic process; adenosine receptor activity, G-protein coupled; positive regulation of mast cell degranulation; positive regulation of leukocyte degranulation; positive regulation of mast cell activation involved in immune response; positive regulation of cGMP biosynthetic process; relaxation of muscle; positive regulation of steroid biosynthetic process; positive regulation vascular endothelial growth factor production; negative regulation of muscle contraction; positive regulation of vasodilation; positive regulation of chemokine production; positive regulation of interleukin-6 production; mast cell activation; positive regulation of cAMP biosynthetic process; positive regulation of cAMP metabolic process; purinergic receptor activity; activation of adenylate cyclase activity by G-protein signaling pathway; cellular defense response; activation of adenylate cyclase activity; G-protein signaling, coupled to cAMP nucleotide second messenger; regulation of angiogenesis; activation of MAPK activity; JNK cascade; stress-activated protein kinase signaling cascade; regulation of MAP kinase activity; positive regulation of transport; positive regulation of cell proliferation] | (none)
u | AL162497.1 | 13:110407168-110407615 (13q34) | [undefined], type=pseudogene | (none)
u* | ALDH1A3 | 15:101419581-101456831 (15q26.3) | aldehyde dehydrogenase 1 family, member A3 [Source:HGNC Symbol;Acc:409], type=protein coding, GO=[nucleus accumbens development; olfactory pit development; retinoic acid biosynthetic process; thyroid hormone binding; optic cup morphogenesis involved in camera-type eye development; retinal metabolic process; aldehyde dehydrogenase (NAD) activity; NAD+ binding; face development; pituitary gland development; NAD binding; kidney development; cofactor binding; response to drug; protein homodimerization activity; protein dimerization activity; identical protein binding] | tscapeMelanomaa, tscapeNSCLCa
u | ATAD2 | 8:124332090-124428590 (8q24.13) | ATPase family, AAA domain containing 2 [Source:HGNC Symbol;Acc:30123], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[ATPase activity, uncoupled] | fileCIN70, tcgaBreastGE, tcgaGliomaGE, tcgaOvarianGE, tscapeNSCLCa, tscapeOvariana
d | ATP12A | 13:25254549-25285921 (13q12.12) | ATPase, H+/K+ transporting, nongastric, alpha polypeptide [Source:HGNC Symbol;Acc:13816], type=protein coding, GO=[hydrogen:potassium-exchanging ATPase complex; hydrogen:potassium-exchanging ATPase activity; potassium ion homeostasis; potassium-transporting ATPase activity; proton transport; ATP biosynthetic process; apical plasma membrane; potassium ion transport; response to metal ion; apical part of cell; ion transmembrane transport; monovalent inorganic cation transport] | tscapeBCd, tscapeSCLCd
u | ATP1A1 | 1:116915290-116952883 (1p13.1) | ATPase, Na+/K+ transporting, alpha 1 polypeptide [Source:HGNC Symbol;Acc:799], type=processed transcript,protein coding,retained intron, GO=[4-nitrophenylphosphatase activity; negative regulation of glucocorticoid biosynthetic process; negative regulation of glucocorticoid metabolic process; positive regulation of striated muscle contraction; sodium:potassium-exchanging ATPase complex; sodium:potassium-exchanging ATPase activity; potassium-transporting ATPase activity; negative regulation of heart contraction; positive regulation of heart contraction; regulation of the force of heart contraction; regulation of striated muscle contraction; sarcolemma; melanosome; ATP biosynthetic process; regulation of blood pressure; potassium ion transport; basolateral plasma membrane; microsome; ion transmembrane transport; response to drug; monovalent inorganic cation transport] | tcgaGliomaGE, tscapeBCd, tscapeNSCLCd, tscapeSCLCa, tscapeSCLCd
d | BAMBI | 10:28966271-28971868 (10p12.1) | BMP and activin membrane-bound inhibitor homolog (Xenopus laevis) [Source:HGNC Symbol;Acc:30251], type=protein coding,retained intron, GO=[type II transforming growth factor beta receptor binding; positive regulation of catenin import into nucleus; positive regulation of epithelial to mesenchymal transition; frizzled binding; positive regulation of protein binding; positive regulation of canonical Wnt receptor signaling pathway; negative regulation of transforming growth factor beta receptor signaling pathway; regulation of cell shape; regulation of intracellular protein transport; positive regulation of transport; positive regulation of cell proliferation] | tscapeBCd
u | BCAP29 | 7:107220422-107269615 (7q22.3) | B-cell receptor-associated protein 29 [Source:HGNC Symbol;Acc:24131], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[endoplasmic reticulum membrane; endoplasmic reticulum part] | (none)
d | BCHE | 3:165490692-165555260 (3q26.1) | butyrylcholinesterase [Source:HGNC Symbol;Acc:983], type=nonsense mediated decay,protein coding, GO=[cocaine metabolic process; tropane alkaloid metabolic process; acetylcholinesterase activity; cholinesterase activity; choline binding; choline metabolic process; nuclear envelope lumen; response to folic acid; beta-amyloid binding; negative regulation of synaptic transmission; learning; response to alkaloid; endoplasmic reticulum lumen; response to glucocorticoid stimulus; response to drug; endoplasmic reticulum part; extracellular space] | tcgaBreastGE, tcgaGliomaGE
u | BRP44 | 1:167885967-167906278 (1q24.2) | brain protein 44 [Source:HGNC Symbol;Acc:24515], type=protein coding | (none)
u | C12orf44 | 12:52463758-52471278 (12q13.13) | chromosome 12 open reading frame 44 [Source:HGNC Symbol;Acc:25679], type=protein coding, GO=[pre-autophagosomal structure; autophagic vacuole assembly] | tcgaBreastGE, tscapeRCCa
u | C15orf23 | 15:40674922-40686488 (15q15.1) | chromosome 15 open reading frame 23 [Source:HGNC Symbol;Acc:30767], type=protein coding | tcgaBreastGE, tcgaOvarianGE, tscapeCRCd, tscapeMelanomad, tscapeNSCLCd, tscapeOvariand
u | C1orf116 | 1:207191866-207206101 (1q32.1, 1q32.2) | chromosome 1 open reading frame 116 [Source:HGNC Symbol;Acc:28667], type=protein coding | tscapeBCa, tscapeProstatea
u | C1orf21 | 1:184356192-184598154 (1q25.3) | chromosome 1 open reading frame 21 [Source:HGNC Symbol;Acc:15494], type=ambiguous orf,processed transcript,protein coding | tcgaGliomaGE, tscapeBCd, tscapeProstated
u | C3orf58 | 3:143690640-143767561 (3q24) | chromosome 3 open reading frame 58 [Source:HGNC Symbol;Acc:28490], type=processed transcript,protein coding,retained intron, GO=[COPI vesicle coat] | tcgaBreastGE
u | C4orf34 | 4:39552541-39640710 (4p14) | chromosome 4 open reading frame 34 [Source:HGNC Symbol;Acc:27321], type=nonsense mediated decay,processed transcript,protein coding | (none)
d C6orf192 6:133090507-133119747 chromosome 6 open reading
frame 192 [Source:HGNC Symbol;Acc:21573], tscapeBCa 6q23.2 type=processed transcript,protein coding u C9orf152 9:112952328-112970469 chromosome 9 open reading frame 152 [Source:HGNC Symbol;Acc:31455], 9q31.3 type=processed transcript,protein coding d CAMK2N1 1:20808884-20812713 calcium/calmodulin-dependent protein kinase II inhibitor 1 [Source:HGNC Symbol;Acc:24190], tcgaGliomaGE, tscapeBCd, 1p36.12 type=processed transcript,protein coding, GO=[calcium-dependent protein kinase inhibitor tscapeCRCd, activity; postsynaptic density; synaptosome; postsynaptic membrane; neuronal cell body; cell tscapeNSCLCd, junction] tscapeOvariand, tscapeRCCd u CAP2 6:17393447-17558023 CAP, adenylate cyclase-associated protein, 2 (yeast) [Source:HGNC Symbol;Acc:20039], tcgaGliomaGE, tscapeBCa, 6p22.3 type=nonsense mediated decay,protein coding, GO=[activation of adenylate cyclase activity; tscapeOvariana establishment or maintenance of cell polarity; actin binding; axon guidance; axonogenesis; cell morphogenesis involved in neuron differentiation]

66 S name locus description studies u CBLL1 7:107384142-107401142 Cas-Br-M (murine) ecotropic retroviral transforming sequence-like 1 [Source:HGNC 7q22.3, 7q31.1 Symbol;Acc:21225], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[positive regulation of endocytosis; negative regulation of cell adhesion; ubiquitin ligase complex; positive regulation of cell migration; ubiquitin-protein ligase activity; regulation of cell migration; positive regulation of transport; protein ubiquitination] d CBLN2 18:70203915-70211774 cerebellin 2 precursor [Source:HGNC Symbol;Acc:1544], type=protein coding tcgaBreastGE, 18q22.3 tcgaGliomaGE d CCDC28B 1:32665987-32670988 coiled-coil domain containing 28B [Source:HGNC Symbol;Acc:28163], tcgaBreastGE, tscapeBCd, 1p35.1 type=nonsense mediated decay,processed transcript,protein coding tscapeOvariand u CEBPD 8:48649471-48651648 CCAAT/enhancer binding protein (C/EBP), delta [Source:HGNC Symbol;Acc:1835], tcgaGliomaGE, 8q11.21 type=protein coding, GO=[protein dimerization activity] tscapeNSCLCd, tscapeRCCd u* CLDN8 21:31586324-31588391 claudin 8 [Source:HGNC Symbol;Acc:2050], type=protein coding, GO=[calcium-independent 21q22.11 cell-cell adhesion; tight junction; apical junction complex; cell junction; identical protein binding] d CNKSR3 6:154726311-154831793 CNKSR family member 3 [Source:HGNC Symbol;Acc:23034], type=protein coding, GO=[positive tscapeOvariand, 6q25.2 regulation of sodium ion transport; negative regulation of peptidyl-serine phosphorylation; tscapeProstated negative regulation of ERK1 and ERK2 cascade; regulation of sodium ion transport; positive regulation of ion transport; positive regulation of transport; monovalent inorganic cation transport] u COBLL1 2:165510134-165700189 COBL-like 1 [Source:HGNC Symbol;Acc:23571], 2q24.3 type=nonsense mediated decay,processed transcript,protein coding,retained intron d COLEC12 18:319355-500729 collectin sub-family member 12 [Source:HGNC 
Symbol;Acc:16016], type=protein coding, 18p11.32 GO=[carbohydrate mediated signaling; galactose binding; phagocytosis, recognition; pattern recognition receptor activity; low-density lipoprotein particle binding; scavenger receptor activity; pattern recognition receptor signaling pathway; protein homooligomerization] u CREB3L4 1:153940010-153946839 cAMP responsive element binding protein 3-like 4 [Source:HGNC Symbol;Acc:18854], tcgaBreastGE, 1q21.3 type=processed transcript,protein coding, GO=[response to unfolded protein; spermatogenesis; tcgaGliomaGE, positive regulation of transcription from RNA polymerase II promoter; endoplasmic reticulum tscapeHCCa, membrane; protein dimerization activity; endoplasmic reticulum part] tscapeMelanomaa, tscapeNSCLCa, tscapeProstatea u CTD-2048F20.1 5:122736616-122738038 [undefined], type=processed pseudogene,pseudogene tcgaBreastGE 5q23.2 u CTD-2653M23.1 5:36885308-36886599 [undefined], type=processed pseudogene,pseudogene tcgaBreastGE 5p13.2 d* DDC 7:50526134-50633154 dopa decarboxylase (aromatic L-amino acid decarboxylase) [Source:HGNC Symbol;Acc:2719], tscapeGliomaa 7p12.1 type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[aromatic-L-amino-acid decarboxylase activity; dopamine biosynthetic process; cellular biogenic amine biosynthetic process; pyridoxal phosphate binding; circadian rhythm; neurotransmitter secretion; alcohol biosynthetic process; cofactor binding; soluble fraction] u* DHCR24 1:55315300-55352921 24-dehydrocholesterol reductase [Source:HGNC Symbol;Acc:2859], type=protein coding, tcgaBreastGE, 1p32.3 GO=[delta24-sterol reductase activity; plasminogen activation; amyloid precursor protein catabolic tcgaGliomaGE process; peptide antigen binding; neuroprotection; male genitalia development; negative regulation of caspase activity; skin development; cholesterol biosynthetic process; flavin adenine dinucleotide binding; response to oxidative stress; anti-apoptosis; cofactor 
binding; cell cycle arrest; endoplasmic reticulum membrane; endoplasmic reticulum part] u DNASE2B 1:84864215-84880701 deoxyribonuclease II beta [Source:HGNC Symbol;Acc:28875], type=protein coding, tscapeGliomad, 1p31.1 GO=[deoxyribonuclease II activity; lysosome; lytic vacuole] tscapeNSCLCd, tscapeOvariand u ELL2 5:95220802-95297775 elongation factor, RNA polymerase II, 2 [Source:HGNC Symbol;Acc:17064], tscapeBCd, tscapeCRCd, 5q15 type=processed transcript,protein coding,retained intron, GO=[RNA polymerase II transcription tscapeNSCLCd, elongation factor activity; transcription elongation factor complex; transcription elongation from tscapeOvariand, RNA polymerase II promoter; RNA polymerase II transcription factor activity] tscapeProstated u ELL2P1 1:158145640-158147545 elongation factor, RNA polymerase II, 2 pseudogene 1 [Source:HGNC Symbol;Acc:39343], 1q23.1 type=processed pseudogene,pseudogene u ELOVL2 6:10980992-11044547 elongation of very long chain fatty acids (FEN1/Elo2, SUR4/Elo3, yeast)-like 2 [Source:HGNC tcgaGliomaGE, tscapeBCa, 6p24.2 Symbol;Acc:14416], type=protein coding, GO=[fatty acid elongation, polyunsaturated fatty acid; tscapeOvariana, fatty acid elongase activity; very long-chain fatty acid biosynthetic process; fatty acid elongation; tscapeSCLCd long-chain fatty-acyl-CoA biosynthetic process; long-chain fatty-acyl-CoA metabolic process; triglyceride biosynthetic process; microsome; endoplasmic reticulum membrane; endoplasmic reticulum part] u ELOVL5 6:53132196-53213947 ELOVL family member 5, elongation of long chain fatty acids (FEN1/Elo2, SUR4/Elo3-like, yeast) tcgaGliomaGE, 6p12.1 [Source:HGNC Symbol;Acc:21308], type=processed transcript,protein coding, GO=[fatty acid tscapeNSCLCa elongation, monounsaturated fatty acid; fatty acid elongation, polyunsaturated fatty acid; fatty acid elongase activity; very long-chain fatty acid biosynthetic process; fatty acid elongation; long-chain fatty-acyl-CoA biosynthetic process; long-chain 
fatty-acyl-CoA metabolic process; triglyceride biosynthetic process; endoplasmic reticulum membrane; endoplasmic reticulum part] d ELOVL6 4:110967002-111120355 ELOVL family member 6, elongation of long chain fatty acids (FEN1/Elo2, SUR4/Elo3-like, yeast) tscapeHCCd 4q25 [Source:HGNC Symbol;Acc:15829], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[long-chain fatty acid biosynthetic process; fatty acid elongation, saturated fatty acid; fatty acid elongase activity; fatty acid elongation; long-chain fatty-acyl-CoA biosynthetic process; long-chain fatty-acyl-CoA metabolic process; triglyceride biosynthetic process; integral to endoplasmic reticulum membrane; integral to organelle membrane; endoplasmic reticulum membrane; endoplasmic reticulum part] u ERRFI1 1:8064464-8086368 ERBB receptor feedback inhibitor 1 [Source:HGNC Symbol;Acc:18185], tscapeBCd, tscapeCRCd, 1p36.23 type=processed transcript,protein coding, GO=[skin morphogenesis; negative regulation of tscapeHCCd, epidermal growth factor receptor activity; lung vasculature development; negative regulation of tscapeNSCLCd, protein autophosphorylation; extrinsic to internal side of plasma membrane; negative regulation of tscapeOvariana, peptidyl-tyrosine phosphorylation; regulation of keratinocyte differentiation; lung epithelium tscapeOvariand, development; Rho GTPase activator activity; lung alveolus development; skin development; Ras tscapeRCCd GTPase activator activity] u FADS2 11:61583728-61634826 fatty acid desaturase 2 [Source:HGNC Symbol;Acc:3575], type=protein coding,retained intron, tcgaBreastGE, 11q12.2 GO=[stearoyl-CoA 9-desaturase activity; unsaturated fatty acid biosynthetic process; heme tcgaGliomaGE binding; electron transport chain; endoplasmic reticulum membrane; endoplasmic reticulum part] d FAM47E 4:77135193-77232282 family with sequence similarity 47, member E [Source:HGNC Symbol;Acc:34343], tcgaBreastGE 4q21.1 type=nonsense mediated decay,processed 
transcript,protein coding,retained intron d FFAR2 19:35940617-35942667 free fatty acid receptor 2 [Source:HGNC Symbol;Acc:4501], type=protein coding tcgaBreastGE, 19q13.12 tscapeNSCLCa u FKBP5 6:35541362-35696360 FK506 binding protein 5 [Source:HGNC Symbol;Acc:3721], type=protein coding, GO=[FK506 6p21.31 binding; macrolide binding; peptidyl-prolyl cis-trans isomerase activity; drug binding; heat shock protein binding] d* GLDC 9:6532464-6645650 glycine dehydrogenase (decarboxylating) [Source:HGNC Symbol;Acc:4313], tcgaBreastGE, 9p24.1 type=processed transcript,protein coding, GO=[glycine dehydrogenase (decarboxylating) activity; tcgaGliomaGE, glycine catabolic process; pyridoxal phosphate binding; electron carrier activity; cofactor binding] tcgaOvarianGE, tscapeBCd u GLRX 5:95087023-95158709 glutaredoxin (thioltransferase) [Source:HGNC Symbol;Acc:4330], tscapeBCd, tscapeCRCd, 5q15 type=processed transcript,protein coding,retained intron, GO=[glutathione disulfide tscapeNSCLCd, oxidoreductase activity; nucleobase, nucleoside and nucleotide interconversion; protein disulfide tscapeOvariand, oxidoreductase activity; cell redox homeostasis; protein N-terminus binding; electron transport tscapeProstated chain; electron carrier activity] u* GNMT 6:42928496-42931618 glycine N-methyltransferase [Source:HGNC Symbol;Acc:4415], type=protein coding, GO=[glycine tscapeCRCa 6p21.1 N-methyltransferase activity; S-adenosylmethionine metabolic process; S-adenosylhomocysteine metabolic process; glycine binding; regulation of gluconeogenesis; folic acid binding; methionine metabolic process; regulation of carbohydrate biosynthetic process; protein homotetramerization; regulation of glucose metabolic process; glycogen metabolic process; alcohol biosynthetic process; protein homooligomerization; one-carbon metabolic process; glucose metabolic process; carbohydrate biosynthetic process] d GOLIM4 3:167726465-167813763 golgi integral membrane protein 4 [Source:HGNC Symbol;Acc:15448], 
tcgaGliomaGE 3q26.2 type=protein coding,retained intron, GO=[cis-Golgi network; Golgi lumen; Golgi cisterna membrane; Golgi cisterna; endocytic vesicle; endosome membrane] u* GUCY1A3 4:156587863-156653501 guanylate cyclase 1, soluble, alpha 3 [Source:HGNC Symbol;Acc:4685], tcgaBreastGESurv, 4q32.1 type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[guanylate tcgaGliomaGE, tscapeRCCd cyclase complex, soluble; response to defense-related host nitric oxide production; response to defense-related nitric oxide production by other organism involved in symbiotic interaction; response to herbicide; relaxation of vascular smooth muscle; positive regulation of cGMP biosynthetic process; relaxation of muscle; guanylate cyclase activity; negative regulation of muscle contraction; nitric oxide mediated signal transduction; regulation of blood pressure; heme binding; protein heterodimerization activity; platelet activation; GTP binding; protein dimerization activity]

67 S name locus description studies u HES1 3:193853934-193856521 hairy and enhancer of split 1, (Drosophila) [Source:HGNC Symbol;Acc:5192], tcgaGliomaGE, tscapeBCa, 3q29 type=protein coding,retained intron, GO=[N-box binding; negative regulation of stomach tscapeMelanomad, neuroendocrine cell differentiation; regulation of stomach neuroendocrine cell differentiation; tscapeNSCLCd, negative regulation of auditory receptor cell differentiation; trochlear nerve development; tscapeOvariana oculomotor nerve development; HLH domain binding; auditory receptor cell fate determination; midbrain-hindbrain boundary morphogenesis; regulation of timing of neuron differentiation; auditory receptor cell differentiation; midbrain development; hindbrain morphogenesis; pituitary gland development; histone deacetylase binding; smoothened signaling pathway; liver development; cell maturation; endocrine pancreas development; negative regulation of gene-specific transcription from RNA polymerase II promoter; positive regulation of gene-specific transcription from RNA polymerase II promoter; in utero embryonic development; transcription repressor activity; positive regulation of transcription from RNA polymerase II promoter; cell morphogenesis involved in neuron differentiation; positive regulation of cell proliferation] u* HOMER2 15:83517738-83621473 homer homolog 2 (Drosophila) [Source:HGNC Symbol;Acc:17513], type=protein coding, tcgaGliomaGE 15q25.2 GO=[GKAP/Homer scaffold activity; metabotropic glutamate receptor binding; metabotropic glutamate receptor signaling pathway; postsynaptic density; postsynaptic membrane; actin binding; cell junction] d HOXC13 12:54332549-54340328 homeobox C13 [Source:HGNC Symbol;Acc:5125], type=protein coding, GO=[tongue morphogenesis; tcgaBreastGE, 12q13.13 hair follicle development; RNA polymerase II transcription factor activity] tcgaGliomaGE, tscapeRCCa u* IRS2 13:110406184- insulin receptor substrate 2 [Source:HGNC Symbol;Acc:6126], type=protein 
coding, GO=[negative snp3dDiabetes, 110438915 regulation of plasma membrane long-chain fatty acid transport; negative regulation of B cell snp3dObesity, tscapeBCd, 13q34 apoptosis; positive regulation of fatty acid beta-oxidation; positive regulation of glycogen tscapeCRCa, biosynthetic process; regulation of fatty acid transport; phosphatidylinositol 3-kinase binding; tscapeNSCLCa, positive regulation of glucose import; positive regulation of mesenchymal cell proliferation; insulin tscapeSCLCa receptor binding; positive regulation of insulin secretion; positive regulation of B cell proliferation; regulation of carbohydrate biosynthetic process; regulation of glucose import; regulation of lipid transport; regulation of glucose metabolic process; glycogen metabolic process; lipid homeostasis; fibroblast growth factor receptor signaling pathway; phosphatidylinositol-mediated signaling; response to glucose stimulus; mammary gland development; positive regulation of cell migration; insulin receptor signaling pathway; glucose metabolic process; carbohydrate biosynthetic process; nerve growth factor receptor signaling pathway; microsome; regulation of cell migration; positive regulation of transport; positive regulation of cell proliferation] u JAG1 20:10618332-10654608 jagged 1 [Source:HGNC Symbol;Acc:6188], type=processed transcript,protein coding, GO=[Notch tcgaGliomaGE 20p12.2 binding; positive regulation of Notch signaling pathway; response to muramyl dipeptide; morphogenesis of an epithelial sheet; Notch receptor processing; regulation of Notch signaling pathway; auditory receptor cell differentiation; endothelial cell differentiation; myoblast differentiation; positive regulation of myeloid cell differentiation; regulation of myeloid cell differentiation; growth factor activity; peptidase inhibitor activity; regulation of cell migration; calcium ion binding] u* KCNMA1 10:78637355-79398353 potassium large conductance calcium-activated channel, subfamily M, alpha 
member 1 tcgaGliomaGE, 10q22.3 [Source:HGNC Symbol;Acc:6284], type=processed transcript,protein coding, GO=[large tscapeOvariana, conductance calcium-activated potassium channel activity; negative regulation of cell volume; tscapeProstatea response to carbon monoxide; cellular potassium ion homeostasis; smooth muscle contraction involved in micturition; potassium ion homeostasis; response to osmotic stress; caveola; response to calcium ion; voltage-gated potassium channel complex; voltage-gated potassium channel activity; apical plasma membrane; potassium ion transport; response to hypoxia; response to metal ion; apical part of cell; platelet activation; actin binding; monovalent inorganic cation transport] u KIF22 16:29802041-29816706 kinesin family member 22 [Source:HGNC Symbol;Acc:6391], type=protein coding, tcgaBreastGE, 16p11.2 GO=[microtubule motor activity; kinetochore; microtubule-based movement; spindle; chromatin; tcgaGliomaGE, microtubule; DNA repair; response to DNA damage stimulus; microtubule cytoskeleton] tcgaOvarianGE u KLK2 19:51376689-51383822 kallikrein-related peptidase 2 [Source:HGNC Symbol;Acc:6363], type=protein coding, snp3dProstateC 19q13.33 GO=[serine-type endopeptidase activity] u KLK3 19:51358171-51364020 kallikrein-related peptidase 3 [Source:HGNC Symbol;Acc:6364], type=protein coding, snp3dProstateC 19q13.33 GO=[negative regulation of angiogenesis; regulation of angiogenesis; serine-type endopeptidase activity] u KRT8 12:53290971-53298868 keratin 8 [Source:HGNC Symbol;Acc:6446], type=protein coding, GO=[nuclear matrix; keratin snp3dLungC, tscapeRCCa 12q13.13 filament; intermediate filament] u LCP1 13:46700061-46786006 lymphocyte cytosolic protein 1 (L-plastin) [Source:HGNC Symbol;Acc:6528], tcgaGliomaGE, tscapeBCd, 13q14.13 type=processed transcript,protein coding, GO=[phagocytic cup; T cell activation involved in tscapeHCCd, immune response; ruffle membrane; actin filament; actin filament binding; organ regeneration; tscapeNSCLCd, 
actin filament bundle assembly; regulation of intracellular protein transport; actin binding; cell tscapeProstated, junction; calcium ion binding; identical protein binding] tscapeSCLCd d LMO4 1:87794151-87812788 LIM domain only 4 [Source:HGNC Symbol;Acc:6644], type=processed transcript,protein coding, tcgaGliomaGE, tscapeBCd, 1p22.3 GO=[neural tube closure; transcription factor complex; positive regulation of transcription from tscapeGliomad, RNA polymerase II promoter] tscapeNSCLCd, tscapeOvariand d LRRN1 3:3841121-3889387 leucine rich repeat neuronal 1 [Source:HGNC Symbol;Acc:20980], tcgaGliomaGE, 3p26.2 type=processed transcript,protein coding,retained intron tscapeOvariand u MAP7D1 1:36621180-36646450 MAP7 domain containing 1 [Source:HGNC Symbol;Acc:25514], tcgaGliomaGE 1p34.3 type=processed transcript,protein coding,retained intron, GO=[spindle; microtubule cytoskeleton] d MAPKAPK3 3:50648951-50686720 mitogen-activated protein kinase-activated protein kinase 3 [Source:HGNC Symbol;Acc:6888], tcgaBreastGE, 3p21.2 type=processed transcript,protein coding, GO=[MAP kinase kinase activity; stress-activated tcgaGliomaGE MAPK cascade; toll-like receptor 3 signaling pathway; MyD88-independent toll-like receptor signaling pathway; toll-like receptor 1 signaling pathway; toll-like receptor 2 signaling pathway; MyD88-dependent toll-like receptor signaling pathway; Toll signaling pathway; toll-like receptor 4 signaling pathway; pattern recognition receptor signaling pathway; activation of MAPK activity; stress-activated protein kinase signaling cascade; regulation of MAP kinase activity; nerve growth factor receptor signaling pathway] u* MBOAT2 2:8992820-9143942 membrane bound O-acyltransferase domain containing 2 [Source:HGNC Symbol;Acc:25193], tcgaBreastGE 2p25.1 type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[1-acylglycerol-3-phosphate O-acyltransferase activity; phospholipid biosynthetic process] u MICAL1 6:109765265-109787171 
microtubule associated monoxygenase, calponin and LIM domain containing 1 [Source:HGNC tscapeCRCd, tscapeSCLCd 6q21 Symbol;Acc:20619], type=processed transcript,protein coding,retained intron, GO=[monooxygenase activity; SH3 domain binding; intermediate filament] u MKLN1 7:130794855-131181395 muskelin 1, intracellular mediator containing kelch motifs [Source:HGNC Symbol;Acc:7109], tcgaGliomaGE, 7q32.3 type=nonsense mediated decay,processed transcript,protein coding,retained intron tscapeMelanomaa u* MPZL1 1:167690429-167761156 myelin protein zero-like 1 [Source:HGNC Symbol;Acc:7226], tcgaBreastGE, 1q24.2 type=nonsense mediated decay,processed transcript,protein coding,retained intron tcgaGliomaGE d MT1A 16:56672578-56673997 metallothionein 1A [Source:HGNC Symbol;Acc:7393], type=protein coding, GO=[cadmium ion 16q12.2 binding; copper ion binding] d MT1X 16:56716336-56718108 metallothionein 1X [Source:HGNC Symbol;Acc:7405], type=protein coding, GO=[response to metal tscapeMelanomad 16q13 ion] u MYBPC1 12:101988747- myosin binding protein C, slow type [Source:HGNC Symbol;Acc:7549], type=protein coding, 102079658 GO=[titin binding; myosin filament; structural constituent of muscle; muscle filament sliding; 12q23.2 myofibril; actin binding] d NAP1L3 X:92925929-92928567 nucleosome assembly protein 1-like 3 [Source:HGNC Symbol;Acc:7639], tcgaBreastGE, Xq21.32 type=processed transcript,protein coding, GO=[chromatin assembly complex; nucleosome tcgaGliomaGE assembly] u NCAPD3 11:134020014- non-SMC condensin II complex, subunit D3 [Source:HGNC Symbol;Acc:28952], tcgaBreastGE, 134095348 type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[nuclear tcgaGliomaGE, 11q25 condensin complex; mitotic chromosome condensation; methylated histone residue binding; cell tcgaOvarianGE, tscapeBCd, division] tscapeNSCLCd u NDRG1 8:134249414-134314265 N-myc downstream regulated 1 [Source:HGNC Symbol;Acc:7679], snp3dMetastasis, 8q24.22 type=nonsense mediated 
decay,processed transcript,protein coding,retained intron, GO=[mast cell tscapeNSCLCa, activation; response to metal ion; microtubule cytoskeleton] tscapeOvariana

68 S name locus description studies u* NFKBIA 14:35870717-35873952 nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha [Source:HGNC tscapeBCa, tscapeHCCa, 14q13.2 Symbol;Acc:7797], type=protein coding, GO=[cellular response to cold; nucleotide-binding tscapeProstatea oligomerization domain containing 1 signaling pathway; I-kappaB/NF-kappaB complex; nucleotide-binding oligomerization domain containing 2 signaling pathway; cytoplasmic sequestering of NF-kappaB; negative regulation of Notch signaling pathway; nuclear localization sequence binding; positive regulation of cholesterol efflux; response to muramyl dipeptide; negative regulation of macrophage derived foam cell differentiation; response to exogenous dsRNA; negative regulation of lipid storage; NF-kappaB binding; positive regulation of lipid transport; regulation of Notch signaling pathway; lipopolysaccharide-mediated signaling pathway; protein import into nucleus, translocation; negative regulation of NF-kappaB transcription factor activity; regulation of lipid transport; negative regulation of myeloid cell differentiation; toll-like receptor 3 signaling pathway; MyD88-independent toll-like receptor signaling pathway; toll-like receptor 1 signaling pathway; toll-like receptor 2 signaling pathway; positive regulation of NF-kappaB transcription factor activity; MyD88-dependent toll-like receptor signaling pathway; ubiquitin protein ligase binding; negative regulation of transcription factor activity; heat shock protein binding; Toll signaling pathway; toll-like receptor 4 signaling pathway; T cell receptor signaling pathway; pattern recognition receptor signaling pathway; regulation of myeloid cell differentiation; regulation of intracellular protein transport; positive regulation of gene-specific transcription from RNA polymerase II promoter; nerve growth factor receptor signaling pathway; anti-apoptosis; positive regulation of transport; positive regulation of transcription 
from RNA polymerase II promoter; identical protein binding] d NIPSNAP3A 9:107509969-107522403 nipsnap homolog 3A (C. elegans) [Source:HGNC Symbol;Acc:23619], 9q31.1 type=processed transcript,protein coding u* ODC1 2:10580094-10588630 ornithine decarboxylase 1 [Source:HGNC Symbol;Acc:8109], tcgaGliomaGE 2p25.1 type=processed transcript,protein coding, GO=[ornithine decarboxylase activity; polyamine biosynthetic process; cellular biogenic amine biosynthetic process; regulation of cellular amino acid metabolic process; response to virus] d P2RX4 12:121647660- purinergic receptor P2X, ligand-gated ion channel, 4 [Source:HGNC Symbol;Acc:8535], tcgaBreastGE 121671909 type=nonsense mediated decay,processed transcript,protein coding,retained intron, 12q24.31 GO=[relaxation of cardiac muscle; endothelial cell activation; negative regulation of cardiac muscle hypertrophy; extracellular ATP-gated cation channel activity; purinergic nucleotide receptor signaling pathway; positive regulation of prostaglandin secretion; regulation of prostaglandin secretion; positive regulation of icosanoid secretion; regulation of icosanoid secretion; relaxation of muscle; response to fluid shear stress; regulation of fatty acid transport; negative regulation of muscle contraction; response to ATP; cadherin binding; positive regulation of calcium-mediated signaling; positive regulation of lipid transport; regulation of excitatory postsynaptic membrane potential; regulation of sodium ion transport; positive regulation of calcium ion transport into cytosol; purinergic nucleotide receptor activity; positive regulation of nitric oxide biosynthetic process; regulation of striated muscle contraction; terminal button; purinergic receptor activity; regulation of lipid transport; copper ion binding; sensory perception of pain; positive regulation of ion transport; dendritic spine; drug binding; regulation of action potential in neuron; tissue homeostasis; postsynaptic density; regulation of blood 
pressure; protein homooligomerization; apical part of cell; neuronal cell body; perinuclear region of cytoplasm; positive regulation of transport; monovalent inorganic cation transport; protein homodimerization activity; cell junction; protein dimerization activity; identical protein binding] u PEA15 1:160175127-160185166 phosphoprotein enriched in astrocytes 15 [Source:HGNC Symbol;Acc:8822], 1q23.2 type=processed transcript,protein coding, GO=[negative regulation of glucose import; response to morphine; response to isoquinoline alkaloid; regulation of glucose import; response to alkaloid; microtubule associated complex; anti-apoptosis; microtubule cytoskeleton] u PGC 6:41704449-41721847 progastricsin (pepsinogen C) [Source:HGNC Symbol;Acc:8890], type=protein coding, tscapeCRCa, 6p21.1 GO=[aspartic-type endopeptidase activity; digestion; extracellular space] tscapeOvariana u PLEKHF1 19:30156327-30166376 pleckstrin homology domain containing, family F (with FYVE domain) member 1 [Source:HGNC tscapeBCa, tscapeOvariana 19q12 Symbol;Acc:20764], type=protein coding, GO=[lysosome; lytic vacuole; perinuclear region of cytoplasm; induction of apoptosis] u PMEPA1 20:56223448-56286592 prostate transmembrane protein, androgen induced 1 [Source:HGNC Symbol;Acc:14107], 20q13.31 type=processed transcript,protein coding, GO=[WW domain binding; androgen receptor signaling pathway] d* POLR3GL 1:145456236-145470388 polymerase (RNA) III (DNA directed) polypeptide G (32kD)-like [Source:HGNC 1q21.1 Symbol;Acc:28466], type=protein coding,retained intron u* PPAP2A 5:54720682-54830878 phosphatidic acid phosphatase type 2A [Source:HGNC Symbol;Acc:9228], tcgaBreastGE, tscapeBCd, 5q11.2 type=nonsense mediated decay,processed transcript,protein coding,retained intron, tscapeNSCLCd, GO=[sphingosine-1-phosphate phosphatase activity; phospholipid dephosphorylation; tscapeOvariand, phosphatidate phosphatase activity; germ cell migration; activation of protein kinase C activity by tscapeProstated 
G-protein coupled receptor protein signaling pathway; androgen receptor signaling pathway; sphingolipid metabolic process; protein dephosphorylation] d PRKD1 14:30045687-30396948 protein kinase D1 [Source:HGNC Symbol;Acc:9407], tcgaBreastGE, 14q12 type=processed transcript,protein coding,retained intron, GO=[protein kinase C activity; tcgaGliomaGE, tscapeHCCa sphingolipid metabolic process; cell cortex; cell junction] u PTGER4 5:40679600-40693837 prostaglandin E receptor 4 (subtype EP4) [Source:HGNC Symbol;Acc:9596], tcgaBreastGE, 5p13.1 type=processed transcript,protein coding, GO=[prostaglandin E receptor activity; G-protein tscapeSCLCa signaling, coupled to cAMP nucleotide second messenger; regulation of ossification] u PTPRCAP 11:67202981-67205538 protein tyrosine phosphatase, receptor type, C-associated protein [Source:HGNC Symbol;Acc:9667], 11q13.2 type=protein coding u* PYGB 20:25228705-25278650 phosphorylase, glycogen; brain [Source:HGNC Symbol;Acc:9723], 20p11.21 type=processed transcript,protein coding, GO=[glycogen phosphorylase activity; glycogen catabolic process; pyridoxal phosphate binding; glycogen metabolic process; drug binding; glucose metabolic process; cofactor binding; soluble fraction; protein homodimerization activity; protein dimerization activity; identical protein binding] d RASL11B 4:53728457-53733000 RAS-like, family 11, member B [Source:HGNC Symbol;Acc:23804], tcgaGliomaGE 4q12 type=processed transcript,protein coding,retained intron, GO=[GTPase activity; GTP binding] d REXO2 11:114310108- REX2, RNA exonuclease 2 homolog (S. 
cerevisiae) [Source:HGNC Symbol;Acc:17851], tcgaGliomaGE, tscapeBCd, 114321001 type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[3’-5’ tscapeMelanomad, 11q23.2 exonuclease activity] tscapeProstated d RP11-181C21.4 1:78276507-78277621 [undefined], type=processed pseudogene,pseudogene,retrotransposed 1p31.1 d RP11-510N19.5 1:201980498-201984785 [undefined], type=processed transcript 1q32.1 u RP11-529H2.2 4:88238322-88266362 [undefined], type=processed transcript 4q22.1 u S100P 4:6694796-6698897 S100 calcium binding protein P [Source:HGNC Symbol;Acc:10504], tcgaBreastGE 4p16.1 type=processed transcript,protein coding, GO=[calcium-dependent protein binding; endothelial cell migration; magnesium ion binding; calcium ion binding] u* SAT1 X:23801290-23804343 spermidine/spermine N1-acetyltransferase 1 [Source:HGNC Symbol;Acc:10540], tcgaGliomaGE Xp22.11 type=processed transcript,protein coding, GO=[spermidine binding; polyamine binding; diamine N-acetyltransferase activity; polyamine biosynthetic process; cellular biogenic amine biosynthetic process; soluble fraction] d SERPINI1 3:167453031-167543356 serpin peptidase inhibitor, clade I (neuroserpin), member 1 [Source:HGNC Symbol;Acc:8943], fileBC2brain, 3q26.1 type=processed transcript,protein coding,retained intron, GO=[peripheral nervous system tcgaGliomaGE development; serine-type endopeptidase inhibitor activity; peptidase inhibitor activity] u SGK1 6:134490384-134639250 serum/glucocorticoid regulated kinase 1 [Source:HGNC Symbol;Acc:10810], cosmicRecurrent, 6q23.2 type=processed transcript,protein coding,retained intron, GO=[phosphatidylinositol binding; tcgaBreastGE, tscapeBCa, monovalent inorganic cation transport; response to DNA damage stimulus] tscapeCRCa u SHROOM3 4:77356253-77704406 shroom family member 3 [Source:HGNC Symbol;Acc:30422], tscapeHCCd 4q21.1 type=processed transcript,protein coding,retained intron, GO=[cellular pigment accumulation; pigment accumulation; 
columnar/cuboidal epithelial cell development; apical protein localization; neural tube closure; regulation of cell shape; apical junction complex; adherens junction; apical plasma membrane; apical part of cell; microtubule; actin binding; cell junction; microtubule cytoskeleton] u SLC35F2 11:107661717- solute carrier family 35, member F2 [Source:HGNC Symbol;Acc:23615], tcgaBreastGE, 107799019 type=nonsense mediated decay,protein coding tcgaGliomaGE, tscapeBCd, 11q22.3 tscapeMelanomad u SLC43A1 11:57252007-57283259 solute carrier family 43, member 1 [Source:HGNC Symbol;Acc:9225], 11q12.1 type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[neutral amino acid transmembrane transporter activity; neutral amino acid transport; L-amino acid transport; amino acid transmembrane transport; L-amino acid transmembrane transporter activity] u SLC45A3 1:205626979-205649587 solute carrier family 45, member 3 [Source:HGNC Symbol;Acc:8642], tscapeBCa 1q32.1 type=processed transcript,protein coding d SLITRK3 3:164904508-164914897 SLIT and NTRK-like family, member 3 [Source:HGNC Symbol;Acc:23501], type=protein coding, cosmicPrimary, 3q26.1 GO=[axonogenesis; cell morphogenesis involved in neuron differentiation] tcgaGliomaGE u* SOCS2 12:93963598-93969978 suppressor of cytokine signaling 2 [Source:HGNC Symbol;Acc:19382], type=protein coding, tcgaBreastGE, 12q22 GO=[JAK pathway signal transduction adaptor activity; prolactin receptor binding; growth tcgaGliomaGE hormone receptor binding; growth hormone receptor signaling pathway; insulin-like growth factor receptor binding; SH3/SH2 adaptor activity; JAK-STAT cascade; response to estradiol stimulus; regulation of cell growth; anti-apoptosis] Continued on next page. . .

69 S name locus description studies u SORD 15:45315305-45367043 [Source:HGNC Symbol;Acc:11184], type=protein coding, GO=[L-iditol tcgaBreastGE, 15q21.1 2-dehydrogenase activity; L-xylitol catabolic process; fructose biosynthetic process; pentitol tcgaGliomaGE, catabolic process; pentitol metabolic process; sorbitol catabolic process; sorbitol metabolic process; tscapeMelanomad, hexitol metabolic process; sperm motility; flagellum; NAD binding; alcohol biosynthetic process; tscapeNSCLCd cilium; glucose metabolic process; carbohydrate biosynthetic process; cofactor binding; soluble fraction; mitochondrial membrane; extracellular space] u SPDEF 6:34505579-34524110 SAM pointed domain containing ets transcription factor [Source:HGNC Symbol;Acc:17257], snp3dProstateC 6p21.31 type=protein coding, GO=[lung goblet cell differentiation; positive regulation of cell fate commitment; intestinal epithelial cell development; columnar/cuboidal epithelial cell development; negative regulation of survival gene expression; negative regulation of cell fate commitment; lung epithelium development; negative regulation of gene-specific transcription from RNA polymerase II promoter; anti-apoptosis; positive regulation of transcription from RNA polymerase II promoter] d SRSF7 2:38970741-38978636 serine/arginine-rich splicing factor 7 [Source:HGNC Symbol;Acc:10789], 2p22.1 type=nonsense mediated decay,protein coding,retained intron, GO=[termination of RNA polymerase II transcription; mRNA export from nucleus; mRNA 3’-end processing; nuclear mRNA splicing, via spliceosome; RNA splicing, via transesterification reactions with bulged adenosine as nucleophile] u ST3GAL4 11:126225535- ST3 beta-galactoside alpha-2,3-sialyltransferase 4 [Source:HGNC Symbol;Acc:10864], tcgaBreastGE, tscapeBCd, 126310239 type=nonsense mediated decay,processed transcript,protein coding,retained intron, tscapeNSCLCd 11q24.2 GO=[monosialoganglioside sialyltransferase activity; beta-galactoside 
alpha-2,3-sialyltransferase activity; integral to Golgi membrane; Golgi cisterna membrane; Golgi cisterna; protein N-linked glycosylation via asparagine; peptidyl-asparagine modification; integral to organelle membrane; protein glycosylation; post-translational protein modification] d STBD1 4:77172886-77232752 starch binding domain 1 [Source:HGNC Symbol;Acc:24854], type=protein coding 4q21.1 u STK39 2:168810530-169104651 serine threonine kinase 39 [Source:HGNC Symbol;Acc:17717], tcgaGliomaGE 2q24.3 type=processed transcript,protein coding, GO=[positive regulation of potassium ion transport; receptor signaling protein serine/threonine kinase activity; positive regulation of ion transport; apical plasma membrane; potassium ion transport; apical part of cell; basolateral plasma membrane; positive regulation of transport; monovalent inorganic cation transport] u SUN2 22:39130730-39190148 Sad1 and UNC84 domain containing 2 [Source:HGNC Symbol;Acc:14210], 22q13.1 type=protein coding,retained intron, GO=[nuclear migration along microfilament; nuclear matrix anchoring at nuclear membrane; nuclear matrix organization; SUN-KASH complex; microtubule organizing center attachment site; cytoskeletal anchoring at nuclear membrane; lamin binding; centrosome localization; nuclear envelope organization; nuclear chromosome, telomeric region; mitotic spindle organization; nuclear inner membrane; microtubule binding; positive regulation of cell migration; endosome membrane; regulation of cell migration] u TBC1D8 2:101624079-101869328 TBC1 domain family, member 8 (with GRAM domain) [Source:HGNC Symbol;Acc:17791], 2q11.2 type=processed transcript,protein coding,retained intron, GO=[Rab GTPase activator activity; regulation of Rab GTPase activity; Ras GTPase activator activity; positive regulation of cell proliferation; calcium ion binding] u TIPARP 3:156391024-156424559 TCDD-inducible poly(ADP-ribose) polymerase [Source:HGNC Symbol;Acc:23696], 3q25.31 type=protein coding, GO=[face 
morphogenesis; estrogen metabolic process; smooth muscle tissue development; face development; NAD+ ADP-ribosyltransferase activity; protein ADP-ribosylation; androgen metabolic process; platelet-derived growth factor receptor signaling pathway; palate development; vasculogenesis; post-embryonic development; female gonad development; kidney development; skeletal system morphogenesis] u TM4SF1 3:149086809-149095652 transmembrane 4 L six family member 1 [Source:HGNC Symbol;Acc:11853], snp3dMetastasis, 3q25.1 type=nonsense mediated decay,protein coding tscapeMelanomad u TMEFF2 2:192813769-193060435 transmembrane protein with EGF-like and two follistatin-like domains 2 [Source:HGNC tcgaGliomaGE 2q32.3 Symbol;Acc:11867], type=processed transcript,protein coding d TMEM123 11:102267063- transmembrane protein 123 [Source:HGNC Symbol;Acc:30138], tcgaGliomaGE, 102341115 type=processed transcript,protein coding,retained intron, GO=[oncosis; external side of plasma tscapeNSCLCa, 11q22.2 membrane] tscapeRCCd d TMEM144 4:159122756-159176563 transmembrane protein 144 [Source:HGNC Symbol;Acc:25633], tcgaGliomaGE, tscapeRCCd 4q32.1 type=nonsense mediated decay,processed transcript,protein coding,retained intron d TMEM158 3:45265958-45267770 transmembrane protein 158 (gene/pseudogene) [Source:HGNC Symbol;Acc:30293], tcgaGliomaGE 3p21.31 type=protein coding u TMPRSS2 21:42836478-42903043 transmembrane protease, serine 2 [Source:HGNC Symbol;Acc:11876], tscapeProstated 21q22.3 type=processed transcript,protein coding,retained intron, GO=[scavenger receptor activity; serine-type endopeptidase activity] d TNFRSF21 6:47199268-47277641 tumor necrosis factor receptor superfamily, member 21 [Source:HGNC Symbol;Acc:13469], tcgaGliomaGE 6p12.3 type=protein coding d TRIB1 8:126442563-126450647 tribbles homolog 1 (Drosophila) [Source:HGNC Symbol;Acc:16891], tscapeHCCa, 8q24.13 type=processed transcript,protein coding, GO=[negative regulation of lipopolysaccharide-mediated tscapeOvariana 
signaling pathway; ubiquitin-protein ligase regulator activity; ligase regulator activity; mitogen-activated protein kinase kinase binding; negative regulation of smooth muscle cell migration; negative regulation of smooth muscle cell proliferation; lipopolysaccharide-mediated signaling pathway; positive regulation of proteasomal ubiquitin-dependent protein catabolic process; ubiquitin protein ligase binding; negative regulation of transcription factor activity; JNK cascade; stress-activated protein kinase signaling cascade; regulation of MAP kinase activity; regulation of cell migration] u TSKU 11:76493295-76509198 tsukushi small leucine rich proteoglycan homolog (Xenopus laevis) [Source:HGNC tscapeOvariana 11q13.5 Symbol;Acc:28850], type=protein coding d TSPAN8 12:71518869-71835678 tetraspanin 8 [Source:HGNC Symbol;Acc:11855], type=protein coding, GO=[negative regulation of 12q21.1 blood coagulation; protein glycosylation; lysosome; lytic vacuole] u* TUBA3D 2:132233666-132240507 tubulin, alpha 3d [Source:HGNC Symbol;Acc:24071], type=processed transcript,protein coding, tcgaBreastGE, 2q21.1 GO=[’de novo’ posttranslational protein folding; protein polymerization; microtubule-based tscapeOvariand, movement; GTPase activity; microtubule; GTP binding; microtubule cytoskeleton] tscapeProstated u UAP1 1:162531321-162569627 UDP-N-acteylglucosamine pyrophosphorylase 1 [Source:HGNC Symbol;Acc:12457], 1q23.3 type=processed transcript,protein coding, GO=[UDP-N-acetylglucosamine diphosphorylase activity; UDP-N-acetylglucosamine biosynthetic process; N-acetylglucosamine biosynthetic process; glucosamine biosynthetic process; nucleotide-sugar metabolic process; dolichol-linked oligosaccharide biosynthetic process; protein N-linked glycosylation via asparagine; peptidyl-asparagine modification; alcohol biosynthetic process; protein glycosylation; post-translational protein modification; carbohydrate biosynthetic process] u* UGDH 4:39500375-39529931 UDP-glucose 6-dehydrogenase 
[Source:HGNC Symbol;Acc:12525], tcgaGliomaGE 4p14 type=nonsense mediated decay,processed transcript,protein coding, GO=[UDP-glucose 6-dehydrogenase activity; UDP-glucuronate biosynthetic process; UDP-glucose metabolic process; nucleotide-sugar metabolic process; glycosaminoglycan biosynthetic process; gastrulation with mouth forming second; NAD binding; xenobiotic metabolic process; electron carrier activity; glucose metabolic process; carbohydrate biosynthetic process; cofactor binding] u ZMIZ1 10:80828792-81076285 zinc finger, MIZ-type containing 1 [Source:HGNC Symbol;Acc:16493], tscapeBCa 10q22.3 type=processed transcript,protein coding, GO=[vitellogenesis; artery morphogenesis; positive regulation of fibroblast proliferation; cell aging; vasculogenesis; nuclear speck; heart morphogenesis; developmental growth; in utero embryonic development; positive regulation of transcription from RNA polymerase II promoter; positive regulation of cell proliferation] d ZNF503 10:77039484-77161664 zinc finger protein 503 [Source:HGNC Symbol;Acc:23589], tscapeOvariana, 10q22.2 type=processed transcript,protein coding tscapeProstatea

11.2.1 GO enrichment of all candidates

Table 28: Enriched Gene Ontology terms [1] (FDR-corrected p ≤ 0.05). Ratio is the proportion of annotated genes among the whole gene set. The list is sorted by FDR-corrected p-value. Green and blue borders refer to up- and down-regulated genes, respectively.

Ratio | Type | Description | Genes
0.028 | MF | fatty acid elongase activity | ELOVL2, ELOVL5, ELOVL6
0.029 | BP | fatty acid elongation | ELOVL2, ELOVL5, ELOVL6
0.029 | BP | relaxation of muscle | ADORA2B, GUCY1A3, P2RX4
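Table 28 keeps terms with FDR-corrected p ≤ 0.05 and sorts them by that value. The report does not name the correction procedure. The sketch below assumes the Benjamini-Hochberg procedure. The helper names `benjamini_hochberg` and `enriched_terms` are hypothetical and are not part of Anduril.

```python
from typing import List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order (assumed procedure)."""
    m = len(pvalues)
    # Indices ordered by ascending raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, min(1.0, pvalues[i] * m / rank))
        adjusted[i] = running_min
    return adjusted


def enriched_terms(terms: List[Tuple[str, float]], alpha: float = 0.05) -> List[Tuple[str, float]]:
    """Keep (term, raw p) pairs whose adjusted p <= alpha, sorted by adjusted p."""
    adjusted = benjamini_hochberg([p for _, p in terms])
    kept = [(name, q) for (name, _), q in zip(terms, adjusted) if q <= alpha]
    return sorted(kept, key=lambda item: item[1])
```

A different correction, such as Bonferroni, would keep fewer terms at the same threshold.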

[GO term graph: molecular_function, fatty acid elongase activity]

Figure 23: Relationships between the enriched molecular function Gene Ontology terms listed in Table 28. The intensity of the red reflects the significance of the enrichment. Edge thickness is proportional to the number of genes sharing the annotation that the edge leads to.

[GO term graph: biological_process, fatty acid elongation, relaxation of muscle]

Figure 24: Relationships between the enriched biological process Gene Ontology terms listed in Table 28. The intensity of the red reflects the significance of the enrichment. Edge thickness is proportional to the number of genes sharing the annotation that the edge leads to.

12 ChIP-seq peaks

12.1 AR DHT binding sites

Chromosome-specific statistics are shown in Table 29. A histogram of sequence lengths is shown in Figure 25.

chromosome | frequency | min length (bp) | mean length (bp) | max length (bp) | total length (bp) | coverage
1 | 668 | 37 | 441 | 921 | 294538 | 0.001182
10 | 268 | 15 | 407 | 927 | 109018 | 0.000804
11 | 316 | 41 | 428 | 1186 | 135187 | 0.001001
12 | 253 | 172 | 394 | 937 | 99742 | 0.000745
13 | 164 | 192 | 454 | 1452 | 74482 | 0.000647
14 | 210 | 53 | 428 | 934 | 89961 | 0.000838
15 | 216 | 13 | 444 | 1127 | 95987 | 0.000936
16 | 172 | 51 | 404 | 956 | 69534 | 0.00077
17 | 239 | 45 | 407 | 840 | 97299 | 0.001198
18 | 128 | 15 | 448 | 1080 | 57402 | 0.000735
19 | 75 | 153 | 390 | 949 | 29264 | 0.000495
2 | 479 | 22 | 412 | 1036 | 197146 | 0.000811
20 | 151 | 239 | 432 | 1583 | 65201 | 0.001035
21 | 82 | 115 | 447 | 750 | 36616 | 0.000761
22 | 68 | 178 | 374 | 582 | 25455 | 0.000496
3 | 495 | 42 | 467 | 1041 | 231115 | 0.001167
4 | 310 | 148 | 403 | 934 | 124864 | 0.000653
5 | 414 | 14 | 429 | 1152 | 177596 | 0.000982
6 | 416 | 14 | 397 | 908 | 165141 | 0.000965
7 | 348 | 8 | 432 | 1114 | 150453 | 0.000945
8 | 346 | 60 | 434 | 939 | 150036 | 0.001025
9 | 231 | 129 | 420 | 844 | 97026 | 0.000687
X | 141 | 49 | 359 | 802 | 50676 | 0.000326
Y | 3 | 197 | 270 | 347 | 809 | 1.4e-05
all (24 chromosomes) | 6193 | 8 | 424 | 1583 | 2624548 | 0.000357

Table 29: Chromosome-specific distribution of the regions. The last line gives the overall statistics.

[Histogram: frequency vs. length (base pairs)]

Figure 25: Sequence length distribution. The dashed line shows a normal approximation based on the mean and standard deviation of the sequence lengths.
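The columns of Table 29 can be recomputed from a list of regions, as in the minimal sketch below. It rests on several assumptions:

- Regions are `(chrom, start, end)` tuples with half-open coordinates.
- The caller supplies the chromosome lengths.
- Coverage is total region length divided by chromosome length.

This coverage rule matches the per-chromosome rows. For chromosome 1, 294538 / 249,250,621 ≈ 0.001182, taking the hg19 length of chromosome 1 as an assumed reference. The "all" row evidently uses a different denominator and is not reproduced. `chromosome_stats` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), assumed half-open


def chromosome_stats(regions: Iterable[Region],
                     chrom_lengths: Dict[str, int]) -> Dict[str, dict]:
    """Per-chromosome frequency, min/mean/max/total length and coverage."""
    lengths = defaultdict(list)
    for chrom, start, end in regions:
        lengths[chrom].append(end - start)
    stats = {}
    for chrom, values in lengths.items():
        total = sum(values)
        stats[chrom] = {
            "frequency": len(values),
            "min": min(values),
            "mean": round(mean(values)),  # rounded to whole base pairs
            "max": max(values),
            "total": total,
            # Fraction of the chromosome spanned by region bases; assumes
            # regions within one set do not overlap each other.
            "coverage": total / chrom_lengths[chrom],
        }
    return stats
```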

The following table shows the properties of the seqAR-gCount component.

property | value
genes | 12520

(a) seqAR-deNovo-meme1: width=15, sites=220, llr=2219, E=4.1e-98
(b) seqAR-deNovo-meme2: width=15, sites=89, llr=981, E=8.3e-09
(c) seqAR-deNovo-meme3: width=15, sites=55, llr=656, E=0.18

Figure 26: De novo motifs for the filtered AR DHT binding site sequences.

Table 30: Motif enrichments

motif | ratio | fC | fR | n1C | n1R | p1 | var
Ar | 15.1889 | 0.1077 | 0.0070 | 607 | 80 | 3.68E-200 | 0.04753
TLX1::NFIC | 3.6009 | 0.0178 | 0.0049 | 87 | 51 | 3.56E-12 | 0.01253
FOXA1pAR | 2.9218 | 0.2027 | 0.0693 | 1153 | 686 | 1.803E-172 | 0.13727
FOXA1 | 2.8454 | 1.0124 | 0.3558 | 4008 | 3344 | 0.00E00 | 0.69032
Foxa2 | 2.5245 | 0.8818 | 0.3493 | 3683 | 2939 | 0.00E00 | 0.80588
FOXF2 | 2.4597 | 0.3025 | 0.1229 | 1605 | 1304 | 37.6E-180 | 0.20044
FOXD1 | 1.8098 | 1.3190 | 0.7288 | 4415 | 5568 | 0.00E00 | 1.18573
Tal1::Gata1 | 1.7757 | 0.0586 | 0.0330 | 348 | 374 | 3.063E-16 | 0.04294
ARMotifTH | 1.7728 | 0.0677 | 0.0381 | 407 | 433 | 10.57E-20 | 0.04898
RXRA::VDR | 1.7161 | 0.0089 | 0.0052 | 55 | 60 | 38.45E-04 | 0.00641
ARMotifT | 1.7080 | 0.6655 | 0.3896 | 2952 | 3597 | 49.82E-264 | 0.54156
GABPA | 1.7033 | 0.0877 | 0.0515 | 507 | 571 | 12.83E-22 | 0.06836
Stat3 | 1.6866 | 0.2303 | 0.1365 | 1062 | 1266 | 13.93E-48 | 0.23383
STAT1 | 1.6590 | 0.0501 | 0.0302 | 246 | 286 | 34.69E-10 | 0.05129
Foxq1 | 1.6428 | 0.6308 | 0.3840 | 2802 | 3356 | 15.16E-244 | 0.57899
ARMotifTT | 1.5835 | 0.0123 | 0.0077 | 75 | 87 | 21.18E-04 | 0.00967
GRMotifTH | 1.5534 | 0.3547 | 0.2283 | 1789 | 2314 | 60.72E-90 | 0.29995
AR | 1.5232 | 0.0636 | 0.0418 | 378 | 465 | 6.365E-12 | 0.05107
PPARG::RXRA | 1.5194 | 0.0226 | 0.0149 | 134 | 172 | 6.049E-04 | 0.01826
Evi1 | 1.5152 | 0.0438 | 0.0289 | 261 | 320 | 3.006E-08 | 0.03636
NR3C1 | 1.4861 | 0.3782 | 0.2545 | 1839 | 2530 | 7.91E-80 | 0.32991
ESR2 | 1.4841 | 0.0213 | 0.0143 | 114 | 146 | 16.44E-04 | 0.02109
ELK4 | 1.4460 | 0.0265 | 0.0183 | 154 | 191 | 62.72E-06 | 0.02866
| 1.4337 | 0.0157 | 0.0109 | 95 | 124 | 66.44E-04 | 0.01297
Gata1 | 1.4066 | 0.7884 | 0.5605 | 3224 | 4649 | 49.01E-240 | 0.75580
Esrrb | 1.3524 | 0.1193 | 0.0882 | 690 | 959 | 11.14E-16 | 0.10270
MIZF | 1.3514 | 0.0294 | 0.0217 | 178 | 248 | 10.96E-04 | 0.02492
GRMotifT | 1.3475 | 2.4214 | 1.7969 | 5547 | 9361 | 0.00E00 | 2.56924
Tcfcp2l1 | 1.3362 | 0.0656 | 0.0491 | 375 | 515 | 7.202E-08 | 0.06615
FOXI1 | 1.3218 | 1.0023 | 0.7582 | 3620 | 5181 | 1.655E-320 | 1.46496
RXR::RAR DR5 | 1.2930 | 0.0292 | 0.0226 | 179 | 252 | 14.91E-04 | 0.02641
ARMotifHH | 1.2889 | 0.3908 | 0.3032 | 1895 | 2814 | 3.315E-68 | 0.39250
GR | 1.2744 | 0.2894 | 0.2271 | 1536 | 2226 | 1.394E-48 | 0.27380
ARMotifH | 1.2580 | 8.5055 | 6.7613 | 6174 | 11511 | 0.00E00 | 15.61415
NHLH1 | 1.2447 | 0.1295 | 0.1040 | 506 | 755 | 12.62E-08 | 0.19920
NFIC | 1.2413 | 2.6978 | 2.1735 | 5471 | 9611 | 0.00E00 | 3.62545
GRMotifHH | 1.2125 | 0.2351 | 0.1939 | 1232 | 1930 | 7.574E-24 | 0.23544
FOXO3 | 1.2049 | 1.8395 | 1.5266 | 4938 | 8372 | 0.00E00 | 2.38753
NF-kappaB | 0.8174 | 0.1702 | 0.2083 | 781 | 1868 | 2.136E-02 | 0.28265
SP1 | 0.8112 | 0.9882 | 1.2181 | 2548 | 5460 | 49.26E-40 | 4.95777
MZF1 5-13 | 0.8073 | 1.1368 | 1.4082 | 3928 | 8076 | 6.144E-210 | 1.93632
Prrx2 | 0.7967 | 2.0578 | 2.5830 | 4818 | 9577 | 0.00E00 | 4.63812
Foxd3 | 0.7835 | 1.9756 | 2.5214 | 4493 | 8111 | 0.00E00 | 10.64486
CREB1 | 0.7770 | 0.6465 | 0.8320 | 2779 | 5985 | 61.72E-54 | 1.00667
FOXL1 | 0.7769 | 6.1612 | 7.9305 | 5825 | 10857 | 0.00E00 | 68.39779
PLAG1 | 0.6819 | 0.0115 | 0.0168 | 70 | 188 | 1.65E-02 | 0.01587
Zfx | 0.6715 | 0.1129 | 0.1681 | 582 | 1589 | 8.263E-08 | 0.19946
Ddit3::Cebpa | 0.5262 | 0.2185 | 0.4153 | 1197 | 3420 | 6.309E-08 | 0.45901
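In Tables 30, 32, 34, 36 and 38, the ratio column equals fC / fR up to rounding. For example, FOXA1 in Table 30 gives 1.0124 / 0.3558 ≈ 2.8454. Every listed ratio is at least 1.2 or at most 0.83. The report does not state its selection rule. The sketch below assumes a symmetric 1.2-fold cut-off inferred from the listed values, and it ignores any p-value filter the pipeline may also apply. `MotifRow` and `listed_rows` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MotifRow:
    motif: str
    f_chip: float  # value from the fC column
    f_ref: float   # value from the fR column

    @property
    def ratio(self) -> float:
        # Ratio as it appears in the tables: fC / fR.
        if self.f_ref == 0:
            # e.g. Pax4 in Table 36, whose fR is printed as 0.0000 after rounding
            return float("inf")
        return self.f_chip / self.f_ref


def listed_rows(rows: List[MotifRow], fold: float = 1.2) -> List[MotifRow]:
    """Keep motifs enriched or depleted at least `fold`-fold (assumed rule), ratio descending."""
    kept = [r for r in rows if r.ratio >= fold or r.ratio <= 1.0 / fold]
    return sorted(kept, key=lambda r: r.ratio, reverse=True)
```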

12.2 FoxA1 binding sites

Chromosome-specific statistics are shown in Table 31. A histogram of sequence lengths is shown in Figure 27.

chromosome | frequency | min length (bp) | mean length (bp) | max length (bp) | total length (bp) | coverage
1 | 1857 | 7 | 408 | 1181 | 757524 | 0.003039
10 | 860 | 15 | 380 | 1018 | 327063 | 0.002413
11 | 889 | 6 | 391 | 1109 | 347932 | 0.002577
12 | 776 | 9 | 374 | 948 | 290418 | 0.00217
13 | 587 | 5 | 424 | 1426 | 248990 | 0.002162
14 | 663 | 38 | 414 | 1305 | 274706 | 0.002559
15 | 633 | 34 | 388 | 1028 | 245625 | 0.002396
16 | 457 | 2 | 377 | 1574 | 172472 | 0.001909
17 | 649 | 1 | 376 | 826 | 244180 | 0.003007
18 | 419 | 2 | 384 | 1175 | 160934 | 0.002061
19 | 194 | 14 | 331 | 702 | 64299 | 0.001087
2 | 1494 | 27 | 385 | 1299 | 575475 | 0.002366
20 | 454 | 0 | 390 | 1050 | 176897 | 0.002807
21 | 217 | 173 | 400 | 1203 | 86765 | 0.001803
22 | 188 | 18 | 335 | 723 | 63035 | 0.001229
3 | 1663 | 39 | 427 | 1326 | 710162 | 0.003586
4 | 945 | 2 | 377 | 1036 | 356174 | 0.001863
5 | 1085 | 9 | 395 | 1115 | 428987 | 0.002371
6 | 1064 | 12 | 370 | 1216 | 393366 | 0.002299
7 | 1017 | 22 | 392 | 1150 | 398720 | 0.002505
8 | 988 | 11 | 402 | 1588 | 397370 | 0.002715
9 | 795 | 3 | 389 | 1201 | 309147 | 0.002189
X | 346 | 83 | 332 | 726 | 114943 | 0.00074
Y | 16 | 214 | 336 | 461 | 5370 | 9e-05
all (24 chromosomes) | 18256 | 0 | 392 | 1588 | 7150554 | 0.000973

Table 31: Chromosome-specific distribution of the regions. The last line gives the overall statistics.

[Histogram: frequency vs. length (base pairs)]

Figure 27: Sequence length distribution. The dashed line shows a normal approximation based on the mean and standard deviation of the sequence lengths.
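The captions say only that the dashed line is a normal approximation built from the mean and standard deviation of the lengths. The report's own plotting code is not shown. The minimal sketch below rests on two assumptions:

- It uses the sample standard deviation.
- It scales the density to the histogram's frequency axis by multiplying by the number of sequences and the bin width.

`normal_overlay` is a hypothetical helper built on Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist, mean, stdev
from typing import List, Tuple


def normal_overlay(lengths: List[float], bin_width: float,
                   xs: List[float]) -> List[Tuple[float, float]]:
    """Evaluate a normal approximation of `lengths` on a count (frequency) scale."""
    # Fit from the sample mean and sample standard deviation (assumption).
    fit = NormalDist(mu=mean(lengths), sigma=stdev(lengths))
    n = len(lengths)
    # Expected count in a bin of width `bin_width` near x is about n * bin_width * pdf(x).
    return [(x, n * bin_width * fit.pdf(x)) for x in xs]
```

Plotting these points as a dashed line over a histogram with the same bin width gives a curve of the kind described in Figures 25, 27, 29, 31 and 33.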

The following table shows the properties of the seqFX-gCount component.

property | value
genes | 22746

(a) seqFX-deNovo-meme1: width=15, sites=403, llr=3653, E=1.7e-237
(b) seqFX-deNovo-meme2: width=15, sites=106, llr=1156, E=7.3e-18
(c) seqFX-deNovo-meme3: width=10, sites=51, llr=577, E=12000

Figure 28: De novo motifs for the filtered FoxA1 binding site sequences.

Table 32: Motif enrichments

motif | ratio | fC | fR | n1C | n1R | p1 | var
CTCF | 5.2286 | 0.0175 | 0.0033 | 316 | 99 | 1.894E-70 | 0.01046
FOXA1 | 3.8703 | 1.2889 | 0.3330 | 14270 | 9226 | 0.00E00 | 0.81914
Foxa2 | 3.8463 | 1.2202 | 0.3172 | 14078 | 8096 | 0.00E00 | 0.86540
TLX1::NFIC | 3.7209 | 0.0178 | 0.0047 | 246 | 133 | 50.93E-36 | 0.01326
FOXF2 | 3.6717 | 0.4041 | 0.1100 | 6231 | 3455 | 0.00E00 | 0.22891
Ar | 3.5932 | 0.0270 | 0.0075 | 460 | 252 | 4.938E-64 | 0.01560
FOXD1 | 2.4565 | 1.6715 | 0.6804 | 15348 | 15698 | 0.00E00 | 1.34291
Foxq1 | 2.0812 | 0.7214 | 0.3466 | 9146 | 9107 | 0.00E00 | 0.59924
FOXA1pAR | 1.7951 | 0.1218 | 0.0678 | 2028 | 2037 | 28.54E-118 | 0.10223
Tal1::Gata1 | 1.7651 | 0.0500 | 0.0283 | 873 | 955 | 14.75E-36 | 0.03673
ELK4 | 1.6973 | 0.0297 | 0.0175 | 490 | 561 | 53.04E-18 | 0.02530
E2F1 | 1.6130 | 0.0171 | 0.0106 | 300 | 354 | 11.7E-10 | 0.01349
GABPA | 1.5922 | 0.0808 | 0.0507 | 1342 | 1652 | 8.952E-40 | 0.06689
FOXI1 | 1.5688 | 1.0711 | 0.6827 | 10867 | 14233 | 0.00E00 | 1.36166
Stat3 | 1.5132 | 0.1916 | 0.1266 | 2683 | 3440 | 85.53E-88 | 0.20321
FOXO3 | 1.4485 | 2.0266 | 1.3991 | 15372 | 23617 | 0.00E00 | 2.33339
STAT1 | 1.4315 | 0.0420 | 0.0293 | 603 | 803 | 5.993E-12 | 0.04711
RREB1 | 1.4044 | 0.0508 | 0.0362 | 379 | 848 | 1.365E-02 | 1.12071
NHLH1 | 1.3883 | 0.1305 | 0.0940 | 1534 | 2059 | 10.08E-34 | 0.18529
Gata1 | 1.3254 | 0.6881 | 0.5191 | 8473 | 13151 | 0.00E00 | 0.68006
ARMotifTH | 1.3122 | 0.0485 | 0.0370 | 861 | 1203 | 5.603E-14 | 0.04551
Evi1 | 1.3105 | 0.0384 | 0.0293 | 663 | 960 | 91.68E-10 | 0.03450
ESR2 | 1.2962 | 0.0139 | 0.0107 | 225 | 332 | 41.93E-04 | 0.01418
ARMotifT | 1.2765 | 0.4523 | 0.3543 | 6385 | 9722 | 16.12E-244 | 0.43434
AR | 1.2721 | 0.0484 | 0.0380 | 849 | 1269 | 14.8E-10 | 0.04250
HNF1B | 1.2574 | 0.0923 | 0.0734 | 1561 | 2377 | 1.668E-18 | 0.08410
MIZF | 1.2483 | 0.0247 | 0.0198 | 438 | 667 | 2.071E-04 | 0.02196
PPARG::RXRA | 1.2477 | 0.0176 | 0.0141 | 312 | 470 | 14.93E-04 | 0.01591
NFIC | 1.2468 | 2.4761 | 1.9860 | 15921 | 27420 | 0.00E00 | 3.25736
TAL1::TCF3 | 1.2458 | 0.2388 | 0.1917 | 3111 | 4740 | 3.388E-56 | 0.30508
Myb | 1.2305 | 0.7870 | 0.6396 | 9446 | 15550 | 0.00E00 | 0.79266
NKX3-1 | 1.2292 | 0.8656 | 0.7042 | 9418 | 14589 | 0.00E00 | 1.29778
GRMotifTH | 1.2225 | 0.2461 | 0.2013 | 3870 | 6064 | 31.57E-76 | 0.23461
ARMotifTT | 1.2209 | 0.0089 | 0.0073 | 161 | 243 | 3.044E-02 | 0.00815
Lhx3 | 0.8290 | 0.2675 | 0.3227 | 3418 | 7096 | 4.875E-06 | 0.55513
MEF2A | 0.8260 | 0.2625 | 0.3178 | 3596 | 7629 | 21.43E-06 | 0.54174
Prrx2 | 0.8224 | 1.9681 | 2.3930 | 13725 | 27446 | 0.00E00 | 4.39305
NFKB1 | 0.8187 | 0.0474 | 0.0579 | 655 | 1429 | 1.781E-02 | 0.09389
SP1 | 0.8155 | 0.9080 | 1.1134 | 6577 | 15090 | 39.71E-42 | 7.09440
MZF1 5-13 | 0.7974 | 1.0245 | 1.2848 | 10800 | 22754 | 0.00E00 | 1.71883
PLAG1 | 0.7910 | 0.0107 | 0.0136 | 186 | 445 | 86.88E-04 | 0.01420
CREB1 | 0.7788 | 0.6005 | 0.7711 | 7699 | 16718 | 3.624E-114 | 0.92978
NF-kappaB | 0.7176 | 0.1390 | 0.1938 | 1891 | 5165 | 74.1E-22 | 0.25479
Zfx | 0.6073 | 0.0941 | 0.1550 | 1435 | 4378 | 23.38E-38 | 0.17280
Ddit3::Cebpa | 0.5523 | 0.2068 | 0.3745 | 3322 | 9241 | 7.43E-18 | 0.41882
EWSR1-FLI1 | 0.1937 | 0.0043 | 0.0223 | 34 | 137 | 64.66E-06 | 0.21689

12.3 AR binding sites (siFOXA1)

Chromosome-specific statistics are shown in Table 33. A histogram of sequence lengths is shown in Figure 29.

chromosome | frequency | min length (bp) | mean length (bp) | max length (bp) | total length (bp) | coverage
1 | 1972 | 0 | 509 | 2521 | 1003641 | 0.004027
10 | 824 | 13 | 484 | 1855 | 398619 | 0.002941
11 | 914 | 4 | 494 | 1635 | 451186 | 0.003342
12 | 644 | 3 | 471 | 1839 | 303126 | 0.002265
13 | 482 | 60 | 555 | 1935 | 267600 | 0.002324
14 | 534 | 180 | 516 | 1578 | 275365 | 0.002565
15 | 646 | 8 | 493 | 1247 | 318358 | 0.003105
16 | 577 | 1 | 485 | 1768 | 280094 | 0.0031
17 | 884 | 10 | 497 | 2184 | 439527 | 0.005413
18 | 363 | 4 | 470 | 1493 | 170573 | 0.002185
19 | 314 | 2 | 441 | 1706 | 138569 | 0.002344
2 | 1224 | 10 | 487 | 1801 | 595741 | 0.00245
20 | 466 | 15 | 489 | 2364 | 227853 | 0.003615
21 | 214 | 7 | 490 | 1255 | 104890 | 0.002179
22 | 281 | 7 | 445 | 981 | 125153 | 0.002439
3 | 1262 | 3 | 533 | 2174 | 672560 | 0.003396
4 | 653 | 9 | 454 | 1278 | 296418 | 0.001551
5 | 1040 | 7 | 486 | 1482 | 505855 | 0.002796
6 | 784 | 4 | 460 | 1321 | 360962 | 0.002109
7 | 878 | 97 | 489 | 1748 | 428949 | 0.002695
8 | 821 | 1 | 486 | 1319 | 398839 | 0.002725
9 | 757 | 11 | 483 | 1414 | 365787 | 0.00259
X | 389 | 68 | 411 | 1118 | 159698 | 0.001029
Y | 18 | 244 | 649 | 1774 | 11687 | 0.000197
all (24 chromosomes) | 16941 | 0 | 490 | 2521 | 8301050 | 0.001129

Table 33: Chromosome-specific distribution of the regions. The last line gives the overall statistics.

[Histogram: frequency vs. length (base pairs)]

Figure 29: Sequence length distribution. The dashed line shows a normal approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqARnoF-gCount component.

property | value
genes | 20973

(a) seqARnoF-deNovo-meme1: width=15, sites=325, llr=3221, E=4.6e-173
(b) seqARnoF-deNovo-meme2: width=15, sites=431, llr=3640, E=1.9e-91
(c) seqARnoF-deNovo-meme3: width=15, sites=132, llr=1409, E=4.1e-10

Figure 30: De novo motifs for the filtered AR binding site (siFOXA1) sequences.

Table 34: Motif enrichments

motif | ratio | fC | fR | n1C | n1R | p1 | var
Ar | 12.2740 | 0.1347 | 0.0109 | 2038 | 333 | 0.00E00 | 0.06383
REST | 6.4679 | 0.0009 | 0.0001 | 10 | 3 | 38.0E-04 | 0.00082
TLX1::NFIC | 4.8680 | 0.0320 | 0.0065 | 415 | 168 | 64.47E-78 | 0.02218
ESR1 | 2.6739 | 0.0012 | 0.0004 | 19 | 13 | 63.35E-04 | 0.00072
CTCF | 2.6115 | 0.0123 | 0.0047 | 206 | 143 | 19.33E-22 | 0.00756
ESR2 | 2.5126 | 0.0402 | 0.0160 | 583 | 457 | 36.08E-50 | 0.03022
ELK4 | 2.4594 | 0.0543 | 0.0221 | 832 | 659 | 7.78E-70 | 0.03832
GABPA | 2.4264 | 0.1563 | 0.0644 | 2345 | 1927 | 4.493E-210 | 0.10656
E2F1 | 2.2990 | 0.0282 | 0.0123 | 454 | 374 | 65.93E-36 | 0.02007
Stat3 | 2.0898 | 0.3250 | 0.1555 | 3953 | 3858 | 1.104E-308 | 0.30191
Tal1::Gata1 | 2.0445 | 0.0716 | 0.0350 | 1161 | 1085 | 1.834E-72 | 0.04857
Tcfcp2l1 | 2.0113 | 0.1226 | 0.0609 | 1813 | 1712 | 1.445E-118 | 0.10118
PPARG::RXRA | 1.9601 | 0.0352 | 0.0179 | 577 | 555 | 3.555E-32 | 0.02471
ARMotifTH | 1.8527 | 0.0846 | 0.0456 | 1346 | 1377 | 18.88E-70 | 0.06413
PPARG | 1.8498 | 0.0038 | 0.0021 | 56 | 54 | 5.585E-04 | 0.00349
MIZF | 1.8393 | 0.0444 | 0.0241 | 719 | 749 | 39.32E-34 | 0.03230
NHLH1 | 1.7655 | 0.2273 | 0.1287 | 2341 | 2534 | 16.96E-120 | 0.30967
STAT1 | 1.7036 | 0.0644 | 0.0378 | 849 | 974 | 6.058E-30 | 0.06544
GRMotifTH | 1.6910 | 0.4336 | 0.2564 | 5773 | 6937 | 0.00E00 | 0.36369
Myf | 1.6625 | 0.4070 | 0.2448 | 4674 | 5679 | 2.656E-254 | 0.51397
ARMotifHH | 1.6505 | 0.5800 | 0.3514 | 6968 | 8540 | 0.00E00 | 0.55556
TFAP2A | 1.6457 | 2.9618 | 1.7997 | 13179 | 19532 | 0.00E00 | 9.11978
EBF1 | 1.6296 | 1.1082 | 0.6800 | 9394 | 13127 | 0.00E00 | 1.51145
RXRA::VDR | 1.6268 | 0.0101 | 0.0062 | 171 | 195 | 1.625E-06 | 0.00754
ARMotifT | 1.6113 | 0.7156 | 0.4441 | 8383 | 10702 | 0.00E00 | 0.63533
RXR::RAR DR5 | 1.6028 | 0.0448 | 0.0279 | 736 | 863 | 6.628E-24 | 0.03448
Zfp423 | 1.5972 | 0.1011 | 0.0633 | 1327 | 1597 | 81.8E-44 | 0.11032
Mycn | 1.5928 | 0.1908 | 0.1197 | 2310 | 2777 | 4.302E-88 | 0.22602
Esrrb | 1.5624 | 0.1644 | 0.1052 | 2523 | 3114 | 3.923E-92 | 0.13138
Pax5 | 1.5537 | 0.0848 | 0.0546 | 1346 | 1642 | 13.17E-42 | 0.06869
Egr1 | 1.5434 | 0.1418 | 0.0919 | 2050 | 2479 | 18.06E-74 | 0.15301
Myc | 1.5282 | 0.2276 | 0.1489 | 2792 | 3455 | 4.653E-106 | 0.27807
NR3C1 | 1.5192 | 0.4331 | 0.2851 | 5689 | 7557 | 33.47E-296 | 0.37861
NFIC | 1.5046 | 3.8133 | 2.5343 | 15934 | 27285 | 0.00E00 | 5.88138
Arnt | 1.4864 | 0.2671 | 0.1797 | 2833 | 3590 | 81.67E-102 | 0.39454
AR | 1.4669 | 0.0695 | 0.0474 | 1120 | 1441 | 20.32E-28 | 0.05752
GRMotifT | 1.4365 | 2.9578 | 2.0591 | 15622 | 26608 | 0.00E00 | 3.40970
ELK1 | 1.4181 | 1.2288 | 0.8664 | 11409 | 17113 | 0.00E00 | 1.26805
INSM1 | 1.4073 | 0.2757 | 0.1959 | 3779 | 5123 | 1.624E-130 | 0.28875
NFKB1 | 1.4045 | 0.1084 | 0.0772 | 1327 | 1700 | 2.903E-34 | 0.16697
ARMotifH | 1.3905 | 10.9032 | 7.8413 | 16873 | 31337 | 0.00E00 | 24.75480
ARMotifTT | 1.3695 | 0.0117 | 0.0085 | 196 | 259 | 1.809E-04 | 0.01008
NR2F1 | 1.3586 | 0.1583 | 0.1165 | 2451 | 3411 | 5.823E-58 | 0.14022
PLAG1 | 1.3513 | 0.0268 | 0.0198 | 422 | 587 | 68.05E-08 | 0.02586
Hand1::Tcfe2a | 1.3485 | 1.6418 | 1.2175 | 12966 | 21192 | 0.00E00 | 1.79707
| 1.3314 | 0.3346 | 0.2513 | 4140 | 6110 | 1.987E-116 | 0.41023
GRMotifTT | 1.3152 | 0.0308 | 0.0234 | 512 | 727 | 17.36E-08 | 0.02638
MYC::MAX | 1.3121 | 0.0918 | 0.0700 | 1180 | 1713 | 69.19E-18 | 0.11164
Gata1 | 1.3106 | 0.8424 | 0.6428 | 8946 | 13953 | 0.00E00 | 0.89473
SP1 | 1.3084 | 1.9275 | 1.4732 | 10080 | 16407 | 0.00E00 | 10.36771
HIF1A::ARNT | 1.2926 | 0.5675 | 0.4391 | 6018 | 9250 | 1.887E-226 | 0.87676
FEV | 1.2784 | 0.8034 | 0.6284 | 9048 | 13981 | 0.00E00 | 0.79956
SPI1 | 1.2671 | 2.3955 | 1.8905 | 14703 | 25086 | 0.00E00 | 3.15823
RELA | 1.2583 | 0.1431 | 0.1137 | 2025 | 3045 | 3.382E-30 | 0.15823
Mafb | 1.2551 | 2.6602 | 2.1194 | 14699 | 25372 | 0.00E00 | 4.20146
Myb | 1.2529 | 0.9942 | 0.7935 | 10085 | 16470 | 0.00E00 | 1.04307
TAL1::TCF3 | 1.2396 | 0.3016 | 0.2433 | 3610 | 5511 | 3.524E-78 | 0.38166
MZF1 1-4 | 1.2354 | 5.5616 | 4.5017 | 16226 | 29285 | 0.00E00 | 18.70876
GR | 1.2300 | 0.3259 | 0.2649 | 4585 | 6886 | 1.314E-134 | 0.32241
NFE2L2 | 1.2287 | 0.1681 | 0.1368 | 2571 | 4002 | 9.749E-38 | 0.15247
GRMotifHH | 1.2129 | 0.2726 | 0.2247 | 3797 | 5902 | 3.165E-80 | 0.28347
Zfx | 1.2117 | 0.2454 | 0.2025 | 3084 | 5130 | 20.66E-38 | 0.30440
Pax6 | 0.8149 | 0.0093 | 0.0115 | 154 | 358 | 3.507E-02 | 0.01098
FOXA1pAR | 0.8000 | 0.0690 | 0.0862 | 1063 | 2387 | 21.13E-04 | 0.09976
Foxq1 | 0.7969 | 0.3416 | 0.4287 | 4454 | 9917 | 3.68E-10 | 0.52964
SOX9 | 0.7915 | 0.5422 | 0.6851 | 6652 | 14617 | 8.343E-74 | 0.77096
Cebpa | 0.7785 | 0.7670 | 0.9852 | 8172 | 18086 | 12.84E-166 | 1.22471
HOXA5 | 0.7539 | 5.6116 | 7.4431 | 16265 | 30932 | 0.00E00 | 20.28287
HNF1B | 0.7449 | 0.0667 | 0.0895 | 1046 | 2611 | 30.56E-10 | 0.08853
Nobox | 0.7258 | 1.3140 | 1.8105 | 10280 | 21892 | 0.00E00 | 3.37084
Sox5 | 0.7256 | 1.6776 | 2.3120 | 12328 | 25636 | 0.00E00 | 3.90446
PBX1 | 0.7187 | 0.0898 | 0.1249 | 1400 | 3397 | 17.85E-08 | 0.15698
Nkx2-5 | 0.6835 | 4.9366 | 7.2230 | 15582 | 30279 | 0.00E00 | 24.90744
SRY | 0.6822 | 2.6807 | 3.9295 | 14021 | 28604 | 0.00E00 | 8.58833
TBP | 0.6629 | 0.2267 | 0.3420 | 3203 | 8155 | 23.2E-06 | 0.50882
FOXI1 | 0.6576 | 0.5540 | 0.8425 | 6358 | 15181 | 5.813E-34 | 1.27112
ARID3A | 0.6394 | 2.7748 | 4.3397 | 13602 | 27838 | 0.00E00 | 12.12494
Pdx1 | 0.6236 | 1.9514 | 3.1293 | 12349 | 27180 | 0.00E00 | 6.50588
Pou5f1 | 0.6076 | 0.0075 | 0.0124 | 127 | 388 | 2.56E-06 | 0.01078
IRF1 | 0.6071 | 0.1780 | 0.2932 | 2573 | 7264 | 66.53E-24 | 0.32835
NKX3-1 | 0.5957 | 0.5123 | 0.8601 | 6029 | 15323 | 1.192E-12 | 1.27341
Prrx2 | 0.5890 | 1.7479 | 2.9675 | 11738 | 26758 | 0.00E00 | 5.79511
NFIL3 | 0.5688 | 0.1318 | 0.2318 | 1790 | 5286 | 1.349E-32 | 0.36557
MEF2A | 0.5195 | 0.2044 | 0.3935 | 2612 | 8482 | 15.71E-56 | 0.53922
FOXL1 | 0.5020 | 4.5175 | 8.9984 | 14612 | 29754 | 0.00E00 | 74.45057
Ddit3::Cebpa | 0.4946 | 0.2396 | 0.4845 | 3505 | 10350 | 2.872E-24 | 0.55378
Lhx3 | 0.4375 | 0.1736 | 0.3968 | 2192 | 7726 | 1.046E-80 | 0.61409
Foxd3 | 0.4316 | 1.2825 | 2.9713 | 8772 | 23170 | 2.649E-106 | 13.03165
EWSR1-FLI1 | 0.3054 | 0.0083 | 0.0272 | 56 | 166 | 36.45E-04 | 0.25523

12.4 FoxA1 binding sites (siFOXA1)

Chromosome-specific statistics are shown in Table 35. A histogram of sequence lengths is shown in Figure 31.

chromosome | frequency | min length (bp) | mean length (bp) | max length (bp) | total length (bp) | coverage
1 | 166 | 144 | 339 | 751 | 56234 | 0.000226
10 | 182 | 111 | 288 | 592 | 52434 | 0.000387
11 | 225 | 50 | 300 | 667 | 67390 | 0.000499
12 | 168 | 166 | 285 | 717 | 47850 | 0.000357
13 | 157 | 170 | 337 | 919 | 52979 | 0.00046
14 | 160 | 42 | 308 | 731 | 49255 | 0.000459
15 | 169 | 158 | 304 | 749 | 51456 | 0.000502
16 | 136 | 141 | 284 | 495 | 38634 | 0.000428
17 | 165 | 15 | 295 | 817 | 48614 | 0.000599
18 | 89 | 21 | 322 | 804 | 28691 | 0.000367
19 | 61 | 171 | 272 | 443 | 16595 | 0.000281
2 | 330 | 69 | 306 | 725 | 100905 | 0.000415
20 | 102 | 6 | 285 | 492 | 29037 | 0.000461
21 | 70 | 168 | 298 | 553 | 20849 | 0.000433
22 | 66 | 170 | 275 | 483 | 18150 | 0.000354
3 | 372 | 22 | 316 | 775 | 117447 | 0.000593
4 | 195 | 39 | 288 | 722 | 56246 | 0.000294
5 | 270 | 122 | 292 | 621 | 78899 | 0.000436
6 | 222 | 69 | 286 | 566 | 63467 | 0.000371
7 | 245 | 3 | 306 | 924 | 75045 | 0.000472
8 | 272 | 18 | 308 | 585 | 83695 | 0.000572
9 | 184 | 28 | 297 | 682 | 54620 | 0.000387
X | 50 | 97 | 270 | 440 | 13481 | 8.7e-05
all (23 chromosomes) | 4056 | 3 | 301 | 924 | 1221973 | 0.000166

Table 35: Chromosome-specific distribution of the regions. The last line gives the overall statistics.

[Histogram: frequency vs. length (base pairs)]

Figure 31: Sequence length distribution. The dashed line shows a normal approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqFXnoF-gCount component.

property | value
genes | 9542

(a) seqFXnoF-deNovo-meme1: width=14, sites=405, llr=3663, E=0
(b) seqFXnoF-deNovo-meme2: width=15, sites=137, llr=1358, E=3.9e-23
(c) seqFXnoF-deNovo-meme3: width=15, sites=70, llr=760, E=8.8

Figure 32: De novo motifs for the filtered FoxA1 binding site (siFOXA1) sequences.

Table 36: Motif enrichments

motif | ratio | fC | fR | n1C | n1R | p1 | var
Pax4 | 84.8678 | 0.0042 | 0.0000 | 7 | 0 | 12.33E-04 | 0.00797
Ar | 6.4510 | 0.0392 | 0.0060 | 150 | 46 | 37.14E-36 | 0.01897
TLX1::NFIC | 5.4363 | 0.0195 | 0.0035 | 60 | 21 | 18.25E-14 | 0.01328
Foxa2 | 4.6906 | 1.1465 | 0.2444 | 3193 | 1440 | 0.00E00 | 0.71499
FOXA1 | 4.6623 | 1.2020 | 0.2578 | 3197 | 1632 | 0.00E00 | 0.68339
FOXF2 | 4.6051 | 0.3804 | 0.0826 | 1349 | 588 | 5.561E-304 | 0.19596
CTCF | 4.4698 | 0.0084 | 0.0018 | 34 | 14 | 30.54E-08 | 0.00410
FOXD1 | 2.6993 | 1.4169 | 0.5249 | 3287 | 2898 | 0.00E00 | 1.00933
Foxq1 | 2.2687 | 0.5876 | 0.2590 | 1802 | 1591 | 37.13E-240 | 0.43559
GABPA | 2.1680 | 0.0812 | 0.0374 | 307 | 276 | 1.854E-22 | 0.05549
Tal1::Gata1 | 2.1604 | 0.0508 | 0.0235 | 199 | 175 | 56.61E-16 | 0.03378
ELK4 | 2.1121 | 0.0242 | 0.0114 | 95 | 85 | 29.54E-08 | 0.01646
FOXA1pAR | 2.0018 | 0.1041 | 0.0520 | 401 | 356 | 76.1E-32 | 0.07768
ESR2 | 1.9734 | 0.0146 | 0.0074 | 55 | 55 | 9.039E-04 | 0.01061
PPARG::RXRA | 1.9020 | 0.0170 | 0.0089 | 64 | 66 | 5.585E-04 | 0.01314
RXRA::VDR | 1.8656 | 0.0059 | 0.0032 | 24 | 24 | 3.696E-02 | 0.00410
STAT1 | 1.8042 | 0.0365 | 0.0202 | 117 | 121 | 1.726E-06 | 0.03651
Stat3 | 1.7984 | 0.1707 | 0.0949 | 519 | 586 | 1.047E-24 | 0.16736
ARMotifTH | 1.7149 | 0.0390 | 0.0227 | 156 | 170 | 20.25E-08 | 0.02858
NHLH1 | 1.5824 | 0.1132 | 0.0715 | 299 | 336 | 40.15E-14 | 0.15418
E2F1 | 1.5676 | 0.0138 | 0.0088 | 54 | 61 | 66.62E-04 | 0.01283
Gata1 | 1.5389 | 0.6201 | 0.4029 | 1761 | 2400 | 3.13E-116 | 0.55850
AR | 1.5184 | 0.0427 | 0.0281 | 164 | 210 | 69.11E-06 | 0.03428
NFIC | 1.4808 | 2.2324 | 1.5075 | 3480 | 5532 | 0.00E00 | 2.49911
ARMotifT | 1.4794 | 0.3925 | 0.2653 | 1269 | 1699 | 1.488E-62 | 0.34890
Tcfcp2l1 | 1.4664 | 0.0535 | 0.0365 | 200 | 258 | 10.13E-06 | 0.04729
MIZF | 1.4638 | 0.0227 | 0.0155 | 92 | 117 | 39.13E-04 | 0.01784
FOXI1 | 1.4481 | 0.7398 | 0.5108 | 1942 | 2599 | 1.136E-148 | 0.94765
FOXO3 | 1.4293 | 1.5432 | 1.0797 | 3109 | 4591 | 0.00E00 | 1.66150
GRMotifTH | 1.4019 | 0.2087 | 0.1488 | 752 | 1034 | 25.22E-24 | 0.17805
Evi1 | 1.4016 | 0.0363 | 0.0259 | 135 | 194 | 91.52E-04 | 0.03135
NFYA | 1.3745 | 0.0491 | 0.0357 | 184 | 245 | 96.28E-06 | 0.05021
FEV | 1.3581 | 0.4956 | 0.3649 | 1540 | 2281 | 4.36E-72 | 0.44587
RXR::RAR DR5 | 1.3113 | 0.0200 | 0.0152 | 81 | 115 | 4.709E-02 | 0.01676
Myb | 1.3100 | 0.6357 | 0.4852 | 1828 | 2846 | 46.85E-96 | 0.59697
NR3C1 | 1.2659 | 0.2148 | 0.1697 | 742 | 1158 | 4.399E-14 | 0.20124
TAL1::TCF3 | 1.2584 | 0.1845 | 0.1466 | 552 | 818 | 63.59E-12 | 0.23364
ELK1 | 1.2582 | 0.6626 | 0.5266 | 1889 | 2948 | 1.896E-102 | 0.65941
Esrrb | 1.2540 | 0.0787 | 0.0627 | 303 | 451 | 59.24E-06 | 0.07133
Hand1::Tcfe2a | 1.2470 | 0.9058 | 0.7263 | 2354 | 3826 | 1.288E-166 | 0.87814
ARMotifHH | 1.2439 | 0.2617 | 0.2104 | 888 | 1335 | 1.229E-22 | 0.26405
Arnt | 1.2396 | 0.1315 | 0.1061 | 371 | 546 | 1.352E-06 | 0.18959
GRMotifT | 1.2333 | 1.5442 | 1.2520 | 3116 | 5274 | 0.00E00 | 1.58016
Nr2e3 | 1.2315 | 0.3653 | 0.2966 | 898 | 1416 | 32.83E-20 | 0.53016
TEAD1 | 1.2232 | 0.0580 | 0.0474 | 227 | 348 | 32.23E-04 | 0.05205
NFE2L2 | 1.2166 | 0.1024 | 0.0841 | 391 | 614 | 37.19E-06 | 0.09154
NR2F1 | 1.2155 | 0.0831 | 0.0684 | 313 | 496 | 9.04E-04 | 0.07684
YY1 | 0.8294 | 2.3885 | 2.8798 | 3559 | 6936 | 0.00E00 | 4.35980
MZF1 5-13 | 0.8158 | 0.8091 | 0.9917 | 2099 | 4447 | 97.7E-64 | 1.24957
CREB1 | 0.8031 | 0.4857 | 0.6048 | 1481 | 3113 | 1.361E-18 | 0.72184
Foxd3 | 0.7679 | 1.3643 | 1.7766 | 2469 | 4529 | 3.975E-152 | 6.93989
Pdx1 | 0.7659 | 1.4924 | 1.9484 | 2779 | 5789 | 3.407E-176 | 3.25774
NF-kappaB | 0.7503 | 0.1221 | 0.1628 | 375 | 938 | 36.35E-04 | 0.24170
FOXL1 | 0.7224 | 4.0429 | 5.5968 | 3591 | 6756 | 0.00E00 | 38.32273
PLAG1 | 0.7133 | 0.0089 | 0.0125 | 28 | 87 | 2.801E-02 | 0.01658
Prrx2 | 0.7097 | 1.3187 | 1.8581 | 2639 | 5621 | 1.663E-140 | 2.99570
Zfx | 0.7057 | 0.0851 | 0.1206 | 295 | 759 | 9.306E-04 | 0.14369
MEF2A | 0.6890 | 0.1640 | 0.2381 | 526 | 1318 | 79.18E-04 | 0.32938
PBX1 | 0.6871 | 0.0570 | 0.0830 | 221 | 527 | 3.018E-02 | 0.15072
IRF1 | 0.6498 | 0.1194 | 0.1838 | 440 | 1164 | 1.489E-04 | 0.19910
Lhx3 | 0.6159 | 0.1522 | 0.2471 | 474 | 1245 | 2.872E-04 | 0.38028
Ddit3::Cebpa | 0.5431 | 0.1524 | 0.2807 | 556 | 1661 | 27.88E-10 | 0.29214

12.5 AR and FoxA1 binding site overlaps

Chromosome-specific statistics are shown in Table 37. A histogram of sequence lengths is shown in Figure 33.

chromosome frequency min_length mean_length max_length total_length coverage
1 466 4 393 886 183264 0.000735
10 186 65 367 758 68317 0.000504
11 217 1 370 861 80260 0.000594
12 183 1 343 853 62843 0.000469
13 138 107 401 1028 55345 0.000481
14 148 6 388 889 57447 0.000535
15 161 11 386 1020 62077 0.000605
16 111 51 331 599 36737 0.000407
17 147 47 367 690 53937 0.000664
18 99 73 378 1040 37423 0.000479
19 44 7 324 535 14245 0.000241
2 362 12 363 998 131227 0.00054
20 102 129 378 790 38516 0.000611
21 53 69 400 651 21187 0.00044
22 37 18 323 515 11948 0.000233
3 387 3 402 989 155436 0.000785
4 215 19 353 798 75848 0.000397
5 289 9 374 937 108226 0.000598
6 271 14 348 877 94343 0.000551
7 252 4 379 1105 95472 6e-04
8 269 11 378 896 101737 0.000695
9 179 7 364 814 65234 0.000462
X 84 75 314 598 26417 0.00017
Y 2 189 227 265 454 8e-06
all (24 chromosomes) 4402 1 372 1105 1637940 0.000223

Table 37: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

Figure 33: Sequence length distribution (histogram of frequency against length in base pairs). The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.
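The per-chromosome columns of Table 37 can be reproduced from a list of regions. The minimal sketch below makes three assumptions. First, regions use half-open (chromosome, start, end) coordinates. Second, coverage is the total region length divided by the chromosome length. This is inferred from the table: for chromosome 1, 183264 / 249,250,621 bp (the hg19 length) is about 0.000735. Third, the second value in the `all` row counts chromosomes. The function names and the toy regions are hypothetical, and the sketch does not try to reproduce the coverage value of the `all` row.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end); half-open coordinates assumed

# hg19 lengths for the chromosomes used below (assumed reference build).
HG19_SIZES = {"1": 249_250_621, "10": 135_534_747, "X": 155_270_560}


def chromosome_stats(regions: Iterable[Region], sizes: Dict[str, int]) -> Dict[str, tuple]:
    """Per chromosome: (frequency, min, mean, max, total, coverage) of region lengths."""
    lengths: Dict[str, List[int]] = defaultdict(list)
    for chrom, start, end in regions:
        lengths[chrom].append(end - start)
    stats = {}
    for chrom, ls in lengths.items():
        total = sum(ls)
        # The mean is rounded to an integer as in the tables (rounding rule assumed).
        stats[chrom] = (len(ls), min(ls), round(mean(ls)), max(ls), total, total / sizes[chrom])
    return stats


def overall_stats(regions: Iterable[Region]) -> tuple:
    """Summary row: (number of chromosomes, frequency, min, mean, max, total)."""
    items = list(regions)
    ls = [end - start for _, start, end in items]
    n_chrom = len({chrom for chrom, _, _ in items})
    return (n_chrom, len(ls), min(ls), round(mean(ls)), max(ls), sum(ls))


if __name__ == "__main__":
    toy = [("1", 100, 500), ("1", 1_000, 1_300), ("X", 50, 60)]  # hypothetical regions
    print(chromosome_stats(toy, HG19_SIZES)["1"][:5])  # (2, 300, 350, 400, 700)
    print(overall_stats(toy))  # (2, 3, 10, 237, 400, 710)
    # Coverage check against the chromosome 1 row of Table 37.
    print(round(183_264 / HG19_SIZES["1"], 6))  # 0.000735
```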

The following table shows the properties of the seqARFX-gCount component.

property value
genes 9609

(a) seqARFX-deNovo-meme1: width=11, sites=386, llr=3207, E=2.7e-136
(b) seqARFX-deNovo-meme2: width=15, sites=109, llr=1148, E=2.9e-13
(c) seqARFX-deNovo-meme3: width=15, sites=35, llr=459, E=0.0044

Figure 34: De novo motifs for the filtered sequences of the AR and FoxA1 binding site overlaps.

Table 38: Motif enrichments

motif ratio fC fR n1C n1R p1 var
Ar 13.0736 0.0719 0.0055 293 44 1.706E-92 0.03202
FOXA1 3.6164 1.1676 0.3228 3196 2152 0.00E00 0.74609
FOXA1pAR 3.5056 0.2067 0.0589 844 434 54.0E-148 0.12109
Foxa2 3.5030 1.0558 0.3014 3065 1865 0.00E00 0.75686
FOXF2 3.1511 0.3559 0.1129 1325 863 66.72E-206 0.21111
TLX1::NFIC 2.3742 0.0139 0.0058 48 39 86.9E-06 0.01203
FOXD1 2.3153 1.4677 0.6339 3415 3584 0.00E00 1.16494
Tal1::Gata1 2.2032 0.0551 0.0250 231 202 20.57E-18 0.03657
Foxq1 2.0101 0.6653 0.3309 2099 2111 98.46E-250 0.54603
ARMotifTH 1.9398 0.0624 0.0321 267 258 1.634E-16 0.04321
ARMotifTT 1.9095 0.0109 0.0057 47 46 17.62E-04 0.00778
ESR2 1.8486 0.0173 0.0093 69 73 5.626E-04 0.01370
ARMotifT 1.6831 0.5779 0.3433 1897 2281 1.191E-156 0.46831
FOXI1 1.5938 1.0528 0.6606 2679 3368 16.25E-304 1.21307
STAT1 1.5777 0.0421 0.0267 144 175 45.03E-06 0.04574
GABPA 1.5624 0.0735 0.0471 309 365 13.63E-12 0.05943
Stat3 1.5511 0.1862 0.1201 624 785 1.82E-22 0.19683
GRMotifTH 1.5322 0.2992 0.1952 1096 1412 13.66E-50 0.25287
Gata1 1.5067 0.7357 0.4882 2186 3072 4.737E-160 0.65537
Evi1 1.4466 0.0398 0.0275 168 222 1.992E-04 0.03286
AR 1.4387 0.0553 0.0384 232 307 6.216E-06 0.04567
NR3C1 1.4041 0.3074 0.2189 1093 1593 2.369E-34 0.27011
MIZF 1.3906 0.0241 0.0173 106 140 45.22E-04 0.01979
PPARG::RXRA 1.3838 0.0173 0.0125 72 99 4.231E-02 0.01539
FOXO3 1.3363 1.8226 1.3639 3548 5587 0.00E00 2.11891
GRMotifT 1.3225 2.0845 1.5762 3769 6333 0.00E00 2.11219
Esrrb 1.3104 0.0979 0.0747 405 586 12.56E-08 0.08542
TAL1::TCF3 1.2722 0.2297 0.1806 717 1092 8.904E-14 0.29060
HNF1B 1.2710 0.0888 0.0699 360 543 18.06E-06 0.08120
GR 1.2535 0.2507 0.2000 957 1426 71.74E-26 0.23626
NFIC 1.2490 2.3875 1.9115 3752 6490 0.00E00 3.08042
GRMotifHH 1.2371 0.2077 0.1678 781 1187 4.702E-16 0.20645
ARMotifH 1.2343 7.3001 5.9141 4362 8100 0.00E00 12.65366
TEAD1 1.2052 0.0751 0.0623 312 506 34.81E-04 0.06676
FEV 1.2007 0.5724 0.4767 1851 2987 15.47E-80 0.57310
MZF1 1-4 0.8197 2.7782 3.3894 3786 7210 0.00E00 8.30648
Prrx2 0.8129 1.8440 2.2684 3340 6523 31.21E-302 3.88805
MZF1 5-13 0.7486 0.9194 1.2281 2470 5368 75.87E-86 1.59551
CREB1 0.7340 0.5412 0.7373 1711 3878 70.14E-18 0.87456
NF-kappaB 0.7242 0.1421 0.1962 467 1257 18.23E-06 0.25689
Zfp423 0.6867 0.0353 0.0514 129 328 1.09E-02 0.06793
SP1 0.6584 0.7236 1.0990 1476 3550 19.16E-06 4.62701
Klf4 0.6530 0.1209 0.1852 457 1232 15.86E-06 0.21974
Pou5f1 0.6333 0.0080 0.0126 34 100 3.25E-02 0.01167
RREB1 0.6327 0.0225 0.0357 73 195 1.775E-02 0.10879
PLAG1 0.5426 0.0077 0.0143 33 114 27.85E-04 0.01267
Zfx 0.5307 0.0795 0.1498 308 1050 65.44E-16 0.15379
Ddit3::Cebpa 0.5131 0.1881 0.3666 738 2181 1.021E-08 0.39905
EWSR1-FLI1 0.0902 0.0025 0.0283 8 37 2.602E-02 0.32106
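Every motif listed in Table 38, and in the other motif enrichment tables of this section, has p1 < 0.05. Each also has a ratio of at least about 1.2 or at most about 1/1.2 (roughly 0.83), and the rows are sorted by decreasing ratio. The sketch below applies that selection rule to a list of rows. The cut-offs are inferred from the printed tables rather than read from the Anduril configuration. `MotifRow`, `report` and the two `toy` rows are hypothetical.

```python
from typing import List, NamedTuple


class MotifRow(NamedTuple):
    motif: str
    ratio: float
    p1: float


# Cut-offs inferred from the printed tables (assumption, not documented defaults).
P_MAX = 0.05
RATIO_MIN = 1.2


def is_reported(row: MotifRow, p_max: float = P_MAX, ratio_min: float = RATIO_MIN) -> bool:
    """True if the motif is significant and enriched or depleted by at least `ratio_min`."""
    enriched = row.ratio >= ratio_min
    depleted = row.ratio <= 1.0 / ratio_min
    return row.p1 < p_max and (enriched or depleted)


def report(rows: List[MotifRow]) -> List[MotifRow]:
    """Keep the rows that pass the filter, sorted by decreasing ratio as in the tables."""
    return sorted((r for r in rows if is_reported(r)), key=lambda r: r.ratio, reverse=True)


rows = [
    MotifRow("EWSR1-FLI1", 0.0902, 2.602e-02),  # from Table 38
    MotifRow("toyA", 1.05, 1e-10),              # hypothetical: ratio too close to 1
    MotifRow("Ar", 13.0736, 1.706e-92),         # from Table 38
    MotifRow("toyB", 1.50, 0.20),               # hypothetical: not significant
]
print([r.motif for r in report(rows)])  # ['Ar', 'EWSR1-FLI1']
```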

12.6 AR and FoxA1 binding site overlaps (up)

Chromosome-specific statistics are shown in Table 39. A histogram of sequence lengths is shown in Figure 35.

chromosome frequency min_length mean_length max_length total_length coverage
1 53 63 417 757 22122 8.9e-05
10 10 240 382 644 3815 2.8e-05
11 19 1 378 806 7180 5.3e-05
12 14 81 344 606 4817 3.6e-05
13 11 143 314 488 3453 3e-05
14 10 220 330 424 3305 3.1e-05
15 17 237 397 627 6755 6.6e-05
16 7 299 424 599 2967 3.3e-05
17 18 237 377 617 6780 8.4e-05
18 1 684 684 684 684 9e-06
19 6 311 431 535 2586 4.4e-05
2 22 105 396 687 8702 3.6e-05
20 11 129 366 586 4026 6.4e-05
21 11 303 425 605 4671 9.7e-05
3 18 39 384 568 6904 3.5e-05
4 29 127 397 798 11508 6e-05
5 21 201 361 694 7578 4.2e-05
6 31 182 370 641 11462 6.7e-05
7 16 252 390 703 6238 3.9e-05
8 13 63 383 585 4982 3.4e-05
9 1 457 457 457 457 3e-06
X 1 322 322 322 322 2e-06
all (22 chromosomes) 340 1 386 806 131314 1.8e-05

Table 39: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

Figure 35: Sequence length distribution (histogram of frequency against length in base pairs). The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.
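Each sequence length figure overlays a normal approximation computed from the mean and standard deviation of the lengths. A density curve is usually drawn on a count histogram by scaling it by the number of sequences and the bin width. The sketch below assumes that scaling and the sample (n - 1) standard deviation, because the report does not say which are used. `normal_overlay` and the example lengths are hypothetical.

```python
from statistics import NormalDist, mean, stdev
from typing import Callable, Sequence, Tuple


def normal_overlay(
    lengths: Sequence[float], bin_width: float
) -> Tuple[float, float, Callable[[float], float]]:
    """Return (mean, sd, f) where f(x) is the expected histogram count per bin at length x."""
    mu = mean(lengths)
    sigma = stdev(lengths)  # sample standard deviation (assumption)
    dist = NormalDist(mu, sigma)
    n = len(lengths)

    def expected_count(x: float) -> float:
        # density (per base pair) * number of sequences * bin width (base pairs per bin)
        return n * bin_width * dist.pdf(x)

    return mu, sigma, expected_count


lengths = [300, 350, 400, 450, 500]  # hypothetical sequence lengths
mu, sigma, curve = normal_overlay(lengths, bin_width=50)
# Expected counts summed over all bins should be close to the number of sequences.
print(mu, round(sigma, 2), round(sum(curve(x) for x in range(0, 801, 50)), 3))
```

Scaling by `n * bin_width` makes the area under the dashed curve match the total area of the histogram bars, so the two can share the frequency axis.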

The following table shows the properties of the seqARFXu-gCount component.

property value
genes 144

(a) seqARFXu-deNovo-meme1: width=15, sites=214, llr=1789, E=1.4e-12
(b) seqARFXu-deNovo-meme2: width=15, sites=68, llr=736, E=1.2e-05
(c) seqARFXu-deNovo-meme3: width=15, sites=23, llr=313, E=25

Figure 36: De novo motifs for the filtered sequences of the AR and FoxA1 binding site overlaps (up).

Table 40: Motif enrichments

motif ratio fC fR n1C n1R p1 var
Ar 14.8286 0.0944 0.0063 26 4 42.67E-10 0.05012
FOXA1 3.8032 1.1357 0.2986 244 156 71.83E-64 0.70755
Foxa2 3.5375 1.0619 0.3002 232 139 4.879E-60 0.79999
FOXF2 2.9297 0.3333 0.1137 94 68 11.39E-14 0.20782
FOXA1pAR 2.6218 0.1740 0.0664 53 32 1.16E-08 0.14470
GABPA 2.1425 0.0914 0.0427 30 26 23.42E-04 0.06029
FOXD1 2.0952 1.4366 0.6856 260 296 12.97E-46 1.16202
Pax5 1.9820 0.0501 0.0253 17 15 3.707E-02 0.03489
ARMotifT 1.7879 0.5988 0.3349 156 178 4.409E-16 0.44471
Stat3 1.6779 0.2094 0.1248 55 68 31.64E-04 0.19037
GRMotifTH 1.6354 0.3540 0.2164 96 121 3.796E-06 0.28532
NR3C1 1.6270 0.3599 0.2212 94 121 10.27E-06 0.30420
GRMotifHH 1.6267 0.2596 0.1596 70 92 10.55E-04 0.23095
Myf 1.6208 0.2330 0.1438 56 70 32.58E-04 0.28658
Foxq1 1.5863 0.6165 0.3886 160 185 1.607E-16 0.56026
Esrrb 1.5851 0.1327 0.0837 42 53 2.101E-02 0.09693
FOXO3 1.5204 1.9528 1.2844 279 430 1.075E-40 1.96571
FOXI1 1.4659 1.0236 0.6983 204 261 23.68E-24 1.26112
GRMotifT 1.4612 2.2714 1.5545 295 490 2.097E-44 2.15331
FEV 1.4480 0.6726 0.4645 157 220 10.15E-12 0.62376
ARMotifH 1.3796 8.0029 5.8009 338 622 1.983E-62 13.59463
Gata1 1.3613 0.6667 0.4897 154 242 47.19E-10 0.61424
Hand1::Tcfe2a 1.3064 1.2301 0.9415 222 374 17.94E-20 1.19183
ARMotifHH 1.3059 0.3363 0.2575 97 138 79.82E-06 0.31314
Nr2e3 1.3011 0.4543 0.3491 96 132 39.86E-06 0.61825
NFIC 1.2724 2.5428 1.9984 297 517 24.77E-44 3.00365
BRCA1 1.2659 2.0118 1.5893 286 485 11.32E-40 2.34457
Myb 1.2615 0.7434 0.5893 172 281 93.06E-12 0.71794
TAL1::TCF3 1.2448 0.2065 0.1659 55 79 2.566E-02 0.24253
GRMotifH 1.2392 5.6637 4.5703 328 609 3.791E-56 8.59395
SPI1 1.2382 1.8348 1.4818 270 469 1.421E-32 2.11771
ELK1 1.2209 0.8024 0.6572 179 300 28.45E-12 0.78169
SOX9 1.2178 0.6637 0.5450 161 250 3.253E-10 0.64855
RUNX1 1.2172 0.3038 0.2496 91 145 26.19E-04 0.25223
YY1 0.8287 3.1327 3.7804 317 591 4.976E-50 5.54285
MZF1 5-13 0.8206 1.0383 1.2654 202 418 2.626E-10 1.58321
SP1 0.8070 0.9027 1.1185 134 264 8.966E-04 3.62532
Pdx1 0.7713 1.9971 2.5893 256 511 33.69E-24 4.90178
Prrx2 0.7575 1.8525 2.4455 252 502 4.073E-22 4.18396
FOXL1 0.7352 5.4277 7.3823 319 584 7.243E-52 36.98649
CREB1 0.7322 0.5575 0.7615 136 303 1.224E-02 0.89371
Foxd3 0.7215 1.7109 2.3712 235 436 8.731E-20 8.20565
IRF1 0.6377 0.1652 0.2591 42 135 3.733E-02 0.31329
RORA 2 0.1353 0.0029 0.0221 1 14 4.514E-02 0.01521

12.7 AR and FoxA1 binding site overlaps (down)

Chromosome-specific statistics are shown in Table 41. A histogram of sequence lengths is shown in Figure 37.

chromosome frequency min_length mean_length max_length total_length coverage
1 3 333 400 436 1201 5e-06
2 3 264 334 428 1002 4e-06
3 9 281 435 589 3916 2e-05
4 4 270 342 399 1369 7e-06
5 2 269 324 380 649 4e-06
6 6 141 366 707 2199 1.3e-05
7 9 194 393 568 3538 2.2e-05
8 1 243 243 243 243 2e-06
9 1 231 231 231 231 2e-06
10 4 233 350 419 1400 1e-05
11 12 215 438 646 5262 3.9e-05
12 5 121 302 490 1512 1.1e-05
13 1 466 466 466 466 4e-06
14 8 397 550 702 4399 4.1e-05
15 7 156 342 548 2396 2.3e-05
16 2 232 289 346 578 6e-06
20 5 194 406 627 2029 3.2e-05
all (17 chromosomes) 82 121 395 707 32390 4e-06

Table 41: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

Figure 37: Sequence length distribution (histogram of frequency against length in base pairs). The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqARFXd-gCount component.

property value
genes 49

(a) seqARFXd-deNovo-meme1: width=11, sites=35, llr=349, E=0.064
(b) seqARFXd-deNovo-meme2: width=15, sites=6, llr=99, E=640000
(c) seqARFXd-deNovo-meme3: width=11, sites=13, llr=156, E=740000

Figure 38: De novo motifs for the filtered sequences of the AR and FoxA1 binding site overlaps (down).

Table 42: Motif enrichments

motif ratio fC fR n1C n1R p1 var
Ar 2196.1220 0.1098 0.0000 6 0 29.15E-04 0.06211
FOXA1pAR 3.5682 0.2073 0.0581 17 8 2.753E-04 0.10656
FOXA1 3.1272 1.1098 0.3548 58 43 2.246E-14 0.75449
Foxa2 3.0562 0.9268 0.3032 55 37 4.955E-14 0.61510
GABPA 2.9682 0.1341 0.0452 11 7 2.102E-02 0.07048
FOXF2 2.8901 0.3171 0.1097 22 15 3.259E-04 0.19999
Foxq1 2.3324 0.7073 0.3032 39 40 10.54E-06 0.52746
Stat3 2.2048 0.2561 0.1161 17 17 2.948E-02 0.18043
FOXI1 1.7803 0.9878 0.5548 47 58 2.608E-06 1.09036
FOXD1 1.6959 1.1707 0.6903 57 76 1.786E-08 0.97933
Gata1 1.6938 0.8415 0.4968 48 59 1.381E-06 0.61042
ARMotifT 1.6659 0.6341 0.3806 43 47 3.358E-06 0.45344
ELK1 1.5977 0.8659 0.5419 47 59 3.455E-06 0.74419
GRMotifTH 1.5613 0.2317 0.1484 19 22 4.34E-02 0.15490
GRMotifT 1.3269 2.1829 1.6452 76 120 1.284E-14 1.81038
Myb 1.3168 0.7561 0.5742 37 70 3.203E-02 0.71523
ARMotifH 1.2615 7.5366 5.9742 82 150 2.202E-16 12.15762
Mafb 1.2434 1.8049 1.4516 70 105 2.989E-12 2.00830
SPI1 1.2335 1.6951 1.3742 60 101 20.66E-08 2.19999
FOXL1 0.8169 5.6341 6.8968 79 145 1.401E-14 26.41894
Nkx3-2 0.7876 1.4634 1.8581 59 112 3.701E-06 2.83737
Prrx2 0.7868 2.0000 2.5419 66 120 96.43E-10 4.13656
RELA 0.1803 0.0244 0.1355 2 18 4.993E-02 0.11342

12.8 AR without FoxA1 binding site overlaps

Chromosome-specific statistics are shown in Table 43. A histogram of sequence lengths is shown in Figure 39.
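The section titles separate AR binding sites that overlap FoxA1 binding sites from those that do not ("AR without FoxA1"), and vice versa. The sketch below shows one common way to compute such a split: merge the second region set per chromosome and find candidate overlaps by binary search. It assumes half-open coordinates and counts any overlap of at least one base pair. The report does not state how the pipeline defines an overlap, or whether it keeps the query regions or merged regions. `merge_by_chrom`, `split_by_overlap` and the toy peaks are therefore hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end); half-open coordinates assumed


def merge_by_chrom(regions: List[Region]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Merge overlapping regions per chromosome; return sorted (starts, ends) lists."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts: List[int] = []
        ends: List[int] = []
        for start, end in intervals:
            if ends and start < ends[-1]:  # overlaps the previous merged interval
                ends[-1] = max(ends[-1], end)
            else:
                starts.append(start)
                ends.append(end)
        merged[chrom] = (starts, ends)
    return merged


def split_by_overlap(query: List[Region], other: List[Region]) -> Tuple[List[Region], List[Region]]:
    """Split `query` into regions overlapping at least one region of `other`, and the rest."""
    index = merge_by_chrom(other)
    with_overlap: List[Region] = []
    without_overlap: List[Region] = []
    for chrom, start, end in query:
        starts, ends = index.get(chrom, ([], []))
        # Merged intervals are disjoint and sorted, so only the last one starting
        # before `end` can reach past `start`.
        i = bisect_left(starts, end) - 1
        hit = i >= 0 and ends[i] > start
        (with_overlap if hit else without_overlap).append((chrom, start, end))
    return with_overlap, without_overlap


ar_peaks = [("1", 100, 200), ("1", 500, 600), ("2", 0, 50)]      # hypothetical
foxa1_peaks = [("1", 150, 300), ("1", 180, 190), ("2", 50, 80)]  # hypothetical
ar_and_fx, ar_without_fx = split_by_overlap(ar_peaks, foxa1_peaks)
print(ar_and_fx)      # [('1', 100, 200)]
print(ar_without_fx)  # [('1', 500, 600), ('2', 0, 50)]
```

Calling `split_by_overlap(foxa1_peaks, ar_peaks)` gives the corresponding "FoxA1 without AR" split. Merging first keeps each lookup to a single binary search instead of a scan over all peaks.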

chromosome frequency min_length mean_length max_length total_length coverage
1 169 37 376 819 63543 0.000255
10 80 15 356 586 28497 0.00021
11 95 209 381 640 36229 0.000268
12 67 172 354 875 23686 0.000177
13 33 197 346 620 11422 9.9e-05
14 62 199 376 649 23286 0.000217
15 55 13 360 577 19800 0.000193
16 48 216 389 684 18665 0.000207
17 85 45 367 568 31224 0.000385
18 27 248 403 514 10871 0.000139
19 33 153 357 726 11776 0.000199
2 116 167 364 681 42281 0.000174
20 47 241 377 630 17734 0.000281
21 29 115 398 750 11556 0.00024
22 30 178 332 521 9973 0.000194
3 117 42 402 796 47033 0.000238
4 87 217 361 543 31429 0.000164
5 115 34 372 727 42804 0.000237
6 141 50 351 682 49510 0.000289
7 91 8 374 699 34014 0.000214
8 77 113 371 662 28581 0.000195
9 59 199 359 648 21161 0.00015
X 53 49 328 802 17402 0.000112
Y 1 347 347 347 347 6e-06
all (24 chromosomes) 1717 8 369 875 632824 8.6e-05

Table 43: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

Figure 39: Sequence length distribution (histogram of frequency against length in base pairs). The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqARnFX-gCount component.

property value
genes 5727

(a) seqARnFX-deNovo-meme1: width=15, sites=258, llr=2608, E=2.7e-182
(b) seqARnFX-deNovo-meme2: width=14, sites=57, llr=710, E=4.2e-27
(c) seqARnFX-deNovo-meme3: width=15, sites=66, llr=755, E=2e-08

Figure 40: De novo motifs for the filtered sequences of the AR without FoxA1 binding site overlaps.

Table 44: Motif enrichments

motif ratio fC fR n1C n1R p1 var
Ar 23.2972 0.1754 0.0075 269 24 53.09E-100 0.07466
TLX1::NFIC 3.6184 0.0216 0.0059 32 17 12.59E-06 0.01409
CTCF 3.1554 0.0070 0.0022 12 7 1.834E-02 0.00384
FOXA1pAR 2.8221 0.1690 0.0599 260 163 4.216E-36 0.11676
ELK4 2.7290 0.0350 0.0128 54 39 2.63E-06 0.02375
GABPA 2.4100 0.0985 0.0408 152 127 16.36E-14 0.06941
Stat3 2.0734 0.2424 0.1169 311 303 21.76E-24 0.22018
NR3C1 2.0674 0.4073 0.1970 553 563 47.64E-50 0.29600
ESR2 2.0128 0.0239 0.0118 33 32 87.24E-04 0.02270
E2F1 2.0091 0.0245 0.0122 42 37 7.258E-04 0.01700
RXR::RAR DR5 1.9704 0.0332 0.0168 57 53 1.647E-04 0.02245
ARMotifT 1.9452 0.6573 0.3379 802 874 10.1E-84 0.52576
GRMotifTH 1.9216 0.3636 0.1892 508 536 2.378E-40 0.28951
AR 1.9037 0.0624 0.0327 106 104 49.13E-08 0.04202
ARMotifTH 1.8683 0.0682 0.0365 113 113 34.11E-08 0.05137
FOXA1 1.8545 0.5833 0.3145 769 841 53.59E-78 0.43825
Tcfcp2l1 1.7771 0.0793 0.0446 127 128 6.225E-08 0.06402
STAT1 1.6770 0.0460 0.0274 68 77 16.32E-04 0.04212
ARMotifHH 1.6598 0.4248 0.2559 548 681 38.2E-34 0.36727
Esrrb 1.6343 0.1136 0.0695 184 214 2.19E-08 0.08623
FOXF2 1.6021 0.1643 0.1026 262 308 1.841E-12 0.12618
GRMotifT 1.5119 2.3357 1.5449 1530 2460 43.69E-244 2.31917
Tal1::Gata1 1.5118 0.0519 0.0343 85 108 38.16E-04 0.04123
Foxa2 1.4829 0.4382 0.2955 606 725 82.63E-44 0.51779
GR 1.4656 0.2686 0.1833 401 512 1.858E-18 0.22863
GRMotifTT 1.4492 0.0262 0.0181 45 57 4.878E-02 0.02089
Myc 1.4405 0.1469 0.1019 200 262 1.361E-06 0.15781
Arnt 1.4246 0.1830 0.1284 219 282 9.573E-08 0.23664
Mycn 1.4127 0.1136 0.0804 153 200 61.58E-06 0.12865
Gata1 1.4073 0.6585 0.4679 790 1139 2.962E-50 0.60069
ARMotifH 1.4044 8.2541 5.8775 1707 3161 2.358E-306 13.49393
GRMotifHH 1.3737 0.2261 0.1646 331 462 1.32E-10 0.20711
Foxq1 1.3604 0.4376 0.3217 575 805 16.05E-28 0.43985
NFIC 1.3377 2.5227 1.8859 1488 2511 30.37E-214 3.17890
EBF1 1.3337 0.6457 0.4841 682 1030 1.33E-32 0.85699
Zfp423 1.3225 0.0705 0.0533 95 132 1.229E-02 0.10089
ELK1 1.3012 0.8514 0.6543 946 1473 92.91E-68 0.82026
TFAP2A 1.2923 1.7669 1.3672 1060 1771 7.917E-80 4.71802
NHLH1 1.2802 0.1241 0.0969 142 206 31.48E-04 0.17636
FEV 1.2558 0.5653 0.4501 734 1123 23.86E-38 0.52296
Myf 1.2513 0.2395 0.1914 312 468 14.53E-08 0.31640
Hand1::Tcfe2a 1.2510 1.1305 0.9037 1128 1849 5.377E-98 1.13133
FOXD1 1.2220 0.7634 0.6247 878 1421 42.11E-52 0.77424
NR2F1 1.2188 0.1125 0.0923 181 280 31.15E-04 0.10165
SPI1 1.2097 1.7162 1.4186 1339 2308 80.74E-152 1.97817
Nkx2-5 0.8286 4.4406 5.3591 1584 3005 34.17E-234 14.45917
ARID3A 0.8191 2.6340 3.2157 1395 2655 3.741E-152 7.59886
CREB1 0.8020 0.5880 0.7332 730 1505 35.22E-16 0.90450
NKX3-1 0.7637 0.4994 0.6540 601 1304 3.251E-06 1.03426
Pdx1 0.7629 1.7937 2.3513 1246 2545 2.459E-94 4.18031
Prrx2 0.7345 1.6084 2.1898 1194 2519 52.19E-78 3.58647
FOXL1 0.6587 4.4114 6.6970 1531 2928 2.554E-206 58.03803
Foxd3 0.6446 1.4347 2.2257 1017 2122 55.07E-46 8.65442
Lhx3 0.6212 0.1772 0.2852 231 610 1.777E-02 0.43738
NFIL3 0.5955 0.1125 0.1889 156 443 11.34E-04 0.43131
MEF2A 0.5255 0.1661 0.3161 229 713 8.89E-06 0.48310
Ddit3::Cebpa 0.5159 0.1783 0.3457 277 819 2.456E-04 0.37424

12.9 AR without FoxA1 binding site overlaps (up)

Chromosome-specific statistics are shown in Table 45. A histogram of sequence lengths is shown in Figure 41.

chromosome frequency min_length mean_length max_length total_length coverage
1 21 102 369 535 7743 3.1e-05
2 13 239 368 681 4783 2e-05
3 4 298 466 622 1863 9e-06
4 11 316 436 678 4797 2.5e-05
5 5 300 366 436 1832 1e-05
6 25 189 372 559 9292 5.4e-05
7 12 8 359 501 4313 2.7e-05
8 4 264 324 386 1298 9e-06
10 6 320 429 581 2575 1.9e-05
11 16 273 407 540 6506 4.8e-05
12 3 413 572 875 1716 1.3e-05
13 1 272 272 272 272 2e-06
14 4 350 451 544 1803 1.7e-05
15 5 76 377 550 1886 1.8e-05
16 7 244 340 530 2383 2.6e-05
17 6 247 444 551 2666 3.3e-05
18 1 321 321 321 321 4e-06
19 3 269 353 469 1060 1.8e-05
20 6 296 369 437 2214 3.5e-05
21 6 204 441 603 2646 5.5e-05
all (20 chromosomes) 159 8 390 875 61969 8e-06

Table 45: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

Figure 41: Sequence length distribution (histogram of frequency against length in base pairs). The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqARnFXu-gCount component.

property value
genes 97

(a) seqARnFXu-deNovo-meme1: width=15, sites=51, llr=554, E=2.8e-12
(b) seqARnFXu-deNovo-meme2: width=15, sites=40, llr=433, E=730
(c) seqARnFXu-deNovo-meme3: width=15, sites=38, llr=406, E=5500000

Figure 42: De novo motifs for the filtered sequences of the AR without FoxA1 binding site overlaps (up).

Table 46: Motif enrichments

motif ratio fC fR n1C n1R p1 var
Ar 3165.5570 0.1582 0.0000 24 0 20.41E-12 0.05621
ARMotifTT 507.3291 0.0253 0.0000 4 0 2.539E-02 0.00870
GRMotifTT 6.5817 0.0443 0.0067 7 2 1.593E-02 0.01935
MIZF 5.6425 0.0380 0.0067 6 2 3.925E-02 0.01724
Esrrb 3.6092 0.1329 0.0368 19 11 8.347E-04 0.07403
Stat3 3.0828 0.2785 0.0903 34 20 1.637E-06 0.22800
FOXA1pAR 2.9717 0.1392 0.0468 22 13 2.942E-04 0.07711
GABPA 2.3276 0.1013 0.0435 15 12 2.341E-02 0.06833
NFKB1 2.2965 0.1076 0.0468 14 12 4.296E-02 0.08530
GRMotifTH 2.0056 0.3354 0.1672 44 47 1.859E-04 0.23199
GRMotifHH 1.9916 0.2532 0.1271 36 35 4.841E-04 0.17256
MAX 1.9844 0.2722 0.1371 32 34 36.59E-04 0.23368
ARMotifT 1.9108 0.6519 0.3411 76 86 47.2E-10 0.48035
Mycn 1.8919 0.1582 0.0836 20 17 92.42E-04 0.15906
ARMotifHH 1.7796 0.5000 0.2809 61 65 64.05E-08 0.41417
NR3C1 1.7327 0.4810 0.2776 59 69 8.81E-06 0.38965
Myc 1.6963 0.1646 0.0970 22 23 2.585E-02 0.14996
GRMotifT 1.6688 2.6456 1.5853 146 235 1.704E-26 2.78277
GR 1.5826 0.2911 0.1839 39 48 41.45E-04 0.23394
NFIC 1.5776 2.9177 1.8495 145 234 6.40E-26 3.18008
FOXA1 1.5482 0.4557 0.2943 57 75 1.448E-04 0.36838
USF1 1.5284 0.5316 0.3478 45 62 49.23E-04 0.72952
ARMotifH 1.4679 9.0380 6.1572 158 299 8.279E-30 13.30105
TFAP2A 1.4663 1.8734 1.2776 107 165 3.76E-12 4.25905
EBF1 1.4412 0.7278 0.5050 66 102 2.236E-04 0.89292
ELK1 1.3375 0.8544 0.6388 88 136 9.851E-08 0.81458
Hand1::Tcfe2a 1.3006 1.3354 1.0268 113 190 1.646E-12 1.23872
REL 1.2823 0.6519 0.5084 64 114 40.15E-04 0.73841
ELF5 1.2674 1.8354 1.4482 126 219 3.451E-16 1.87977
SPI1 1.2630 1.9304 1.5284 131 227 3.849E-18 2.02071
Myb 1.2583 0.8165 0.6488 84 138 2.453E-06 0.76910
ETS1 1.2385 6.6772 5.3913 155 295 6.859E-28 10.77783
SPIB 1.2382 3.1266 2.5251 151 262 13.93E-28 3.70051
NR4A2 1.2212 1.7848 1.4615 122 220 3.585E-14 1.89429
FEV 1.2131 0.6329 0.5217 74 123 93.79E-06 0.54516
SP1 1.2083 1.5316 1.2676 77 144 4.062E-04 5.55953
CREB1 0.8269 0.6582 0.7960 79 152 4.033E-04 0.80715
HOXA5 0.8180 5.0063 6.1204 149 289 3.537E-24 13.52404
FOXI1 0.8170 0.5000 0.6120 61 121 3.078E-02 0.72762
Nkx2-5 0.7865 4.5823 5.8261 147 282 22.84E-24 15.42393
ARID3A 0.7345 2.6899 3.6622 130 258 24.22E-16 8.77285
Pdx1 0.7264 1.9241 2.6488 122 237 58.34E-14 4.97701
Prrx2 0.6734 1.7025 2.5284 112 243 1.401E-08 4.51763
Fos 0.6586 1.0000 1.5184 91 228 20.93E-04 1.68954
FOXL1 0.5960 4.4051 7.3913 139 273 38.58E-20 67.37532
Foxd3 0.5686 1.2911 2.2709 88 198 6.529E-04 7.73881
Ddit3::Cebpa 0.3689 0.1456 0.3946 18 91 22.58E-04 0.37170

12.10 AR without FoxA1 binding site overlaps (down)

Chromosome-specific statistics are shown in Table 47. A histogram of sequence lengths is shown in Figure 43.

chromosome frequency min_length mean_length max_length total_length coverage
1 3 301 429 504 1288 5e-06
2 5 235 325 382 1624 7e-06
3 4 340 528 804 2112 1.1e-05
4 2 306 324 342 648 3e-06
5 2 312 383 454 766 4e-06
6 7 220 363 509 2538 1.5e-05
7 6 325 403 562 2416 1.5e-05
9 1 303 303 303 303 2e-06
10 2 288 296 305 593 4e-06
11 7 231 318 377 2225 1.6e-05
12 1 410 410 410 410 3e-06
13 1 220 220 220 220 2e-06
14 1 271 271 271 271 3e-06
16 1 398 398 398 398 4e-06
17 2 288 345 402 690 8e-06
18 1 298 298 298 298 4e-06
19 1 362 362 362 362 6e-06
all (17 chromosomes) 47 220 365 804 17162 2e-06

Table 47: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

Figure 43: Sequence length distribution (histogram of frequency against length in base pairs). The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqARnFXd-gCount component.

property value
genes 35

(a) seqARnFXd-deNovo-meme1: width=15, sites=18, llr=208, E=640
(b) seqARnFXd-deNovo-meme2: width=15, sites=22, llr=227, E=310000
(c) seqARnFXd-deNovo-meme3: width=11, sites=3, llr=45, E=2300000

Figure 44: De novo motifs for the filtered sequences of the AR without FoxA1 binding site overlaps (down).

Table 48: Motif enrichments

motif ratio fC fR n1C n1R p1 var
Ar 2979.7234 0.1489 0.0000 7 0 7.025E-04 0.04817
RELA 3.9115 0.1702 0.0435 7 3 2.618E-02 0.10843
Foxa2 3.2962 0.6809 0.2065 23 18 49.73E-06 0.37890
Foxq1 2.9355 0.4468 0.1522 15 14 1.242E-02 0.29121
FOXA1 2.8468 0.6809 0.2391 25 19 9.409E-06 0.39871
FOXF2 2.6905 0.2340 0.0870 11 8 1.966E-02 0.11886
GRMotifTH 2.3485 0.3830 0.1630 14 14 2.641E-02 0.25482
NR3C1 1.8790 0.5106 0.2717 19 22 93.61E-04 0.36034
ARMotifT 1.7738 0.6170 0.3478 23 26 12.61E-04 0.43645
FOXD1 1.7271 0.9574 0.5543 28 36 2.607E-04 0.83839
FOXI1 1.7225 0.9362 0.5435 24 34 50.58E-04 1.01762
FOXO3 1.6312 1.8085 1.1087 34 53 38.77E-06 2.53206
GRMotifT 1.5100 2.2979 1.5217 39 69 2.903E-06 2.09801
Spz1 1.4789 0.7234 0.4891 28 32 77.74E-06 0.60943
NFATC2 1.4304 1.6170 1.1304 37 59 4.588E-06 1.57179
NR4A2 1.3912 1.8298 1.3152 37 71 44.26E-06 1.68648
FEV 1.3422 0.5106 0.3804 21 28 1.122E-02 0.39099
SPI1 1.3231 1.5532 1.1739 39 59 35.72E-08 1.38630
ARMotifH 1.3098 7.6596 5.8478 47 89 7.198E-10 10.74299
Sox17 1.3050 1.4894 1.1413 36 57 9.777E-06 1.49765
ELK1 1.3049 0.7660 0.5870 24 40 1.765E-02 0.65019
Sox5 1.3004 2.0213 1.5543 40 63 22.84E-08 2.45282
TFAP2A 0.8248 1.5957 1.9348 27 53 1.853E-02 10.00365
Nkx3-2 0.8146 1.4255 1.7500 36 64 39.66E-06 2.46387
MZF1 1-4 0.8106 2.8723 3.5435 41 84 2.833E-06 7.43530
Nobox 0.7970 1.2128 1.5217 28 57 1.628E-02 2.56376
Fos 0.7932 1.3191 1.6630 35 69 2.648E-04 2.27859
YY1 0.7770 2.7447 3.5326 44 88 7.912E-08 4.26921
Foxd3 0.7733 1.3617 1.7609 31 52 6.385E-04 6.62715
Pdx1 0.7612 1.7872 2.3478 36 72 1.521E-04 4.01825
Prrx2 0.7164 1.5106 2.1087 35 70 3.089E-04 3.14336

12.11 FoxA1 without AR binding site overlaps

Chromosome-specific statistics are shown in Table 49. A histogram of sequence lengths is shown in Figure 45.

chromosome frequency min_length mean_length max_length total_length coverage
1 1345 7 381 1075 512497 0.002056
10 620 15 364 879 225860 0.001666
11 641 6 371 906 237649 0.00176
12 565 9 358 826 202541 0.001513
13 427 5 402 1426 171833 0.001492
14 484 38 391 1283 189147 0.001762
15 431 34 368 796 158648 0.001547
16 342 2 367 1574 125483 0.001389
17 485 1 352 806 170942 0.002105
18 282 2 361 829 101908 0.001305
19 149 14 318 702 47332 8e-04
2 1049 27 364 1237 381826 0.00157
20 333 0 372 1050 123739 0.001963
21 151 173 372 1203 56202 0.001168
22 140 176 324 723 45379 0.000885
3 1187 39 404 1326 479001 0.002419
4 690 2 359 833 247609 0.001295
5 742 9 373 1115 276496 0.001528
6 748 12 349 807 260979 0.001525
7 723 83 365 940 263598 0.001656
8 687 27 380 1588 261125 0.001784
9 594 3 373 1201 221659 0.00157
X 251 83 320 726 80257 0.000517
Y 14 214 338 461 4736 8e-05
all (24 chromosomes) 13080 0 371 1588 4846446 0.000659

Table 49: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

Figure 45: Sequence length distribution (histogram of frequency against length in base pairs). The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqFXnAR-gCount component.

property value
genes 20628

(a) seqFXnAR-deNovo-meme1: width=14, sites=410, llr=3655, E=1e-258
(b) seqFXnAR-deNovo-meme2: width=15, sites=127, llr=1336, E=1.1e-29
(c) seqFXnAR-deNovo-meme3: width=15, sites=22, llr=305, E=110000

Figure 46: De novo motifs for the filtered sequences of the FoxA1 without AR binding site overlaps.

Table 50: Motif enrichments

motif ratio fC fR n1C n1R p1 var
CTCF 7.8820 0.0213 0.0027 275 65 13.84E-72 0.00921
FOXA1 4.0641 1.2843 0.3160 10305 6349 0.00E00 0.80249
Foxa2 4.0416 1.2333 0.3051 10281 5541 0.00E00 0.88585
FOXF2 3.8633 0.4052 0.1048 4509 2382 0.00E00 0.22269
TLX1::NFIC 3.4496 0.0184 0.0053 183 102 16.44E-26 0.01436
REST 3.3940 0.0005 0.0001 7 3 4.496E-02 0.00027
FOXD1 2.6097 1.6685 0.6393 11100 10806 0.00E00 1.30245
Foxq1 2.1274 0.7052 0.3315 6471 6214 0.00E00 0.59402
E2F1 1.9163 0.0184 0.0096 231 230 2.603E-12 0.01337
ELK4 1.8577 0.0326 0.0175 382 415 6.855E-16 0.02577
Tal1::Gata1 1.7708 0.0444 0.0250 559 599 2.583E-24 0.03316
GABPA 1.6506 0.0814 0.0493 954 1143 25.15E-32 0.06784
Stat3 1.5926 0.1852 0.1162 1876 2314 2.955E-68 0.18677
Ar 1.5851 0.0107 0.0067 130 161 3.952E-04 0.00880
FOXI1 1.5543 1.0131 0.6518 7515 9840 0.00E00 1.28643
FOXO3 1.4891 1.9751 1.3263 10962 16472 0.00E00 2.28324
STAT1 1.3953 0.0389 0.0279 403 557 45.31E-08 0.04352
NHLH1 1.3813 0.1301 0.0941 1108 1456 56.93E-28 0.19610
FOXA1pAR 1.3329 0.0846 0.0635 1009 1376 1.434E-20 0.08499
ARMotifTT 1.3112 0.0077 0.0059 100 143 3.923E-02 0.00659
Tcfcp2l1 1.3109 0.0594 0.0453 705 1025 7.591E-10 0.05673
MIZF 1.2983 0.0237 0.0183 298 433 3.005E-04 0.02135
HNF1B 1.2812 0.0871 0.0680 1064 1572 52.0E-16 0.07905
Gata1 1.2640 0.6268 0.4959 5694 9070 3.217E-260 0.63256
Myb 1.2626 0.7604 0.6022 6633 10676 0.00E00 0.75127
ARMotifTH 1.2624 0.0415 0.0328 527 786 2.957E-06 0.03669
NFIC 1.2585 2.3655 1.8797 11330 19219 0.00E00 3.12473
RXR::RAR DR5 1.2548 0.0243 0.0193 308 468 18.62E-04 0.02134
TAL1::TCF3 1.2499 0.2291 0.1833 2163 3273 10.83E-40 0.28900
NKX3-1 1.2371 0.8241 0.6661 6533 10106 0.00E00 1.18881
AR 1.2215 0.0439 0.0359 554 862 41.37E-06 0.03930
Nr2e3 1.2128 0.4518 0.3725 3491 5483 94.29E-88 0.68091
NFYA 1.2118 0.0530 0.0437 628 970 4.121E-06 0.05894
MEF2A 0.8314 0.2501 0.3009 2459 5310 1.475E-02 0.42051
YY1 0.8291 2.9323 3.5367 11888 23067 0.00E00 5.65683
MZF1 5-13 0.8065 0.9842 1.2204 7537 15975 4.189E-290 1.62915
Prrx2 0.8057 1.8349 2.2775 9530 19263 0.00E00 4.09136
Lhx3 0.8028 0.2495 0.3108 2298 4876 1.40E-02 0.55639
CREB1 0.7917 0.5712 0.7214 5336 11441 1.651E-76 0.85456
PLAG1 0.7763 0.0109 0.0140 133 325 1.525E-02 0.01531
NF-kappaB 0.7139 0.1298 0.1819 1267 3463 18.08E-16 0.23777
Zfx 0.6427 0.0936 0.1456 1006 3008 1.801E-24 0.16120
Ddit3::Cebpa 0.5411 0.1973 0.3647 2297 6478 10.33E-16 0.40342
EWSR1-FLI1 0.2198 0.0042 0.0193 18 89 1.489E-04 0.15697

12.12 FoxA1 without AR binding site overlaps (up)

Chromosome-specific statistics are shown in Table 51. A histogram of sequence lengths is shown in Figure 47.

chromosome frequency min_length mean_length max_length total_length coverage
1 77 132 386 1075 29736 0.000119
10 29 35 376 665 10907 8e-05
11 16 217 344 509 5512 4.1e-05
12 23 190 381 771 8773 6.6e-05
13 18 165 319 479 5744 5e-05
14 14 251 384 793 5382 5e-05
15 11 202 294 424 3231 3.2e-05
16 9 189 317 433 2849 3.2e-05
17 29 141 352 590 10198 0.000126
18 5 187 330 468 1648 2.1e-05
19 11 204 307 458 3381 5.7e-05
2 38 155 334 632 12679 5.2e-05
20 22 190 390 625 8581 0.000136
21 7 263 372 596 2606 5.4e-05
22 2 247 256 265 512 1e-05
3 30 220 393 687 11804 6e-05
4 42 151 395 739 16606 8.7e-05
5 27 92 369 710 9955 5.5e-05
6 34 213 364 562 12359 7.2e-05
7 28 234 377 627 10568 6.6e-05
8 18 205 337 598 6071 4.1e-05
9 1 228 228 228 228 2e-06
X 10 209 342 543 3418 2.2e-05
all (23 chromosomes) 501 35 365 1075 182748 2.5e-05

Table 51: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

Figure 47: Sequence length distribution (histogram of frequency against length in base pairs). The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqFXnARu-gCount component.

property value
genes 172

(a) seqFXnARu-deNovo-meme1: width=11, sites=382, llr=3062, E=2e-91
(b) seqFXnARu-deNovo-meme2: width=15, sites=85, llr=936, E=1.8e-12
(c) seqFXnARu-deNovo-meme3: width=15, sites=38, llr=471, E=63000

Figure 48: De novo motifs for the filtered sequences of the FoxA1 without AR binding site overlaps (up).

Table 52: Motif enrichments

motif ratio fC fR n1C n1R p1 var
TLX1::NFIC 440.1218 0.0220 0.0000 8 0 4.40E-04 0.01174
CTCF 5.5808 0.0240 0.0043 12 4 16.5E-04 0.01098
E2F1 4.9468 0.0160 0.0032 8 3 1.907E-02 0.00758
Foxa2 4.0174 1.1657 0.2901 387 208 1.47E-116 0.77285
FOXF2 3.9683 0.3713 0.0935 160 81 41.1E-34 0.19982
FOXA1 3.8878 1.1776 0.3029 385 236 31.47E-108 0.68151
Ar 2.8032 0.0180 0.0064 9 5 3.922E-02 0.01169
FOXD1 2.6770 1.6387 0.6121 423 400 33.43E-98 1.27939
Foxq1 2.4136 0.6747 0.2795 243 217 36.21E-36 0.48058
GABPA 2.1518 0.0938 0.0436 40 40 24.73E-04 0.07261
FOXA1pAR 2.0648 0.0878 0.0425 39 38 21.01E-04 0.06600
Tcfcp2l1 1.8291 0.0758 0.0414 36 35 33.55E-04 0.05891
FOXI1 1.5786 0.9361 0.5930 273 365 7.504E-26 1.07118
Arnt 1.5271 0.1477 0.0967 55 65 31.75E-04 0.16663
TAL1::TCF3 1.4730 0.2395 0.1626 82 113 22.15E-04 0.29377
FOXO3 1.4212 1.8124 1.2752 409 626 55.98E-60 2.03637
Myf 1.4028 0.2415 0.1722 96 131 3.528E-04 0.26056
Nr2e3 1.3757 0.4591 0.3337 138 196 2.553E-06 0.60706
HNF1B 1.3579 0.0938 0.0691 46 60 3.02E-02 0.08001
Myb 1.3396 0.7844 0.5855 258 400 9.366E-18 0.76337
Stat3 1.3334 0.1417 0.1063 58 79 1.837E-02 0.16983
NFIC 1.3260 2.4491 1.8470 435 744 38.83E-64 2.95659
INSM1 1.3095 0.1517 0.1158 67 92 92.52E-04 0.16465
GRMotifTH 1.2971 0.2495 0.1923 113 153 28.0E-06 0.22836
NHLH1 1.2830 0.1377 0.1073 45 59 3.49E-02 0.21649
FEV 1.2612 0.5549 0.4400 198 327 1.714E-08 0.54678
TFAP2A 1.2611 1.4970 1.1870 293 482 7.525E-22 3.78885
BRCA1 1.2334 1.9242 1.5600 420 714 39.33E-58 2.14179
ELK1 1.2256 0.7984 0.6514 254 421 50.99E-16 0.87257
RUNX1 1.2009 0.2655 0.2210 117 187 15.24E-04 0.23481
MZF1 5-13 0.8276 1.0220 1.2349 290 615 87.61E-14 1.55633
YY1 0.8214 2.8423 3.4601 454 879 3.522E-64 4.91124
Pdx1 0.8047 1.9960 2.4803 377 744 4.255E-34 4.69922
CREB1 0.7944 0.5808 0.7311 220 445 2.787E-06 0.85659
FOXL1 0.7524 5.3353 7.0914 447 871 1.464E-60 45.25607
Prrx2 0.7476 1.7725 2.3709 361 754 36.59E-28 4.30306
NF-kappaB 0.6409 0.1158 0.1807 44 144 1.286E-02 0.19983

12.13 FoxA1 without AR binding site overlaps (down)

Chromosome-specific statistics are shown in Table 53. A histogram of sequence lengths is shown in Figure 49.

chromosome frequency min_length mean_length max_length total_length coverage
1 23 50 382 901 8792 3.5e-05
10 16 205 389 573 6222 4.6e-05
11 19 195 346 628 6573 4.9e-05
12 11 179 333 476 3662 2.7e-05
13 16 75 407 651 6506 5.6e-05
14 25 206 461 997 11516 0.000107
15 8 34 421 781 3368 3.3e-05
16 2 270 365 460 730 8e-06
17 8 103 304 588 2436 3e-05
18 12 214 380 582 4566 5.8e-05
19 5 238 332 408 1662 2.8e-05
2 13 27 366 693 4754 2e-05
20 11 215 352 593 3867 6.1e-05
3 25 39 423 962 10572 5.3e-05
4 15 204 369 642 5531 2.9e-05
5 9 189 335 479 3011 1.7e-05
6 24 195 381 690 9133 5.3e-05
7 21 83 384 673 8064 5.1e-05
8 1 365 365 365 365 2e-06
9 12 209 368 711 4420 3.1e-05
X 1 229 229 229 229 1e-06
all (21 chromosomes) 277 27 383 997 105979 1.4e-05

Table 53: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

[Histogram of sequence lengths; x-axis: length (base pairs); y-axis: Frequency]

Figure 49: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.
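Two relationships can be checked directly against Table 53, although the report states neither formula. The mean column equals total/frequency (8792/23 ≈ 382 on chromosome 1). The coverage column matches total divided by chromosome length, rounded to six decimal places, when hg19 chromosome lengths are used (8792/249,250,621 ≈ 3.5e-05). The sketch below is therefore an inference, not Anduril's implementation. It takes hypothetical (chromosome, start, end) tuples with half-open coordinates, plus a caller-supplied dictionary of chromosome lengths. It omits the "all" row, because that row's coverage denominator could not be recovered from the table.

```python
from collections import defaultdict


def chromosome_stats(regions, chrom_lengths):
    """Summarise region lengths per chromosome.

    regions: iterable of (chromosome, start, end) with half-open coordinates
    (an assumption; the report does not state its convention).
    chrom_lengths: {chromosome: length in bp}, e.g. hg19 lengths (assumed).
    Returns {chromosome: (frequency, min, mean, max, total, coverage)}.
    """
    lengths = defaultdict(list)
    for chrom, start, end in regions:
        lengths[chrom].append(end - start)

    stats = {}
    for chrom, values in lengths.items():
        total = sum(values)
        stats[chrom] = (
            len(values),                             # frequency
            min(values),
            round(total / len(values)),              # mean, nearest integer
            max(values),
            total,
            round(total / chrom_lengths[chrom], 6),  # inferred coverage formula
        )
    return stats


# hg19 lengths for the two chromosomes used in this example.
HG19 = {"1": 249250621, "X": 155270560}
print(chromosome_stats([("X", 1000, 1229)], HG19)["X"])  # (1, 229, 229, 229, 229, 1e-06)
```

A single 229 bp region on chromosome X reproduces the X row of Table 53.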

The following table shows the properties of the seqFXnARd-gCount component.

property  value
genes     93

(a) seqFXnARd-deNovo-meme1: width=11, sites=239, llr=1809, E=2.3e-29
(b) seqFXnARd-deNovo-meme2: width=15, sites=33, llr=403, E=100
(c) seqFXnARd-deNovo-meme3: width=11, sites=2, llr=32, E=2.3e+08

Figure 50: De novo motifs for the filtered FoxA1 without AR binding site overlaps (down) sequences.

Table 54: Motif enrichments

motif                ratio       fC       fR    n1C    n1R          p1        var
CTCF              289.8087   0.0144   0.0000      4      0   2.722E-02    0.00503
Foxa2               4.0203   1.2960   0.3223    216    125   66.66E-64    0.86092
FOXA1               3.6557   1.2708   0.3476    216    145   12.33E-58    0.77748
FOXF2               2.4268   0.3394   0.1398     82     67   1.126E-10    0.21897
FOXD1               2.2866   1.6606   0.7262    227    248   50.14E-46    1.38277
GABPA               2.0904   0.0975   0.0466     23     23    2.78E-02    0.07297
Foxq1               1.9844   0.6859   0.3456    138    137   2.779E-18    0.53731
Stat3               1.9289   0.1986   0.1029     40     42   24.57E-04    0.18619
NR2F1               1.6476   0.1408   0.0854     36     40    92.2E-04    0.11163
FOXI1               1.5819   1.0505   0.6641    168    214   12.68E-20    1.18973
Nr2e3               1.4917   0.4838   0.3243     85    110   15.86E-06    0.55449
FOXO3               1.4885   1.9134   1.2854    236    335   13.41E-40    2.03284
TAL1::TCF3          1.3309   0.2274   0.1709     45     58   1.421E-02    0.30115
Gata1               1.3086   0.6606   0.5049    126    190   3.395E-08    0.67915
Myf                 1.2742   0.2202   0.1728     51     70   1.472E-02    0.25738
NFIC                1.2522   2.3610   1.8854    243    398   10.12E-38    3.02766
MZF1_5-13           0.8027   1.0505   1.3087    156    343   2.673E-06    2.07991
YY1                 0.7428   2.8989   3.9029    245    493   10.19E-32   14.98847
Zfx                 0.4165   0.0542   0.1301     14     57   2.195E-02    0.13339
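The motif tables do not define their columns. Two readings fit the values in this part of the report, but both are inferences. First, ratio appears to be fC/fR: for FOXF2 in Table 54, 0.3394/0.1398 ≈ 2.428 against the listed 2.4268, and the difference is consistent with rounding. Second, every listed motif has a ratio of at least 1.2 or at most 1/1.2 ≈ 0.83, together with p1 below 0.05, which suggests the tables were filtered on those criteria.

The sketch applies that presumed filter. The `MotifRow` type, the function name and the thresholds are hypothetical. It assumes fR > 0; rows such as CTCF, whose fR rounds to 0.0000 in the table, would need the unrounded value.

```python
from typing import NamedTuple


class MotifRow(NamedTuple):
    motif: str
    fC: float  # motif frequency in the candidate regions (assumed meaning)
    fR: float  # motif frequency in the reference regions (assumed meaning)
    p1: float


def filter_motifs(rows, min_fold=1.2, alpha=0.05):
    """Keep motifs with fC/fR >= min_fold or <= 1/min_fold and p1 < alpha.

    Returns (motif, ratio) pairs sorted by decreasing ratio.
    Assumes fR > 0 for every row.
    """
    kept = []
    for row in rows:
        ratio = row.fC / row.fR
        if (ratio >= min_fold or ratio <= 1.0 / min_fold) and row.p1 < alpha:
            kept.append((row.motif, ratio))
    return sorted(kept, key=lambda item: item[1], reverse=True)


rows = [
    MotifRow("FOXF2", 0.3394, 0.1398, 1.126e-10),   # Table 54
    MotifRow("Zfx", 0.0542, 0.1301, 2.195e-02),     # Table 54
    MotifRow("hypothetical_flat", 1.0, 1.0, 0.01),  # ratio 1.0, dropped
]
result = filter_motifs(rows)
```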

12.14 AR and FoxA1 siFOXA1 binding site overlaps

Chromosome-specific statistics are shown in Table 55. A histogram of sequence lengths is shown in Figure 51.

                                length (bp)
chromosome     frequency   min  mean   max    total  coverage
1                     71   135   339   751    24077   9.7e-05
10                    77     8   289   503    22248  0.000164
11                   104     7   301   649    31283  0.000232
12                    70   105   295   712    20683  0.000155
13                    61   152   341   919    20821  0.000181
14                    62    42   318   731    19723  0.000184
15                    70    14   322   749    22571   0.00022
16                    60   164   297   494    17836  0.000197
17                    79     3   311   817    24598  0.000303
18                    29   160   331   712     9587  0.000123
19                    25   182   306   443     7647  0.000129
2                    153    20   307   725    46998  0.000193
20                    56     6   290   492    16213  0.000257
21                    26   168   303   553     7877  0.000164
22                    25   134   282   360     7039  0.000137
3                    141     2   311   722    43851  0.000221
4                     87    39   287   722    24999  0.000131
5                    130    74   291   621    37773  0.000209
6                    100    13   286   564    28614  0.000167
7                    107    47   314   917    33634  0.000211
8                    117    56   318   562    37182  0.000254
9                     71    31   302   682    21420  0.000152
X                     27   192   278   366     7504   4.8e-05
all (23 chr.)       1748     2   306   919   534178   7.3e-05

Table 55: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

[Histogram of sequence lengths; x-axis: length (base pairs); y-axis: Frequency]

Figure 51: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.
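The dashed curve in these figures is described only as a normal approximation based on the mean and standard deviation of the lengths. One way to place such a curve on the Frequency axis is to convert it into expected counts per histogram bin, n × [Φ(upper) − Φ(lower)], where Φ is the normal CDF with the sample mean and standard deviation. This is an assumption, not Anduril's plotting code. The function name, the 100 bp bins and the example lengths are illustrative.

```python
import math
import statistics


def normal_expected_counts(lengths, bin_edges):
    """Expected count per histogram bin under a normal approximation.

    The normal distribution uses the sample mean and sample standard
    deviation of `lengths`; bin i spans (bin_edges[i], bin_edges[i + 1]].
    """
    n = len(lengths)
    mu = statistics.mean(lengths)
    sigma = statistics.stdev(lengths)  # requires at least two lengths

    def cdf(x):
        # Normal CDF expressed through the error function.
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

    return [n * (cdf(hi) - cdf(lo)) for lo, hi in zip(bin_edges, bin_edges[1:])]


edges = list(range(0, 1001, 100))  # 100 bp bins on a 0-1000 bp axis
counts = normal_expected_counts([200, 300, 400, 500, 600], edges)
```

Computing per-bin probabilities from the CDF, rather than evaluating the density at bin centres, keeps the expected counts summing to n when the bins cover the whole distribution.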

The following table shows the properties of the seqAXSi-gCount component.

property  value
genes     4733

(a) seqAXSi-deNovo-meme1: width=14, sites=361, llr=3048, E=9.2e-150
(b) seqAXSi-deNovo-meme2: width=15, sites=157, llr=1576, E=3.3e-52
(c) seqAXSi-deNovo-meme3: width=15, sites=90, llr=968, E=3.6e-13

Figure 52: De novo motifs for the filtered AR and FoxA1 siFOXA1 binding site overlaps sequences.
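Each panel caption of the de novo motif figures reports the statistics MEME gives for a motif: width, number of sites, log-likelihood ratio (llr) and E-value. When many such figures are compared, it can help to tabulate the captions and flag motifs with a small E-value. The helper below is hypothetical and not part of Anduril. Its regular expression assumes the caption layout used in this report, and the 0.05 cut-off is an arbitrary illustrative choice.

```python
import re

# Assumed caption layout, e.g.
# "(a) seqAXSi-deNovo-meme1: width=14, sites=361, llr=3048, E=9.2e-150"
CAPTION = re.compile(
    r"(?P<name>\S+): width=(?P<width>\d+), sites=(?P<sites>\d+), "
    r"llr=(?P<llr>\d+), E=(?P<evalue>[0-9.eE+-]+)"
)


def significant_motifs(captions, max_evalue=0.05):
    """Return (name, width, sites, llr, E-value) for motifs with E <= max_evalue."""
    kept = []
    for text in captions:
        match = CAPTION.search(text)
        if match is None:
            continue  # caption not in the expected layout
        evalue = float(match["evalue"])
        if evalue <= max_evalue:
            kept.append((match["name"], int(match["width"]),
                         int(match["sites"]), int(match["llr"]), evalue))
    return kept


captions = [  # Figure 56 captions
    "(a) seqAXSid-deNovo-meme1: width=15, sites=45, llr=413, E=2.4e-06",
    "(b) seqAXSid-deNovo-meme2: width=15, sites=8, llr=122, E=24000",
    "(c) seqAXSid-deNovo-meme3: width=15, sites=11, llr=149, E=2e+05",
]
selected = significant_motifs(captions)
```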

Table 56: Motif enrichments

motif                ratio       fC       fR    n1C    n1R          p1        var
Ar                 20.6014   0.0763   0.0037    125     12   91.39E-46    0.03161
RREB1              14.0141   0.3081   0.0219     49     54   58.19E-04   11.16943
FOXA1               3.6709   0.9587   0.2611   1206    730  27.99E-310    0.53069
Foxa2               3.5528   0.8629   0.2428   1160    639   32.7E-308    0.50704
FOXF2               3.5299   0.3024   0.0856    466    262   10.65E-82    0.17000
ESR2                3.0658   0.0207   0.0067     33     18   10.75E-06    0.01420
TLX1::NFIC          3.0658   0.0207   0.0067     28     18   3.024E-04    0.01619
Tal1::Gata1         2.9858   0.0620   0.0207    102     68   1.059E-12    0.03619
FOXA1pAR            2.7664   0.1274   0.0460    213    138   4.239E-28    0.07789
PPARG::RXRA         2.4795   0.0212   0.0085     33     27   12.66E-04    0.01556
GABPA               2.2913   0.0866   0.0378    147    117   2.731E-14    0.05652
MIZF                2.2454   0.0281   0.0125     49     41   86.46E-06    0.01759
STAT1               2.1847   0.0453   0.0207     60     57   1.351E-04    0.04074
FOXD1               2.1403   1.1543   0.5393   1259   1276  95.06E-226    0.86575
ARMotifTH           2.0624   0.0522   0.0253     91     78   6.421E-08    0.03662
Stat3               2.0028   0.1910   0.0954    244    252   13.39E-16    0.18092
Foxq1               1.8274   0.4917   0.2690    672    709   4.448E-64    0.40476
GRMotifTH           1.8195   0.2645   0.1453    401    429   7.974E-28    0.20002
ARMotifT            1.8113   0.4796   0.2648    654    720   45.97E-58    0.39779
NHLH1               1.7456   0.1245   0.0713    138    150   20.37E-08    0.15536
Gata1               1.6842   0.6954   0.4129    822   1061   38.53E-68    0.60144
AR                  1.6748   0.0465   0.0277     77     89   9.114E-04    0.03545
Evi1                1.5938   0.0413   0.0259     64     81   1.285E-02    0.03545
Tcfcp2l1            1.5899   0.0562   0.0353     90    107   5.283E-04    0.05034
SP1                 1.5631   1.3517   0.8647    605   1237   1.349E-08   32.54803
NFIC                1.5461   2.3964   1.5500   1510   2368  6.619E-234    2.73508
NR3C1               1.5387   0.2719   0.1767    392    518   1.914E-16    0.23147
GR                  1.5070   0.2186   0.1450    346    421   66.95E-18    0.18090
ARMotifHH           1.4959   0.3150   0.2105    450    589   42.25E-22    0.27628
GRMotifT            1.4455   1.8009   1.2459   1431   2225  6.771E-202    1.77333
ELK4                1.4445   0.0247   0.0171     43     53   3.824E-02    0.02091
Esrrb               1.4391   0.0987   0.0686    159    217   1.426E-04    0.08154
Arnt                1.4284   0.1371   0.0960    175    215   56.84E-08    0.17017
EBF1                1.3631   0.5462   0.4007    616    924   1.196E-26    0.69233
ELK1                1.3560   0.7028   0.5183    846   1241    1.42E-56    0.66804
ARMotifH            1.3476   6.4544   4.7895   1725   3171  14.77E-310    9.87327
FEV                 1.3464   0.5227   0.3882    694   1046   4.633E-34    0.47032
NFE2L2              1.3158   0.1079   0.0820    178    258   3.455E-04    0.09145
Hand1::Tcfe2a       1.3111   0.9656   0.7364   1038   1645   17.83E-82    0.91694
FOXO3               1.3102   1.4240   1.0868   1275   2022  2.594E-142    1.55958
Myb                 1.3027   0.6363   0.4884    785   1207   44.39E-44    0.62069
Myf                 1.2565   0.1922   0.1530    266    382   39.41E-08    0.24076
NR2F1               1.2316   0.0901   0.0731    144    227   1.979E-02    0.08313
SPI1                1.2258   1.4223   1.1603   1284   2151  29.87E-136    1.56470
ELF5                1.2253   1.4464   1.1804   1283   2131  1.806E-136    1.60758
TAL1::TCF3          1.2172   0.1899   0.1560    237    374    3.11E-04    0.25151
GRMotifH            1.2144   4.4418   3.6575   1676   3091  1.994E-280    6.43572
CREB1               0.8201   0.4963   0.6051    628   1324   2.359E-08    0.76223
Nkx2-5              0.8099   3.6517   4.5091   1577   2981  5.649E-224   11.80747
ARID3A              0.7828   2.1899   2.7974   1369   2618  1.227E-138    5.93800
PLAG1               0.7785   0.0109   0.0140     11     44   3.438E-02    0.02272
Pdx1                0.7081   1.3861   1.9576   1157   2474   15.32E-66    3.28665
Prrx2               0.6619   1.2444   1.8800   1095   2417   2.802E-50    3.02078
Foxd3               0.6619   1.1761   1.7770    983   1953   2.918E-44    6.29790
FOXL1               0.6407   3.5829   5.5926   1499   2911  7.271E-184   32.53586
NFIL3               0.6044   0.0987   0.1633    145    402   35.63E-04    0.21621
MEF2A               0.5872   0.1497   0.2550    207    613   1.696E-04    0.33202
IRF1                0.5624   0.1119   0.1990    175    544   19.64E-06    0.20519
Ddit3::Cebpa        0.5280   0.1549   0.2934    244    747   31.43E-06    0.30264
Lhx3                0.4832   0.1234   0.2553    174    551   7.548E-06    0.36432

12.15 AR and FoxA1 siFOXA1 binding site overlaps (up)

Chromosome-specific statistics are shown in Table 57. A histogram of sequence lengths is shown in Figure 53.

                                length (bp)
chromosome     frequency   min  mean   max    total  coverage
1                      5   213   383   751     1914     8e-06
2                     11    69   319   590     3511   1.4e-05
3                     10   188   365   605     3654   1.8e-05
4                      9   200   294   394     2646   1.4e-05
5                      9    99   281   365     2529   1.4e-05
6                     13   149   357   564     4636   2.7e-05
7                     12   213   314   461     3767   2.4e-05
8                     11   200   316   395     3480   2.4e-05
9                      5    54   331   682     1656   1.2e-05
10                    11   210   319   426     3509   2.6e-05
11                     7   235   282   389     1971   1.5e-05
12                     8   221   297   398     2373   1.8e-05
13                     2   213   286   359      572     5e-06
14                     6   192   297   400     1780   1.7e-05
15                     3   231   367   613     1102   1.1e-05
16                     3   219   344   439     1033   1.1e-05
17                     6   229   394   681     2361   2.9e-05
19                     3   362   385   419     1155     2e-05
20                     6   211   342   432     2049   3.3e-05
21                     2   281   340   398      679   1.4e-05
all (20 chr.)        142    54   327   751    46377     6e-06

Table 57: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

[Histogram of sequence lengths; x-axis: length (base pairs); y-axis: Frequency]

Figure 53: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqAXSiu-gCount component.

property  value
genes     93

(a) seqAXSiu-deNovo-meme1: width=15, sites=112, llr=910, E=1.2e-16
(b) seqAXSiu-deNovo-meme2: width=15, sites=39, llr=442, E=1e-09
(c) seqAXSiu-deNovo-meme3: width=15, sites=16, llr=215, E=2700

Figure 54: De novo motifs for the filtered AR and FoxA1 siFOXA1 binding site overlaps (up) sequences.

Table 58: Motif enrichments

motif                ratio       fC       fR    n1C    n1R          p1        var
Ar               2677.0563   0.1338   0.0000     17      0    2.60E-08    0.05358
ARMotifTH           8.5683   0.0634   0.0074      9      2   22.28E-04    0.02593
FOXA1pAR            3.0081   0.1549   0.0515     22     14   4.086E-04    0.07959
Esrrb               2.9579   0.1197   0.0404     15     11   1.262E-02    0.07290
Foxa2               2.7293   0.8028   0.2941     82     61   25.24E-18    0.79683
FOXA1               2.7064   0.9155   0.3382     93     73   1.764E-20    0.56890
GRMotifTH           2.5362   0.3451   0.1360     38     35   73.63E-06    0.24246
ARMotifT            2.3713   0.5493   0.2316     61     56   51.24E-10    0.36556
FOXF2               2.3618   0.2606   0.1103     31     25   1.518E-04    0.19408
NR3C1               2.1781   0.4085   0.1875     40     46   7.993E-04    0.32519
GRMotifHH           2.1544   0.2535   0.1176     29     29   27.85E-04    0.21509
NFE2L2              2.0422   0.1127   0.0551     16     15    3.79E-02    0.06944
FOXD1               1.8552   1.0845   0.5846     89    114   7.889E-12    0.92097
Arnt                1.8384   0.1690   0.0919     18     17   2.513E-02    0.17724
Myc                 1.7732   0.1761   0.0993     20     21   3.232E-02    0.17305
ARMotifHH           1.7238   0.3803   0.2206     44     46   63.65E-06    0.33561
NFIC                1.6472   2.4648   1.4963    118    184   1.358E-18    2.92693
USF1                1.6301   0.5634   0.3456     44     58   17.81E-04    0.75271
HIF1A::ARNT         1.6232   0.3521   0.2169     37     48   69.38E-04    0.35424
GRMotifT            1.5623   2.0563   1.3162    120    195   1.032E-18    1.90670
Foxq1               1.5563   0.5493   0.3529     62     77   2.523E-06    0.46215
GR                  1.5235   0.2465   0.1618     33     37   34.05E-04    0.20321
ARMotifH            1.5129   7.6479   5.0551    142    262   10.92E-28   11.30683
MAX                 1.4843   0.2183   0.1471     26     33   4.255E-02    0.20538
TFAP2A              1.4832   1.5704   1.0588     94    133   7.763E-12    3.06604
Myb                 1.4296   0.7254   0.5074     72    103   1.062E-06    0.65062
Gata1               1.4254   0.6761   0.4743     66    100   34.26E-06    0.66033
SPI1                1.4210   1.7606   1.2390    111    188   1.657E-14    1.74506
Sox17               1.3754   1.1479   0.8346     91    145   18.57E-10    1.21213
Mafb                1.3682   1.5845   1.1581     99    176   4.427E-10    1.84904
EBF1                1.3375   0.5704   0.4265     56     78   1.696E-04    0.67133
ELK1                1.3374   0.8310   0.6213     74    112   1.719E-06    0.84271
GRMotifH            1.3322   5.0986   3.8272    138    261   79.14E-26    7.26465
FEV                 1.3074   0.6056   0.4632     55     99   1.059E-02    0.65724
NR4A2               1.2629   1.4718   1.1654    105    178   1.714E-12    1.71356
SPIB                1.2254   2.7887   2.2757    128    227   89.22E-22    3.45892
FOXO3               1.2029   1.4859   1.2353    102    173   14.24E-12    2.15078
HOXA5               0.8088   4.0915   5.0588    132    256    39.7E-22   10.88174
MZF1_5-13           0.7769   0.8169   1.0515     72    155   33.07E-04    1.34540
Nkx2-5              0.7746   3.7676   4.8640    127    248   1.357E-18   13.40784
Gfi                 0.7696   0.6338   0.8235     65    142   1.927E-02    0.91972
YY1                 0.7617   2.4085   3.1618    121    248   32.11E-16    5.38047
Prrx2               0.6590   1.3592   2.0625     97    204   16.96E-08    3.17863
ARID3A              0.6548   2.1690   3.3125    109    233   1.978E-10    7.64738
Pdx1                0.6406   1.4437   2.2537     95    218   4.065E-06    3.51515
Foxd3               0.5347   1.0282   1.9228     79    183   17.22E-04    5.70444
FOXL1               0.4991   3.4296   6.8713    120    249   1.244E-14   86.05430
TBP                 0.3733   0.1056   0.2831     15     65   2.754E-02    0.24590
MEF2A               0.3675   0.0986   0.2684     11     50    3.26E-02    0.34072

12.16 AR and FoxA1 siFOXA1 binding site overlaps (down)

Chromosome-specific statistics are shown in Table 59. A histogram of sequence lengths is shown in Figure 55.

                                length (bp)
chromosome     frequency   min  mean   max    total  coverage
1                      6   270   375   506     2249     9e-06
2                      3   299   374   444     1122     5e-06
3                      7   199   327   429     2292   1.2e-05
4                      5   142   292   513     1460     8e-06
6                      5    69   207   298     1037     6e-06
7                      2   105   226   348      453     3e-06
8                      6   186   302   372     1810   1.2e-05
9                      1   278   278   278      278     2e-06
10                     1   358   358   358      358     3e-06
11                     2    96   196   297      393     3e-06
12                     6   171   249   417     1496   1.1e-05
14                     3   195   247   294      742     7e-06
15                     6    14   282   510     1695   1.7e-05
16                     5   202   311   378     1555   1.7e-05
17                     5     3   254   817     1268   1.6e-05
18                     1   160   160   160      160     2e-06
22                     2   231   293   355      586   1.1e-05
all (17 chr.)         66     3   287   817    18954     3e-06

Table 59: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

[Histogram of sequence lengths; x-axis: length (base pairs); y-axis: Frequency]

Figure 55: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqAXSid-gCount component.

property  value
genes     46

(a) seqAXSid-deNovo-meme1: width=15, sites=45, llr=413, E=2.4e-06
(b) seqAXSid-deNovo-meme2: width=15, sites=8, llr=122, E=24000
(c) seqAXSid-deNovo-meme3: width=15, sites=11, llr=149, E=2e+05

Figure 56: De novo motifs for the filtered AR and FoxA1 siFOXA1 binding site overlaps (down) sequences.

Table 60: Motif enrichments

motif                ratio       fC       fR    n1C    n1R          p1        var
NR2F1            1539.4615   0.0769   0.0000      4      0   2.381E-02    0.03653
FOXA1               4.4067   1.0308   0.2339     42     24   6.294E-12    0.68744
FOXA1pAR            3.8110   0.1231   0.0323      8      3   1.321E-02    0.07042
Foxa2               3.1791   0.8462   0.2661     41     22   6.308E-12    0.81397
GRMotifTH           2.2439   0.3077   0.1371     19     16    31.5E-04    0.17956
Foxq1               2.1965   0.5846   0.2661     25     29   39.68E-04    0.42728
FOXD1               2.0489   0.8923   0.4355     42     40   2.525E-08    0.65760
RUNX1               2.0078   0.3077   0.1532     17     17   1.975E-02    0.24975
Gata1               1.7452   0.6615   0.3790     30     37   9.954E-04    0.56991
NFIC                1.6569   2.3385   1.4113     58     76   2.005E-12    3.48531
ARMotifT            1.5787   0.3692   0.2339     20     24   2.561E-02    0.29860
GR                  1.5260   0.3077   0.2016     18     20   2.771E-02    0.26748
Cebpa               1.4968   0.7846   0.5242     33     49   19.78E-04    0.72768
ELK1                1.4769   0.7385   0.5000     34     47   5.937E-04    0.61691
GRMotifT            1.4716   1.6615   1.1290     50     72   4.156E-08    1.99246
TFAP2A              1.4085   1.2154   0.8629     37     53   2.165E-04    1.86677
NR4A2               1.3457   1.4000   1.0403     46     69   1.857E-06    1.78678
Hand1::Tcfe2a       1.3007   0.9231   0.7097     33     60   1.418E-02    1.02184
FOXO3               1.2718   1.2615   0.9919     43     74   93.28E-06    1.31194
BRCA1               1.2547   1.5077   1.2016     47     79   5.324E-06    1.63937
ARMotifH            1.2031   5.5692   4.6290     61    114   77.73E-12   12.05623
ARID3A              0.8184   2.1385   2.6129     52     92   26.26E-08    6.32326
Pdx1                0.8112   1.4000   1.7258     41     87   32.23E-04    3.00428
NFATC2              0.8081   0.9385   1.1613     40     76   14.77E-04    1.33322
Sox5                0.8026   1.0615   1.3226     37     79   1.612E-02    1.83913
Nkx2-5              0.8026   3.1846   3.9677     52    108   3.478E-06   10.81814
Prrx2               0.6653   1.0462   1.5726     36     83   4.295E-02    2.48418
FOXL1               0.5433   3.4308   6.3145     55    105   6.775E-08  126.23038
IRF1                0.2011   0.0308   0.1532      2     19   4.368E-02    0.09929

12.17 AR without FoxA1 siFOXA1 binding site overlaps

Chromosome-specific statistics are shown in Table 61. A histogram of sequence lengths is shown in Figure 57.

                                length (bp)
chromosome     frequency   min  mean   max    total  coverage
1                   1889     0   503  1678   949877  0.003811
10                   727    13   478  1855   347201  0.002562
11                   800     4   480  1635   384386  0.002847
12                   561     3   455  1417   255527  0.001909
13                   408    60   544  1935   221881  0.001927
14                   463   180   499  1395   230826   0.00215
15                   564     8   478  1165   269834  0.002632
16                   501     1   477  1768   238960  0.002645
17                   790    10   486  2184   384332  0.004733
18                   318     4   453  1168   143999  0.001844
19                   290     2   428  1380   124165    0.0021
2                   1039    10   469  1801   487054  0.002003
20                   410    15   473  1424   194038  0.003079
21                   184     7   472  1255    86940  0.001806
22                   256     7   438   981   112125  0.002185
3                   1088     3   514  1420   558726  0.002822
4                    541     9   442  1278   238903   0.00125
5                    864     7   471  1482   406841  0.002249
6                    655     4   444  1321   291116  0.001701
7                    755    97   473  1249   357194  0.002245
8                    691     1   468  1307   323193  0.002208
9                    676    11   477  1118   322294  0.002282
X                    360    68   406  1115   146035  0.000941
Y                     18   244   649  1774    11687  0.000197
all (24 chr.)      14848     0   477  2184  7087134  0.000964

Table 61: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

[Histogram of sequence lengths; x-axis: length (base pairs); y-axis: Frequency]

Figure 57: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqAXnSi-gCount component.

property  value
genes     20158

(a) seqAXnSi-deNovo-meme1: width=15, sites=282, llr=2959, E=4.7e-190
(b) seqAXnSi-deNovo-meme2: width=15, sites=432, llr=3657, E=2.9e-100
(c) seqAXnSi-deNovo-meme3: width=11, sites=88, llr=967, E=0.027

Figure 58: De novo motifs for the filtered AR without FoxA1 siFOXA1 binding site overlaps sequences.
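The section titles describe set relations between peak sets. "AR and FoxA1 siFOXA1" regions are shared by both factors, while "AR without FoxA1 siFOXA1" regions are AR sites with no FoxA1 counterpart. The report does not spell out the overlap rule, for example whether a single shared base is enough. As a hypothetical illustration, the sketch partitions one peak set according to whether each peak shares at least one base with any peak in another set. The function name and example coordinates are made up.

```python
def split_by_overlap(peaks, others):
    """Partition peaks by whether they overlap any interval in `others`.

    Intervals are (chromosome, start, end) with half-open coordinates, and
    sharing a single base counts as an overlap (both are assumptions).
    Returns (overlapping, non_overlapping), each in input order.
    """
    by_chrom = {}
    for chrom, start, end in others:
        by_chrom.setdefault(chrom, []).append((start, end))

    overlapping, non_overlapping = [], []
    for chrom, start, end in peaks:
        hit = any(s < end and start < e for s, e in by_chrom.get(chrom, []))
        (overlapping if hit else non_overlapping).append((chrom, start, end))
    return overlapping, non_overlapping


ar_peaks = [("1", 100, 200), ("1", 500, 600), ("2", 0, 50)]  # hypothetical
foxa1_peaks = [("1", 150, 250), ("2", 50, 80)]               # hypothetical
with_foxa1, without_foxa1 = split_by_overlap(ar_peaks, foxa1_peaks)
```

The linear scan per peak keeps the example short. For genome-wide sets such as the 14,848 regions in Table 61, a sorted sweep or an interval tree would scale better.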

Table 62: Motif enrichments

motif                ratio       fC       fR    n1C    n1R          p1        var
Ar                 14.8188   0.1390   0.0093   1845    244     0.00E00    0.06443
ESR1                5.7667   0.0013   0.0002     18      5   34.12E-06    0.00061
TLX1::NFIC          5.3227   0.0324   0.0060    367    142   2.475E-70    0.02169
REST                3.9754   0.0010   0.0002     10      6   4.002E-02    0.00101
CTCF                3.0255   0.0130   0.0043    191    115   4.939E-24    0.00765
ESR2                2.9428   0.0409   0.0139    519    345   1.639E-56    0.02948
GABPA               2.5535   0.1565   0.0612   2048   1594  54.04E-200    0.10567
ELK4                2.4548   0.0558   0.0227    742    601   11.78E-60    0.03956
E2F1                2.1322   0.0286   0.0134    401    367   65.21E-26    0.01971
Stat3               2.0857   0.3230   0.1548   3454   3334  47.47E-274    0.30248
Tcfcp2l1            1.9721   0.1250   0.0633   1617   1483  5.439E-112    0.23419
Tal1::Gata1         1.9564   0.0693   0.0354    988    944   3.456E-58    0.04898
ARMotifTH           1.8921   0.0842   0.0445   1171   1187   23.74E-62    0.06151
PPARG               1.8756   0.0040   0.0021     51     50   14.02E-04    0.00350
PPARG::RXRA         1.8473   0.0340   0.0184    494    499   2.246E-24    0.02420
NHLH1               1.8420   0.2258   0.1226   2048   2118  9.416E-116    0.28645
MIZF                1.8315   0.0438   0.0239    617    641    1.68E-28    0.03256
GRMotifTH           1.7668   0.4322   0.2446   5044   5871     0.00E00    0.35040
STAT1               1.7636   0.0633   0.0359    737    828   13.93E-28    0.06172
Myf                 1.7417   0.4095   0.2351   4121   4873  7.698E-238    0.48085
Pax5                1.7247   0.0854   0.0495   1187   1318   56.92E-50    0.06499
EBF1                1.7197   1.1098   0.6453   8230  11078     0.00E00    1.47330
ARMotifHH           1.7139   0.5798   0.3383   6104   7285     0.00E00    0.53123
TFAP2A              1.7038   2.9994   1.7604  11598  17084     0.00E00    8.82470
Mycn                1.6561   0.1940   0.1171   2060   2345   12.04E-92    0.23647
RXR::RAR_DR5        1.6520   0.0438   0.0265    630    714   1.052E-22    0.03438
Esrrb               1.6474   0.1606   0.0975   2169   2504   1.205E-94    0.12622
Zfp423              1.6305   0.1033   0.0633   1174   1367   6.284E-42    0.11527
ARMotifT            1.6281   0.7077   0.4347   7282   9225     0.00E00    0.62240
Egr1                1.6044   0.1434   0.0894   1818   2120   2.082E-72    0.14740
Arnt                1.5862   0.2668   0.1682   2460   2973   14.7E-100    0.37681
RXRA::VDR           1.5795   0.0099   0.0063    147    173   33.05E-06    0.00748
Myc                 1.5696   0.2283   0.1454   2450   2993   6.567E-96    0.27327
NR3C1               1.5287   0.4277   0.2798   4952   6443   5.21E-266    0.37355
NFIC                1.5257   3.7591   2.4638  13943  23523     0.00E00    5.51730
AR                  1.5256   0.0682   0.0447    964   1192   38.28E-28    0.05456
INSM1               1.4726   0.2754   0.1870   3321   4383  12.38E-124    0.26309
GRMotifT            1.4697   2.9288   1.9928  13658  23069     0.00E00    3.30091
NFKB1               1.4510   0.1093   0.0753   1169   1480    23.1E-32    0.14585
ELK1                1.4466   1.2125   0.8382   9952  14754     0.00E00    1.21165
ARMotifH            1.4166  10.8039   7.6264  14782  27389     0.00E00   23.22889
MYC::MAX            1.3863   0.0904   0.0652   1023   1430   15.49E-18    0.10500
HIF1A::ARNT         1.3808   0.5715   0.4139   5275   7738  3.957E-224    0.79534
PLAG1               1.3806   0.0266   0.0193    376    497   9.301E-08    0.02481
Hand1::Tcfe2a       1.3734   1.6200   1.1796  11308  18243     0.00E00    1.71252
Klf4                1.3725   0.3411   0.2485   3696   5285  1.294E-116    0.43960
NR2F1               1.3629   0.1557   0.1142   2129   2937   24.12E-52    0.13311
SP1                 1.3193   1.8785   1.4239   8885  14020     0.00E00    6.98049
FEV                 1.3050   0.7807   0.5982   7804  11825     0.00E00    0.76604
GR                  1.2945   0.3192   0.2466   3940   5700  39.99E-128    0.30446
Mafb                1.2899   2.6385   2.0454  12870  21857     0.00E00    3.98850
GRMotifTT           1.2884   0.0314   0.0244    457    649   1.031E-06    0.02772
TAL1::TCF3          1.2814   0.2967   0.2315   3121   4560   1.511E-78    0.36889
Gata1               1.2711   0.8040   0.6325   7643  12181     0.00E00    0.83864
SPI1                1.2709   2.3447   1.8449  12840  21743     0.00E00    3.01912
Zfx                 1.2517   0.2497   0.1995   2736   4427   1.279E-38    0.29807
MZF1_1-4            1.2482   5.4840   4.3936  14216  25517     0.00E00   15.11870
GRMotifHH           1.2414   0.2667   0.2148   3254   4907   32.44E-76    0.27369
Myb                 1.2404   0.9719   0.7835   8728  14311     0.00E00    1.01307
RELA                1.2349   0.1407   0.1139   1750   2661   3.859E-24    0.15619
MAX                 1.2195   0.2619   0.2147   2848   4531   12.49E-46    0.33310
NFE2L2              1.2078   0.1631   0.1350   2190   3447   29.42E-30    0.15143
TEAD1               1.2078   0.0947   0.0784   1334   2054   70.62E-16    0.08648
NR4A2               1.2010   2.2120   1.8418  12715  22010     0.00E00    2.70934
Gfi                 0.8317   0.9562   1.1496   8544  17766     0.00E00    1.35104
En1                 0.8313   4.5005   5.4137  14331  27045     0.00E00    9.35991
FOXO3               0.7978   1.3600   1.7047  10299  20693     0.00E00    2.50118
FOXD1               0.7910   0.6422   0.8119   6577  14174  3.886E-114    0.94683
SOX9                0.7696   0.5122   0.6656   5601  12457   7.031E-50    0.74919
FOXF2               0.7606   0.1056   0.1389   1455   3499   4.593E-06    0.13447
Cebpa               0.7586   0.7261   0.9571   6922  15634   1.24E-114    1.16165
HOXA5               0.7363   5.3336   7.2443  14202  26997     0.00E00   18.99166
Pax6                0.7289   0.0088   0.0120    127    328   28.38E-04    0.01116
Sox5                0.7069   1.5875   2.2459  10576  22197     0.00E00    3.77262
Nobox               0.7020   1.2406   1.7671   8754  19002     0.00E00    3.28811
HNF1B               0.6988   0.0620   0.0888    862   2270   4.938E-12    0.08643
FOXA1pAR            0.6822   0.0557   0.0817    754   1967   1.213E-10    0.09216
Nkx2-5              0.6648   4.6637   7.0151  13551  26436     0.00E00   24.03540
PBX1                0.6531   0.0858   0.1314   1173   2994   1.263E-10    0.28843
TBP                 0.6508   0.2128   0.3271   2667   6845    1.67E-06    0.39455
SRY                 0.6500   2.4927   3.8350  12045  24912     0.00E00    8.30146
ARID3A              0.6148   2.6033   4.2344  11693  24238     0.00E00   11.68531
Pou5f1              0.6098   0.0072   0.0119    107    325   18.18E-06    0.01028
FOXI1               0.6035   0.4987   0.8264   5177  13155   33.39E-10    1.26373
Pdx1                0.6013   1.8362   3.0539  10596  23586     0.00E00    6.35887
IRF1                0.5907   0.1680   0.2844   2148   6199   65.94E-26    0.31993
Prrx2               0.5674   1.6416   2.8932  10015  23209     0.00E00    5.70998
NKX3-1              0.5577   0.4735   0.8490   4992  13167   18.73E-04    1.36741
NFIL3               0.5410   0.1245   0.2302   1491   4575   14.52E-36    0.46051
MEF2A               0.4878   0.1907   0.3909   2149   7201   2.844E-58    0.61672
Ddit3::Cebpa        0.4837   0.2299   0.4753   2984   8793   1.518E-22    0.55103
FOXL1               0.4742   4.2100   8.8785  12628  25920     0.00E00   85.52804
Lhx3                0.4256   0.1615   0.3794   1808   6427   29.43E-74    0.57768
Foxd3               0.4020   1.1662   2.9008   7284  20066   9.024E-52   12.44300
EWSR1-FLI1          0.2491   0.0068   0.0275     44    142   18.11E-04    0.22136

12.18 AR without FoxA1 siFOXA1 binding site overlaps (up)

Chromosome-specific statistics are shown in Table 63. A histogram of sequence lengths is shown in Figure 59.

                                length (bp)
chromosome     frequency   min  mean   max    total  coverage
1                    207   200   554  1404   114744   0.00046
10                    72   118   509  1132    36629   0.00027
11                    85    29   498  1175    42299  0.000313
12                    17   280   495   890     8416   6.3e-05
13                    19   315   491  1495     9332   8.1e-05
14                    30   240   563  1395    16897  0.000157
15                    46     8   505   952    23208  0.000226
16                    31    12   502   931    15574  0.000172
17                    65    59   533  1176    34659  0.000427
18                     1   433   433   433      433     6e-06
19                    12    39   454   884     5445   9.2e-05
2                     76    55   504  1307    38313  0.000158
20                    33   306   514  1424    16955  0.000269
21                    16   312   552   965     8829  0.000183
22                     6   286   402   508     2415   4.7e-05
3                     86    71   532  1225    45715  0.000231
4                     26   148   471   801    12250   6.4e-05
5                     45    57   495  1245    22297  0.000123
6                     82    65   453  1050    37138  0.000217
7                     37   236   479   929    17711  0.000111
8                     34   260   505   932    17186  0.000117
9                     23    11   422   864     9707   6.9e-05
X                     22   250   470   828    10329   6.7e-05
all (23 chr.)       1071     8   510  1495   546481   7.4e-05

Table 63: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

[Histogram of sequence lengths; x-axis: length (base pairs); y-axis: Frequency]

Figure 59: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqAXnSiu-gCount component.

property  value
genes     214

(a) seqAXnSiu-deNovo-meme1: width=15, sites=273, llr=2588, E=1.4e-73
(b) seqAXnSiu-deNovo-meme2: width=15, sites=297, llr=2704, E=8.3e-52
(c) seqAXnSiu-deNovo-meme3: width=15, sites=102, llr=1113, E=0.0094

Figure 60: De novo motifs for the filtered AR without FoxA1 siFOXA1 binding site overlaps (up) sequences.

Table 64: Motif enrichments

motif                ratio       fC       fR    n1C    n1R          p1        var
Ar                 10.3396   0.1299   0.0125    126     24   2.499E-38    0.06041
CTCF                4.5231   0.0206   0.0045     21      9   1.058E-04    0.01066
TLX1::NFIC          4.2625   0.0364   0.0085     33     14   58.99E-08    0.02380
ESR2                3.5731   0.0449   0.0125     41     22   58.34E-08    0.03042
PPARG::RXRA         2.5614   0.0411   0.0160     42     32   79.11E-06    0.02548
ELK4                2.4466   0.0589   0.0240     56     45   9.684E-06    0.04142
GABPA               2.4305   0.1729   0.0711    159    132   15.74E-16    0.12137
Stat3               2.2417   0.3346   0.1492    261    225   55.73E-28    0.30779
NHLH1               2.1346   0.2748   0.1287    176    153   2.982E-16    0.34769
Pax5                2.1289   0.0981   0.0461    101     89   1.169E-08    0.06469
NFKB1               1.9933   0.1607   0.0806    118    122   14.71E-08    0.17444
ARMotifTH           1.9863   0.0766   0.0386     78     74   6.409E-06    0.05374
Tcfcp2l1            1.9038   0.1402   0.0736    137    129    1.21E-10    0.10901
Tal1::Gata1         1.8910   0.0682   0.0361     68     68   96.92E-06    0.05093
RXR::RAR_DR5        1.8648   0.0523   0.0280     53     56    18.5E-04    0.03715
TFAP2A              1.8488   3.4523   1.8673    860   1261  23.21E-124    9.86277
EBF1                1.8263   1.2383   0.6780    616    843   6.982E-58    1.71719
Egr1                1.8168   0.1729   0.0951    158    168   6.987E-10    0.14519
Myf                 1.8118   0.4673   0.2579    320    393   21.87E-20    0.51699
ARMotifHH           1.7444   0.6290   0.3605    464    548   2.147E-40    0.57284
GRMotifTH           1.7061   0.4579   0.2684    373    470   5.929E-24    0.36751
Zfp423              1.6530   0.1234   0.0746    101    122   1.594E-04    0.12957
Zfx                 1.6451   0.3477   0.2113    269    346   36.91E-14    0.34000
SP1                 1.6059   2.4084   1.4997    695   1051   35.37E-68    7.69048
INSM1               1.6046   0.3439   0.2143    287    353   3.429E-16    0.32270
Mycn                1.6025   0.2159   0.1347    160    195   25.28E-08    0.25521
MIZF                1.5888   0.0430   0.0270     46     53   1.296E-02    0.03220
NFIC                1.5594   4.2028   2.6950   1007   1720   2.89E-170    6.77108
ARMotifT            1.5367   0.7234   0.4707    531    711   58.93E-44    0.65170
AR                  1.5353   0.0738   0.0481     76     92   18.37E-04    0.05904
Klf4                1.5288   0.4234   0.2769    315    418    7.62E-16    0.46837
Arnt                1.4863   0.2925   0.1968    188    236   2.491E-08    0.43558
RELA                1.4700   0.1804   0.1227    150    203   44.52E-06    0.19225
GR                  1.4519   0.3701   0.2549    308    419   2.451E-14    0.34637
Myc                 1.4391   0.2299   0.1597    175    237   3.798E-06    0.27362
Hand1::Tcfe2a       1.4357   1.8290   1.2739    855   1338  32.08E-114    1.94896
NR3C1               1.4222   0.4430   0.3115    364    500   57.22E-20    0.41498
ELK1                1.4195   1.3121   0.9244    741   1107   80.48E-82    1.36793
Esrrb               1.4101   0.1561   0.1107    154    205   16.74E-06    0.13076
ARMotifH            1.4015  11.5271   8.2248   1067   1980  1.177E-192   29.63675
TEAD1               1.3901   0.1037   0.0746    103    145    52.9E-04    0.08674
MZF1_1-4            1.3815   6.3991   4.6319   1023   1860  4.497E-170   17.90825
GRMotifT            1.3775   3.0364   2.2043    995   1703  12.34E-164    3.67993
STAT1               1.3549   0.0570   0.0421     50     62   2.308E-02    0.06919
HIF1A::ARNT         1.2977   0.6121   0.4717    395    608   66.81E-18    0.90588
ZNF354C             1.2813   4.5234   3.5303   1023   1805  2.438E-174    8.86282
Mafb                1.2786   2.8607   2.2374    953   1637  2.105E-142    4.40916
SPI1                1.2742   2.5664   2.0140    938   1603   3.38E-136    3.37343
MAX                 1.2715   0.2897   0.2278    213    343   1.448E-04    0.36601
NFE2L2              1.2547   0.1860   0.1482    178    272   2.993E-04    0.17453
Myb                 1.2321   1.0439   0.8473    634   1080   31.65E-44    1.13969
NR4A2               1.2292   2.4664   2.0065    928   1635  8.962E-128    3.08367
GRMotifHH           1.2122   0.3065   0.2529    270    420   10.12E-08    0.30684
RUNX1               1.2063   0.4065   0.3370    341    553   1.191E-10    0.39456
En1                 0.8256   4.7065   5.7011   1029   1952  19.87E-168   10.57973
FOXO3               0.7563   1.3944   1.8438    758   1536   33.33E-56    2.81002
Cebpa               0.7522   0.7710   1.0250    501   1167   1.094E-08    1.26634
Gfi                 0.7375   0.9542   1.2939    615   1339   42.21E-24    1.53217
FOXD1               0.7321   0.6262   0.8553    457   1041   96.77E-08    1.02551
SOX9                0.7247   0.5150   0.7106    402    958   25.76E-04    0.79016
HOXA5               0.6998   5.4589   7.8002   1023   1959  77.83E-164   20.65171
Nobox               0.6651   1.2495   1.8788    621   1440   2.977E-20    3.33886
Sox5                0.6511   1.5421   2.3686    737   1664   31.34E-42    3.91463
Nkx2-5              0.6168   4.6458   7.5318    976   1914  10.34E-136   26.47386
SRY                 0.5995   2.4598   4.1032    854   1825    38.1E-78    9.34247
HNF1B               0.5974   0.0598   0.1002     58    184   26.35E-04    0.09370
IRF1                0.5741   0.1860   0.3240    172    500   65.79E-04    0.34072
ARID3A              0.5635   2.5720   4.5643    837   1790   10.09E-72   12.85407
Pdx1                0.5589   1.8243   3.2644    744   1746   11.95E-40    6.61261
Prrx2               0.5442   1.6813   3.0896    733   1712    8.81E-38    6.04146
NFIL3               0.5272   0.1262   0.2394    101    338   59.28E-06    0.62051
Ddit3::Cebpa        0.4301   0.2252   0.5238    209    692   69.81E-06    0.59326
MEF2A               0.4296   0.1869   0.4352    159    551   3.656E-06    0.63614
Lhx3                0.4248   0.1682   0.3961    122    487   35.25E-10    0.65218
SRF                 0.4237   0.0112   0.0265      9     42   1.671E-02    0.03054
FOXL1               0.4181   4.0112   9.5944    903   1889   1.176E-96   90.88222
Foxd3               0.3695   1.1850   3.2073    520   1497   10.36E-04   15.15680
Pou5f1              0.2103   0.0028   0.0135      3     27   80.06E-04    0.00969
Pax6                0.0613   0.0009   0.0160      1     31   3.573E-04    0.01130

12.19 AR without FoxA1 siFOXA1 binding site overlaps (down)

Chromosome-specific statistics are shown in Table 65. A histogram of sequence lengths is shown in Figure 61.

                                length (bp)
chromosome     frequency   min  mean   max    total  coverage
1                     73    59   503   992    36752  0.000147
10                    14   248   412   557     5770   4.3e-05
11                    26   283   428   675    11133   8.2e-05
12                    20   256   443   616     8868   6.6e-05
13                     3   390   601   822     1802   1.6e-05
14                     9   297   511   708     4601   4.3e-05
15                    26   268   535   885    13906  0.000136
16                    40   211   470   935    18796  0.000208
17                    24   279   549  1349    13165  0.000162
18                    13     4   430   727     5584   7.2e-05
19                    11   273   410   707     4514   7.6e-05
2                     23   239   453   969    10428   4.3e-05
20                     9   246   490   846     4409     7e-05
21                     4   131   382   499     1530   3.2e-05
22                     7   299   487   727     3408   6.6e-05
3                     55   277   529  1311    29083  0.000147
4                     25   257   433   681    10835   5.7e-05
5                     19    31   415   743     7884   4.4e-05
6                     19   256   472   687     8967   5.2e-05
7                      9   232   471   745     4240   2.7e-05
8                     31   242   487  1116    15104  0.000103
9                     23    96   471   840    10838   7.7e-05
X                      5   258   465   628     2327   1.5e-05
all (23 chr.)        488     4   479  1349   233944   3.2e-05

Table 65: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

[Histogram of sequence lengths; x-axis: length (base pairs); y-axis: Frequency]

Figure 61: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqAXnSid-gCount component.

property  value
genes     150

(a) seqAXnSid-deNovo-meme1: width=15, sites=222, llr=2044, E=6.1e-17
(b) seqAXnSid-deNovo-meme2: width=15, sites=129, llr=1325, E=1.2e-11
(c) seqAXnSid-deNovo-meme3: width=15, sites=91, llr=982, E=1

Figure 62: De novo motifs for the filtered AR without FoxA1 siFOXA1 binding site overlaps (down) sequences.

Table 66: Motif enrichments

motif ratio fC fR n1C n1R p1 var
Ar 17.6639 0.1170 0.0066 51 5 53.91E-20 0.05448
CTCF 5.5151 0.0123 0.0022 6 2 4.291E-02 0.00569
ELK4 4.0229 0.0575 0.0143 24 13 1.674E-04 0.03562
TLX1::NFIC 3.7276 0.0287 0.0077 12 6 87.3E-04 0.01909
ESR2 3.2686 0.0431 0.0132 16 10 67.05E-04 0.03735
E2F1 3.0180 0.0431 0.0143 20 13 26.26E-04 0.02516
STAT1 2.9233 0.1027 0.0351 38 27 41.91E-06 0.08383
RORA 2 2.8032 0.0431 0.0154 20 14 44.77E-04 0.02584
Stat3 2.7074 0.4127 0.1524 137 118 9.354E-16 0.34576
GABPA 2.5849 0.1786 0.0691 76 59 57.12E-10 0.12154
PLAG1 2.4924 0.0411 0.0164 20 14 44.77E-04 0.02584
RXR::RAR DR5 2.4655 0.0595 0.0241 29 22 9.274E-04 0.03515
Tal1::Gata1 2.3871 0.0760 0.0318 37 28 1.262E-04 0.04641
Myc 2.1932 0.2526 0.1151 98 79 24.25E-12 0.21663
Zfp423 2.0833 0.1417 0.0680 55 43 2.188E-06 0.16361
MYC::MAX 2.0797 0.1027 0.0493 37 37 40.36E-04 0.09338
Tcfcp2l1 2.0448 0.1458 0.0713 60 59 61.77E-06 0.11358
TFAP2A 2.0285 3.7680 1.8575 395 571 17.12E-60 11.31405
Mycn 2.0283 0.2136 0.1053 78 71 23.89E-08 0.20415
INSM1 1.9627 0.3573 0.1820 143 145 37.76E-14 0.27137
Myf 1.9352 0.4435 0.2292 138 153 95.95E-12 0.50922
GRMotifTH 1.8725 0.4394 0.2346 160 188 4.097E-12 0.35555
NHLH1 1.8084 0.2320 0.1283 70 71 18.02E-06 0.28053
Egr1 1.7891 0.1766 0.0987 77 81 11.29E-06 0.13724
EBF1 1.7836 1.2341 0.6919 294 391 11.74E-32 1.49719
ARMotifTH 1.7413 0.0821 0.0471 37 42 1.649E-02 0.06157
Esrrb 1.7343 0.1807 0.1042 73 86 2.736E-04 0.15097
Arnt 1.7267 0.3162 0.1831 99 112 1.281E-06 0.36577
Zfx 1.7247 0.3593 0.2083 120 163 9.248E-06 0.35463
ARMotifHH 1.7029 0.6181 0.3629 216 252 1.781E-20 0.60121
SP1 1.6765 2.3162 1.3816 305 459 53.26E-30 7.01849
HIF1A::ARNT 1.6760 0.7002 0.4178 201 258 26.88E-16 0.79641
Klf4 1.6508 0.4127 0.2500 138 176 3.499E-08 0.43594
NFIC 1.5787 4.0021 2.5351 461 787 4.471E-80 5.35842
ELK1 1.5756 1.2854 0.8158 355 466 4.466E-50 1.25064
NFKB1 1.5603 0.1643 0.1053 51 67 2.126E-02 0.22594
ARMotifT 1.5230 0.6530 0.4287 230 307 1.414E-18 0.55771
GRMotifT 1.4882 2.9487 1.9814 446 757 1.349E-72 3.63051
Gata1 1.4671 0.8542 0.5822 260 379 77.9E-22 0.83545
ARMotifH 1.4594 10.8624 7.4430 486 903 63.41E-90 23.45986
Hand1::Tcfe2a 1.4294 1.6489 1.1535 370 597 4.262E-44 1.68481
MAX 1.4293 0.2649 0.1853 96 135 6.759E-04 0.27934
MZF1 1-4 1.3562 6.0287 4.4452 473 838 2.353E-84 16.04076
NR3C1 1.3459 0.3306 0.2456 133 195 14.05E-06 0.28973
GR 1.3117 0.2977 0.2270 128 172 1.681E-06 0.27570
Myb 1.3095 0.9979 0.7621 296 466 41.33E-26 0.99001
GRMotifHH 1.3087 0.2526 0.1930 106 147 1.341E-04 0.24113
SPI1 1.2943 2.3943 1.8498 424 725 9.986E-62 2.84323
Mafb 1.2822 2.6797 2.0899 426 721 30.6E-64 4.19104
Arnt::Ahr 1.2613 1.8809 1.4912 356 565 1.991E-40 5.71333
TAL1::TCF3 1.2545 0.2834 0.2259 92 151 2.26E-02 0.35151
ZNF354C 1.2480 4.1766 3.3465 468 824 6.385E-82 8.11594
USF1 1.2342 0.6509 0.5274 169 262 22.12E-08 1.00487
NR4A2 1.2179 2.2382 1.8377 417 716 1.971E-58 2.49590
FOXC1 0.8242 6.0534 7.3443 472 897 31.96E-80 20.13702
FOXO3 0.7878 1.2793 1.6239 321 689 1.433E-18 2.20724
SOX9 0.7734 0.4969 0.6425 178 397 1.406E-02 0.68523
Gfi 0.7494 0.8891 1.1864 256 602 89.31E-08 1.40943
FOXD1 0.7305 0.5975 0.8180 201 465 35.55E-04 1.07177
Cebpa 0.6873 0.6263 0.9112 207 508 95.06E-04 1.00970
HOXA5 0.6872 4.9610 7.2193 461 888 2.318E-72 18.79220
Sox5 0.6258 1.4025 2.2412 315 723 33.42E-16 3.89656
FOXF2 0.6141 0.0821 0.1338 36 116 2.362E-02 0.11677
Nobox 0.5775 1.0144 1.7566 266 633 16.42E-08 2.92256
Nkx2-5 0.5720 4.0801 7.1327 426 863 89.28E-54 24.10668
SRY 0.5547 2.2053 3.9759 362 820 9.122E-26 8.93430
ARID3A 0.5298 2.2895 4.3213 358 808 67.6E-26 11.63488
Pdx1 0.5139 1.5708 3.0570 317 792 62.86E-14 6.02972
Prrx2 0.4868 1.4251 2.9276 296 762 41.55E-10 5.62876
NFIL3 0.4549 0.1047 0.2303 44 147 91.44E-04 0.30065
MEF2A 0.4387 0.1602 0.3651 58 233 62.03E-06 0.43080
FOXL1 0.4213 3.6057 8.5592 393 850 59.52E-38 66.50133
Ddit3::Cebpa 0.4013 0.2094 0.5219 90 320 12.72E-04 0.56881
IRF1 0.3916 0.1232 0.3147 57 206 10.71E-04 0.40410
Lhx3 0.2828 0.1150 0.4068 40 229 57.06E-10 0.53553
T 0.1801 0.0041 0.0230 2 21 1.703E-02 0.01618
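Each motif-enrichment table lists motifs in descending order of ratio, enriched motifs first and depleted motifs last. In the tables of this section, every listed ratio is at least 1.2 or at most about 0.83 (roughly 1/1.2), and every p1 value is below 0.05. The sketch below shows a filter that would produce that layout. The thresholds `min_fold=1.2` and `alpha=0.05`, the `MotifRow` type and the `select_motifs` helper are assumptions inferred from the printed values, not settings documented by the report.

```python
from typing import List, NamedTuple


class MotifRow(NamedTuple):
    """One row of a motif-enrichment table (only the columns the filter needs)."""
    motif: str
    ratio: float
    p1: float


def select_motifs(rows: List[MotifRow], min_fold: float = 1.2,
                  alpha: float = 0.05) -> List[MotifRow]:
    """Keep motifs that are at least min_fold-fold enriched or depleted
    and have p1 < alpha, ordered by descending ratio.

    The default thresholds are assumptions inferred from the printed tables.
    """
    kept = [
        r for r in rows
        if (r.ratio >= min_fold or r.ratio <= 1.0 / min_fold) and r.p1 < alpha
    ]
    return sorted(kept, key=lambda r: r.ratio, reverse=True)


# Hypothetical rows, not taken from the report.
example = [
    MotifRow("m1", 1.10, 0.01),   # fold change too small
    MotifRow("m2", 0.80, 0.001),  # depleted and significant
    MotifRow("m3", 3.00, 0.20),   # enriched but not significant
    MotifRow("m4", 2.50, 0.04),   # enriched and significant
]
print([r.motif for r in select_motifs(example)])  # ['m4', 'm2']
```

Sorting by descending ratio places the depleted motifs at the bottom, as in the tables above.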

12.20 FoxA1 without AR siFOXA1 binding site overlaps

Chromosome specific statistics are shown in Table 67. A histogram of sequence lengths is shown in Figure 63.

chromosome frequency min-length mean-length max-length total-length coverage
1 95 144 330 749 31340 0.000126
10 99 117 281 592 27789 0.000205
11 115 166 278 607 32000 0.000237
12 91 167 268 531 24433 0.000183
13 84 170 338 671 28401 0.000247
14 89 101 285 548 25323 0.000236
15 93 158 284 611 26449 0.000258
16 71 141 268 495 18995 0.00021
17 81 72 262 562 21184 0.000261
18 49 21 305 804 14932 0.000191
19 36 171 245 343 8835 0.000149
2 153 125 283 483 43224 0.000178
20 44 150 268 451 11812 0.000187
21 34 196 288 475 9782 0.000203
22 40 170 264 483 10564 0.000206
3 208 89 302 775 62791 0.000317
4 97 118 271 501 26298 0.000138
5 123 128 276 533 33929 0.000188
6 115 135 270 566 31001 0.000181
7 128 3 287 473 36693 0.000231
8 143 18 285 560 40799 0.000279
9 104 28 276 559 28718 0.000203
X 23 97 249 345 5733 3.7e-05
all (23 chromosomes) 2115 3 284 804 601025 8.2e-05

Table 67: Chromosome specific distribution of the regions. The last line represents the overall statistics.

[Histogram: x-axis length (base pairs), y-axis frequency]

Figure 63: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.
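The per-chromosome tables report, for each chromosome, the number of regions (frequency), their minimum, mean, maximum and total length, and a coverage value. The coverage column appears consistent with total region length divided by chromosome length: 31340 bp on chromosome 1 over the hg19 length of chromosome 1 (249,250,621 bp) gives about 0.000126, and 5733 bp on chromosome X over 155,270,560 bp gives about 3.7e-05, both matching Table 67. The sketch below computes such a table under that assumption. It treats regions as BED-style half-open (chrom, start, end) intervals; `chromosome_stats` and `HG19_LENGTHS` are illustrative names, and the sketch does not try to reproduce the coverage in the overall ("all") row, whose denominator is not documented here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# hg19 lengths for two chromosomes, used only to illustrate the coverage column;
# the assembly actually used by the report is an assumption.
HG19_LENGTHS = {"1": 249_250_621, "X": 155_270_560}


def chromosome_stats(regions: Iterable[Tuple[str, int, int]],
                     chrom_lengths: Dict[str, int]) -> Dict[str, tuple]:
    """Return {chrom: (frequency, min, mean, max, total, coverage)}.

    Regions are half-open (chrom, start, end) intervals, so length = end - start.
    Coverage is assumed to be total region length / chromosome length.
    """
    lengths = defaultdict(list)
    for chrom, start, end in regions:
        lengths[chrom].append(end - start)

    stats = {}
    for chrom, ls in lengths.items():
        total = sum(ls)
        stats[chrom] = (
            len(ls),             # frequency: number of regions
            min(ls),
            total / len(ls),     # mean length (the report prints it rounded)
            max(ls),
            total,
            total / chrom_lengths[chrom],
        )
    return stats


# Hypothetical regions, not taken from the report.
regions = [("1", 0, 100), ("1", 1000, 1300), ("X", 50, 250)]
print(chromosome_stats(regions, HG19_LENGTHS)["1"][:5])  # (2, 100, 200.0, 300, 400)

# Coverage check against the chromosome 1 row of Table 67.
print(f"{31340 / HG19_LENGTHS['1']:.6f}")  # 0.000126
```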

The following table shows the properties of the seqSinAX-gCount component.

property value
genes 6304

(a) seqSinAX-deNovo-meme1: width=15, sites=399, llr=3674, E=0
(b) seqSinAX-deNovo-meme2: width=15, sites=89, llr=966, E=3e-23
(c) seqSinAX-deNovo-meme3: width=15, sites=48, llr=562, E=3.2

Figure 64: De novo motifs for the filtered FoxA1 without AR siFOXA1 binding site overlaps sequences.

Table 68: Motif enrichments

motif ratio fC fR n1C n1R p1 var
Pax4 11.0215 0.0033 0.0003 5 1 3.997E-02 0.00199
CTCF 6.7304 0.0123 0.0018 26 7 32.57E-08 0.00544
Foxa2 6.0376 1.3288 0.2200 1841 660 0.00E00 0.79273
FOXA1 5.9920 1.3600 0.2269 1816 782 0.00E00 0.74342
FOXF2 5.5079 0.4272 0.0775 789 285 5.468E-204 0.20939
TLX1::NFIC 4.8964 0.0189 0.0038 30 13 3.396E-06 0.01301
FOXD1 3.2367 1.5714 0.4855 1835 1409 0.00E00 1.07163
Foxq1 2.4716 0.6391 0.2585 1007 833 71.78E-150 0.45938
GABPA 2.3031 0.0776 0.0337 147 125 3.766E-12 0.05559
Tal1::Gata1 2.1655 0.0426 0.0196 89 77 26.86E-08 0.02724
ELK4 2.0089 0.0241 0.0120 48 46 12.1E-04 0.01730
E2F1 1.7860 0.0132 0.0074 27 29 4.921E-02 0.00969
NFYA 1.7542 0.0582 0.0331 112 119 5.57E-06 0.05276
NHLH1 1.6963 0.1064 0.0627 149 155 2.963E-08 0.13492
Evi1 1.5888 0.0341 0.0214 68 81 52.45E-04 0.02750
Stat3 1.5752 0.1518 0.0964 245 302 39.32E-10 0.15941
FOXI1 1.5626 0.7630 0.4883 1029 1284 56.43E-90 0.89874
FOXA1pAR 1.5590 0.0823 0.0528 163 176 2.748E-08 0.07837
FOXO3 1.5505 1.5691 1.0120 1640 2318 85.34E-228 1.49679
Arnt 1.4326 0.1268 0.0885 178 233 26.79E-06 0.17835
NFIC 1.3896 2.0203 1.4539 1781 2801 11.47E-256 2.20631
Tcfcp2l1 1.3692 0.0520 0.0380 101 133 43.38E-04 0.04969
Myb 1.3484 0.6116 0.4536 917 1367 45.85E-50 0.58633
Gata1 1.3476 0.5274 0.3914 821 1220 1.619E-38 0.50283
TAL1::TCF3 1.3422 0.1746 0.1300 281 375 1.841E-08 0.20862
FEV 1.3089 0.4636 0.3542 766 1145 1.349E-32 0.42108
Mycn 1.3014 0.0823 0.0632 129 192 1.802E-02 0.10646
ELK1 1.2419 0.6206 0.4997 941 1485 10.5E-46 0.60618
Nr2e3 1.2174 0.3666 0.3011 464 736 10.25E-10 0.54120
Myc 1.2129 0.0965 0.0796 165 245 41.66E-04 0.12094
YY1 0.8271 2.2332 2.7002 1821 3522 1.292E-220 3.79866
CREB1 0.8047 0.4536 0.5637 737 1565 8.187E-08 0.61527
FOXL1 0.8047 4.1353 5.1390 1884 3424 14.61E-264 24.59160
Pdx1 0.7840 1.4555 1.8565 1426 2872 78.56E-92 3.10311
MZF1 5-13 0.7776 0.7389 0.9503 1038 2259 3.128E-24 1.12387
NF-kappaB 0.7594 0.1045 0.1377 167 425 1.604E-02 0.18410
Prrx2 0.7218 1.2725 1.7629 1353 2792 2.522E-72 2.78788
PBX1 0.6436 0.0487 0.0757 99 247 4.77E-02 0.14374
Lhx3 0.6389 0.1528 0.2392 250 638 1.711E-02 0.35685
Zfx 0.5597 0.0629 0.1124 114 386 68.98E-08 0.11523
Ddit3::Cebpa 0.5497 0.1410 0.2565 268 790 18.75E-06 0.26616

12.21 FoxA1 without AR siFOXA1 binding site overlaps (up)

Chromosome specific statistics are shown in Table 69. A histogram of sequence lengths is shown in Figure 65.

chromosome frequency min-length mean-length max-length total-length coverage
1 7 206 345 466 2412 1e-05
10 4 225 273 340 1092 8e-06
11 4 195 272 355 1086 8e-06
12 4 210 283 381 1131 8e-06
13 1 322 322 322 322 3e-06
15 2 195 218 242 437 4e-06
16 3 221 313 495 938 1e-05
17 1 299 299 299 299 4e-06
2 7 164 248 346 1738 7e-06
20 1 281 281 281 281 4e-06
21 1 350 350 350 350 7e-06
22 1 206 206 206 206 4e-06
3 10 191 318 418 3179 1.6e-05
4 2 321 333 345 666 3e-06
5 7 201 272 386 1902 1.1e-05
6 10 181 271 355 2708 1.6e-05
7 7 178 316 459 2213 1.4e-05
8 2 223 254 285 508 3e-06
9 6 194 232 294 1395 1e-05
X 3 219 244 266 732 5e-06
all (20 chromosomes) 83 164 284 495 23595 3e-06

Table 69: Chromosome specific distribution of the regions. The last line represents the overall statistics.

[Histogram: x-axis length (base pairs), y-axis frequency]

Figure 65: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.
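Each histogram caption refers to a dashed normal approximation built from the mean and standard deviation of the sequence lengths. To draw such a curve on the same frequency axis as a histogram of raw counts, the normal density is commonly multiplied by the number of sequences n and the bin width w, giving an expected count of n * w * pdf(x). A minimal sketch, assuming the sample standard deviation and a fixed bin width (the report states neither); `normal_overlay` is a hypothetical helper name:

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def normal_overlay(lengths: Sequence[float], bin_width: float,
                   xs: Sequence[float]) -> List[float]:
    """Expected histogram counts at positions xs under a normal approximation.

    The normal density with the sample mean and sample standard deviation is
    scaled by n * bin_width so that it matches a histogram of raw counts.
    """
    n = len(lengths)
    mu = mean(lengths)
    sigma = stdev(lengths)  # sample SD (n - 1 denominator); an assumption
    scale = n * bin_width / (sigma * math.sqrt(2.0 * math.pi))
    return [scale * math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in xs]


# Hypothetical lengths (bp): mean 300, sample SD 100.
curve = normal_overlay([200, 300, 400], bin_width=50, xs=[200, 300, 400])
print([round(c, 4) for c in curve])  # [0.363, 0.5984, 0.363]
```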

The following table shows the properties of the seqSinAXu-gCount component.

property value
genes 67

(a) seqSinAXu-deNovo-meme1: width=11, sites=69, llr=579, E=7.1e-26
(b) seqSinAXu-deNovo-meme2: width=15, sites=32, llr=333, E=0.23
(c) seqSinAXu-deNovo-meme3: width=15, sites=4, llr=72, E=8e+05

Figure 66: De novo motifs for the filtered FoxA1 without AR siFOXA1 binding site overlaps (up) sequences.

Table 70: Motif enrichments

motif ratio fC fR n1C n1R p1 var
FOXA1 4.3496 1.3133 0.3019 69 38 39.3E-24 0.75994
NFYA 4.3037 0.1084 0.0252 8 4 3.074E-02 0.05934
Foxa2 3.8670 1.2892 0.3333 71 39 2.36E-24 0.78097
FOXF2 3.2557 0.4096 0.1258 31 17 22.12E-08 0.22386
Stat3 2.9782 0.1687 0.0566 12 8 1.613E-02 0.11126
FOXD1 2.7196 1.3855 0.5094 71 58 11.42E-20 0.85998
Nr2e3 1.9590 0.5422 0.2767 24 29 1.575E-02 0.60692
Foxq1 1.9155 0.4940 0.2579 30 35 19.21E-04 0.40753
Foxd3 1.6491 2.3855 1.4465 52 90 41.51E-06 31.20764
Myb 1.5963 0.6627 0.4151 40 51 1.207E-04 0.54149
FEV 1.5180 0.5060 0.3333 32 43 42.43E-04 0.43032
SPIB 1.4153 2.3855 1.6855 75 121 9.793E-14 2.44258
FOXO3 1.3953 1.4217 1.0189 60 94 5.104E-08 1.70968
BRCA1 1.2899 1.6145 1.2516 64 106 76.47E-10 1.93685
NFIC 1.2692 1.9398 1.5283 65 119 2.80E-08 2.28860
SPI1 1.2558 1.4217 1.1321 61 102 9.135E-08 1.53959
SOX9 1.2481 0.5181 0.4151 33 51 1.257E-02 0.46434
ELF5 1.2479 1.3735 1.1006 58 100 1.259E-06 1.36876
AP1 1.2404 2.1687 1.7484 75 127 33.37E-14 2.18758
Nkx2-5 0.8330 3.3373 4.0063 74 146 39.54E-12 9.12842
GATA3 0.8250 3.1807 3.8553 81 155 47.76E-16 4.57585
Sox5 0.8210 1.0843 1.3208 49 102 21.83E-04 1.98381
Nobox 0.8084 0.8795 1.0881 42 89 2.487E-02 1.67607
ARID3A 0.7974 1.9759 2.4780 61 121 2.351E-06 4.73598
NR4A2 0.7856 0.8795 1.1195 49 105 31.33E-04 1.04010
FOXL1 0.6882 3.8434 5.5849 70 141 30.69E-10 50.71769
Pdx1 0.6428 1.2048 1.8742 53 110 3.974E-04 2.84414
Prrx2 0.5950 1.0964 1.8428 49 110 54.78E-04 2.83269

12.22 FoxA1 without AR siFOXA1 binding site overlaps (down)

Chromosome specific statistics are shown in Table 71. A histogram of sequence lengths is shown in Figure 67.

chromosome frequency min-length mean-length max-length total-length coverage
1 10 253 329 418 3292 1.3e-05
2 4 230 301 391 1203 5e-06
3 7 218 295 392 2066 1e-05
4 3 212 250 270 749 4e-06
5 2 296 332 368 664 4e-06
6 7 223 287 391 2010 1.2e-05
7 3 242 268 290 803 5e-06
8 6 208 281 363 1686 1.2e-05
9 1 405 405 405 405 3e-06
10 5 253 303 352 1514 1.1e-05
11 6 196 242 365 1451 1.1e-05
12 4 195 226 304 902 7e-06
13 2 395 533 671 1066 9e-06
14 5 229 438 627 2189 2e-05
15 3 275 298 329 894 9e-06
16 5 141 232 329 1162 1.3e-05
17 2 207 212 218 425 5e-06
18 1 170 170 170 170 2e-06
19 1 195 195 195 195 3e-06
20 2 222 252 282 504 8e-06
21 1 457 457 457 457 9e-06
22 2 198 241 284 482 9e-06
all (22 chromosomes) 82 141 296 671 24289 3e-06

Table 71: Chromosome specific distribution of the regions. The last line represents the overall statistics.

[Histogram: x-axis length (base pairs), y-axis frequency]

Figure 67: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqSinAXd-gCount component.

property value
genes 63

(a) seqSinAXd-deNovo-meme1: width=11, sites=70, llr=603, E=3.1e-36
(b) seqSinAXd-deNovo-meme2: width=15, sites=35, llr=361, E=0.019
(c) seqSinAXd-deNovo-meme3: width=15, sites=19, llr=224, E=1e+05

Figure 68: De novo motifs for the filtered FoxA1 without AR siFOXA1 binding site overlaps (down) sequences.

Table 72: Motif enrichments

motif ratio fC fR n1C n1R p1 var
Foxa2 5.0730 1.2927 0.2548 69 26 23.24E-28 0.78492
FOXA1 4.6195 1.3537 0.2930 68 34 8.279E-24 0.82297
FOXF2 3.5887 0.3659 0.1019 28 15 1.098E-06 0.18129
FOXD1 3.2081 1.5122 0.4713 72 52 5.781E-22 0.99146
Foxq1 2.2249 0.5244 0.2357 31 30 1.661E-04 0.41690
Gata1 1.9528 0.6220 0.3185 36 43 2.538E-04 0.48033
FOXI1 1.6261 0.7561 0.4650 43 48 3.602E-06 0.86027
Myb 1.6261 0.7561 0.4650 45 56 6.836E-06 0.60817
FOXO3 1.6165 1.7195 1.0637 66 97 62.99E-12 1.74403
NFIC 1.4618 2.2439 1.5350 70 120 48.91E-12 2.16490
BRCA1 1.3324 1.4512 1.0892 56 99 5.177E-06 1.49629
ELK1 1.3195 0.6220 0.4713 37 58 42.61E-04 0.54464
SPI1 1.2927 1.2927 1.0000 54 97 21.28E-06 1.35962
Cebpa 1.2764 0.7073 0.5541 35 62 2.529E-02 0.84466
Hand1::Tcfe2a 1.2308 0.8780 0.7134 51 77 6.227E-06 0.79136
NFATC2 1.2153 1.2927 1.0637 56 96 3.021E-06 1.42506
Pdx1 0.8313 1.4878 1.7898 55 112 91.73E-06 2.72044
ARID3A 0.7878 2.0976 2.6624 63 129 76.33E-08 5.53578
Prrx2 0.7058 1.1463 1.6242 53 111 3.707E-04 2.14862
FOXL1 0.6979 3.8049 5.4522 72 136 61.51E-12 25.72248

12.23 AR and AR siFOXA1 binding site overlaps

Chromosome specific statistics are shown in Table 73. A histogram of sequence lengths is shown in Figure 69.

chromosome frequency min-length mean-length max-length total-length coverage
1 383 1 422 921 161616 0.000648
10 165 169 396 923 65413 0.000483
11 195 3 399 925 77774 0.000576
12 148 44 376 876 55673 0.000416
13 84 192 452 1452 37950 0.00033
14 116 34 393 773 45612 0.000425
15 129 13 414 1057 53424 0.000521
16 119 22 387 956 46027 0.000509
17 156 9 402 840 62664 0.000772
18 79 189 431 1080 34036 0.000436
19 53 45 405 949 21452 0.000363
2 261 22 402 806 104990 0.000432
20 95 3 420 1583 39872 0.000633
21 51 53 412 750 21025 0.000437
22 44 178 354 571 15575 0.000304
3 231 3 446 1010 103070 0.00052
4 147 185 385 798 56609 0.000296
5 239 25 408 975 97467 0.000539
6 212 14 375 908 79509 0.000465
7 207 8 398 984 82315 0.000517
8 199 3 413 784 82108 0.000561
9 128 51 397 765 50795 0.00036
X 73 122 351 690 25649 0.000165
Y 2 232 290 347 579 1e-05
all (24 chromosomes) 3516 1 404 1583 1421204 0.000193

Table 73: Chromosome specific distribution of the regions. The last line represents the overall statistics.

[Histogram: x-axis length (base pairs), y-axis frequency]

Figure 69: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqARSi-gCount component.

property value
genes 8869

(a) seqARSi-deNovo-meme1: width=15, sites=397, llr=3567, E=1.4e-243
(b) seqARSi-deNovo-meme2: width=15, sites=125, llr=1278, E=2.4e-09
(c) seqARSi-deNovo-meme3: width=15, sites=61, llr=720, E=6.1e-07

Figure 70: De novo motifs for the filtered AR and AR siFOXA1 binding site overlaps sequences.

Table 74: Motif enrichments

motif ratio fC fR n1C n1R p1 var
Ar 15.3916 0.1562 0.0101 495 61 6.249E-168 0.06957
TLX1::NFIC 3.8505 0.0219 0.0057 61 29 93.23E-12 0.01601
RXRA::VDR 2.4654 0.0103 0.0041 36 27 3.259E-04 0.00624
FOXA1 2.2351 0.7586 0.3394 1894 1771 50.52E-270 0.57253
Tal1::Gata1 2.1119 0.0647 0.0306 216 195 24.06E-16 0.04391
FOXF2 2.0455 0.2286 0.1117 707 672 23.21E-56 0.16118
ARMotifTH 2.0181 0.0733 0.0363 250 226 8.242E-18 0.05196
FOXA1pAR 1.9946 0.1368 0.0686 456 411 45.67E-36 0.09843
Stat3 1.9865 0.2591 0.1304 664 685 26.12E-44 0.24563
GABPA 1.9700 0.1035 0.0525 346 328 45.72E-24 0.07215
STAT1 1.9542 0.0536 0.0274 148 152 4.797E-08 0.04996
Foxa2 1.9510 0.6260 0.3208 1655 1545 45.21E-212 0.61030
ARMotifT 1.9168 0.7007 0.3655 1727 1890 97.25E-186 0.55570
NR3C1 1.9006 0.4222 0.2221 1149 1241 1.543E-90 0.33325
ESR2 1.8594 0.0277 0.0148 82 87 1.812E-04 0.02433
GRMotifTH 1.8176 0.3837 0.2111 1081 1218 17.51E-76 0.30433
Tcfcp2l1 1.7272 0.0807 0.0467 258 275 81.06E-14 0.06709
ARMotifTT 1.6552 0.0117 0.0070 41 44 1.24E-02 0.00899
AR 1.6545 0.0616 0.0372 211 238 55.88E-10 0.04562
MIZF 1.5949 0.0342 0.0214 118 140 1.473E-04 0.02562
E2F1 1.5670 0.0168 0.0107 58 69 1.173E-02 0.01308
ARMotifHH 1.5572 0.4293 0.2757 1157 1453 7.991E-68 0.38743
PPARG::RXRA 1.5094 0.0245 0.0162 81 105 1.176E-02 0.02035
ELK4 1.5083 0.0305 0.0202 102 130 26.86E-04 0.02483
FOXD1 1.4709 1.0222 0.6949 2177 3039 12.1E-216 1.00077
GRMotifT 1.4564 2.5185 1.7292 3195 5222 0.00E00 2.56830
NHLH1 1.4543 0.1596 0.1098 348 435 18.76E-12 0.26520
Gata1 1.4443 0.7791 0.5394 1791 2596 3.369E-126 0.72609
Evi1 1.4333 0.0410 0.0286 138 175 2.838E-04 0.03587
Esrrb 1.4282 0.1286 0.0900 427 555 52.79E-14 0.10453
Foxq1 1.4242 0.4986 0.3501 1325 1794 1.294E-74 0.47467
Arnt 1.3924 0.1901 0.1365 469 591 2.939E-16 0.25149
NFIC 1.3890 2.9102 2.0952 3165 5335 0.00E00 3.90055
ARMotifH 1.3729 8.8720 6.4624 3485 6430 0.00E00 15.90593
Myf 1.3547 0.2648 0.1955 706 980 8.667E-22 0.37942
GR 1.3201 0.2902 0.2198 881 1222 5.305E-32 0.26576
GRMotifHH 1.3160 0.2383 0.1811 706 995 1.053E-20 0.23277
ELK1 1.2987 0.9202 0.7086 2029 3150 3.357E-152 0.91105
RXR::RAR DR5 1.2923 0.0311 0.0240 108 153 1.959E-02 0.02679
FEV 1.2912 0.6645 0.5146 1679 2551 2.472E-98 0.67661
NR2F1 1.2896 0.1271 0.0986 416 602 2.068E-08 0.11272
Hand1::Tcfe2a 1.2740 1.2674 0.9948 2426 3919 1.183E-232 1.30703
EBF1 1.2721 0.6907 0.5429 1480 2306 2.159E-68 0.91240
Mycn 1.2664 0.1157 0.0914 313 455 9.203E-06 0.14391
Myc 1.2636 0.1462 0.1157 392 588 1.218E-06 0.17994
TFAP2A 1.2349 1.8127 1.4679 2243 3729 3.95E-178 5.52110
GRMotifH 1.2286 5.9153 4.8146 3453 6322 0.00E00 9.20255
Myb 1.2249 0.7899 0.6449 1851 2953 11.06E-114 0.79266
SPI1 1.2248 1.9074 1.5573 2860 4849 0.00E00 2.42234
Egr1 1.2229 0.0944 0.0771 300 438 22.06E-06 0.11051
NFE2L2 1.2107 0.1308 0.1081 420 659 6.902E-06 0.12130
YY1 0.8272 3.2617 3.9429 3250 6207 0.00E00 7.25763
Nkx2-5 0.8213 4.8663 5.9255 3281 6170 0.00E00 17.53102
CREB1 0.8042 0.6491 0.8071 1566 3270 1.123E-32 0.97998
ARID3A 0.7926 2.8828 3.6371 2935 5540 0.00E00 9.04300
NKX3-1 0.7846 0.5661 0.7216 1379 2810 31.53E-24 1.11706
Pdx1 0.7473 1.9199 2.5691 2623 5371 27.84E-210 4.72157
Prrx2 0.7018 1.7038 2.4278 2493 5264 2.62E-166 4.20567
NFIL3 0.6828 0.1351 0.1979 376 927 1.087E-02 0.36594
IRF1 0.6761 0.1639 0.2425 501 1284 41.36E-04 0.26482
FOXL1 0.6235 4.7700 7.6507 3192 6027 0.00E00 72.67483
Foxd3 0.6145 1.4766 2.4030 2211 4438 1.894E-120 10.02348
MEF2A 0.6128 0.2030 0.3312 554 1458 15.72E-04 0.70600
Lhx3 0.5653 0.1887 0.3338 494 1373 13.63E-06 0.52889
Ddit3::Cebpa 0.4909 0.1964 0.4001 606 1869 12.89E-10 0.43710

12.24 AR and AR siFOXA1 binding site overlaps (up)

Chromosome specific statistics are shown in Table 75. A histogram of sequence lengths is shown in Figure 71.

chromosome frequency min-length mean-length max-length total-length coverage
1 33 239 460 773 15166 6.1e-05
2 16 239 443 787 7089 2.9e-05
3 4 478 546 622 2185 1.1e-05
4 13 282 455 678 5918 3.1e-05
5 8 156 363 436 2904 1.6e-05
6 27 189 408 682 11006 6.4e-05
7 10 8 353 481 3534 2.2e-05
8 10 133 407 585 4071 2.8e-05
9 1 498 498 498 498 4e-06
10 8 198 445 608 3558 2.6e-05
11 10 273 392 511 3919 2.9e-05
12 8 301 456 875 3648 2.7e-05
14 2 319 382 446 765 7e-06
15 7 280 475 664 3324 3.2e-05
19 5 423 581 711 2905 4.9e-05
20 8 332 481 683 3847 6.1e-05
21 6 351 448 527 2691 5.6e-05
all (17 chromosomes) 176 8 438 875 77028 1e-05

Table 75: Chromosome specific distribution of the regions. The last line represents the overall statistics.

[Histogram: x-axis length (base pairs), y-axis frequency]

Figure 71: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqARSiu-gCount component.

property value
genes 70

(a) seqARSiu-deNovo-meme1: width=15, sites=91, llr=896, E=2.5e-25
(b) seqARSiu-deNovo-meme2: width=15, sites=29, llr=350, E=860
(c) seqARSiu-deNovo-meme3: width=15, sites=22, llr=286, E=100

Figure 72: De novo motifs for the filtered AR and AR siFOXA1 binding site overlaps (up) sequences.

Table 76: Motif enrichments

motif ratio fC fR n1C n1R p1 var
Ar 26.1134 0.1600 0.0061 26 2 1.01E-10 0.06404
GABPA 2.9740 0.1086 0.0365 18 11 20.31E-04 0.06579
AR 2.4416 0.0743 0.0304 13 10 3.554E-02 0.04364
NR3C1 2.3436 0.5486 0.2340 68 65 1.139E-08 0.41673
GRMotifTH 2.2974 0.4400 0.1915 57 57 2.115E-06 0.32030
FOXA1pAR 2.2747 0.1314 0.0578 22 17 30.67E-04 0.08847
FOXA1 2.2558 0.7886 0.3495 98 85 43.58E-18 0.60437
Mycn 2.1925 0.1600 0.0729 20 20 3.217E-02 0.15236
Foxa2 2.1194 0.6571 0.3100 78 73 64.97E-12 0.65918
Egr1 1.9781 0.1143 0.0578 18 16 2.295E-02 0.09539
Esrrb 1.9688 0.1257 0.0638 20 19 2.288E-02 0.09410
Stat3 1.9181 0.2857 0.1489 39 38 3.323E-04 0.25359
ARMotifT 1.8799 0.7543 0.4012 88 107 12.35E-10 0.58393
FOXF2 1.7486 0.2286 0.1307 34 41 1.214E-02 0.16964
Myf 1.7329 0.3371 0.1945 39 45 28.57E-04 0.45126
GRMotifT 1.5726 2.7771 1.7660 163 271 61.61E-30 2.91470
GRMotifHH 1.5293 0.2743 0.1793 38 49 1.231E-02 0.27492
ARMotifHH 1.5039 0.5257 0.3495 69 91 13.56E-06 0.45324
ARMotifH 1.4749 9.8629 6.6869 175 328 62.92E-34 16.72307
ELK1 1.4422 0.9600 0.6657 103 159 18.6E-10 0.82672
FOXD1 1.4349 1.0686 0.7447 110 171 65.84E-12 0.92190
RUNX1 1.4099 0.3943 0.2796 58 78 3.891E-04 0.32519
EBF1 1.3962 0.7257 0.5198 81 116 1.834E-06 0.96183
FEV 1.3918 0.7657 0.5502 93 139 6.067E-08 0.64041
Foxq1 1.3714 0.5086 0.3708 68 91 24.87E-06 0.50232
NFIC 1.3600 3.2286 2.3739 163 277 2.006E-28 4.24120
Hand1::Tcfe2a 1.3451 1.4514 1.0790 129 202 4.904E-16 1.49329
Arnt::Ahr 1.3207 1.6057 1.2158 116 194 31.41E-12 4.73328
TFAP2A 1.3028 2.2057 1.6930 123 206 47.82E-14 9.45650
USF1 1.2958 0.6971 0.5380 66 95 1.826E-04 1.14834
GR 1.2838 0.3200 0.2492 48 68 67.21E-04 0.30261
MAX 1.2709 0.2743 0.2158 39 55 2.694E-02 0.29603
Gata1 1.2697 0.7371 0.5805 90 129 6.321E-08 0.75313
SPI1 1.2555 2.2286 1.7751 152 266 1.787E-22 2.58391
HIF1A::ARNT 1.2253 0.4171 0.3404 48 76 2.605E-02 0.71788
GRMotifH 1.2070 6.2000 5.1368 173 324 9.808E-32 9.77730
Mafb 1.2045 2.1600 1.7933 145 244 2.304E-20 3.94597
En1 0.8235 4.4629 5.4195 169 327 1.136E-28 7.87706
HOXA5 0.8190 5.5714 6.8024 166 319 34.19E-28 16.44160
Spz1 0.8026 0.6343 0.7903 76 166 2.443E-02 0.93420
Cebpa 0.7916 0.7314 0.9240 82 187 1.642E-02 1.02528
Nobox 0.7745 1.3771 1.7781 108 222 1.108E-06 3.53137
Gfi 0.7687 0.8971 1.1672 104 212 4.224E-06 1.34052
Nkx2-5 0.7678 5.0057 6.5198 165 310 35.87E-28 18.52283
ARID3A 0.7593 2.8457 3.7477 153 287 19.15E-22 8.22234
Prrx2 0.6501 1.8514 2.8480 129 279 1.481E-10 5.46322
Pdx1 0.6403 1.9714 3.0790 129 282 2.154E-10 5.76729
Foxd3 0.6399 1.4743 2.3040 114 211 30.01E-10 7.99180
FOXL1 0.5959 4.4286 7.4316 158 304 36.28E-24 33.96775
Ddit3::Cebpa 0.4082 0.1886 0.4620 28 111 1.819E-02 0.46738

12.25 AR and AR siFOXA1 binding site overlaps (down)

Chromosome specific statistics are shown in Table 77. A histogram of sequence lengths is shown in Figure 73.

chromosome frequency min-length mean-length max-length total-length coverage
1 1 504 504 504 504 2e-06
3 8 340 469 637 3754 1.9e-05
4 3 293 347 427 1040 5e-06
6 1 633 633 633 633 4e-06
7 4 342 448 653 1790 1.1e-05
8 1 523 523 523 523 4e-06
9 1 303 303 303 303 2e-06
11 1 616 616 616 616 5e-06
12 2 279 434 590 869 6e-06
14 3 348 376 431 1128 1.1e-05
16 1 291 291 291 291 3e-06
19 1 362 362 362 362 6e-06
all (12 chromosomes) 27 279 438 653 11813 2e-06

Table 77: Chromosome specific distribution of the regions. The last line represents the overall statistics.

[Histogram: x-axis length (base pairs), y-axis frequency]

Figure 73: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqARSid-gCount component.

property value
genes 19

(a) seqARSid-deNovo-meme1: width=11, sites=3, llr=46, E=420000
(b) seqARSid-deNovo-meme2: width=14, sites=2, llr=39, E=440000
(c) seqARSid-deNovo-meme3: width=15, sites=21, llr=196, E=140000

Figure 74: De novo motifs for the filtered AR and AR siFOXA1 binding site overlaps (down) sequences.

Table 78: Motif enrichments

motif ratio fC fR n1C n1R p1 var
Ar 4445.4444 0.2222 0.0000 5 0 65.44E-04 0.09672
FOXA1pAR 6.7333 0.2593 0.0385 6 2 2.519E-02 0.12788
GRMotifHH 3.1510 0.6667 0.2115 12 9 61.02E-04 0.59429
ARMotifT 3.1293 0.9630 0.3077 21 12 31.94E-08 0.50860
NR3C1 2.9953 0.5185 0.1731 10 8 2.515E-02 0.36287
Gata1 2.0933 0.9259 0.4423 17 17 9.919E-04 0.65174
FOXI1 1.8383 0.7778 0.4231 16 16 20.51E-04 0.58455
NR4A2 1.7119 2.3704 1.3846 26 40 3.566E-06 2.12658
GRMotifH 1.5472 7.1111 4.5962 27 51 5.475E-06 8.37942
Hand1::Tcfe2a 1.5407 1.3333 0.8654 20 30 15.35E-04 1.20448
REL 1.5132 0.8148 0.5385 15 17 80.85E-04 0.90198
Arnt::Ahr 1.4727 1.4444 0.9808 23 29 23.88E-06 1.89062
GRMotifT 1.3374 2.7778 2.0769 26 41 4.447E-06 2.75755
ELK1 1.3298 1.0741 0.8077 18 31 1.471E-02 0.81013
BRCA1 1.2625 2.1852 1.7308 24 41 98.67E-06 3.07660
Fos 1.2560 1.6667 1.3269 24 41 98.67E-06 1.71146
SOX10 1.2388 9.1481 7.3846 27 52 6.463E-06 14.62804
Sox17 1.2286 1.3704 1.1154 19 36 1.387E-02 1.26615
YY1 1.2180 4.3333 3.5577 27 46 2.218E-06 8.07076
Nkx3-2 0.8282 1.5926 1.9231 21 42 48.68E-04 1.95067
Sox5 0.8082 1.7407 2.1538 23 38 2.157E-04 3.67932
Mafb 0.7543 1.7407 2.3077 20 41 1.156E-02 3.53814
FOXL1 0.7374 6.2963 8.5385 26 48 17.39E-06 58.53716
MZF1 1-4 0.7109 3.1852 4.4808 25 50 1.126E-04 10.60110
Foxd3 0.4741 1.1852 2.5000 17 29 2.464E-02 13.71535
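The ratio column stays finite even when fR is 0.0000, as for Ar in Table 78 (4445.4444) and T in Table 84 (3334.3333). Both values are reproduced by (fC + eps) / (fR + eps) with eps = 5e-5 if fC is taken as 6/27 and 4/24 respectively, which print as 0.2222 and 0.1667 and match the 27 and 24 regions of Tables 77 and 83. The pseudocount is therefore an inference from these rows, not a parameter documented in the report; `enrichment_ratio` and `PSEUDOCOUNT` below are hypothetical names.

```python
# Pseudocount inferred from the rows with fR = 0; an assumption, not a documented setting.
PSEUDOCOUNT = 5e-5


def enrichment_ratio(f_case: float, f_ref: float, eps: float = PSEUDOCOUNT) -> float:
    """Ratio of motif frequencies in case (fC) and reference (fR) sequences.

    Adding eps to both terms keeps the ratio finite when fR is zero.
    """
    return (f_case + eps) / (f_ref + eps)


print(f"{enrichment_ratio(6 / 27, 0.0):.4f}")  # 4445.4444
print(f"{enrichment_ratio(4 / 24, 0.0):.4f}")  # 3334.3333
```

For rows with a non-zero fR the pseudocount changes the ratio only slightly, so those values stay close to fC / fR.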

12.26 AR without AR siFOXA1 binding site overlaps

Chromosome specific statistics are shown in Table 79. A histogram of sequence lengths is shown in Figure 75.

chromosome frequency min-length mean-length max-length total-length coverage
1 265 84 417 860 110483 0.000443
10 96 15 382 626 36643 0.00027
11 122 41 403 1026 49226 0.000365
12 91 172 365 680 33223 0.000248
13 71 197 426 1063 30211 0.000262
14 83 91 414 761 34401 0.00032
15 77 230 431 724 33165 0.000323
16 46 234 391 644 17967 0.000199
17 80 129 363 656 29065 0.000358
18 37 15 393 772 14545 0.000186
19 22 153 305 436 6706 0.000113
2 187 128 369 695 68944 0.000283
20 52 239 380 627 19759 0.000314
21 30 204 432 682 12952 0.000269
22 20 198 356 505 7110 0.000139
3 204 114 449 1041 91558 0.000462
4 144 148 377 706 54331 0.000284
5 157 14 398 1152 62504 0.000345
6 176 50 375 770 66061 0.000386
7 116 194 400 890 46458 0.000292
8 127 113 413 936 52498 0.000359
9 84 199 413 775 34697 0.000246
X 60 49 332 528 19927 0.000128
Y 1 197 197 197 197 3e-06
all (24 chromosomes) 2348 14 397 1152 932631 0.000127

Table 79: Chromosome specific distribution of the regions. The last line represents the overall statistics.

[Histogram: x-axis length (base pairs), y-axis frequency]

Figure 75: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqARnSi-gCount component.

property value
genes 6209

(a) seqARnSi-deNovo-meme1: width=15, sites=242, llr=2209, E=7.6e-52
(b) seqARnSi-deNovo-meme2: width=15, sites=13, llr=199, E=2600000
(c) seqARnSi-deNovo-meme3: width=15, sites=56, llr=635, E=2.8e+08

Figure 76: De novo motifs for the filtered AR without AR siFOXA1 binding site overlaps sequences.

Table 80: Motif enrichments

motif ratio fC fR n1C n1R p1 var
FOXA1pAR 4.4736 0.2807 0.0627 591 231 1.687E-134 0.16765
Ar 4.4512 0.0396 0.0089 89 39 1.06E-16 0.02036
FOXA1 3.8378 1.2538 0.3267 1778 1197 0.00E00 0.78816
Foxa2 3.5163 1.1320 0.3219 1685 1044 0.00E00 0.81833
FOXF2 3.0683 0.3688 0.1202 734 483 2.973E-114 0.23006
FOXD1 2.3089 1.5605 0.6758 1870 1986 0.00E00 1.26377
Foxq1 2.1515 0.7415 0.3446 1220 1154 7.587E-168 0.60386
TLX1::NFIC 2.0991 0.0115 0.0055 23 18 64.57E-04 0.01046
FOXI1 1.8123 1.2364 0.6822 1567 1816 3.691E-220 1.36116
ARMotifTT 1.7351 0.0111 0.0064 25 26 4.372E-02 0.00883
Tal1::Gata1 1.5919 0.0481 0.0302 110 130 1.314E-04 0.03720
ARMotifTH 1.5678 0.0524 0.0334 121 143 51.54E-06 0.04018
ARMotifT 1.5239 0.5588 0.3667 1003 1305 16.96E-72 0.46401
AR 1.4850 0.0537 0.0361 120 157 10.38E-04 0.04282
FOXO3 1.3841 1.9991 1.4443 1940 3087 2.694E-268 2.19472
HNF1B 1.3707 0.0997 0.0727 217 303 27.28E-06 0.08720
GRMotifTH 1.3621 0.2773 0.2035 557 796 1.588E-18 0.24099
Stat3 1.3236 0.1678 0.1268 309 441 16.63E-08 0.19140
Evi1 1.3158 0.0422 0.0320 96 134 1.585E-02 0.03726
NR3C1 1.3125 0.2943 0.2242 562 852 17.28E-16 0.27780
Gata1 1.3091 0.7147 0.5459 1169 1749 1.644E-76 0.69457
SRY 1.2560 4.0635 3.2353 2231 3907 0.00E00 6.95774
NKX3-1 1.2454 0.9229 0.7410 1294 1929 1.169E-98 1.25863
GRMotifT 1.2369 2.0596 1.6652 2018 3469 61.38E-280 2.18971
GR 1.2365 0.2509 0.2029 514 770 9.627E-14 0.23805
FOXC1 1.2312 7.3952 6.0066 2326 4319 0.00E00 14.13436
Arnt 1.2303 0.1440 0.1170 234 333 22.78E-06 0.21532
TAL1::TCF3 1.2244 0.2325 0.1899 390 605 14.81E-08 0.30005
SOX9 1.2238 0.6942 0.5672 1123 1801 6.517E-60 0.70436
MZF1 1-4 0.7800 2.8203 3.6159 2041 3932 63.15E-258 7.84957
MZF1 5-13 0.7387 0.9659 1.3076 1384 2980 8.046E-56 1.59602
TFAP2A 0.7338 0.9915 1.3512 1196 2445 1.301E-38 3.35235
NFKB1 0.7310 0.0422 0.0577 72 187 4.239E-02 0.08424
CREB1 0.7262 0.5652 0.7783 965 2230 94.3E-12 0.84489
NF-kappaB 0.6636 0.1269 0.1913 228 651 1.009E-04 0.24470
NFYA 0.6006 0.0311 0.0518 70 204 41.11E-04 0.05565
Klf4 0.5795 0.1103 0.1904 223 689 98.21E-08 0.21288
Ddit3::Cebpa 0.5593 0.2164 0.3869 459 1241 1.635E-02 0.41321
Zfx 0.5228 0.0733 0.1402 157 529 4.401E-08 0.13880

12.27 AR without AR siFOXA1 binding site overlaps (up)

Chromosome specific statistics are shown in Table 81. A histogram of sequence lengths is shown in Figure 77.

chromosome frequency min-length mean-length max-length total-length coverage
1 18 102 470 782 8458 3.4e-05
2 4 289 365 430 1461 6e-06
3 6 298 427 519 2563 1.3e-05
4 2 290 460 629 919 5e-06
5 4 264 376 490 1504 8e-06
6 8 270 399 538 3194 1.9e-05
7 7 400 470 581 3288 2.1e-05
8 3 294 414 515 1241 8e-06
10 4 307 394 539 1578 1.2e-05
11 1 323 323 323 323 2e-06
12 2 172 278 385 557 4e-06
13 2 352 410 467 819 7e-06
15 1 367 367 367 367 4e-06
20 6 296 367 443 2200 3.5e-05
21 4 204 409 532 1636 3.4e-05
all (15 chromosomes) 72 102 418 782 30108 4e-06

Table 81: Chromosome specific distribution of the regions. The last line represents the overall statistics.

[Histogram: x-axis length (base pairs), y-axis frequency]

Figure 77: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqARnSiu-gCount component.

property value
genes 37

(a) seqARnSiu-deNovo-meme1: width=11, sites=53, llr=450, E=6
(b) seqARnSiu-deNovo-meme2: width=15, sites=13, llr=173, E=75000
(c) seqARnSiu-deNovo-meme3: width=15, sites=5, llr=87, E=730000

Figure 78: De novo motifs for the filtered AR without AR siFOXA1 binding site overlaps (up) sequences.

Table 82: Motif enrichments

motif ratio fC fR n1C n1R p1 var
HNF1B 5.5316 0.1250 0.0226 7 3 3.845E-02 0.07499
Foxa2 3.8993 1.3194 0.3383 57 37 1.847E-16 0.78623
FOXA1 3.7296 1.4583 0.3910 58 42 6.143E-16 0.94491
FOXA1pAR 2.4619 0.1667 0.0677 12 8 1.859E-02 0.10220
FOXD1 2.1170 1.4167 0.6692 60 58 3.974E-14 1.04433
FOXI1 2.0729 1.4028 0.6767 52 54 5.136E-10 1.33845
FOXF2 2.0413 0.2917 0.1429 18 17 1.556E-02 0.21664
NKX3-1 1.7372 1.0972 0.6316 48 54 5.494E-08 1.07547
NFIL3 1.7197 0.3750 0.2180 23 20 14.89E-04 0.36619
RUNX1 1.7151 0.3611 0.2105 22 28 3.67E-02 0.23419
FOXO3 1.7074 2.3750 1.3910 65 90 45.09E-14 2.37145
Foxq1 1.5833 0.5833 0.3684 31 37 13.17E-04 0.54218
EBF1 1.4650 0.6389 0.4361 29 36 39.82E-04 0.95705
Gata1 1.3854 0.6667 0.4812 31 50 2.262E-02 0.67064
SRY 1.3745 4.3611 3.1729 69 116 26.19E-14 7.10579
ARMotifT 1.3634 0.4306 0.3158 26 37 3.04E-02 0.34806
FOXC1 1.2742 8.2778 6.4962 72 128 1.591E-14 17.72525
ELK1 1.2569 0.9167 0.7293 45 69 51.23E-06 0.82056
Myb 1.2569 0.9167 0.7293 44 63 33.03E-06 1.01664
GRMotifH 1.2541 6.1667 4.9173 70 127 38.8E-14 10.63238
BRCA1 1.2342 2.0694 1.6767 62 107 12.97E-10 2.27920
SOX9 1.2164 0.7500 0.6165 38 59 15.45E-04 0.65576
YY1 0.8149 3.2778 4.0226 67 129 56.51E-12 5.42788
NR4A2 0.7904 1.2778 1.6165 50 96 81.24E-06 2.04534
MZF1 1-4 0.7547 2.9167 3.8647 67 125 29.37E-12 9.35806
TFAP2A 0.6717 1.0556 1.5714 37 77 3.963E-02 3.73912
Arnt::Ahr 0.5534 0.9444 1.7068 42 76 18.14E-04 6.29651

12.28 AR without AR siFOXA1 binding site overlaps (down)

Chromosome specific statistics are shown in Table 83. A histogram of sequence lengths is shown in Figure 79.

chromosome frequency min-length mean-length max-length total-length coverage
3 5 467 522 592 2612 1.3e-05
4 1 290 290 290 290 2e-06
6 4 308 502 765 2010 1.2e-05
7 2 284 442 600 884 6e-06
9 1 260 260 260 260 2e-06
10 1 387 387 387 387 3e-06
11 4 231 422 795 1687 1.2e-05
12 1 498 498 498 498 4e-06
14 3 414 613 733 1838 1.7e-05
16 1 519 519 519 519 6e-06
18 1 298 298 298 298 4e-06
all (11 chromosomes) 24 231 470 795 11283 2e-06

Table 83: Chromosome specific distribution of the regions. The last line represents the overall statistics.

[Histogram: x-axis length (base pairs), y-axis frequency]

Figure 79: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqARnSid-gCount component.

property value
genes 17

(a) seqARnSid-deNovo-meme1: width=11, sites=18, llr=172, E=6200
(b) seqARnSid-deNovo-meme2: width=11, sites=2, llr=32, E=1300000
(c) seqARnSid-deNovo-meme3: width=9, sites=2, llr=27, E=1100000

Figure 80: De novo motifs for the filtered AR without AR siFOXA1 binding site overlaps (down) sequences.

Table 84: Motif enrichments

motif ratio fC fR n1C n1R p1 var
T 3334.3333 0.1667 0.0000 4 0 2.431E-02 0.05619
FOXA1pAR 20.1246 0.4583 0.0227 9 1 3.113E-04 0.23705
FOXF2 6.1074 0.4167 0.0682 9 3 29.32E-04 0.18679
FOXA1 5.8324 1.4583 0.2500 20 11 43.86E-08 0.81914
Foxa2 4.0892 1.2083 0.2955 19 12 4.08E-06 0.59789
Foxq1 4.0733 0.8333 0.2045 14 9 6.655E-04 0.48705
FOXD1 2.1999 1.7500 0.7955 22 20 1.48E-06 1.37028
Nr2e3 2.1387 0.8750 0.4091 11 8 1.033E-02 1.47212
FOXI1 1.8888 1.4167 0.7500 17 20 18.86E-04 1.44754
Gata1 1.5714 1.0000 0.6364 17 19 13.77E-04 0.74978
Foxd3 1.5685 3.2083 2.0455 21 30 1.656E-04 8.69952
SRY 1.5617 5.7500 3.6818 24 40 12.4E-06 10.06673
NFATC2 1.4667 2.5000 1.7045 21 34 4.005E-04 3.47739
Hand1::Tcfe2a 1.4309 1.3333 0.9318 17 26 87.53E-04 1.47212
NKX3-1 1.3750 1.1250 0.8182 18 21 7.747E-04 1.02436
Gfi 1.3652 1.4583 1.0682 16 28 3.328E-02 1.86743
HOXA5 1.3187 8.5417 6.4773 24 42 18.01E-06 17.71817
Cebpa 1.2799 1.5417 1.2045 18 32 1.001E-02 1.41615
Pdx1 1.2536 3.3333 2.6591 23 37 36.91E-06 3.91462
FOXO3 1.2433 2.4583 1.9773 21 34 4.005E-04 3.41089
Nkx2-5 1.2407 8.3750 6.7500 24 43 21.51E-06 16.84899
GATA3 1.2136 7.8333 6.4545 24 44 25.54E-06 13.22037
Fos 0.8294 1.5833 1.9091 18 37 2.182E-02 2.22564
Pax2 0.8209 2.5000 3.0455 21 41 13.91E-04 3.23178
Nkx3-2 0.8105 1.7500 2.1591 22 40 3.008E-04 2.67142
MZF1 5-13 0.8088 1.2500 1.5455 16 30 4.624E-02 2.75768
Mafb 0.8075 1.5417 1.9091 20 34 14.86E-04 3.18942
YY1 0.8049 3.7500 4.6591 22 44 5.736E-04 5.98837
FOXL1 0.7573 8.4167 11.1136 24 41 14.99E-06 124.79434
MZF1 1-4 0.6491 3.0833 4.7500 21 40 11.87E-04 13.54061

12.29 AR siFOXA1 without AR binding site overlaps

Chromosome-specific statistics are shown in Table 85. A histogram of sequence lengths is shown in Figure 81.
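The per-chromosome rows of these tables can be derived from a list of regions. The Python sketch below is a minimal version, assuming half-open (chrom, start, end) coordinates, a mean rounded to the nearest base pair, and coverage defined as the summed region length divided by the chromosome length. The hg19 lengths are an assumption, but they are consistent with Table 85: 747009 / 249,250,621 gives the listed chr1 coverage 0.002997, and 10593 / 59,373,566 gives the chrY value 0.000178. The names `chromosome_stats` and `HG19_LENGTHS` are hypothetical, and the overall ("all") row is not reproduced here.

```python
from collections import defaultdict

# Illustrative subset of hg19 chromosome lengths in bp (assumed genome build).
HG19_LENGTHS = {"1": 249_250_621, "Y": 59_373_566}


def chromosome_stats(regions, chrom_lengths):
    """Summarise (chrom, start, end) regions per chromosome.

    Coordinates are assumed half-open, so a region's length is end - start.
    Returns {chrom: (frequency, min, mean, max, total, coverage)}, where the
    mean is rounded to the nearest bp and coverage is the total region
    length divided by the chromosome length.
    """
    lengths = defaultdict(list)
    for chrom, start, end in regions:
        lengths[chrom].append(end - start)
    stats = {}
    for chrom, ls in lengths.items():
        total = sum(ls)
        stats[chrom] = (len(ls), min(ls), round(total / len(ls)),
                        max(ls), total, total / chrom_lengths[chrom])
    return stats


example = [("1", 1000, 1500), ("1", 5000, 5300), ("Y", 100, 344)]
for chrom, row in chromosome_stats(example, HG19_LENGTHS).items():
    print(chrom, row[:5], f"{row[5]:.3g}")
```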

chromosome            frequency  min length  mean length  max length  total length  coverage
1                     1540       0           485          1678        747009        0.002997
10                    624        13          461          1137        287376        0.00212
11                    697        4           470          1448        327695        0.002427
12                    473        3           448          1247        212132        0.001585
13                    383        60          548          1935        210057        0.001824
14                    413        180         499          1395        206184        0.001921
15                    491        8           471          1108        231184        0.002255
16                    445        1           466          1768        207189        0.002293
17                    716        10          476          2184        341109        0.004201
18                    257        4           441          994         113342        0.001452
19                    259        2           411          918         106396        0.001799
2                     904        10          463          1801        418248        0.00172
20                    363        15          458          1424        166234        0.002638
21                    153        7           450          1255        68844         0.00143
22                    236        7           436          981         102992        0.002007
3                     968        3           511          2174        494787        0.002499
4                     461        9           434          1278        200203        0.001047
5                     730        7           457          1482        333459        0.001843
6                     537        4           434          1321        233297        0.001363
7                     635        110         461          1174        292972        0.001841
8                     596        1           455          1307        271024        0.001852
9                     607        11          464          1118        281877        0.001996
X                     310        68          394          1115        122219        0.000787
Y                     15         244         706          1774        10593         0.000178
all (24 chromosomes)  12813      0           467          2184        5986422       0.000814

Table 85: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

[Histogram: x-axis length (base pairs), y-axis frequency.]

Figure 81: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.
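The dashed curve described in these figure captions can be drawn by scaling a normal density to the histogram's count axis. The Python sketch below shows one way to compute it, assuming equal-width bins and the sample standard deviation (n − 1 denominator). The function name `normal_overlay` and the bin width are illustrative, not taken from Anduril's plotting code.

```python
import math
from statistics import mean, stdev


def normal_overlay(lengths, bin_width):
    """Return f(x): the expected histogram count for a bin centred at x.

    The curve is the normal density with the sample mean and sample
    standard deviation of `lengths`, scaled by the number of sequences
    and the bin width so it shares the histogram's frequency axis.
    """
    n = len(lengths)
    mu = mean(lengths)
    sigma = stdev(lengths)  # sample standard deviation (n - 1 denominator)
    norm = n * bin_width / (sigma * math.sqrt(2 * math.pi))

    def expected_count(x):
        z = (x - mu) / sigma
        return norm * math.exp(-0.5 * z * z)

    return expected_count


curve = normal_overlay([300, 400, 500], bin_width=50)
print(round(curve(400), 4))  # 0.5984
```

Because the density is multiplied by n times the bin width, the curve's values summed over all bins come out close to the number of sequences, which is what lets it sit on the same axis as the bars.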

The following table shows the properties of the seqSinAR-gCount component.

property  value
genes     18795

(a) seqSinAR-deNovo-meme1: width=15, sites=340, llr=3105, E=2.3e-95
(b) seqSinAR-deNovo-meme2: width=15, sites=293, llr=2698, E=1.6e-45
(c) seqSinAR-deNovo-meme3: width=15, sites=70, llr=821, E=4.1

Figure 82: De novo motifs for the filtered AR siFOXA1 without AR binding site overlaps sequences.

Table 86: Motif enrichments

motif           ratio      fC       fR       n1C    n1R    p1          var
Ar              12.1587    0.1249   0.0102   1436   234    0.00E00     0.05925
REST            6.2143     0.0008   0.0001   9      2      31.94E-04   0.00038
TLX1::NFIC      5.3926     0.0332   0.0061   322    116    27.26E-66   0.02281
ESR1            4.8326     0.0014   0.0003   17     5      79.13E-06   0.00076
CTCF            3.5917     0.0137   0.0038   173    87     94.19E-28   0.00745
ESR2            2.9800     0.0418   0.0140   460    296    1.18E-52    0.02997
GABPA           2.5654     0.1618   0.0630   1821   1381   12.96E-186  0.11431
ELK4            2.4466     0.0573   0.0234   656    515    4.56E-56    0.04332
PPARG           2.2668     0.0037   0.0016   41     33     3.338E-04   0.00291
Stat3           2.1736     0.3221   0.1482   2972   2776   24.92E-250  0.29645
E2F1            2.1541     0.0293   0.0136   354    315    14.71E-24   0.02051
Tcfcp2l1        2.0854     0.1259   0.0604   1409   1294   8.121E-98   0.09984
PPARG::RXRA     2.0492     0.0351   0.0171   439    399    5.938E-28   0.02409
Tal1::Gata1     2.0184     0.0693   0.0343   850    800    5.022E-52   0.04766
NHLH1           1.9174     0.2296   0.1197   1808   1797   78.86E-114  0.28692
ARMotifTH       1.8709     0.0832   0.0444   992    1020   2.715E-50   0.06278
Mycn            1.8324     0.1967   0.1073   1794   1888   4.309E-98   0.21755
STAT1           1.8204     0.0626   0.0344   624    661    7.466E-28   0.06236
EBF1            1.7845     1.1333   0.6350   7181   9457   0.00E00     1.47383
Egr1            1.7810     0.1453   0.0816   1580   1713   26.85E-78   0.13141
TFAP2A          1.7762     3.0707   1.7287   10152  14538  0.00E00     8.73052
Myf             1.7754     0.4148   0.2336   3563   4108   3.114E-218  0.50218
Myc             1.7625     0.2335   0.1325   2163   2393   5.14E-112   0.25381
RXR::RAR DR5    1.7376     0.0443   0.0255   548    597    1.852E-22   0.03272
MIZF            1.7254     0.0440   0.0255   536    585    7.871E-22   0.03873
GRMotifTH       1.7165     0.4219   0.2458   4276   5069   2.467E-282  0.34546
ARMotifHH       1.7103     0.5799   0.3391   5281   6255   0.00E00     0.52951
Esrrb           1.6580     0.1626   0.0980   1893   2161   7.413E-86   0.12836
Arnt            1.6570     0.2701   0.1630   2137   2512   4.116E-94   0.37463
Zfp423          1.6477     0.1035   0.0628   1017   1160   55.06E-40   0.11506
Pax5            1.6158     0.0854   0.0529   1022   1216   2.208E-34   0.06698
ARMotifT        1.5895     0.6743   0.4242   6099   7868   0.00E00     0.59613
NFIC            1.5670     3.7728   2.4076   12037  20163  0.00E00     5.46375
INSM1           1.5376     0.2800   0.1821   2925   3659   48.01E-130  0.25861
NR3C1           1.5229     0.4092   0.2687   4122   5400   35.76E-212  0.35927
AR              1.5099     0.0660   0.0437   804    1008   12.4E-22    0.05291
MYC::MAX        1.4993     0.0931   0.0621   900    1178   63.35E-22   0.10306
PLAG1           1.4831     0.0277   0.0187   330    418    2.963E-08   0.02525
NFKB1           1.4741     0.1110   0.0753   1018   1282   3.689E-28   0.15383
GRMotifT        1.4622     2.8614   1.9569   11730  19785  0.00E00     3.16362
ELK1            1.4591     1.2096   0.8290   8573   12623  0.00E00     1.20095
SP1             1.4389     1.9853   1.3797   7801   12032  0.00E00     10.74863
Klf4            1.4330     0.3477   0.2426   3253   4494   64.57E-118  0.39088
HIF1A::ARNT     1.4287     0.5780   0.4045   4576   6587   3.78E-206   0.80711
GRMotifTT       1.4240     0.0306   0.0215   384    505    3.197E-08   0.02496
ARMotifH        1.4188     10.6286  7.4915   12747  23598  0.00E00     22.36613
ARMotifTT       1.4063     0.0106   0.0075   135    175    13.22E-04   0.00888
Hand1::Tcfe2a   1.3953     1.6143   1.1570   9733   15585  0.00E00     1.69582
RXRA::VDR       1.3807     0.0090   0.0065   115    150    37.69E-04   0.00791
NR2F1           1.3633     0.1524   0.1118   1799   2439   3.527E-46   0.13297
Zfx             1.3473     0.2605   0.1934   2447   3677   54.63E-52   0.30727
FEV             1.3164     0.7757   0.5892   6683   10098  0.00E00     0.75243
Mafb            1.3129     2.6400   2.0109   11121  18689  0.00E00     3.93512
MZF1 1-4        1.3037     5.5818   4.2814   12283  22002  0.00E00     18.21716
RELA            1.3036     0.1391   0.1067   1486   2160   8.733E-26   0.15284
MAX             1.2929     0.2656   0.2054   2474   3803   36.52E-48   0.31471
Gata1           1.2890     0.7893   0.6124   6487   10267  0.00E00     0.81572
SPI1            1.2779     2.3253   1.8196   11048  18785  0.00E00     2.92798
GR              1.2766     0.3102   0.2429   3315   4847   40.46E-102  0.29771
TAL1::TCF3      1.2640     0.2957   0.2339   2678   3945   25.34E-66   0.37730
NFE2L2          1.2613     0.1626   0.1289   1892   2854   9.47E-32    0.14505
Myb             1.2535     0.9626   0.7679   7508   12218  0.00E00     0.98996
NR4A2           1.2089     2.1827   1.8055   10938  18894  0.00E00     2.65149
USF1            1.2066     0.6534   0.5415   4370   6814   2.059E-148  1.14673
GRMotifHH       1.2036     0.2580   0.2143   2738   4232   4.173E-56   0.26860
En1             0.8262     4.3439   5.2575   12323  23310  0.00E00     8.86479
Gfi             0.8240     0.9244   1.1218   7206   15093  10.93E-266  1.32976
FOXO3           0.7823     1.2910   1.6503   8687   17576  0.00E00     2.26194
FOXD1           0.7580     0.5947   0.7845   5385   12005  3.391E-68   0.89018
Pax6            0.7457     0.0084   0.0112   104    264    1.121E-02   0.01051
SOX9            0.7428     0.4802   0.6465   4617   10547  30.51E-30   0.71729
FOXF2           0.7413     0.0950   0.1281   1140   2777   2.712E-06   0.12372
Cebpa           0.7400     0.6903   0.9328   5784   13180  34.8E-84    1.12760
HOXA5           0.7146     5.0654   7.0882   12180  23256  0.00E00     18.53484
Sox5            0.6806     1.4890   2.1877   8867   18962  0.00E00     3.60063
Nobox           0.6688     1.1531   1.7242   7248   16261  5.923E-224  3.08919
Foxq1           0.6643     0.2626   0.3953   2713   7053   2.464E-04   0.45579
PBX1            0.6553     0.0829   0.1265   977    2514   7.391E-10   0.25456
HNF1B           0.6551     0.0564   0.0862   682    1928   32.67E-16   0.07999
Nkx2-5          0.6401     4.3829   6.8468   11568  22764  0.00E00     22.70034
TBP             0.6181     0.1986   0.3214   2156   5799   65.53E-12   0.42825
SRY             0.6154     2.2998   3.7368   10178  21360  0.00E00     7.98643
HLF             0.6138     0.3400   0.5538   3307   9090   1.801E-04   0.65548
ARID3A          0.5842     2.4062   4.1188   9848   20775  0.00E00     11.16861
Pdx1            0.5742     1.7108   2.9795   8875   20413  0.00E00     5.93662
IRF1            0.5740     0.1607   0.2800   1774   5274   1.42E-26    0.31132
FOXA1pAR        0.5732     0.0442   0.0771   501    1606   5.032E-22   0.08394
Prrx2           0.5427     1.5343   2.8271   8370   19918  0.00E00     5.36920
NFIL3           0.5303     0.1135   0.2141   1186   3790   1.678E-36   0.29145
Ddit3::Cebpa    0.4921     0.2257   0.4586   2533   7488   1.524E-20   0.52184
Pou5f1          0.4889     0.0064   0.0132   82     307    2.298E-08   0.01107
MEF2A           0.4887     0.1775   0.3633   1738   5883   1.28E-52    0.49168
FOXL1           0.4570     3.8714   8.4706   10675  22275  0.00E00     62.86358
Lhx3            0.3879     0.1422   0.3666   1386   5506   1.072E-90   0.53512
Foxd3           0.3823     1.0537   2.7565   5849   17004  2.701E-16   11.23117

12.30 AR siFOXA1 without AR binding site overlaps (up)

Chromosome-specific statistics are shown in Table 87. A histogram of sequence lengths is shown in Figure 83.

chromosome            frequency  min length  mean length  max length  total length  coverage
1                     65         237         520          1404        33774         0.000136
2                     30         213         479          908         14378         5.9e-05
3                     22         145         538          1225        11833         6e-05
4                     20         148         402          595         8050          4.2e-05
5                     19         260         502          783         9535          5.3e-05
6                     33         65          406          819         13406         7.8e-05
7                     9          286         413          577         3720          2.3e-05
8                     11         260         444          675         4884          3.3e-05
9                     4          254         374          496         1498          1.1e-05
10                    30         281         544          1099        16305         0.00012
11                    29         35          457          1175        13267         9.8e-05
12                    10         280         478          710         4784          3.6e-05
13                    7          329         441          686         3085          2.7e-05
14                    6          303         468          699         2808          2.6e-05
15                    29         8           484          952         14030         0.000137
16                    4          382         572          931         2287          2.5e-05
18                    1          433         433          433         433           6e-06
19                    9          39          420          884         3780          6.4e-05
20                    16         328         540          1424        8643          0.000137
21                    8          312         491          965         3930          8.2e-05
22                    3          374         434          508         1303          2.5e-05
all (21 chromosomes)  365        8           481          1424        175733        2.4e-05

Table 87: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

[Histogram: x-axis length (base pairs), y-axis frequency.]

Figure 83: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqSinARu-gCount component.

property  value
genes     78

(a) seqSinARu-deNovo-meme1: width=15, sites=207, llr=1842, E=1.7e-21
(b) seqSinARu-deNovo-meme2: width=15, sites=37, llr=462, E=370
(c) seqSinARu-deNovo-meme3: width=15, sites=41, llr=493, E=3300000

Figure 84: De novo motifs for the filtered AR siFOXA1 without AR binding site overlaps (up) sequences.

Table 88: Motif enrichments

motif           ratio      fC       fR       n1C    n1R    p1          var
Ar              12.2744    0.1071   0.0087   36     6      2.652E-12   0.04657
TLX1::NFIC      10.6462    0.0467   0.0043   14     3      82.15E-06   0.02431
ESR2            3.7807     0.0330   0.0087   12     5      35.5E-04    0.01868
Tal1::Gata1     3.5681     0.0879   0.0246   30     17     19.95E-06   0.04813
RXR::RAR DR5    3.3840     0.0687   0.0203   23     14     4.723E-04   0.03943
ELK4            3.0611     0.0577   0.0188   18     11     25.57E-04   0.04071
GABPA           2.7599     0.1758   0.0637   56     39     5.214E-08   0.11854
Zfp423          2.3492     0.1429   0.0608   40     33     1.079E-04   0.12868
MIZF            2.1194     0.0522   0.0246   19     17     2.465E-02   0.03299
Mycn            1.9991     0.2170   0.1085   55     58     2.061E-04   0.21586
NFKB1           1.8979     0.1731   0.0912   37     45     2.232E-02   0.20963
Arnt            1.8458     0.2912   0.1577   64     71     98.12E-06   0.35786
TFAP2A          1.8209     3.1676   1.7395   285    413    19.02E-42   9.63767
NHLH1           1.7926     0.2335   0.1302   54     54     98.73E-06   0.31117
Tcfcp2l1        1.7883     0.1346   0.0753   48     43     55.72E-06   0.10942
Stat3           1.7773     0.2830   0.1592   80     84     94.42E-08   0.25996
Myf             1.7662     0.4780   0.2706   112    134    1.893E-08   0.55737
GRMotifTH       1.7497     0.4533   0.2590   125    151    7.677E-10   0.36607
Myc             1.7477     0.2555   0.1462   65     76     2.296E-04   0.28494
Zfx             1.7243     0.3269   0.1896   82     103    54.89E-06   0.36315
Egr1            1.7172     0.1566   0.0912   51     56     8.195E-04   0.12747
ARMotifHH       1.6573     0.6236   0.3763   153    205    13.44E-12   0.58652
HIF1A::ARNT     1.6564     0.6209   0.3748   147    184    3.831E-12   0.66228
RELA            1.6434     0.1951   0.1187   53     68     63.2E-04    0.20950
SP1             1.6069     2.1511   1.3386   228    353    12.09E-22   5.57194
NFIC            1.5597     3.9231   2.5152   337    583    2.359E-56   6.00285
ELK1            1.5232     1.2830   0.8423   252    373    15.85E-30   1.22388
EBF1            1.5232     1.1044   0.7250   197    292    2.184E-16   1.59023
Hand1::Tcfe2a   1.5139     1.7418   1.1505   285    457    35.93E-38   1.72051
INSM1           1.5009     0.2802   0.1867   85     116    2.268E-04   0.24328
Klf4            1.4992     0.3819   0.2547   96     135    77.74E-06   0.45251
ARMotifT        1.4484     0.6456   0.4457   161    236    75.32E-12   0.61814
GRMotifHH       1.4415     0.2775   0.1925   78     116    34.97E-04   0.26765
ARMotifH        1.4111     11.0604  7.8379   364    683    13.49E-68   26.82575
GRMotifT        1.3872     2.8929   2.0854   333    588    23.93E-54   3.35506
SPI1            1.3871     2.4973   1.8003   317    558    5.743E-46   2.86646
Arnt::Ahr       1.3566     1.7747   1.3082   269    419    75.61E-34   4.52063
MZF1 1-4        1.3474     5.9121   4.3878   350    648    4.804E-60   14.06561
NF-kappaB       1.3266     0.3379   0.2547   77     124    1.546E-02   0.48601
GR              1.2929     0.3462   0.2677   96     156    22.9E-04    0.33901
FEV             1.2907     0.7995   0.6194   199    300    2.62E-16    0.76754
Gata1           1.2810     0.7582   0.5919   186    303    3.746E-12   0.70990
NR4A2           1.2528     2.3297   1.8596   312    548    6.859E-44   2.78795
ELF5            1.2304     2.3736   1.9291   322    542    2.759E-50   3.09281
ZNF354C         1.2297     4.2692   3.4718   354    620    45.84E-66   7.76416
Mafb            1.2174     2.7308   2.2431   318    567    7.783E-46   4.33346
USF1            1.2023     0.6264   0.5210   125    193    5.688E-06   0.99837
Sox17           0.8271     1.2005   1.4515   239    503    57.37E-16   1.81452
Gfi             0.7993     0.9231   1.1548   207    442    32.2E-10    1.30084
FOXO3           0.7600     1.2857   1.6918   248    523    83.16E-18   2.14700
HOXA5           0.7154     5.2363   7.3198   345    678    1.731E-54   18.38615
Cebpa           0.6774     0.6951   1.0260   161    390    67.78E-04   1.28539
Nobox           0.6769     1.2088   1.7858   210    482    3.772E-08   3.10988
FOXD1           0.6547     0.5742   0.8770   151    370    2.804E-02   1.02220
Nkx2-5          0.6064     4.2857   7.0680   328    659    44.85E-46   23.38869
SRY             0.5882     2.2610   3.8437   283    624    1.902E-24   8.29274
ARID3A          0.5672     2.3819   4.1997   266    602    61.47E-20   11.13302
Pdx1            0.5662     1.7610   3.1100   244    592    3.012E-12   6.44754
Sox5            0.5623     1.3077   2.3256   238    573    23.47E-12   3.98322
Prrx2           0.5448     1.5714   2.8842   239    578    20.07E-12   5.72559
Lhx3            0.5129     0.1841   0.3589   41     157    23.27E-04   0.55308
HNF1B           0.4290     0.0385   0.0897   13     57     1.325E-02   0.07830
FOXL1           0.4281     3.7665   8.7988   298    646    6.627E-30   56.15358
Pou5f1          0.1377     0.0027   0.0203   1      14     4.775E-02   0.01403

12.31 AR siFOXA1 without AR binding site overlaps (down)

Chromosome-specific statistics are shown in Table 89. A histogram of sequence lengths is shown in Figure 85.

chromosome            frequency  min length  mean length  max length  total length  coverage
1                     14         327         541          817         7577          3e-05
2                     1          526         526          526         526           2e-06
3                     20         277         528          1311        10559         5.3e-05
4                     10         266         467          681         4673          2.4e-05
6                     3          478         526          616         1579          9e-06
7                     3          398         545          688         1635          1e-05
8                     1          323         323          323         323           2e-06
9                     6          245         424          741         2543          1.8e-05
10                    1          320         320          320         320           2e-06
11                    4          293         370          592         1482          1.1e-05
12                    4          258         416          525         1665          1.2e-05
13                    3          390         601          822         1802          1.6e-05
14                    5          405         549          700         2747          2.6e-05
16                    5          250         421          617         2107          2.3e-05
18                    5          247         375          682         1875          2.4e-05
19                    6          273         370          530         2223          3.8e-05
all (16 chromosomes)  91         245         480          1311        43636         6e-06

Table 89: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

[Histogram: x-axis length (base pairs), y-axis frequency.]

Figure 85: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqSinARd-gCount component.

property  value
genes     33

(a) seqSinARd-deNovo-meme1: width=15, sites=37, llr=385, E=2e+05
(b) seqSinARd-deNovo-meme2: width=8, sites=19, llr=200, E=3400000
(c) seqSinARd-deNovo-meme3: width=11, sites=15, llr=177, E=5500000

Figure 86: De novo motifs for the filtered AR siFOXA1 without AR binding site overlaps (down) sequences.

Table 90: Motif enrichments

motif           ratio      fC       fR       n1C    n1R    p1          var
Ar              4.6625     0.1099   0.0235   9      3      66.71E-04   0.06634
GABPA           3.5001     0.1648   0.0471   13     7      50.74E-04   0.10374
Stat3           2.8362     0.4505   0.1588   27     23     4.676E-04   0.41648
NHLH1           2.2827     0.2418   0.1059   13     10     2.748E-02   0.30719
Myf             2.2511     0.5165   0.2294   30     36     49.7E-04    0.47563
FEV             2.0015     0.9890   0.4941   57     63     53.63E-10   0.75385
EBF1            1.8512     1.2088   0.6529   56     66     3.839E-08   1.47642
ARMotifHH       1.7623     0.5495   0.3118   37     39     1.032E-04   0.50136
TFAP2A          1.7361     2.8901   1.6647   71     109    2.37E-10    7.37613
GR              1.6345     0.3846   0.2353   31     33     10.83E-04   0.29019
Arnt            1.5933     0.3187   0.2000   20     20     1.692E-02   0.39151
SPI1            1.5700     2.6044   1.6588   83     129    20.23E-16   3.23448
TAL1::TCF3      1.5239     0.3407   0.2235   22     27     4.238E-02   0.40292
HIF1A::ARNT     1.5161     0.6154   0.4059   38     51     14.54E-04   0.65052
GRMotifTH       1.4706     0.4066   0.2765   30     40     1.326E-02   0.32679
NFIC            1.4624     3.4066   2.3294   84     139    35.88E-16   5.10109
ELF5            1.4258     2.3736   1.6647   79     130    78.54E-14   2.55759
ARMotifT        1.4114     0.7473   0.5294   50     68     14.7E-06    0.60136
Myb             1.3572     0.9341   0.6882   57     82     1.018E-06   0.87563
MZF1 1-4        1.3559     5.9341   4.3765   90     157    7.156E-18   16.01273
ARMotifH        1.3507     10.3846  7.6882   91     170    12.59E-18   20.69596
SP1             1.3402     2.1758   1.6235   54     94     1.247E-04   9.56605
Gata1           1.3225     0.8791   0.6647   49     78     2.639E-04   0.82417
ETS1            1.2585     8.1209   6.4529   90     169    58.64E-18   15.80265
ELK1            1.2531     1.1868   0.9471   61     104    2.512E-06   1.22983
GRMotifT        1.2301     2.9451   2.3941   82     147    30.67E-14   3.34350
Hand1::Tcfe2a   1.2247     1.2967   1.0588   63     103    33.31E-08   1.25290
Gfi             0.8086     0.9560   1.1824   54     109    11.1E-04    1.51618
HOXA5           0.7932     5.6923   7.1765   90     167    41.92E-18   17.91789
Sox5            0.7797     1.6923   2.1706   63     135    48.82E-06   3.48075
Nkx2-5          0.7129     4.9780   6.9824   84     163    26.1E-14    23.43469
Nobox           0.6365     1.1868   1.8647   59     116    87.19E-06   3.12673
Prrx2           0.6356     1.8132   2.8529   61     139    3.461E-04   6.37395
Pdx1            0.6335     1.9451   3.0706   69     147    1.283E-06   6.97294
SRY             0.6173     2.4945   4.0412   73     155    6.982E-08   7.67403
ARID3A          0.6141     2.6044   4.2412   70     151    83.64E-08   10.40640
FOXL1           0.4408     3.8242   8.6765   83     160    67.49E-14   63.99976
Lhx3            0.3063     0.1099   0.3588   7      36     4.235E-02   0.64495

12.32 FoxA1 and FoxA1 siFOXA1 binding site overlaps

Chromosome-specific statistics are shown in Table 91. A histogram of sequence lengths is shown in Figure 87.

chromosome            frequency  min length  mean length  max length  total length  coverage
1                     154        113         327          750         50309         0.000202
10                    168        9           280          561         47122         0.000348
11                    217        47          294          667         63737         0.000472
12                    160        118         284          717         45461         0.00034
13                    149        14          342          919         50901         0.000442
14                    148        101         308          731         45624         0.000425
15                    171        158         299          749         51213         0.000499
16                    129        124         275          495         35469         0.000393
17                    164        15          285          665         46734         0.000576
18                    86         21          318          804         27380         0.000351
19                    60         10          259          423         15536         0.000263
2                     326        69          299          599         97486         0.000401
20                    98         6           283          492         27729         0.00044
21                    64         168         292          478         18715         0.000389
22                    66         161         265          483         17511         0.000341
3                     376        2           311          775         116755        0.00059
4                     184        39          284          650         52260         0.000273
5                     260        120         287          627         74691         0.000413
6                     224        40          277          566         61995         0.000362
7                     233        4           300          878         69994         0.00044
8                     271        15          298          564         80720         0.000552
9                     184        28          292          666         53743         0.000381
X                     45         97          267          397         12017         7.7e-05
all (23 chromosomes)  3937       2           295          919         1163102       0.000158

Table 91: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

[Histogram: x-axis length (base pairs), y-axis frequency.]

Figure 87: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqFXSi-gCount component.

property  value
genes     9347

(a) seqFXSi-deNovo-meme1: width=15, sites=397, llr=3616, E=1.99997773436537e-319
(b) seqFXSi-deNovo-meme2: width=15, sites=53, llr=622, E=2.8e-05
(c) seqFXSi-deNovo-meme3: width=15, sites=58, llr=658, E=0.065

Figure 88: De novo motifs for the filtered FoxA1 and FoxA1 siFOXA1 binding site overlaps sequences.

Table 92: Motif enrichments

motif           ratio      fC       fR       n1C    n1R    p1          var
Ar              6.0284     0.0363   0.0060   137    42     6.352E-32   0.01790
Foxa2           4.9657     1.1795   0.2375   3148   1326   0.00E00     0.75573
FOXA1           4.9521     1.2328   0.2489   3143   1548   0.00E00     0.69471
TLX1::NFIC      4.6867     0.0193   0.0041   58     25     35.87E-12   0.01339
FOXF2           4.4124     0.3869   0.0876   1332   606    2.164E-290  0.19982
CTCF            3.2015     0.0084   0.0026   33     19     26.56E-06   0.00459
FOXD1           2.8706     1.4428   0.5026   3223   2741   0.00E00     1.00263
Foxq1           2.3686     0.6009   0.2537   1782   1527   8.094E-246  0.44017
FOXA1pAR        2.1195     0.1045   0.0493   391    329    99.22E-34   0.07411
GABPA           2.1195     0.0788   0.0372   289    267    9.889E-20   0.05397
Tal1::Gata1     1.9917     0.0483   0.0242   185    176    9.081E-12   0.03280
ELK4            1.8449     0.0239   0.0129   91     92     20.23E-06   0.01771
Stat3           1.7384     0.1645   0.0946   489    547    28.41E-24   0.16704
E2F1            1.7245     0.0127   0.0073   49     54     79.95E-04   0.00931
Evi1            1.5772     0.0333   0.0211   123    152    3.401E-04   0.02684
Gata1           1.5743     0.6042   0.3838   1683   2244   1.776E-112  0.54419
ESR2            1.5680     0.0135   0.0086   50     58     1.451E-02   0.01160
NHLH1           1.5057     0.1101   0.0731   280    340    12.47E-10   0.15164
FOXI1           1.5033     0.7613   0.5064   1932   2506   75.35E-160  0.93060
FOXO3           1.5015     1.5463   1.0298   3019   4363   0.00E00     1.62033
STAT1           1.4742     0.0353   0.0240   113    137    3.869E-04   0.03902
NFIC            1.4719     2.1878   1.4864   3375   5227   0.00E00     2.42572
MIZF            1.4354     0.0229   0.0159   90     116    65.05E-04   0.01819
ARMotifTH       1.4232     0.0391   0.0275   152    199    3.609E-04   0.03145
Tcfcp2l1        1.4161     0.0511   0.0361   185    241    47.6E-06    0.04918
ARMotifT        1.4006     0.3917   0.2797   1226   1749   4.292E-50   0.34978
NFYA            1.3735     0.0488   0.0355   180    237    95.86E-06   0.04688
AR              1.3550     0.0437   0.0323   163    228    17.43E-04   0.03813
TEAD1           1.3271     0.0574   0.0433   218    310    3.13E-04    0.04873
GRMotifTH       1.3103     0.2044   0.1560   716    1038   8.325E-18   0.18517
NR3C1           1.3096     0.2100   0.1603   710    1064   19.98E-16   0.19591
Myb             1.2961     0.6235   0.4811   1746   2702   1.336E-88   0.59160
FEV             1.2961     0.4812   0.3713   1462   2216   11.19E-62   0.44360
TAL1::TCF3      1.2851     0.1851   0.1440   533    784    1.35E-10    0.23059
ELK1            1.2660     0.6444   0.5090   1795   2813   4.577E-92   0.62307
Arnt            1.2582     0.1296   0.1030   354    502    34.4E-08    0.19526
Esrrb           1.2519     0.0770   0.0615   288    433    2.206E-04   0.06883
Hand1::Tcfe2a   1.2423     0.8856   0.7128   2264   3616   70.66E-162  0.86367
ARMotifHH       1.2390     0.2539   0.2050   845    1250   5.49E-22    0.25945
Myf             1.2204     0.1754   0.1437   537    849    11.19E-08   0.22124
GRMotifT        1.2156     1.5160   1.2471   3006   5100   0.00E00     1.53257
Nr2e3           1.2155     0.3648   0.3001   866    1379   30.69E-18   0.53928
NR2F1           1.2087     0.0788   0.0652   291    459    17.22E-04   0.07232
YY1             0.8274     2.3302   2.8163   3434   6645   0.00E00     4.15703
CREB1           0.8133     0.4728   0.5814   1404   2957   6.698E-16   0.66068
MZF1 5-13       0.7834     0.7684   0.9809   1977   4246   30.96E-52   1.18687
Foxd3           0.7831     1.3460   1.7187   2431   4294   15.43E-164  6.24020
Pdx1            0.7718     1.4718   1.9070   2669   5531   2.064E-164  3.20799
FOXL1           0.7317     4.0597   5.5487   3481   6483   0.00E00     40.55307
Prrx2           0.7252     1.3030   1.7968   2527   5369   75.99E-130  2.81174
NF-kappaB       0.7076     0.1136   0.1606   339    940    4.98E-06    0.20746
Zfx             0.6886     0.0806   0.1170   274    749    29.0E-06    0.12518
MEF2A           0.6801     0.1607   0.2363   503    1265   49.25E-04   0.38410
PLAG1           0.6311     0.0076   0.0121   23     87     33.55E-04   0.01469
IRF1            0.6278     0.1139   0.1814   410    1124   8.558E-06   0.18720
Lhx3            0.6254     0.1546   0.2471   471    1200   18.05E-04   0.37897
Ddit3::Cebpa    0.5282     0.1510   0.2859   533    1611   5.66E-10    0.29725

12.33 FoxA1 and FoxA1 siFOXA1 binding site overlaps (up)

Chromosome-specific statistics are shown in Table 93. A histogram of sequence lengths is shown in Figure 89.

chromosome            frequency  min length  mean length  max length  total length  coverage
1                     11         176         371          750         4080          1.6e-05
10                    10         223         304          426         3035          2.2e-05
11                    10         195         263          335         2629          1.9e-05
12                    10         182         293          398         2933          2.2e-05
13                    2          322         340          359         681           6e-06
14                    4          267         336          436         1344          1.3e-05
15                    5          195         308          613         1539          1.5e-05
16                    6          219         327          495         1962          2.2e-05
17                    7          229         363          665         2541          3.1e-05
19                    3          302         365          419         1095          1.9e-05
2                     17         69          283          456         4818          2e-05
20                    7          211         326          432         2285          3.6e-05
21                    3          281         347          411         1042          2.2e-05
22                    1          206         206          206         206           4e-06
3                     17         191         316          442         5375          2.7e-05
4                     9          205         278          357         2501          1.3e-05
5                     16         122         270          353         4321          2.4e-05
6                     21         149         315          564         6607          3.9e-05
7                     17         149         310          461         5272          3.3e-05
8                     13         200         306          477         3973          2.7e-05
9                     10         206         312          666         3121          2.2e-05
X                     3          219         244          266         732           5e-06
all (22 chromosomes)  202        69          307          750         62092         8e-06

Table 93: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

[Histogram: x-axis length (base pairs), y-axis frequency.]

Figure 89: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqFXSiu-gCount component.

property  value
genes     119

(a) seqFXSiu-deNovo-meme1: width=11, sites=177, llr=1357, E=2.3e-49
(b) seqFXSiu-deNovo-meme2: width=15, sites=35, llr=400, E=8.9
(c) seqFXSiu-deNovo-meme3: width=15, sites=64, llr=639, E=0.3

Figure 90: De novo motifs for the filtered FoxA1 and FoxA1 siFOXA1 binding site overlaps (up) sequences.

Table 94: Motif enrichments

motif           ratio      fC       fR       n1C    n1R    p1          var
TLX1::NFIC      595.0594   0.0297   0.0000   4      0      2.386E-02   0.01682
Ar              9.5976     0.0743   0.0077   14     3      59.15E-06   0.03291
FOXA1           4.8054     1.1584   0.2410   155    81     32.5E-50    0.62313
Foxa2           3.9150     1.0743   0.2744   148    81     2.011E-44   0.73211
FOXF2           3.5044     0.3416   0.0974   60     35     5.704E-12   0.19232
Tcfcp2l1        2.6507     0.0545   0.0205   10     7      4.992E-02   0.03789
FOXA1pAR        2.6403     0.1287   0.0487   26     17     1.597E-04   0.07712
FOXD1           2.5321     1.2921   0.5103   150    141    11.36E-32   0.92820
Esrrb           2.4968     0.1089   0.0436   20     17     92.15E-04   0.06841
Foxq1           2.3930     0.5891   0.2462   90     75     1.163E-14   0.44148
Stat3           1.9300     0.1337   0.0692   24     21     42.78E-04   0.11349
NFE2L2          1.9299     0.1089   0.0564   22     21     1.365E-02   0.07230
GRMotifHH       1.8854     0.2079   0.1103   35     38     30.9E-04    0.18747
Myf             1.8190     0.2426   0.1333   37     40     19.32E-04   0.25003
FEV             1.6041     0.5594   0.3487   77     105    6.329E-06   0.55883
GRMotifTH       1.5934     0.2574   0.1615   43     56     60.76E-04   0.21770
FOXO3           1.5728     1.5446   0.9821   149    231    28.67E-20   1.60250
EBF1            1.5560     0.5347   0.3436   73     100    21.55E-06   0.59742
NFIC            1.4416     2.2178   1.5385   168    286    4.635E-24   2.59688
ELK1            1.4266     0.7426   0.5205   98     159    85.97E-08   0.63708
ARMotifT        1.4138     0.4604   0.3256   75     105    20.41E-06   0.38620
Myb             1.3977     0.6881   0.4923   99     148    5.392E-08   0.58872
Gata1           1.3824     0.5743   0.4154   81     126    45.18E-06   0.55745
SOX9            1.3514     0.4851   0.3590   73     117    5.596E-04   0.42355
GRMotifT        1.3496     1.7475   1.2949   169    286    1.291E-24   1.47628
SPI1            1.3489     1.5842   1.1744   153    261    1.692E-18   1.65408
TFAP2A          1.3435     1.4158   1.0538   130    177    2.97E-16    2.85005
ARMotifHH       1.3382     0.3020   0.2256   52     65     6.004E-04   0.30710
ARMotifH        1.3296     6.5594   4.9333   201    376    3.93E-38    10.07093
Nr2e3           1.3089     0.3960   0.3026   47     76     4.313E-02   0.58169
ELF5            1.3025     1.5396   1.1821   141    260    22.79E-14   1.72127
Sox17           1.2489     1.0792   0.8641   126    213    27.84E-12   1.11453
BRCA1           1.2419     1.6782   1.3513   158    277    13.48E-20   1.83956
FOXI1           1.2327     0.6733   0.5462   93     125    2.026E-08   0.93952
GRMotifH        1.2025     4.4307   3.6846   198    372    3.236E-36   6.69342
SP1             0.8207     0.7723   0.9410   72     149    4.241E-02   2.96947
Nkx2-5          0.7825     3.6337   4.6436   184    365    24.52E-28   12.29117
MZF1 5-13       0.7714     0.8168   1.0590   102    233    13.23E-04   1.38015
Gfi             0.7478     0.6040   0.8077   85     201    4.805E-02   0.91780
ARID3A          0.7410     2.1089   2.8462   151    316    9.991E-14   6.70508
Prrx2           0.6651     1.2723   1.9128   129    296    34.06E-08   3.05188
Pdx1            0.6357     1.3267   2.0872   129    306    95.17E-08   3.44911
Foxd3           0.6148     1.1980   1.9487   122    232    1.349E-08   9.15744
FOXL1           0.6132     3.7030   6.0385   173    358    17.84E-22   46.47116
MEF2A           0.4380     0.1089   0.2487   17     72     1.775E-02   0.27594
IRF1            0.4181     0.1040   0.2487   17     78     71.69E-04   0.25800

12.34 FoxA1 and FoxA1 siFOXA1 binding site overlaps (down)

Chromosome-specific statistics are shown in Table 95. A histogram of sequence lengths is shown in Figure 91.

chromosome            frequency  min length  mean length  max length  total length  coverage
1                     16         249         353          506         5646          2.3e-05
2                     7          214         326          444         2280          9e-06
3                     14         199         312          429         4370          2.2e-05
4                     7          212         298          513         2084          1.1e-05
5                     2          296         332          368         664           4e-06
6                     11         40          242          373         2662          1.6e-05
7                     4          242         288          348         1151          7e-06
8                     13         15          269          372         3494          2.4e-05
9                     2          221         313          405         626           4e-06
10                    5          253         308          358         1540          1.1e-05
11                    8          196         254          365         2031          1.5e-05
12                    9          192         244          389         2199          1.6e-05
13                    2          395         510          626         1021          9e-06
14                    8          195         366          627         2931          2.7e-05
15                    7          185         288          378         2014          2e-05
16                    9          141         258          337         2320          2.6e-05
17                    7          15          235          512         1646          2e-05
18                    2          166         206          246         412           5e-06
19                    1          195         195          195         195           3e-06
20                    2          222         252          282         504           8e-06
21                    1          457         457          457         457           9e-06
22                    4          198         259          339         1035          2e-05
all (22 chromosomes)  141        15          293          627         41282         6e-06

Table 95: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

[Histogram: x-axis length (base pairs), y-axis frequency.]

Figure 91: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqFXSid-gCount component.

property  value
genes     82

(a) seqFXSid-deNovo-meme1: width=12, sites=117, llr=985, E=3.3e-55
(b) seqFXSid-deNovo-meme2: width=15, sites=48, llr=500, E=1.7e-07
(c) seqFXSid-deNovo-meme3: width=11, sites=10, llr=129, E=7600000

Figure 92: De novo motifs for the filtered FoxA1 and FoxA1 siFOXA1 binding site overlaps (down) sequences.

Table 96: Motif enrichments

motif           ratio      fC       fR       n1C    n1R    p1          var
FOXA1           4.5516     1.2128   0.2664   107    58     7.47E-32    0.73684
Foxa2           3.8447     1.1135   0.2896   108    54     9.646E-34   0.70035
FOXF2           2.8251     0.2837   0.1004   38     21     13.12E-08   0.18323
FOXD1           2.6949     1.2695   0.4710   113    91     9.967E-28   0.81829
Foxq1           2.5333     0.5674   0.2239   55     52     45.4E-08    0.41201
Gata1           1.9461     0.6312   0.3243   63     64     6.223E-08   0.54681
Stat3           1.9045     0.1986   0.1042   21     20     1.903E-02   0.19909
NFIC            1.5653     2.2482   1.4363   121    181    6.004E-20   2.47167
FOXI1           1.5232     0.7234   0.4749   75     89     87.25E-10   0.69784
Myb             1.4517     0.6950   0.4788   67     98     29.08E-06   0.61351
FOXO3           1.4390     1.5390   1.0695   107    160    1.926E-14   1.50353
SP1             1.4336     0.9078   0.6332   52     90     1.928E-02   2.16752
ELK1            1.2911     0.6879   0.5328   70     105    16.92E-06   0.59383
SOX9            1.2060     0.4610   0.3822   50     72     27.1E-04    0.48812
Nkx2-5          0.8126     3.4043   4.1892   123    225    23.57E-18   11.30811
Foxd3           0.8004     1.2979   1.6216   84     159    12.85E-06   4.76185
FOXL1           0.7479     3.7021   4.9498   124    224    5.291E-18   19.57383
Prrx2           0.6825     1.1489   1.6834   87     184    44.03E-06   2.50123

12.35 FoxA1 without FoxA1 siFOXA1 binding site overlaps

Chromosome-specific statistics are shown in Table 97. A histogram of sequence lengths is shown in Figure 93.

chromosome            frequency  min length  mean length  max length  total length  coverage
1                     1690       7           400          1124        675216        0.002709
10                    649        15          364          922         236099        0.001742
11                    655        6           369          1109        241784        0.001791
12                    582        9           355          872         206470        0.001543
13                    416        5           390          1056        162264        0.001409
14                    482        38          385          1283        185577        0.001729
15                    420        34          364          740         152697        0.001489
16                    321        2           368          1574        118040        0.001306
17                    471        1           358          806         168774        0.002079
18                    309        2           361          773         111424        0.001427
19                    133        14          303          564         40340         0.000682
2                     1097       27          361          1237        395717        0.001627
20                    336        0           367          1050        123373        0.001958
21                    149        174         374          1203        55729         0.001158
22                    113        18          311          723         35158         0.000685
3                     1202       39          404          1089        485859        0.002454
4                     727        2           361          833         262254        0.001372
5                     780        9           374          998         292018        0.001614
6                     790        12          350          837         276667        0.001617
7                     750        22          370          878         277657        0.001745
8                     690        11          373          957         257553        0.00176
9                     595        3           367          852         218591        0.001548
X                     295        83          325          710         95943         0.000618
Y                     16         214         336          461         5370          9e-05
all (24 chromosomes)  13668      0           372          1574        5080574       0.000691

Table 97: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

[Histogram: x-axis length (base pairs), y-axis frequency.]

Figure 93: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqFXnSi-gCount component.

property  value
genes     20533

(a) seqFXnSi-deNovo-meme1: width=11, sites=347, llr=2864, E=2.1e-81
(b) seqFXnSi-deNovo-meme2: width=15, sites=117, llr=1203, E=1.1e-09
(c) seqFXnSi-deNovo-meme3: width=15, sites=66, llr=732, E=1e+05

Figure 94: De novo motifs for the filtered FoxA1 without FoxA1 siFOXA1 binding site overlaps sequences.

Table 98: Motif enrichments

motif           ratio      fC       fR       n1C    n1R    p1          var
CTCF            5.9323     0.0190   0.0032   258    79     1.504E-58   0.00882
Foxa2           3.9683     1.1828   0.2980   10306  5791   0.00E00     0.81108
FOXA1           3.9442     1.2467   0.3160   10461  6630   0.00E00     0.78625
FOXF2           3.5299     0.3895   0.1103   4513   2600   0.00E00     0.22324
TLX1::NFIC      3.1832     0.0157   0.0049   162    96     40.01E-22   0.01290
Ar              2.7369     0.0230   0.0084   289    204    33.88E-30   0.01516
FOXD1           2.5330     1.6262   0.6420   11379  11292  0.00E00     1.27297
Foxq1           2.1610     0.7090   0.3280   6774   6534   0.00E00     0.57734
FOXA1pAR        1.8502     0.1194   0.0645   1486   1457   1.907E-90   0.09846
FOXI1           1.6556     1.0771   0.6506   8177   10397  0.00E00     1.25942
E2F1            1.6071     0.0162   0.0100   213    251    33.26E-08   0.01280
ELK4            1.5817     0.0277   0.0175   343    426    26.63E-10   0.02390
Stat3           1.5581     0.1773   0.1138   1889   2356   35.09E-66   0.18071
GABPA           1.5553     0.0723   0.0465   895    1129   5.447E-24   0.06205
Tal1::Gata1     1.5373     0.0458   0.0298   600    745    2.69E-16    0.03698
FOXO3           1.4795     1.9900   1.3451   11467  17427  0.00E00     2.22471
STAT1           1.4016     0.0389   0.0277   410    580    2.178E-06   0.04386
HNF1B           1.3407     0.0907   0.0676   1158   1620   16.05E-22   0.08025
ARMotifTH       1.3246     0.0459   0.0346   612    862    9.098E-10   0.03963
MIZF            1.3240     0.0232   0.0175   306    443    2.158E-04   0.01994
NKX3-1          1.3166     0.8670   0.6585   7056   10430  0.00E00     1.18010
Gata1           1.3068     0.6392   0.4891   6062   9455   18.8E-300   0.62377
ESR2            1.3016     0.0127   0.0098   152    230    3.715E-02   0.01283
ARMotifTT       1.2974     0.0086   0.0066   117    163    1.438E-02   0.00770
AR              1.2665     0.0447   0.0353   590    883    82.86E-08   0.03924
NHLH1           1.2640     0.1184   0.0937   1051   1503   42.63E-18   0.18159
Evi1            1.2555     0.0361   0.0287   471    705    20.22E-06   0.03313
ARMotifT        1.2425     0.4268   0.3435   4574   7131   18.28E-154  0.40892
Myb             1.2422     0.7445   0.5993   6834   10997  0.00E00     0.75418
TAL1::TCF3      1.2386     0.2279   0.1839   2239   3495   53.72E-36   0.28205
RXR::RAR DR5    1.2114     0.0236   0.0195   314    490    54.55E-04   0.02143
Nr2e3           1.2100     0.4460   0.3686   3623   5743   5.641E-86   0.66250
GRMotifTH       1.2007     0.2311   0.1925   2740   4374   2.671E-46   0.21911
YY1             0.8307     2.9553   3.5576   12454  24051  0.00E00     5.56789
MZF1 1-4        0.8277     2.7519   3.3247   11651  22497  0.00E00     7.88976
INSM1           0.7892     0.1151   0.1459   1407   3218   71.97E-04   0.16007
Klf4            0.7815     0.1453   0.1859   1610   3807   6.993E-04   0.25278
MZF1 5-13       0.7756     0.9468   1.2207   7714   16621  2.615E-268  1.59664
NFKB1           0.7704     0.0420   0.0545   430    994    25.27E-04   0.08507
CREB1           0.7660     0.5598   0.7308   5476   12081  5.266E-64   0.85676
SP1             0.7237     0.7805   1.0785   4579   11067  89.48E-14   4.03171
RREB1           0.7209     0.0252   0.0350   262    603    1.357E-02   0.08858
NF-kappaB       0.6654     0.1246   0.1873   1281   3700   1.031E-22   0.24144
PLAG1           0.6472     0.0094   0.0145   126    345    4.403E-04   0.01439
Zfx             0.5477     0.0816   0.1491   928    3144   8.906E-42   0.16811
Ddit3::Cebpa    0.5401     0.1964   0.3636   2389   6747   1.068E-16   0.40066
EWSR1-FLI1      0.1548     0.0038   0.0249   23     111    28.61E-06   0.22110

12.36 FoxA1 without FoxA1 siFOXA1 binding site overlaps (up)

Chromosome-specific statistics are shown in Table 99. A histogram of sequence lengths is shown in Figure 95.

chromosome  frequency  min length  mean length  max length  total length  coverage
1  120  54  421  1123  50570  0.000203
10  33  15  348  594  11481  8.5e-05
11  31  195  326  563  10107  7.5e-05
12  15  190  357  530  5356  4e-05
13  8  231  322  420  2574  2.2e-05
14  11  206  331  508  3637  3.4e-05
15  22  187  329  462  7229  7.1e-05
16  14  162  314  533  4396  4.9e-05
17  21  37  345  575  7255  8.9e-05
18  2  187  232  277  464  6e-06
19  6  241  339  458  2036  3.4e-05
2  34  39  343  632  11657  4.8e-05
20  15  253  401  620  6019  9.6e-05
21  10  192  366  510  3663  7.6e-05
22  4  152  245  317  981  1.9e-05
3  34  226  412  644  14011  7.1e-05
4  18  151  348  575  6262  3.3e-05
5  28  92  372  610  10404  5.8e-05
6  30  195  347  605  10406  6.1e-05
7  35  232  398  627  13934  8.8e-05
8  20  205  353  599  7053  4.8e-05
9  13  216  371  478  4818  3.4e-05
X  16  173  350  572  5597  3.6e-05
all (23 chromosomes)  540  15  370  1123  199910  2.7e-05

Table 99: Chromosome specific distribution of the regions. Lengths (min, mean, max, total) are in base pairs, and the last line gives the overall statistics.

Figure 95: Sequence length distribution (histogram; x axis: length in base pairs, y axis: frequency). The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.
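
The per-chromosome rows of Table 99 can be reproduced from a list of regions, as in the minimal Python sketch below. The (chromosome, start, end) input format with half-open coordinates is an assumption, and so is the coverage definition: dividing the total region length by the hg19 chromosome length matches the printed values (chromosome 1: 50570 / 249,250,621 ≈ 0.000203; chromosome 10: 11481 / 135,534,747 ≈ 8.5e-05), but the report does not say how coverage is computed. The coverage shown in the "all" row is not reproduced here.

```python
from statistics import mean

# Assumption: coverage = summed region length / chromosome length in hg19 (GRCh37).
# Only the three chromosome lengths needed for the example are listed.
HG19_LENGTHS = {"1": 249_250_621, "10": 135_534_747, "X": 155_270_560}


def chromosome_stats(regions):
    """Per-chromosome frequency and length statistics.

    regions: iterable of (chromosome, start, end) with half-open coordinates,
    so each region's length is end - start.
    """
    lengths_by_chrom = {}
    for chrom, start, end in regions:
        lengths_by_chrom.setdefault(chrom, []).append(end - start)
    stats = {}
    for chrom, lengths in lengths_by_chrom.items():
        total = sum(lengths)
        chrom_len = HG19_LENGTHS.get(chrom)
        stats[chrom] = {
            "frequency": len(lengths),
            "min": min(lengths),
            "mean": round(mean(lengths)),  # the report prints whole numbers
            "max": max(lengths),
            "total": total,
            "coverage": total / chrom_len if chrom_len else None,
        }
    return stats


def overall_stats(regions):
    """Counterpart of the 'all' row: chromosome count plus pooled length statistics."""
    regions = list(regions)
    lengths = [end - start for _, start, end in regions]
    return {
        "chromosomes": len({chrom for chrom, _, _ in regions}),
        "frequency": len(lengths),
        "min": min(lengths),
        "mean": round(mean(lengths)),
        "max": max(lengths),
        "total": sum(lengths),
    }


# Chromosome 1 row of Table 99: total 50570 bp, printed coverage 0.000203.
print(f"{50570 / HG19_LENGTHS['1']:.3g}")
```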

The following table shows the properties of the seqFXnSiu-gCount component.

property  value
genes  187

(a) seqFXnSiu-deNovo-meme1: width=11, sites=290, llr=2490, E=3.5e-57
(b) seqFXnSiu-deNovo-meme2: width=15, sites=86, llr=967, E=2.5e-22
(c) seqFXnSiu-deNovo-meme3: width=14, sites=14, llr=207, E=5e+06

Figure 96: De novo motifs for the filtered FoxA1 without FoxA1 siFOXA1 binding site overlaps (up) sequences.
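
The panel names suggest these de novo motifs come from MEME. In MEME output, the E-value estimates how many motifs with the same width and site count, and at least the same log likelihood ratio (llr), would be found in a similarly sized set of random sequences, so a value such as E=5e+06 in panel (c) marks a motif that is not statistically significant. The Python sketch below parses panel captions of the form shown in Figure 96 and keeps motifs under an E-value cut-off; the 0.05 cut-off is an illustrative assumption, not a threshold used in the report.

```python
import re

# Panel captions have the form "<name>: width=W, sites=S, llr=L, E=V".
CAPTION = re.compile(
    r"(?P<name>\S+): width=(?P<width>\d+), sites=(?P<sites>\d+), "
    r"llr=(?P<llr>\d+), E=(?P<evalue>[0-9.eE+-]+)"
)


def parse_caption(text):
    """Parse one de novo motif caption into a dictionary."""
    match = CAPTION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised caption: {text!r}")
    return {
        "name": match["name"],
        "width": int(match["width"]),
        "sites": int(match["sites"]),
        "llr": int(match["llr"]),
        "evalue": float(match["evalue"]),
    }


def significant_motifs(captions, max_evalue=0.05):
    """Motifs with an E-value below max_evalue (illustrative cut-off), best first."""
    motifs = [parse_caption(c) for c in captions]
    return sorted((m for m in motifs if m["evalue"] < max_evalue),
                  key=lambda m: m["evalue"])


FIGURE_96 = [
    "seqFXnSiu-deNovo-meme1: width=11, sites=290, llr=2490, E=3.5e-57",
    "seqFXnSiu-deNovo-meme2: width=15, sites=86, llr=967, E=2.5e-22",
    "seqFXnSiu-deNovo-meme3: width=14, sites=14, llr=207, E=5e+06",
]

for motif in significant_motifs(FIGURE_96):
    print(motif["name"], motif["evalue"])
```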

Table 100: Motif enrichments

motif  ratio  fC  fR  n1C  n1R  p1  var
CTCF  9.0667  0.0185  0.0020  10  2  12.79E-04  0.00773
Ar  5.1488  0.0259  0.0050  13  5  20.04E-04  0.01348
FOXA1  3.8542  1.1630  0.3017  405  252  5.375E-108  0.67970
ARMotifTT  3.6739  0.0148  0.0040  8  4  4.451E-02  0.00773
Foxa2  3.6074  1.0741  0.2977  395  235  3.902E-106  0.69205
FOXF2  3.4003  0.3500  0.1029  169  97  1.094E-30  0.18875
GABPA  2.5833  0.0981  0.0380  47  38  41.79E-06  0.06599
FOXD1  2.4369  1.5241  0.6254  447  445  52.65E-96  1.13150
Foxq1  1.9647  0.6222  0.3167  245  248  2.735E-28  0.51188
NHLH1  1.5939  0.1593  0.0999  60  57  50.55E-06  0.23477
Tcfcp2l1  1.5734  0.0833  0.0529  42  46  74.64E-04  0.07388
TAL1::TCF3  1.4852  0.2315  0.1558  90  115  3.116E-04  0.26608
FOXI1  1.4676  0.9222  0.6284  297  410  2.596E-26  1.03557
GRMotifHH  1.3973  0.1815  0.1299  86  119  27.37E-04  0.15861
NR2F1  1.3967  0.0963  0.0689  49  66  4.344E-02  0.08019
FOXA1pAR  1.3610  0.0870  0.0639  44  55  2.823E-02  0.08377
GRMotifTH  1.3580  0.2537  0.1868  121  170  72.48E-06  0.21551
Myf  1.3526  0.2500  0.1848  99  134  4.117E-04  0.41529
ARMotifT  1.3240  0.4537  0.3427  187  278  1.192E-08  0.42054
FOXO3  1.3186  1.7889  1.3566  440  677  9.753E-62  2.10984
Klf4  1.2907  0.2463  0.1908  93  153  3.717E-02  0.33499
NFIC  1.2695  2.3981  1.8891  461  777  1.878E-64  3.10323
Mafb  1.2648  1.9130  1.5125  416  692  3.481E-48  2.69952
ELK1  1.2223  0.8389  0.6863  299  470  16.36E-22  0.86925
RUNX1  1.2185  0.3056  0.2507  146  211  5.225E-06  0.28162
TFAP2A  1.2003  1.6093  1.3407  335  544  28.77E-28  4.09266
MZF1 5-13  0.8312  0.9981  1.2008  326  644  11.08E-18  1.59483
Pdx1  0.8169  1.9537  2.3916  390  803  22.95E-30  4.80493
Spz1  0.7986  0.5130  0.6424  217  458  2.744E-04  0.75503
Prrx2  0.7777  1.7667  2.2717  380  787  1.192E-26  4.39621
Foxd3  0.7698  1.7204  2.2348  366  661  15.72E-30  9.14897
FOXL1  0.7553  5.1259  6.7862  470  915  43.44E-60  43.61858

12.37 FoxA1 without FoxA1 siFOXA1 binding site overlaps (down)

Chromosome specific statistics are shown in Table 101. A histogram of sequence lengths is shown in Figure 97.

chromosome  frequency  min length  mean length  max length  total length  coverage
1  60  233  386  901  23155  9.3e-05
10  13  205  365  538  4751  3.5e-05
11  19  200  387  628  7345  5.4e-05
12  26  179  334  496  8695  6.5e-05
13  2  498  525  552  1050  9e-06
14  22  250  434  827  9540  8.9e-05
15  19  218  380  631  7227  7e-05
16  14  217  315  608  4406  4.9e-05
17  9  232  371  487  3335  4.1e-05
18  10  241  406  630  4055  5.2e-05
19  8  14  258  389  2067  3.5e-05
2  29  65  335  671  9706  4e-05
20  10  186  315  453  3148  5e-05
21  10  190  338  574  3376  7e-05
22  4  251  296  346  1185  2.3e-05
3  51  39  423  962  21583  0.000109
4  29  154  354  645  10252  5.4e-05
5  7  217  306  459  2140  1.2e-05
6  19  194  377  707  7165  4.2e-05
7  11  255  410  673  4515  2.8e-05
8  25  243  424  780  10596  7.2e-05
9  8  209  382  711  3059  2.2e-05
X  4  208  316  460  1264  8e-06
all (23 chromosomes)  409  14  376  962  153615  2.1e-05

Table 101: Chromosome specific distribution of the regions. Lengths (min, mean, max, total) are in base pairs, and the last line gives the overall statistics.

Figure 97: Sequence length distribution (histogram; x axis: length in base pairs, y axis: frequency). The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.
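
The dashed normal curve in these histograms can be sketched as follows. The report only says that the curve is based on the mean and standard deviation of the lengths, so two details here are assumptions: that the curve is scaled to histogram counts (n × bin width × normal density) and that the sample, not the population, standard deviation is used. The region lengths in the example are hypothetical.

```python
from statistics import NormalDist


def normal_overlay(lengths, bin_edges):
    """Expected histogram count per bin under a normal fit to the lengths.

    Assumes counts are n * bin width * density at the bin centre and that the
    sample standard deviation is used; the report states neither choice.
    NormalDist.from_samples needs at least two lengths.
    """
    fit = NormalDist.from_samples(lengths)  # sample mean and sample standard deviation
    n = len(lengths)
    overlay = []
    for left, right in zip(bin_edges, bin_edges[1:]):
        center = (left + right) / 2
        overlay.append((center, n * (right - left) * fit.pdf(center)))
    return fit, overlay


# Hypothetical region lengths in base pairs, binned in 50 bp steps from 0 to 1000.
lengths = [300, 350, 400, 450, 500]
fit, overlay = normal_overlay(lengths, list(range(0, 1001, 50)))
print(f"mean={fit.mean:.1f} sd={fit.stdev:.1f}")
```

When the bins span the whole distribution, the expected counts add up to roughly the number of regions, which is a quick check that the scaling matches the histogram.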

The following table shows the properties of the seqFXnSid-gCount component.

property  value
genes  146

(a) seqFXnSid-deNovo-meme1: width=11, sites=248, llr=2073, E=1.2e-35
(b) seqFXnSid-deNovo-meme2: width=15, sites=55, llr=630, E=2.1
(c) seqFXnSid-deNovo-meme3: width=15, sites=19, llr=259, E=2.9e+07

Figure 98: De novo motifs for the filtered FoxA1 without FoxA1 siFOXA1 binding site overlaps (down) sequences.

Table 102: Motif enrichments

motif  ratio  fC  fR  n1C  n1R  p1  var
ARMotifTT  9.0892  0.0244  0.0026  10  2  12.82E-04  0.01020
CTCF  8.1821  0.0220  0.0026  8  2  78.27E-04  0.01108
Foxa2  4.1812  1.1174  0.2672  291  162  8.295E-80  0.69779
FOXA1  3.7880  1.2127  0.3201  304  199  4.969E-78  0.74957
FOXF2  3.7211  0.3545  0.0952  122  68  1.166E-22  0.21184
PPARG::RXRA  3.6823  0.0342  0.0093  14  7  43.57E-04  0.01772
TLX1::NFIC  2.8944  0.0269  0.0093  8  4  4.457E-02  0.02553
FOXD1  2.4120  1.5379  0.6376  325  336  8.689E-64  1.29166
GABPA  2.2950  0.1002  0.0437  38  33  6.221E-04  0.06469
Foxq1  1.9673  0.6870  0.3492  204  190  3.024E-28  0.62718
NHLH1  1.8176  0.1467  0.0807  38  39  47.86E-04  0.18422
Stat3  1.7206  0.1980  0.1151  64  75  7.257E-04  0.17506
MAX  1.5891  0.2249  0.1415  70  82  2.922E-04  0.24313
FOXI1  1.5250  1.0147  0.6653  240  317  76.59E-26  1.21532
Esrrb  1.5058  0.1076  0.0714  41  49  1.832E-02  0.09086
TAL1::TCF3  1.4160  0.2323  0.1640  69  93  44.45E-04  0.26790
Myf  1.3696  0.2518  0.1839  75  109  84.39E-04  0.32967
USF1  1.3376  0.5379  0.4021  126  164  18.25E-08  0.85766
Myc  1.3269  0.1369  0.1032  44  58  3.958E-02  0.16373
EBF1  1.3175  0.6675  0.5066  174  260  4.769E-10  0.80465
ARMotifT  1.3079  0.4083  0.3122  126  190  25.02E-06  0.40171
FOXO3  1.2935  1.8582  1.4365  330  536  42.31E-44  2.27227
NFIC  1.2390  2.4059  1.9418  346  588  28.61E-48  3.14366
TFAP2A  1.2316  1.6308  1.3241  247  393  4.074E-20  5.99641
ELK1  1.2184  0.7897  0.6481  211  346  75.74E-14  0.84334
Gata1  1.2170  0.6504  0.5344  180  291  40.43E-10  0.67584
Prrx2  0.8317  1.9022  2.2870  301  594  17.77E-26  4.23377
SP1  0.8076  0.8973  1.1111  156  312  12.05E-04  4.79423
MZF1 5-13  0.8001  1.0244  1.2804  237  503  2.856E-10  1.79871
FOXL1  0.7961  5.8557  7.3558  369  695  2.516E-52  49.82045
YY1  0.7935  2.8264  3.5622  362  707  21.97E-48  5.63267
CREB1  0.7890  0.6064  0.7685  176  376  4.858E-04  0.85489
Zfx  0.5184  0.0733  0.1415  28  92  3.074E-02  0.13478

12.38 FoxA1 siFOXA1 without FoxA1 binding site overlaps

Chromosome specific statistics are shown in Table 103. A histogram of sequence lengths is shown in Figure 99.

chromosome  frequency  min length  mean length  max length  total length  coverage
1  10  213  339  749  3394  1.4e-05
10  14  160  254  325  3561  2.6e-05
11  10  195  285  389  2847  2.1e-05
12  10  179  241  347  2410  1.8e-05
13  10  187  248  350  2484  2.2e-05
14  7  191  270  400  1891  1.8e-05
15  6  203  304  510  1823  1.8e-05
16  7  145  257  360  1802  2e-05
17  10  72  234  446  2340  2.9e-05
18  3  195  218  236  655  8e-06
19  2  204  224  244  448  8e-06
2  13  125  268  725  3486  1.4e-05
20  5  159  244  335  1220  1.9e-05
21  5  189  208  242  1038  2.2e-05
3  19  91  233  372  4433  2.2e-05
4  12  135  230  334  2763  1.4e-05
5  20  179  295  500  5898  3.3e-05
6  8  186  270  391  2157  1.3e-05
7  12  3  239  381  2865  1.8e-05
8  8  142  276  465  2212  1.5e-05
9  8  190  246  359  1967  1.4e-05
X  7  187  241  320  1687  1.1e-05
all (22 chromosomes)  206  3  259  749  53381  7e-06

Table 103: Chromosome specific distribution of the regions. Lengths (min, mean, max, total) are in base pairs, and the last line gives the overall statistics.

Figure 99: Sequence length distribution (histogram; x axis: length in base pairs, y axis: frequency). The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqSinFX-gCount component.

property  value
genes  711

(a) seqSinFX-deNovo-meme1: width=15, sites=21, llr=261, E=120000
(b) seqSinFX-deNovo-meme2: width=15, sites=67, llr=642, E=0.017
(c) seqSinFX-deNovo-meme3: width=15, sites=22, llr=268, E=1200000

Figure 100: De novo motifs for the filtered FoxA1 siFOXA1 without FoxA1 binding site overlaps sequences.
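
The section titles pair binding-site sets as "A and B" or "A without B". One plausible reading, assumed here rather than stated in the report, is that "A without B" keeps the regions of A that overlap no region of B, while "A and B" keeps those that overlap at least one. The report does not give the overlap criterion (minimum overlap, coordinate convention, or whether overlapping intervals are merged), so the minimal Python sketch below simply treats regions as half-open (chromosome, start, end) intervals that must share at least one base. The peak coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_by_overlap(query, reference):
    """Split query regions into those overlapping some reference region and the rest.

    The first list plays the role of an "A and B" set, the second of "A without B".
    This is a simple per-chromosome scan (quadratic in the worst case); large peak
    sets would normally use a sorted sweep or an interval tree instead.
    """
    reference_by_chrom = {}
    for region in reference:
        reference_by_chrom.setdefault(region[0], []).append(region)
    with_overlap, without_overlap = [], []
    for region in query:
        candidates = reference_by_chrom.get(region[0], [])
        if any(overlaps(region, other) for other in candidates):
            with_overlap.append(region)
        else:
            without_overlap.append(region)
    return with_overlap, without_overlap


# Hypothetical FoxA1 peaks in the siFOXA1 condition versus control FoxA1 peaks.
fox_sifoxa1 = [("1", 100, 200), ("1", 500, 600), ("2", 0, 50)]
fox_control = [("1", 150, 160), ("2", 50, 100)]
shared, unique = split_by_overlap(fox_sifoxa1, fox_control)
print(len(shared), len(unique))
```

With half-open coordinates, the adjacent regions ("2", 0, 50) and ("2", 50, 100) do not overlap, so the example yields one shared and two unique regions; a closed-interval convention would change that result.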

Table 104: Motif enrichments

motif  ratio  fC  fR  n1C  n1R  p1  var
Pax4  586.3659  0.0293  0.0000  4  0  2.642E-02  0.01690
RREB1  84.3608  0.8829  0.0104  25  4  68.28E-10  3.05594
PLAG1  14.7219  0.0390  0.0026  6  1  1.43E-02  0.02187
Ar  12.0693  0.0634  0.0052  10  2  10.73E-04  0.03506
STAT1  4.9697  0.0390  0.0078  6  2  4.155E-02  0.02856
Tal1::Gata1  4.8063  0.0878  0.0182  17  7  2.986E-04  0.04411
SP1  4.5397  3.0976  0.6823  87  120  1.203E-06  17.21589
GABPA  3.2316  0.0927  0.0286  19  11  11.2E-04  0.04842
Tcfcp2l1  3.2051  0.0585  0.0182  12  6  76.17E-04  0.03467
FOXA1  3.0011  0.6488  0.2161  115  72  5.128E-26  0.34488
NHLH1  2.4956  0.0976  0.0391  15  10  1.048E-02  0.09000
FOXF2  2.3832  0.2049  0.0859  40  33  42.73E-06  0.11811
MZF1 1-4  2.2052  4.9561  2.2474  181  306  3.324E-28  20.31072
Foxa2  2.2025  0.6195  0.2812  114  77  3.15E-24  0.60415
Gata1  2.1555  0.7073  0.3281  91  105  16.8E-10  0.59577
EBF1  1.8730  0.6439  0.3438  80  90  5.437E-08  0.75794
NFIC  1.6688  1.9122  1.1458  156  244  4.04E-20  2.16454
FOXD1  1.6351  0.7707  0.4714  122  139  1.605E-16  0.59165
FEV  1.6233  0.5073  0.3125  82  99  17.98E-08  0.40274
NR3C1  1.5488  0.2098  0.1354  36  49  4.107E-02  0.16952
Hand1::Tcfe2a  1.5002  0.8439  0.5625  115  158  15.92E-12  0.76546
SPI1  1.4777  1.3854  0.9375  149  221  55.89E-20  1.34671
Foxq1  1.4516  0.3024  0.2083  55  72  12.57E-04  0.23770
ELK1  1.4163  0.6049  0.4271  92  127  23.79E-08  0.54622
TFAP2A  1.3684  1.1902  0.8698  112  165  8.038E-10  2.82448
Myf  1.3588  0.1805  0.1328  31  39  3.859E-02  0.22594
REL  1.3553  0.4341  0.3203  65  91  5.16E-04  0.48928
ARMotifT  1.3210  0.3268  0.2474  57  80  25.49E-04  0.28477
ELF5  1.2858  1.3561  1.0547  146  236  3.026E-16  1.48809
ARMotifH  1.2396  5.2976  4.2734  201  373  4.08E-36  9.04305
Fos  1.2364  1.1366  0.9193  128  219  1.154E-10  1.29759
NR4A2  1.2157  1.1366  0.9349  137  218  4.484E-14  1.18195
GRMotifT  1.2051  1.3463  1.1172  142  250  18.05E-14  1.52577
Nobox  0.7882  0.8293  1.0521  92  211  2.07E-02  1.65411
ARID3A  0.7751  1.9317  2.4922  145  296  14.95E-12  5.69559
Nkx2-5  0.7665  3.1659  4.1302  170  350  58.28E-20  9.16691
Prrx2  0.7271  1.1171  1.5365  130  261  1.491E-08  2.29964
Pdx1  0.7000  1.2341  1.7630  128  272  22.68E-08  2.83602
FOXL1  0.5242  2.7902  5.3229  163  342  3.808E-16  32.68236
Lhx3  0.4523  0.1024  0.2266  14  61  1.558E-02  0.31326
NFIL3  0.3994  0.0634  0.1589  12  48  4.803E-02  0.20868

12.39 FoxA1 siFOXA1 without FoxA1 binding site overlaps (up)

Chromosome specific statistics are shown in Table 105. A histogram of sequence lengths is shown in Figure 101.

chromosome  frequency  min length  mean length  max length  total length  coverage
1  1  213  213  213  213  1e-06
2  1  200  200  200  200  1e-06
3  3  188  381  605  1144  6e-06
4  2  266  298  330  596  3e-06
6  2  286  286  287  573  3e-06
7  2  197  248  299  496  3e-06
9  1  194  194  194  194  1e-06
10  5  210  296  362  1479  1.1e-05
11  1  389  389  389  389  3e-06
12  2  252  260  268  520  4e-06
13  1  213  213  213  213  2e-06
14  2  192  296  400  592  6e-06
all (12 chromosomes)  23  188  287  605  6609  1e-06

Table 105: Chromosome specific distribution of the regions. Lengths (min, mean, max, total) are in base pairs, and the last line gives the overall statistics.

Figure 101: Sequence length distribution (histogram; x axis: length in base pairs, y axis: frequency). The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqSinFXu-gCount component.

property  value
genes  19

(a) seqSinFXu-deNovo-meme1: width=8, sites=14, llr=119, E=39000
(b) seqSinFXu-deNovo-meme2: width=10, sites=2, llr=29, E=240000
(c) seqSinFXu-deNovo-meme3: width=15, sites=11, llr=127, E=360000

Figure 102: De novo motifs for the filtered FoxA1 siFOXA1 without FoxA1 binding site overlaps (up) sequences.

Table 106: Motif enrichments

motif  ratio  fC  fR  n1C  n1R  p1  var
Stat3  8.6003  0.3913  0.0455  7  2  79.54E-04  0.19991
ARMotifHH  4.2073  0.4783  0.1136  8  5  2.784E-02  0.30574
MAX  2.6775  0.3043  0.1136  7  4  3.957E-02  0.17956
ELK1  2.2072  0.6522  0.2955  12  13  2.154E-02  0.36816
GRMotifT  1.9130  2.0000  1.0455  16  30  2.811E-02  2.38896
ELF5  1.9130  1.6522  0.8636  19  26  3.927E-04  1.14835
SPI1  1.8260  1.8261  1.0000  20  29  2.055E-04  1.17594
Hand1::Tcfe2a  1.7713  1.0870  0.6136  14  18  1.471E-02  1.11578
NFIC  1.6662  2.3478  1.4091  17  30  1.052E-02  3.04794
MZF1 1-4  1.6530  3.8696  2.3409  22  38  67.58E-06  14.26956
ARMotifH  1.6210  7.0000  4.3182  23  44  36.99E-06  14.94211
SPIB  1.5979  3.0870  1.9318  21  36  2.167E-04  3.07237
TFAP2A  1.3551  1.4783  1.0909  14  21  3.097E-02  4.17639
NR4A2  1.3308  1.3913  1.0455  18  25  11.25E-04  1.29082
ETS1  1.3217  4.9565  3.7500  21  42  6.141E-04  6.10900
GRMotifH  1.2470  3.8261  3.0682  21  41  5.242E-04  6.70873
AP1  1.2422  2.1739  1.7500  20  36  8.913E-04  2.79195
ARID3A  0.7717  2.0870  2.7045  19  33  18.67E-04  5.61737
En1  0.7573  2.4783  3.2727  22  40  97.7E-06  5.15152
SRY  0.7542  1.7826  2.3636  20  35  7.411E-04  4.07870
HOXA5  0.6597  3.0435  4.6136  20  40  17.42E-04  11.25192
YY1  0.6515  2.0435  3.1364  19  42  77.79E-04  3.76029
Nkx2-5  0.5860  2.9565  5.0455  17  40  4.512E-02  15.82994
FOXL1  0.4492  2.5217  5.6136  18  40  1.745E-02  27.64496

12.40 FoxA1 siFOXA1 without FoxA1 binding site overlaps (down)

Chromosome specific statistics are shown in Table 107. A histogram of sequence lengths is shown in Figure 103.

chromosome  frequency  min length  mean length  max length  total length  coverage
4  1  190  190  190  190  1e-06
6  1  391  391  391  391  2e-06
7  1  207  207  207  207  1e-06
10  1  306  306  306  306  2e-06
12  1  179  179  179  179  1e-06
15  1  510  510  510  510  5e-06
16  1  329  329  329  329  4e-06
all (7 chromosomes)  7  179  302  510  2112  0

Table 107: Chromosome specific distribution of the regions. Lengths (min, mean, max, total) are in base pairs, and the last line gives the overall statistics.

Figure 103: Sequence length distribution (histogram; x axis: length in base pairs, y axis: frequency). The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqSinFXd-gCount component.

property  value
genes  7

(a) seqSinFXd-deNovo-meme1: width=11, sites=6, llr=71, E=44
(b) seqSinFXd-deNovo-meme2: width=8, sites=6, llr=56, E=16000
(c) seqSinFXd-deNovo-meme3: width=14, sites=4, llr=58, E=26000

Figure 104: De novo motifs for the filtered FoxA1 siFOXA1 without FoxA1 binding site overlaps (down) sequences.

Table 108: Motif enrichments

motif  ratio  fC  fR  n1C  n1R  p1  var
FEV  17143.8571  0.8571  0.0000  4  0  1.809E-02  0.45029
Foxa2  17143.8571  0.8571  0.0000  5  0  40.96E-04  0.33918
FOXA1  13.7067  1.1429  0.0833  5  1  1.408E-02  0.70760
MZF1 1-4  3.1746  7.1429  2.2500  7  9  2.747E-02  23.05263
Hand1::Tcfe2a  2.9997  1.0000  0.3333  6  4  2.473E-02  0.36842
Cebpa  2.8568  0.7143  0.2500  5  2  3.337E-02  0.36842
NFIC  2.0357  2.7143  1.3333  7  6  1.077E-02  2.69591
ETS1  1.6049  6.2857  3.9167  7  9  2.747E-02  9.17544
GATA3  1.5184  4.4286  2.9167  7  11  4.27E-02  11.37427
SOX10  1.4505  6.2857  4.3333  7  11  4.27E-02  10.49708
GATA2  1.2952  9.7143  7.5000  7  11  4.27E-02  26.56140
FOXC1  0.8000  4.0000  5.0000  7  11  4.27E-02  11.69006
ZNF354C  0.7619  3.4286  4.5000  7  9  2.747E-02  43.54386
En1  0.6690  2.2857  3.4167  7  10  3.472E-02  3.77778

12.41 AR and FoxA1 binding site overlaps (stable)

Chromosome specific statistics are shown in Table 109. A histogram of sequence lengths is shown in Figure 105.

chromosome  frequency  min length  mean length  max length  total length  coverage
1  326  4  389  838  126747  0.000509
10  116  173  377  758  43723  0.000323
11  145  1  367  861  53241  0.000394
12  91  1  329  754  29972  0.000224
13  56  107  398  1028  22284  0.000193
14  91  6  396  889  36022  0.000336
15  94  207  394  1020  36996  0.000361
16  81  111  336  561  27211  0.000301
17  109  47  359  690  39079  0.000481
18  56  73  354  772  19833  0.000254
19  39  7  323  535  12612  0.000213
2  223  12  360  748  80336  0.00033
20  62  208  366  710  22701  0.00036
21  29  69  375  577  10864  0.000226
22  31  223  331  512  10247  2e-04
3  228  39  402  908  91607  0.000463
4  116  19  352  629  40860  0.000214
5  169  9  376  937  63538  0.000351
6  141  111  346  707  48832  0.000285
7  154  32  380  780  58587  0.000368
8  142  11  371  780  52621  0.00036
9  111  7  354  814  39318  0.000278
X  48  100  319  598  15318  9.9e-05
all (23 chromosomes)  2658  1  370  1028  982549  0.000134

Table 109: Chromosome specific distribution of the regions. Lengths (min, mean, max, total) are in base pairs, and the last line gives the overall statistics.

Figure 105: Sequence length distribution (histogram; x axis: length in base pairs, y axis: frequency). The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqARFXs-gCount component.

property  value
genes  2565

(a) seqARFXs-deNovo-meme1: width=11, sites=380, llr=3087, E=3.1e-97
(b) seqARFXs-deNovo-meme2: width=15, sites=88, llr=950, E=6.3e-06
(c) seqARFXs-deNovo-meme3: width=15, sites=55, llr=649, E=0.002

Figure 106: De novo motifs for the filtered AR and FoxA1 binding site overlaps (stable) sequences.

Table 110: Motif enrichments

motif  ratio  fC  fR  n1C  n1R  p1  var
Ar  11.3070  0.0683  0.0060  167  29  32.35E-52  0.03124
FOXA1  3.7517  1.1535  0.3074  1931  1265  0.00E00  0.72141
CTCF  3.6808  0.0053  0.0014  14  7  41.73E-04  0.00274
Foxa2  3.3612  1.0294  0.3062  1812  1111  0.00E00  0.77870
FOXA1pAR  3.2415  0.1995  0.0615  496  279  2.996E-80  0.11801
FOXF2  3.0050  0.3369  0.1121  770  507  6.226E-118  0.20532
TLX1::NFIC  2.6097  0.0121  0.0046  28  19  5.257E-04  0.00922
FOXD1  2.2232  1.4221  0.6396  2031  2200  0.00E00  1.13685
Foxq1  2.0053  0.6522  0.3252  1267  1248  1.51E-156  0.54640
Tal1::Gata1  1.8867  0.0498  0.0264  127  129  13.23E-08  0.03538
STAT1  1.8866  0.0490  0.0260  99  106  21.07E-06  0.04874
ARMotifTH  1.8197  0.0611  0.0336  161  164  16.87E-10  0.04255
ESR2  1.8173  0.0204  0.0112  51  47  3.303E-04  0.01782
Stat3  1.7312  0.1999  0.1155  406  454  11.08E-22  0.20241
ARMotifT  1.6985  0.5639  0.3320  1142  1343  13.94E-100  0.49469
PPARG::RXRA  1.6202  0.0230  0.0142  57  68  97.72E-04  0.01903
FOXI1  1.6089  1.0426  0.6480  1612  1986  23.15E-190  1.25077
GABPA  1.5040  0.0739  0.0491  184  235  5.506E-06  0.06093
GRMotifTH  1.4949  0.2882  0.1928  643  861  96.51E-28  0.24407
Gata1  1.4054  0.7005  0.4984  1252  1832  71.01E-82  0.67404
AR  1.3975  0.0505  0.0362  127  178  40.21E-04  0.04206
NR3C1  1.3839  0.2991  0.2161  633  938  38.79E-20  0.27454
MIZF  1.3628  0.0245  0.0180  65  87  3.138E-02  0.02062
Esrrb  1.3609  0.1071  0.0787  265  378  4.692E-06  0.09012
GRMotifT  1.3541  2.0785  1.5350  2264  3798  32.62E-320  2.16576
FOXO3  1.3538  1.7816  1.3160  2128  3328  2.608E-286  2.12778
TEAD1  1.2792  0.0785  0.0613  190  298  77.26E-04  0.07032
NFIC  1.2773  2.4451  1.9143  2302  4008  10.0E-324  3.04197
GR  1.2435  0.2452  0.1972  568  849  26.11E-16  0.23263
FEV  1.2351  0.5805  0.4700  1136  1786  57.89E-56  0.56736
ARMotifH  1.2332  7.2961  5.9163  2636  4922  0.00E00  12.31116
Myb  1.2249  0.7397  0.6039  1315  2201  1.505E-68  0.73823
TAL1::TCF3  1.2018  0.2278  0.1896  446  706  4.316E-08  0.28213
NF-kappaB  0.8087  0.1441  0.1782  284  713  2.329E-02  0.23172
Foxd3  0.7993  1.8382  2.2998  1946  3307  9.368E-200  9.00704
FOXL1  0.7913  5.5621  7.0292  2485  4563  0.00E00  52.80627
Prrx2  0.7875  1.7978  2.2829  1986  3923  1.782E-172  3.94760
MZF1 5-13  0.7445  0.9336  1.2541  1502  3275  9.506E-54  1.63024
CREB1  0.7442  0.5564  0.7477  1059  2362  24.36E-14  0.88951
Zfp423  0.7210  0.0377  0.0523  83  221  1.996E-02  0.06177
SP1  0.6818  0.7307  1.0717  918  2201  1.201E-04  3.68346
Klf4  0.6745  0.1245  0.1846  290  737  1.529E-02  0.21900
Zfx  0.5433  0.0807  0.1486  196  640  3.329E-08  0.14553
RREB1  0.5277  0.0181  0.0344  37  106  4.367E-02  0.06448
Ddit3::Cebpa  0.5133  0.1871  0.3646  441  1333  2.099E-06  0.39544
Pax6  0.4839  0.0049  0.0102  13  48  4.275E-02  0.00907
PLAG1  0.4737  0.0075  0.0160  20  79  42.17E-04  0.01315
EWSR1-FLI1  0.0647  0.0008  0.0124  2  19  2.895E-02  0.05845

12.42 AR without FoxA1 binding site overlaps (stable)

Chromosome specific statistics are shown in Table 111. A histogram of sequence lengths is shown in Figure 107.

chromosome  frequency  min length  mean length  max length  total length  coverage
1  146  102  374  819  54540  0.000219
10  57  169  355  742  20262  0.000149
11  78  209  378  640  29507  0.000219
12  54  172  340  575  18380  0.000137
13  21  197  364  620  7642  6.6e-05
14  63  193  385  649  24265  0.000226
15  45  13  357  554  16046  0.000156
16  45  216  374  684  16838  0.000186
17  74  45  371  568  27485  0.000339
18  23  248  371  514  8534  0.000109
19  27  153  347  726  9368  0.000158
2  109  167  358  681  39005  0.00016
20  37  241  372  630  13762  0.000218
21  25  115  382  750  9559  0.000199
22  25  196  335  485  8374  0.000163
3  79  58  393  796  31048  0.000157
4  55  217  346  517  19033  1e-04
5  82  14  365  780  29904  0.000165
6  83  50  348  537  28893  0.000169
7  78  6  369  756  28749  0.000181
8  59  86  362  662  21380  0.000146
9  47  199  356  648  16741  0.000119
X  41  49  318  802  13041  8.4e-05
Y  1  347  347  347  347  6e-06
all (24 chromosomes)  1354  6  364  819  492703  6.7e-05

Table 111: Chromosome specific distribution of the regions. Lengths (min, mean, max, total) are in base pairs, and the last line gives the overall statistics.

Figure 107: Sequence length distribution (histogram; x axis: length in base pairs, y axis: frequency). The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqARnFXs-gCount component.

property  value
genes  1713

(a) seqARnFXs-deNovo-meme1: width=15, sites=251, llr=2485, E=8.1e-150
(b) seqARnFXs-deNovo-meme2: width=15, sites=110, llr=1171, E=1.1e-24
(c) seqARnFXs-deNovo-meme3: width=15, sites=73, llr=819, E=1.6e-08

Figure 108: De novo motifs for the filtered AR without FoxA1 binding site overlaps (stable) sequences.

Table 112: Motif enrichments

motif  ratio  fC  fR  n1C  n1R  p1  var
Ar  39.3665  0.1737  0.0044  212  11  12.22E-84  0.07136
TLX1::NFIC  3.7004  0.0192  0.0052  22  12  4.765E-04  0.01255
RXRA::VDR  3.0621  0.0074  0.0024  10  6  3.931E-02  0.00411
FOXA1pAR  2.3014  0.1744  0.0758  211  160  30.21E-24  0.13683
ELK4  2.2325  0.0399  0.0179  48  42  2.343E-04  0.03007
PPARG::RXRA  2.1154  0.0244  0.0115  32  28  34.59E-04  0.01678
E2F1  2.0287  0.0266  0.0131  36  32  21.31E-04  0.01801
ARMotifT  2.0202  0.6275  0.3106  625  652  1.121E-68  0.47624
ARMotifTT  1.9721  0.0126  0.0063  17  15  4.548E-02  0.00896
ARMotifTH  1.9549  0.0621  0.0317  84  76  1.06E-06  0.04262
GABPA  1.9047  0.0998  0.0524  122  126  18.62E-08  0.07865
GRMotifTH  1.8954  0.3466  0.1829  388  409  2.014E-30  0.27905
ESR2  1.8595  0.0214  0.0115  21  19  2.765E-02  0.02869
FOXA1  1.8423  0.5876  0.3189  612  659  37.7E-64  0.44591
NR3C1  1.8018  0.3910  0.2170  422  485  14.91E-30  0.30135
Stat3  1.7433  0.2262  0.1297  232  249  2.738E-14  0.24518
AR  1.7411  0.0532  0.0305  71  75  2.738E-04  0.03854
FOXF2  1.6477  0.1641  0.0996  203  233  2.235E-10  0.12787
Tcfcp2l1  1.5772  0.0776  0.0492  98  116  2.148E-04  0.06389
GRMotifT  1.5256  2.2373  1.4665  1193  1894  33.32E-188  2.12789
Foxa2  1.4757  0.4279  0.2900  474  550  24.93E-36  0.45624
Tal1::Gata1  1.4309  0.0466  0.0325  60  79  3.00E-02  0.03914
ARMotifHH  1.4180  0.3932  0.2773  404  559  4.425E-18  0.39448
GR  1.4100  0.2668  0.1892  317  418  14.11E-14  0.23153
Arnt  1.4016  0.1818  0.1297  167  210  1.769E-06  0.24432
ARMotifH  1.3796  8.0333  5.8227  1345  2480  20.05E-242  13.23825
Egr1  1.3769  0.0880  0.0639  109  146  19.02E-04  0.08411
Foxq1  1.3762  0.4198  0.3050  430  608  32.75E-20  0.43108
Esrrb  1.3748  0.1020  0.0742  132  178  4.524E-04  0.08462
NFIC  1.3444  2.4745  1.8405  1178  1980  1.324E-170  3.11777
NR2F1  1.3369  0.1146  0.0857  144  204  8.44E-04  0.09901
Gata1  1.3186  0.6334  0.4804  603  914  39.08E-34  0.61505
ELK1  1.2813  0.8381  0.6541  738  1149  32.81E-52  0.86490
NFE2L2  1.2663  0.1286  0.1015  159  246  44.94E-04  0.11264
EBF1  1.2622  0.6364  0.5042  538  824  60.72E-26  0.94983
TFAP2A  1.2561  1.7598  1.4010  836  1441  13.49E-60  4.98575
FEV  1.2533  0.5528  0.4411  577  841  1.092E-32  0.53472
Mycn  1.2337  0.1086  0.0881  114  166  96.15E-04  0.13732
Myc  1.2290  0.1390  0.1131  148  213  10.43E-04  0.17590
Hand1::Tcfe2a  1.2159  1.0939  0.8996  877  1410  1.85E-76  1.15464
Pax5  1.2118  0.0495  0.0409  66  91  4.052E-02  0.06314
ARID3A  0.8250  2.6179  3.1733  1107  2079  2.779E-124  7.56890
Pdx1  0.8018  1.7938  2.2372  984  1986  4.108E-76  4.06887
NKX3-1  0.7849  0.4922  0.6271  467  993  23.06E-06  0.86540
CREB1  0.7740  0.5661  0.7315  557  1218  1.604E-08  0.81580
Prrx2  0.7661  1.6098  2.1012  934  1932  41.37E-62  3.54499
Foxd3  0.6791  1.4398  2.1202  799  1615  10.03E-38  8.61440
FOXL1  0.6764  4.4043  6.5117  1210  2292  28.91E-166  36.78959
PBX1  0.6523  0.0673  0.1031  84  222  3.619E-02  0.12992
MEF2A  0.5913  0.1707  0.2888  181  536  6.616E-04  0.35828
Ddit3::Cebpa  0.5225  0.1826  0.3495  222  624  78.03E-04  0.39493

12.43 FoxA1 without AR binding site overlaps (stable)

Chromosome specific statistics are shown in Table 113. A histogram of sequence lengths is shown in Figure 109.

chromosome  frequency  min length  mean length  max length  total length  coverage
1  943  7  378  1075  356436  0.00143
10  345  15  364  865  125413  0.000925
11  450  6  370  906  166297  0.001232
12  350  9  357  749  124836  0.000933
13  211  55  387  829  81552  0.000708
14  297  38  381  1283  113146  0.001054
15  302  186  358  795  108254  0.001056
16  252  16  362  1574  91298  0.00101
17  345  1  341  697  117674  0.001449
18  163  179  365  829  59529  0.000762
19  130  14  321  702  41721  0.000706
2  638  98  364  1237  232145  0.000955
20  207  182  361  716  74640  0.001184
21  85  173  355  762  30184  0.000627
22  121  173  323  723  39037  0.000761
3  687  170  405  1326  278363  0.001406
4  318  2  363  765  115459  0.000604
5  403  41  372  893  149747  0.000828
6  394  149  352  807  138682  0.00081
7  455  22  356  940  161834  0.001017
8  368  27  379  957  139502  0.000953
9  344  3  362  762  124607  0.000882
X  128  83  315  726  40280  0.000259
Y  8  260  337  461  2694  4.5e-05
all (24 chromosomes)  7944  1  367  1574  2913330  0.000396

Table 113: Chromosome specific distribution of the regions. Lengths (min, mean, max, total) are in base pairs, and the last line gives the overall statistics.

Figure 109: Sequence length distribution (histogram; x axis: length in base pairs, y axis: frequency). The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqFXnARs-gCount component.

property  value
genes  5451

(a) seqFXnARs-deNovo-meme1: width=15, sites=371, llr=3356, E=4.6e-205
(b) seqFXnARs-deNovo-meme2: width=15, sites=138, llr=1428, E=2.1e-31
(c) seqFXnARs-deNovo-meme3: width=15, sites=24, llr=333, E=400

Figure 110: De novo motifs for the filtered FoxA1 without AR binding site overlaps (stable) sequences.

Table 114: Motif enrichments

motif  ratio  fC  fR  n1C  n1R  p1  var
CTCF  9.2823  0.0242  0.0026  191  38  7.329E-54  0.01008
FOXA1  4.0157  1.2504  0.3113  6205  3816  0.00E00  0.76712
Foxa2  3.9999  1.1994  0.2998  6166  3343  0.00E00  0.83109
FOXF2  3.8657  0.3969  0.1026  2676  1406  0.00E00  0.22088
TLX1::NFIC  3.3222  0.0191  0.0057  113  64  6.557E-16  0.01557
FOXD1  2.5912  1.6253  0.6272  6679  6460  0.00E00  1.25556
Foxq1  2.1658  0.6922  0.3196  3880  3735  0.00E00  0.56040
ELK4  2.1120  0.0379  0.0179  267  249  1.043E-16  0.02928
E2F1  1.9881  0.0203  0.0102  155  149  23.06E-10  0.01421
GABPA  1.7701  0.0886  0.0500  614  705  10.15E-24  0.07250
Ar  1.7565  0.0105  0.0059  77  86  10.16E-04  0.00815
Tal1::Gata1  1.5895  0.0446  0.0280  342  411  63.43E-12  0.03416
Stat3  1.5512  0.1833  0.1181  1130  1392  17.96E-42  0.19204
NHLH1  1.5447  0.1393  0.0902  713  853  57.23E-26  0.18404
FOXI1  1.5314  0.9773  0.6382  4438  5850  0.00E00  1.24087
FOXO3  1.4508  1.9162  1.3208  6634  9949  0.00E00  2.16930
Tcfcp2l1  1.3747  0.0631  0.0459  452  624  1.194E-08  0.05904
PPARG::RXRA  1.3512  0.0181  0.0134  142  196  44.26E-04  0.01527
STAT1  1.3339  0.0374  0.0280  241  342  3.696E-04  0.04244
Arnt  1.2927  0.1636  0.1266  854  1257  3.758E-14  0.26447
NFIC  1.2785  2.3775  1.8596  6910  11573  0.00E00  3.01485
MIZF  1.2713  0.0247  0.0194  185  283  2.152E-02  0.02220
TAL1::TCF3  1.2663  0.2257  0.1782  1291  1948  4.793E-24  0.27996
RORA 2  1.2561  0.0194  0.0154  154  227  1.673E-02  0.01671
Myb  1.2531  0.7580  0.6049  4001  6451  68.09E-224  0.75769
Myf  1.2451  0.2245  0.1803  1333  2043  11.12E-24  0.30315
FOXA1pAR  1.2427  0.0812  0.0654  582  851  78.15E-10  0.08932
RXR::RAR DR5  1.2372  0.0249  0.0201  194  292  1.129E-02  0.02231
ELK1  1.2167  0.8010  0.6583  4156  6750  1.111E-242  0.84816
Gata1  1.2099  0.6018  0.4974  3327  5411  13.7E-136  0.64295
ARMotifTH  1.2010  0.0394  0.0328  306  469  16.56E-04  0.03644
YY1  0.8210  2.9057  3.5393  7228  13962  0.00E00  6.36317
MZF1 5-13  0.8193  1.0014  1.2223  4597  9716  7.897E-180  1.61826
CREB1  0.8033  0.5740  0.7145  3246  6896  54.25E-50  0.85210
Prrx2  0.7917  1.7568  2.2190  5639  11583  0.00E00  4.03173
Zfx  0.7439  0.1061  0.1426  679  1753  90.1E-08  0.16885
NF-kappaB  0.7067  0.1338  0.1893  787  2171  1.012E-10  0.25048
Ddit3::Cebpa  0.5462  0.1952  0.3575  1379  3914  60.84E-12  0.38284
EWSR1-FLI1  0.2440  0.0049  0.0203  11  58  15.69E-04  0.17048

12.44 AR and AR siFOXA1 binding site overlaps (stable)

Chromosome specific statistics are shown in Table 115. A histogram of sequence lengths is shown in Figure 111.

chromosome  frequency  min length  mean length  max length  total length  coverage
1  136  8  411  921  55953  0.000224
10  44  202  360  608  15837  0.000117
11  78  11  401  685  31277  0.000232
12  54  16  360  876  19459  0.000145
13  18  192  430  1077  7749  6.7e-05
14  42  34  370  773  15547  0.000145
15  49  14  429  1057  21040  0.000205
16  54  22  372  639  20080  0.000222
17  52  9  385  764  20025  0.000247
18  24  281  418  692  10028  0.000128
19  23  221  393  711  9030  0.000153
2  77  74  408  744  31446  0.000129
20  28  207  398  599  11131  0.000177
21  14  82  332  448  4649  9.7e-05
22  12  257  366  460  4395  8.6e-05
3  69  58  447  918  30815  0.000156
4  46  86  363  638  16688  8.7e-05
5  74  221  404  878  29892  0.000165
6  65  19  378  736  24546  0.000143
7  68  6  389  877  26477  0.000166
8  49  218  412  666  20198  0.000138
9  35  232  378  606  13220  9.4e-05
X  15  225  377  690  5658  3.6e-05
Y  1  347  347  347  347  6e-06
all (24 chromosomes)  1127  6  395  1077  445487  6.1e-05

Table 115: Chromosome specific distribution of the regions. Lengths (min, mean, max, total) are in base pairs, and the last line gives the overall statistics.

Figure 111: Sequence length distribution (histogram; x axis: length in base pairs, y axis: frequency). The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqARSis-gCount component.

property  value
genes  966

(a) seqARSis-deNovo-meme1: width=15, sites=295, llr=2791, E=2.4e-153
(b) seqARSis-deNovo-meme2: width=15, sites=118, llr=1236, E=1.1e-18
(c) seqARSis-deNovo-meme3: width=15, sites=42, llr=520, E=26

Figure 112: De novo motifs for the filtered AR and AR siFOXA1 binding site overlaps (stable) sequences.

Table 116: Motif enrichments

motif  ratio  fC  fR  n1C  n1R  p1  var
Ar  19.2766  0.1564  0.0081  164  16  9.258E-60  0.06483
TLX1::NFIC  5.1538  0.0222  0.0043  22  6  2.70E-06  0.01413
ESR2  3.1396  0.0329  0.0104  32  19  39.62E-06  0.02288
Tal1::Gata1  2.5065  0.0596  0.0237  66  49  18.54E-08  0.03614
FOXA1  2.2978  0.7351  0.3199  600  540  5.891E-90  0.52489
STAT1  2.1882  0.0613  0.0280  54  50  1.784E-04  0.05724
Stat3  2.1339  0.2684  0.1258  219  211  10.11E-18  0.25550
ARMotifTH  2.0396  0.0756  0.0370  85  75  26.94E-08  0.04976
Foxa2  2.0288  0.5893  0.2905  504  464  1.138E-64  0.52183
PPARG::RXRA  2.0077  0.0258  0.0128  29  26  65.37E-04  0.01765
FOXF2  1.9203  0.2142  0.1115  215  219  12.72E-16  0.15348
GRMotifTH  1.9161  0.3920  0.2046  355  375  7.885E-30  0.30044
GABPA  1.9055  0.1013  0.0532  107  106  19.88E-08  0.07310
FOXA1pAR  1.8863  0.1191  0.0631  127  115  1.768E-10  0.09376
ARMotifT  1.8572  0.6364  0.3427  526  586  6.637E-54  0.48412
Esrrb  1.8246  0.1360  0.0745  145  155  84.43E-10  0.09293
ELK4  1.6884  0.0329  0.0195  34  40  4.61E-02  0.02665
AR  1.6638  0.0569  0.0342  62  72  31.34E-04  0.04156
NR3C1  1.6366  0.3636  0.2221  321  411  28.39E-18  0.30053
Egr1  1.6227  0.1156  0.0712  114  135  23.29E-06  0.10267
Tcfcp2l1  1.5656  0.0862  0.0551  89  106  3.759E-04  0.07334
MIZF  1.5459  0.0338  0.0218  38  46  4.35E-02  0.02532
NHLH1  1.5068  0.1538  0.1020  110  147  9.81E-04  0.19666
GRMotifT  1.4617  2.4018  1.6431  1005  1641  59.3E-160  2.62400
Myf  1.4576  0.2684  0.1841  231  297  3.034E-10  0.31281
Arnt  1.4164  0.1849  0.1305  145  182  5.461E-06  0.23795
GR  1.4114  0.2747  0.1946  257  349  2.174E-10  0.25288
ARMotifHH  1.4010  0.4302  0.3071  367  509  4.045E-18  0.42439
Gata1  1.4007  0.7093  0.5064  534  789  2.631E-34  0.70158
NFIC  1.3979  2.9324  2.0978  1014  1697  10.3E-160  3.88972
ARMotifH  1.3895  8.7262  6.2800  1114  2068  2.684E-198  15.17645
FOXD1  1.3771  0.9431  0.6849  648  956  1.998E-54  0.95699
ELK1  1.3517  0.9013  0.6668  643  951  34.9E-54  0.89956
Foxq1  1.3395  0.4756  0.3550  414  556  1.649E-24  0.50872
NR2F1  1.2965  0.1280  0.0987  134  196  18.85E-04  0.11194
EBF1  1.2854  0.7040  0.5477  482  747  1.764E-24  0.96883
FEV  1.2720  0.6436  0.5059  532  771  24.6E-36  0.62787
TFAP2A  1.2621  1.8773  1.4874  734  1175  4.933E-66  5.38564
Hand1::Tcfe2a  1.2577  1.2613  1.0028  759  1272  12.27E-68  1.31110
Myc  1.2283  0.1440  0.1172  133  187  6.743E-04  0.17309
SPI1  1.2147  1.8880  1.5543  904  1533  46.84E-112  2.32562
YY1  0.8234  3.1227  3.7926  1049  1973  2.675E-160  5.80396
HOXA5  0.8175  4.9324  6.0332  1081  2029  12.67E-178  13.33557
Nobox  0.7993  1.2204  1.5268  660  1368  13.47E-30  2.59282
Nkx2-5  0.7883  4.5796  5.8097  1030  1977  5.686E-148  17.21406
ARID3A  0.7762  2.6933  3.4699  921  1763  3.203E-102  8.55719
CREB1  0.7398  0.6151  0.8315  496  1074  7.816E-10  1.00584
Pdx1  0.7015  1.7796  2.5368  820  1732  3.898E-60  4.43570
Prrx2  0.6610  1.6062  2.4300  772  1691  15.03E-46  4.05097
Foxd3  0.6301  1.4693  2.3318  672  1441  44.37E-30  9.45180
FOXL1  0.6186  4.5316  7.3251  1016  1940  5.131E-142  56.95942
NFIL3  0.6024  0.1218  0.2022  112  296  4.677E-02  0.42616
Lhx3  0.5490  0.1662  0.3028  152  415  2.664E-02  0.45819
Ddit3::Cebpa  0.5162  0.2036  0.3944  200  580  89.53E-04  0.45817
Pou5f1  0.3494  0.0044  0.0128  5  27  3.822E-02  0.00981

12.45 AR without AR siFOXA1 binding site overlaps (stable)
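This subsection's region set comes from an overlap filter: AR binding sites that do not overlap any AR siFOXA1 binding site. Below is a minimal sketch of such an "A without B" filter. It assumes each peak is a (chromosome, start, end) tuple with half-open coordinates and that one shared base is enough to count as an overlap. The report does not state Anduril's actual overlap criterion (for example, a minimum overlap length), and the peaks in the example are hypothetical.

```python
from typing import List, Tuple

# A peak as (chromosome, start, end) with half-open coordinates [start, end).
Region = Tuple[str, int, int]


def overlaps(a: Region, b: Region) -> bool:
    """True if both regions are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def without(regions_a: List[Region], regions_b: List[Region]) -> List[Region]:
    """Regions of A that overlap no region of B (a simple quadratic scan)."""
    return [a for a in regions_a if not any(overlaps(a, b) for b in regions_b)]


# Hypothetical peaks, not taken from the report.
ar = [("chr1", 100, 400), ("chr1", 1000, 1300), ("chr2", 50, 250)]
ar_si = [("chr1", 350, 600), ("chr2", 300, 500)]
print(without(ar, ar_si))  # [('chr1', 1000, 1300), ('chr2', 50, 250)]
```

The quadratic scan is easy to check by eye. For large peak sets, a sorted per-chromosome index would scale better.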

Chromosome specific statistics are shown in Table 117. A histogram of sequence lengths is shown in Figure 113.

chromosome  frequency  min length  mean length  max length  total length  coverage
1  117  170  401  781  46975  0.000188
10  28  198  391  590  10953  8.1e-05
11  52  194  382  680  19883  0.000147
12  36  172  364  542  13116  9.8e-05
13  18  197  417  731  7508  6.5e-05
14  43  239  412  761  17723  0.000165
15  28  230  421  821  11795  0.000115
16  27  234  403  581  10874  0.00012
17  30  129  352  542  10566  0.00013
18  19  253  395  772  7509  9.6e-05
19  15  153  289  436  4336  7.3e-05
2  69  128  374  648  25818  0.000106
20  24  241  367  606  8807  0.00014
21  8  334  409  584  3269  6.8e-05
22  14  196  326  470  4558  8.9e-05
3  82  114  445  813  36506  0.000184
4  42  223  352  612  14805  7.7e-05
5  55  34  401  1152  22072  0.000122
6  49  50  377  765  18481  0.000108
7  49  197  386  756  18903  0.000119
8  43  209  386  583  16605  0.000113
9  38  152  368  517  13966  9.9e-05
X  18  202  361  647  6501  4.2e-05
all (23 chromosomes)  904  34  389  1152  351529  4.8e-05

Table 117: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

Figure 113: Sequence length distribution (histogram; x-axis: length (base pairs), y-axis: frequency). The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.
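The dashed curve in these histograms is a normal density fitted from the mean and standard deviation of the lengths. To sit on a count ("frequency") axis, that density has to be scaled by the number of sequences times the bin width. Below is a minimal sketch of that overlay. The use of the sample standard deviation and the example lengths and bin width are assumptions; the report does not give the exact formula it plots.

```python
import math
import statistics
from typing import List, Tuple


def normal_overlay(lengths: List[int], bin_width: float, xs: List[float]) -> List[Tuple[float, float]]:
    """Height of a fitted normal curve at each x, on the count scale of a histogram.

    The density N(mu, sigma^2) is multiplied by (number of sequences * bin width),
    so the curve is comparable to the bar heights of a frequency histogram.
    """
    mu = statistics.mean(lengths)
    sigma = statistics.stdev(lengths)  # sample standard deviation (an assumption)
    scale = len(lengths) * bin_width
    curve = []
    for x in xs:
        z = (x - mu) / sigma
        density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
        curve.append((x, scale * density))
    return curve


# Hypothetical sequence lengths in base pairs.
lengths = [300, 350, 400, 450, 500]
for x, height in normal_overlay(lengths, bin_width=50, xs=[300, 400, 500]):
    print(x, round(height, 3))
```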

The following table shows the properties of the seqARnSis-gCount component.

property  value
genes  820

(a) seqARnSis-deNovo-meme1: width=11, sites=365, llr=2807, E=1.2e-35
(b) seqARnSis-deNovo-meme2: width=15, sites=54, llr=616, E=1e+06
(c) seqARnSis-deNovo-meme3: width=15, sites=22, llr=302, E=120000

Figure 114: De novo motifs for the filtered AR without AR siFOXA1 binding site overlaps (stable) sequences.
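Each de novo motif is labelled MEME-style with its width, number of sites, log-likelihood ratio (llr) and E-value. In MEME, the E-value roughly estimates how many motifs at least as strong would be expected by chance in a similarly sized set of random sequences. Values far above 1, such as E=1e+06 for meme2 above, therefore point to motifs that are unlikely to be meaningful. The sketch below parses labels of this form, ranks them by E-value and keeps those under a cutoff. The 0.05 cutoff and the names `MemeMotif` and `parse_captions` are hypothetical choices for illustration, not part of the report.

```python
import re
from typing import List, NamedTuple


class MemeMotif(NamedTuple):
    name: str
    width: int
    sites: int
    llr: int
    evalue: float


# Matches labels such as "seqARnSis-deNovo-meme1: width=11, sites=365, llr=2807, E=1.2e-35".
CAPTION = re.compile(
    r"(?P<name>[\w-]+): width=(?P<width>\d+), sites=(?P<sites>\d+), "
    r"llr=(?P<llr>\d+), E=(?P<evalue>[0-9.eE+-]+)"
)


def parse_captions(text: str) -> List[MemeMotif]:
    """Extract one MemeMotif per motif label found in the text."""
    return [
        MemeMotif(m["name"], int(m["width"]), int(m["sites"]), int(m["llr"]), float(m["evalue"]))
        for m in CAPTION.finditer(text)
    ]


captions = """(a) seqARnSis-deNovo-meme1: width=11, sites=365, llr=2807, E=1.2e-35
(b) seqARnSis-deNovo-meme2: width=15, sites=54, llr=616, E=1e+06
(c) seqARnSis-deNovo-meme3: width=15, sites=22, llr=302, E=120000"""

motifs = parse_captions(captions)
ranked = sorted(motifs, key=lambda m: m.evalue)  # most significant first
significant = [m.name for m in ranked if m.evalue < 0.05]  # hypothetical cutoff
print(significant)  # ['seqARnSis-deNovo-meme1']
```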

Table 118: Motif enrichments

motif  ratio  fC  fR  n1C  n1R  p1  var
FOXA1  3.9722  1.2600  0.3172  695  445  1.617E-186  0.78034
FOXA1pAR  3.8966  0.2699  0.0692  222  103  35.79E-46  0.15610
Foxa2  3.7097  1.1416  0.3077  660  403  78.37E-176  0.78729
FOXF2  3.4676  0.3551  0.1024  277  164  2.631E-48  0.19820
Ar  3.2430  0.0442  0.0136  38  23  7.603E-06  0.02525
Tal1::Gata1  2.4535  0.0509  0.0207  45  33  19.32E-06  0.03258
FOXD1  2.2781  1.4923  0.6550  705  775  3.935E-128  1.16416
Foxq1  2.1354  0.7102  0.3325  456  427  1.128E-62  0.57507
FOXI1  1.7759  1.1549  0.6503  583  702  9.735E-76  1.21041
Stat3  1.4300  0.1692  0.1183  123  163  1.326E-04  0.17392
ARMotifT  1.3974  0.5044  0.3609  353  494  1.913E-20  0.44425
FOXO3  1.3799  1.9392  1.4053  722  1179  14.63E-92  2.16693
NFE2L2  1.3214  0.1361  0.1030  116  165  16.77E-04  0.11377
NKX3-1  1.2540  0.8971  0.7154  478  725  5.503E-34  1.23370
Gata1  1.2414  0.6538  0.5266  423  653  1.528E-24  0.63303
NR3C1  1.2384  0.2887  0.2331  211  347  70.87E-06  0.27675
SOX9  1.2270  0.6571  0.5355  411  660  50.28E-22  0.68753
Myb  1.2231  0.7389  0.6041  456  743  2.686E-26  0.72783
SRY  1.2020  3.8352  3.1905  847  1498  3.61E-138  6.43958
REL  0.8245  0.4303  0.5219  281  633  3.336E-02  0.61251
CREB1  0.8049  0.5896  0.7325  386  815  5.226E-08  0.81137
MZF1 1-4  0.7984  2.8374  3.5538  788  1514  81.59E-102  8.44312
TFAP2A  0.7837  1.0619  1.3550  485  905  43.46E-24  4.14505
MZF1 5-13  0.7109  0.9082  1.2775  522  1106  14.62E-22  1.59297
Zfx  0.5364  0.0796  0.1485  66  202  56.51E-04  0.16228
Klf4  0.5303  0.1073  0.2024  85  267  16.38E-04  0.24709
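The ratio column appears to be fC divided by fR; for FOXA1 in Table 118, 1.2600/0.3172 ≈ 3.972. Every motif enrichment table in this section also lists only motifs with a ratio of about 1.2 or more, or about 0.83 or less, and omits those in between. The sketch below recomputes the ratio and applies a symmetric 1.2-fold cutoff. That cutoff is inferred from the printed tables rather than stated by the report, and any additional filter Anduril applies (for example on p1) is not shown. Because fC and fR are printed rounded, recomputed ratios can differ slightly from the ratio column (e.g. Ar: 0.0442/0.0136 = 3.25 versus 3.2430).

```python
import math
from typing import List, Tuple

# (motif, fC, fR) triples copied from Table 118 (rounded to four decimals),
# plus one invented row near ratio 1 to show a motif being filtered out.
rows: List[Tuple[str, float, float]] = [
    ("FOXA1", 1.2600, 0.3172),
    ("Ar", 0.0442, 0.0136),
    ("SRY", 3.8352, 3.1905),
    ("REL", 0.4303, 0.5219),
    ("Klf4", 0.1073, 0.2024),
    ("hypothetical_motif", 0.5000, 0.4800),  # invented, not in the report
]


def fold_filter(rows: List[Tuple[str, float, float]], min_fold: float = 1.2) -> List[Tuple[str, float]]:
    """Keep motifs with fC/fR >= min_fold or <= 1/min_fold, sorted by ratio (descending)."""
    kept = []
    for motif, fc, fr in rows:
        ratio = fc / fr
        if abs(math.log(ratio)) >= math.log(min_fold):
            kept.append((motif, ratio))
    return sorted(kept, key=lambda item: item[1], reverse=True)


for motif, ratio in fold_filter(rows):
    print(f"{motif}\t{ratio:.4f}")
```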

12.46 AR siFOXA1 without AR binding site overlaps (stable)

Chromosome specific statistics are shown in Table 119. A histogram of sequence lengths is shown in Figure 115.
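For each chromosome, these tables report the number of regions (frequency), their minimum, mean, maximum and total length, and a coverage value. Below is a minimal sketch of how such a per-chromosome summary can be computed. It assumes coverage is the total region length divided by the chromosome length. That is consistent with the chromosome 1 row of Table 117 (46975 / 249,250,621 ≈ 0.000188, using the hg19 length of chromosome 1), but the report does not state the definition. The overall ("all") row appears to use a different denominator and is not reproduced here. The example regions and chromosome sizes are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def chromosome_stats(regions: List[Region], chrom_sizes: Dict[str, int]) -> Dict[str, Dict[str, float]]:
    """Per-chromosome region count, min/mean/max/total length and coverage."""
    lengths: Dict[str, List[int]] = defaultdict(list)
    for chrom, start, end in regions:
        lengths[chrom].append(end - start)
    table: Dict[str, Dict[str, float]] = {}
    for chrom, ls in lengths.items():
        total = sum(ls)
        table[chrom] = {
            "frequency": len(ls),
            "min": min(ls),
            "mean": round(statistics.mean(ls)),
            "max": max(ls),
            "total": total,
            # Assumed definition: covered bases / chromosome length.
            "coverage": total / chrom_sizes[chrom],
        }
    return table


# Hypothetical regions and chromosome sizes.
regions = [("chr1", 100, 400), ("chr1", 1000, 1200), ("chr2", 0, 50)]
sizes = {"chr1": 1000, "chr2": 500}
for chrom, row in sorted(chromosome_stats(regions, sizes).items()):
    print(chrom, row)
```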

chromosome  frequency  min length  mean length  max length  total length  coverage
1  621  12  467  1678  290214  0.001164
10  156  13  452  932  70454  0.00052
11  330  28  471  1448  155527  0.001152
12  184  101  440  1247  80913  0.000604
13  83  217  548  1616  45493  0.000395
14  137  225  479  1240  65613  0.000611
15  207  8  463  1091  95890  0.000935
16  213  1  451  1174  95974  0.001062
17  315  10  478  1312  150447  0.001853
18  77  192  443  994  34112  0.000437
19  120  39  402  781  48209  0.000815
2  284  55  460  1801  130593  0.000537
20  131  217  448  1240  58721  0.000932
21  54  217  458  1255  24714  0.000513
22  114  13  418  766  47651  0.000929
3  315  3  510  1369  160560  0.000811
4  137  167  427  947  58538  0.000306
5  243  9  471  1372  114552  0.000633
6  160  49  420  1321  67266  0.000393
7  227  168  437  867  99310  0.000624
8  213  40  447  1174  95305  0.000651
9  232  221  467  1118  108457  0.000768
X  102  80  386  630  39412  0.000254
Y  5  285  359  434  1793  3e-05
all (24 chromosomes)  4660  1  459  1801  2139718  0.000291

Table 119: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

Figure 115: Sequence length distribution (histogram; x-axis: length (base pairs), y-axis: frequency). The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqSinARs-gCount component.

property  value
genes  2219

(a) seqSinARs-deNovo-meme1: width=15, sites=289, llr=2776, E=1.3e-110
(b) seqSinARs-deNovo-meme2: width=15, sites=274, llr=2557, E=3.8e-55
(c) seqSinARs-deNovo-meme3: width=15, sites=137, llr=1432, E=2.6e-15

Figure 116: De novo motifs for the filtered AR siFOXA1 without AR binding site overlaps (stable) sequences.

Table 120: Motif enrichments

motif  ratio  fC  fR  n1C  n1R  p1  var
Ar  13.5881  0.1297  0.0095  534  79  10.45E-170  0.06387
ESR1  7.0388  0.0019  0.0002  9  2  33.1E-04  0.00083
TLX1::NFIC  6.0894  0.0369  0.0060  125  46  13.23E-26  0.02501
ESR2  3.7691  0.0442  0.0117  178  91  20.87E-28  0.02875
GABPA  2.9928  0.1759  0.0587  714  469  4.751E-90  0.11649
ELK4  2.9142  0.0659  0.0226  277  184  16.94E-32  0.04358
PPARG  2.8679  0.0041  0.0014  16  10  84.85E-04  0.00308
CTCF  2.7835  0.0178  0.0064  81  45  7.252E-12  0.01750
E2F1  2.5989  0.0350  0.0134  153  114  88.25E-16  0.02252
Tcfcp2l1  2.3476  0.1314  0.0560  526  429  3.724E-46  0.09986
Stat3  2.1364  0.3308  0.1548  1125  1028  16.79E-100  0.30419
RXR::RAR DR5  2.0654  0.0498  0.0241  224  204  48.38E-16  0.03383
PPARG::RXRA  2.0581  0.0363  0.0176  169  150  5.108E-12  0.02388
Tal1::Gata1  2.0260  0.0677  0.0334  302  285  56.02E-20  0.04589
STAT1  2.0230  0.0739  0.0365  266  254  3.273E-16  0.06957
MIZF  2.0029  0.0455  0.0227  201  193  4.315E-12  0.03187
NHLH1  1.9562  0.2328  0.1190  654  644  5.687E-42  0.29276
ARMotifTH  1.9033  0.0808  0.0424  353  346  1.202E-20  0.06507
Egr1  1.8584  0.1544  0.0831  622  644  27.42E-36  0.13027
TFAP2A  1.8554  3.2644  1.7594  3756  5219  0.00E00  9.10488
Mycn  1.8495  0.2070  0.1119  676  699  24.26E-40  0.22713
Myf  1.8165  0.4315  0.2375  1335  1535  9.529E-84  0.50753
EBF1  1.7870  1.1568  0.6473  2679  3456  16.28E-266  1.55705
ARMotifHH  1.7577  0.5861  0.3334  1947  2217  1.226E-166  0.52905
Myc  1.7480  0.2457  0.1405  818  917  5.272E-42  0.26456
GRMotifTH  1.7450  0.4109  0.2354  1533  1731  7.389E-110  0.33868
Pax5  1.7235  0.0915  0.0531  401  429  26.92E-20  0.07159
Arnt  1.6967  0.2859  0.1685  805  946  2.039E-36  0.37477
Zfp423  1.6447  0.1085  0.0659  381  463  1.159E-12  0.11796
INSM1  1.6394  0.2977  0.1816  1116  1351  19.73E-56  0.26078
NFKB1  1.6137  0.1201  0.0744  391  458  1.318E-14  0.15749
Klf4  1.5814  0.3806  0.2406  1258  1615  1.776E-58  0.41216
ARMotifT  1.5763  0.6310  0.4003  2117  2705  68.47E-162  0.55630
MYC::MAX  1.5717  0.1007  0.0641  351  432  44.52E-12  0.11075
Esrrb  1.5712  0.1602  0.1020  669  799  63.64E-28  0.13420
NFIC  1.5703  3.7764  2.4049  4375  7322  0.00E00  5.35337
PLAG1  1.5434  0.0311  0.0202  135  163  1.138E-04  0.02690
HIF1A::ARNT  1.5184  0.6246  0.4113  1757  2448  69.39E-92  0.81087
ELK1  1.4978  1.2085  0.8069  3092  4515  8.806E-316  1.18678
SP1  1.4817  2.1310  1.4382  2943  4408  25.58E-268  8.37126
AR  1.4782  0.0610  0.0412  274  344  6.036E-08  0.04916
GRMotifT  1.4421  2.7610  1.9145  4246  7076  0.00E00  3.14698
NR2F1  1.4340  0.1542  0.1075  669  848  1.668E-22  0.13007
ARMotifH  1.4340  10.5662  7.3682  4638  8528  0.00E00  22.02846
NR3C1  1.4296  0.3881  0.2715  1437  1978  2.655E-62  0.34501
Zfx  1.4118  0.2704  0.1915  945  1324  18.72E-28  0.30361
Hand1::Tcfe2a  1.3705  1.5732  1.1480  3513  5618  0.00E00  1.65197
Mafb  1.3492  2.6768  1.9840  4047  6772  0.00E00  3.96867
MZF1 1-4  1.3208  5.6933  4.3105  4484  7953  0.00E00  15.53131
MAX  1.3058  0.2708  0.2074  900  1383  11.37E-18  0.32157
FEV  1.2879  0.7515  0.5835  2351  3666  48.7E-140  0.73239
SPI1  1.2845  2.2867  1.7802  3990  6743  0.00E00  2.87553
GRMotifTT  1.2732  0.0288  0.0226  132  191  1.831E-02  0.02505
Myb  1.2594  0.9650  0.7662  2750  4436  39.4E-198  1.04162
ZNF354C  1.2411  4.0595  3.2710  4392  7755  0.00E00  7.24266
Gata1  1.2370  0.7453  0.6025  2240  3638  3.729E-112  0.81173
USF1  1.2279  0.6684  0.5443  1603  2442  59.36E-60  1.17788
TAL1::TCF3  1.2154  0.2695  0.2218  906  1370  38.96E-20  0.34316
RELA  1.2148  0.1368  0.1126  534  802  1.869E-08  0.16003
NFE2L2  1.2054  0.1622  0.1345  687  1080  6.421E-10  0.15006
RORA 1  1.2031  0.3003  0.2496  1165  1882  7.29E-24  0.28755
En1  0.8212  4.2274  5.1481  4466  8440  0.00E00  8.34790
FOXA1  0.8141  0.3123  0.3836  1193  2617  16.45E-04  0.40104
Gfi  0.8021  0.8771  1.0936  2547  5401  13.04E-84  1.28766
FOXO3  0.7570  1.2152  1.6054  3055  6325  3.755E-170  2.12183
FOXD1  0.7258  0.5677  0.7822  1893  4297  11.8E-20  0.89405
HNF1B  0.7008  0.0552  0.0788  243  638  1.642E-04  0.07504
SOX9  0.6995  0.4394  0.6282  1564  3732  20.66E-06  0.68618
HOXA5  0.6829  4.7109  6.8987  4398  8416  0.00E00  17.84263
Cebpa  0.6769  0.6321  0.9338  1981  4768  31.28E-18  1.10759
FOXF2  0.6718  0.0859  0.1279  376  1011  5.838E-06  0.11951
Sox5  0.6569  1.4008  2.1324  3111  6751  26.47E-166  3.44805
PBX1  0.6523  0.0777  0.1192  343  864  5.922E-04  0.16057
Foxq1  0.6364  0.2539  0.3989  973  2535  1.352E-02  0.46102
Nobox  0.6335  1.0567  1.6681  2487  5733  1.213E-58  2.99785
Nkx2-5  0.6109  4.0316  6.5994  4141  8178  0.00E00  21.73812
HLF  0.6010  0.3250  0.5407  1136  3167  13.99E-04  0.68625
TBP  0.5865  0.1828  0.3117  729  2091  1.719E-08  0.34723
SRY  0.5829  2.1095  3.6189  3570  7678  35.14E-280  7.53593
IRF1  0.5794  0.1549  0.2673  625  1840  2.105E-10  0.29610
Pou5f1  0.5544  0.0067  0.0120  31  103  59.26E-04  0.01021
ARID3A  0.5497  2.1920  3.9874  3455  7481  80.85E-246  10.65383
Pdx1  0.5388  1.5490  2.8749  3089  7251  6.584E-136  5.71103
FOXA1pAR  0.5251  0.0432  0.0823  176  603  38.19E-12  0.09247
Prrx2  0.5206  1.4227  2.7327  2919  7187  46.15E-96  5.06285
NKX3-1  0.5099  0.4053  0.7949  1374  3919  3.162E-02  1.16712
Ddit3::Cebpa  0.4805  0.2152  0.4479  889  2686  1.843E-10  0.49996
MEF2A  0.4762  0.1695  0.3559  600  2113  4.483E-24  0.47306
NFIL3  0.4537  0.0964  0.2126  377  1341  1.006E-20  0.42205
FOXL1  0.4360  3.5833  8.2178  3760  8016  0.00E00  69.77546
Foxd3  0.3474  0.9663  2.7816  2017  6084  1.808E-02  12.38761
Lhx3  0.3296  0.1224  0.3716  443  1934  56.2E-44  0.57349

12.47 FoxA1 and FoxA1 siFOXA1 binding site overlaps (stable)
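Here the region set is defined by overlap in the other direction: FoxA1 binding sites that overlap at least one FoxA1 siFOXA1 binding site. With thousands of peaks per set, a per-chromosome sorted index avoids comparing every pair. The sketch below uses sorted start positions plus a running maximum of end positions, so one binary search per query region answers "does anything overlap?". It assumes half-open (chromosome, start, end) tuples, treats any shared base as an overlap, and returns the A-side regions unchanged. Whether Anduril reports the A regions, the B regions or merged intervals is not stated here, so that choice is an assumption. The peaks in the example are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def build_index(regions: List[Region]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Per chromosome: sorted start positions and the running maximum of end positions."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_end, running = [], 0
        for _, e in intervals:
            running = max(running, e)
            max_end.append(running)
        index[chrom] = (starts, max_end)
    return index


def overlapping(regions_a: List[Region], regions_b: List[Region]) -> List[Region]:
    """Regions of A that share at least one base with some region of B."""
    index = build_index(regions_b)
    kept = []
    for chrom, start, end in regions_a:
        if chrom not in index:
            continue
        starts, max_end = index[chrom]
        # B intervals at positions < k start before this region ends.
        k = bisect.bisect_left(starts, end)
        # One of them overlaps iff the largest end among them lies past our start.
        if k > 0 and max_end[k - 1] > start:
            kept.append((chrom, start, end))
    return kept


# Hypothetical peaks, not taken from the report.
foxa1 = [("chr3", 100, 300), ("chr3", 500, 700), ("chrX", 10, 90)]
foxa1_si = [("chr3", 250, 260), ("chr3", 800, 900), ("chr1", 10, 90)]
print(overlapping(foxa1, foxa1_si))  # [('chr3', 100, 300)]
```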

Chromosome specific statistics are shown in Table 121. A histogram of sequence lengths is shown in Figure 117.

chromosome  frequency  min length  mean length  max length  total length  coverage
1  97  113  331  750  32149  0.000129
10  102  115  278  503  28341  0.000209
11  152  166  290  607  44044  0.000326
12  88  118  281  525  24768  0.000185
13  69  167  323  919  22317  0.000194
14  86  101  305  627  26270  0.000245
15  109  158  302  749  32964  0.000322
16  81  12  265  494  21478  0.000238
17  114  15  275  665  31297  0.000385
18  50  160  339  804  16969  0.000217
19  50  10  262  423  13119  0.000222
2  160  69  289  599  46266  0.00019
20  49  150  272  451  13330  0.000212
21  33  168  263  405  8667  0.00018
22  40  161  254  483  10149  0.000198
3  232  2  315  775  73072  0.000369
4  91  166  287  650  26083  0.000136
5  131  139  294  543  38515  0.000213
6  127  149  268  566  33990  0.000199
7  120  44  295  878  35386  0.000222
8  150  128  293  564  43928  3e-04
9  92  28  286  542  26311  0.000186
X  20  97  258  417  5157  3.3e-05
all (23 chromosomes)  2243  2  292  919  654570  8.9e-05

Table 121: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

Figure 117: Sequence length distribution (histogram; x-axis: length (base pairs), y-axis: frequency). The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqFXSis-gCount component.

property  value
genes  2327

(a) seqFXSis-deNovo-meme1: width=11, sites=392, llr=3469, E=8.6e-287
(b) seqFXSis-deNovo-meme2: width=15, sites=58, llr=670, E=5.6e-08
(c) seqFXSis-deNovo-meme3: width=11, sites=55, llr=587, E=1.4e+07

Figure 118: De novo motifs for the filtered FoxA1 and FoxA1 siFOXA1 binding site overlaps (stable) sequences.

Table 122: Motif enrichments

motif  ratio  fC  fR  n1C  n1R  p1  var
Ar  6.2138  0.0254  0.0040  57  17  3.866E-14  0.01136
TLX1::NFIC  5.3265  0.0205  0.0038  35  12  2.327E-08  0.01420
Foxa2  4.9818  1.1976  0.2404  1805  795  0.00E00  0.71271
FOXA1  4.9440  1.2462  0.2520  1822  889  0.00E00  0.69358
FOXF2  4.4482  0.3912  0.0879  769  343  5.881E-172  0.20017
CTCF  3.2376  0.0094  0.0029  21  12  9.307E-04  0.00510
FOXD1  2.9687  1.4505  0.4886  1843  1527  0.00E00  1.00982
GABPA  2.7919  0.0892  0.0319  182  128  1.587E-20  0.05695
Foxq1  2.3016  0.5834  0.2535  996  863  40.05E-136  0.43646
Tal1::Gata1  2.2878  0.0464  0.0202  103  84  23.19E-10  0.02911
ELK4  2.2553  0.0312  0.0138  68  54  1.066E-06  0.02135
ESR2  2.2226  0.0138  0.0062  27  23  63.2E-04  0.01095
Stat3  2.0700  0.1677  0.0810  287  288  1.655E-18  0.14791
FOXA1pAR  2.0547  0.0950  0.0462  199  164  9.368E-18  0.08002
STAT1  1.9419  0.0361  0.0186  63  68  12.54E-04  0.03309
NHLH1  1.6510  0.1200  0.0727  170  200  46.06E-08  0.15450
Gata1  1.5192  0.5660  0.3726  905  1230  13.4E-56  0.51541
Tcfcp2l1  1.5108  0.0598  0.0395  124  146  34.34E-06  0.05653
NFIC  1.4895  2.2105  1.4840  1912  2980  1.186E-286  2.49766
FOXI1  1.4760  0.7342  0.4974  1072  1407  68.68E-86  0.84656
Arnt  1.4624  0.1369  0.0936  213  269  25.44E-08  0.18200
FOXO3  1.4624  1.5192  1.0388  1694  2527  21.28E-214  1.57751
Pax5  1.4121  0.0370  0.0262  82  106  76.25E-04  0.03094
ARMotifTH  1.4035  0.0348  0.0248  77  103  1.825E-02  0.02809
ARMotifT  1.3811  0.3662  0.2651  667  937  4.486E-28  0.32912
Myf  1.3786  0.1980  0.1436  340  487  1.277E-08  0.22780
Myb  1.3411  0.6293  0.4693  999  1548  7.02E-52  0.57245
Esrrb  1.3319  0.0803  0.0603  170  238  4.338E-04  0.07111
NFYA  1.3177  0.0477  0.0362  102  139  73.11E-04  0.04699
FEV  1.3120  0.4835  0.3685  854  1259  69.7E-42  0.43798
NFE2L2  1.3016  0.0977  0.0750  205  300  3.361E-04  0.08568
GRMotifTH  1.2973  0.2003  0.1544  403  572  18.61E-12  0.18421
ARMotifHH  1.2845  0.2632  0.2049  506  727  6.151E-16  0.25338
Hand1::Tcfe2a  1.2771  0.9090  0.7118  1307  2071  16.44E-98  0.87056
TAL1::TCF3  1.2623  0.1864  0.1477  309  458  1.289E-06  0.23648
ELK1  1.2268  0.6534  0.5326  1050  1652  1.485E-56  0.63645
Nr2e3  1.2167  0.3492  0.2870  477  756  4.436E-10  0.50416
MZF1 5-13  0.7967  0.7681  0.9640  1134  2443  1.492E-30  1.08792
Pdx1  0.7711  1.4273  1.8511  1489  3159  3.036E-84  3.12147
NF-kappaB  0.7626  0.1155  0.1515  195  516  43.59E-04  0.19423
FOXL1  0.7316  3.9260  5.3666  1942  3715  5.787E-244  32.39785
Prrx2  0.7206  1.2703  1.7627  1386  3053  6.30E-60  2.76632
Foxd3  0.7171  1.3185  1.8387  1341  2492  11.3E-78  7.65492
Zfx  0.6697  0.0812  0.1212  155  437  5.831E-04  0.13463
MEF2A  0.6531  0.1481  0.2268  265  694  72.97E-04  0.29570
Ddit3::Cebpa  0.4739  0.1396  0.2947  282  931  35.67E-10  0.30734

12.48 FoxA1 without FoxA1 siFOXA1 binding site overlaps (stable)

Chromosome specific statistics are shown in Table 123. A histogram of sequence lengths is shown in Figure 119.

chromosome  frequency  min length  mean length  max length  total length  coverage
1  1034  34  391  1124  404206  0.001622
10  383  15  356  786  136532  0.001007
11  453  6  363  1109  164321  0.001217
12  352  9  352  872  123732  0.000924
13  207  169  376  998  77760  0.000675
14  304  38  383  1283  116290  0.001083
15  279  143  364  715  101482  0.00099
16  203  2  367  1574  74494  0.000824
17  309  1  355  797  109623  0.00135
18  163  179  364  730  59402  0.000761
19  108  14  303  564  32686  0.000553
2  555  27  364  1237  202060  0.000831
20  210  186  355  1050  74515  0.001182
21  58  174  361  1203  20959  0.000435
22  81  18  321  723  26009  0.000507
3  699  61  402  973  281250  0.00142
4  354  2  362  833  127986  0.00067
5  413  9  374  998  154391  0.000853
6  446  12  349  807  155468  0.000909
7  417  83  368  878  153584  0.000965
8  379  11  365  957  138286  0.000945
9  317  3  358  781  113593  0.000804
X  137  83  334  710  45798  0.000295
Y  4  260  306  381  1225  2.1e-05
all (24 chromosomes)  7865  1  368  1574  2895652  0.000394

Table 123: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

Figure 119: Sequence length distribution (histogram; x-axis: length (base pairs), y-axis: frequency). The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqFXnSis-gCount component.

property  value
genes  5121

(a) seqFXnSis-deNovo-meme1: width=11, sites=391, llr=3111, E=2.1e-98
(b) seqFXnSis-deNovo-meme2: width=15, sites=131, llr=1325, E=1.1e-15
(c) seqFXnSis-deNovo-meme3: width=15, sites=45, llr=539, E=36000

Figure 120: De novo motifs for the filtered FoxA1 without FoxA1 siFOXA1 binding site overlaps (stable) sequences.

Table 124: Motif enrichments

motif  ratio  fC  fR  n1C  n1R  p1  var
CTCF  8.6150  0.0226  0.0026  176  38  3.944E-48  0.00966
FOXA1  4.0188  1.2154  0.3024  5934  3688  0.00E00  0.76950
Foxa2  3.9570  1.1547  0.2918  5851  3234  0.00E00  0.83291
FOXF2  3.7154  0.3863  0.1039  2581  1412  0.00E00  0.21766
FOXD1  2.5771  1.6028  0.6219  6510  6369  0.00E00  1.25029
Ar  2.5103  0.0212  0.0084  155  122  10.44E-14  0.01406
TLX1::NFIC  2.5097  0.0153  0.0061  88  69  2.912E-08  0.01379
Foxq1  2.2612  0.6957  0.3076  3807  3577  0.00E00  0.55691
FOXA1pAR  2.0085  0.1160  0.0577  831  762  1.618E-58  0.08978
ELK4  1.7370  0.0319  0.0184  225  252  46.6E-10  0.02690
GABPA  1.7112  0.0782  0.0457  550  633  70.88E-22  0.06469
RXRA::VDR  1.6736  0.0057  0.0034  45  46  45.49E-04  0.00508
FOXI1  1.6671  1.0478  0.6285  4574  5861  0.00E00  1.26831
E2F1  1.5975  0.0181  0.0113  137  162  51.41E-06  0.01426
FOXO3  1.5066  1.9545  1.2973  6536  9825  0.00E00  2.17300
Stat3  1.4699  0.1742  0.1185  1076  1402  1.608E-32  0.18296
Tal1::Gata1  1.4392  0.0421  0.0292  320  422  22.96E-08  0.03436
NHLH1  1.3957  0.1257  0.0900  632  860  12.32E-14  0.17725
MIZF  1.3908  0.0240  0.0173  184  250  5.315E-04  0.02005
Tcfcp2l1  1.3023  0.0587  0.0450  404  593  8.599E-06  0.05926
RXR::RAR DR5  1.2857  0.0249  0.0194  188  282  1.139E-02  0.02184
NKX3-1  1.2813  0.8399  0.6555  3944  5972  20.05E-250  1.22689
ARMotifTH  1.2727  0.0436  0.0343  337  497  1.017E-04  0.03728
Myb  1.2711  0.7594  0.5974  3969  6362  41.3E-226  0.75688
STAT1  1.2572  0.0352  0.0280  221  338  97.87E-04  0.04202
ARMotifT  1.2427  0.4107  0.3305  2570  3967  88.63E-88  0.39077
TAL1::TCF3  1.2348  0.2232  0.1807  1261  1957  1.718E-20  0.27899
Gata1  1.2335  0.6104  0.4949  3357  5424  10.98E-144  0.62924
NFIC  1.2291  2.3141  1.8827  6756  11513  0.00E00  2.98672
Myf  1.2110  0.2070  0.1709  1258  1936  30.31E-22  0.26755
Evi1  1.2050  0.0344  0.0285  263  405  50.22E-04  0.03156
MEF2A  0.8269  0.2416  0.2922  1446  3061  2.497E-02  0.45971
YY1  0.8256  2.9307  3.5500  7164  13780  0.00E00  5.98664
Prrx2  0.8011  1.8112  2.2608  5710  11520  0.00E00  4.21768
NFKB1  0.7914  0.0426  0.0539  256  577  4.877E-02  0.08158
CREB1  0.7859  0.5684  0.7233  3206  6852  63.49E-48  0.85433
MZF1 5-13  0.7845  0.9704  1.2369  4492  9565  67.03E-168  1.68655
SP1  0.7807  0.8604  1.1021  2791  6361  6.425E-18  4.63367
PLAG1  0.7025  0.0099  0.0141  77  201  1.895E-02  0.01322
NF-kappaB  0.6663  0.1271  0.1908  757  2179  13.84E-14  0.24327
Zfx  0.6228  0.0924  0.1483  597  1803  4.432E-16  0.16818
Ddit3::Cebpa  0.5514  0.1957  0.3549  1387  3845  99.57E-10  0.38618
EWSR1-FLI1  0.1838  0.0041  0.0224  11  68  1.615E-04  0.14463

12.49 FoxA1 siFOXA1 without FoxA1 binding site overlaps (stable)

Chromosome specific statistics are shown in Table 125. A histogram of sequence lengths is shown in Figure 121.

chromosome  frequency  min length  mean length  max length  total length  coverage
1  5  233  372  749  1859  7e-06
10  9  160  239  306  2152  1.6e-05
11  10  195  264  364  2635  2e-05
12  6  177  212  287  1269  9e-06
13  4  187  248  318  992  9e-06
14  4  191  270  335  1079  1e-05
15  5  203  256  343  1281  1.2e-05
16  3  187  257  329  770  9e-06
17  5  72  194  291  972  1.2e-05
18  2  236  260  284  520  7e-06
19  2  204  224  244  448  8e-06
2  5  191  208  231  1042  4e-06
20  4  210  269  335  1077  1.7e-05
21  4  189  199  217  796  1.7e-05
3  10  188  307  531  3067  1.5e-05
4  6  173  246  334  1477  8e-06
5  19  179  291  500  5531  3.1e-05
6  6  186  252  353  1511  9e-06
7  3  3  172  322  516  3e-06
8  8  142  262  465  2100  1.4e-05
9  7  190  230  274  1608  1.1e-05
X  4  187  223  290  893  6e-06
all (22 chromosomes)  131  3  256  749  33595  5e-06

Table 125: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

Figure 121: Sequence length distribution (histogram; x-axis: length (base pairs), y-axis: frequency). The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqSinFXs-gCount component.

property  value
genes  197

(a) seqSinFXs-deNovo-meme1: width=14, sites=42, llr=435, E=5.2e-09
(b) seqSinFXs-deNovo-meme2: width=15, sites=23, llr=272, E=47
(c) seqSinFXs-deNovo-meme3: width=14, sites=9, llr=128, E=2400000

Figure 122: De novo motifs for the filtered FoxA1 siFOXA1 without FoxA1 binding site overlaps (stable) sequences.

Table 126: Motif enrichments

motif  ratio  fC  fR  n1C  n1R  p1  var
Tal1::Gata1  1693.3077  0.0846  0.0000  11  0  18.55E-06  0.02870
Pax4  924.0769  0.0462  0.0000  4  0  2.63E-02  0.02662
RREB1  74.0799  1.2231  0.0165  23  3  77.28E-10  4.06927
SP1  4.5353  3.6769  0.8107  56  98  42.99E-04  21.70829
Egr1  4.1961  0.0692  0.0165  9  4  1.705E-02  0.03373
FOXA1  3.9550  0.7000  0.1770  81  40  54.21E-24  0.30608
Foxa2  3.9115  0.6923  0.1770  80  29  2.577E-26  0.48811
FOXF2  3.3434  0.2615  0.0782  32  18  1.737E-06  0.13836
MZF1 1-4  2.4093  5.6615  2.3498  114  196  11.63E-18  24.42270
Klf4  2.3532  0.2615  0.1111  25  24  59.79E-04  0.22318
NR3C1  2.2084  0.2000  0.0905  22  22  1.69E-02  0.13393
Gata1  2.0875  0.6615  0.3169  57  64  1.626E-06  0.52626
EBF1  1.9845  0.6615  0.3333  54  63  10.92E-06  0.65653
GRMotifTH  1.7996  0.2000  0.1111  23  26  3.343E-02  0.14373
ELK1  1.6214  0.5538  0.3416  52  69  1.951E-04  0.45857
FOXD1  1.5677  0.8000  0.5103  81  86  87.28E-14  0.68450
Foxq1  1.4428  0.3385  0.2346  37  48  70.5E-04  0.29476
NFIC  1.4349  1.7538  1.2222  97  157  4.811E-12  1.92489
FEV  1.4281  0.5231  0.3663  55  64  7.132E-06  0.52397
SPI1  1.4238  1.3769  0.9671  94  144  7.655E-12  1.25401
ELF5  1.3923  1.3923  1.0000  90  145  5.194E-10  1.53233
ARMotifT  1.3845  0.3077  0.2222  34  47  2.414E-02  0.26428
FOXO3  1.3739  1.1308  0.8230  83  123  34.9E-10  1.23169
Mafb  1.3180  1.2692  0.9630  89  137  2.638E-10  1.44136
TFAP2A  1.2999  1.1769  0.9053  72  96  8.122E-08  3.79032
ARMotifH  1.2932  5.2846  4.0864  127  232  26.12E-24  7.93883
NR4A2  1.2841  1.2154  0.9465  85  141  2.279E-08  1.23763
Fos  1.2780  1.1308  0.8848  75  133  13.3E-06  1.32977
SPIB  1.2591  2.2385  1.7778  105  184  11.04E-14  2.78382
Nr2e3  1.2558  0.3308  0.2634  30  41  4.295E-02  0.41480
GRMotifT  1.2461  1.3538  1.0864  89  162  2.749E-08  1.55636
Myb  1.2059  0.4615  0.3827  49  72  20.61E-04  0.47914
Hand1::Tcfe2a  1.2051  0.7538  0.6255  73  109  71.73E-08  0.72161
Nkx2-5  0.8283  3.1769  3.8354  105  216  18.95E-12  9.00824
HOXA5  0.7949  3.2615  4.1029  115  227  6.136E-16  8.19754
Pdx1  0.7374  1.2077  1.6379  76  176  14.29E-04  2.35805
Prrx2  0.6820  1.0692  1.5679  77  172  5.578E-04  2.12115
FOXL1  0.6290  2.9923  4.7572  101  210  5.199E-10  18.59535

12.50 AR and FoxA1 siFOXA1 binding site overlaps (stable)

Chromosome specific statistics are shown in Table 127. A histogram of sequence lengths is shown in Figure 123.

chromosome  frequency  min length  mean length  max length  total length  coverage
1  43  143  343  751  14757  5.9e-05
10  44  8  282  503  12395  9.1e-05
11  69  96  303  570  20914  0.000155
12  33  166  306  525  10097  7.5e-05
13  27  187  328  919  8865  7.7e-05
14  34  216  325  500  11062  0.000103
15  53  185  331  749  17557  0.000171
16  35  140  283  494  9891  0.000109
17  57  15  296  681  16896  0.000208
18  18  31  343  712  6180  7.9e-05
19  19  182  324  443  6149  0.000104
2  77  16  306  644  23595  9.7e-05
20  24  184  279  422  6701  0.000106
21  19  168  278  384  5275  0.00011
22  14  195  272  355  3814  7.4e-05
3  99  22  325  682  32174  0.000162
4  45  166  295  650  13284  6.9e-05
5  74  139  293  500  21660  0.00012
6  57  13  272  504  15476  9e-05
7  51  47  320  917  16300  0.000102
8  66  56  316  555  20833  0.000142
9  41  31  284  454  11631  8.2e-05
X  16  181  267  417  4279  2.8e-05
all (23 chromosomes)  1015  8  305  919  309785  4.2e-05

Table 127: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

Figure 123: Sequence length distribution (histogram; x-axis: length (base pairs), y-axis: frequency). The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqAXSis-gCount component.

property  value
genes  1176

(a) seqAXSis-deNovo-meme1: width=11, sites=366, llr=2987, E=3.7e-123
(b) seqAXSis-deNovo-meme2: width=15, sites=73, llr=779, E=51000
(c) seqAXSis-deNovo-meme3: width=15, sites=45, llr=527, E=95000

Figure 124: De novo motifs for the filtered AR and FoxA1 siFOXA1 binding site overlaps (stable) sequences.

Table 128: Motif enrichments

motif  ratio  fC  fR  n1C  n1R  p1  var
Ar  10.8271  0.0523  0.0048  53  9  94.98E-18  0.02096
TLX1::NFIC  5.5149  0.0237  0.0043  19  7  99.49E-06  0.01508
Foxa2  4.1052  0.8945  0.2179  683  332  7.526E-194  0.52438
FOXA1  3.9444  1.0020  0.2540  720  399  2.054E-196  0.55864
FOXF2  3.6048  0.3028  0.0840  272  152  13.87E-48  0.16454
ESR2  3.1256  0.0217  0.0069  17  13  1.995E-02  0.01540
ELK4  2.9444  0.0345  0.0117  35  22  36.69E-06  0.01930
GABPA  2.6319  0.0868  0.0329  85  60  4.498E-10  0.05258
Tal1::Gata1  2.5637  0.0641  0.0250  64  46  16.77E-08  0.03857
ARMotifTH  2.3632  0.0503  0.0213  51  40  19.43E-06  0.03045
STAT1  2.3042  0.0454  0.0197  35  30  15.98E-04  0.04097
FOXD1  2.2343  1.1706  0.5239  745  710  19.41E-142  0.93765
Esrrb  1.8717  0.1124  0.0600  103  110  4.064E-06  0.08263
GRMotifTH  1.8624  0.2732  0.1467  242  251  4.188E-18  0.19807
Stat3  1.8177  0.1893  0.1041  141  152  2.009E-08  0.19137
FOXA1pAR  1.8078  0.1124  0.0622  106  104  17.71E-08  0.08863
Foxq1  1.7940  0.4862  0.2710  389  410  1.326E-36  0.45514
ARMotifT  1.7317  0.4675  0.2699  376  426  1.222E-30  0.36581
MIZF  1.7257  0.0266  0.0154  27  27  2.50E-02  0.02035
Evi1  1.6573  0.0414  0.0250  39  46  3.305E-02  0.03325
Gata1  1.6410  0.6775  0.4129  461  595  9.506E-36  0.61137
NHLH1  1.6388  0.1193  0.0728  73  92  52.38E-04  0.15648
NR3C1  1.6238  0.2623  0.1615  223  270  11.48E-12  0.21686
AR  1.5800  0.0454  0.0287  42  50  2.834E-02  0.03888
EBF1  1.5470  0.5730  0.3704  368  497  66.46E-22  0.67082
NFIC  1.5425  2.4024  1.5574  868  1366  21.23E-130  2.78916
Tcfcp2l1  1.5135  0.0611  0.0404  59  72  96.01E-04  0.05023
GRMotifT  1.4324  1.7840  1.2455  839  1270  6.318E-122  1.78218
FEV  1.4216  0.5325  0.3746  418  567  1.20E-26  0.46762
ARMotifHH  1.3688  0.3215  0.2349  272  351  13.65E-14  0.31790
GR  1.3681  0.2130  0.1557  193  258  22.96E-08  0.18982
Myb  1.3640  0.6371  0.4671  452  683  38.75E-26  0.59204
NR2F1  1.3534  0.0957  0.0707  88  131  3.559E-02  0.08143
ARMotifH  1.3507  6.4606  4.7832  1002  1811  7.706E-180  10.19048
Myf  1.3411  0.2081  0.1552  168  232  15.08E-06  0.23683
Hand1::Tcfe2a  1.3009  0.9684  0.7444  607  965  41.76E-48  0.91403
GRMotifHH  1.2998  0.1913  0.1472  169  238  28.93E-06  0.18667
ELK1  1.2975  0.6805  0.5244  484  710  29.4E-32  0.67147
SP1  1.2946  1.1509  0.8889  358  738  31.16E-06  6.18408
NFE2L2  1.2766  0.1065  0.0834  101  150  1.783E-02  0.09352
FOXI1  1.2751  0.6538  0.5128  459  636  45.72E-32  0.81410
FOXO3  1.2657  1.3935  1.1010  729  1146  1.097E-78  1.62598
GRMotifH  1.2137  4.3925  3.6190  969  1767  9.904E-160  6.55332
YY1  0.8326  2.4842  2.9835  885  1696  1.153E-112  6.23559
CREB1  0.8299  0.5128  0.6180  372  767  7.342E-06  0.76094
Nobox  0.8153  0.9349  1.1467  529  1045  3.936E-20  1.85243
Nkx2-5  0.8062  3.6824  4.5675  915  1682  33.22E-132  12.85936
Pdx1  0.7155  1.3895  1.9421  662  1386  56.94E-38  3.42003
Foxd3  0.7064  1.1913  1.6865  560  1101  2.241E-24  6.38394
Prrx2  0.6969  1.2594  1.8071  616  1373  60.89E-26  2.98220
FOXL1  0.6542  3.6006  5.5037  867  1651  6.653E-106  33.07509
NFIL3  0.6008  0.0878  0.1461  75  203  3.83E-02  0.19905
IRF1  0.5766  0.1124  0.1950  99  302  15.12E-04  0.21040
Lhx3  0.5620  0.1302  0.2317  108  289  3.937E-02  0.34563
Ddit3::Cebpa  0.4971  0.1450  0.2917  133  419  3.484E-04  0.30906
PLAG1  0.3120  0.0039  0.0128  4  23  4.755E-02  0.01027

12.51 AR without FoxA1 siFOXA1 binding site overlaps (stable)

Chromosome specific statistics are shown in Table 129. A histogram of sequence lengths is shown in Figure 125.

chromosome  frequency  min length  mean length  max length  total length  coverage
1  1232  0  491  1678  604553  0.002425
10  443  13  466  1252  206512  0.001524
11  557  8  472  1635  263066  0.001949
12  369  89  450  1247  165989  0.00124
13  188  193  490  1616  92048  0.000799
14  296  216  481  1255  142439  0.001327
15  395  8  472  1165  186321  0.001817
16  350  1  474  1768  165764  0.001835
17  577  10  482  1362  277969  0.003423
18  168  117  437  929  73430  0.00094
19  249  2  423  1380  105290  0.001781
2  577  10  463  1801  267309  0.001099
20  273  217  467  1240  127435  0.002022
21  88  130  460  1255  40454  0.000841
22  178  7  425  969  75643  0.001474
3  675  3  514  1420  346696  0.001751
4  300  148  444  1041  133061  0.000696
5  478  7  470  1482  224701  0.001242
6  393  49  436  1321  171441  0.001002
7  464  97  464  1026  215323  0.001353
8  419  40  462  1300  193664  0.001323
9  393  11  475  1118  186577  0.001321
X  207  80  403  1115  83436  0.000537
Y  6  285  371  434  2226  3.7e-05
all (24 chromosomes)  9275  0  469  1801  4351347  0.000592

Table 129: Chromosome-specific distribution of the regions. The last line represents the overall statistics.

Figure 125: Sequence length distribution (histogram; x-axis: length (base pairs), y-axis: frequency). The dashed line represents a normal distribution approximation based on the mean and the standard deviation of the sequence lengths.

The following table shows the properties of the seqAXnSis-gCount component.

property  value
genes  5268

(a) seqAXnSis-deNovo-meme1: width=15, sites=339, llr=3257, E=1.8e-165
(b) seqAXnSis-deNovo-meme2: width=15, sites=288, llr=2735, E=3.5e-80
(c) seqAXnSis-deNovo-meme3: width=15, sites=91, llr=1028, E=3e-05

Figure 126: De novo motifs for the filtered AR without FoxA1 siFOXA1 binding site overlaps (stable) sequences.

Table 130: Motif enrichments

motif  ratio  fC  fR  n1C  n1R  p1  var
Ar  15.0973  0.1411  0.0093  1167  152  0.00E00  0.06565
ESR1  8.7808  0.0014  0.0001  13  2  80.92E-06  0.00056
TLX1::NFIC  4.6484  0.0351  0.0075  241  110  66.13E-42  0.02488
CTCF  3.5067  0.0151  0.0043  138  73  1.254E-20  0.00821
ESR2  3.3369  0.0433  0.0129  351  207  14.57E-46  0.02838
ELK4  2.5171  0.0576  0.0229  476  376  68.27E-42  0.04084
GABPA  2.4853  0.1595  0.0642  1292  1037  42.33E-122  0.11110
E2F1  2.4549  0.0310  0.0126  270  209  7.771E-24  0.02082
Tcfcp2l1  2.0596  0.1302  0.0632  1036  984  3.648E-68  0.10275
Stat3  2.0509  0.3161  0.1541  2117  2103  66.93E-160  0.29407
Tal1::Gata1  2.0008  0.0665  0.0332  594  561  1.328E-36  0.04559
PPARG::RXRA  1.9888  0.0351  0.0176  321  299  6.766E-20  0.02389
PPARG  1.9479  0.0042  0.0021  33  36  3.183E-02  0.00338
ARMotifTH  1.9470  0.0825  0.0423  717  706  47.65E-42  0.05977
NHLH1  1.8983  0.2286  0.1204  1296  1317  9.857E-78  0.28529
MIZF  1.8203  0.0459  0.0252  406  426  52.0E-20  0.03360
STAT1  1.8153  0.0670  0.0369  484  513  3.034E-22  0.06683
GRMotifTH  1.7736  0.4201  0.2368  3085  3543  3.743E-218  0.33582
TFAP2A  1.7630  3.1296  1.7752  7314  10614  0.00E00  9.16514
EBF1  1.7481  1.1350  0.6493  5216  6952  0.00E00  1.50490
Mycn  1.7384  0.2016  0.1160  1332  1454  4.427E-68  0.22871
ARMotifHH  1.7330  0.5805  0.3350  3828  4494  38.41E-312  0.53228
Myf  1.7053  0.4106  0.2408  2596  3083  57.97E-152  0.49617
Egr1  1.6523  0.1500  0.0908  1194  1350  8.242E-54  0.15050
Pax5  1.6488  0.0875  0.0531  758  869  19.85E-30  0.06958
Zfp423  1.6443  0.1069  0.0650  759  877  85.68E-30  0.12005
RXR::RAR DR5  1.6268  0.0466  0.0286  416  480  69.09E-16  0.03641
Esrrb  1.6140  0.1628  0.1008  1371  1605  62.01E-60  0.13062
Myc  1.6125  0.2350  0.1457  1582  1881  35.98E-70  0.26364
Arnt  1.5754  0.2729  0.1732  1559  1930  2.291E-60  0.36932
ARMotifT  1.5690  0.6679  0.4256  4389  5677  0.00E00  0.58688
NFIC  1.5511  3.7521  2.4189  8712  14646  0.00E00  5.31374
RXRA::VDR  1.5420  0.0098  0.0064  91  108  13.67E-04  0.00766
GRMotifTT  1.5190  0.0301  0.0198  274  334  2.952E-08  0.02398
PLAG1  1.5183  0.0282  0.0185  249  304  16.23E-08  0.02383
ARMotifTT  1.5090  0.0114  0.0076  106  126  5.283E-04  0.00929
AR  1.4701  0.0669  0.0455  594  769  1.012E-14  0.05371
NR3C1  1.4674  0.4044  0.2755  2963  4027  55.8E-140  0.35179
INSM1  1.4485  0.2815  0.1943  2118  2790  7.499E-82  0.27808
ELK1  1.4392  1.2012  0.8346  6171  9120  0.00E00  1.24194
NFKB1  1.4365  0.1152  0.0802  750  954  1.113E-20  0.17376
GRMotifT  1.4342  2.8420  1.9815  8490  14346  0.00E00  3.19260
MYC::MAX  1.4339  0.0943  0.0658  666  902  4.977E-14  0.10846
HIF1A::ARNT  1.4317  0.5945  0.4152  3393  4802  15.64E-166  0.80854
Klf4  1.4316  0.3576  0.2498  2394  3351  56.11E-86  0.41670
NR2F1  1.4286  0.1559  0.1091  1327  1743  21.74E-40  0.13609
ARMotifH  1.4143  10.7214  7.5805  9234  17142  0.00E00  22.74696
SP1  1.3736  1.9901  1.4488  5707  8884  0.00E00  7.11192
Hand1::Tcfe2a  1.3603  1.5788  1.1606  6986  11303  0.00E00  1.69630
Mafb  1.2914  2.6510  2.0528  8013  13784  0.00E00  4.02426
FEV  1.2831  0.7598  0.5921  4773  7409  7.396E-298  0.73873
Zfx  1.2789  0.2557  0.1999  1769  2784  1.053E-30  0.30129
MZF1 1-4  1.2757  5.5677  4.3644  8886  15953  0.00E00  15.44758
SPI1  1.2683  2.3031  1.8158  7993  13603  0.00E00  2.94761
Myb  1.2537  0.9641  0.7689  5395  8834  0.00E00  1.01073
MAX  1.2465  0.2663  0.2136  1811  2871  46.51E-32  0.31799
Gata1  1.2388  0.7677  0.6197  4602  7454  13.87E-248  0.81725
RELA  1.2345  0.1403  0.1137  1085  1637  2.551E-16  0.16270
GR  1.2288  0.3006  0.2446  2364  3530  40.1E-68  0.29183
NFE2L2  1.2287  0.1618  0.1317  1364  2128  6.362E-20  0.14537
TAL1::TCF3  1.2239  0.2812  0.2297  1873  2851  11.43E-40  0.35625
GRMotifHH  1.2120  0.2563  0.2114  1971  3026  3.299E-42  0.26950
NR4A2  1.2017  2.1725  1.8079  7894  13758  0.00E00  2.59183
En1  0.8250  4.3801  5.3094  8932  16918  0.00E00  9.09397
Gfi  0.8137  0.9162  1.1260  5249  10961  6.219E-200  1.33873
FOXO3  0.7883  1.2991  1.6480  6302  12769  0.00E00  2.25270
FOXD1  0.7816  0.6153  0.7872  3962  8680  1.213E-58  0.92556
FOXF2  0.7676  0.1024  0.1335  884  2097  11.57E-04  0.13056
SOX9  0.7339  0.4798  0.6538  3323  7749  3.27E-18  0.71450
Cebpa  0.7290  0.6874  0.9429  4186  9624  99.56E-60  1.13742
HOXA5  0.7160  5.0635  7.0715  8826  16869  0.00E00  18.20425
Sox5  0.6905  1.5022  2.1755  6405  13668  0.00E00  3.57788
FOXA1pAR  0.6884  0.0555  0.0806  457  1218  7.195E-08  0.09176
Nobox  0.6770  1.1689  1.7265  5228  11734  36.54E-162  3.14069
HNF1B  0.6685  0.0600  0.0898  514  1434  94.22E-12  0.08651
PBX1  0.6643  0.0813  0.1224  708  1773  4.694E-06  0.23228
Nkx2-5  0.6488  4.4034  6.7866  8371  16460  0.00E00  22.52530
SRY  0.6308  2.3530  3.7304  7354  15429  0.00E00  8.03806
HLF  0.6292  0.3397  0.5399  2370  6443  47.75E-04  0.64478

179 motif ratio fC fR n1C n1R p1 var TBP 0.6233 0.1973 0.3166 1560 4238 66.38E-10 0.35720 Pou5f1 0.6216 0.0072 0.0117 67 200 13.03E-04 0.01017 FOXI1 0.6007 0.4786 0.7969 3111 7964 2.606E-04 1.18405 ARID3A 0.5927 2.4285 4.0973 7096 15097 0.00E00 10.93582 IRF1 0.5911 0.1661 0.2811 1323 3835 2.026E-16 0.31680 Pdx1 0.5796 1.7173 2.9630 6382 14654 0.00E00 5.96639 Prrx2 0.5501 1.5400 2.7994 6019 14381 21.67E-242 5.30173 NFIL3 0.5057 0.1118 0.2212 848 2790 4.227E-30 0.34702 MEF2A 0.5003 0.1809 0.3617 1264 4442 55.11E-46 0.47442 Ddit3::Cebpa 0.4812 0.2257 0.4691 1835 5517 44.87E-18 0.53168 FOXL1 0.4707 3.9593 8.4109 7701 16217 0.00E00 63.09240 Foxd3 0.3885 1.0961 2.8216 4355 12412 50.43E-20 11.98456 Lhx3 0.3740 0.1421 0.3800 1024 4028 3.664E-64 0.57212 EWSR1-FLI1 0.2591 0.0068 0.0264 24 84 83.55E-04 0.21920
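The motif enrichment tables are easier to filter or compare once each row is parsed into named fields. This section does not define the columns, so the reading below is an assumption: `ratio` appears to be fC / fR (for TFAP2A in the table above, 3.1296 / 1.7752 = 1.7630, the printed ratio). The one parsing pitfall is that a few motif names contain spaces (`MZF1 1-4`, `RXR::RAR DR5`), so this sketch treats the last seven fields as numbers and joins everything before them into the name. The helper names and the 1e-3 p-value cut-off are hypothetical, chosen only for illustration.

```python
def parse_enrichment_row(line):
    """Parse one row: motif ratio fC fR n1C n1R p1 var.

    The last seven whitespace-separated fields are numeric; everything
    before them is the motif name, which may contain spaces ("MZF1 1-4").
    """
    parts = line.split()
    if len(parts) < 8:
        raise ValueError(f"expected at least 8 fields: {line!r}")
    ratio, f_c, f_r = (float(x) for x in parts[-7:-4])
    n1_c, n1_r = (int(x) for x in parts[-4:-2])
    return {
        "motif": " ".join(parts[:-7]),
        "ratio": ratio, "fC": f_c, "fR": f_r,
        "n1C": n1_c, "n1R": n1_r,
        "p1": float(parts[-2]),  # e.g. "50.43E-20" parses as 5.043e-19
        "var": float(parts[-1]),
    }


def split_by_direction(rows, max_p=1e-3):
    """Names of motifs with p1 <= max_p, split by ratio above or below 1.

    max_p is a hypothetical cut-off chosen for illustration only.
    """
    kept = [r for r in rows if r["p1"] <= max_p]
    enriched = [r["motif"] for r in kept if r["ratio"] > 1]
    depleted = [r["motif"] for r in kept if r["ratio"] < 1]
    return enriched, depleted


# Four rows copied from the table above.
rows = [parse_enrichment_row(s) for s in (
    "TFAP2A 1.7630 3.1296 1.7752 7314 10614 0.00E00 9.16514",
    "MZF1 1-4 1.2757 5.5677 4.3644 8886 15953 0.00E00 15.44758",
    "PPARG 1.9479 0.0042 0.0021 33 36 3.183E-02 0.00338",
    "Foxd3 0.3885 1.0961 2.8216 4355 12412 50.43E-20 11.98456",
)]
enriched, depleted = split_by_direction(rows)
```

With these four rows, TFAP2A and `MZF1 1-4` land in `enriched`, Foxd3 in `depleted`, and PPARG is dropped because its p1 (0.03183) is above the illustrative cut-off.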

12.52 FoxA1 without AR siFOXA1 binding site overlaps (stable)

Chromosome specific statistics are shown in Table 131. A histogram of sequence lengths is shown in Figure 127.

                               length
chromosome frequency   min  mean   max    total   coverage
1                 59   173   342   749    20157    8.1e-05
10                67   117   272   477    18205   0.000134
11                93   166   273   607    25399   0.000188
12                61   168   265   471    16155   0.000121
13                46   170   320   555    14739   0.000128
14                57   101   289   627    16477   0.000153
15                61   158   275   518    16794   0.000164
16                49   141   264   422    12959   0.000143
17                62    72   255   509    15797   0.000195
18                35   160   321   804    11224   0.000144
19                32   171   246   343     7869   0.000133
2                 88   158   275   483    24192    9.9e-05
20                29   150   265   451     7695   0.000122
21                18   183   260   405     4674    9.7e-05
22                26   182   262   483     6811   0.000133
3                143   172   312   775    44660   0.000226
4                 52   167   277   501    14400    7.5e-05
5                 76   179   292   543    22199   0.000123
6                 77   149   258   566    19883   0.000116
7                 72     3   278   473    20010   0.000126
8                 93   128   278   479    25830   0.000176
9                 58    28   274   542    15895   0.000113
X                  8    97   233   316     1864    1.2e-05
all (23)        1362     3   282   804   383888    5.2e-05

Table 131: Chromosome specific distribution of the regions. The last line represents the overall statistics.
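The columns of Table 131 can be reproduced from a list of regions in a few lines. The sketch below is an assumption about how the report computes them, not its implementation: it takes region length as end minus start and coverage as total region length divided by chromosome length. That reading is consistent with chromosome 1 above if the genome build is hg19 (chromosome 1 is 249,250,621 bp; the build is our assumption): 20157 / 249250621 ≈ 8.09e-05, printed as 8.1e-05, and the mean 20157 / 59 ≈ 341.6, printed as 342. The `all` row uses a denominator the report does not state, so the sketch does not attempt it. The function name and toy inputs are hypothetical.

```python
from collections import defaultdict


def chromosome_stats(regions, chrom_lengths):
    """Per-chromosome summary of (chrom, start, end) regions.

    Assumptions (not taken from the report): coordinates are half-open,
    so a region's length is end - start, and coverage is the summed region
    length divided by the chromosome length given in chrom_lengths.
    """
    lengths = defaultdict(list)
    for chrom, start, end in regions:
        lengths[chrom].append(end - start)
    table = {}
    # sorted() on string names gives the same lexical order as the report
    # (1, 10, 11, ..., 19, 2, 20, ...).
    for chrom in sorted(lengths):
        ls = lengths[chrom]
        total = sum(ls)
        table[chrom] = {
            "frequency": len(ls),
            "min": min(ls),
            "mean": round(total / len(ls)),
            "max": max(ls),
            "total": total,
            "coverage": total / chrom_lengths[chrom],
        }
    return table


# Toy input: made-up coordinates and chromosome lengths.
regions = [("1", 100, 300), ("1", 500, 900), ("10", 0, 50), ("X", 10, 40)]
stats = chromosome_stats(regions, {"1": 1000, "10": 500, "X": 300})
```

Here chromosome "1" has two regions of 200 and 400 bp, so `stats["1"]` reports frequency 2, mean 300, total 600 and coverage 0.6.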

[Figure: histogram of sequence lengths; x-axis: length (base pairs), y-axis: frequency]

Figure 127: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.
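The dashed curve described in the caption can be sketched as a normal density with the mean and standard deviation of the lengths, multiplied by the number of sequences and the bin width so that it sits on the same scale as the histogram counts. The report states neither the bin width nor whether it uses the sample or the population standard deviation; both choices below are assumptions, and `normal_overlay` is a hypothetical name.

```python
import math
from statistics import mean, stdev


def normal_overlay(lengths, bin_width):
    """Return (mu, sigma, f) where f(x) is the expected histogram count at x.

    The curve is a normal density with the mean and sample standard
    deviation of `lengths`, scaled by n * bin_width so that it matches a
    histogram of raw counts with bins of width `bin_width`.
    """
    mu = mean(lengths)
    sigma = stdev(lengths)  # sample SD; the report does not say which SD it uses
    n = len(lengths)

    def expected_count(x):
        z = (x - mu) / sigma
        density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
        return n * bin_width * density

    return mu, sigma, expected_count


# Toy lengths, not taken from the report.
mu, sigma, curve = normal_overlay([100, 200, 300], bin_width=50)
peak = curve(mu)  # highest point of the dashed curve
```

Evaluating `curve` at each bin centre and drawing the results as a dashed line over the histogram gives the kind of comparison the caption describes.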

The following table shows the properties of the seqSinAXs-gCount component.

property  value
genes     1658

(a) seqSinAXs-deNovo-meme1: width=15, sites=391, llr=3612, E=0
(b) seqSinAXs-deNovo-meme2: width=15, sites=90, llr=925, E=0.00025
(c) seqSinAXs-deNovo-meme3: width=15, sites=45, llr=524, E=56

Figure 128: De novo motifs for the filtered FoxA1 without AR siFOXA1 binding site overlaps (stable) sequences.
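Each panel title carries MEME's summary statistics for one de novo motif: motif width, number of contributing sites, log-likelihood ratio (llr) and E-value. MEME's E-value estimates how many motifs with the same width and site count, and an llr at least as high, would be expected in a similarly sized set of random sequences, so small values indicate a motif unlikely to arise by chance. The sketch parses titles of the form `seqSinAXs-deNovo-meme1: width=15, sites=391, llr=3612, E=0` and keeps motifs below an E-value cut-off of 0.05; that cut-off is an illustrative assumption, not a setting of the report, and the helper name is hypothetical.

```python
import re


def parse_meme_title(title):
    """Parse 'name: width=15, sites=391, llr=3612, E=0' into a dict."""
    name, _, rest = title.partition(":")
    fields = dict(re.findall(r"(\w+)=([^,\s]+)", rest))
    return {
        "name": name.strip(),
        "width": int(fields["width"]),
        "sites": int(fields["sites"]),
        "llr": int(fields["llr"]),
        "E": float(fields["E"]),
    }


# Panel titles of Figure 128.
titles = [
    "seqSinAXs-deNovo-meme1: width=15, sites=391, llr=3612, E=0",
    "seqSinAXs-deNovo-meme2: width=15, sites=90, llr=925, E=0.00025",
    "seqSinAXs-deNovo-meme3: width=15, sites=45, llr=524, E=56",
]
motifs = [parse_meme_title(t) for t in titles]
# Illustrative cut-off only: keep motifs expected to arise by chance
# fewer than 0.05 times in random sequences of the same size.
kept = [m["name"] for m in motifs if m["E"] < 0.05]
```

By this criterion the third motif of Figure 128 (E=56) is discarded, while the first two are kept.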

Table 132: Motif enrichments

motif            ratio      fC      fR   n1C    n1R         p1       var
Pax4           89.1705  0.0044  0.0000     4      0   2.68E-02   0.00255
Foxa2           6.2523  1.3556  0.2168  1189    439    0.00E00   0.77833
CTCF            6.2392  0.0125  0.0020    17      5  69.97E-06   0.00559
FOXA1           5.8366  1.3593  0.2328  1175    514    0.00E00   0.73667
FOXF2           5.2991  0.4364  0.0823   520    201 2.221E-132   0.20781
TLX1::NFIC      3.8856  0.0169  0.0043    17     10  38.19E-04   0.01220
FOXD1           3.1859  1.5687  0.4924  1169    942 2.458E-302   1.05436
GABPA           3.1721  0.0896  0.0282   106     70  16.93E-14   0.05686
ELK4            2.8396  0.0279  0.0098    36     24  54.53E-06   0.01738
Foxq1           2.6225  0.6231  0.2376   639    512 54.59E-100   0.41871
Stat3           2.0965  0.1521  0.0725   162    160  33.01E-12   0.13110
NHLH1           1.7696  0.1124  0.0635   103    108  3.439E-06   0.13184
FOXA1pAR        1.6780  0.0757  0.0451    97    107  31.38E-06   0.06030
Tal1::Gata1     1.6386  0.0360  0.0220    49     54  56.73E-04   0.02715
FOXI1           1.6201  0.7494  0.4626   645    842  8.604E-52   0.78210
NFYA            1.6115  0.0588  0.0365    74     76  89.99E-06   0.06171
FOXO3           1.5597  1.5702  1.0067  1048   1502 14.84E-144   1.50710
Mycn            1.5375  0.0940  0.0612    93    121  25.85E-04   0.11235
Tcfcp2l1        1.5329  0.0595  0.0388    74     94   62.7E-04   0.05004
Arnt            1.5096  0.1492  0.0988   133    176  1.995E-04   0.19895
NFIC            1.4229  2.0470  1.4387  1143   1828  55.2E-164   2.20190
ELK1            1.3765  0.6400  0.4649   625    920   1.52E-38   0.58230
Myf             1.3698  0.1896  0.1384   195    282  37.09E-06   0.22899
TAL1::TCF3      1.3683  0.1749  0.1278   180    238  3.161E-06   0.21189
FEV             1.3623  0.4555  0.3344   495    694  4.311E-26   0.40562
Myb             1.3446  0.6098  0.4535   598    908  1.302E-32   0.55478
Gata1           1.3042  0.4908  0.3763   496    742  2.175E-22   0.49668
Myc             1.2867  0.1014  0.0788   108    168  4.131E-02   0.11599
Hand1::Tcfe2a   1.2803  0.8677  0.6778   777   1233  1.227E-56   0.78531
Nr2e3           1.2737  0.3350  0.2630   280    427  31.33E-08   0.47004
MAX             1.2638  0.1506  0.1192   152    256  4.224E-02   0.18071
NFE2L2          1.2461  0.0918  0.0737   118    187  4.025E-02   0.07823
SPI1            1.2335  1.2630  1.0239   934   1560  4.224E-86   1.45848
YY1             0.8165  2.1888  2.6809  1170   2309 2.078E-140   3.89252
Foxd3           0.7909  1.3762  1.7401   833   1476  3.525E-56   6.75469
FOXL1           0.7785  4.0639  5.2199  1186   2234 1.846E-154  44.12607
MZF1 5-13       0.7749  0.7472  0.9643   661   1482  1.672E-14   1.15536
Pdx1            0.7715  1.4313  1.8554   897   1857  55.04E-54   3.21070
Prrx2           0.7186  1.2513  1.7413   844   1807   5.18E-40   2.85052
NF-kappaB       0.6727  0.0999  0.1486   106    302  48.72E-04   0.18389
Zfx             0.6038  0.0698  0.1156    80    259   2.73E-04   0.12148
Ddit3::Cebpa    0.4895  0.1381  0.2822   167    562   1.80E-06   0.28720

12.53 GR DEX binding sites

Chromosome specific statistics are shown in Table 133. A histogram of sequence lengths is shown in Figure 129.

                               length
chromosome frequency   min  mean   max    total   coverage
1                585   257   611  1579   357711   0.001435
10               249   234   580  1577   144376   0.001065
11               282   263   585  1584   164868   0.001221
12               221   266   569  1353   125777    0.00094
13               186   262   643  2132   119628   0.001039
14               183   207   608  1553   111206   0.001036
15               194   240   604  1432   117199   0.001143
16               184   272   574  1622   105661   0.001169
17               231   254   577  1360   133338   0.001642
18               150   252   603  1418    90421   0.001158
19                81   256   561  1554    45459   0.000769
2                453   234   563  1404   255048   0.001049
20               143   257   593  1744    84797   0.001345
21                95   322   591  1160    56139   0.001166
22                74   298   507   989    37489   0.000731
3                483   215   651  1654   314342   0.001587
4                297   253   551  1237   163652   0.000856
5                433   235   598  1669   259047   0.001432
6                335   263   558  1161   186835   0.001092
7                350   258   584  1614   204389   0.001284
8                348   242   589  1206   204841     0.0014
9                242   250   593  1916   143484   0.001016
X                119   265   490  1205    58352   0.000376
Y                  6   284   423   497     2538    4.3e-05
all (24)        5924   207   589  2132  3486597   0.000474

Table 133: Chromosome specific distribution of the regions. The last line represents the overall statistics.

[Figure: histogram of sequence lengths; x-axis: length (base pairs), y-axis: frequency]

Figure 129: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqGR-gCount component.

property  value
genes     11060

(a) seqGR-deNovo-meme1: width=15, sites=337, llr=3290, E=3.9e-169
(b) seqGR-deNovo-meme2: width=15, sites=109, llr=1218, E=1.5e-13
(c) seqGR-deNovo-meme3: width=15, sites=36, llr=478, E=1200

Figure 130: De novo motifs for the filtered GR DEX binding sites sequences.

Table 134: Motif enrichments

motif            ratio      fC      fR   n1C    n1R         p1       var
Ar             14.7795  0.1810  0.0122   933    130 7.309E-312   0.10171
TLX1::NFIC      2.8337  0.0209  0.0074    93     69  13.44E-10   0.01707
ARMotifTH       2.1082  0.1192  0.0565   643    605  2.999E-44   0.08373
FOXA1           1.9866  1.0078  0.5073  3646   4155    0.00E00   0.84224
ARMotifTT       1.9260  0.0203  0.0105   117    111  10.55E-08   0.01474
Foxa2           1.8842  0.9087  0.4822  3360   3680    0.00E00   0.97412
FOXF2           1.8709  0.3141  0.1679  1565   1678 20.56E-112   0.23586
ARMotifT        1.8532  0.9870  0.5326  3560   4346    0.00E00   0.98405
ESR2            1.8508  0.0324  0.0175   157    183   4.44E-06   0.02825
GRMotifTH       1.7787  0.5552  0.3121  2411   2907 9.348E-188   0.52430
NR3C1           1.7333  0.6018  0.3472  2574   3156 43.18E-208   0.51002
CTCF            1.6344  0.0069  0.0042    41     47  2.393E-02   0.00513
AR              1.6160  0.0955  0.0591   524    628  50.96E-20   0.07750
GRMotifTT       1.5646  0.0451  0.0288   253    312   6.19E-08   0.03724
ARMotifHH       1.5600  0.6458  0.4140  2565   3465 19.15E-172   0.85808
Stat3           1.5353  0.2878  0.1874  1227   1562   84.9E-52   0.32786
Tal1::Gata1     1.5208  0.0690  0.0454   387    493   51.8E-12   0.05541
FOXD1           1.5115  1.5518  1.0266  4394   6615    0.00E00   1.62931
Foxq1           1.4487  0.7414  0.5117  2878   4025 1.355E-206   0.77232
GABPA           1.4288  0.1107  0.0775   614    804  30.28E-18   0.09364
GRMotifT        1.4122  3.5005  2.4787  5663   9997    0.00E00   4.62369
GRMotifHH       1.3965  0.3651  0.2614  1678   2378  1.593E-64   0.36161
GR              1.3870  0.4289  0.3092  1981   2779  4.559E-92   0.51896
PPARG::RXRA     1.3708  0.0304  0.0222   173    243  15.87E-04   0.02579
ARMotifH        1.3136 12.2551  9.3294  5923  11130    0.00E00  28.65634
Tcfcp2l1        1.2941  0.0940  0.0726   489    721  6.143E-08   0.09720
MIZF            1.2826  0.0383  0.0299   217    321  21.01E-04   0.03442
Esrrb           1.2804  0.1570  0.1226   841   1258  8.634E-16   0.14127
RXR::RAR DR5    1.2775  0.0403  0.0316   236    346  8.241E-04   0.03447
Arnt            1.2658  0.2625  0.2074   985   1469   1.17E-20   0.42331
STAT1           1.2533  0.0579  0.0462   267    413  28.19E-04   0.07010
NHLH1           1.2414  0.1835  0.1478   654   1027  1.695E-08   0.29490
RORA 2          1.2133  0.0316  0.0260   179    285  4.548E-02   0.02868
FOXI1           1.2096  1.2498  1.0333  3789   6136 28.72E-320   1.84328
Gata1           1.2092  0.9549  0.7897  3445   5671  1.00E-238   1.03851
GRMotifH        1.2047  8.5370  7.0861  5904  11077    0.00E00  15.77390
ELK4            1.2028  0.0341  0.0283   193    299  1.702E-02   0.03305
MZF1 5-13       0.8312  1.6507  1.9860  4438   8945    0.00E00   3.09395
CREB1           0.8278  0.9642  1.1648  3405   6923 24.93E-148   1.51884
Prrx2           0.8265  3.0049  3.6358  5105   9996    0.00E00   7.58171
Foxd3           0.7885  2.8336  3.5937  4647   8825    0.00E00  16.03098
IRF1            0.7714  0.2834  0.3674  1379   3029  29.39E-04   0.44866
Zfx             0.7483  0.1759  0.2351   816   2060  12.18E-04   0.29176
RREB1           0.5956  0.0344  0.0578   164    453  1.356E-04   0.09388

12.54 GR DEX binding sites (siFOXA1)

Chromosome specific statistics are shown in Table 135. A histogram of sequence lengths is shown in Figure 131.

                               length
chromosome frequency   min  mean   max    total   coverage
1               1098   191   486  1213   534041   0.002143
10               459   207   459  1303   210647   0.001554
11               473   207   466  1069   220449   0.001633
12               337   237   446  1291   150184   0.001122
13               215   187   495  2116   106516   0.000925
14               332   230   465  1233   154496   0.001439
15               343   211   460  1078   157685   0.001538
16               367   198   470  1250   172573    0.00191
17               472   197   477  1153   225153   0.002773
18               225   195   457   983   102842   0.001317
19               178   221   443  1573    78778   0.001332
2                752   189   436  1138   328070   0.001349
20               290   193   461  1565   133584    0.00212
21               150   221   447   923    66989   0.001392
22               183   228   435   960    79520    0.00155
3                736   219   497  1291   365731   0.001847
4                390   192   433   987   169008   0.000884
5                585   208   461  1220   269580    0.00149
6                497   179   438  1463   217873   0.001273
7                537   203   466  1257   250429   0.001574
8                498   180   467  1264   232756    0.00159
9                435   208   459  1010   199458   0.001412
X                213   191   401  1095    85438    0.00055
Y                  6   327   374   408     2241    3.8e-05
all (24)        9771   179   462  2116  4514041   0.000614

Table 135: Chromosome specific distribution of the regions. The last line represents the overall statistics.

[Figure: histogram of sequence lengths; x-axis: length (base pairs), y-axis: frequency]

Figure 131: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqGRnoF-gCount component.

property  value
genes     15379

(a) seqGRnoF-deNovo-meme1: width=15, sites=351, llr=3252, E=9.6e-151
(b) seqGRnoF-deNovo-meme2: width=15, sites=140, llr=1460, E=1.5e-14
(c) seqGRnoF-deNovo-meme3: width=11, sites=47, llr=567, E=7.7

Figure 132: De novo motifs for the filtered GR DEX binding sites (siFOXA1) sequences.

Table 136: Motif enrichments

motif            ratio      fC      fR   n1C    n1R         p1       var
Ar             20.1996  0.1774  0.0087  1555    150    0.00E00   0.08807
TLX1::NFIC      3.6241  0.0228  0.0063   171     89  12.97E-26   0.01755
ESR2            2.8416  0.0382  0.0134   309    215  6.381E-32   0.02922
PPARG           2.4797  0.0031  0.0012    26     20   34.5E-04   0.00228
ARMotifTH       2.4200  0.1052  0.0434   960    771  4.393E-84   0.06765
ESR1            2.4025  0.0011  0.0004    11      7  3.708E-02   0.00075
CTCF            2.2990  0.0089  0.0038    85     70  25.15E-08   0.00572
GRMotifTH       2.2679  0.5397  0.2379  3889   3762    0.00E00   0.46336
GABPA           2.1715  0.1302  0.0599  1155   1029  1.495E-86   0.09313
ARMotifT        2.1024  0.8583  0.4082  5474   5855    0.00E00   0.77424
ARMotifHH       2.0537  0.6655  0.3241  4377   4664    0.00E00   0.76687
GRMotifTT       1.9906  0.0393  0.0197   370    344  90.95E-24   0.02878
NR3C1           1.9815  0.5264  0.2656  3907   4097    0.00E00   0.39896
RXR::RAR DR5    1.9419  0.0444  0.0228   419    406  14.91E-24   0.03132
Stat3           1.9372  0.2996  0.1546  2127   2226 12.25E-140   0.28858
Tcfcp2l1        1.9322  0.1166  0.0603   992    978  1.648E-58   0.09608
MIZF            1.8877  0.0442  0.0234   408    412   92.3E-22   0.03402
ARMotifTT       1.8706  0.0154  0.0082   149    142   45.2E-10   0.01122
AR              1.8049  0.0802  0.0444   726    784  1.836E-32   0.06442
PPARG::RXRA     1.7962  0.0312  0.0174   294    310  21.14E-14   0.02306
ELK4            1.7865  0.0366  0.0205   340    356  9.533E-16   0.02838
RXRA::VDR       1.7680  0.0102  0.0058    98    105  70.77E-06   0.00742
Mycn            1.7241  0.1803  0.1046  1240   1401  41.96E-56   0.20587
GRMotifT        1.7097  3.2876  1.9229  9260  15172    0.00E00   4.11816
Esrrb           1.6673  0.1553  0.0931  1379   1597  24.44E-60   0.12014
Myc             1.6661  0.2216  0.1330  1549   1806  1.957E-68   0.25772
NHLH1           1.6660  0.1981  0.1189  1200   1366   10.1E-52   0.26264
Arnt            1.6592  0.2665  0.1606  1607   1891  2.575E-70   0.36451
Tal1::Gata1     1.6467  0.0578  0.0351   548    626  3.431E-20   0.04354
Myf             1.6263  0.3647  0.2242  2429   3027 1.061E-114   0.46131
EBF1            1.5787  0.9635  0.6103  5019   7096    0.00E00   1.31113
ARMotifH        1.5679 11.5161  7.3448  9771  18113    0.00E00  24.75241
GR              1.5642  0.3806  0.2433  2933   3682 1.249E-156   0.42240
GRMotifHH       1.5343  0.3112  0.2028  2395   3126 72.17E-100   0.29343
Zfp423          1.5170  0.0893  0.0589   690    860  12.42E-20   0.10057
Pax5            1.5162  0.0752  0.0496   694    870  23.22E-20   0.06105
MYC::MAX        1.5094  0.0940  0.0622   699    903  18.32E-18   0.10540
TFAP2A          1.4736  2.5070  1.7013  7164  11121    0.00E00   7.30466
STAT1           1.4701  0.0529  0.0360   409    528  9.856E-10   0.05851
Egr1            1.4449  0.1259  0.0871  1063   1342  4.647E-32   0.15450
E2F1            1.4323  0.0181  0.0126   169    223  4.773E-04   0.01548
NFIC            1.4237  3.3691  2.3663  9051  15420    0.00E00   4.85158
ELK1            1.4140  1.1295  0.7988  6331   9488    0.00E00   1.17851
NFKB1           1.3803  0.0980  0.0710   665    916  1.244E-12   0.14898
HIF1A::ARNT     1.3762  0.5508  0.4002  3411   4970 90.58E-146   0.76750
INSM1           1.3412  0.2466  0.1839  1987   2834  11.53E-54   0.25196
TAL1::TCF3      1.3390  0.2993  0.2235  2108   2915  2.576E-66   0.36112
Hand1::Tcfe2a   1.3127  1.4967  1.1402  7282  11950    0.00E00   1.59714
FEV             1.3124  0.7616  0.5803  5055   7682    0.00E00   0.75273
MAX             1.2939  0.2707  0.2092  1941   2915  3.961E-42   0.33453
NR2F1           1.2791  0.1416  0.1107  1264   1876  25.31E-22   0.12735
NR4A2           1.2710  2.2340  1.7576  8439  14401    0.00E00   2.75853
Klf4            1.2619  0.2942  0.2332  2153   3336  34.54E-46   0.36695
USF1            1.2574  0.6776  0.5389  3466   5139 55.68E-144   1.16449
GRMotifH        1.2552  6.9886  5.5678  9731  17953    0.00E00  11.60719
RELA            1.2448  0.1290  0.1036  1058   1619  7.058E-14   0.13955
PLAG1           1.2416  0.0229  0.0185   214    311  28.82E-04   0.02276
SPI1            1.2320  2.1834  1.7722  8348  14284    0.00E00   2.76365
Myb             1.2320  0.9232  0.7494  5668   9131    0.00E00   0.95843
MZF1 1-4        1.2245  5.1135  4.1759  9358  16856    0.00E00  14.36667
Mafb            1.2215  2.4155  1.9775  8255  14283    0.00E00   3.63415
TEAD1           1.2035  0.0990  0.0822   905   1415  9.359E-10   0.09204
Gfi             0.8232  0.9115  1.1072  5528  11510 20.97E-210   1.25377
Foxq1           0.7928  0.3173  0.4002  2406   5418  9.523E-04   0.49936
Sox5            0.7692  1.6674  2.1678  7214  14602    0.00E00   3.47677
HOXA5           0.7549  5.3154  7.0414  9453  17838    0.00E00  16.83952
IRF2            0.7545  0.0060  0.0080    56    146  4.248E-02   0.00770
Pax6            0.7447  0.0085  0.0114    82    205  3.404E-02   0.01058
HLF             0.7436  0.4036  0.5429  2841   6821   6.79E-04   0.69906
PBX1            0.7410  0.0855  0.1153   755   1800  2.604E-04   0.18298
HNF1B           0.7321  0.0638  0.0871   574   1483  16.89E-08   0.08538
FOXA1pAR        0.7280  0.0588  0.0808   532   1298   61.4E-06   0.08826
SRY             0.7143  2.6474  3.7062  8183  16466    0.00E00   7.45877
Nobox           0.7119  1.2138  1.7050  5785  12417 2.165E-228   3.03639
Pou5f1          0.6932  0.0077  0.0111    75    201  1.024E-02   0.00987
Nkx2-5          0.6904  4.7017  6.8101  9032  17443    0.00E00  21.33670
FOXI1           0.6641  0.5322  0.8014  3589   8450  16.64E-20   1.21224
TBP             0.6337  0.2117  0.3342  1720   4553  52.45E-08   0.53384
ARID3A          0.6257  2.5679  4.1039  7753  15934    0.00E00  10.51303
Pdx1            0.6216  1.8387  2.9578  7128  15619    0.00E00   5.72488
NKX3-1          0.6204  0.5111  0.8239  3501   8565   5.48E-12   1.32365
IRF1            0.5849  0.1654  0.2828  1386   4045  2.098E-18   0.31209
NFIL3           0.5830  0.1320  0.2265  1021   2990  62.55E-20   0.37654
Prrx2           0.5818  1.6392  2.8177  6779  15323    0.00E00   5.15266
FOXL1           0.5146  4.4903  8.7259  8514  17099    0.00E00  77.68600
Ddit3::Cebpa    0.5145  0.2400  0.4665  2022   5802  7.678E-12   0.53893
MEF2A           0.5141  0.1931  0.3757  1432   4677  1.602E-34   0.52639
Lhx3            0.4723  0.1738  0.3680  1238   4200  1.133E-40   0.55664
Foxd3           0.4339  1.2066  2.7808  5014  13188  85.19E-60  11.60620
EWSR1-FLI1      0.2160  0.0058  0.0272    26     85   1.53E-02   0.20828

12.55 GR and GR siFOXA1 binding site overlaps

Chromosome specific statistics are shown in Table 137. A histogram of sequence lengths is shown in Figure 133.

                               length
chromosome frequency   min  mean   max    total   coverage
1                366     0   502  1062   183630   0.000737
10               154   220   470  1219    72328   0.000534
11               161    37   475  1061    76430   0.000566
12               124   257   468  1165    58087   0.000434
13                85    18   480  1219    40761   0.000354
14               110   201   463   752    50885   0.000474
15               109   240   491  1078    53484   0.000522
16               126   100   483  1157    60907   0.000674
17               157     4   474  1108    74467   0.000917
18                82   189   475   899    38967   0.000499
19                70     1   460  1498    32176   0.000544
2                259    17   440  1012   113862   0.000468
20               103    15   473  1565    48765   0.000774
21                45   226   456   923    20522   0.000426
22                60   238   435   815    26083   0.000508
3                271    16   507  1095   137324   0.000693
4                151   212   440   952    66386   0.000347
5                219    11   459  1083   100621   0.000556
6                213     9   445  1065    94869   0.000554
7                232    46   454  1187   105408   0.000662
8                193    71   469  1053    90586   0.000619
9                155     9   455   775    70487   0.000499
X                 77   206   410  1034    31555   0.000203
all (23)        3522     0   468  1565  1648590   0.000224

Table 137: Chromosome specific distribution of the regions. The last line represents the overall statistics.

[Figure: histogram of sequence lengths; x-axis: length (base pairs), y-axis: frequency]

Figure 133: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqGRSi-gCount component.

property  value
genes     8377

(a) seqGRSi-deNovo-meme1: width=15, sites=360, llr=3305, E=1e-170
(b) seqGRSi-deNovo-meme2: width=15, sites=95, llr=1051, E=1.1e-08
(c) seqGRSi-deNovo-meme3: width=15, sites=37, llr=482, E=720

Figure 134: De novo motifs for the filtered GR and GR siFOXA1 binding site overlaps sequences.

Table 138: Motif enrichments

motif            ratio      fC      fR   n1C    n1R         p1       var
Ar             25.4825  0.2296  0.0090   703     57 15.58E-266   0.13074
ESR2            2.8288  0.0418  0.0147   120     89  3.463E-12   0.03149
TLX1::NFIC      2.8130  0.0210  0.0074    54     41  8.654E-06   0.01817
ARMotifTH       2.6122  0.1179  0.0451   378    278  6.971E-40   0.07958
RXRA::VDR       2.5163  0.0131  0.0052    44     34   88.4E-06   0.00825
GRMotifTH       2.4305  0.5698  0.2344  1445   1312 2.213E-174   0.50845
ARMotifT        2.2557  0.9471  0.4199  2068   2145 3.718E-284   0.94161
NR3C1           2.2506  0.5956  0.2646  1539   1493 11.39E-178   0.42826
GRMotifTT       2.0103  0.0443  0.0220   148    141  13.77E-10   0.03347
Stat3           2.0102  0.2859  0.1422   722    738  4.895E-50   0.27647
ARMotifTT       1.9862  0.0188  0.0094    66     61  53.46E-06   0.01271
ARMotifHH       1.9789  0.6644  0.3357  1530   1693 44.79E-146   1.02786
GABPA           1.9224  0.1145  0.0595   377    370  6.378E-24   0.08221
AR              1.8631  0.0807  0.0433   264    277   8.04E-14   0.06128
Tal1::Gata1     1.8080  0.0585  0.0324   194    208  15.41E-10   0.04332
Tcfcp2l1        1.7866  0.1034  0.0579   311    349  5.851E-14   0.08812
PPARG::RXRA     1.7460  0.0324  0.0185   109    120  30.07E-06   0.02460
GRMotifT        1.6920  3.3066  1.9543  3314   5478    0.00E00   4.45631
GR              1.6418  0.3876  0.2361  1063   1301  2.944E-62   0.53135
ELK4            1.6370  0.0361  0.0220   125    138  7.191E-06   0.02798
RXR::RAR DR5    1.5992  0.0421  0.0263   146    170  7.315E-06   0.03176
FOXA1           1.5602  0.5914  0.3790  1546   1979 14.74E-116   0.53333
NHLH1           1.5591  0.1850  0.1186   404    485  9.396E-16   0.26087
ARMotifH        1.5321 11.4581  7.4785  3505   6523    0.00E00  26.85608
Myf             1.5018  0.3367  0.2242   826   1095  48.11E-34   0.43600
Esrrb           1.5015  0.1492  0.0993   485    612  35.47E-18   0.12032
Arnt            1.4961  0.2577  0.1723   559    724  26.96E-20   0.41547
Mycn            1.4916  0.1611  0.1080   396    531  91.29E-12   0.20298
MIZF            1.4782  0.0386  0.0261   129    170  12.64E-04   0.03134
GRMotifHH       1.4654  0.3166  0.2160   858   1173  2.848E-32   0.32802
Myc             1.4567  0.1921  0.1319   484    649  4.681E-14   0.23879
Egr1            1.4558  0.1248  0.0857   375    478  5.954E-12   0.13597
TEAD1           1.4508  0.1051  0.0725   350    461  7.739E-10   0.08534
Evi1            1.4383  0.0455  0.0316   140    202  84.74E-04   0.04619
STAT1           1.4349  0.0534  0.0372   152    190   61.4E-06   0.05964
FOXF2           1.3768  0.1799  0.1306   576    782   26.6E-18   0.15447
EBF1            1.3725  0.8338  0.6075  1621   2552  5.947E-84   1.17740
NFIC            1.3674  3.2668  2.3890  3210   5599    0.00E00   4.57342
Pax5            1.3472  0.0725  0.0538   244    337  16.86E-06   0.06240
Foxa2           1.3274  0.4902  0.3693  1317   1789  8.831E-74   0.58749
TFAP2A          1.3221  2.2691  1.7162  2414   4023 12.88E-220   7.29344
TAL1::TCF3      1.3183  0.2998  0.2274   766   1074  1.581E-24   0.36029
NFKB1           1.3172  0.0853  0.0647   222    312  1.151E-04   0.11502
ELK1            1.3139  1.0918  0.8309  2232   3511 1.187E-192   1.25297
Hand1::Tcfe2a   1.3004  1.4896  1.1455  2607   4280 2.672E-280   1.60177
GRMotifH        1.2997  7.3021  5.6183  3488   6454    0.00E00  13.33643
NR2F1           1.2835  0.1466  0.1142   464    699   2.11E-08   0.13235
FEV             1.2783  0.7488  0.5858  1784   2790  2.76E-108   0.73936
MYC::MAX        1.2650  0.0796  0.0629   214    333  69.85E-04   0.09626
Myb             1.2619  0.9466  0.7501  2060   3364 21.08E-146   0.96093
INSM1           1.2306  0.2191  0.1780   630   1008  1.936E-10   0.23674
Gata1           1.2089  0.7397  0.6119  1721   2836  4.327E-88   0.77659
HOXA5           0.8279  5.8662  7.0860  3430   6435    0.00E00  16.95114
Nobox           0.8198  1.4106  1.7206  2230   4518 5.376E-122   3.12140
HLF             0.8186  0.4564  0.5575  1132   2528  1.015E-06   0.74948
NKX3-1          0.7837  0.6343  0.8094  1484   3086   8.74E-28   1.22011
Nkx2-5          0.7737  5.3399  6.9020  3309   6294    0.00E00  21.91856
ARID3A          0.7391  3.0634  4.1449  2962   5809    0.00E00  10.59054
NFIL3           0.7206  0.1640  0.2276   449   1106   2.26E-02   0.38994
Pdx1            0.7081  2.1188  2.9924  2754   5636 3.152E-248   5.88874
Prrx2           0.6610  1.8852  2.8519  2634   5524 1.036E-204   5.20990
Lhx3            0.6341  0.2282  0.3599   574   1492  55.22E-04   0.56767
FOXL1           0.6240  5.4217  8.6881  3203   6182    0.00E00  73.05454
IRF1            0.6236  0.1799  0.2885   539   1491  45.75E-06   0.32505
MEF2A           0.6155  0.2310  0.3754   609   1723   19.5E-06   0.49861
Foxd3           0.5329  1.5138  2.8407  2165   4762  15.74E-92  11.82360
Ddit3::Cebpa    0.5094  0.2433  0.4776   739   2142  30.84E-06   0.54016
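Sections 12.52 and 12.55 to 12.57 describe peak sets built by interval operations, written "X and Y" and "X without Y". The report does not define these operations here; one plausible reading, sketched below, takes "X and Y" as the overlapping parts of X and Y peaks and "X without Y" as the X peaks that overlap no Y peak. The helper names and coordinates are hypothetical, and the nested loops only suit small examples. The strict overlap test (touching intervals do not count) is also an assumption: Table 137 lists regions as short as 0 bp, which suggests the pipeline treats boundaries differently.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b share at least one base."""
    return max(a[0], b[0]) < min(a[1], b[1])


def intersect(xs, ys):
    """Pairwise intersections of two interval lists on the same chromosome."""
    out = []
    for x in xs:
        for y in ys:
            if overlaps(x, y):
                out.append((max(x[0], y[0]), min(x[1], y[1])))
    return sorted(out)


def without(xs, ys):
    """Intervals of xs that overlap no interval of ys, kept whole."""
    return [x for x in xs if not any(overlaps(x, y) for y in ys)]


# Made-up peaks on a single chromosome.
gr = [(100, 600), (1000, 1500), (3000, 3400)]
gr_si = [(400, 900), (1200, 1300), (5000, 5200)]
both = intersect(gr, gr_si)   # [(400, 600), (1200, 1300)]
gr_only = without(gr, gr_si)  # [(3000, 3400)]
si_only = without(gr_si, gr)  # [(5000, 5200)]
```

For genome-scale peak sets, a dedicated interval tool such as BEDTools is the usual choice; `bedtools intersect` with `-v` reports the entries of A that have no overlap in B, which corresponds to the "without" case.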

12.56 GR without GR siFOXA1 binding site overlaps

Chromosome specific statistics are shown in Table 139. A histogram of sequence lengths is shown in Figure 135.

                               length
chromosome frequency   min  mean   max    total   coverage
1                173   270   578  1232    99939   0.000401
10                80   234   539   924    43139   0.000318
11               102   324   540   829    55046   0.000408
12                80   266   539  1135    43101   0.000322
13                78   262   647  2132    50428   0.000438
14                67   250   607  1492    40690   0.000379
15                63   301   567  1186    35725   0.000348
16                52   272   524  1021    27274   0.000302
17                59   277   526   995    31010   0.000382
18                47   275   585  1014    27502   0.000352
19                11   256   479   694     5270    8.9e-05
2                162   234   517   988    83756   0.000344
20                30   257   528   904    15846   0.000251
21                43   362   552   888    23726   0.000493
22                10   332   430   633     4296    8.4e-05
3                172   256   616  1269   105878   0.000535
4                123   253   514  1083    63166    0.00033
5                184   278   553  1262   101738   0.000562
6                109   277   520  1161    56713   0.000331
7                100   258   548  1125    54785   0.000344
8                132   242   567  1102    74794   0.000511
9                 75   250   552  1150    41394   0.000293
X                 42   265   478  1076    20060   0.000129
Y                  6   284   423   497     2538    4.3e-05
all (24)        2000   234   554  2132  1107814   0.000151

Table 139: Chromosome specific distribution of the regions. The last line represents the overall statistics.

[Figure: histogram of sequence lengths; x-axis: length (base pairs), y-axis: frequency]

Figure 135: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqGRnSi-gCount component.

property  value
genes     4696

(a) seqGRnSi-deNovo-meme1: width=15, sites=203, llr=1981, E=6.4e-44
(b) seqGRnSi-deNovo-meme2: width=15, sites=33, llr=439, E=8
(c) seqGRnSi-deNovo-meme3: width=11, sites=2, llr=34, E=2.7e+08

Figure 136: De novo motifs for the filtered GR without GR siFOXA1 binding site overlaps sequences.

Table 140: Motif enrichments

motif            ratio      fC      fR   n1C    n1R         p1       var
Ar             12.3128  0.1125  0.0091   202     32  5.439E-64   0.05181
Foxa2           2.8707  1.2190  0.4246  1425   1182  19.9E-296   0.95246
FOXA1           2.7563  1.3060  0.4738  1506   1361 2.716E-308   0.90219
FOXF2           2.4859  0.4155  0.1671   687    562  2.317E-86   0.27090
ARMotifTT       2.2595  0.0200  0.0088    38     33  12.04E-04   0.01326
ARMotifTH       2.0559  0.0990  0.0481   180    175  5.931E-12   0.06989
FOXD1           1.9720  1.9225  0.9749  1668   2161 3.667E-286   1.80046
RXRA::VDR       1.9436  0.0120  0.0061    23     22  3.054E-02   0.00882
Foxq1           1.8046  0.9255  0.5128  1186   1362 4.264E-146   0.82980
ARMotifT        1.7264  0.8600  0.4981  1116   1386 24.43E-116   0.71867
AR              1.6797  0.0930  0.0553   179    201  1.389E-08   0.06867
FOXI1           1.5702  1.5635  0.9957  1471   2061 2.161E-190   1.84491
NR3C1           1.5108  0.4985  0.3299   756   1039  2.427E-42   0.41329
GRMotifTH       1.5072  0.4260  0.2826   679    897   16.6E-38   0.35269
HNF1B           1.4691  0.1540  0.1048   276    361  27.25E-10   0.13289
GRMotifTT       1.4634  0.0415  0.0283    78    102   1.13E-02   0.03498
NKX3-1          1.4506  1.4200  0.9789  1399   1978 21.33E-166   1.85330
FOXA1pAR        1.4413  0.1430  0.0992   256    326  36.58E-10   0.13239
Tal1::Gata1     1.3936  0.0615  0.0441   119    158  14.02E-04   0.05150
FOXO3           1.3749  2.7730  2.0168  1825   3022 6.304E-296   3.31695
GRMotifHH       1.3505  0.3380  0.2503   543    755   8.07E-22   0.33548
Gata1           1.2939  0.9670  0.7473  1205   1882  33.63E-98   0.92959
GRMotifT        1.2788  2.9040  2.2709  1873   3267 63.17E-308   3.03791
GR              1.2687  0.3935  0.3102   632    931  88.85E-26   0.38832
SRY             1.2554  5.7270  4.5618  1959   3490    0.00E00  10.97910
Stat3           1.2513  0.2235  0.1786   330    517  3.218E-06   0.26771
FOXC1           1.2464 10.3920  8.3374  1994   3722    0.00E00  22.45752
SOX9            1.2384  0.9775  0.7893  1228   1915 1.973E-102   0.96848
TBP             1.2026  0.4865  0.4045   726   1120  32.46E-30   0.63762
EBF1            0.8031  0.6040  0.7521   779   1699  4.672E-10   1.12358
ZNF354C         0.7650  2.9835  3.9003  1807   3533 24.56E-246   7.82636
CREB1           0.7594  0.8300  1.0930  1046   2323  5.287E-28   1.32183
MZF1 1-4        0.7494  3.7060  4.9455  1837   3540  35.5E-264  12.27007
MZF1 5-13       0.7154  1.3160  1.8396  1379   2930  14.06E-86   2.45674
Tcfcp2l1        0.7032  0.0515  0.0733    98    247   4.89E-02   0.07532
TFAP2A          0.6567  1.2695  1.9332  1136   2406  23.28E-44   5.99425
Zfp423          0.6497  0.0415  0.0639    73    189  4.674E-02   0.07770
INSM1           0.6389  0.1445  0.2262   254    704  11.88E-04   0.22941
NF-kappaB       0.6387  0.1740  0.2725   257    777  10.93E-06   0.35084
Klf4            0.5946  0.1580  0.2658   264    766  1.245E-04   0.30581
SP1             0.5694  0.9160  1.6088   794   2032  3.505E-04   5.53269
NFKB1           0.5568  0.0430  0.0773    67    224   1.28E-04   0.09941
RREB1           0.5048  0.0240  0.0476    39    121  1.069E-02   0.08000
Zfx             0.4474  0.0965  0.2158   160    650  4.604E-14   0.22997

12.57 GR siFOXA1 without GR binding site overlaps

Chromosome specific statistics are shown in Table 141. A histogram of sequence lengths is shown in Figure 137.

                               length
chromosome frequency   min  mean   max    total   coverage
1                653   191   452   980   295447   0.001185
10               263   207   433  1152   113775   0.000839
11               278   207   435  1024   120966   0.000896
12               171   237   416   990    71142   0.000531
13               111   206   474  1403    52582   0.000457
14               204   252   438  1233    89399   0.000833
15               205   211   421   925    86352   0.000842
16               229   198   430   953    98533   0.001091
17               285   197   457  1153   130362   0.001606
18               128   234   422   903    54014   0.000692
19               104   221   398   848    41443   0.000701
2                431   189   413   922   177937   0.000732
20               171   227   422  1235    72225   0.001146
21                98   221   428   916    41984   0.000872
22               116   228   401   757    46540   0.000907
3                406   219   467  1033   189558   0.000957
4                199   192   402   875    80002   0.000419
5                330   208   431  1004   142233   0.000786
6                262   179   398  1188   104339    0.00061
7                242   203   438   875   105975   0.000666
8                278   180   434  1240   120711   0.000825
9                259   208   433  1010   112051   0.000793
X                126   191   369   667    46491   0.000299
Y                  6   327   374   408     2241    3.8e-05
all (24)        5555   179   431  1403  2396302   0.000326

Table 141: Chromosome specific distribution of the regions. The last line represents the overall statistics.

[Figure: histogram of sequence lengths; x-axis: length (base pairs), y-axis: frequency]

Figure 137: Sequence length distribution. The dashed line represents a normal distribution approximation based on the mean and standard deviation of the sequence lengths.

The following table shows the properties of the seqSinGR-gCount component.

property  value
genes     11432

(a) seqSinGR-deNovo-meme1: width=15, sites=293, llr=2622, E=1.5e-68
(b) seqSinGR-deNovo-meme2: width=15, sites=222, llr=2065, E=8.8e-29
(c) seqSinGR-deNovo-meme3: width=15, sites=92, llr=1007, E=1e-09

Figure 138: De novo motifs for the filtered GR siFOXA1 without GR binding site overlaps sequences.

Table 142: Motif enrichments

motif            ratio       fC       fR   n1C    n1R          p1       var
Ar             21.0408   0.1424   0.0067   734     67  1.471E-260   0.05936
ESR1            6.8231   0.0016   0.0002     9      2   33.75E-04   0.00069
TLX1::NFIC      3.5346   0.0225   0.0063    96     50   98.85E-16   0.01768
ESR2            3.0412   0.0356   0.0117   169    108   5.876E-20   0.02576
CTCF            2.7730   0.0101   0.0036    54     37   1.878E-06   0.00610
ARMotifTH       2.4062   0.0945   0.0392   498    396    9.56E-44   0.06002
MIZF            2.3651   0.0459   0.0194   239    197   51.87E-20   0.03293
GRMotifTH       2.2987   0.5042   0.2193  2117   2001  3.504E-224   0.40395
GABPA           2.1715   0.1352   0.0622   678    593   1.218E-52   0.09973
Stat3           2.1594   0.2959   0.1370  1204   1116   1.409E-98   0.26815
ARMotifT        2.0660   0.7825   0.3787  2966   3143     0.00E00   0.65915
Tcfcp2l1        2.0619   0.1195   0.0579   584    519   67.92E-44   0.09915
ARMotifHH       2.0538   0.6385   0.3109  2457   2551  23.76E-254   0.56724
Tal1::Gata1     1.9225   0.0549   0.0285   302    287   3.597E-18   0.03763
NR3C1           1.9039   0.4677   0.2456  2032   2142  2.055E-176   0.36135
AR              1.8211   0.0749   0.0411   386    416   21.45E-18   0.06052
Zfp423          1.7843   0.0963   0.0540   415    446   55.88E-20   0.10024
MYC::MAX        1.7766   0.0990   0.0557   416    460   12.72E-18   0.10381
GRMotifT        1.7570   3.1424   1.7885  5234   8369     0.00E00   3.38926
NHLH1           1.7509   0.1946   0.1111   683    726   43.47E-36   0.24421
RXR::RAR DR5    1.7462   0.0437   0.0250   235    255   3.156E-10   0.03187
Mycn            1.7435   0.1861   0.1067   729    811   5.729E-34   0.20688
Myf             1.7398   0.3698   0.2125  1391   1666   5.653E-72   0.45582
Esrrb           1.7369   0.1492   0.0859   744    831   1.627E-34   0.11566
RXRA::VDR       1.7323   0.0086   0.0050    48     51   65.79E-04   0.00622
Myc             1.7313   0.2293   0.1325   913   1016   7.803E-46   0.25834
Arnt            1.7179   0.2610   0.1519   903   1015   5.073E-44   0.33153
EBF1            1.6900   0.9838   0.5821  2916   3824  59.07E-250   1.25281
PPARG::RXRA     1.6871   0.0286   0.0169   155    170   96.69E-08   0.02161
GRMotifTT       1.6818   0.0367   0.0218   199    212   37.85E-10   0.02885
GR              1.6747   0.3631   0.2168  1618   1924   2.354E-94   0.32368
Pax5            1.6645   0.0736   0.0442   387    441   61.86E-16   0.05599
ELK4            1.6519   0.0351   0.0212   183    208    42.3E-08   0.02846
ARMotifH        1.6065  11.0828   6.8989  5555  10202     0.00E00  20.41834
TFAP2A          1.5901   2.5284   1.5901  4161   6050     0.00E00   6.75827
Egr1            1.5769   0.1204   0.0764   584    707   4.174E-20   0.11108
STAT1           1.5082   0.0504   0.0334   218    268   1.457E-06   0.05653
ARMotifTT       1.4762   0.0137   0.0093    75     90   55.27E-04   0.01183
GRMotifHH       1.4732   0.2941   0.1996  1317   1706   15.07E-54   0.28089
HIF1A::ARNT     1.4658   0.5554   0.3788  1969   2723   8.801E-96   0.71257
NFIC            1.4553   3.2745   2.2501  5152   8608     0.00E00   4.35490
INSM1           1.4401   0.2477   0.1720  1154   1503   2.808E-42   0.23727
ELK1            1.4127   1.0896   0.7713  3540   5242     0.00E00   1.07140
MAX             1.3823   0.2713   0.1962  1117   1579   1.317E-30   0.29984
NFKB1           1.3621   0.0999   0.0733   379    523   19.44E-08   0.16230
Hand1::Tcfe2a   1.3490   1.4283   1.0587  4058   6514     0.00E00   1.57490
E2F1            1.3416   0.0167   0.0125    90    127   4.575E-02   0.01440
USF1            1.3299   0.6736   0.5065  1975   2755   2.391E-94   1.06816
FEV             1.3288   0.7260   0.5464  2784   4141  10.61E-178   0.70827
PLAG1           1.3264   0.0243   0.0183   131    180    66.9E-04   0.02151
TAL1::TCF3      1.3221   0.2832   0.2142  1142   1582   4.087E-34   0.34274
Klf4            1.3194   0.2905   0.2202  1239   1782   1.307E-34   0.34509
NR4A2           1.3002   2.1262   1.6353  4751   7905     0.00E00   2.43322
TEAD1           1.2919   0.0907   0.0702   473    687   11.73E-08   0.08028
NR2F1           1.2887   0.1339   0.1039   686    991   63.45E-14   0.12234
SPI1            1.2810   2.1080   1.6456  4685   7852     0.00E00   2.60696
MZF1 1-4        1.2743   5.0477   3.9610  5338   9402     0.00E00  12.52164
Mafb            1.2526   2.3204   1.8525  4660   7944     0.00E00   3.21442
GRMotifH        1.2435   6.4725   5.2052  5531  10096     0.00E00   9.59405
SP1             1.2369   1.6049   1.2974  3193   4920  16.83E-236   7.24321
Myb             1.2266   0.8594   0.7006  3116   4963  10.99E-208   0.86081
RELA            1.2127   0.1271   0.1048   596    898   1.143E-08   0.14507
ZNF354C         1.2111   3.6922   3.0486  5209   9228     0.00E00   6.16328
ELF5            1.2106   2.0860   1.7231  4693   7934     0.00E00   2.69116
NFE2L2          1.2097   0.1483   0.1226   728   1181   7.491E-08   0.14403
RORA 1          1.2065   0.2673   0.2216  1256   2038   6.049E-22   0.25411
En1             0.8333   4.0403   4.8488  5379  10061     0.00E00   6.82499
Cebpa           0.8314   0.7003   0.8423  2530   5395   88.65E-50   1.01064
FOXO3           0.7921   1.2020   1.5174  3719   7405  72.16E-234   1.92137
Gfi             0.7889   0.8110   1.0280  2932   6302   86.23E-84   1.13565
SOX9            0.7630   0.4610   0.6042  1946   4349   55.61E-14   0.66450
HOXA5           0.7070   4.6115   6.5224  5308  10041     0.00E00  14.11075
PBX1            0.6781   0.0743   0.1097   384    941   9.064E-04   0.16011
FOXD1           0.6771   0.4999   0.7383  2064   4979   14.37E-10   0.80176
Sox5            0.6760   1.3824   2.0448  3822   8144  6.691E-226   3.12635
HNF1B           0.6538   0.0524   0.0802   269    764   19.42E-08   0.07784
FOXF2           0.6331   0.0769   0.1214   406   1149   62.55E-10   0.11079
Nobox           0.6286   1.0097   1.6063  3004   6922   1.198E-72   2.62720
Nkx2-5          0.6284   3.9901   6.3494  5019   9779     0.00E00  18.25020
FOXA1pAR        0.6181   0.0441   0.0714   224    651   21.67E-08   0.07531
SRY             0.6104   2.1208   3.4745  4386   9239     0.00E00   6.60473
Pou5f1          0.5793   0.0072   0.0125    40    128   33.73E-04   0.01051
IRF1            0.5757   0.1447   0.2515   701   2088   32.41E-14   0.26681
TBP             0.5735   0.1692   0.2951   818   2355   1.448E-10   0.32612
Foxq1           0.5700   0.2155   0.3781  1001   2908   5.258E-10   0.42527
Foxa2           0.5630   0.1960   0.3483   950   2584   2.248E-06   0.45378
Pdx1            0.5560   1.5244   2.7419  3736   8638   15.0E-176   4.92120
ARID3A          0.5461   2.0805   3.8095  4126   8944  1.184E-290   9.15214
Prrx2           0.5225   1.3629   2.6087  3531   8469  13.82E-128   4.49921
NFIL3           0.5127   0.1046   0.2040   471   1528   8.899E-18   0.33355
Ddit3::Cebpa    0.5125   0.2178   0.4251  1056   3055   30.65E-10   0.47407
FOXI1           0.5094   0.3775   0.7411  1603   4480   3.655E-02   1.04159
FOXL1           0.4599   3.5838   7.7922  4627   9614     0.00E00  50.62777
MEF2A           0.4467   0.1532   0.3430   659   2452   1.808E-34   0.42526
Foxd3           0.3599   0.9188   2.5534  2391   7236   2.594E-02   9.90414
Lhx3            0.3551   0.1204   0.3392   513   2206   2.231E-48   0.49213
EWSR1-FLI1      0.1799   0.0050   0.0282    11     52   52.53E-04   0.31523
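The ratio column appears to equal fC/fR up to rounding. For GRMotifTH, 0.5042/0.2193 ≈ 2.299, and the table reports 2.2987. The table also seems to leave out motifs whose ratio falls between roughly 0.83 and 1.2. The Python sketch below reproduces that selection under these two assumptions. The 1.2 cutoff, the `MotifRow` class and the `report_rows` helper are hypothetical and do not come from Anduril.

```python
from dataclasses import dataclass


@dataclass
class MotifRow:
    motif: str
    fC: float  # frequency in the candidate sequences (column fC)
    fR: float  # frequency in the reference sequences (column fR)

    @property
    def ratio(self) -> float:
        # Assumed definition of the "ratio" column: fC / fR.
        return self.fC / self.fR


def report_rows(rows, cutoff=1.2):
    """Keep motifs that are at least `cutoff`-fold enriched or depleted,
    sorted from most enriched to most depleted (hypothetical cutoff)."""
    kept = [r for r in rows if max(r.ratio, 1.0 / r.ratio) >= cutoff]
    return sorted(kept, key=lambda r: r.ratio, reverse=True)


rows = [
    MotifRow("Lhx3", 0.1204, 0.3392),
    MotifRow("GRMotifTH", 0.5042, 0.2193),
    MotifRow("hypothetical_flat", 0.5000, 0.4800),  # ratio ~1.04, filtered out
]
for r in report_rows(rows):
    print(f"{r.motif:<12} {r.ratio:.4f}")
```

The fold-change test uses `max(r, 1/r)`, so enrichment and depletion are treated symmetrically on a log scale.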

12.58 AR and FOXA1 overlaps unique for AR parental

Chromosome specific statistics are shown in Table 143. A histogram of sequence lengths is shown in Figure 139.
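The per-chromosome statistics in Table 143 can be recomputed from a list of regions. The Python sketch below uses hypothetical function and variable names. It takes coverage to be the summed region length divided by the chromosome length. Assuming the hg19 (GRCh37) length of chromosome 1, 249,250,621 bp, 87729 / 249250621 ≈ 0.000352, which matches the first row, so the per-chromosome rows appear to use this definition. The sketch also assumes that regions do not overlap.

```python
from collections import defaultdict
from statistics import fmean


def chromosome_stats(regions, chrom_sizes):
    """Per-chromosome region statistics in the layout of Table 143.

    regions: iterable of (chrom, start, end) half-open intervals;
    chrom_sizes: chromosome name -> chromosome length in bp.
    Coverage assumes the regions do not overlap each other."""
    lengths = defaultdict(list)
    for chrom, start, end in regions:
        lengths[chrom].append(end - start)
    stats = {}
    for chrom, ls in lengths.items():
        total = sum(ls)
        stats[chrom] = {
            "frequency": len(ls),
            "min": min(ls),
            "mean": fmean(ls),
            "max": max(ls),
            "total": total,
            "coverage": total / chrom_sizes[chrom],
        }
    return stats


# Hypothetical regions; 249250621 is the hg19 (GRCh37) length of chr1.
demo = chromosome_stats([("chr1", 1000, 1439), ("chr1", 5000, 5300)],
                        {"chr1": 249250621})
print(demo["chr1"])
```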

                        length (bp)
chromosome  frequency   min  mean   max   total  coverage
1                 200   170   439   860   87729  0.000352
10                 73   210   405   626   29573  0.000218
11                 91    41   417  1026   37952  0.000281
12                 68   172   382   680   25985  0.000194
13                 55   200   465  1063   25556  0.000222
14                 59    91   434   761   25594  0.000238
15                 60   230   448   724   26890  0.000262
16                 35   269   397   644   13906  0.000154
17                 55   129   386   656   21204  0.000261
18                 33   125   415   772   13702  0.000175
19                 12   208   318   433    3820   6.5e-05
2                 135   128   386   695   52148  0.000214
20                 37   239   392   627   14516   0.00023
21                 19   303   468   682    8896  0.000185
22                 13   198   371   505    4829   9.4e-05
3                 159   114   468  1041   74390  0.000376
4                 103   148   395   706   40667  0.000213
5                 110    80   423  1152   46521  0.000257
6                 115   170   399   770   45850  0.000268
7                  85   194   410   890   34854  0.000219
8                  98   207   431   936   42226  0.000288
9                  63   260   441   775   27757  0.000197
X                  34   198   353   512   11998   7.7e-05
Y                   1   197   197   197     197     3e-06
all 24           1713    41   418  1152  716760   9.7e-05

Table 143: Chromosome-specific distribution of the regions. The last line gives the overall statistics.

[Histogram: frequency versus sequence length (base pairs), 0 to 1200 bp]

Figure 139: Sequence length distribution. The dashed line shows a normal approximation based on the mean and the standard deviation of the sequence lengths.
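Figure 139 overlays a normal approximation on the length histogram. The report states neither the binning nor whether it uses the sample or the population standard deviation. The Python sketch below assumes 100 bp bins and the sample standard deviation (`statistics.stdev`). It converts the fitted normal distribution into an expected count per bin, which is one way to scale such a curve to a frequency histogram.

```python
from statistics import NormalDist, fmean, stdev


def normal_overlay(lengths, bin_edges):
    """Expected count per histogram bin if the lengths followed a normal
    distribution with their sample mean and standard deviation."""
    fit = NormalDist(fmean(lengths), stdev(lengths))
    n = len(lengths)
    return [n * (fit.cdf(hi) - fit.cdf(lo))
            for lo, hi in zip(bin_edges, bin_edges[1:])]


edges = list(range(0, 1300, 100))          # hypothetical 100 bp bins, 0-1200 bp
lengths = [350, 420, 400, 510, 460, 390]   # hypothetical sequence lengths
print([round(c, 2) for c in normal_overlay(lengths, edges)])
```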

The following table shows the properties of the seqUARover-gCount component.

property  value
genes     4702

(a) seqUARover-deNovo-meme1: width=15, sites=159, llr=1600, E=1e-43
(b) seqUARover-deNovo-meme2: width=15, sites=43, llr=525, E=2700
(c) seqUARover-deNovo-meme3: width=15, sites=55, llr=632, E=1100000

Figure 140: De novo motifs for the filtered AR and FOXA1 overlaps unique for AR parental sequences.
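MEME reports a log-likelihood ratio (llr) for each de novo motif, together with its number of sites and an E-value. The Python sketch below shows the generic quantity, LLR = Σ_i Σ_a n(i,a) · log2(f(i,a) / p(a)). Here n(i,a) is the count of base a at position i of the aligned sites, f(i,a) = n(i,a)/N is its frequency, and p(a) is the background probability. This is not MEME's implementation: its llr may differ in log base, background model and pseudocounts. The uniform background and the helper name are assumptions.

```python
from collections import Counter
from math import log2


def motif_llr(sites, background=None):
    """Log-likelihood ratio (in bits) of aligned sites under their own
    maximum-likelihood position frequency matrix versus an i.i.d. background.
    No pseudocounts are used."""
    if background is None:
        background = {base: 0.25 for base in "ACGT"}  # assumed uniform background
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned to the same width")
    n = len(sites)
    llr = 0.0
    for i in range(width):
        for base, count in Counter(s[i] for s in sites).items():
            llr += count * log2((count / n) / background[base])
    return llr


print(motif_llr(["TGTTTAC", "TGTTTAC", "TATTTAC"]))  # hypothetical sites
```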

Table 144: Motif enrichments

motif            ratio       fC       fR   n1C    n1R          p1       var
FOXA1           4.2210   1.3975   0.3311  1380    875     0.00E00   0.88620
FOXA1pAR        3.7871   0.2674   0.0706   415    198   11.69E-82   0.16146
FOXF2           3.5979   0.4151   0.1153   588    334  11.75E-110   0.24803
Foxa2           3.4985   1.2790   0.3656  1326    805     0.00E00   1.10154
Ar              2.9774   0.0362   0.0121    61     39   3.168E-08   0.02048
TLX1::NFIC      2.7930   0.0105   0.0037    15     10   1.403E-02   0.00808
FOXD1           2.4323   1.7443   0.7171  1462   1528  2.077E-312   1.35889
Foxq1           2.2066   0.8033   0.3640   945    895  19.07E-138   0.64255
Tal1::Gata1     1.8957   0.0578   0.0305    96     97   4.127E-06   0.03999
FOXI1           1.8779   1.3351   0.7109  1196   1376  3.673E-180   1.53546
ARMotifTH       1.8387   0.0566   0.0308    95     97    6.36E-06   0.03981
ARMotifT        1.5966   0.5733   0.3590   740    939   1.452E-56   0.46758
HNF1B           1.4941   0.1092   0.0730   172    218   3.096E-06   0.09371
GRMotifTH       1.4345   0.2890   0.2014   421    577   89.98E-18   0.24711
FOXO3           1.4310   2.1494   1.5020  1460   2279  16.58E-220   2.62498
Evi1            1.4170   0.0450   0.0317    75     98   1.108E-02   0.03743
AR              1.4152   0.0572   0.0404    93    130   1.384E-02   0.04615
Gata1           1.4098   0.7542   0.5350   880   1246   24.39E-68   0.70878
NKX3-1          1.3760   1.0193   0.7408  1007   1462   54.83E-90   1.46011
SRY             1.2969   4.3386   3.3454  1641   2890  4.805E-282   7.08615
FOXC1           1.2393   7.8955   6.3708  1703   3182  2.254E-304  15.55474
SOX9            1.2390   0.7221   0.5828   849   1364   2.681E-48   0.71446
GRMotifT        1.2359   2.1395   1.7311  1488   2572  4.059E-210   2.26141
TEAD1           1.2271   0.0870   0.0709   142    223   2.011E-02   0.07591
Stat3           1.2113   0.1740   0.1436   229    360   4.184E-04   0.20549
Sox17           1.2056   1.5231   1.2633  1311   2228  16.85E-146   1.53921
ZNF354C         0.8318   2.3964   2.8809  1485   2832   1.22E-188   4.64916
MZF1 1-4        0.7790   2.9212   3.7498  1496   2926  2.886E-188   8.30256
MZF1 5-13       0.7284   1.0006   1.3736  1017   2243   5.477E-40   1.70003
CREB1           0.7144   0.5820   0.8147   723   1639   4.369E-10   0.91516
TFAP2A          0.6808   1.0029   1.4731   886   1881   42.62E-28   3.58491
NF-kappaB       0.6279   0.1366   0.2176   179    522   3.823E-04   0.29885
Klf4            0.5946   0.1173   0.1974   169    507   1.194E-04   0.23011
NFYA            0.5751   0.0327   0.0569    55    157   1.834E-02   0.07048
Zfp423          0.4277   0.0280   0.0656    41    161   57.72E-06   0.08103
Zfx             0.4148   0.0683   0.1647   107    433   2.541E-10   0.17166
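The report does not say how p1 is computed. One common choice for comparing the number of sequences with at least one hit (n1C against n1R) is a one-sided Fisher exact test, sketched below in Python. The test needs the total numbers of candidate and reference sequences, which the table does not list. It is not necessarily the test Anduril applied, and the helper name is hypothetical.

```python
from math import comb


def enrichment_p(n1c, n_cand, n1r, n_ref):
    """One-sided Fisher exact test (hypergeometric upper tail).

    n1c / n1r: candidate / reference sequences with at least one motif hit;
    n_cand / n_ref: total candidate / reference sequences (hypothetical inputs,
    not listed in the table). Returns P(X >= n1c)."""
    total = n_cand + n_ref
    hits = n1c + n1r
    # math.comb returns 0 when k > n, so impossible terms contribute nothing.
    tail = sum(comb(hits, x) * comb(total - hits, n_cand - x)
               for x in range(n1c, min(n_cand, hits) + 1))
    return tail / comb(total, n_cand)
```

Python's exact integer arithmetic keeps the binomial coefficients exact even for thousands of sequences. The final integer division is correctly rounded to a float.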

13 AR versus AR siFOXA1

[Two three-set Venn diagrams. Panel (a) sets: AR and AR siFOXA1 BS overlap; AR without AR siFOXA1 BS overlap; AR siFOXA1 without AR BS overlap. Panel (b) sets: red, green and blue.]

Figure 141: Intersections of sets: (a) AR and AR siFOXA1 BS overlap, AR siFOXA1 without AR BS overlap and AR without AR siFOXA1 BS overlap; (b) red, blue and green.
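Figure 141 and the vertex fill colors of Figure 142 both enumerate membership combinations of three gene sets. The Python sketch below counts the items in each of the seven non-empty regions of a three-set Venn diagram. The function name and the gene identifiers are hypothetical.

```python
from itertools import product


def venn3_regions(a, b, c):
    """Count the items in each of the seven regions of a three-set Venn diagram.
    Keys are membership flags (in_a, in_b, in_c)."""
    sets = (set(a), set(b), set(c))
    counts = {flags: 0 for flags in product((False, True), repeat=3) if any(flags)}
    for item in sets[0] | sets[1] | sets[2]:
        counts[tuple(item in s for s in sets)] += 1
    return counts


# Hypothetical gene sets, for illustration only.
print(venn3_regions({"g1", "g2", "g3"}, {"g3", "g4"}, {"g5"}))
```

The same membership tuple, with all-False for genes in none of the sets, could also be used to look up a vertex fill color.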

Vertex fill colors:
no matching annotations
AR and AR siFOXA1 BS overlap
AR and AR siFOXA1 BS overlap and AR without AR siFOXA1 BS overlap
AR without AR siFOXA1 BS overlap
AR without AR siFOXA1 BS overlap and AR siFOXA1 without AR BS overlap
AR siFOXA1 without AR BS overlap
AR and AR siFOXA1 BS overlap and AR siFOXA1 without AR BS overlap
AR and AR siFOXA1 BS overlap and AR without AR siFOXA1 BS overlap and AR siFOXA1 without AR BS overlap

Edge types: gene expression, gene repression, pathway precedence, protein activation, protein binding, protein dissociation, protein-protein interaction, dephosphorylation, phosphorylation.

Figure 142: Descriptions of the edge types and gene colors used in the candidate pathway shown in Figure 143. Green and blue borders refer to up- and down-regulated genes, respectively. Light grey marks stably expressed genes. Known regulations are shown with bold borders, whereas predicted ones are drawn thin.

[Figure: candidate pathway graph; nodes are genes or gene groups, drawn with the notation of Figure 142]

Figure 143: Candidate pathway for AR versus AR siFOXA1. Graph notations are described in Figure 142.

14 Expression comparison

[Heatmap of the DEGs: rows are Ensembl gene IDs; columns are the samples ce1, ce2, cd2, cd1, a1e2, a1e1, a1d2, a1d1; color key with a histogram of counts over values from about 8 to 14]

Figure 144: DHT DEGs (parental or siFOXA1)
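The box plots that follow compare the fold changes fcC and fcA1. The report does not define the heatmap column codes. The Python sketch below assumes, hypothetically, that "ce"/"cd" are parental cells treated with vehicle or DHT, that "a1e"/"a1d" are siFOXA1 cells under the same two treatments, and that the heatmap values are log2 expression levels. Under these assumptions, the two fold changes for one gene could be computed as follows.

```python
from statistics import fmean


def log2_fold_changes(row):
    """Return (fcC, fcA1) for one gene under the hypothetical column reading:
    c = parental, a1 = siFOXA1, e = vehicle, d = DHT; values in log2 scale."""
    fc_c = fmean([row["cd1"], row["cd2"]]) - fmean([row["ce1"], row["ce2"]])
    fc_a1 = fmean([row["a1d1"], row["a1d2"]]) - fmean([row["a1e1"], row["a1e2"]])
    return fc_c, fc_a1


gene = {"ce1": 8.0, "ce2": 8.2, "cd1": 10.1, "cd2": 9.9,      # hypothetical
        "a1e1": 8.1, "a1e2": 7.9, "a1d1": 8.5, "a1d2": 8.7}   # log2 values
print(log2_fold_changes(gene))
```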

14.1 Box plots

[Box plot: AR unique; boxes for fcC and fcA1; y-axis from −2 to 4]

Figure 145: AR unique. The bold line marks the median. The filled rectangle spans the values between the 25th and 75th percentiles. The extremes mark the minimum and maximum.

[Box plot: AR siFOXA1 unique; boxes for fcC and fcA1; y-axis from −2 to 3]

Figure 146: AR siFOXA1 unique. The bold line marks the median. The filled rectangle spans the values between the 25th and 75th percentiles. The extremes mark the minimum and maximum.

[Box plot: AR and AR siFOXA1; boxes for fcC and fcA1; y-axis from −2 to 4]

Figure 147: AR and AR siFOXA1. The bold line marks the median. The filled rectangle spans the values between the 25th and 75th percentiles. The extremes mark the minimum and maximum.
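Each box plot caption describes a five-number summary whose extremes are the minimum and maximum, with no outlier rule. The Python sketch below computes that summary. It uses `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between order statistics. If the report was drawn with R's `boxplot`, the box edges there are Tukey's hinges and may differ slightly for small samples.

```python
from statistics import median, quantiles


def box_summary(values):
    """Five-number summary as described in the captions: minimum, 25th
    percentile, median, 75th percentile and maximum."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median(values),
            "q3": q3, "max": max(values)}


print(box_summary([-1.2, 0.3, 0.8, 1.5, 2.9]))  # hypothetical fold changes
```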

(a) AP1: class=Zipper-Type, family=Leucine Zipper, species=9606,10116,10090, type=COMPILED, acc=P05412,P01100, id=MA0099
(b) ARID3A: class=Helix-Turn-Helix, family=Arid, species=10090, type=SELEX, acc=Q62431, id=MA0151
(c) Ar: class=Zinc-coordinating, family=Hormone-nuclear Receptor, species=10117, type=SELEX, acc=P15207, id=MA0007

(d) AR
(e) ARMotifHH
(f) ARMotifH

(g) ARMotifTH
(h) ARMotifT
(i) ARMotifTT

(j) Arnt::Ahr: class=Zipper-Type, family=Helix-Loop-Helix, species=10090, type=SELEX, acc=P30561,P53762, id=MA0006
(k) Arnt: class=Zipper-Type, family=Helix-Loop-Helix, species=10090, type=SELEX, acc=P53762, id=MA0004
(l) BRCA1: class=Other, family=Other, species=9606, type=SELEX, acc=P38398, id=MA0133

(a) Cebpa: class=Zipper-Type, family=Leucine Zipper, species=10090,10116, type=COMPILED, id=MA0102
(b) CREB1: class=Zipper-Type, family=, species=10116,9606,10090, type=COMPILED, acc=P16220, id=MA0018
(c) CTCF: class=Zinc-coordinating, family=BetaBetaAlpha-zinc finger, species=9606, type=ChiP-seq, acc=P49711, id=MA0139

(d) Ddit3::Cebpa: class=Zipper-Type, family=Leucine Zipper, species=10116, type=SELEX, acc=Q62857,P05554, id=MA0019
(e) E2F1: class=Winged Helix-Turn-Helix, family=, species=9606, type=COMPILED, acc=NP_005216, id=MA0024
(f) EBF1: class=Zipper-Type, family=Helix-Loop-Helix, species=10090, type=COMPILED, acc=Q07802, id=MA0154

(g) Egr1: class=Zinc-coordinating, family=BetaBetaAlpha-zinc finger, species=10090, type=bacterial 1-hybrid, acc=P08046, id=MA0162
(h) ELF5: class=Winged Helix-Turn-Helix, family=Ets, species=10090, type=SELEX, id=MA0136
(i) ELK1: class=Winged Helix-Turn-Helix, family=Ets, species=9606, type=SELEX, acc=P19419, id=MA0028

(j) ELK4: class=Winged Helix-Turn-Helix, family=Ets, species=9606, type=SELEX, acc=P28324, id=MA0076
(k) En1: class=Helix-Turn-Helix, family=Homeo, species=10090, type=SELEX, acc=P09065, id=MA0027
(l) ESR1: class=Zinc-coordinating, family=Hormone-nuclear Receptor, species=9606, type=ChiP-seq, acc=P03372, id=MA0112

(a) ESR2: class=Zinc-coordinating, family=Hormone-nuclear Receptor, species=9606, type=ChIP-chip, acc=Q92731, id=MA0258
(b) Esrrb: class=Zinc-coordinating, family=Hormone-nuclear Receptor, species=10090, type=ChiP-seq, acc=Q61539, id=MA0141
(c) ETS1: class=Winged Helix-Turn-Helix, family=Ets, species=9606, type=SELEX, acc=CAG47050, id=MA0098

(d) Evi1: class=Zinc-coordinating, family=BetaBetaAlpha-zinc finger, species=10090, type=SELEX, acc=AAI39763, id=MA0029
(e) EWSR1-FLI1: class=Winged Helix-Turn-Helix, family=Ets, species=9606, type=ChiP-seq, acc=Q9BZD1, id=MA0149
(f) FEV: class=Winged Helix-Turn-Helix, family=Ets, species=10116,9606, type=COMPILED, acc=Q99581, id=MA0156

(g) Fos: class=Zipper-Type, family=Leucine Zipper, species=10090, type=SELEX, acc=P01101, id=MA0099
(h) FOXA1: class=Winged Helix-Turn-Helix, family=Forkhead, species=9606, type=ChiP-Seq, acc=P55317, id=MA0148
(i) FOXA1pAR

(j) Foxa2: class=Winged Helix-Turn-Helix, family=Forkhead, species=10090, type=ChiP-seq, acc=P35583, id=MA0047
(k) FOXC1: class=Winged Helix-Turn-Helix, family=Forkhead, species=9606, type=SELEX, acc=Q12948, id=MA0032
(l) FOXD1: class=Winged Helix-Turn-Helix, family=Forkhead, species=9606, type=SELEX, acc=Q16676, id=MA0031

(a) Foxd3: class=Winged Helix-Turn-Helix, family=Forkhead, species=10116, type=SELEX, acc=Q63245, id=MA0041
(b) FOXF2: class=Winged Helix-Turn-Helix, family=Forkhead, species=9606, type=SELEX, acc=Q12947, id=MA0030
(c) FOXI1: class=Winged Helix-Turn-Helix, family=Forkhead, species=9606, type=SELEX, acc=Q12951, id=MA0042

(d) FOXL1: class=Winged Helix-Turn-Helix, family=Forkhead, species=9606, type=SELEX, acc=Q12952, id=MA0033
(e) FOXO3: class=Winged Helix-Turn-Helix, family=Forkhead, species=10090,9606, type=COMPILED, acc=O43524, id=MA0157
(f) Foxq1: class=Winged Helix-Turn-Helix, family=Forkhead, species=10116, type=SELEX, acc=Q63244, id=MA0040

(g) GABPA: class=Winged Helix-Turn-Helix, family=Ets, species=10090, type=ChiP-seq, acc=Q91YY8, id=MA0062
(h) Gata1: class=Zinc-coordinating, family=GATA, species=10090, type=ChiP-seq, acc=P17679, id=MA0035
(i) GATA2: class=Zinc-coordinating, family=GATA, species=9606, type=SELEX, acc=P23769, id=MA0036

(j) GATA3: class=Zinc-coordinating, family=GATA, species=9606, type=SELEX, acc=P23771, id=MA0037
(k) Gfi: class=Zinc-coordinating, family=BetaBetaAlpha-zinc finger, species=10116, type=SELEX, acc=Q07120, id=MA0038
(l) GR

(a) GRMotifHH
(b) GRMotifH
(c) GRMotifTH

(d) GRMotifT
(e) GRMotifTT
(f) Hand1::Tcfe2a: class=Zipper-Type, family=Helix-Loop-Helix, species=10090, type=SELEX, acc=Q64279,P15806, id=MA0092

(g) HIF1A::ARNT: class=Zipper-Type, family=Helix-Loop-Helix, species=9606,10090,10117,9986, type=COMPILED, acc=EAW80806,EAW53510, id=MA0259
(h) HLF: class=Zipper-Type, family=Leucine Zipper, species=9606, type=SELEX, acc=Q16534, id=MA0043
(i) HNF1B: class=Helix-Turn-Helix, family=Homeo, species=9606,10090, type=COMPILED, acc=P35680, id=MA0153

(j) HOXA5: class=Helix-Turn-Helix, family=Homeo, species=10090,9606, type=COMPILED, acc=P20719, id=MA0158
(k) INSM1: class=Zinc-coordinating, family=BetaBetaAlpha-zinc finger, species=9606, type=COMPILED, acc=Q01101, id=MA0155
(l) IRF1: class=Winged Helix-Turn-Helix, family=IRF, species=9606, type=SELEX, acc=P10914, id=MA0050

(a) IRF2: class=Winged Helix-Turn-Helix, family=IRF, species=9606, type=SELEX, acc=P14316, id=MA0051
(b) Klf4: class=Zinc-coordinating, family=BetaBetaAlpha-zinc finger, species=10090, type=ChiP-seq, acc=Q60793, id=MA0039
(c) Lhx3: class=Helix-Turn-Helix, family=Homeo, species=10090, type=SELEX, acc=P50481, id=MA0135

(d) Mafb: class=Zipper-Type, family=Leucine Zipper, species=10116, type=SELEX, acc=P54842, id=MA0117
(e) MAX: class=Zipper-Type, family=Helix-Loop-Helix, species=9606, type=SELEX, acc=AAH36092, id=MA0058
(f) MEF2A: class=Other Alpha-Helix, family=MADS, species=9606, type=SELEX, acc=EAX02249, id=MA0052

(g) MIZF: class=Zinc-coordinating, family=BetaBetaAlpha-zinc finger, species=9606, type=SELEX, acc=Q9BQA5, id=MA0131
(h) Myb: class=Helix-Turn-Helix, family=Myb, species=10090, type=SELEX, acc=P06876, id=MA0100
(i) MYC::MAX: class=Zipper-Type, family=Helix-Loop-Helix, species=9606, type=SELEX, acc=AAH36092,Q6LBK7, id=MA0059

(j) Myc: class=Zipper-Type, family=Helix-Loop-Helix, species=10090, type=ChiP-seq, acc=P01108, id=MA0147
(k) Mycn: class=Zipper-Type, family=Helix-Loop-Helix, species=10090, type=ChiP-Seq, acc=P03966, id=MA0104
(l) Myf: class=Zipper-Type, family=Helix-Loop-Helix, species=9606, type=COMPILED, id=MA0055

(a) MZF1 1-4: class=Zinc-coordinating, family=BetaBetaAlpha-zinc finger, species=9606, type=SELEX, acc=P28698, id=MA0056
(b) MZF1 5-13: class=Zinc-coordinating, family=BetaBetaAlpha-zinc finger, species=9606, type=SELEX, acc=P28698, id=MA0057
(c) NFATC2: class=Ig-fold, family=Rel, species=10090,9606,10116, type=COMPILED, acc=Q13469, id=MA0152

(d) NFE2L2: class=Zipper-Type, family=Leucine Zipper, species=9606, type=COMPILED, acc=Q16236, id=MA0150
(e) NFIC: class=Other, family=NFI CCAAT-binding, species=9606, type=High-throughput SELEX SAGE, acc=P08651, id=MA0161
(f) NFIL3: class=Zipper-Type, family=Leucine Zipper, species=9606, type=SELEX, acc=NP_005375, id=MA0025

(g) NF-kappaB: class=Ig-fold, family=Rel, species=9606,10090,10116,9986, type=COMPILED, id=MA0061
(h) NFKB1: class=Ig-fold, family=Rel, species=9606, type=SELEX, acc=P19838, id=MA0105
(i) NFYA: class=Other Alpha-Helix, family=NFY CCAAT-binding, species=9606,10090,10116,9031,8355,8364,9913,9986, type=COMPILED, acc=P23511, id=MA0060

(j) NHLH1: class=Zipper-Type, family=Helix-Loop-Helix, species=9606, type=SELEX, acc=Q02575, id=MA0048
(k) Nkx2-5: class=Helix-Turn-Helix, family=Homeo, species=10090, type=SELEX, acc=P42582, id=MA0063
(l) NKX3-1: class=Helix-Turn-Helix, family=Homeo, species=9606, type=SELEX, acc=Q99801, id=MA0124

(a) Nkx3-2: class=Helix-Turn-Helix, family=Homeo, species=10090, type=SELEX, acc=P97503, id=MA0122
(b) Nobox: class=Helix-Turn-Helix, family=Homeo, species=10090, type=SELEX, acc=Q8VIH1, id=MA0125
(c) NR1H2::RXRA: class=Zinc-coordinating, family=Hormone-nuclear Receptor, species=9606, type=SELEX, acc=P55055,P19793, id=MA0115

(d) Nr2e3: class=Zinc-coordinating, family=Hormone-nuclear Receptor, species=10090, type=SELEX, acc=Q9QXZ7, id=MA0164
(e) NR2F1: class=Zinc-coordinating, family=Hormone-nuclear Receptor, species=9606, type=COMPILED, acc=P10589, id=MA0017
(f) NR3C1: class=Zinc-coordinating, family=Hormone-nuclear Receptor, species=9606,10090,10116,9031,8022, type=COMPILED, acc=P04150, id=MA0113

(g) NR4A2: class=Zinc-coordinating, family=Hormone-nuclear Receptor, species=10090,10116,9606, type=COMPILED, acc=P43354, id=MA0160
(h) Pax2: class=Helix-Turn-Helix, family=Homeo, species=10090, type=SELEX, acc=P32114, id=MA0067
(i) Pax4: class=Helix-Turn-Helix, family=Homeo, species=10090, type=SELEX, acc=P32115, id=MA0068

(j) Pax5: class=Helix-Turn-Helix, family=Homeo, species=10090, type=COMPILED, acc=Q02650, id=MA0014
(k) Pax6: class=Helix-Turn-Helix, family=Homeo, species=9606, type=SELEX, acc=P26367, id=MA0069
(l) PBX1: class=Helix-Turn-Helix, family=Homeo, species=9606, type=SELEX, acc=Q5T486, id=MA0070

(a) Pdx1: class=Helix-Turn-Helix, family=Homeo, species=10090, type=SELEX, acc=NP_032840, id=MA0132
(b) PLAG1: class=Zinc-coordinating, family=BetaBetaAlpha-zinc finger, species=9606, type=bacterial 1-hybrid, acc=Q6DJT9, id=MA0163
(c) Pou5f1: class=Helix-Turn-Helix, family=Homeo, species=10090, type=Chip-seq, acc=P20263, id=MA0142

(d) PPARG: class=Zinc-coordinating, family=Hormone-nuclear Receptor, species=9606, type=SELEX, acc=P37231, id=MA0066
(e) PPARG::RXRA: class=Zinc-coordinating, family=Hormone-nuclear Receptor, species=10090, type=ChiP-seq, acc=P37238,P28700, id=MA0065
(f) Prrx2: class=Helix-Turn-Helix, family=Homeo, species=10090, type=SELEX, acc=Q06348, id=MA0075

(g) RELA: class=Ig-fold, family=Rel, species=9606, type=SELEX, acc=Q04206, id=MA0107
(h) REL: class=Ig-fold, family=Rel, species=9606, type=SELEX, acc=Q04864, id=MA0101
(i) REST: class=Zinc-coordinating, family=BetaBetaAlpha-zinc finger, species=9606, type=Chip-seq, acc=NP_005603, id=MA0138

(j) RORA 1: class=Zinc-coordinating, family=Hormone-nuclear Receptor, species=9606, type=SELEX, acc=NP_599023, id=MA0071
(k) RORA 2: class=Zinc-coordinating, family=Hormone-nuclear Receptor, species=9606, type=SELEX, acc=NP_599022, id=MA0072
(l) RREB1: class=Zinc-coordinating, family=BetaBetaAlpha-zinc finger, species=9606, type=SELEX, acc=Q92766, id=MA0073

(a) RUNX1: class=Ig-fold, family=Runt, species=10090, type=ChiP-seq, acc=Q01196, id=MA0002
(b) RXRA::VDR: class=Zinc-coordinating, family=Hormone-nuclear Receptor, species=9606, type=SELEX, acc=P19793,P11473, id=MA0074
(c) RXR::RAR DR5: class=Zinc-coordinating, family=Hormone-nuclear Receptor, species=9606, type=COMPILED, acc=P10276,P19793, id=MA0159

(d) SOX10: class=Other Alpha-Helix, family=High Mobility Group, species=10090,9606,10116, type=COMPILED, acc=P56693, id=MA0442
(e) Sox17: class=Other Alpha-Helix, family=High Mobility Group, species=10090, type=SELEX, acc=Q61473, id=MA0078
(f) : class=Other Alpha-Helix, family=High Mobility Group, species=10090, type=ChiP-seq, acc=P48432, id=MA0143

(g) Sox5: class=Other Alpha-Helix, family=High Mobility Group, species=10090, type=SELEX, acc=P35710, id=MA0087
(h) SOX9: class=Other Alpha-Helix, family=High Mobility Group, species=9606, type=SELEX, acc=P48436, id=MA0077
(i) SP1: class=Zinc-coordinating, family=BetaBetaAlpha-zinc finger, species=9606,10090,10116, type=COMPILED, acc=P08047, id=MA0079

(j) SPI1: class=Winged Helix-Turn-Helix, family=Ets, species=9606, type=COMPILED, acc=P17947, id=MA0080
(k) SPIB: class=Winged Helix-Turn-Helix, family=Ets, species=9606, type=SELEX, acc=Q01892, id=MA0081
(l) Spz1: class=Other, family=Other, species=10090, type=SELEX, acc=AAK15458, id=MA0111

210 (a) SRF: class=Other Alpha-Helix, family=MADS, (b) SRY: class=Other Alpha-Helix, family=High (c) STAT1: class=Ig-fold, family=Stat, species=9606, type=SELEX, acc=P11831, Mobility Group, species=9606, type=SELEX, species=9606, type=ChiP-seq, acc=Q53XW4, id=MA0083 acc=Q05066, id=MA0084 id=MA0137

(d) Stat3: class=Ig-fold, family=Stat, (e) Tal1::Gata1: class=Zipper-Type , family=Helix- (f) TAL1::TCF3: class=Zipper-Type, family=Helix- species=10090, type=ChiP-seq, acc=P42227, Loop-Helix, species=10090, type=ChiP-seq, Loop-Helix, species=9606, type=SELEX, id=MA0144 acc=P22091,P17679, id=MA0140 acc=P17542,P15923, id=MA0091

(g) TBP: class=Beta-sheet, family=TATA-binding, (h) Tcfcp2l1: class=Other, family=CP2, (i) TEAD1: class=Helix-Turn-Helix, family=Homeo, species=, id=MA0108 species=10090, type=ChiP-seq, acc=Q3UNW5, species=9606, type=COMPILED, acc=P28347, id=MA0145 id=MA0090

(j) TFAP2A: class=Zipper-Type, family=Helix- (k) TLX1::NFIC: class=Helix-Turn-Helix::Other, (l) T: class=Beta-Hairpin-Ribbon, family=T, Loop-Helix, species=9606, type=SELEX, family=Homeo::-CCAAT- species=10090, type=SELEX, acc=P20293, acc=P05549, id=MA0003 binding, species=9606, type=SELEX, id=MA0009 acc=P31314,NP 995315.1, id=MA0119

211 (a) TP53: class=Zinc-coordinating, family=Loop- (b) USF1: class=Zipper-Type, family=Helix-Loop- (c) YY1: class=Zinc-coordinating, Sheet-Helix, species=9606, type=SELEX, Helix, species=9606, type=SELEX, acc=P22415, family=BetaBetaAlpha-zinc finger, species=9606, acc=P04637, id=MA0106 id=MA0093 type=COMPILED, acc=P25490, id=MA0095

(d) Zfp423: class=Zinc-coordinating, (e) Zfx: class=Zinc-coordinating, (f) ZNF354C: class=Zinc-coordinating, family=BetaBetaAlpha-zinc finger, species=10116, family=BetaBetaAlpha-zinc finger, species=10090, family=BetaBetaAlpha-zinc finger, species=9606, type=SELEX, acc=O08961, id=MA0116 type=ChiP-seq, acc=P17012, id=MA0146 type=SELEX, acc=Q86Y25, id=MA0130

Figure 148: Motifs in use
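Each panel annotation in Figure 148 uses the same `key=value` layout: structural `class` and `family`, `species` as NCBI Taxonomy IDs (9606 is Homo sapiens, 10090 Mus musculus, 10116 Rattus norvegicus), the experiment `type`, one or more protein accessions in `acc`, and the JASPAR matrix `id`. As a minimal sketch, assuming the captions are available one panel per line exactly as printed above, the following Python parses such a line into a record. `parse_panel`, `Motif`, and `TAXA` are hypothetical names introduced for illustration; they are not part of Anduril or JASPAR.

```python
import re
from dataclasses import dataclass

# NCBI Taxonomy IDs that occur in the species= fields of Figure 148.
TAXA = {"9606": "Homo sapiens", "10090": "Mus musculus", "10116": "Rattus norvegicus"}

# "(e) PPARG::RXRA: class=..." -> panel letter, motif name, annotation string.
# The lazy name group stops at the ": class=" separator, so dimer names with
# "::" survive and a missing name yields "".
PANEL_RE = re.compile(r"\((\w)\)\s*(.*?):\s+(class=.*)")

# key=value pairs: a value extends up to the next ", key=" or the end of the
# string, so bare-comma lists such as acc=P37238,P28700 stay in one value.
FIELD_RE = re.compile(r"(\w+)=(.*?)(?=, \w+=|$)")


@dataclass
class Motif:
    panel: str             # subfigure letter, e.g. "e"
    name: str              # motif name ("" if absent from the caption)
    meta: dict[str, str]   # raw key=value annotations
    species: list[str]     # species names resolved from taxonomy IDs
    accessions: list[str]  # protein accessions (UniProt or RefSeq)


def parse_panel(line: str) -> Motif:
    """Parse one motif panel caption into a Motif record (hypothetical helper)."""
    m = PANEL_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a motif panel caption: {line!r}")
    panel, name, rest = m.groups()
    meta = dict(FIELD_RE.findall(rest))
    # Unknown taxonomy IDs are kept as-is; an empty species= gives [].
    species = [TAXA.get(t, t) for t in meta.get("species", "").split(",") if t]
    accessions = [a for a in meta.get("acc", "").split(",") if a]
    return Motif(panel, name, meta, species, accessions)


if __name__ == "__main__":
    rec = parse_panel(
        "(e) PPARG::RXRA: class=Zinc-coordinating, family=Hormone-nuclear Receptor, "
        "species=10090, type=ChIP-seq, acc=P37238,P28700, id=MA0065"
    )
    print(rec.name, rec.meta["id"], rec.accessions, rec.species)
    # PPARG::RXRA MA0065 ['P37238', 'P28700'] ['Mus musculus']
```

Fields are split on a comma followed by a space and `key=`, not on every comma, because in these captions the items of a multi-valued field (for example `acc=P37238,P28700` or `species=10090,10116,9606`) are joined by a bare comma.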

15 System configuration

Pipeline configuration

References

[1] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25(1):25–29, 2000.

[2] M. Kanehisa, S. Goto, M. Furumichi, M. Tanabe, and M. Hirakawa. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res, 2009.
